All Questions
Tagged with deep-learning or neural-networks
9,985 questions
4 votes · 0 answers · 115 views
Difference between weight decay and L2 regularization
I'm reading [Ilya Loshchilov's work][1] on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
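For context, a minimal sketch of the SGD equivalence the excerpt refers to (learning rate $\eta$ and regularization strength $\lambda$ are assumed notation): adding $\frac{\lambda}{2}\lVert\theta\rVert^2$ to the loss gives the step
$$\theta_{t+1} = \theta_t - \eta\left(\nabla L(\theta_t) + \lambda\theta_t\right) = (1 - \eta\lambda)\,\theta_t - \eta\,\nabla L(\theta_t),$$
which is exactly a decoupled weight-decay update (up to a rescaling of $\lambda$); with adaptive optimizers such as Adam the gradient term is rescaled per coordinate while a decoupled decay term is not, so the two no longer coincide.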
1 vote · 1 answer · 158 views
What is a neural network's depth? Does the input layer count?
Convincing sources variously define neural network depth as #hidden_layers, #hidden_layers + 1, or #hidden_layers + 2.
Question A. Is there no leading definition of "network depth"?
= "the number of hidden layers" + 0
...
5 votes · 2 answers · 329 views
Backpropagating regularization term in variational autoencoders
Setup
The variational autoencoder (VAE) loss is given by the following (see here, for example):
$$L = - \sum_{j = 1}^J \frac{1}{2} \left(1 + \log (\sigma_j^2) - \sigma_j^2 - \mu_j^2 \right) - \frac{1}{...
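For reference, a minimal PyTorch-style sketch of this KL term (the names `mu` and `log_var` are assumptions, and the truncated reconstruction term is omitted); autograd backpropagates through it like any other node:

```python
import torch

def vae_kl_term(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    # KL part of the loss above: -0.5 * sum_j (1 + log sigma_j^2 - sigma_j^2 - mu_j^2),
    # with mu, log_var the encoder outputs of shape (batch, latent_dim).
    kl = -0.5 * torch.sum(1 + log_var - log_var.exp() - mu.pow(2), dim=1)
    return kl.mean()  # average the per-sample KL over the batch
```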
1 vote · 0 answers · 44 views
Models with Large Per-Layer Dataflow (~10GB)
I'm looking for DNN models (either inference or training) where the data flowing from one layer to the next (i.e., intermediate activations passed between consecutive layers) is at least $2\text{ GB}$ ...
3 votes · 1 answer · 129 views
Maximum likelihood with regularization
Maximum likelihood estimators (subject to regularity conditions) have very nice asymptotic properties. However, with high-dimensional data you are unlikely to have sufficient observations for this ...
0 votes · 0 answers · 72 views
How to improve accuracy on a homemade plant classifier CNN?
I've got a CNN that I've written in Java. I've tested the code on MNIST and got 94% accuracy (not amazing, but evidence that the backpropagation works). However, I've now moved on to trying to get a ...
6 votes · 0 answers · 168 views
Time series predictions with LSTM
I have a collection of TEC data. My data samples are, for example, day1, day2, day3, day4.
Case 1:
I have the following task to do: train on 3 consecutive days to predict each 4th day. Each day's data ...
2 votes · 0 answers · 70 views
Spherical Point Picking with increased density at six octant vertices
Below is the result of a neural network inverse operation on a set of points - the problem was I trained it using uniform point picking on a sphere (via https://mathworld.wolfram.com/...
13 votes · 2 answers · 627 views
Understanding the Saddle Point Intuition in GANs
I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
1 vote · 0 answers · 60 views
Overfitting problem in classification CNN
So I have a school project, which is to train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82.
I have been trying a lot of things and different ways ...
0 votes · 0 answers · 42 views
Should I Include Post-Event Data During Training for Time-Series Prediction Models?
I’m working on a time-series prediction problem where my goal is to predict the occurrence of a complication for patients based on sequential data.
🔍 Current Approach:
I have sequential data for each ...
0 votes · 0 answers · 48 views
How to improve segregation of images in VAEs?
I have developed a VAE to understand if it is able to distinguish lung images of COVID-19, Normal images, or images with Viral Pneumonia. The VAE is composed of a CNN encoder and a CNN decoder (shown ...
0 votes · 0 answers · 35 views
Can we set different patience in early stopping criteria, based on improvement in validation accuracy, for the baseline and pruned models?
I am doing unstructured feature weight pruning in a CNN. First I trained a baseline model without pruning, with a stopping criterion based on improvement in the validation accuracy, and set the patience to 15 ...
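Nothing couples the two runs to a single patience value; a minimal Keras-style sketch of per-run patience (the monitor name and patience values are placeholder assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Baseline training: stop after 15 epochs with no validation-accuracy improvement.
baseline_stop = EarlyStopping(monitor="val_accuracy", patience=15, restore_best_weights=True)

# Pruned-model fine-tuning: choosing a different patience is a separate, legitimate decision.
pruned_stop = EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True)

# baseline_model.fit(..., callbacks=[baseline_stop])
# pruned_model.fit(..., callbacks=[pruned_stop])
```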
2 votes · 1 answer · 87 views
Question about number of filters in layer in CNN
I have a question about something written in The Hundred-Page Machine Learning Book, which is free here:
https://themlbook.com/wiki/doku.php
It concerns Chapter 6, on neural networks.
...
0 votes · 0 answers · 65 views
What's the right way of handling target values for a regressor neural network?
I'm currently handling a particular dataset:
...
0 votes · 0 answers · 8 views
Why do my ViT and ResNet-18 models not exceed 77% accuracy despite data augmentation and hyperparameter tuning? [duplicate]
I am working on a dataset containing 7,839 images for a three-class classification task. I trained ViT (Vision Transformer) and ResNet-18, but both models achieve an accuracy between 73% and 77%, with ...
0 votes · 0 answers · 46 views
Mitigate the effect of multiple residual connections
I recently came across a question that I couldn't answer based on my prior knowledge (note: not homework, I already graduated):
Suppose we use more than one residual connection in a transformer ...
6 votes · 4 answers · 528 views
Is Bayesian analysis with MCMC a way of quantitative classification?
In my own words, Markov chain Monte Carlo (MCMC) can be used to make the generation of a posterior distribution computationally accessible in the case of many variables:
Given for example patient data ...
0 votes · 0 answers · 46 views
Ensemble Neural Network - Stacking ensemble neural network accuracy is similar to or significantly lower than that of the base models
Context
I'm trying to create an ensemble survival neural network with a custom loss function, which consists of 3 base models: Random Survival Forest (RSF), Gradient Boosting Survival Model (GBSM), and a ...
3 votes · 0 answers · 145 views
Why do skip connections cause drastically smoother loss landscapes in neural networks?
I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
0 votes · 1 answer · 86 views
Normalizing Flows for univariate data
I am currently reading the literature on Normalizing Flow models and their ability as density estimators. It seems that the entire literature focuses on multivariate data sets and I was wondering ...
3 votes · 3 answers · 282 views
Why do skip-gram embeddings work?
I have a question about the skip-gram algorithm. For the question to make sense I will describe it. I probably won't describe it perfectly, though. I will try to give the explanation that my book uses.
It ...
0 votes · 0 answers · 58 views
Test of AI model performance doing combined tumour detection and classification
I am working on a radiology AI project that aims to detect and classify lesions (aka abnormalities) in CT scans across multiple organs. The test set (n=200) is a mix of normal scans (50%) and abnormal ...
3 votes · 1 answer · 80 views
Do deep learning frameworks "look ahead" when calculating gradient in Nesterov optimization?
The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
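Most implementations avoid a literal look-ahead pass via a change of variables; a sketch of the standard reformulation (learning rate $\eta$ assumed): writing $\tilde\theta_t = \theta_t + \beta m_t$ for the look-ahead point, the Nesterov step becomes
$$m_{t+1} = \beta m_t - \eta\,\nabla L(\tilde\theta_t), \qquad \tilde\theta_{t+1} = \tilde\theta_t + \beta m_{t+1} - \eta\,\nabla L(\tilde\theta_t),$$
so the gradient is only ever evaluated at the stored iterate $\tilde\theta_t$: the parameters a framework keeps in memory effectively are the look-ahead point.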
1 vote · 1 answer · 213 views
Prove DDPM with an optimal noise prediction network has correct posterior
I am reading through the DDPM paper, and I am trying to understand the following.
Imagine that $\epsilon_{\theta}(x_t,t)$ is our noise predictor. Further imagine that it is fully expressive, i.e., $\...
0 votes · 0 answers · 134 views
How to properly split train/val sets for time series LSTM prediction with multiple unique items?
I am working on a time series prediction problem using an LSTM model. My dataset consists of 27 different items, each with unique IDs, and roughly the same number of samples per item. There are around ...
1 vote · 0 answers · 74 views
Geospatial Temperature Interpolation DL Advice
I am currently working on a project which includes the interpolation of temperature values over a geographical area.
Essentially, I am given monthly measurements from X measuring stations scattered ...
3 votes · 0 answers · 113 views
Machine learning model for ranking that outputs probabilities
Traditionally, ML algorithms for ranking take the features as input and then output a "relevance score" that does not have a natural probabilistic interpretation.
For example, suppose we have ...
0 votes · 0 answers · 45 views
Method for splitting a dataset into training and test sets
I have created a dataset for binary classification problems to train a neural network. The training data comes from a set pertaining to a specific environment such as a 2D map environment. For the ...
0 votes · 0 answers · 25 views
How many layers and neurons must I use [duplicate]
Basically, I have a project on X-ray medical images that classifies whether an image is normal or not. I got the data from Kaggle as files, and there were two folders, one for the normal X-ray images and ...
3 votes · 1 answer · 124 views
Check through calculations whether the gradients will explode or vanish
I'm reviewing old exam questions and came across this one:
Consider a regular MLP (multi-layer perceptron) architecture with 10 fully connected layers with ReLU activation function. The input to the ...
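The calculation such questions expect is usually a product of per-layer Jacobians; a sketch in assumed notation, since the setup is cut off: with $W_k$ the layer-$k$ weight matrix and $D_k$ the diagonal 0/1 ReLU mask, the backpropagated signal satisfies
$$\left\lVert \frac{\partial L}{\partial h_1} \right\rVert \le \left( \prod_{k=2}^{10} \lVert D_k W_k \rVert \right) \left\lVert \frac{\partial L}{\partial h_{10}} \right\rVert,$$
so factor norms typically above 1 compound geometrically into exploding gradients, and norms below 1 into vanishing ones.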
1 vote · 0 answers · 71 views
Implementing a Diffusion Generative Model for Data Augmentation but training loss values are too high
I tried implementing a Diffusion Generative Model to augment my EEG data. My EEG data has 7 channels and around 10,000 rows, of which I have used 3,000 rows for this model. I have used my raw z-...
0 votes · 0 answers · 42 views
Why isn't InfoNCE the same as pairwise InfoNCE?
Let $x$, $x^+$, $x^-_1$ and $x^-_2$ be data points.
The likelihood of this data, i.e. the probability that, out of $x^+$, $x^-_1$ and $x^-_2$, the closest match to $x$ is $x^+$, is given by:
$$ \frac{1}{...
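A minimal sketch of the (non-pairwise) InfoNCE probability this formula begins, assuming dot-product similarities and a temperature of 1:

```python
import torch
import torch.nn.functional as F

def info_nce(x, x_pos, x_negs):
    # x, x_pos: (d,) embeddings; x_negs: (n_neg, d) negative embeddings.
    pos_logit = (x @ x_pos).unsqueeze(0)  # (1,) similarity to the positive
    neg_logits = x_negs @ x               # (n_neg,) similarities to the negatives
    logits = torch.cat([pos_logit, neg_logits])
    # Softmax probability that x_pos is the closest match to x; return its -log.
    return -F.log_softmax(logits, dim=0)[0]
```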
0 votes · 0 answers · 58 views
Can't understand the evaluation approach used in this paper
In this paper, two deep learning models were proposed: Hybrid-AttUnet++ and EH-AttUnet++. The first model, Hybrid-AttUnet++, is simply a modified U-net model, and the second model is an ensemble ...
0 votes · 0 answers · 75 views
Likert-type survey and deep learning
I am new to working with data. I have a question that I have researched but could not find a clear answer to. I have survey data collected with various Likert-type scales, and I want to apply deep learning to ...
0 votes · 0 answers · 89 views
Analytically solving backpropagation through time for a simple gated RNN
Consider the following simple gated RNN:
\begin{aligned}
c_{t} &= \sigma\bigl(W_{c}\,x_{t} + W_{z}\,z_{t-1}\bigr) \\
z_{t} &= c_{t} \odot z_{t-1} + (1 - c_{t}) \odot \...
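A minimal NumPy sketch of unrolling this recurrence (the term after $(1 - c_t)\odot$ is truncated above, so the candidate state `z_tilde` below is a hypothetical stand-in):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def run_gated_rnn(xs, W_c, W_z, W_h, z0):
    # c_t = sigmoid(W_c x_t + W_z z_{t-1}); z_t = c_t * z_{t-1} + (1 - c_t) * z_tilde_t
    z = z0
    for x in xs:
        c = sigmoid(W_c @ x + W_z @ z)
        z_tilde = np.tanh(W_h @ x)  # hypothetical candidate; the original term is cut off
        z = c * z + (1.0 - c) * z_tilde
    return z
```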
2 votes · 0 answers · 55 views
In neural network diagrams, what symbols or nodes represent the 'concatenate' and 'split' operations?
I’m working on a neural network diagram and need to clearly represent the 'concatenate' and 'split' operations for tensors.
I’m not sure which node symbols or icons are typically used for these ...
1 vote · 1 answer · 80 views
What are the differences between fully recurrent neural networks, Hopfield networks, and Elman networks?
I find it hard to understand the differences between them. My understanding is that each hidden neuron in the FCRN influences all other neurons and itself, while in the Hopfield network it influences ...
1 vote · 0 answers · 50 views
How does layer normalization handle cases where features have very different scales?
I am currently learning about batch normalization and layer normalization, but I encountered some doubts regarding their implementation and potential implications.
In batch normalization, we normalize ...
1 vote · 0 answers · 62 views
Do weights update less towards the start of a neural network?
That is, because the error comes from the end of the neural network (i.e., at the output layer) and trickles back via backpropagation to the start of the neural network, does that mean that the ...
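A quick arithmetic illustration of the effect being asked about: if each backward step scales the error signal by a typical factor of $0.5$, then across $10$ layers the layer-1 gradient is roughly $0.5^{9} \approx 0.002$ times the output-layer gradient, so early weights move far less per update; with typical factors above $1$ the situation reverses.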
0 votes · 0 answers · 35 views
How to get result mask from 1D-CSVM model?
I'm trying to build a 1D-CSVM model, which is a model for pixel-wise classification (i.e., a form of segmentation) of hyperspectral images and is a combination of CSVM and 1D-CNN. In section D....
0 votes · 2 answers · 905 views
Which loss function is CLIP actually using?
I have been studying CLIP for a while and I came across losses like InfoNCE, but I actually didn't find any detail to study it in depth.
As far as I understand, CLIP uses sparse CCE.
They take logits like ...
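For reference, a minimal PyTorch sketch in the spirit of the CLIP paper's pseudocode: a symmetric cross-entropy over the similarity logits, where the integer labels are what makes it look like "sparse" categorical cross-entropy (the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (n, d) embeddings of the n matching image-text pairs in a batch.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.t() / temperature  # (n, n); the diagonal holds true pairs
    labels = torch.arange(len(logits))            # integer ("sparse") class indices
    # Cross-entropy along both axes (image->text and text->image), averaged.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```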
2 votes · 2 answers · 164 views
Why don't we mask other layers besides the multi-head attention in transformers?
Typically when training for NLP tasks, we need to pad our sequences to a max_len, so they can be processed efficiently in a batch-wise manner. However, these padded ...
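A minimal sketch of why masking only the attention logits usually suffices (tensor shapes are assumptions): every other sub-layer acts position-wise, so padded positions can only leak into real ones through attention, and the loss mask discards the padded outputs themselves:

```python
import torch

def masked_attention(scores, pad_mask):
    # scores: (batch, heads, q_len, k_len) raw attention logits.
    # pad_mask: (batch, k_len) boolean, True where a key position is padding.
    scores = scores.masked_fill(pad_mask[:, None, None, :], float("-inf"))
    return torch.softmax(scores, dim=-1)  # no query can attend to a padded key
```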
0 votes · 0 answers · 40 views
Injective vector-valued activation functions. Injective normalization?
I wish to use a vector-valued activation function in my neural network which is injective, that is, maps inputs to unique outputs. I require this property to maintain injectivity of my neural network. ...
0 votes · 0 answers · 37 views
CNN Model is not learning after some epochs
I have implemented an object detection model from a research paper (the code was included on GitHub) and added some changes to it to create a new and better model for my master's thesis. To compare them I ...
0 votes · 0 answers · 72 views
Implementation of F1-score, IOU and Dice Score
This paper proposes a hybrid CNN-Transformer model for segmenting organs and lesions in medical images simultaneously. Their model has two output branches, one to output ...
0 votes · 0 answers · 43 views
Time series forecast with dynamic input features
At the period $T$, I want to forecast the target variables $V_{T+1}, ..., V_{T+60}$. My independent variables are $X$ and $f_1, ..., f_{60}$. $f_i$ is actually a forecast of variable $f$ from the $i$ ...
0 votes · 0 answers · 76 views
How to Interpret a Non-Converging KL Loss in a Deep Learning-Based Stock Prediction Model?
I am building a stock prediction model using deep learning, and my loss function is defined as:
Loss = MSE + Rank Loss + KL Divergence Loss.
As shown in the training curves, the overall loss is ...
-1 votes · 1 answer · 116 views
How to troubleshoot loss not converging? [duplicate]
I am using a CNN model for a binary classification task, with 24,000 training samples in total (positive-to-negative sample ratio: 1:10).
...