All Questions
Tagged with deep-learning or neural-networks
9,985 questions
4 votes · 0 answers · 115 views
Difference between weight decay and L2 regularization
I'm reading [Ilya Loshchilov's work][1] on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
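For context, a minimal sketch of the SGD equivalence the excerpt refers to (learning rate $\eta$ and regularization strength $\lambda$ are assumed notation): adding $\frac{\lambda}{2}\lVert\theta\rVert^2$ to the loss gives the step
$$\theta_{t+1} = \theta_t - \eta\left(\nabla L(\theta_t) + \lambda\theta_t\right) = (1 - \eta\lambda)\,\theta_t - \eta\,\nabla L(\theta_t),$$
which is exactly a decoupled weight-decay update (up to a rescaling of $\lambda$); with adaptive optimizers such as Adam the gradient term is rescaled per coordinate while a decoupled decay term is not, so the two no longer coincide.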
1 vote · 1 answer · 158 views
What is a neural network's depth? Does the input layer count?
Convincing sources variously define neural network depth as #hidden_layers, #hidden_layers + 1, or #hidden_layers + 2.
Question A. Is there no leading definition of "network depth"?
= "the number of hidden layers" + 0
...
5 votes · 2 answers · 329 views
Backpropagating regularization term in variational autoencoders
Setup
The variational autoencoder (VAE) loss is given by the following (see here, for example):
$$L = - \sum_{j = 1}^J \frac{1}{2} \left(1 + \log (\sigma_j^2) - \sigma_j^2 - \mu_j^2 \right) - \frac{1}{...
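For reference, a minimal PyTorch-style sketch of this KL term (the names `mu` and `log_var` are assumptions, and the truncated reconstruction term is omitted); autograd backpropagates through it like any other node:

```python
import torch

def vae_kl_term(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    # KL part of the loss above: -0.5 * sum_j (1 + log sigma_j^2 - sigma_j^2 - mu_j^2),
    # with mu, log_var the encoder outputs of shape (batch, latent_dim).
    kl = -0.5 * torch.sum(1 + log_var - log_var.exp() - mu.pow(2), dim=1)
    return kl.mean()  # average the per-sample KL over the batch
```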
1 vote · 0 answers · 44 views
Models with Large Per-Layer Dataflow (~10GB)
I'm looking for DNN models (either inference or training) where the data flowing from one layer to the next (i.e., intermediate activations passed between consecutive layers) is at least $2\text{ GB}$ ...
3 votes · 1 answer · 129 views
Maximum likelihood with regularization
Maximum likelihood estimators (subject to regularity conditions) have very nice asymptotic properties. However, with high-dimensional data you are unlikely to have sufficient observations for this ...
0 votes · 0 answers · 72 views
How to improve accuracy on a homemade plant classifier CNN?
I've got a CNN that I've written in Java. I've tested the code on MNIST and got 94% accuracy (not amazing, but evidence that the backpropagation works). However, I've now moved on to trying to get a ...
6 votes · 0 answers · 168 views
Time series predictions with LSTM
I have a collection of TEC data. My data samples are, for example, day1, day2, day3, day4.
Case 1:
I have the following task to do: train on 3 consecutive days to predict each 4th day. Each day's data ...
2 votes · 0 answers · 70 views
Spherical Point Picking with increased density at six octant vertices
Below is the result of a neural network inverse operation on a set of points - the problem was I trained it using uniform point picking on a sphere (via https://mathworld.wolfram.com/...
13 votes · 2 answers · 627 views
Understanding the Saddle Point Intuition in GANs
I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
1 vote · 0 answers · 60 views
Overfitting problem in classification CNN
So I have a school project, which is to train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82.
I have been trying a lot of things and different ways ...
0 votes · 0 answers · 42 views
Should I Include Post-Event Data During Training for Time-Series Prediction Models?
I’m working on a time-series prediction problem where my goal is to predict the occurrence of a complication for patients based on sequential data.
🔍 Current Approach:
I have sequential data for each ...
0 votes · 0 answers · 48 views
How to improve segregation of images in VAEs?
I have developed a VAE to understand if it is able to distinguish lung images of COVID-19, Normal images, or images with Viral Pneumonia. The VAE is composed of a CNN encoder and a CNN decoder (shown ...
0 votes · 0 answers · 35 views
Can we set different patience in early stopping criteria, based on improvement in validation accuracy, for the baseline and pruned models?
I am doing unstructured feature weight pruning in a CNN. First I trained a baseline model without pruning, with a stopping criterion based on improvement in the validation accuracy, and set the patience to 15 ...
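Nothing couples the two runs to a single patience value; a minimal Keras-style sketch of per-run patience (the monitor name and patience values are placeholder assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Baseline training: stop after 15 epochs with no validation-accuracy improvement.
baseline_stop = EarlyStopping(monitor="val_accuracy", patience=15, restore_best_weights=True)

# Pruned-model fine-tuning: choosing a different patience is a separate, legitimate decision.
pruned_stop = EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True)

# baseline_model.fit(..., callbacks=[baseline_stop])
# pruned_model.fit(..., callbacks=[pruned_stop])
```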
2 votes · 1 answer · 87 views
Question about number of filters in layer in CNN
I have a question about something written in The Hundred-Page Machine Learning Book, which is free here:
https://themlbook.com/wiki/doku.php
It concerns Chapter 6, on neural networks.
...
0 votes · 0 answers · 65 views
What's the right way of handling target values for a regressor neural network?
I'm currently handling a particular dataset:
...
0 votes · 0 answers · 8 views
Why do my ViT and ResNet-18 models not exceed 77% accuracy despite data augmentation and hyperparameter tuning? [duplicate]
I am working on a dataset containing 7,839 images for a three-class classification task. I trained ViT (Vision Transformer) and ResNet-18, but both models achieve an accuracy between 73% and 77%, with ...
0 votes · 0 answers · 46 views
Mitigate the effect of multiple residual connections
I recently came across a question that I couldn't answer based on my prior knowledge (note: not homework, I already graduated):
Suppose we use more than one residual connection in a transformer ...
6 votes · 4 answers · 528 views
Is Bayesian analysis with MCMC a way of quantitative classification?
In my own words, Markov chain Monte Carlo (MCMC) can be used to make the generation of a posterior distribution computationally accessible in the case of many variables:
Given for example patient data ...
0 votes · 0 answers · 46 views
Ensemble Neural Network - Stacking ensemble neural network accuracy is similar to or significantly lower than that of the base models
Context
I'm trying to create an ensemble survival neural network with a custom loss function, which consists of 3 base models: Random Survival Forest (RSF), Gradient Boosting Survival Model (GBSM), and a ...
3 votes · 0 answers · 145 views
Why do skip connections cause drastically smoother loss landscapes in neural networks?
I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
0 votes · 1 answer · 86 views
Normalizing Flows for univariate data
I am currently reading the literature on Normalizing Flow models and their ability as density estimators. It seems that the entire literature focuses on multivariate data sets and I was wondering ...
3 votes · 3 answers · 282 views
Why do skip-gram embeddings work?
I have a question about the skip-gram algorithm. For the question to make sense I will describe it. I probably won't describe it perfectly, though. I will try to give the explanation that my book uses.
It ...
0 votes · 0 answers · 58 views
Test of AI model performance doing combined tumour detection and classification
I am working on a radiology AI project that aims to detect and classify lesions (aka abnormalities) in CT scans across multiple organs. The test set (n=200) is a mix of normal scans (50%) and abnormal ...
3 votes · 1 answer · 80 views
Do deep learning frameworks "look ahead" when calculating gradient in Nesterov optimization?
The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
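Most implementations avoid a literal look-ahead pass via a change of variables; a sketch of the standard reformulation (learning rate $\eta$ assumed): writing $\tilde\theta_t = \theta_t + \beta m_t$ for the look-ahead point, the Nesterov step becomes
$$m_{t+1} = \beta m_t - \eta\,\nabla L(\tilde\theta_t), \qquad \tilde\theta_{t+1} = \tilde\theta_t + \beta m_{t+1} - \eta\,\nabla L(\tilde\theta_t),$$
so the gradient is only ever evaluated at the stored iterate $\tilde\theta_t$: the parameters a framework keeps in memory effectively are the look-ahead point.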
1 vote · 1 answer · 213 views
Prove DDPM with an optimal noise prediction network has correct posterior
I am reading through the DDPM paper, and I am trying to understand the following.
Imagine that $\epsilon_{\theta}(x_t,t)$ is our noise predictor. Further imagine that it is fully expressive, i.e., $\...
0 votes · 0 answers · 134 views
How to properly split train/val sets for time series LSTM prediction with multiple unique items?
I am working on a time series prediction problem using an LSTM model. My dataset consists of 27 different items, each with unique IDs, and roughly the same number of samples per item. There are around ...
1 vote · 0 answers · 74 views
Geospatial Temperature Interpolation DL Advice
I am currently working on a project which includes the interpolation of temperature values over a geographical area.
Essentially, I am given monthly measurements from X measuring stations scattered ...
3 votes · 0 answers · 113 views
Machine learning model for ranking that outputs probabilities
Traditionally, ML algorithms for ranking take the features as input and then output a "relevance score" that does not have a natural probabilistic interpretation.
For example, suppose we have ...
0 votes · 0 answers · 45 views
Method for splitting a dataset into training and test sets
I have created a dataset for binary classification problems to train a neural network. The training data comes from a set pertaining to a specific environment such as a 2D map environment. For the ...
0 votes · 0 answers · 25 views
How many layers and neurons must I use [duplicate]
Basically, I have a project on X-ray medical images that classifies whether an image is normal or not. I got the data from Kaggle as files, and there were two folders, one for the normal X-ray images and ...
3 votes · 1 answer · 124 views
Check through calculations whether the gradients will explode or vanish
I'm reviewing old exam questions and came across this one:
Consider a regular MLP (multi-layer perceptron) architecture with 10 fully connected layers with ReLU activation function. The input to the ...
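The calculation such questions expect is usually a product of per-layer Jacobians; a sketch in assumed notation, since the setup is cut off: with $W_k$ the layer-$k$ weight matrix and $D_k$ the diagonal 0/1 ReLU mask, the backpropagated signal satisfies
$$\left\lVert \frac{\partial L}{\partial h_1} \right\rVert \le \left( \prod_{k=2}^{10} \lVert D_k W_k \rVert \right) \left\lVert \frac{\partial L}{\partial h_{10}} \right\rVert,$$
so factor norms typically above 1 compound geometrically into exploding gradients, and norms below 1 into vanishing ones.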
1 vote · 0 answers · 71 views
Implementing a Diffusion Generative Model for Data Augmentation but training loss values are too high
I tried implementing a Diffusion Generative Model to augment my EEG data. My EEG data has 7 channels and around 10,000 rows, of which I have used 3,000 rows for this model. I have used my raw z-...
0 votes · 0 answers · 42 views
Why isn't InfoNCE the same as pairwise InfoNCE?
Let $x$, $x^+$, $x^-_1$ and $x^-_2$ be data points.
The likelihood of this data, i.e. the probability that, out of $x^+$, $x^-_1$ and $x^-_2$, the closest match to $x$ is $x^+$, is given by:
$$ \frac{1}{...
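A minimal sketch of the (non-pairwise) InfoNCE probability this formula begins, assuming dot-product similarities and a temperature of 1:

```python
import torch
import torch.nn.functional as F

def info_nce(x, x_pos, x_negs):
    # x, x_pos: (d,) embeddings; x_negs: (n_neg, d) negative embeddings.
    pos_logit = (x @ x_pos).unsqueeze(0)  # (1,) similarity to the positive
    neg_logits = x_negs @ x               # (n_neg,) similarities to the negatives
    logits = torch.cat([pos_logit, neg_logits])
    # Softmax probability that x_pos is the closest match to x; return its -log.
    return -F.log_softmax(logits, dim=0)[0]
```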
0 votes · 0 answers · 58 views
Can't understand the evaluation approach used in this paper
In this paper, two deep learning models were proposed: Hybrid-AttUnet++ and EH-AttUnet++. The first model, Hybrid-AttUnet++, is simply a modified U-net model, and the second model is an ensemble ...
0 votes · 0 answers · 75 views
Likert-type survey and deep learning
I am new to working with data. I have a question that I have researched but could not find a clear answer to. I have survey data collected with various Likert-type scales, and I want to apply deep learning to ...
0 votes · 0 answers · 89 views
Analytically solving backpropagation through time for a simple gated RNN
Consider the following simple gated RNN:
\begin{aligned}
c_{t} &= \sigma\bigl(W_{c}\,x_{t} + W_{z}\,z_{t-1}\bigr) \\
z_{t} &= c_{t} \odot z_{t-1} + (1 - c_{t}) \odot \...
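A minimal NumPy sketch of unrolling this recurrence (the term after $(1 - c_t)\odot$ is truncated above, so the candidate state `z_tilde` below is a hypothetical stand-in):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def run_gated_rnn(xs, W_c, W_z, W_h, z0):
    # c_t = sigmoid(W_c x_t + W_z z_{t-1}); z_t = c_t * z_{t-1} + (1 - c_t) * z_tilde_t
    z = z0
    for x in xs:
        c = sigmoid(W_c @ x + W_z @ z)
        z_tilde = np.tanh(W_h @ x)  # hypothetical candidate; the original term is cut off
        z = c * z + (1.0 - c) * z_tilde
    return z
```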
2 votes · 0 answers · 55 views
In neural network diagrams, what symbols or nodes represent the 'concatenate' and 'split' operations?
I’m working on a neural network diagram and need to clearly represent the 'concatenate' and 'split' operations for tensors.
I’m not sure which node symbols or icons are typically used for these ...
1 vote · 1 answer · 80 views
What are the differences between fully recurrent neural networks, Hopfield networks, and Elman networks?
I find it hard to understand the differences between them. My understanding is that each hidden neuron in the FCRN influences all other neurons and itself, while in the Hopfield network it influences ...
1 vote · 0 answers · 50 views
How does layer normalization handle cases where features have very different scales?
I am currently learning about batch normalization and layer normalization, but I encountered some doubts regarding their implementation and potential implications.
In batch normalization, we normalize ...
1 vote · 0 answers · 62 views
Do weights update less towards the start of a neural network?
That is, because the error comes from the end of the neural network (i.e., at the output layer) and trickles back via backpropagation to the start of the neural network, does that mean that the ...
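A quick arithmetic illustration of the effect being asked about: if each backward step scales the error signal by a typical factor of $0.5$, then across $10$ layers the layer-1 gradient is roughly $0.5^{9} \approx 0.002$ times the output-layer gradient, so early weights move far less per update; with typical factors above $1$ the situation reverses.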
0 votes · 0 answers · 35 views
How to get result mask from 1D-CSVM model?
I'm trying to build a 1D-CSVM model, which is a model for pixel-wise classification (i.e., a form of segmentation) of hyperspectral images and is a combination of CSVM and 1D-CNN. In section D....
0 votes · 2 answers · 905 views
Which loss function is CLIP actually using?
I have been studying CLIP for a while and I came across losses like InfoNCE, but I actually didn't find any detail to study it in depth.
As far as I understand, CLIP uses sparse CCE.
They take logits like ...
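For reference, a minimal PyTorch sketch in the spirit of the CLIP paper's pseudocode: a symmetric cross-entropy over the similarity logits, where the integer labels are what makes it look like "sparse" categorical cross-entropy (the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (n, d) embeddings of the n matching image-text pairs in a batch.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.t() / temperature  # (n, n); the diagonal holds true pairs
    labels = torch.arange(len(logits))            # integer ("sparse") class indices
    # Cross-entropy along both axes (image->text and text->image), averaged.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```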
2 votes · 2 answers · 164 views
Why don't we mask other layers besides the multi-head attention in transformers?
Typically when training for NLP tasks, we need to pad our sequences to a max_len, so they can be processed efficiently in a batch-wise manner. However, these padded ...
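A minimal sketch of why masking only the attention logits usually suffices (tensor shapes are assumptions): every other sub-layer acts position-wise, so padded positions can only leak into real ones through attention, and the loss mask discards the padded outputs themselves:

```python
import torch

def masked_attention(scores, pad_mask):
    # scores: (batch, heads, q_len, k_len) raw attention logits.
    # pad_mask: (batch, k_len) boolean, True where a key position is padding.
    scores = scores.masked_fill(pad_mask[:, None, None, :], float("-inf"))
    return torch.softmax(scores, dim=-1)  # no query can attend to a padded key
```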
0 votes · 0 answers · 40 views
Injective vector-valued activation functions. Injective normalization?
I wish to use a vector-valued activation function in my neural network which is injective, that is, maps inputs to unique outputs. I require this property to maintain injectivity of my neural network. ...
0 votes · 0 answers · 37 views
CNN Model is not learning after some epochs
I have implemented an object detection model from a research paper (the code was included on GitHub) and added some changes to it to create a new and better model for my master's thesis. To compare them I ...
0 votes · 0 answers · 72 views
Implementation of F1-score, IOU and Dice Score
This paper proposes a hybrid CNN-Transformer model for segmenting organs and lesions in medical images simultaneously. Their model has two output branches, one to output ...
0 votes · 0 answers · 43 views
Time series forecast with dynamic input features
At the period $T$, I want to forecast the target variables $V_{T+1}, ..., V_{T+60}$. My independent variables are $X$ and $f_1, ..., f_{60}$. $f_i$ is actually a forecast of variable $f$ from the $i$ ...
0 votes · 0 answers · 76 views
How to Interpret a Non-Converging KL Loss in a Deep Learning-Based Stock Prediction Model?
I am building a stock prediction model using deep learning, and my loss function is defined as:
Loss = MSE + Rank Loss + KL Divergence Loss.
As shown in the training curves, the overall loss is ...
-1 votes · 1 answer · 116 views
How to troubleshoot loss not converging? [duplicate]
I am using a CNN model for a binary classification task, with 24,000 training samples in total (positive-to-negative sample ratio: 1:10).
...