
All Questions

4 votes
0 answers
115 views

I'm reading [Ilya Loshchilov's work][1] on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
Danny Wen's user avatar
  • 323
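The SGD half of the claim in the excerpt above can be checked numerically. The following is only a minimal sketch under stated assumptions: plain SGD without momentum, an L2 penalty written as `(lam / 2) * w**2`, and a decay factor of `lr * lam`. The function names and the toy loss are hypothetical and are not taken from the referenced work.

```python
def sgd_l2_step(w, grad, lr, lam):
    """Plain SGD on loss + (lam / 2) * w**2: the penalty adds lam * w to the gradient."""
    return w - lr * (grad + lam * w)


def sgd_weight_decay_step(w, grad, lr, lam):
    """Plain SGD on the unpenalized loss, with the weight shrunk by (1 - lr * lam)."""
    return (1.0 - lr * lam) * w - lr * grad


def toy_grad(w):
    """Gradient of the hypothetical toy loss (w - 3)**2."""
    return 2.0 * (w - 3.0)


w_l2 = w_wd = 10.0
max_gap = 0.0
for _ in range(50):
    w_l2 = sgd_l2_step(w_l2, toy_grad(w_l2), lr=0.05, lam=0.1)
    w_wd = sgd_weight_decay_step(w_wd, toy_grad(w_wd), lr=0.05, lam=0.1)
    max_gap = max(max_gap, abs(w_l2 - w_wd))
```

Under these assumptions the two trajectories agree up to floating-point rounding, because the two update rules are algebraically identical for plain SGD.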
1 vote
1 answer
158 views

Neural network depth = #hidden_layers [+1 [+1]] in different convincing sources. Question A. There is no leading definition for "network depth"? = "the number of hidden layers" +0 ...
Convexity's user avatar
  • 111
5 votes
2 answers
329 views

Setup The variational autoencoder (VAE) loss is given by the following (see here, for example): $$L = - \sum_{j = 1}^J \frac{1}{2} \left(1 + \log (\sigma_i^2) - \sigma_i^2 - \mu_i^2 \right) - \frac{1}{...
Physics Enthusiast's user avatar
1 vote
0 answers
44 views

I'm looking for DNN models (either inference or training) where the data flowing from one layer to the next (i.e., intermediate activations passed between consecutive layers) is at least $2\text{ GB}$ ...
Abhishek Ghosh's user avatar
3 votes
1 answer
129 views

Maximum likelihood estimators (subject to regularity conditions) have very nice asymptotic properties. However, with high-dimensional data you are unlikely to have sufficient observations for this ...
UserB1234's user avatar
  • 147
0 votes
0 answers
72 views

I've got a CNN that I've written in Java. I've tested the code on MNIST and got 94% accuracy (not amazing, but evidence that the backpropagation works). However, I've now moved on to trying to get a ...
Alistair58's user avatar
6 votes
0 answers
168 views

I have a collection of TEC data. My data sample is, for example, day1, day2, day3, day4. Case 1: I have the following task to do: train on 3 consecutive days to predict each 4th day. Each day data ...
S. M.'s user avatar
  • 33
2 votes
0 answers
70 views

Below is the result of a neural network inverse operation on a set of points - the problem was I trained it using uniform point picking on a sphere (via https://mathworld.wolfram.com/...
Konchog's user avatar
  • 121
13 votes
2 answers
627 views

I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
Danny Wen's user avatar
  • 323
1 vote
0 answers
60 views

So I have a school project, which is to train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82. I have been trying a lot of things and different way ...
erodrigu's user avatar
0 votes
0 answers
42 views

I’m working on a time-series prediction problem where my goal is to predict the occurrence of a complication for patients based on sequential data. 🔍 Current Approach: I have sequential data for each ...
Farzad X's user avatar
0 votes
0 answers
48 views

I have developed a VAE to understand if it is able to distinguish lung images of COVID-19, Normal images, or images with Viral Pneumonia. The VAE is composed of CNN encoder and CNN decoder (shown ...
lrod1994's user avatar
0 votes
0 answers
35 views

I am doing unstructured feature weight pruning in a CNN. First I trained a baseline model without pruning, with a stopping criterion based on improvement in the validation accuracy, and set the patience to 15 ...
Syed Dildar Shah's user avatar
2 votes
1 answer
87 views

I have a question about something that is written in the 100 page machine learning book, which is free here: https://themlbook.com/wiki/doku.php It is regarding chapter 6 regarding neural networks. ...
user394334's user avatar
0 votes
0 answers
65 views

I'm currently handling a particular dataset: ...
skiddyboi's user avatar
0 votes
0 answers
8 views

I am working on a dataset containing 7,839 images for a three-class classification task. I trained ViT (Vision Transformer) and ResNet-18, but both models achieve an accuracy between 73% and 77%, with ...
Anis's user avatar
  • 1
0 votes
0 answers
46 views

I recently came across a question that I couldn't answer based on my prior knowledge (note: not homework, I have already graduated): Suppose we use more than one residual connection in a transformer ...
Green 绿色's user avatar
6 votes
4 answers
528 views

In my own words, Markov chain Monte Carlo (MCMC) can be used to make the generation of a posterior distribution computationally accessible in the case of many variables: Given, for example, patient data ...
NicolasBourbaki's user avatar
0 votes
0 answers
46 views

Context I'm trying to create an ensemble survival neural network with a custom loss function, which consists of 3 base models: Random Survival Forest (RSF), Gradient Boosting Survival Model (GBSM) and a ...
Yugan Gogul Muthukumar's user avatar
3 votes
0 answers
145 views

I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
Danny Wen's user avatar
  • 323
0 votes
1 answer
86 views

I am currently reading the literature on Normalizing Flow models and their ability as density estimators. It seems that the entire literature focuses on multivariate data sets and I was wondering ...
Icetime's user avatar
3 votes
3 answers
282 views

I have a question about the skip-gram algorithm. For the question to make sense I will describe it. I probably won't describe it perfectly, though. I will try to give the explanation that my book uses. It ...
user394334's user avatar
0 votes
0 answers
58 views

I am working on a radiology AI project that aims to detect and classify lesions (aka abnormalities) in CT scans across multiple organs. The test set (n=200) is a mix of normal scans (50%) and abnormal ...
Maelstorm's user avatar
  • 286
3 votes
1 answer
80 views

The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
Antonios Sarikas's user avatar
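The look-ahead gradient described in the excerpt above can be sketched as follows. This is a minimal illustration rather than any particular library's implementation: it keeps the excerpt's convention of evaluating the gradient at $\theta_t + \beta m$, and the momentum update `m = beta * m - lr * g`, the learning rate, and the quadratic test function are assumptions added for the example.

```python
def nesterov_step(theta, m, grad_fn, lr=0.1, beta=0.9):
    """One Nesterov momentum update for a scalar parameter.

    The gradient is evaluated at the look-ahead point theta + beta * m,
    not at the current parameter value theta.
    """
    lookahead = theta + beta * m
    g = grad_fn(lookahead)
    m_new = beta * m - lr * g   # assumed sign convention: m accumulates negative gradients
    theta_new = theta + m_new
    return theta_new, m_new


def grad_quadratic(theta):
    """Gradient of the hypothetical test function f(theta) = theta**2."""
    return 2.0 * theta


theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = nesterov_step(theta, m, grad_quadratic)
```

With these assumed hyperparameters, `theta` approaches the minimizer at 0 of the test function. Replacing `lookahead` with `theta` in the gradient call gives classical momentum instead.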
1 vote
1 answer
213 views

I am reading through the DDPM paper, and I am trying to understand the following. Imagine that $\epsilon_{\theta}(x_t,t)$ is our noise predictor. Further imagine that it is fully expressive, i.e., $\...
Max's user avatar
  • 11
0 votes
0 answers
134 views

I am working on a time series prediction problem using an LSTM model. My dataset consists of 27 different items, each with unique IDs, and roughly the same number of samples per item. There are around ...
Rai's user avatar
  • 43
1 vote
0 answers
74 views

I am currently working on a project which includes the interpolation of temperature values over a geographical area. Essentially, I am given monthly measurements from X measuring stations scattered ...
Donald M.'s user avatar
3 votes
0 answers
113 views

Traditionally, ML algorithms for ranking take the features as input and then output a "relevance score" which does not have a natural probabilistic interpretation. For example, suppose we have ...
Ishigami's user avatar
  • 155
0 votes
0 answers
45 views

I have created a dataset for binary classification problems to train a neural network. The training data comes from a set pertaining to a specific environment such as a 2D map environment. For the ...
Encipher's user avatar
  • 185
0 votes
0 answers
25 views

Basically, I have a project on X-ray medical images that classifies whether an image is normal or not, so I got the data from Kaggle as files, and there were two folders, one for the normal X-ray images and ...
Khalid Alnhdy's user avatar
3 votes
1 answer
124 views

I'm reviewing old exam questions and came across this one: Consider a regular MLP (multi-layer perceptron) architecture with 10 fully connected layers with ReLU activation function. The input to the ...
Aleksander Wojsz's user avatar
1 vote
0 answers
71 views

I tried implementing a Diffusion Generative Model to augment my EEG data. My EEG data has 7 channels and around 10,000 rows, out of which I have used 3,000 rows for this model. I have used my raw z-...
0 votes
0 answers
42 views

Let $x$, $x^+$, $x^-_1$ and $x^-_2$ be data points. The likelihood of this data, i.e. the probability that out of $x^+$, $x^-_1$ and $x^-_2$, the closest match to $x$ is $x^+$ is given by: $$ \frac{1}{...
user357269's user avatar
0 votes
0 answers
58 views

In this paper, two deep learning models were proposed: Hybrid-AttUnet++ and EH-AttUnet++. The first model, Hybrid-AttUnet++, is simply a modified U-net model, and the second model is an ensemble ...
AAA_11's user avatar
  • 1
0 votes
0 answers
75 views

I am new to data. I have a question that I have researched but could not find a clear answer to. I have survey data that I have collected with various Likert types and I want to apply deep learning to ...
gizem Boztaş's user avatar
0 votes
0 answers
89 views

Consider the following simple gated RNN: \begin{aligned} c_{t} &= \sigma\bigl(W_{c}\,x_{t} + W_{z}\,z_{t-1}\bigr) \\[6pt] z_{t} &= c_{t} \,\odot\, z_{t-1} \;\;+\;\; (1 - c_{t}) \,\odot\,\...
kuzzooroo's user avatar
  • 181
2 votes
0 answers
55 views

I’m working on a neural network diagram and need to clearly represent the 'concatenate' and 'split' operations for tensors. I’m not sure which node symbols or icons are typically used for these ...
landings's user avatar
  • 141
1 vote
1 answer
80 views

I find it hard to understand the differences between the two. My understanding is that each hidden neuron in the FCRN influences all other neurons and itself, while in the Hopfield it influences ...
IKNv99's user avatar
  • 111
1 vote
0 answers
50 views

I am currently learning about batch normalization and layer normalization, but I encountered some doubts regarding their implementation and potential implications. In batch normalization, we normalize ...
duckprop's user avatar
1 vote
0 answers
62 views

That is, because the error is coming from the end of the neural network (i.e., at the output layer) and trickles back via backpropagation to the start of the neural network, does that mean that the ...
Null Six's user avatar
0 votes
0 answers
35 views

I'm trying to build a 1D-CSVM model, which is a model for pixel-wise classification (i.e., a way of doing segmentation) of hyperspectral images and is a combination of CSVM and 1D-CNN. In section D....
Alex's user avatar
  • 1
0 votes
2 answers
905 views

I have been studying CLIP for a while and came across losses like InfoNCE, but I did not find any detail to study it in depth. As far as I understand, CLIP uses sparse CCE. They take logits like ...
zahid's user avatar
  • 9
2 votes
2 answers
164 views

Typically when training for NLP tasks, we need to pad our sequences to a max_len, so they can be processed efficiently in a batch-wise manner. However, these padded ...
Antonios Sarikas's user avatar
0 votes
0 answers
40 views

I wish to use a vector-valued activation function in my neural network that is injective, that is, maps distinct inputs to distinct outputs. I require this property to maintain injectivity of my neural network. ...
Atharva's user avatar
0 votes
0 answers
37 views

I have implemented an object detection model from a research paper (code was included on GitHub) and added some changes to it to create a new and better model for my master's thesis. To compare them I ...
Esi's user avatar
  • 1
0 votes
0 answers
72 views

This paper proposes a medical image segmentation hybrid CNN - Transformer model for segmenting organs and lesions in medical images simultaneously. Their model has two output branches, one to output ...
Ahmed Mohamed's user avatar
0 votes
0 answers
43 views

At the period $T$, I want to forecast the target variables $V_{T+1}, ..., V_{T+60}$. My independent variables are $X$ and $f_1, ..., f_{60}$. $f_i$ is actually a forecast of variable $f$ from the $i$ ...
Junior MIP's user avatar
0 votes
0 answers
76 views

I am building a stock prediction model using deep learning, and my loss function is defined as: Loss = MSE + Rank Loss + KL Divergence Loss. As shown in the training curves, the overall loss is ...
Geon Shin's user avatar
-1 votes
1 answer
116 views

I am using a CNN model for a binary classification task, with a total of 24000 training samples (positive-to-negative sample ratio: 1:10). ...
Yuhua Wei's user avatar