All Questions

0 votes · 0 answers · 40 views

Suppose a hypernetwork $\mathcal{H}$ takes a latent variable $z \sim p_z(z)$, where $p_z$ is Gaussian, and outputs the parameters of another neural network $f$. In particular, each weight $w_i$ of $f$ ...
rando • 360
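A minimal sketch of the setup this excerpt describes: a hypernetwork mapping a Gaussian latent to the flat weight vector of a small target MLP. All sizes and names below are illustrative assumptions, not details from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes for the target network f (not from the question)
IN_DIM, HIDDEN, OUT_DIM = 4, 16, 1
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM

class HyperNet(nn.Module):
    """Maps a Gaussian latent z to the flat parameter vector of f."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, N_PARAMS)
        )

    def forward(self, z):
        return self.body(z)

def f_forward(x, params):
    """Run the target network f with externally supplied weights."""
    i = 0
    W1 = params[i:i + IN_DIM * HIDDEN].view(HIDDEN, IN_DIM); i += IN_DIM * HIDDEN
    b1 = params[i:i + HIDDEN]; i += HIDDEN
    W2 = params[i:i + HIDDEN * OUT_DIM].view(OUT_DIM, HIDDEN); i += HIDDEN * OUT_DIM
    b2 = params[i:i + OUT_DIM]
    h = torch.relu(F.linear(x, W1, b1))
    return F.linear(h, W2, b2)

z = torch.randn(8)        # z ~ N(0, I)
params = HyperNet()(z)    # every weight w_i of f is a function of z
y = f_forward(torch.randn(5, IN_DIM), params)  # shape: (5, 1)
```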
0 votes · 0 answers · 19 views

I'm reading the Deep Learning book by Goodfellow, Bengio, and Courville (Chapter 8, Section 8.7.1 on Batch Normalization, page 315). The authors use a simple example of a deep linear network without ...
spierenb
1 vote · 0 answers · 24 views

I really want to play around with RNNs. I'm trying to build an AI assistant with RNNs to run on my machine, as I've always been fascinated by RNN models... To make the performance good, I think I need to do some ...
jupyter • 111
0 votes · 0 answers · 9 views

What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please point me to recent, powerful tools.
bbadyalina
0 votes · 0 answers · 17 views

I’m using an ensemble of M = 5 deep neural networks, each evaluated with T = 100 Monte Carlo dropout samples at test time to estimate predictive uncertainty. The model performs binary classification (...
Solomon123
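A minimal sketch of how such an M-member, T-sample Monte Carlo dropout estimate is commonly computed for binary classification; the models and shapes below are placeholders, not the asker's code.

```python
import torch
import torch.nn as nn

M, T = 5, 100  # ensemble members, MC dropout samples per member

# Placeholder ensemble: M small dropout MLPs (stand-ins for the real models)
models = [nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                        nn.Dropout(0.2), nn.Linear(32, 1)) for _ in range(M)]

def predictive_probs(models, x):
    """Average sigmoid outputs over M models x T stochastic forward passes."""
    probs = []
    for model in models:
        model.train()              # keep dropout active at test time (MC dropout)
        with torch.no_grad():
            for _ in range(T):
                probs.append(torch.sigmoid(model(x)))
    p = torch.stack(probs)         # shape: (M * T, batch, 1)
    mean_p = p.mean(dim=0)         # predictive probability per example
    # Predictive entropy of the mean as a total-uncertainty summary
    eps = 1e-12
    entropy = -(mean_p * (mean_p + eps).log()
                + (1 - mean_p) * (1 - mean_p + eps).log())
    return mean_p, entropy

mean_p, entropy = predictive_probs(models, torch.randn(4, 10))
```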
0 votes · 0 answers · 26 views

I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few. More specifically, my dataset has around 9k samples and 3....
Olivia • 191
1 vote · 0 answers · 26 views

The paper "Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition" makes the following claim (top of page 4): It is clear that smoothness is not the right ...
Chris • 322
2 votes · 0 answers · 25 views

In the paper "Deep Residual Learning for Image Recognition", it is mentioned that "When deeper networks are able to start converging, a degradation problem has been exposed: with ...
Vignesh N
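For context, the identity shortcut the paper introduces to counter this degradation can be sketched generically as follows; this is a schematic block, not the paper's exact convolutional architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the layers only need to learn the residual F,
    so a deeper stack can at worst fall back to the identity."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                               nn.Linear(dim, dim))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)   # identity shortcut

y = ResidualBlock(64)(torch.randn(8, 64))  # shape preserved: (8, 64)
```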
0 votes · 0 answers · 24 views

I am new to GAIN (generative adversarial imputation network). I am trying to use GAIN to impute missing values. I have a question about the values of the losses for the discriminator. Are the values ...
JonathonSoong
0 votes · 0 answers · 40 views

A key element in Bayesian neural networks is finding the probability of a set of weights, so that it can be applied to Bayes' rule. I cannot think of many ways of doing this, for P(w) (also sometimes ...
user494234
1 vote · 1 answer · 58 views

The Google DeepMind paper "Weight Uncertainty in Neural Networks" features the following algorithm: Note that the $\frac{\partial f(w,\theta)}{\partial w}$ term of the gradients for the mean and standard ...
user494234
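The place that shared $\frac{\partial f(w,\theta)}{\partial w}$ term comes from is the reparameterization $w = \mu + \log(1+\exp(\rho)) \circ \varepsilon$. A rough sketch of that step follows; it is a paraphrase with placeholder prior and likelihood terms, not the paper's code.

```python
import torch

# Variational parameters theta = (mu, rho), with sigma = softplus(rho) > 0
mu = torch.zeros(10, requires_grad=True)
rho = torch.full((10,), -3.0, requires_grad=True)

def f(w, mu, rho):
    """Schematic objective log q(w|theta) - log p(w) - log p(D|w).
    The prior and likelihood terms here are stand-ins only."""
    sigma = torch.log1p(torch.exp(rho))
    log_q = (-0.5 * ((w - mu) / sigma) ** 2 - torch.log(sigma)).sum()
    log_prior = (-0.5 * w ** 2).sum()      # standard normal prior
    log_lik = -(w.sum() - 1.0) ** 2        # placeholder likelihood
    return log_q - log_prior - log_lik

eps = torch.randn(10)                       # eps ~ N(0, I)
w = mu + torch.log1p(torch.exp(rho)) * eps  # w = mu + softplus(rho) * eps
loss = f(w, mu, rho)
loss.backward()
# Because w is a differentiable function of (mu, rho), autograd routes
# df/dw into mu.grad and rho.grad via the chain rule; that shared df/dw
# is the term the algorithm reuses in both gradient expressions.
```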
1 vote · 1 answer · 113 views

From the above, I am trying to derive the below: However, I do not see why the $q_\theta(w)$ has been omitted from $\log p(D)$ in equations 17 and 18. Here is my attempt to derive the above: $$\begin{...
user494234
3 votes · 0 answers · 60 views

I am modelling the sequence $\{(a_t,y_t)\}_t$ as follows: $$ \begin{cases} Y_{t+1} &= g_\nu(X_{t+1}) + \alpha V_{t+1}\\ X_{t+1} &= X_t + \mu_\xi(a_t) + \sigma_\psi(a_t)Z_{t+1}\\ X_0 &= ...
Uomond • 51
0 votes · 0 answers · 65 views

Basically, the question above: in RL, people typically encode the state as a tensor consisting of a plane with "channels", as in the original AlphaZero paper. These channels are typically one-...
FriendlyLagrangian
0 votes · 0 answers · 38 views

I am currently learning about flow matching models and wanted to test whether we could train a flow matching model on just two time steps, 0 and 0.5, and sample at only those two time steps to ...
Bill Wang
1 vote · 0 answers · 77 views

I have video, audio, and text data. The intent is to use the multimodal data for binary classification. However, the data is not paired (i.e., the audio and text are not from the same video recording). I've ...
myts999 • 13
0 votes · 0 answers · 26 views

Stochastic Weight Averaging (SWA) claims that deep-learning MLE points in "flatter loss regions" generalize better to holdout data. This is a famous paper in deep learning with 2000+ ...
profPlum • 593
0 votes · 0 answers · 47 views

An interesting question I stumbled upon today is this: Suppose I train models $m_1, m_2, m_3, \ldots, m_N$, where each $m_i$, $i = 1, \ldots, N$, is associated with a hyperparameter $i$. All models are ...
Your neighbor Todorovich
1 vote · 1 answer · 136 views

This question is inspired by a blog post (https://www.argmin.net/p/in-defense-of-typing-monkeys) and several rumors I've heard from other people who work in machine learning. The gist of it is that ...
Your neighbor Todorovich
0 votes · 0 answers · 34 views

I'm training an LSTM to predict a binary anomaly sequence from multi-dimensional, irregularly sampled input sequences. While CNNs perform adequately, I'm struggling to get good performance from my ...
klobaska soslaninou
2 votes · 0 answers · 67 views

I’m working with a dataset of streetlights, where each row represents a streetlight. Each streetlight has a type (LED, Incandescent, Unknown), an address, and a street name. I am trying to predict ...
setty • 161
1 vote · 0 answers · 32 views

I'm developing an AI for a 1v1 game. I have already programmed a system for generating these rewards. Currently, I have some heuristics and am using linear weights tuned with a genetic algorithm to ...
vbxr • 11
4 votes · 1 answer · 92 views

We were discussing universal approximation theorems for neural networks and showed that the triangular function $$ h(x) = \begin{cases} x+1, & x \in [-1,0] \\ 1-x, & x \in [0,1] \\ 0, & \...
CharComplexity
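A useful identity here (a standard fact, not taken from the question): this hat function is exactly a three-term combination of ReLUs,
$$h(x) = \operatorname{ReLU}(x+1) - 2\operatorname{ReLU}(x) + \operatorname{ReLU}(x-1),$$
since for $x \ge 1$ the terms telescope to $(x+1) - 2x + (x-1) = 0$, for $x \le -1$ all three vanish, and on $[-1,0]$ and $[0,1]$ they reduce to $x+1$ and $1-x$ respectively. So $h$ is realizable by a one-hidden-layer ReLU network with three units.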
1 vote · 0 answers · 29 views

I am building a model to classify short videos of a person doing sign language (1-2 seconds, 30 fps, 512x512) into n labels. I find that no matter which model I use (transformers, 3D CNN, ...) or ...
Jimmy • 11
0 votes · 1 answer · 97 views

I’ve seen many retrieval-augmented generation (RAG) pipelines return highly relevant context chunks — and yet fail catastrophically on multi-hop reasoning. For example, even when the source document ...
PSBigBig
1 vote · 0 answers · 93 views

My question is about the paradigm of deep learning: I do not get where the cost functions come from. For example, for a classification task, are we treating the encoder as the expected value of ...
Kavalali • 373
0 votes · 0 answers · 69 views

I'm researching the statistical convergence properties of a recursive system that arises during the training of a custom neural network structure. My specific question is: how can I prove convergence of ...
Guillaume
2 votes · 0 answers · 35 views

I am working on a project where I am doing unsupervised anomaly detection on employee expenses for HCP Transfer of Value. I am trying to use a graph neural network to detect anomalies with proper ...
Sanket Maiti
1 vote · 0 answers · 30 views

First and foremost, I am looking for a practical answer to the simplest test case, sketched below. In general, I would also be interested in any motivated, rational heuristics on the optimal layout for ...
Smerdjakov
0 votes · 0 answers · 83 views

I’ve seen some tutorials and papers that count the number of layers in a way that I find a bit confusing. For example, consider a model like this: ...
aliiiiiiiiiiiiiiiiiiiii
0 votes · 0 answers · 59 views

I want to know if the following problem has a name, and I'd also like some papers to read on the subject. Suppose I have a model to learn, say $A$, and this has a huge number of parameters to ...
user8469759
3 votes · 1 answer · 129 views

I am working on a regression task and consider a vanilla multilayer perceptron $f_{\theta} : \mathcal{X} \rightarrow \mathbb{R}$ with non-polynomial activation functions, where the last layer is just a linear ...
arthur_elbrdn
0 votes · 0 answers · 44 views

I'm trying to wrap my head around the problem of same-sign gradients when using the sigmoid activation function in a deep neural network. The problem emerges from the fact that sigmoid can only be ...
John • 1
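The fact the truncated excerpt alludes to can be written in one line: if a neuron's inputs $a_i$ are sigmoid outputs, then $a_i \in (0,1)$, and with upstream error $\delta$ the weight gradients are
$$\frac{\partial L}{\partial w_i} = \delta\, a_i, \qquad a_i > 0 \;\Rightarrow\; \operatorname{sign}\!\left(\frac{\partial L}{\partial w_i}\right) = \operatorname{sign}(\delta) \text{ for every } i,$$
so all weights into that neuron move in the same direction on a given example.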
1 vote · 1 answer · 84 views

Are VAEs considered explainable AI? To me, they are, because the latent variables are interpretable: e.g., you change one and you might see its effect on the head rotation (for a dataset of faces, for ...
Link • 63
1 vote · 0 answers · 37 views

Let's say that I want to train a network where the input is an image of a small part of an object, e.g., an image of a building with corners, part of the exterior walls, and part of the roof. I want ...
user146290
1 vote · 0 answers · 59 views

I am training a neural network using R-Torch for a regression problem. My dataset has 22 features, and I currently have a neural network composed of one hidden layer and one output layer. My question ...
Adverse Effect
0 votes · 0 answers · 19 views

I have a custom database of black-and-white surface-defect images. The database is quite far from classical CV datasets like CIFAR or ImageNet. I know from supervised deep learning that the correct ...
Jonny_92 • 161
3 votes · 1 answer · 132 views

I have trouble understanding the minimization of the KL divergence. In this link, https://www.ibm.com/think/topics/variational-autoencoder, they say, "One obstacle to using KL divergence for ...
Link • 63
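For orientation, a standard identity (not taken from the linked article): when the encoder outputs a diagonal Gaussian $q(z|x) = \mathcal{N}(\mu, \sigma^2 I)$ and the prior is $\mathcal{N}(0, I)$, the KL term has the closed form
$$\mathrm{KL}\!\left(q(z|x)\,\|\,\mathcal{N}(0,I)\right) = \frac{1}{2}\sum_{j}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right),$$
which is the quantity VAE implementations actually minimize alongside the reconstruction loss.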
0 votes · 0 answers · 38 views

Let $\mathbf{x}_k \in \mathbb{R}^{n\times 1}$ be an $n$-dimensional input to a multi-layer perceptron (MLP) at time $t = k$. The output is $\mathbf{x}_{k+1} \in \mathbb{R}^{n\times 1}$ at time $t = k+1$. ...
user146290
4 votes · 1 answer · 89 views

In an LSTM (regression), the output gate is defined as: $$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o \right),$$ where: $W_o \in \mathbb{R}^{m \times d}$ is the input weight matrix, $U_o \in \mathbb{...
Marie • 135
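A quick dimension check of that gate, with $d$ the input size and $m$ the hidden size (example values are arbitrary, not from the question):

```python
import torch

d, m = 8, 32                      # input size, hidden size
W_o = torch.randn(m, d)           # input weight matrix
U_o = torch.randn(m, m)           # recurrent weight matrix
b_o = torch.randn(m)
x_t, h_prev = torch.randn(d), torch.randn(m)

o_t = torch.sigmoid(W_o @ x_t + U_o @ h_prev + b_o)  # shape: (m,)
```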
3 votes · 2 answers · 125 views

I will use the answer here as an example: https://stats.stackexchange.com/a/370732/78063 It says "which means that you choose a number of time steps $N$, and unroll your network so that it ...
Baron Yugovich
0 votes · 0 answers · 41 views

Batch Normalization (BN) is a technique to accelerate convergence when training neural networks. It is also assumed to act as a regularizer, since the mean and standard deviation are ...
Antonios Sarikas
1 vote · 1 answer · 59 views

I have an LSTM model to predict a variable by considering multiple variables. (Say the target variable is river discharge and the predictors are rainfall, temperature, evapotranspiration, etc.) There ...
DWijesena
4 votes · 1 answer · 89 views

I was going through the algorithm for stochastic gradient descent in a multilayer network from the book Machine Learning by Tom Mitchell, and it shows the formulae for the weight update rule. However, I don't ...
Machine123
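For reference, the update rule in that chapter has the following shape in Mitchell's notation, reproduced here from the standard backpropagation presentation for sigmoid units (worth checking against the book itself):
$$\delta_k = o_k(1-o_k)(t_k - o_k), \qquad \delta_h = o_h(1-o_h)\sum_{k \in \text{outputs}} w_{kh}\,\delta_k, \qquad w_{ji} \leftarrow w_{ji} + \eta\,\delta_j\,x_{ji},$$
where the $o(1-o)$ factors are the derivative of the sigmoid and $\eta$ is the learning rate.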
1 vote · 1 answer · 46 views

I'm implementing a FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) optimizer in PyTorch for training neural networks with sparse regularization. My implementation doesn't seem to be working as ...
Maxou • 21
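For comparison, a minimal FISTA loop for an $\ell_1$-regularized objective looks roughly like this; it is a textbook sketch, not the asker's code.

```python
import torch

def soft_threshold(x, thresh):
    """Proximal operator of thresh * ||x||_1 (soft-thresholding)."""
    return torch.sign(x) * torch.clamp(x.abs() - thresh, min=0.0)

def fista(grad_f, x0, lr, lam, n_iters=200):
    """Minimize f(x) + lam * ||x||_1, where grad_f(x) returns the gradient of f."""
    x_prev = x0.clone()
    y = x0.clone()
    t = 1.0
    for _ in range(n_iters):
        x = soft_threshold(y - lr * grad_f(y), lr * lam)   # prox-gradient step at y
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0  # momentum schedule
        y = x + ((t - 1.0) / t_next) * (x - x_prev)        # extrapolation
        x_prev, t = x, t_next
    return x_prev

# Example: sparse least squares, f(x) = 0.5 * ||A x - b||^2
A, b = torch.randn(20, 50), torch.randn(20)
x_hat = fista(lambda x: A.T @ (A @ x - b), torch.zeros(50), lr=5e-3, lam=0.1)
```

Two frequent bugs when wrapping this into a torch.optim-style optimizer: evaluating the gradient at $x$ instead of the extrapolated point $y$, and soft-thresholding $y$ rather than the prox-gradient step.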
10 votes · 3 answers · 2k views

Consider a neural network with 2 or more layers. After we update the weights in layer 1, the input to layer 2 ($a^{(1)}$) has changed, so $\partial z/\partial w$ is no longer correct, as $z$ has changed to $z^*$ and $z^*$ $\...
Yaron • 109
-1 votes · 1 answer · 74 views

What does "arg min" stand for in the following? $$c^* = \arg \min_c \|x-g(c)\|_2 \tag{2.54}$$
RSStepheni
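The distinction the notation rests on, in one worked line: $\min$ returns the smallest value of the objective, while $\arg\min$ returns the argument that achieves it. For example,
$$\min_x\,(x-3)^2 = 0, \qquad \arg\min_x\,(x-3)^2 = 3,$$
so equation (2.54) defines $c^*$ as the particular $c$ that makes $\|x - g(c)\|_2$ smallest.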
1 vote · 1 answer · 98 views

Setting: I'm training a neural network for classification purposes. This neural network uses a transformer-based architecture and leverages PU-learning. PU-learning is a setting where you solely ...
Fred • 31
4 votes · 1 answer · 218 views

In standard machine learning settings with cross-sectional data, it's common to assume that data points are independently and identically distributed (i.i.d.) from some fixed data-generating process (...
spie227 • 242
0 votes · 0 answers · 55 views

I’m testing a new random data augmentation technique in a neural network. There are two main sources of randomness: network initialization and training (e.g., random parameter initialization, ...
desert_ranger
