All Questions
Tagged with deep-learning or neural-networks
9,985 questions
1 vote · 0 answers · 85 views
Maxout activation function vs ReLU (Number of weights)
From what I understand, the Maxout function works quite differently from ReLU.
The ReLU function is $\max(0, z)$, where the input is the pre-activation $z = W^\top x + b$.
The Maxout function has many $W$s: it is $\max(W_1^\top x + b_1, W_2^\top x + b_2, \dots)$ ...
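For context, a minimal NumPy sketch of the contrast being asked about; all shapes and values are illustrative assumptions, not taken from the question:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                       # input vector

    # ReLU applies max(0, .) to a single affine pre-activation: one W, one b
    W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
    relu_out = np.maximum(0.0, W @ x + b)

    # Maxout takes an elementwise max over k affine pieces: k weight matrices,
    # so it carries roughly k times the parameters of the ReLU layer above
    k = 2
    Ws, bs = rng.normal(size=(k, 4, 3)), rng.normal(size=(k, 4))
    maxout_out = np.max(np.stack([Ws[i] @ x + bs[i] for i in range(k)]), axis=0)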
0 votes · 0 answers · 60 views
Advice on fine-tuning an email classifier for a Pharma company
I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model.
The model I'm using is Longformer (because it has ...
1 vote · 0 answers · 43 views
Why is the expectation term in the VAE loss not implemented in practice?
According to the VAE paper https://arxiv.org/pdf/1906.02691 (eq. 2.10, page 21), the VAE loss contains an expectation term. If I understand correctly, we need to sample more than 1 result from an input x and ...
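For context, a minimal PyTorch sketch of how that expectation is commonly estimated in practice: a single reparameterized sample per input. The decoder call is a hypothetical placeholder, not a real API:

    import torch

    # Hypothetical encoder outputs for one input x: mean and log-variance of q(z|x)
    mu, logvar = torch.zeros(8), torch.zeros(8)

    # One-sample Monte Carlo estimate of the expectation term, via the
    # reparameterization trick; a single draw per input is the common choice
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps
    # recon_term = decoder_log_likelihood(x, z)   # placeholder, model-dependent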
1 vote · 1 answer · 117 views
Should classical/traditional ML techniques such as polynomial regression/decision trees/random forests SIGNIFICANTLY outperform RNNs on time series? [closed]
I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
2 votes · 1 answer · 218 views
Why does having a smaller set of weights help with generalization?
When I studied machine learning for the first time, I learned that we need to use $\ell_2$ regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...
5 votes · 1 answer · 176 views
What's the statistical/historical precedent for generalisation beyond overfitting?
A recent work shows generalisation beyond overfitting for overparametrized systems [*]. Is there any precedent in the statistics literature, or is this a new phenomenon for deep learning?
[*] Grokking: ...
1 vote · 0 answers · 54 views
Model still overfits after hyperparameter tuning, dataset balancing and convolution layering
I am trying to classify whether two 25x25 px images stacked together as one 50x25 px image are the same (1) or different (0). I am using Keras to create the NN layers. There are 10,000 instances of both 1s and 0s ...
1 vote · 0 answers · 42 views
Neural network for complex-valued data
I have some complex data, and I want to teach a neural network to perform certain operations on it.
Now, since TensorFlow does not allow complex inputs, I separated my data into real and imaginary ...
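For context, a minimal NumPy sketch of the real/imaginary split described above; the array shape is an illustrative assumption:

    import numpy as np

    z = np.array([1 + 2j, 3 - 1j, 0.5 + 0.5j])      # complex-valued samples
    features = np.stack([z.real, z.imag], axis=-1)  # shape (3, 2): two real channels
    # A real-valued network can consume `features`; the target operation must
    # likewise be re-expressed on (real, imag) pairs for the labels.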
0 votes · 1 answer · 42 views
Neural network Logic Errors [duplicate]
I was working on a neural network project that uses this dataset: https://archive.ics.uci.edu/dataset/320/student+performance.
The data has two main types: binary values (0 or 1) and ...
0 votes · 0 answers · 46 views
Neural network with just one Dense layer
I have a model which works well with just one Dense layer. My model has an input layer, a Dense layer, and then one Reshape layer, which reshapes the output into the desired form. Normally, neural ...
5 votes · 1 answer · 687 views
Are epochs the same as data duplication?
Epochs, the number of times training is repeated on the original data, are absolutely necessary for neural networks where there are often many more parameters than original instances.
What is the ...
1 vote · 0 answers · 54 views
References Request -- Research in Computer Vision
Apologies if this is not the correct forum for this request, but I figured I'd give it a shot. I'm looking for research in computer vision on shape recognition -- any shape. Most of the computer ...
0 votes · 1 answer · 72 views
Confusion about neural network in stochastic control problem
I am a neural network newbie. I would like to attempt to implement the following architecture for deep learning of a stochastic control problem, taken from this paper. Here $s_t$, $a_t$, $c_t$ and $\xi_t$ ...
0 votes · 0 answers · 58 views
Using different activation functions within a layer?
As an experiment, one could try to have n different activations/neurons/units in a layer.
One to adapt the automated backpropagation algorithms from deep learning ...
0 votes · 0 answers · 75 views
Designing a Neural Network to predict a player's input in a 2D Fighting Game using game state information
I am currently trying to design a NN with the goal of broadly imitating the behavior of certain players in a specific 2D fighting game.
The game in question records "replays" of each game ...
2 votes · 2 answers · 727 views
How does a two-tower model map two different types of entity to a shared embedding space?
A canonical example: say you have users and merchandise.
user (features: age, location, ...)
merchandise (features: type, size, ...)
And you want to create embeddings to map users and merchandise to the same ...
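For context, a minimal PyTorch sketch of the two-tower idea: one encoder per entity type, both ending in the same embedding dimension so that dot products can compare them. The feature sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    emb_dim = 32
    # One encoder ("tower") per entity type; both output emb_dim-sized vectors
    user_tower = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    item_tower = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    user_x = torch.randn(16, 8)   # hypothetical user features (age, location, ...)
    item_x = torch.randn(16, 5)   # hypothetical merchandise features (type, size, ...)
    # Dot products in the shared space score user-item affinity; training on
    # interaction labels pulls matched pairs together
    scores = (user_tower(user_x) * item_tower(item_x)).sum(dim=-1)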
1 vote · 0 answers · 90 views
Is it possible to explain regression or classification, interpolation and generation using a single model structure?
A neural network is established as a universal approximator of all machine learning models. Further, the double descent phenomenon in a neural network propagates the journey of regression to interpolation ...
2 votes · 0 answers · 58 views
DNN quantile regression
I have a PyTorch model, the purpose of which is to predict quantiles over the output given an input. The output in this case is service time (minutes) for machine maintenance. The inputs detail ...
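For context, a minimal sketch of the pinball (quantile) loss that such a model is typically trained with; `pinball_loss` is an illustrative helper, not the asker's code:

    import torch

    def pinball_loss(pred, target, q):
        # Pinball loss for quantile level q in (0, 1): penalizes under- and
        # over-prediction asymmetrically, so the minimizer is the q-quantile
        diff = target - pred
        return torch.mean(torch.maximum(q * diff, (q - 1) * diff))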
6 votes · 2 answers · 604 views
Hessian of the softmax function
Problem
Let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{c} \in \mathbb{R}^n$, and consider a softmax function $\sigma: \mathbb{R}^n \to \mathbb{R}^n$.
Find a representation of the Hessian of $f=\mathbf{c}...
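For context, a standard intermediate step, assuming the truncated objective is $f = \mathbf{c}^\top \sigma(\mathbf{x})$: the softmax Jacobian is
$$\frac{\partial \sigma_i}{\partial x_j} = \sigma_i(\delta_{ij} - \sigma_j), \qquad J_\sigma = \operatorname{diag}(\sigma) - \sigma\sigma^\top,$$
so $\nabla f = (\operatorname{diag}(\sigma) - \sigma\sigma^\top)\,\mathbf{c}$, and the Hessian follows by differentiating this gradient once more.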
0 votes · 0 answers · 138 views
Should the target be standardized in gradient descent?
Suppose that we have a general loss function that depends on some parameters $w$ (e.g. neural network weights):
$$L_w =\frac{1}{N} \sum_i \ell(\hat{y}_i, y_i)$$
Is it beneficial to standardize the ...
4 votes · 1 answer · 159 views
Uncertainty of ANN outputs as distribution parameters
It is not an uncommon practice to train neural network models via negative log likelihood $-\mathcal{L}(x, y_{true}, \mu, \sigma)$ to estimate both a location ($\mu$) and a scale ($\sigma$), such that ...
1 vote · 0 answers · 41 views
Two variants of Nesterov Accelerated Gradient: are they equivalent?
I was puzzled to find that the description of the Nesterov Accelerated Gradient on Paperswithcode, namely:
$v_t = \beta v_{t-1} \color{red}{+} \eta \nabla J(\theta \color{red}{-} \beta v_{t-1})$
$\...
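For context, a minimal numerical sketch of the quoted variant on the toy objective $J(\theta) = \tfrac{1}{2}\theta^2$; the step size and momentum values are illustrative:

    def grad_J(theta):
        return theta            # gradient of the toy objective J(theta) = 0.5 * theta**2

    theta, v = 5.0, 0.0
    eta, beta = 0.1, 0.9
    for _ in range(100):
        v = beta * v + eta * grad_J(theta - beta * v)   # gradient at the lookahead point
        theta = theta - v
    print(theta)                # approaches the minimum at 0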
2 votes · 1 answer · 92 views
Why Does the Sigmoid Output Layer in a Binary Feedforward Neural Network Represent the Probability of the Positive Class (Label = 1)?
I'm a beginner who just started to study deep learning. I recently learned that in a feedforward neural network with a binary output and a Bernoulli distribution, the output of the sigmoid function ...
1 vote · 0 answers · 43 views
Use latest data in training a time-series model
I am training a global time-series deep learning model.
I have split the data into training, validation (to select the best hyperparameters), and test (to test on out-of-sample data) sets.
There are only 3 ...
1 vote · 0 answers · 48 views
Calculate gradient with chain rule using additions [closed]
I am taking Karpathy's course; specifically, I am on the first video. There is a step in the development of micrograd that I don't fully understand. Specifically in this section, when he talks about ...
1 vote · 1 answer · 110 views
NER With Custom Tags, How to Approach
I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
0 votes · 0 answers · 86 views
Augmenting data for LSTM
The problem:
I have a dataset with monthly economic indicators alongside monthly stock prices, containing 434 total observations.
I have attempted to fit an LSTM onto the data, but it seems to ...
1 vote · 0 answers · 47 views
Why is the threshold term incorporated into the weight vector in linear classifiers?
In the context of linear classifiers, such as the perceptron or logistic regression, I understand that the decision boundary is defined by a linear combination of input features and weights, plus a ...
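For context, the standard construction this refers to: append a constant 1 to every input so the threshold becomes just another weight,
$$w^\top x + b = \tilde{w}^\top \tilde{x}, \qquad \tilde{w} = \begin{pmatrix} w \\ b \end{pmatrix}, \quad \tilde{x} = \begin{pmatrix} x \\ 1 \end{pmatrix},$$
which lets a single update rule treat the bias like any other weight.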
1 vote · 0 answers · 27 views
Energy efficiency and time for a SIMD broadcast systolic dataflow deep neural network
I'm trying to understand why memfetch is multiplied by $km(L+1)/8$ instead of NMACS, which is 8, and also what is meant by the systolic clk increment.
Consider a fully connected layer. Let
• $X \in \mathbb{R}^{K \times L}$, real 32-bit ...
2 votes · 1 answer · 69 views
What would be the convolutional layer output by keras.layers.Conv2D when conv output is fractional?
I have input ($n=224$), strides ($s=4$), filter size ($k=11$), and no padding, which gives me a fractional conv output:
$$\texttt{conv output} = (n-k+2p)/s + 1 = 54....
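For context, a minimal sketch of the flooring that 'valid' (no-padding) convolutions generally apply, using the question's numbers:

    import math

    n, k, s, p = 224, 11, 4, 0
    out = math.floor((n - k + 2 * p) / s) + 1
    print(out)  # 54: the fractional part is floored, so trailing pixels that
                # don't fit a full stride are simply dropped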
1 vote · 0 answers · 60 views
Output from Model A as Training Data into Model B
Not sure this is the right place to ask this question, but I'm having a disagreement with a colleague on this idea.
Let's say we have a dataset comprised of "unclean" strings. The end goal ...
5 votes · 1 answer · 469 views
What is the best epoch to evaluate the test images?
I created a training, a validation, and a test set for an image classification task. Then I trained on the training set and evaluated on the validation set. So, the next step is to evaluate the ...
2 votes · 2 answers · 78 views
How long should I run training to realize how well an NN model is doing?
Suppose I am manually tuning the hyperparameters of an NN model.
How many epochs of training should I run, at a minimum, to realize that the model won't give me the accuracy I need before ...
1 vote · 0 answers · 46 views
Modeling for a data set where each row has a different number of factors (not binomial) [closed]
The modeling issue I'm having is that the categorical variable for each row has a different number of factors. If I can reshape the data by products (a, b, c, ... ~ cost, hoursum, numPod, numDate), so that ...
1 vote · 1 answer · 101 views
Weight initialisation for neural networks - should it be different for each observation or the same?
I am implementing a neural network myself, with feedforward and backpropagation with gradient descent, to understand better how things work.
After setting up the entire algorithm, I still have a huge ...
2 votes · 1 answer · 105 views
The meaning of linear transformation in a batch norm revisited
I'm reading the BatchNorm Wikipedia page, where they explain BatchNorm.
I think the actual formulas are easier than words in this case. The norm statistics are calculated as:
$$\large{\displaystyle \...
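For context, the truncated formulas presumably continue with the standard batch statistics and the learned affine transformation:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta.$$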
1 vote · 1 answer · 116 views
How can different models based on different sets of predictors be combined to significantly improve the model performance?
I have two machine learning models for predicting some continuous variable $y$, say $y=f_1(X_1, \theta_1)$ and $y=f_2(X_2, \theta_2)$, and these models are of the same type (ANN). $X_1$ and $X_2$ ...
1 vote · 1 answer · 110 views
NeRF vs mesh for text-to-3d generation
There seem to be multiple approaches to generating 3D objects from a text prompt. What's confusing is that some of them generate NeRFs (https://arxiv.org/pdf/2308.16512), while others generate ...
-1 votes · 1 answer · 59 views
How does a neural network differentiate between a neuron that outputs 0 and a dropped-out one?
How does a network differentiate between a neuron with output 0 and a dropped-out neuron (this neuron might output a non-zero value but due to dropout it outputs 0)?
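For context, a minimal NumPy sketch of inverted dropout: what distinguishes a dropped unit is the sampled mask, which is reused in the backward pass, not the activation value itself:

    import numpy as np

    rng = np.random.default_rng(0)
    h = rng.normal(size=5)        # activations; some could legitimately be 0
    p = 0.5                       # drop probability
    mask = rng.random(5) >= p     # False = dropped this forward pass
    h_dropped = np.where(mask, h / (1 - p), 0.0)   # inverted-dropout scaling
    # Backprop multiplies incoming gradients by this same mask, so dropped
    # units receive zero gradient regardless of what they would have output.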
1 vote · 1 answer · 116 views
How can I learn and remove the linear trend in the residuals against the true response values generated by an ordinary neural network?
I built a neural network using PyTorch to predict y (a continuous variable) based on X consisting of m (=20) features. I found that the residuals (y_predicted – y_true) for the test data set show a ...
1 vote · 0 answers · 46 views
Are these generated from my code the so-called feature maps?
I assume that the way people determine which activations detect specific pieces of an image is by executing the network and extracting the results at each layer; when the output is from a convolutional ...
1 vote · 0 answers · 113 views
Why doesn't Kaiming/He weight Initialization seek a 50/50 compromise for forward and backward pass?
Sorry, please let me know if I'm off, but it seems that He initialization aims to either maintain a constant variance through the forward pass or through the backward pass.
It seems the idea is that, ...
1 vote · 0 answers · 44 views
Neural networks with uncertainties in training data
I have used Flax to train a neural network to fit a model to some data. All of the data points have a known uncertainty, as in each row has both a value and an uncertainty. (To be more explicit: the ...
2 votes · 0 answers · 95 views
Derivative of logistic regression (sigmoid) [closed]
I am having difficulty figuring out why I get a different answer from the professor. We are tasked with finding the derivative of the logistic regression cost function with the sigmoid function:
$$ L(w│...
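For context, assuming the truncated cost is the usual negative log-likelihood, the standard identities are
$$\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr), \qquad \nabla_w\!\left[-y\log\sigma(w^\top x) - (1-y)\log\bigl(1-\sigma(w^\top x)\bigr)\right] = \bigl(\sigma(w^\top x) - y\bigr)\,x.$$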
1 vote · 1 answer · 102 views
(Multivariate) anomaly detection of (redundant) sensor data
I’m currently working on my master thesis and I’m looking for some inputs for the following situation:
I have data of 2-20 sensors all measuring the same variable at 1-3 different locations in 15mins-...
1 vote · 0 answers · 46 views
Getting accurate Uncertainty from MFVI?
I wanted to know if there has been any research on methods to improve the accuracy of Mean-Field Variational Inference (which don't discard the mean-field approximation). Apparently it is known to ...
0 votes · 0 answers · 51 views
Why Do AR-NN Models Have Tighter Confidence Intervals Compared to Linear AR Models?
I have conducted a forecast for the following data series using different autoregressive models: Intercept-only, AR1, AR2, ARIMA BIC, ARIMA AIC, and AR-NN. Using the point forecasts, the AR1 model is ...
4 votes · 1 answer · 254 views
Choosing Between Intercept-Only and AR-NN Models: Justified to not use the model with the lowest RMSE/MAE?
I have created two autoregressive models for forecasting: a basic intercept-only model and an AR-NN (autoregressive neural network) model. Both models show similar performance based on recursive one-...
3 votes · 1 answer · 134 views
What probability distribution is learned in this specific case? [duplicate]
I keep reading papers and blog posts where the training of a neural network is defined as learning some underlying probability distribution of the data.
Imagine that you write a CNN that outputs whether ...
1 vote · 0 answers · 63 views
Can normalizing flows approximate bounded distributions in deep learning?
I’m exploring the use of normalizing flows in deep learning for generative modeling and I have a specific requirement: my target distributions are bounded (for example, between 0 and 1). I understand ...