All Questions
Tagged with deep-learning or neural-networks
9,985 questions
1 vote · 0 answers · 85 views
Maxout activation function vs ReLU (Number of weights)
From what I understand, the Maxout function works quite differently from ReLU.
The ReLU function is $\max(0, z)$, where the input is the pre-activation $z = W^\top x + b$.
The Maxout function has many $W$s: it is $\max(W_1^\top x + b_1, W_2^\top x + b_2, \dots)$ ...
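For context, a minimal NumPy sketch of the contrast being asked about; all shapes and values are illustrative assumptions, not taken from the question:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                       # input vector

    # ReLU applies max(0, .) to a single affine pre-activation: one W, one b
    W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
    relu_out = np.maximum(0.0, W @ x + b)

    # Maxout takes an elementwise max over k affine pieces: k weight matrices,
    # so it carries roughly k times the parameters of the ReLU layer above
    k = 2
    Ws, bs = rng.normal(size=(k, 4, 3)), rng.normal(size=(k, 4))
    maxout_out = np.max(np.stack([Ws[i] @ x + bs[i] for i in range(k)]), axis=0)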
0 votes · 0 answers · 60 views
Advice on fine-tuning an email classifier for a Pharma company
I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model.
The model I'm using is Longformer (because it has ...
1 vote · 0 answers · 43 views
Why is the expectation term in the VAE loss not implemented in practice?
According to the VAE paper https://arxiv.org/pdf/1906.02691 (eq. 2.10, page 21), the VAE loss contains an expectation term. If I understand correctly, we need to sample more than 1 result from an input x and ...
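For context, a minimal PyTorch sketch of how that expectation is commonly estimated in practice: a single reparameterized sample per input. The decoder call is a hypothetical placeholder, not a real API:

    import torch

    # Hypothetical encoder outputs for one input x: mean and log-variance of q(z|x)
    mu, logvar = torch.zeros(8), torch.zeros(8)

    # One-sample Monte Carlo estimate of the expectation term, via the
    # reparameterization trick; a single draw per input is the common choice
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps
    # recon_term = decoder_log_likelihood(x, z)   # placeholder, model-dependent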
1 vote · 1 answer · 117 views
Should classical/traditional ML techniques such as polynomial regression/decision trees/random forests SIGNIFICANTLY outperform RNNs on time series? [closed]
I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
2 votes · 1 answer · 218 views
Why does having a smaller set of weights help with generalization?
When I studied machine learning for the first time, I learned that we need to use $\ell_2$ regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...
5 votes · 1 answer · 176 views
What's the statistical/historical precedent for generalisation beyond overfitting?
A recent work shows generalisation beyond overfitting for overparametrized systems [*]. Is there any precedent in the statistics literature, or is this a new phenomenon for deep learning?
[*] Grokking: ...
1 vote · 0 answers · 54 views
Model still overfits after hyperparameter tuning, dataset balancing and convolution layering
I am trying to classify whether two 25x25 px images stacked together as one 50x25 px image are the same (1) or different (0). I am using Keras to create the NN layers. There are 10,000 instances of both 1s and 0s ...
1 vote · 0 answers · 42 views
Neural network for complex-valued data
I have some complex data, and I want to teach a neural network to perform certain operations on it.
Now, since TensorFlow does not allow complex inputs, I separated my data into real and imaginary ...
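For context, a minimal NumPy sketch of the real/imaginary split described above; the array shape is an illustrative assumption:

    import numpy as np

    z = np.array([1 + 2j, 3 - 1j, 0.5 + 0.5j])      # complex-valued samples
    features = np.stack([z.real, z.imag], axis=-1)  # shape (3, 2): two real channels
    # A real-valued network can consume `features`; the target operation must
    # likewise be re-expressed on (real, imag) pairs for the labels.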
0 votes · 1 answer · 42 views
Neural network Logic Errors [duplicate]
I was working on a neural network project that uses this dataset: https://archive.ics.uci.edu/dataset/320/student+performance.
The data has two main types: binary values (0 or 1) and ...
0 votes · 0 answers · 46 views
Neural network with just one Dense layer
I have a model which works well with just one Dense layer. My model has an input layer, a Dense layer, and then one Reshape layer, which reshapes the output into the desired form. Normally, neural ...
5 votes · 1 answer · 687 views
Are epochs the same as data duplication?
Epochs, the number of times training is repeated on the original data, are absolutely necessary for neural networks where there are often many more parameters than original instances.
What is the ...
1 vote · 0 answers · 54 views
References Request -- Research in Computer Vision
Apologies if this is not the correct forum for this request, but I figured I'd give it a shot. I'm looking for research in computer vision on shape recognition -- any shape. Most of the computer ...
0 votes · 1 answer · 72 views
Confusion about neural network in stochastic control problem
I am a neural network newbie. I would like to attempt to implement the following architecture for deep learning of a stochastic control problem, taken from this paper. Here $s_t$, $a_t$, $c_t$ and $\xi_t$ ...
0 votes · 0 answers · 58 views
Using different activation functions within a layer?
As an experiment, one could try to have n different activations/neurons/units in a layer.
One to adapt the automated backpropagation algorithms from deep learning ...
0 votes · 0 answers · 75 views
Designing a Neural Network to predict a player's input in a 2D Fighting Game using game state information
I am currently trying to design a NN with the goal of broadly imitating the behavior of certain players in a specific 2D fighting game.
The game in question records "replays" of each game ...
2 votes · 2 answers · 727 views
How does a two-tower model map two different types of entity to a shared embedding space?
A canonical example: say you have users and merchandise.
user (features: age, location, ...)
merchandise (features: type, size, ...)
And you want to create embeddings to map users and merchandise to the same ...
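For context, a minimal PyTorch sketch of the two-tower idea: one encoder per entity type, both ending in the same embedding dimension so that dot products can compare them. The feature sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    emb_dim = 32
    # One encoder ("tower") per entity type; both output emb_dim-sized vectors
    user_tower = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    item_tower = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    user_x = torch.randn(16, 8)   # hypothetical user features (age, location, ...)
    item_x = torch.randn(16, 5)   # hypothetical merchandise features (type, size, ...)
    # Dot products in the shared space score user-item affinity; training on
    # interaction labels pulls matched pairs together
    scores = (user_tower(user_x) * item_tower(item_x)).sum(dim=-1)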
1 vote · 0 answers · 90 views
Is it possible to explain regression or classification, interpolation and generation using a single model structure?
A neural network is established as a universal approximator of all machine learning models. Further, the double descent phenomenon in a neural network propagates the journey of regression to interpolation ...
2 votes · 0 answers · 58 views
DNN quantile regression
I have a PyTorch model, the purpose of which is to predict quantiles over the output given an input. The output in this case is service time (minutes) for machine maintenance. The inputs detail ...
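For context, a minimal sketch of the pinball (quantile) loss that such a model is typically trained with; `pinball_loss` is an illustrative helper, not the asker's code:

    import torch

    def pinball_loss(pred, target, q):
        # Pinball loss for quantile level q in (0, 1): penalizes under- and
        # over-prediction asymmetrically, so the minimizer is the q-quantile
        diff = target - pred
        return torch.mean(torch.maximum(q * diff, (q - 1) * diff))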
6 votes · 2 answers · 604 views
Hessian of the softmax function
Problem
Let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{c} \in \mathbb{R}^n$, and consider a softmax function $\sigma: \mathbb{R}^n \to \mathbb{R}^n$.
Find a representation of the Hessian of $f=\mathbf{c}...
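For context, a standard intermediate step, assuming the truncated objective is $f = \mathbf{c}^\top \sigma(\mathbf{x})$: the softmax Jacobian is
$$\frac{\partial \sigma_i}{\partial x_j} = \sigma_i(\delta_{ij} - \sigma_j), \qquad J_\sigma = \operatorname{diag}(\sigma) - \sigma\sigma^\top,$$
so $\nabla f = (\operatorname{diag}(\sigma) - \sigma\sigma^\top)\,\mathbf{c}$, and the Hessian follows by differentiating this gradient once more.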
0 votes · 0 answers · 138 views
Should the target be standardized in gradient descent?
Suppose that we have a general loss function that depends on some parameters $w$ (e.g. neural network weights):
$$L_w =\frac{1}{N} \sum_i \ell(\hat{y}_i, y_i)$$
Is it beneficial to standardize the ...
4 votes · 1 answer · 159 views
Uncertainty of ANN outputs as distribution parameters
It is not an uncommon practice to train neural network models via negative log likelihood $-\mathcal{L}(x, y_{true}, \mu, \sigma)$ to estimate both a location ($\mu$) and a scale ($\sigma$), such that ...
1 vote · 0 answers · 41 views
Two variants of Nesterov Accelerated Gradient: are they equivalent?
I was puzzled to find that the description of the Nesterov Accelerated Gradient on Paperswithcode, namely:
$v_t = \beta v_{t-1} \color{red}{+} \eta \nabla J(\theta \color{red}{-} \beta v_{t-1})$
$\...
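For context, a minimal numerical sketch of the quoted variant on the toy objective $J(\theta) = \tfrac{1}{2}\theta^2$; the step size and momentum values are illustrative:

    def grad_J(theta):
        return theta            # gradient of the toy objective J(theta) = 0.5 * theta**2

    theta, v = 5.0, 0.0
    eta, beta = 0.1, 0.9
    for _ in range(100):
        v = beta * v + eta * grad_J(theta - beta * v)   # gradient at the lookahead point
        theta = theta - v
    print(theta)                # approaches the minimum at 0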
2 votes · 1 answer · 92 views
Why Does the Sigmoid Output Layer in a Binary Feedforward Neural Network Represent the Probability of the Positive Class (Label = 1)?
I'm a beginner who just started to study deep learning. I recently learned that in a feedforward neural network with a binary output and a Bernoulli distribution, the output of the sigmoid function ...
1 vote · 0 answers · 43 views
Use latest data in training a time-series model
I am training a global time-series deep learning model.
I have split the data into training, validation (to select the best hyperparameters), and test (to test on out-of-sample data) sets.
There are only 3 ...
1 vote · 0 answers · 48 views
Calculate gradient with chain rule using additions [closed]
I am taking Karpathy's course; specifically, I am on the first video. There is a step in the development of micrograd that I don't fully understand. Specifically in this section, when he talks about ...
1 vote · 1 answer · 110 views
NER With Custom Tags, How to Approach
I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
0 votes · 0 answers · 86 views
Augmenting data for LSTM
The problem:
I have a dataset with monthly economic indicators alongside monthly stock prices, containing 434 total observations.
I have attempted to fit an LSTM onto the data, but it seems to ...
1 vote · 0 answers · 47 views
Why is the threshold term incorporated into the weight vector in linear classifiers?
In the context of linear classifiers, such as the perceptron or logistic regression, I understand that the decision boundary is defined by a linear combination of input features and weights, plus a ...
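For context, the standard construction this refers to: append a constant 1 to every input so the threshold becomes just another weight,
$$w^\top x + b = \tilde{w}^\top \tilde{x}, \qquad \tilde{w} = \begin{pmatrix} w \\ b \end{pmatrix}, \quad \tilde{x} = \begin{pmatrix} x \\ 1 \end{pmatrix},$$
which lets a single update rule treat the bias like any other weight.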
1 vote · 0 answers · 27 views
Energy efficiency and time for a SIMD broadcast systolic dataflow deep neural network
I'm trying to understand why memfetch is multiplied by $km(L+1)/8$ instead of NMACS, which is 8, and also what is meant by the systolic clk increment.
Consider a fully connected layer. Let
• $X \in \mathbb{R}^{K \times L}$, real 32-bit ...
2 votes · 1 answer · 69 views
What would be the convolutional layer output by keras.layers.Conv2D when conv output is fractional?
I have input ($n=224$), strides ($s=4$), filter size ($k=11$), and no padding, which gives me a fractional conv output:
$$\texttt{conv output} = (n-k+2p)/s + 1 = 54....
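For context, a minimal sketch of the flooring that 'valid' (no-padding) convolutions generally apply, using the question's numbers:

    import math

    n, k, s, p = 224, 11, 4, 0
    out = math.floor((n - k + 2 * p) / s) + 1
    print(out)  # 54: the fractional part is floored, so trailing pixels that
                # don't fit a full stride are simply dropped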
1 vote · 0 answers · 60 views
Output from Model A as Training Data into Model B
Not sure this is the right place to ask this question, but I'm having a disagreement with a colleague on this idea.
Let's say we have a dataset comprised of "unclean" strings. The end goal ...
5 votes · 1 answer · 469 views
What is the best epoch to evaluate the test images?
I created a training, a validation, and a test set for an image classification task. Then I trained on the training set and evaluated on the validation set. So, the next step is to evaluate the ...
2 votes · 2 answers · 78 views
How long should I run training to realize how well an NN model is doing?
Suppose I am manually tuning the hyperparameters of an NN model.
How many epochs of training should I run, at a minimum, to realize that the model won't give me the accuracy I need before ...
1 vote · 0 answers · 46 views
Modeling for a data set where each row has a different number of factors (not binomial) [closed]
The modeling issue I'm having is that the categorical variable for each row has a different number of factors. If I can reshape the data by products (a, b, c, ... ~ cost, hoursum, numPod, numDate), so that ...
1 vote · 1 answer · 101 views
Weight initialisation for neural networks - should it be different for each observation or the same?
I am implementing a neural network myself, with feedforward and backpropagation with gradient descent, to understand better how things work.
After setting up the entire algorithm, I still have a huge ...
2 votes · 1 answer · 105 views
The meaning of linear transformation in a batch norm revisited
I'm reading the BatchNorm Wikipedia page, where they explain BatchNorm.
I think the actual formulas are easier than words in this case. The norm statistics are calculated as:
$$\large{\displaystyle \...
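For context, the truncated formulas presumably continue with the standard batch statistics and the learned affine transformation:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta.$$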
1 vote · 1 answer · 116 views
How can different models based on different sets of predictors be combined to significantly improve the model performance?
I have two machine learning models for predicting some continuous variable $y$, say $y=f_1(X_1, \theta_1)$ and $y=f_2(X_2, \theta_2)$, and these models are of the same type (ANN). $X_1$ and $X_2$ ...
1 vote · 1 answer · 110 views
NeRF vs mesh for text-to-3d generation
There seem to be multiple approaches to generating 3D objects from a text prompt. What's confusing is that some of them generate NeRFs (https://arxiv.org/pdf/2308.16512), while others generate ...
-1 votes · 1 answer · 59 views
How does a neural network differentiate between a neuron that outputs 0 and a dropped-out one?
How does a network differentiate between a neuron with output 0 and a dropped-out neuron (this neuron might output a non-zero value but due to dropout it outputs 0)?
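For context, a minimal NumPy sketch of inverted dropout: what distinguishes a dropped unit is the sampled mask, which is reused in the backward pass, not the activation value itself:

    import numpy as np

    rng = np.random.default_rng(0)
    h = rng.normal(size=5)        # activations; some could legitimately be 0
    p = 0.5                       # drop probability
    mask = rng.random(5) >= p     # False = dropped this forward pass
    h_dropped = np.where(mask, h / (1 - p), 0.0)   # inverted-dropout scaling
    # Backprop multiplies incoming gradients by this same mask, so dropped
    # units receive zero gradient regardless of what they would have output.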
1 vote · 1 answer · 116 views
How can I learn and remove the linear trend in the residuals against the true response values generated by an ordinary neural network?
I built a neural network using PyTorch to predict y (a continuous variable) based on X consisting of m (=20) features. I found that the residuals (y_predicted – y_true) for the test data set show a ...
1 vote · 0 answers · 46 views
Are these generated from my code the so-called feature maps?
I assume that the way people determine which activations detect specific pieces of an image is by executing the network and extracting the results at each layer; when the output is from a convolutional ...
1 vote · 0 answers · 113 views
Why doesn't Kaiming/He weight Initialization seek a 50/50 compromise for forward and backward pass?
Sorry, please let me know if I'm off, but it seems that He initialization aims to either maintain a constant variance through the forward pass or through the backward pass.
It seems the idea is that, ...
1 vote · 0 answers · 44 views
Neural networks with uncertainties in training data
I have used Flax to train a neural network to fit a model to some data. All of the data points have a known uncertainty, as in each row has both a value and an uncertainty. (To be more explicit: the ...
2 votes · 0 answers · 95 views
Derivative of logistic regression (sigmoid) [closed]
I am having difficulty figuring out why I get a different answer from the professor. We are tasked with finding the derivative of the logistic regression cost function with the sigmoid function:
$$ L(w│...
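For context, assuming the truncated cost is the usual negative log-likelihood, the standard identities are
$$\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr), \qquad \nabla_w\!\left[-y\log\sigma(w^\top x) - (1-y)\log\bigl(1-\sigma(w^\top x)\bigr)\right] = \bigl(\sigma(w^\top x) - y\bigr)\,x.$$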
1 vote · 1 answer · 102 views
(Multivariate) anomaly detection of (redundant) sensor data
I’m currently working on my master thesis and I’m looking for some inputs for the following situation:
I have data of 2-20 sensors all measuring the same variable at 1-3 different locations in 15mins-...
1 vote · 0 answers · 46 views
Getting accurate Uncertainty from MFVI?
I wanted to know if there has been any research on methods to improve the accuracy of Mean-Field Variational Inference (which don't discard the mean-field approximation). Apparently it is known to ...
0 votes · 0 answers · 51 views
Why Do AR-NN Models Have Tighter Confidence Intervals Compared to Linear AR Models?
I have conducted a forecast for the following data series using different autoregressive models: Intercept-only, AR1, AR2, ARIMA BIC, ARIMA AIC, and AR-NN. Using the point forecasts, the AR1 model is ...
4 votes · 1 answer · 254 views
Choosing Between Intercept-Only and AR-NN Models: Justified to not use the model with the lowest RMSE/MAE?
I have created two autoregressive models for forecasting: a basic intercept-only model and an AR-NN (autoregressive neural network) model. Both models show similar performance based on recursive one-...
3 votes · 1 answer · 134 views
What probability distribution is learned in this specific case? [duplicate]
I keep reading papers and blog posts where the training of a neural network is defined as learning some underlying probability distribution of the data.
Imagine that you write a CNN that outputs whether ...
1 vote · 0 answers · 63 views
Can normalizing flows approximate bounded distributions in deep learning?
I’m exploring the use of normalizing flows in deep learning for generative modeling and I have a specific requirement: my target distributions are bounded (for example, between 0 and 1). I understand ...