All Questions

1 vote
0 answers
85 views

From what I understand, the Maxout function works quite differently from ReLU. The ReLU function is max(0, x), where the input x is (W_T x + b). The Maxout function has many Ws, and it is max(W1_T x + b1, W2_T x + b2,...
kite's user avatar
  • 11
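
The comparison in the excerpt above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `relu_unit` and `maxout_unit` and the sample weights are assumptions, not taken from the question. It shows that a Maxout unit with two affine pieces, one of them fixed to zero, gives the same output as a ReLU unit.

```python
import numpy as np

def relu_unit(x, w, b):
    """ReLU applied to a single affine pre-activation: max(0, w^T x + b)."""
    return np.maximum(0.0, w @ x + b)

def maxout_unit(x, W, b):
    """Maxout over k affine pieces: max_i (W_i^T x + b_i).

    W has shape (k, d) and b has shape (k,); both are hypothetical parameters.
    """
    return np.max(W @ x + b)

# Illustrative (made-up) weights and inputs.
w = np.array([0.5, 1.0])
b = 0.25
x_neg = np.array([1.0, -2.0])  # pre-activation is -1.25
x_pos = np.array([3.0, 1.0])   # pre-activation is 2.75

# A Maxout unit whose second piece is identically zero behaves like ReLU.
W = np.stack([w, np.zeros_like(w)])
bs = np.array([b, 0.0])

for x in (x_neg, x_pos):
    print(relu_unit(x, w, b), maxout_unit(x, W, bs))
```

With more than two pieces, or with a non-zero second piece, Maxout learns a piecewise-linear activation rather than using a fixed one, which is where the two functions diverge.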
0 votes
0 answers
60 views

I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model. The model I'm using is Longformer (because it has ...
Bhashwar Sengupta's user avatar
1 vote
0 answers
43 views

According to the VAE paper: https://arxiv.org/pdf/1906.02691 (eq. 2.10, page 21), the VAE loss contains an expectation term. If I understand correctly, we need to sample more than 1 result from an input x and ...
Manh's user avatar
  • 125
1 vote
1 answer
117 views

I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
Donald M.'s user avatar
2 votes
1 answer
218 views

When I studied machine learning for the first time, I learned that we need to use l2 regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...
Fraïssé's user avatar
  • 1,708
5 votes
1 answer
176 views

A recent work shows generalisation beyond overfitting for overparametrized systems [*]. Is there any precedent in the statistics literature, or is this a new phenomenon for deep learning? [*] Grokking: ...
patagonicus's user avatar
  • 2,789
1 vote
0 answers
54 views

I am trying to classify whether two 25x25 px images, stacked together as one 50x25 px image, are the same (1) or different (0). I am using Keras to create the NN layers. There are 10,000 instances of both 1s and 0s ...
Squish's user avatar
  • 111
1 vote
0 answers
42 views

I have some complex data, and I want to teach a neural network to perform certain operations on it. Now, since Tensorflow does not allow complex inputs, I separated my data into real and imaginary ...
rand1's user avatar
  • 111
0 votes
1 answer
42 views

I was working on a neural network project that uses this dataset: https://archive.ics.uci.edu/dataset/320/student+performance. The data has two main types of values. It has binaries (0 or 1) and ...
Matthew Gregg's user avatar
0 votes
0 answers
46 views

I have a model which works well with just one Dense layer. My model has an input layer, a Dense layer, and then one Reshape layer, which reshapes the output into the desired form. Normally, neural ...
rand1's user avatar
  • 111
5 votes
1 answer
687 views

Epochs, the number of times training is repeated on the original data, are absolutely necessary for neural networks where there are often many more parameters than original instances. What is the ...
Mitch's user avatar
  • 2,099
1 vote
0 answers
54 views

Apologies if this is not the correct forum for this request, but I figured I'd give it a shot. I'm looking for research in computer vision on shape recognition -- any shape. Most of the computer ...
luddite's user avatar
  • 141
0 votes
1 answer
72 views

I am a neural network newbie. I would like to attempt to implement the following architecture for deep learning of a stochastic control problem, taken from this paper. Here $s_t$, $a_t$, $c_t$ and $\xi_t$ ...
Anthony's user avatar
  • 542
0 votes
0 answers
58 views

As an experiment, one could try to have n different activations/neurons/units in a layer. One to adapt the automated backpropagation algorithms from deep learning ...
0 votes
0 answers
75 views

I am currently trying to design a NN with the goal of broadly imitating the behavior of certain players in a specific 2D fighting game. The game in question records "replays" of each game ...
Michael's user avatar
2 votes
2 answers
727 views

A canonical example: say you have users and merchandise, with user features (age, location, ...) and merchandise features (type, size, ...), and you want to create an embedding to map users and merchandise to the same ...
ktt's user avatar
  • 21
1 vote
0 answers
90 views

The neural network is established as a universal approximator of all machine learning models. Further, the double descent phenomenon in a neural network propagates the journey of regression to interpolation ...
Lakshman's user avatar
2 votes
0 answers
58 views

I have a PyTorch model, the purpose of which is to predict quantiles over the output given an input. The output in this case is service time (minutes) for machine maintenance. The inputs detail ...
jbuddy_13's user avatar
  • 3,970
6 votes
2 answers
604 views

Problem Let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{c} \in \mathbb{R}^n$, and consider a softmax function $\sigma: \mathbb{R}^n \to \mathbb{R}^n$ Find representation of the Hessian of $f=\mathbf{c}...
moreblue's user avatar
  • 1,585
0 votes
0 answers
138 views

Suppose that we have a general loss function that depends on some parameters $w$ (e.g. neural network weights): $$L_w =\frac{1}{N} \sum_i \ell(\hat{y}_i, y_i)$$ Is it beneficial to standardize the ...
Antonios Sarikas's user avatar
4 votes
1 answer
159 views

It is not an uncommon practice to train neural network models via negative log likelihood $-\mathcal{L}(x, y_{true}, \mu, \sigma)$ to estimate both a location ($\mu$) and a scale ($\sigma$), such that ...
Miles's user avatar
  • 167
1 vote
0 answers
41 views

I was puzzled to find that the description of the Nesterov Accelerated Gradient on Paperswithcode, namely: $v_t = \beta * v_{t-1} \color{red}{+} \eta * ∇ J(\theta \color{red}{-} \beta * v_{t-1})$ $\...
Jérémie Wenger's user avatar
2 votes
1 answer
92 views

I'm a beginner who just started to study deep learning. I recently learned that in a feedforward neural network with a binary output and a Bernoulli distribution, the output of the sigmoid function ...
wruskrappy's user avatar
1 vote
0 answers
43 views

I am training a global timeseries deep learning model. I have split the data for training, validation(to select the best hyperparameters), and test(to test on out of sample data). There are only 3 ...
seeker's user avatar
  • 11
1 vote
0 answers
48 views

I am taking Karpathy's course, specifically I am on the first video. There is a step in the development of micrograd that I don't fully understand. Specifically in this section, when he talks about ...
Guillermo Álvarez's user avatar
1 vote
1 answer
110 views

I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
redbull_nowings's user avatar
0 votes
0 answers
86 views

The problem: I have a dataset with monthly economic indicators alongside monthly stock prices, containing 434 total observations. I have attempted to fit an LSTM to the data, but it seems to ...
altayir1's user avatar
1 vote
0 answers
47 views

In the context of linear classifiers, such as the perceptron or logistic regression, I understand that the decision boundary is defined by a linear combination of input features and weights, plus a ...
Narges Ghanbari's user avatar
1 vote
0 answers
27 views

I'm trying to understand why memfetch is multiplied by km(L+1)/8 instead of NMACS, which is 8, and also what is meant by systolic clk increment. Consider a fully connected layer. Let • $X \in \mathbb{R}^{K \times L}$ real 32-bit ...
Mohamed Insaf's user avatar
2 votes
1 answer
69 views

I have input ($n=224$), strides ($s=4$), filter size ($k=11$) and no padding which gives me a fractional conv output: $$\texttt{conv output} = (n-k+2p)/s + 1 = 54....
Shri's user avatar
  • 23
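
The arithmetic in the excerpt above can be checked directly. The sketch below assumes the floor convention for fractional results, which is what frameworks such as PyTorch document for their convolution layers; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(n, k, s, p=0):
    """Spatial output size of a convolution, assuming the floor convention.

    n: input size, k: filter size, s: stride, p: padding on each side.
    """
    return (n - k + 2 * p) // s + 1

n, k, s, p = 224, 11, 4, 0

exact = (n - k + 2 * p) / s + 1          # 54.25, the fractional value from the formula
size = conv_output_size(n, k, s, p)      # 54 once the division is floored

print(exact, size)
```

Under this convention the last window position that would run past the input edge is simply dropped, so the fractional part does not produce an extra output element.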
1 vote
0 answers
60 views

Not sure this is the right place to ask this question, but I'm having a disagreement with a colleague on this idea. Let's say we have a dataset comprised of "unclean" strings. The end goal ...
setty's user avatar
  • 161
5 votes
1 answer
469 views

I created a training, a validation and a test set for an image classification task. Then I trained on the training set and evaluated on the validation set. So, the next step is to evaluate the ...
cancan's user avatar
  • 53
2 votes
2 answers
78 views

Suppose I am manually tuning the hyperparameters of an NN model. How many epochs of training should I run at a minimum to realize that the model won't give me the desired accuracy I need before ...
user366312's user avatar
  • 2,077
1 vote
0 answers
46 views

The modeling issue I'm having is that the categorical variable for each row has different number of factors. If I can reshape the data by products (a,b,c,.....~cost, hoursum, numPod, numDate), so that ...
rocknRrr's user avatar
  • 121
1 vote
1 answer
101 views

I am implementing a neural network myself, with feedforward and backpropagation with gradient descent, to better understand how things work. After setting up the entire algorithm, I still have a huge ...
umbe1987's user avatar
  • 307
2 votes
1 answer
105 views

I'm reading the BatchNorm Wikipedia page, where they explain BatchNorm. I think the actual formulas are easier than words in this case. The norm statistics are calculated as: $$\large{\displaystyle \...
Mah Neh's user avatar
  • 173
1 vote
1 answer
116 views

I have two machine learning models for predicting some continuous variable $y$, say $y=f_1(X_1, \theta_1)$ and $y=f_2(X_2, \theta_2)$, and these models are of the same type (ANN). $X_1$ and $X_2$ ...
tunar's user avatar
  • 563
1 vote
1 answer
110 views

There seem to be multiple approaches to generating 3D objects from a text prompt. What's confusing is that some of them generate NeRFs (https://arxiv.org/pdf/2308.16512), while others generate ...
zlenyk's user avatar
  • 196
-1 votes
1 answer
59 views

How does a network differentiate between a neuron with output 0 and a dropped-out neuron (this neuron might output a non-zero value but due to dropout it outputs 0)?
Antonios Sarikas's user avatar
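
A small sketch may help frame the dropout question above. It assumes the common "inverted dropout" formulation (zero a unit with probability `p_drop`, scale survivors by `1 / (1 - p_drop)`); the function name and sample activations are made up for illustration. The point it shows is that the next layer only receives the resulting values, so a dropped unit and a unit that genuinely outputs 0 look identical downstream.

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    """Inverted dropout on an activation vector (illustrative sketch).

    During training each unit is zeroed with probability p_drop and the
    survivors are rescaled; at test time the activations pass through unchanged.
    """
    if not training or p_drop == 0.0:
        return a
    mask = rng.random(a.shape) >= p_drop   # True where the unit is kept
    return a * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.array([0.0, 1.5, 2.0, 0.7])         # the first unit genuinely outputs 0

out = dropout(a, p_drop=0.5, rng=rng)
print(out)  # zeros here may come from the mask or from a true zero activation
```

In this formulation nothing marks which zeros came from the mask; the mask changes on every training step, and at test time no units are dropped at all.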
1 vote
1 answer
116 views

I built a neural network using PyTorch to predict y (a continuous variable) based on X consisting of m (=20) features. I found that the residuals (y_predicted – y_true) for the test data set show a ...
tunar's user avatar
  • 563
1 vote
0 answers
46 views

I assume that the way people build which activations detect specific pieces from an image is by executing the network and extracting the results at each layer; when the output is from a convolutional ...
Mah Neh's user avatar
  • 173
1 vote
0 answers
113 views

Sorry, please let me know if I'm off, but it seems that He initialization aims to either maintain a constant variance through the forward pass or through the backward pass. It seems the idea is that, ...
riley's user avatar
  • 11
1 vote
0 answers
44 views

I have used Flax to train a neural network to fit a model to some data. All of the data points have a known uncertainty, as in each row has both a value and an uncertainty. (To be more explicit: the ...
rhombidodecahedron's user avatar
2 votes
0 answers
95 views

I am having difficulty figuring out why I get a different answer from the professor. We are tasked with finding the derivative of the logistic regression cost function with the sigmoid function: $$ L(w│...
Ofek nourian's user avatar
1 vote
1 answer
102 views

I’m currently working on my master's thesis and I’m looking for some input on the following situation: I have data from 2-20 sensors all measuring the same variable at 1-3 different locations in 15mins-...
Alexander's user avatar
1 vote
0 answers
46 views

I wanted to know if there has been any research on methods to improve the accuracy of Mean-Field Variational Inference (which doesn't discard the mean-field approximation). Apparently it is known to ...
profPlum's user avatar
  • 593
0 votes
0 answers
51 views

I have conducted a forecast for the following data series using different autoregressive models: Intercept-only, AR1, AR2, ARIMA BIC, ARIMA AIC, and AR-NN. Using the point forecasts, the AR1 model is ...
Joe94's user avatar
  • 537
4 votes
1 answer
254 views

I have created two autoregressive models for forecasting: a basic intercept-only model and an AR-NN (autoregressive neural network) model. Both models show similar performance based on recursive one-...
Joe94's user avatar
  • 537
3 votes
1 answer
134 views

I keep reading papers and blog posts where the training of a neural network is defined as learning some underlying probability distribution of the data. Imagine that you write a CNN that outputs whether ...
Mah Neh's user avatar
  • 173
1 vote
0 answers
63 views

I’m exploring the use of normalizing flows in deep learning for generative modeling and I have a specific requirement: my target distributions are bounded (for example, between 0 and 1). I understand ...
Felipe Vieira's user avatar