
Questions tagged [cross-entropy]

A measure of the difference between two probability distributions for a given random variable or set of events.

2 votes
1 answer
276 views

I came across this article: “MSE is Cross Entropy at Heart: Maximum Likelihood Estimation Explained” which states: "When training a neural network, we are trying to find the parameters of a ...
asked by spie227
3 votes
1 answer
107 views

Usually we use the average of the cross-entropy loss over all test examples as an index; can we also use the variance of the cross-entropy loss as an index?
asked by Bayesian Hat
0 votes
0 answers
50 views

Suppose we want to estimate $$r = \mathbb{E}_{x\sim p(x)} [f(x)]$$ via importance sampling, i.e. $$r = \mathbb{E}_{x\sim q(x)} \left[\frac{f(x)p(x)}{q(x)}\right]$$ Now Wikipedia says that ...
asked by Lazy Guy
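For context, a minimal NumPy/SciPy sketch of the importance-sampling estimator in the excerpt; the target density, the proposal, and the integrand f below are illustrative placeholders, not taken from the question:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    f = lambda x: x ** 2                  # integrand; E_p[f(x)] = 1 for p = N(0, 1)
    p = stats.norm(0, 1)                  # target distribution p(x)
    q = stats.norm(0, 2)                  # proposal distribution q(x)

    x = q.rvs(size=100_000, random_state=rng)
    w = p.pdf(x) / q.pdf(x)               # importance weights p(x)/q(x)
    r_hat = np.mean(f(x) * w)             # estimates E_q[f(x) p(x)/q(x)] = E_p[f(x)]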
4 votes
2 answers
665 views

I understand that AUC measures the model's ability to rank the subjects (see Why is ROC AUC equivalent to the probability that two randomly-selected samples are correctly ranked?). In contrast, binary ...
asked by iRum
2 votes
1 answer
205 views

I have, in some sense, the opposite question to Is it okay to use cross entropy loss function with soft labels?, namely: why is it OK NOT to use soft labels in classification? Let's say you have a ...
asked by YuseqYaseq
0 votes
0 answers
66 views

I know that when optimizing (supervised) neural networks, minimizing the cross-entropy loss is equivalent to minimizing the negative log-likelihood, which is equivalent to MLE, but I can't get all the math together. I am trying to ...
asked by Meem12
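For a single training example with one-hot label $y$ and predicted class probabilities $\hat{y}_k = p_\theta(k \mid x)$, the link in the excerpt reduces to one line:

$$-\sum_k y_k \log \hat{y}_k = -\log \hat{y}_c = -\log p_\theta(y = c \mid x),$$

where $c$ is the true class, so minimizing the cross-entropy over the training set is the same as maximizing the (log-)likelihood of the observed labels.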
1 vote
0 answers
29 views

I read this question Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function? and I cannot fully understand the answer. If we're using KL divergence for the ...
asked by COTHE
4 votes
1 answer
530 views

There is plenty of material showing the relationship between MLE and cross-entropy. Typically, these are the steps taken to show the relationship for an i.i.d. data-generating process $D = (X,Y)$: $$ ...
asked by spie227
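For reference, the usual chain of steps for an i.i.d. sample $\{(x_i, y_i)\}_{i=1}^{N}$ is:

$$\hat{\theta}_{\text{MLE}} = \arg\max_\theta \prod_{i=1}^{N} p_\theta(y_i \mid x_i) = \arg\max_\theta \sum_{i=1}^{N} \log p_\theta(y_i \mid x_i) = \arg\min_\theta \left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(y_i \mid x_i) \right),$$

and the final expression is the empirical cross-entropy between the observed labels and the model's predictions.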
2 votes
1 answer
188 views

For classification problems with more than two classes, I've seen these two forms of cross-entropy loss: $-\sum_k y_k \log(a_k)$ and $-\sum_k \left[ y_k \log(a_k) + (1-y_k) \log(1-a_k) \right]$. Here $y_k$ are the true ...
asked by theQman
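The first form is the usual categorical (softmax) cross-entropy; the second sums a per-class binary cross-entropy, as used in multi-label setups. A small NumPy sketch with illustrative numbers:

    import numpy as np

    y = np.array([0.0, 1.0, 0.0])           # one-hot true label
    a = np.array([0.2, 0.7, 0.1])           # predicted probabilities (softmax output)

    categorical_ce = -np.sum(y * np.log(a))                             # first form
    summed_binary_ce = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) # second form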
2 votes
0 answers
212 views

I'm making an implementation of softmax regression and I'm struggling to understand the nature of the problem of an increasing cross-entropy value: $H(y, p)=-\sum_{i=1}^C y_i \log(p_i)$, ...
asked by JoshJohnson
2 votes
1 answer
1k views

I am trying to understand the Shannon entropy better. By definition, the Shannon entropy is calculated as H = -sum(pk * log(pk)). I am using the scipy.stats.entropy function and I am running the ...
asked by GGChe
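For reference, scipy.stats.entropy normalizes its input to sum to one, uses the natural logarithm unless a base is given, and returns the KL divergence (not the cross-entropy) when a second distribution is passed:

    import numpy as np
    from scipy.stats import entropy

    pk = np.array([0.5, 0.25, 0.25])
    qk = np.array([0.4, 0.4, 0.2])

    print(entropy(pk))           # Shannon entropy -sum(pk * log(pk)), in nats
    print(entropy(pk, base=2))   # same quantity in bits
    print(entropy(pk, qk))       # relative entropy D_KL(pk || qk)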
0 votes
1 answer
334 views

I recently saw the following formulation of the Fisher information matrix in a paper on Transformer pruning: $$ \mathcal{I} := \frac{1}{|D|} \sum_{(x,y) \in D} \left( \frac{\partial \mathcal{L}(x,y;1)}...
asked by premed
0 votes
0 answers
97 views

I am currently trying to estimate the cross-entropy between two distributions with densities $p$ and $q$. $$ \ell = -\mathbb{E}_{x\sim p(x) }[\log q(x)] $$ I am using a Monte-Carlo estimate: $$ \hat{\...
asked by Nick Bishop
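A minimal sketch of such a Monte-Carlo estimate, assuming we can draw samples from $p$ and evaluate the log-density of $q$; the two Gaussians here are purely illustrative:

    import numpy as np
    from scipy import stats

    p = stats.norm(0.0, 1.0)                      # sampling distribution p(x)
    q = stats.norm(0.5, 1.5)                      # distribution whose log-density we score

    x = p.rvs(size=50_000, random_state=0)
    cross_entropy_hat = -np.mean(q.logpdf(x))     # estimates -E_{x~p}[log q(x)]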
0 votes
0 answers
133 views

Suppose $f(x;q)$ is the true distribution. The support of the random variable $X$ is $\Omega$. Suppose I am interested in a particular subset $\Xi \subset \Omega$. I would like to minimize the ...
asked by entropy
2 votes
1 answer
605 views

Say I have a neural network that classifies images by training to minimise cross-entropy loss with one-hot encoded training labels. It is often seen that such neural networks are 'overconfident', with ...
asked by Danny Duberstein
0 votes
1 answer
98 views

From my understanding mutual information can be defined in the following ways: [1]: $I(X;Y)=H(X)+H(Y)-H(X,Y)$ where $H(X), H(Y)$ are marginal entropies and $H(X,Y)$ is the joint entropy. [2]: $I(X;Y)=...
asked by Rui
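For reference, the standard equivalent forms of mutual information are:

$$I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = D_{KL}\big(p(x,y) \,\Vert\, p(x)\,p(y)\big).$$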
1 vote
0 answers
63 views

We can see the source in this paper. My question is why the cross-entropy loss has a sloped boundary line while the least-squares loss has a horizontal boundary. Can somebody explain?
asked by batuman
0 votes
1 answer
143 views

Given a dataset $\mathcal{D} = \{ (x_1, y_1),\cdots, (x_n, y_n)\}$, let's say we want to approximate the conditional probability $p(y|x)$, and we parameterize it as $p_{\theta}(y|x)$. So, for a ...
asked by UESTCfresh
2 votes
1 answer
530 views

Binary cross entropy is written as follows: \begin{equation} \mathcal{L} = -y\log\left(\hat{y}\right)-(1-y)\log\left(1-\hat{y}\right) \end{equation} In every reference that I read, when using binary ...
asked by andryan86
4 votes
1 answer
923 views

Binary cross entropy is normally used in situations where the "true" result or label is one of two values (hence "binary"), typically encoded as 0 and 1. However, the documentation ...
asked by R.M.
1 vote
1 answer
216 views

When looking at implementations of VAEs online, specifically the KL divergence loss, the formula used is: $$ KL\hspace{1mm} Loss = -\frac{1}{2}(1+\log{\sigma^2}-\mu^2-\sigma^2) $$ or some variation ...
asked by pyrrosk
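That expression is the per-dimension closed form of the KL divergence between the encoder's diagonal Gaussian and the standard-normal prior:

$$D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\Vert\, \mathcal{N}(0, 1)\big) = \frac{1}{2}\left(\mu^2 + \sigma^2 - \log \sigma^2 - 1\right),$$

which is exactly the quoted loss after distributing the minus sign.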
1 vote
1 answer
140 views

I have constructed a simple neural network model, for a classification problem, with 10 target classes where an input (with some number of features) is to be classified to only one of the 10 classes. ...
asked by creamedcheese83
1 vote
0 answers
28 views

What is the base of the logarithm used in the cross-entropy loss (when backpropagating in multiclass classification)? Is it e, 2, or 10?
asked by Sachin
2 votes
1 answer
1k views

From what I've been reading, if there is no underlying difference between the two probability distributions we would have perfect entropy. I'm putting an example below. Can anybody explain why the ...
asked by julian lagier
1 vote
0 answers
139 views

I am having trouble understanding how the result of the categorical cross-entropy loss can be used to calculate the gradient for all of the weights. The output of the cross-entropy function is the sum of all ...
asked by Nick
2 votes
0 answers
299 views

I know there are related questions already asked, for example this one. I also know the following: KL divergence $D_{KL}(P\Vert Q)$ is given as: $$\begin{align} D_{KL}(P\Vert Q) & = -\sum_xP(x)\...
asked by Mahesha999
2 votes
1 answer
478 views

I am doing research using an NN with one hidden layer, computing the loss with binary cross-entropy and using a sigmoid activation function. I found the derivative formula in Sadowski, 2016 (link: ...
asked by Andryan
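For reference, with sigmoid output $\hat{y} = \sigma(z)$ and binary cross-entropy $\mathcal{L}$, the derivative with respect to the pre-activation collapses to a simple form:

$$\frac{\partial \mathcal{L}}{\partial z} = \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right) \hat{y}(1-\hat{y}) = \hat{y} - y.$$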
5 votes
1 answer
332 views

Consider a binary classification dataset (X, Y), generated according to some unknown distribution $P(X, Y)$. I have a question about models which output probabilities by minimizing the cross-entropy ...
asked by usual me
4 votes
2 answers
599 views

I was watching the cross-entropy video from StatQuest. While explaining why to use cross-entropy over SSE in a multi-output scenario with a softmax output activation, Josh gives this graph of both losses: He ...
asked by Mahesha999
2 votes
0 answers
27 views

Cross entropy for a random variable $x \sim p$ and a distribution $q$ is defined as: $$H(p,q) = -\sum_{x\in\mathcal{X}} p(x)\log q(x) = -\mathbb{E}_{x\sim p}[\log q(x)]$$ $\mathcal{X}$ is all possible values ...
asked by rando
0 votes
0 answers
138 views

I have a dataset with classes [a, b] where during training I have made sure that the dataset is equally balanced. I have trained the network using cross-entropy loss with equal importance. I am able ...
asked by JakobVinkas
1 vote
0 answers
79 views

I'm curious if anyone has used, heard of, or otherwise considered using Genetic Algorithms as an engine for Variational Inference (VI)? My understanding of VI is that it's an optimization algorithm, ...
asked by jbuddy_13
2 votes
2 answers
242 views

I'm looking for some metric of surprisal when comparing ranked lists - things along the lines of (eg) the rankings in a marathon race, or the times in the race. Intuitively, in a race with 100 people, ...
asked by Alex I
4 votes
0 answers
616 views

Performance of classification algorithms is quantified by comparing the predicted probability distribution of the labels $q$ to the true probability $p$, which is commonly a vector of zeros for all ...
asked by Aleksejs Fomins
1 vote
1 answer
287 views

I am working on a domain adaptation problem, where the default is a classification problem. I have worked exclusively with regression problems until now, so I am kind of thrown for a loop when it ...
asked by Scott
1 vote
0 answers
276 views

I'll provide a little of introduction based on my example. I have a small collection of RGB (but 'gray-looking') brain MRI photos, divided into 2 classes: healthy and tumor. My data split looks like ...
asked by Karolina Świergała
2 votes
1 answer
263 views

I'm looking into the definition of cross entropy from wikipedia. https://en.wikipedia.org/wiki/Cross_entropy Cross entropy is not symmetric, so I think for sure it shouldn't be called cross entropy ...
asked by user900476
2 votes
1 answer
902 views

I'm working through Dive Into Deep Learning right now and am struggling with the following question: We can explore the connection between exponential families and the softmax in some more depth. ...
asked by jimac82
2 votes
1 answer
442 views

I'm looking into the wikipedia page of cross entropy. https://en.wikipedia.org/wiki/Cross_entropy $$H(p,q)=-\sum_{x\in \mathcal{X}} p(x)\log q(x)$$ It can be written as $$H(p,q) = H(p) + D_{KL} (p||q)$...
asked by user900476
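The decomposition follows directly from the definitions by adding and subtracting $\sum_x p(x)\log p(x)$:

$$H(p,q) = -\sum_{x} p(x)\log q(x) = -\sum_{x} p(x)\log p(x) + \sum_{x} p(x)\log\frac{p(x)}{q(x)} = H(p) + D_{KL}(p \Vert q).$$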
0 votes
0 answers
55 views

Suppose that we have a dataset of a special kind of cat. We are going to train a model on a combination of the cat and the car! Suppose that with this model we get a performance (precision, recall, or ...) X ...
asked by Mahdi Amrollahi
3 votes
0 answers
778 views

I am wondering if there is any empirical rule for selecting the value of label smoothing when training a neural network. Let's define smoothed prediction targets in relation to a value $\epsilon$ to ...
asked by thiaamak
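For concreteness, the usual smoothed target for a $K$-class problem with one-hot label $y$ is

$$\tilde{y}_k = (1-\epsilon)\, y_k + \frac{\epsilon}{K},$$

and values around $\epsilon = 0.1$ are commonly reported in practice, though whether a principled rule exists is exactly what the excerpt asks.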
1 vote
0 answers
737 views

I have read a similar question here: 1 neuron BCE loss VS 2 neurons CE loss that suggests there is no difference between softmax cross entropy loss and binary cross entropy loss, when choosing between ...
asked by Anonymous
0 votes
0 answers
448 views

Weighted Cross-Entropy (WCE) helps to handle an imbalanced dataset, and Cityscapes is quite imbalanced, as seen below: If we check the best benchmarks on this dataset, most of the works use bare CE as a ...
asked by Rafael Toledo
3 votes
1 answer
2k views

When we are dealing with Mean Square Error (MSE) loss function in optimization problems, we often add $L_1$ or $L_2$ penalty terms (or a combination of both) to the MSE loss function while training. ...
asked by Aravind G.
0 votes
0 answers
330 views

I got the definition of log-likelihood from Goodfellow's Deep Learning book: $$\theta_{ML} = \underset{\theta}{\arg\max} \sum_{i=1}^{m} \log p_{model}(x_i; \theta)$$ ...
asked by Lucas Lima de Sousa
2 votes
1 answer
335 views

This is the loss function of XGBoost. This is the Second-order approximation of the loss function. Note: \begin{equation} L^{(t)} \text{: cross entropy loss function.} \end{equation} \begin{equation}...
asked by ChrisChu
2 votes
1 answer
180 views

I think it's pretty clear to me that average log-likelihood is equivalent to negative cross-entropy for discrete distributions, as shown here: $$\frac{1}{N}\log\mathcal{L}(\theta) = \frac{1}{N}\log \...
asked by Alex Zakharov
16 votes
2 answers
2k views

Given $k > 2$ classes, consider the following loss function $$ \sum_i||y^{(i)} - \hat y^{(i)}||^2 $$ Here $y^{(i)} \in \{0,1\}^k$ is the $i^{th}$ one-hot encoded true label and $\hat y^{(i)} \in [0,...
asked by helperFunction
1 vote
1 answer
4k views

I have a dataset with 10 input categorical features and one output categorical feature with classes 0 and 1. X_train is a 3D array, so I have done label encoding beforehand on the dataset. I have ...
asked by be_real
