I am having trouble understanding how the result of categorical cross entropy loss can be used to calculate the gradient for all of the weights.
The output of the cross entropy function is the negative sum of the log likelihoods, each weighted by the corresponding entry of the one-hot encoded vector of the actual (desired) output of the neural net. Once I have this value, I do not understand what I am supposed to do with it.
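Written out, with p being the softmax output and y the one-hot target, what I am computing is

loss = -(y[0]*log(p[0]) + y[1]*log(p[1]) + ... + y[9]*log(p[9]))

and since y has a single 1 in it, this collapses to -log(p[correct class]).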
To calculate the gradient of cross entropy loss, many sources on the internet have told me to use this formula:
// actualOutput is the one-hot encoded vector of desired outputs for the current input
for (int i = 0; i < softmaxOutput.size(); i++)
{
    softmaxOutput[i] -= actualOutput[i];
}
Then I am supposed to take this result and pipe it into the softmax derivative function. This confused me, because it does not involve the result of the loss function at all during backpropagation. Worse, when I implement it this way it does not work. Am I looking at this problem the right way?
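To make the flow I am describing concrete, this is roughly what my backward pass does right now (softmaxDerivative and backward are just stand-ins for my own functions):

// softmaxOutput: the softmax layer's output for one image
// actualOutput:  the one-hot encoded target for that image
vector<double> grad = softmaxOutput;
for (size_t i = 0; i < grad.size(); i++)
{
    grad[i] -= actualOutput[i];  // softmax output minus one-hot target
}
// then, as the sources seem to suggest, I pass this through the softmax derivative
// and hand the result to backpropagation (placeholder names for my own functions)
vector<double> delta = softmaxDerivative(grad);
backward(delta);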
Here is some of my code for the cross entropy loss, written specifically for MNIST:
#include <cmath>
#include <vector>

using std::vector;

class mnist_entropy_loss
{
private:
    double *pred = new double[10];
    // Row i of dist is the one-hot encoded target vector for digit i.
    vector<vector<double>> dist = {{1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
                                   {0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
                                   {0, 0, 1, 0, 0, 0, 0, 0, 0, 0},
                                   {0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
                                   {0, 0, 0, 0, 1, 0, 0, 0, 0, 0},
                                   {0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
                                   {0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
                                   {0, 0, 0, 0, 0, 0, 0, 1, 0, 0},
                                   {0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
                                   {0, 0, 0, 0, 0, 0, 0, 0, 0, 1}};

public:
    mnist_entropy_loss();
    double calculateLoss(double*, char);
    double* calculateGradient(double*, char);
};
double mnist_entropy_loss::calculateLoss(double *input, char label)
{
    // input holds the 10 softmax outputs; label is the digit (0-9) of the current image.
    vector<double> actual = this->dist[label];
    double loss = 0;
    for (int i = 0; i < 10; i++)
    {
        loss += -(actual[i] * std::log(input[i]));
    }
    return loss;
}
double *mnist_entropy_loss::calculateGradient(double *input, char label)
{
    vector<double> actual = this->dist[label];
    for (int i = 0; i < 10; i++)
    {
        // I find it confusing that the loss function output does not seem to have anything
        // to do with this.
        input[i] -= actual[i];
    }
    return input;
}
I am piping the result of calculateGradient into the backward function, roughly as shown below. I am not sure I am approaching this correctly. Many of the articles I have read and videos I have watched show the same derivatives and formulas, but imply different uses for them, and I am confused about the flow of the data at this point in the network.
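For reference, this is roughly how I use the class during training (backward is a stand-in for my network's backpropagation function, and softmaxOut/label are the softmax output and digit label of the current image):

mnist_entropy_loss lossFn;
double loss = lossFn.calculateLoss(softmaxOut, label);       // the loss value I do not know what to do with
double *grad = lossFn.calculateGradient(softmaxOut, label);  // softmax output minus one-hot target
backward(grad);                                              // piped straight into backpropagation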
My network runs properly without cross entropy, so I know that's not the problem. Am I handling the data correctly here? Do you know of any good resources for this?