I'm a beginner who has just started studying deep learning. I recently learned that in a feedforward neural network with a binary output and a Bernoulli distribution, the output of the sigmoid function represents the probability that the label is 1. I'm curious why it cannot be the other way round (the probability of the label being 0). Is it just for convenience?
- Dave (Jul 24, 2024): Related...you don't even have to use the 0/1 convention.
- wruskrappy (Jul 24, 2024): The post helped greatly. Thanks!
1 Answer
It is a convention.
Ultimately, what matters is that the objective function (likely the log-likelihood in your context) is computed consistently with the convention you have chosen.
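To spell that out (my notation, not part of the original answer): write $p = \sigma(z)$ for the network's output at logit $z$. The Bernoulli log-likelihood under each convention is

$$\ell = y \log p + (1 - y)\log(1 - p) \quad \text{if } p = P(y = 1),$$
$$\ell = (1 - y)\log p + y \log(1 - p) \quad \text{if } p = P(y = 0).$$

Since $1 - \sigma(z) = \sigma(-z)$, the second is just the first with the logit negated, so a network can fit the data equally well under either convention; it simply learns weights of the opposite sign.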
That said, it is best to follow the convention that most people have adopted, to reduce the risk of miscommunication or misinterpretation.
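As a quick numerical check of the equivalence above (a minimal sketch in NumPy; the variable names are my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logits z and binary labels y (illustrative data only).
rng = np.random.default_rng(0)
z = rng.normal(size=5)
y = rng.integers(0, 2, size=5)

# Convention A: sigmoid(z) is P(y = 1).
p1 = sigmoid(z)
nll_a = -(y * np.log(p1) + (1 - y) * np.log(1 - p1))

# Convention B: sigmoid output is P(y = 0). Negating the logit
# gives the same model, since 1 - sigmoid(z) = sigmoid(-z).
p0 = sigmoid(-z)
nll_b = -((1 - y) * np.log(p0) + y * np.log(1 - p0))

print(np.allclose(nll_a, nll_b))  # True: both conventions give the same loss
```

Both per-example losses agree exactly, which is the sense in which the 0/1 choice is purely a convention.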