I'm reading the Deep Learning book by Goodfellow, Bengio, and Courville (Chapter 8, Section 8.7.1 on Batch Normalization, page 315). The authors use a simple example of a deep linear network without activation functions:
$$\hat{y} = x \, w_1 w_2 w_3 \cdots w_l$$
They explain how batch normalization helps prevent exploding/vanishing gradients by normalizing activations at each layer. However, they then state:
"Batch normalization has thus made this model significantly easier to learn. In this example, the ease of learning of course came at the cost of making the lower layers useless. In our linear example, the lower layers no longer have any harmful effect, but they also no longer have any beneficial effect. This is because we have normalized out the first- and second-order statistics, which is all that a linear network can influence."
Given that a stack of linear layers is already "useless" even without batch normalization, in the sense that it is reducible to a single layer, what additional "uselessness" does batch normalization introduce? The book seems to attribute the lower layers' uselessness specifically to batch normalization, and I do not understand why.
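To state the premise of my question precisely (my own restatement, not the book's): without batch normalization the linear chain already collapses to a single effective weight,

$$\hat{y} = x \, w_1 w_2 \cdots w_l = x \, w_{\text{eff}}, \qquad w_{\text{eff}} = \prod_{i=1}^{l} w_i,$$

so the extra depth contributes nothing representationally even before batch normalization is applied.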