
I am trying to understand how many points a neural network of a given size can interpolate. I think this may be equivalent to its degrees of freedom? I am not interested in whether particular optimization methods would actually reach such a solution, only in whether there is a theoretical bound.

To be precise: take a "neural network" with one hidden layer

$$ f(x) \equiv W_2 \cdot \sigma(W_1 \cdot x + b_1) + b_2 $$ where

  • $f : \mathbb{R} \to \mathbb{R}$
  • $\sigma(\cdot) = \max(0, \cdot)$ applied element-wise (i.e. ReLU)
  • $W_1 \in \mathbb{R}^N$
  • $b_1 \in \mathbb{R}^N$
  • $W_2 \in \mathbb{R}^N$
  • $b_2 \in \mathbb{R}$
  • $\theta = (W_1, b_1, W_2, b_2) \in \mathbb{R}^{3N+1}$

Note that for a given $N$ there are $3N+1$ parameters.
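To make the parameter count concrete, here is a minimal numpy sketch of the model I mean (the value of $N$ and the weights below are arbitrary placeholders):

```python
import numpy as np

def f(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network f: R -> R for a scalar input x."""
    h = np.maximum(0.0, W1 * x + b1)   # hidden activations, shape (N,)
    return W2 @ h + b2                 # scalar output

# Arbitrary example weights for N = 4; parameter count is N + N + N + 1 = 3N + 1.
N = 4
rng = np.random.default_rng(0)
W1, b1, W2, b2 = rng.normal(size=N), rng.normal(size=N), rng.normal(size=N), 0.0
print(W1.size + b1.size + W2.size + 1)  # 13 = 3*4 + 1
print(f(0.7, W1, b1, W2, b2))
```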

Question: For a fixed $N$, what is the maximum number of data points $(x_i, y_i) \in \mathbb{R}^2$ such that, for any such set of points, there exists a $\theta$ with $f(x_i) = y_i$ for all $i$? For other functional forms such as orthogonal polynomials, the number of interpolatable points always equals the number of parameters, but here shouldn't it be lower than $3N+1$ because of the collinearity of the bias terms?
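For comparison, this is the polynomial case I have in mind, where the number of interpolated points equals the number of coefficients (I use the monomial basis rather than an orthogonal one, which does not change the count; the degree and data below are arbitrary):

```python
import numpy as np

# A degree-d polynomial has d + 1 coefficients and interpolates d + 1 distinct
# points exactly (here via a Vandermonde solve; the data are arbitrary).
d = 3
x = np.array([-1.0, 0.0, 0.5, 2.0])
y = np.array([ 1.0, -2.0, 0.3, 5.0])
coeffs = np.linalg.solve(np.vander(x, d + 1), y)
assert np.allclose(np.polyval(coeffs, x), y)
```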

  • This is a two-layer network (i.e. one hidden layer), and $W_1$ is $N \times N$. For interpolation, $f(x)$ should equal $y$ for each $(x, y)$ pair. For a one-layer NN, this boils down to solving the linear equation $WX + b\mathbf{1} = Y$, which certainly has its limits. For the two-layer case, I believe there is still a limit, but I am not sure if it's provable. Commented Feb 5, 2022 at 18:06
  • Oops, sorry, yes, I meant one hidden layer. So there may not be a closed-form solution in the above case? Commented Feb 5, 2022 at 22:09
  • Your notation says there are $N$ parameters in $W_1$, so are we only considering the case where each $x$ is a scalar? Commented Feb 7, 2022 at 22:07
  • Yes, sorry. I thought the $f : \mathbb{R} \to \mathbb{R}$ made that unambiguous? Do you think I should add $x \in \mathbb{R}$ to make it even clearer? Commented Feb 7, 2022 at 22:12
  • Oh, I see that now. I missed it the first time. Basically, we can choose $N, W_1, b_1$ such that $\sigma(W_1 x + b_1)$ is a basis. Then we know that $W_2, b_2$ are just estimated from a regression (see the sketch after these comments). Do you think you can take it from here? Commented Feb 7, 2022 at 22:35
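A minimal sketch of the construction suggested in the last comment, under one concrete (and by no means unique) choice of $W_1, b_1$: fix the slopes in $W_1$ to one and place each unit's kink between consecutive data points, so that the hidden features plus the constant form a triangular, hence invertible, basis; $W_2$ and $b_2$ then come from solving the resulting square linear system. The data below are arbitrary; with this choice, $N$ hidden units interpolate $N+1$ points:

```python
import numpy as np

# Arbitrary data: N + 1 = 5 distinct, sorted inputs and arbitrary targets.
x = np.array([-2.0, -0.5, 0.3, 1.0, 2.5])
y = np.array([ 1.0,  3.0, -1.0, 0.5, 4.0])
N = len(x) - 1

W1 = np.ones(N)                              # unit slopes
b1 = -(x[:-1] + x[1:]) / 2.0                 # kinks at midpoints between data points
H = np.maximum(0.0, np.outer(x, W1) + b1)    # hidden features, shape (N+1, N)
Phi = np.hstack([np.ones((N + 1, 1)), H])    # prepend constant column for b2
theta = np.linalg.solve(Phi, y)              # lower-triangular system => unique solution
b2, W2 = theta[0], theta[1:]

f = lambda t: W2 @ np.maximum(0.0, W1 * t + b1) + b2
assert np.allclose([f(t) for t in x], y)     # exact interpolation of N + 1 points
```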
