In this lecture, we start by transitioning from viewing a Radial Basis Function Neural Network as a kind of modification of a single neuron/perceptron (for better separability) to viewing it as a whole neural network in its own right, with input, hidden (latent-variable), and output layers. That lets us introduce Universal Approximation Theorems (UATs), like the one from Cybenko, which state that virtually any continuous function can be approximated arbitrarily well by a network with a single (arbitrarily wide) hidden layer, provided the activation function is suitably nonlinear (sigmoidal in Cybenko's original result; non-polynomial in later generalizations).

We use the practical difficulty of actually constructing such a hypothetical network to motivate a more pragmatic "deep" architecture (i.e., with more than one hidden layer) built from simpler activation functions that are more amenable to generic training approaches. That lets us introduce the Multi-Layer Perceptron (MLP) feed-forward neural network and the corresponding backpropagation algorithm used, together with gradient descent, to optimize it with respect to a differentiable loss function (sum-of-squares in this case). Backpropagation is basically the "chain rule" unrolled over a neural network: errors in supervised learning are "propagated backwards" to compute gradients, the weights are updated by a gradient-descent step, and then new values propagate forwards to compute new errors, starting the process again.

We will pick up on backpropagation in the next lecture (and introduce CNNs and other architectures, some of which will not be purely feed-forward).
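To make the forward/backward picture concrete, below is a minimal NumPy sketch of a single-hidden-layer MLP trained by backpropagation and gradient descent on a sum-of-squares loss. It is an illustrative sketch only, not the lecture's notation: the target function (a sine wave), the tanh activation, the layer width, the learning rate, and the iteration count are all assumptions made here for the example.

import numpy as np

# Toy setup: approximate f(x) = sin(2*pi*x) on [0, 1] with one hidden layer.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # inputs, shape (N, 1)
Y = np.sin(2.0 * np.pi * X)                     # targets, shape (N, 1)

H = 30                                          # hidden-layer width (assumed)
W1 = rng.normal(scale=1.0, size=(1, H))         # input -> hidden weights
b1 = np.zeros((1, H))
W2 = rng.normal(scale=1.0, size=(H, 1))         # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 0.05                                       # learning rate (assumed)

for step in range(20000):
    # Forward pass: propagate inputs through the hidden layer to the output.
    Z1 = X @ W1 + b1
    A1 = np.tanh(Z1)                            # non-polynomial activation
    Yhat = A1 @ W2 + b2                         # linear output layer

    # Sum-of-squares loss; its gradient with respect to the output is the error E.
    E = Yhat - Y
    loss = 0.5 * np.sum(E ** 2)

    # Backward pass: the chain rule "unrolled" layer by layer.
    dW2 = A1.T @ E
    db2 = E.sum(axis=0, keepdims=True)
    dA1 = E @ W2.T
    dZ1 = dA1 * (1.0 - A1 ** 2)                 # derivative of tanh
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Gradient-descent step (gradients averaged over the batch for a stable step);
    # the next forward pass uses the updated weights to compute the new errors.
    N = len(X)
    W1 -= lr * dW1 / N
    b1 -= lr * db1 / N
    W2 -= lr * dW2 / N
    b2 -= lr * db2 / N

    if step % 5000 == 0:
        print(f"step {step:5d}  sum-of-squares loss = {loss:.4f}")

Running the sketch prints a decreasing sum-of-squares loss, illustrating the forward-backward-update loop described above; a single hidden layer suffices here, in the spirit of the UAT, even though practical networks are usually deeper.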
Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/44tpg5v81k1erja/IEE598-Lecture7C-2022-03-31-Deep_Neural_Networks-UAT_MLP_and_Backpropagation.pdf?dl=0