Thursday, March 31, 2022

Lecture 7C (2022-03-31): Deep Neural Networks (UAT, MLP, and Backpropagation)

In this lecture, we start by transitioning from thinking about a Radial Basis Function Neural Network as a kind of modification of a single neuron/perceptron (for better separability) to viewing it as a whole neural network in itself, with input, hidden (latent-variable), and output layers. That lets us introduce Universal Approximation Theorems (UAT), like the one due to Cybenko, which say that virtually any (continuous) function can be approximated by a neural network with a single, arbitrarily wide hidden layer so long as the activation functions satisfy mild conditions (e.g., being non-polynomial). We use the practical difficulty of actually finding such a hypothetical neural network to motivate a more pragmatic "deep" architecture (i.e., with more than 1 hidden layer) built from simpler activation functions that are more amenable to generic training approaches. This sets up the Multi-Layer Perceptron (MLP) feed-forward neural network and the corresponding backpropagation method (gradient descent) used to optimize it relative to a differentiable (sum-of-squares, in this case) loss function. Backpropagation is basically the "chain rule" unrolled over a neural network: errors in supervised learning are "propagated backwards" to compute gradients, a gradient-descent step updates the weights, and the updated values then propagate forwards to produce new errors, starting the process again. We will pick up on backpropagation in the next lecture (and introduce CNN's and other architectures, some of which will not be purely feed-forward).
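
As a rough, self-contained illustration of these ideas (not code from the lecture), here is a minimal NumPy sketch of a one-hidden-layer MLP trained by backpropagation under a sum-of-squares loss; the XOR-style toy data, hidden-layer size, learning rate, and sigmoid activations are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR-style data (hypothetical example, not from the lecture)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden, lr = 8, 0.5
W1 = rng.normal(size=(2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(5000):
    # Forward pass: propagate inputs through the hidden layer to the output
    H = sigmoid(X @ W1 + b1)          # hidden activations
    y_hat = sigmoid(H @ W2 + b2)      # network output

    # Backward pass: the chain rule "unrolled" layer by layer
    # for the sum-of-squares loss E = 0.5 * sum((y_hat - y)^2)
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)    # error at the output layer
    delta_hid = (delta_out @ W2.T) * H * (1 - H)     # error propagated backwards

    # Gradient-descent step on every weight and bias
    W2 -= lr * H.T @ delta_out;  b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid;  b1 -= lr * delta_hid.sum(axis=0)

print(np.round(y_hat, 2))  # outputs should approach [0, 1, 1, 0] for most seeds
```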

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/44tpg5v81k1erja/IEE598-Lecture7C-2022-03-31-Deep_Neural_Networks-UAT_MLP_and_Backpropagation.pdf?dl=0



Tuesday, March 29, 2022

Lecture 7B (2022-03-29): Introduction to Neural Networks: SLP, RBFNN, and MLP

In this lecture, we continue to discuss the basic artificial neuron as a generalized linear model for statistical inference. We start with the problem of binary classification using a (single-layer) perceptron (SLP), which can use a threshold activation function to accurately predict membership in one of two classes so long as those classes are linearly separable. In the lecture, we introduce the geometric interpretation of the classification process as thresholding the level of agreement between the neural-network weight vector and the feature vector. From there, we consider radial basis function neural networks (RBFNN's), which transform the feature space to allow for more sophisticated inferences (e.g., classification for problems that are not linearly separable, function approximation, time-series prediction, etc.). The RBFNN is our first example of a single-hidden-layer neural network, which is our entry point to the multi-layer perceptrons (MLP's) that we will discuss next time.
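
A minimal sketch of these two ideas, assuming a Gaussian radial basis function and hand-picked weights and centers purely for illustration (none of these numbers come from the lecture):

```python
import numpy as np

def slp_predict(w, b, X):
    """Single-layer perceptron: threshold the agreement (dot product)
    between the weight vector w and each feature vector in X."""
    return (X @ w + b >= 0).astype(int)

def rbf_features(X, centers, gamma=1.0):
    """Gaussian radial-basis transform of the feature space: one hidden
    unit per center, responding to distance rather than direction."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Linearly separable toy problem: class 1 when x1 + x2 > 1 (hypothetical)
X = np.array([[0.2, 0.1], [0.9, 0.8], [0.1, 0.7], [0.9, 0.3]])
w, b = np.array([1.0, 1.0]), -1.0
print(slp_predict(w, b, X))          # [0 1 0 1]

# XOR is NOT linearly separable, but becomes so after an RBF transform
X_xor = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Phi = rbf_features(X_xor, centers=np.array([[0., 0.], [1., 1.]]), gamma=2.0)
w_phi, b_phi = np.array([-1.0, -1.0]), 0.8   # hand-picked separator in the new space
print(slp_predict(w_phi, b_phi, Phi))        # [0 1 1 0]
```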

Whiteboard lecture notes can be found at: https://www.dropbox.com/s/dkbm3uw290gol4o/IEE598-Lecture7B-2022-03-29-Introduction_to_Neural_Networks-RBF_MLP_Backpropagation.pdf?dl=0



Thursday, March 24, 2022

Lecture 7A (2022-03-24): Introduction to Neural Networks

In this lecture, we introduce artificial neural networks (ANN's) and the neurobiological foundations that inspired them. We start with a description of the multipolar neuron, with many dendrites and a single axon, and focus on chemical synapses between axons and dendrites. We cover resting potential, synaptic potentials, and (traveling/propagating) action potentials. We then transition to the simple artificial neuron model (the basis of modern ANN's) as a function of a weighted sum of inputs and a bias term. The artificial neuron is portrayed as an alternative representation of generalized linear modeling (GLM) from statistics, with the activation function playing a similar role to the link function in GLM. We then discuss several common activation functions -- Heaviside (threshold), linear, rectified linear (ReLU), logistic (sigmoid), and hyperbolic tangent (tanh). We will pick up next time by seeing how a single artificial neuron can be used for binary classification in linearly separable feature spaces, with a geometric interpretation of the weight vector as defining a line that separates two classes of feature vectors.
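
For concreteness, here is a small sketch of the artificial neuron as an activation applied to a weighted sum plus bias, together with the activation functions listed above; the example weights and inputs are hypothetical:

```python
import numpy as np

# Common activation functions from the lecture
def heaviside(z): return np.where(z >= 0, 1.0, 0.0)   # threshold
def linear(z): return z
def relu(z): return np.maximum(0.0, z)                 # rectified linear
def logistic(z): return 1.0 / (1.0 + np.exp(-z))       # sigmoid
def tanh_act(z): return np.tanh(z)                     # hyperbolic tangent

def artificial_neuron(x, w, b, activation=logistic):
    """A single artificial neuron: an activation of a weighted sum of inputs
    plus a bias, analogous to a GLM with the activation acting like a
    (inverse) link function."""
    return activation(np.dot(w, x) + b)

# Hypothetical inputs and weights just to show the call
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2
for f in (heaviside, linear, relu, logistic, tanh_act):
    print(f.__name__, artificial_neuron(x, w, b, activation=f))
```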

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/vlrmqwatkerl7hf/IEE598-Lecture7A-2022-03-24-Introductoin_to_Neural_Networks.pdf?dl=0



Tuesday, March 22, 2022

Lecture 6D (2022-03-22): Distributed AI & Swarm Intelligence, Part 4 - Particle Swarm Optimization (PSO)

In this lecture, we cover the canonical Particle Swarm Optimization (PSO) algorithm. We start with its functional motivations from the metaheuristic optimization of artificial neural networks and its mechanistic motivations from flocking models in computer graphics ("boids") and related collective-motion models from physics (the Vicsek model). We then describe the motion rules for PSO and briefly discuss more modern variants of PSO that have adaptive parameters (e.g., adaptive inertia) and use network effects to balance exploration and exploitation (e.g., "LBEST" versions of PSO).
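
A compact sketch of the canonical (global-best) PSO motion rules described above; the inertia and acceleration coefficients (w = 0.72, c1 = c2 = 1.49), swarm size, and sphere test function are common illustrative defaults rather than values prescribed in lecture:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.72, c1=1.49, c2=1.49,
        lo=-5.0, hi=5.0, seed=0):
    """Canonical (GBEST) PSO sketch: inertia w plus cognitive (c1) and
    social (c2) pulls toward the personal-best and global-best positions."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, dim))      # positions
    v = np.zeros_like(x)                                   # velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()                   # global best

    for _ in range(iters):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # Motion rule: inertia + attraction to personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, f(g)

# Hypothetical sphere test function just to exercise the loop
print(pso(lambda z: float(np.sum(z ** 2)), dim=3))
```

Replacing the single global best g with each particle's best neighbor over a small communication network would give the "LBEST" style of variant mentioned above.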

Whiteboard lecture notes for this lecture can be found at: https://www.dropbox.com/s/m6b69fy8b695qgm/IEE598-Lecture6D-2022-03-22-Distributed_AI_and_Swarm_Intelligence-Part_4-Particle_Swarm_Optimization_PSO_and_friends.pdf?dl=0



Thursday, March 17, 2022

Lecture 6C (2022-03-17): Distributed AI and Swarm Intelligence, Part 3 - BFO and Intro to Classical Particle Swarm Optimization (PSO) & Its Motivations

This lecture outlines the structure of the Bacterial Foraging Optimization (BFO) metaheuristic for engineering design optimization (EDO) problems. BFO shares some features with other Swarm Intelligence/Distributed AI algorithms, such as Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), while also borrowing ideas from population-based evolutionary metaheuristics (such as the Genetic Algorithm (GA)). BFO is based on the "run" and "tumble" chemotactic movement of E. coli bacteria, which sense local nutrient concentration gradients as well as chemical communication from other E. coli. After we cover BFO, we pivot to introducing Particle Swarm Optimization (PSO), a more popular swarm-based optimization metaheuristic that also moves candidate solutions through a virtual space (as BFO does) but adds an inertial component that allows each particle to build up momentum and thus resist sudden changes in direction. We will discuss PSO in more detail in the next lecture.
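
As a very rough sketch of the run-and-tumble idea only (the reproduction, elimination-dispersal, and cell-to-cell chemical signaling steps of full BFO are deliberately omitted), assuming a simple continuous test function and hypothetical step sizes:

```python
import numpy as np

def bfo_chemotaxis(f, dim, n_bacteria=20, n_chem=50, n_swim=4,
                   step=0.1, lo=-5.0, hi=5.0, seed=0):
    """Highly simplified run-and-tumble chemotaxis loop in the spirit of BFO."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_bacteria, dim))
    cost = np.apply_along_axis(f, 1, X)

    for _ in range(n_chem):
        for i in range(n_bacteria):
            # Tumble: pick a new random unit direction
            d = rng.normal(size=dim)
            d /= np.linalg.norm(d)
            # Run: keep swimming in that direction while the cost improves
            for _ in range(n_swim):
                trial = X[i] + step * d
                c = f(trial)
                if c < cost[i]:
                    X[i], cost[i] = trial, c
                else:
                    break
    best = cost.argmin()
    return X[best], cost[best]

# Hypothetical smooth test function (sphere) just to exercise the loop
print(bfo_chemotaxis(lambda z: float(np.sum(z ** 2)), dim=2))
```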

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/kh5ixieys70v2ib/IEE598-Lecture6C-2022-03-17-Distributed_AI_and_Swarm_Intelligence-Part_3-BFO_and_Intro_to_Classical_PSO.pdf?dl=0



Tuesday, March 15, 2022

Lecture 6B (2022-03-15): Distributed AI and Swarm Intelligence, Part 2 - ACO and Introduction to Bacterial Foraging Optimization (BFO)

After a few brief comments about student mini-projects related to multi-objective evolutionary algorithms, this lecture covers Ant Colony Optimization (ACO), with a particular focus on the Ant System (AS) prototype that came before it. This optimization metaheuristic was originally built to solve combinatorial (discrete) optimization problems by mimicking how some ants lay pheromone foraging trails, reinforcing good subgraphs within a network of candidate solutions to a design problem. The class closes with an introduction to Bacterial Foraging Optimization (BFO), which is inspired by "running and tumbling" flagellated bacteria that can climb local concentration gradients and escape local traps by using social information. During the class, we also introduce the concept of "stigmergy" (i.e., coordination through modification of the surrounding environment).
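
A minimal Ant System sketch on a small (hypothetical) traveling-salesperson instance, showing pheromone-biased tour construction followed by evaporation and reinforcement; the parameter values (alpha, beta, rho, Q) are common illustrative choices, not values from the lecture:

```python
import numpy as np

def ant_system_tsp(dist, n_ants=10, iters=100, alpha=1.0, beta=2.0,
                   rho=0.5, Q=1.0, seed=0):
    """Minimal Ant System sketch for a symmetric TSP: ants build tours with
    pheromone^alpha * (1/distance)^beta choice probabilities, then trails
    evaporate (rho) and are reinforced in proportion to tour quality."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                        # pheromone on each edge
    eta = 1.0 / (dist + np.eye(n))               # heuristic desirability
    best_tour, best_len = None, np.inf

    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.integers(n)]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, dtype=bool); mask[tour] = False
                p = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                tour.append(rng.choice(n, p=p / p.sum()))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        tau *= (1.0 - rho)                       # evaporation
        for tour, length in tours:               # reinforcement (stigmergy)
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a, b] += Q / length
                tau[b, a] += Q / length
    return best_tour, best_len

# Hypothetical 5-city distance matrix just to exercise the code
dist = np.array([[0, 2, 9, 10, 7],
                 [2, 0, 6, 4, 3],
                 [9, 6, 0, 8, 5],
                 [10, 4, 8, 0, 6],
                 [7, 3, 5, 6, 0]], dtype=float)
print(ant_system_tsp(dist))
```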

Whiteboard lecture notes available at: https://www.dropbox.com/s/shpheh73u1tngyw/IEE598-Lecture6B-2022-03-15-Distributed_AI_and_Swarm_Intelligence-Part_2-ACO_and_Bacterial_Foraging_Optimization.pdf?dl=0



Thursday, March 3, 2022

Lecture 5D/6A (2022-03-03): Simulated Annealing Wrap-up and Distributed AI and Swarm Intelligence, Part 1 - Ant Colony Optimization (ACO)

In this lecture, we wrap up our introduction to Simulated Annealing and then move on to an introduction to Distributed Artificial Intelligence and Swarm/Collective Intelligence. We start with some clarifications about entropy, in particular why the logarithm appears in its definition. We then revisit basic Monte Carlo integration, based on the Law of Large Numbers, and how it motivates the need for a Boltzmann-distribution sampler. We then outline the Metropolis–Hastings algorithm (one of the foundational Markov Chain Monte Carlo (MCMC) methods). This allows us to finally describe the Simulated Annealing (SA) algorithm, which combines the Metropolis algorithm with a temperature annealing schedule. After wrapping up the discussion of SA, we move on to introducing Distributed AI and Swarm Intelligence. This discussion starts with a description of the "Ant System (AS)", the prototype of what eventually became Ant Colony Optimization (ACO). We will pick up with more details of AS/ACO for combinatorial optimization in the next lecture.
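
A short sketch of the SA loop described above, combining a Metropolis acceptance test with a geometric cooling schedule; the initial temperature, cooling rate, step size, and multimodal test function are hypothetical tuning choices for illustration:

```python
import numpy as np

def simulated_annealing(f, x0, T0=1.0, cooling=0.995, iters=5000,
                        step=0.5, seed=0):
    """Simulated annealing sketch: a Metropolis acceptance test at the
    current temperature, with a geometric cooling (annealing) schedule."""
    rng = np.random.default_rng(seed)
    x, fx, T = np.asarray(x0, dtype=float), f(x0), T0
    best, best_val = x.copy(), fx
    for _ in range(iters):
        cand = x + rng.normal(scale=step, size=x.shape)   # random neighbor
        fc = f(cand)
        # Metropolis rule: always accept improvements; accept worse moves
        # with probability exp(-(increase in cost) / T)
        if fc <= fx or rng.random() < np.exp(-(fc - fx) / T):
            x, fx = cand, fc
            if fx < best_val:
                best, best_val = x.copy(), fx
        T *= cooling                                       # anneal the temperature
    return best, best_val

# Hypothetical multimodal test function
f = lambda z: float(np.sum(z ** 2) + 3.0 * np.sum(1 - np.cos(2 * np.pi * z)))
print(simulated_annealing(f, x0=np.array([3.0, -4.0])))
```

At high temperature the rule accepts many worsening moves (exploration); as the temperature anneals toward zero, it approaches greedy descent (exploitation).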

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/ngsm9w4am1i6224/IEE598-Lecture6A-2022-03-03-Distributed_AI_and_Swarm_Intelligence-Part_1-Ant_Colony_Optimization.pdf?dl=0



Tuesday, March 1, 2022

Lecture 5C (2022-03-01): From MCMC Sampling to Optimization by Simulated Annealing

In this lecture, we continue our march toward the basic algorithm for simulated annealing (a popular optimization metaheuristic). We start with the Metropolis algorithm, which was one of the first Markov Chain Monte Carlo approaches for numerical integration. We then generalize the Metropolis algorithm to the Metropolis–Hastings algorithm, which replaces the Boltzmann distribution with any desired target probability distribution to sample from. That gives us an opportunity to talk about Markov Chain Monte Carlo (MCMC) in general and discuss its pros and cons. We then start to introduce the simulated annealing algorithm, which makes use of the Metropolis algorithm. We will finish off simulated annealing (SA) in the next lecture.
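
As a concrete (hypothetical) illustration of the sampling idea, here is a random-walk Metropolis–Hastings sketch for a one-dimensional, unnormalized target; with a symmetric Gaussian proposal, the Hastings correction drops out and the acceptance rule reduces to the Metropolis ratio:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=10000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sketch: with a symmetric Gaussian
    proposal, a candidate is accepted with probability
    min(1, target(cand) / target(x))."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        cand = x + rng.normal(scale=step)               # symmetric proposal
        # Accept/reject using log densities for numerical stability
        if np.log(rng.random()) < log_target(cand) - log_target(x):
            x = cand
        samples[i] = x
    return samples

# Hypothetical unnormalized target: a two-component Gaussian mixture
log_target = lambda z: np.log(np.exp(-0.5 * (z - 2.0) ** 2)
                              + 0.7 * np.exp(-0.5 * (z + 2.0) ** 2))
draws = metropolis_hastings(log_target, x0=0.0)
print(draws.mean(), draws.std())
```

Because only ratios of the target density appear, the sampler never needs the normalizing constant, which is exactly what makes this approach attractive for Boltzmann-type distributions.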

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/s/84upgxenjkmxgld/IEE598-Lecture5C-2022-03-01-From_MCMC_Sampling_to_Opt_by_Simulated_Annealing.pdf?dl=0


