Thursday, April 16, 2026

Lecture 7C (2026-04-16): Recurrent Networks and Temporal Supervision

In this lecture, we finish up our coverage of supervised learning of feedforward multi-layer perceptrons with a discussion of how the Convolutional Neural Network imposes an inductive bias that simplifies training and pays off for images but may not work so well for text strings. We then shift our focus to recurrent networks with temporal supervision, which may help to provide a solution when highly local inductive biases aren't effective (as with text and time-series analysis). We discuss several coincidence detectors from neuroscience in the context of hearing and vision, and we use them to motivate Time Delay Neural Networks (TDNNs) as our bridge to Recurrent Neural Networks (RNNs). This allows for analogies to be made to Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters. We close by transitioning from a basic output-feedback configuration to a generic RNN with hidden states but effectively no "layers." We will pick up next time with backpropagation-through-time (BPTT), Long Short-Term Memory (LSTM), reservoir computing (Echo State Networks, ESNs), and an introduction to reinforcement learning. Interactive demonstration widgets related to this lecture can be found at:

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/scl/fi/pi8vxjrn6gbftpdab977w/IEE598-Lecture7C-2026-04-16-Recurrent_Networks_and_Temporal_Supervision-Notes.pdf?rlkey=1xltyg1ttcpyqhvtdquczxjr5&dl=0
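
As a rough illustration of the FIR/IIR analogy above, the sketch below compares a purely feedforward tapped delay line (a linear stand-in for a single TDNN unit) with a one-pole output-feedback loop (a linear stand-in for the simplest recurrent unit). The function names, tap weights, and feedback gain are assumptions chosen for illustration; real TDNNs and RNNs add nonlinearities and learned weights.

```python
def fir_filter(x, taps):
    """Feedforward tapped delay line: y[t] = sum_k taps[k] * x[t - k]."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(taps):
            if t - k >= 0:
                acc += w * x[t - k]
        y.append(acc)
    return y


def iir_filter(x, a, b):
    """Output feedback: y[t] = a * y[t - 1] + b * x[t]."""
    y = []
    prev = 0.0
    for xt in x:
        prev = a * prev + b * xt
        y.append(prev)
    return y


# A unit impulse followed by silence (hypothetical test input).
impulse = [1.0] + [0.0] * 7

# The FIR response dies out once the impulse leaves the 3-tap window.
fir_response = fir_filter(impulse, taps=[0.5, 0.3, 0.2])

# The IIR response decays geometrically but never reaches exactly zero here.
iir_response = iir_filter(impulse, a=0.5, b=1.0)

print(fir_response)  # [0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(iir_response[:4])  # [1.0, 0.5, 0.25, 0.125]
```

The finite window is what makes the delay-line memory "finite impulse response"; the feedback path is what lets even a single recurrent unit carry information about inputs arbitrarily far in the past.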



Tuesday, April 14, 2026

Lecture 7B (2026-04-14): Feeding Forward from Neurons to Networks (SLP, RBFNN, MLP, and CNN)

In this lecture, we move from the basics of learning foundations from the last lecture into models of neurons that can be combined to form machine learning tools. We start with the single-layer perceptron (SLP), explain where the term "weights" comes from, and describe how it can linearly separate a space. We then introduce a hidden layer of receptive field units (RFUs) and discuss how Radial Basis Function Neural Networks (RBFNNs) use Gaussian or Logistic RBFs as nonlinear projections into a high-dimensional space that Cover's theorem suggests should be more likely to be linearly separable. After demonstrating how RBFNNs work, we then introduce Cybenko's Universal Approximation Theorem (UAT) and use it to motivate looking for other (and deeper) latent structures. That leads us to the Multi-Layer Perceptron (MLP), backpropagation, and the Convolutional Neural Network.

Interactive widgets referenced in this lecture include:

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/scl/fi/t2aoepucn0swlkvisococ/IEE598-Lecture7B-2026-04-14-Feeding_Forward_from_Neurons_to_Networks-Notes.pdf?rlkey=s5pr1zdrnup2ca1nthf7zxp3n&dl=0
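
To make the RBFNN idea above concrete, here is a minimal sketch of the classic XOR case: the four XOR points are not linearly separable in the input space, but they become separable after a Gaussian RBF projection. The two centers, the unit width, and the output weights are hand-picked assumptions for illustration rather than trained values.

```python
import math


def rbf_features(x, centers, width=1.0):
    """Gaussian receptive-field responses exp(-||x - c||^2 / width^2)."""
    feats = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-sq_dist / width ** 2))
    return feats


def perceptron(features, weights, bias):
    """Single-layer perceptron with a hard threshold at zero."""
    drive = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if drive > 0 else 0


# XOR inputs and labels.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Hand-picked (assumed) RBF centers at the two class-0 corners.
centers = [(0, 0), (1, 1)]

# Hand-picked (assumed) output layer: fire when both RFUs respond weakly.
weights, bias = [-1.0, -1.0], 0.9

predictions = [perceptron(rbf_features(x, centers), weights, bias) for x in inputs]
print(predictions)  # [0, 1, 1, 0]
```

In the projected space, the two class-0 points have a feature sum of about 1.14 and the two class-1 points a sum of about 0.74, so a single line separates them.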



Lecture 7A (2026-04-09): Neural Foundations of Learning

In this lecture, we prepare to discuss artificial and spiking neural networks -- bio-inspired information-processing mechanisms modeled on the central nervous system and on models of learning in psychology. We open with a discussion of the relationship between learning, memory, and neuroplasticity and then introduce a canonical model of a neuron that is the basis of the mechanisms thought to underlie neuroplasticity. We discuss the different ways in which neuroplasticity supports working, short-term, and long-term memory. We introduce Hebbian learning (and briefly mention spike-timing-dependent plasticity, STDP) as a foundational learning paradigm that, when combined with neuromodulation and specialized circuits, can implement all forms of learning described in the lecture. Those forms of learning include non-associative learning (habituation and sensitization), associative learning (classical and operant conditioning), and latent learning. We map each of those to machine learning paradigms including unsupervised learning, self-supervised learning/pre-training, reinforcement learning, and supervised learning. In the next lecture, we will directly model the canonical neuron with a single-layer perceptron and start to build statistical models based on this artificial neuron model. Interactive demonstrations mentioned in this video:

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/scl/fi/x4t0y6q9rblrn78o8ns2r/IEE598-Lecture7A-2026-04-09-Neural_Foundations_of_Learning-Notes.pdf?rlkey=im6unlrptbfppqeds2y9gpga7&dl=0
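
As a toy illustration of how Hebbian learning can support classical conditioning, the sketch below pairs an unconditioned stimulus (US) that already drives a neuron with a conditioned stimulus (CS) that initially does not. The threshold, learning rate, initial weights, and function names are all assumptions made for this example; plain Hebbian updates like this only ever grow the weights, so a realistic model would add normalization, decay, or neuromodulation.

```python
def response(weights, pre, threshold=0.5):
    """Threshold neuron: fires (1.0) when the weighted input reaches threshold."""
    drive = sum(w * x for w, x in zip(weights, pre))
    return 1.0 if drive >= threshold else 0.0


def hebbian_step(weights, pre, post, rate):
    """Hebb's rule: each weight grows by rate * presynaptic * postsynaptic activity."""
    return [w + rate * x * post for w, x in zip(weights, pre)]


# Input order is [US, CS]; the US synapse starts strong, the CS synapse at zero.
weights = [1.0, 0.0]

# Before training, the CS alone does not trigger a response.
before = response(weights, [0.0, 1.0])

# Pair the CS with the US; the US makes the neuron fire, so the CS synapse strengthens.
for _ in range(10):
    pre = [1.0, 1.0]
    post = response(weights, pre)
    weights = hebbian_step(weights, pre, post, rate=0.1)

# After pairing, the CS alone is enough to trigger the response.
after = response(weights, [0.0, 1.0])

print(before, after)  # 0.0 1.0
```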



Sunday, April 5, 2026

Lecture 6B (2026-04-07): Bacterial Foraging Optimization and Ant Colony Optimization

Closing out the Swarm Intelligence unit, this lecture pivots from Particle Swarm Optimization (PSO) to two examples of stigmergic swarm optimization – Bacterial Foraging Optimization (BFO) and Ant Colony Optimization (ACO). Stigmergy is indirect coordination through modifications of the environment, as in leaving chemical trails or depositing chemical gradients, as opposed to direct communication between one individual and another. BFO solves continuous optimization problems similar to PSO but uses attractants and repellents to modify the environment as opposed to directly informing others about discovered solutions. The repellents in BFO along with its reproduction and elimination–dispersal phases help to ensure it searches globally over a space as opposed to the more concentrated search of PSO. ACO also uses chemical coordination, but it is developed for combinatorial optimization problems. Although ACO was originally developed for the Traveling Salesman Problem (TSP), we discuss ACO first in a simpler layered model that better matches the foraging paths of real ants before briefly discussing the application to the TSP. We close with a brief mention of more complex recruitment dynamics in real ants, where trail laying plus noise can provide the ability to track changing feeder distributions and how one-on-one recruitment by some ants and bees can lead to different distributions of recruits across options (similar to changing the temperature in a softmax).

Interactive demonstrations referenced in this lecture can be found at:

Whiteboard notes for this lecture can be found at:
https://www.dropbox.com/scl/fi/fqm4jcfr1mkxsnz8ng61r/IEE598-Lecture6B-2026-04-07-Bacterial_Foraging_Optimization_and_Ant_Colony_Optimization-Notes.pdf?rlkey=q4omc6oyot9vrq8nnq3etx6k4&dl=0
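
The stigmergic feedback behind ACO can be sketched with a two-path toy model: ants choose a path in proportion to its pheromone, shorter paths receive more pheromone per trip, and pheromone evaporates each round. This is a deterministic mean-field simplification (it tracks the expected fraction of ants rather than simulating individuals), and the path lengths, evaporation rate, colony size, and function name are assumptions chosen for illustration, not the layered model from the lecture.

```python
def two_path_pheromone(lengths=(1.0, 2.0), rounds=100, evaporation=0.1, n_ants=100):
    """Return the final probability of choosing each path."""
    tau = [1.0 for _ in lengths]  # start with equal pheromone on every path
    for _ in range(rounds):
        total = sum(tau)
        # Ants pick a path with probability proportional to its pheromone.
        probs = [t / total for t in tau]
        # Expected deposit: more ants and shorter paths both add more pheromone.
        deposits = [n_ants * p / length for p, length in zip(probs, lengths)]
        # Evaporate old pheromone, then add the new deposits.
        tau = [(1.0 - evaporation) * t + d for t, d in zip(tau, deposits)]
    total = sum(tau)
    return [t / total for t in tau]


choice_probs = two_path_pheromone()
print(choice_probs)  # most of the probability mass ends up on the shorter path
```

No ant ever tells another which path is shorter; the preference emerges only from the shared pheromone field, which is the sense in which the coordination is indirect.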



Thursday, April 2, 2026

Lecture 5E/6A (2026-04-02): Parallel Tempering and Swarm Intelligence through Social Cohesion (Particle Swarm Optimization)

In this lecture, we finish our unit on physics-inspired ML and optimization by covering Parallel Tempering (PT), which combines multiple, parallel Metropolis–Hastings MCMC samplers each with different temperatures (rather than using an annealing schedule, as in Simulated Annealing (SA)). We then pivot toward motivating why certain problem sets, like optimizing high-dimensional weights of neural networks, may not be well served by the optimization metaheuristics discussed so far in the course. We use this as an opportunity to introduce Swarm Intelligence and the Particle Swarm Optimization (PSO) algorithm, which is particularly good at finding and exploring local optima in spaces with many similarly performing local optima. We explore how PSO was inspired by the Boids Model from Craig Reynolds (in computer graphics) and how it overlaps with the Vicsek model (from statistical physics). We also show that what PSO really depends on is social information but that, under the influence of social information, it tends to purge the diversity in its solution candidates very quickly. Online interactive demonstration modules associated with this lecture can be found at:

Whiteboard notes for this lecture can be found at: https://www.dropbox.com/scl/fi/7jwuytadieywwilqazjq5/IEE598-Lecture5E_6A-2026-04-02-Parallel_Tempering_and_Particle_Swarm_Optimization-Notes.pdf?rlkey=p1pr7cs241okovkgjnevvhdp5&dl=0
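
The PSO update and the loss of diversity mentioned above can be sketched as follows. This is a minimal global-best PSO on a simple quadratic ("sphere") objective; the objective, swarm size, coefficients, seed, and the `spread` diversity measure (mean distance to the swarm centroid) are all assumptions picked for illustration, not settings from the lecture.

```python
import random


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


def spread(positions):
    """Diversity proxy: mean Euclidean distance of particles to their centroid."""
    dim = len(positions[0])
    centroid = [sum(p[d] for p in positions) / len(positions) for d in range(dim)]
    dists = [sum((p[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5 for p in positions]
    return sum(dists) / len(dists)


def pso(objective, dim=2, n_particles=20, iterations=200,
        inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_spread = spread(pos)
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, history, initial_spread, spread(pos)


best, history, spread_start, spread_end = pso(sphere)
print(history[0], history[-1])  # best objective value before and after the run
print(spread_start, spread_end)  # swarm diversity before and after the run
```

The social term pulls every particle toward the same point, which is the mechanism behind the rapid collapse in `spread` over the run.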



Tuesday, March 31, 2026

Lecture 5D (2026-03-31): Metropolis–Hastings Markov Chain Monte Carlo and Simulated Annealing/Parallel Tempering

In this lecture, we start with a reminder that the Boltzmann–Gibbs distribution is the maximum entropy (MaxEnt) distribution of physical microstates when the average energy is fixed at a temperature at thermal equilibrium. We then move toward motivations where it would be useful to sample microstates from such a distribution. First, we introduce Monte Carlo methods for parameter estimation, and we pivot toward applications of Monte Carlo sampling for numerical integration. This leads us back to physics applications where integration using the Boltzmann–Gibbs distribution is much more practical. This gives the opportunity to introduce Metropolis–Hastings Markov Chain Monte Carlo (MCMC) sampling, which allows for sampling from the Boltzmann–Gibbs distribution and more. After discussing connections to importance sampling (from stochastic simulation) and Bayesian/MCMC statistics, we introduce Simulated Annealing, which combines Metropolis–Hastings sampling with an annealing schedule for temperature. We close with a very brief introduction to Parallel Tempering, which swaps out the annealing schedule for parallel MCMC samplers that periodically swap states based on their relative energies. We will pick up with Parallel Tempering in the next lecture.

On-line simulations referenced in this lecture can be found at:

Whiteboard notes for this lecture can be found at:
https://www.dropbox.com/scl/fi/s5dcgqrvm4qzz4y0fs64a/IEE598-Lecture5D-2026-03-31-Markov_Chain_Monte_Carlo_Metropolis_and_Simulated_Annealing_Parallel_Tempering-Notes.pdf?rlkey=v2m33lhh7sjhwogffotbyq3k7&dl=0
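
A minimal sketch of Metropolis sampling from a Boltzmann–Gibbs distribution is shown below, assuming a hypothetical four-state system arranged on a ring with a symmetric nearest-neighbor proposal (so the Hastings correction is 1). The energies, temperature, step count, and seed are illustrative assumptions. The empirical visit frequencies are compared against the exact Boltzmann probabilities, which are easy to compute here only because the state space is tiny.

```python
import math
import random


def boltzmann(energies, temperature):
    """Exact Boltzmann-Gibbs probabilities: p_i proportional to exp(-E_i / T)."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def metropolis_sample(energies, temperature, n_steps, seed=0):
    """Estimate state probabilities from visit frequencies of a Metropolis chain."""
    rng = random.Random(seed)
    n = len(energies)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        # Symmetric proposal: step to the left or right neighbor on the ring.
        proposal = (state + rng.choice((-1, 1))) % n
        delta = energies[proposal] - energies[state]
        # Always accept downhill moves; accept uphill moves with prob exp(-dE/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]


energies = [0.0, 1.0, 2.0, 0.5]  # hypothetical energy levels
exact = boltzmann(energies, temperature=1.0)
estimate = metropolis_sample(energies, temperature=1.0, n_steps=100_000)
max_error = max(abs(a - b) for a, b in zip(exact, estimate))
print(exact)
print(estimate)
```

Note that the sampler only ever uses energy differences, never the partition function, which is what makes the approach practical when the state space is too large to enumerate. Simulated Annealing reuses the same acceptance rule while gradually lowering `temperature`.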



Thursday, March 26, 2026

Lecture 5C (2026-03-26): Boltzmann–Gibbs and other Maximum Entropy Distributions

In this lecture, we start by reviewing the formal definition of Shannon entropy/information in both its discrete and continuous (differential entropy) forms. We then transition to discussing several different MaxEnt distributions and the constraints that they are associated with. Ultimately, this brings us to the Boltzmann–Gibbs distribution and several applications of it. Throughout the lecture, different interactive demonstrations were used (and can be accessed directly at the links below).

Demonstrations referenced in this lecture can be found at:

Softmax Visualizer: https://tpavlic.github.io/asu-bioinspired-ai-and-optimization/softmax/softmax_temperature_explorer.html

MaxEnt Explorer (SDM and NLP): https://tpavlic.github.io/asu-bioinspired-ai-and-optimization/maxent/maxent_demo.html

Boltzmann Distribution via Random Exchanges of Conserved Quantity: https://tpavlic.github.io/asu-bioinspired-ai-and-optimization/boltzmann_maxent/boltzmann_maxent_random_exchange.html

Beta Distribution Explorer: https://tpavlic.github.io/asu-bioinspired-ai-and-optimization/boltzmann_maxent/beta_spacings.html

Whiteboard notes for this lecture can be found at:
https://www.dropbox.com/scl/fi/zwdrab929yg47jm67vope/IEE598-Lecture5C-2026-03-26-Boltzmann-Gibbs_and_other_MaxEnt_Distributions-Notes.pdf?rlkey=3zka62o08gnw8z38r7lknjsqf&dl=0
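
As a small companion to the softmax and entropy discussion above, the sketch below computes a temperature-scaled softmax (which has the Boltzmann–Gibbs form when the scores are negative energies) and its discrete Shannon entropy. The scores and temperatures are arbitrary assumptions chosen to show the two extremes.

```python
import math


def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax: p_i proportional to exp(score_i / T)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Discrete Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


scores = [2.0, 1.0, 0.0]  # hypothetical scores (negative energies)

cold = softmax(scores, temperature=0.1)   # nearly all mass on the best option
hot = softmax(scores, temperature=10.0)   # close to uniform

print(cold, entropy(cold))
print(hot, entropy(hot))
print(math.log(len(scores)))  # upper bound on entropy (uniform distribution)
```

Raising the temperature moves the distribution toward the uniform, maximum-entropy case; lowering it concentrates the mass on the highest-scoring option.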


