Thursday, April 17, 2025

Lecture 7D (2025-04-17): Reinforcement Learning – Active Learning in Rewarding Environments

In this lecture, we introduce reinforcement learning (RL), with motivations from animal behavior and connections to optimization metaheuristics such as Ant Colony Optimization (ACO) and Simulated Annealing (SA). We start by returning to a simple model of pheromone-trail-based foraging by ants (reminiscent of ACO) and formalize the components of the ant's actions in terms of quality tables over (state, action) pairs, as would be used in RL. We then introduce the quality function Q(s,a) and Q-learning, including two different methods of exploration (epsilon-greedy and softmax), with connections to how different species of ants respond to pheromones. We discuss Deep Q Networks (DQNs) as a connection to neural networks, and then motivate an interpretation of the discount factor using Charnov's Marginal Value Theorem (MVT) from optimal foraging theory (OFT). We close with a discussion of the Matching Law from psychology and how a group of RL agents will converge to a social version of the Matching Law, the Ideal Free Distribution (IFD). Next time, we will cover unsupervised and self-supervised learning, which are approaches where learning happens even without reward.
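For readers who want a concrete starting point, the tabular Q-learning update and the two exploration rules mentioned above (epsilon-greedy and softmax/Boltzmann) can be sketched as below. This is a minimal illustration, not code from the lecture; the state/action sizes and the learning-rate, discount, epsilon, and temperature values are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # quality table Q(s, a)
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def epsilon_greedy(s, eps=0.1):
    # With probability eps, explore uniformly at random;
    # otherwise exploit the current best action argmax_a Q(s, a).
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def softmax_action(s, tau=1.0):
    # Boltzmann exploration: actions with higher Q are sampled more
    # often; the temperature tau controls how greedy the choice is.
    prefs = Q[s] / tau
    p = np.exp(prefs - prefs.max())   # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(n_actions, p=p))

def q_update(s, a, r, s_next):
    # One-step Q-learning (off-policy temporal-difference) update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

In an actual environment loop, the agent would pick actions with either exploration rule, observe the reward and next state, and call `q_update` after each step.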

Whiteboard notes for this lecture can be found at:
https://www.dropbox.com/scl/fi/gyux79ukkcs0n7buizfr1/IEE598-Lecture7D-2025-04-17-Reinforcement_Learning-Active_Learning_in_Rewarding_Environments-Notes.pdf?rlkey=ix5qf4a5yz97ppsx97h6sphao&dl=0


