In this lecture, we continue our discussion of nature-inspired algorithms, turning to those drawn from the non-biological realm of physics. Maximum entropy (MaxEnt) is presented through several concrete examples as a modeling framework that yields the probability distribution satisfying the constraints imposed by what is formally known, while remaining otherwise maximally uncommitted in its remaining degrees of freedom (thereby introducing no additional biases). We apply it to a simple Natural Language Processing (NLP) example, and then pivot to MaxEnt in the context of physical systems and show how the Boltzmann–Gibbs distribution results. We then use the Boltzmann–Gibbs distribution as motivation for the Metropolis (and then Metropolis–Hastings) Markov Chain Monte Carlo (MCMC) algorithm, and preview how the MH algorithm will be modified into the Simulated Annealing (SA) algorithm of Kirkpatrick et al. (1983). We will cover SA in more detail in the next lecture.
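As a rough illustration of the connection between the Boltzmann–Gibbs distribution and Metropolis-style MCMC mentioned above, the sketch below samples states x of a hypothetical one-dimensional energy landscape E(x) with probability proportional to exp(-E(x)/T). The double-well energy function, step size, and temperature here are illustrative assumptions, not quantities from the lecture.

```python
import math
import random

def metropolis_sample(energy, x0, temperature=1.0, step=0.5, n_steps=10_000):
    """Metropolis MCMC: sample x with probability proportional to exp(-energy(x)/T)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + random.uniform(-step, step)   # propose a symmetric random move
        delta_e = energy(x_new) - energy(x)       # change in energy for the move
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta_e <= 0 or random.random() < math.exp(-delta_e / temperature):
            x = x_new
        samples.append(x)
    return samples

# Hypothetical double-well energy landscape, used only for illustration.
E = lambda x: (x**2 - 1.0)**2
chain = metropolis_sample(E, x0=0.0, temperature=0.3)
```

Holding the temperature fixed, as here, gives ordinary Metropolis sampling; gradually lowering it over the course of the run is the change that leads toward simulated annealing, which is deferred to the next lecture.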
Whiteboard notes for this lecture can be found at: