CoCalc: 05 Multivariate Gaussians (ipynb)

Leo Migdal

📚 The CoCalc Library - books, templates and other resources. This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub. The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

The k-means clustering model explored in the previous section is simple and relatively easy to understand, but its simplicity leads to practical challenges in its application.

In particular, the non-probabilistic nature of k-means and its use of simple distance-from-cluster-center to assign cluster membership lead to poor performance for many real-world situations. In this section we will take a look at Gaussian mixture models (GMMs), which can be viewed as an extension of the ideas behind k-means, but can also be a powerful tool for estimation beyond simple clustering.

Use of the term "non-parametric" in the context of Bayesian analysis is something of a misnomer. This is because the first and fundamental step in Bayesian modeling is to specify a full probability model for the problem at hand. It is rather difficult to explicitly state a full probability model without the use of probability functions, which are parametric. Bayesian non-parametric methods do not imply that there are no parameters, but rather that the number of parameters grows with the size of the dataset.

In fact, Bayesian non-parametric models are infinitely parametric. What if we chose to use Gaussian distributions to model our data? There would not seem to be an advantage to doing this, because normal distributions are not particularly flexible distributions in and of themselves. However, adopting a set of Gaussians (a multivariate normal vector) confers a number of advantages. First, the marginal distribution of any subset of elements from a multivariate normal distribution is also normal:

$$p(x, y) = \mathcal{N}\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{xy}^T & \Sigma_y \end{bmatrix}\right)$$

$$p(x) = \int p(x, y) \, dy = \mathcal{N}(\mu_x, \Sigma_x)$$

Also, conditional distributions of a subset of a multivariate normal distribution (conditional on the remaining elements) are normal too:

$$p(x \mid y) = \mathcal{N}\left(\mu_x + \Sigma_{xy} \Sigma_y^{-1} (y - \mu_y),\; \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T\right)$$
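As a quick numerical check of the conditional formula above, here is a minimal sketch in NumPy; the mean vector and covariance matrix are made-up illustrative numbers, not values from the text:

```python
import numpy as np

# A bivariate normal over (x, y): mean vector and covariance matrix.
# These particular numbers are illustrative only.
mu = np.array([0.0, 1.0])          # [mu_x, mu_y]
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])     # [[S_xx, S_xy], [S_xy, S_yy]]

# Condition on an observed y using the formula above:
#   mu_{x|y} = mu_x + S_xy S_yy^{-1} (y - mu_y)
#   S_{x|y}  = S_xx - S_xy S_yy^{-1} S_xy
y_obs = 2.0
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y_obs - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

print(mu_cond, var_cond)  # 0.4, 0.68 -- still a (univariate) normal
```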

A Gaussian process generalizes the multivariate normal to infinite dimension. It is defined as an infinite collection of random variables, any finite subset of which has a Gaussian distribution. Thus, the marginalization property is explicit in its definition. Another way of thinking about an infinite vector is as a function. When we write a function that takes continuous values as inputs, we are essentially specifying an infinite vector that only returns values (indexed by the inputs) when the function is called upon to do so. By the same token, this notion of an infinite-dimensional Gaussian as a function allows us to work with them computationally: we are never required to store all the elements of the Gaussian process, only to calculate them on demand.
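To make the "compute only what you need" idea concrete, here is a minimal sketch (the `rbf_kernel` helper and all parameter values are our own illustrative choices, not from the text) that evaluates a zero-mean Gaussian process at a finite grid of inputs and draws one sample from the prior:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two sets of scalar inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Instantiate the GP only at the finite set of points we care about.
x = np.linspace(0, 10, 100)
K = rbf_kernel(x, x)

# A draw from the GP prior is just a draw from a 100-dimensional
# multivariate normal with zero mean and covariance K.
# (The small jitter on the diagonal keeps K numerically positive definite.)
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros_like(x), K + 1e-8 * np.eye(len(x)))
```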

The last chapter ended by discussing some of the drawbacks of the Discrete Bayesian filter. For many tracking and filtering problems our desire is to have a filter that is unimodal and continuous. That is, we want to model our system using floating point math (continuous) and to have only one belief represented (unimodal). For example, we want to say an aircraft is at (12.34, -95.54, 2389.5), where that is latitude, longitude, and altitude. We do not want our filter to tell us "it might be at (1.65, -78.01, 2100.45) or it might be at (34.36, -98.23, 2543.79)." That doesn't match our physical intuition of how the world works. So we desire a unimodal, continuous way to represent probabilities that models how the real world works, and that is very computationally efficient to calculate.

As you might guess from the chapter name, Gaussian distributions provide all of these features. To understand Gaussians we first need to understand a few simple mathematical computations. We start with a random variable x. A random variable is a variable whose value depends on some random process. If you flip a coin, you could have a variable c, and assign it the value 1 for heads and 0 for tails. That is a random value.
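A trivial simulation of that coin-flip random variable, as a sketch (the use of NumPy's random generator here is our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.integers(0, 2)           # one draw of c: 1 for heads, 0 for tails
flips = rng.integers(0, 2, 10000)
print(c, flips.mean())           # the mean approaches 0.5 for a fair coin
```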

It can be the height of the students in a class. That may not seem random to you, but chances are you cannot predict the height of the student Reem Nassar, because her height is not deterministically determined. For a specific classroom the heights happen to take on some particular set of values. Another example of a random variable would be the result of rolling a die. A less obvious example would be the position of an aircraft: the aircraft does deterministically respond to the control inputs, but it is also buffeted by random winds and other effects we cannot model deterministically. The coin toss and die roll are examples of discrete random variables.

That is, the outcome of any given event comes from a discrete set of values. The roll of a six-sided die can never produce a value of 7 or 3.24, for example. In contrast, the student heights are continuous; they can take on any value within biological limits. For example, heights of 1.7, 1.71, 1.711, 1.7111, 1.71111, ... are all possible. Do not confuse the measurement of the random variable with the actual value.

If we can only measure the height of a person to 0.1 meters we would only record values from 0.1, 0.2, 0.3, ..., 2.7, yielding 27 discrete choices. Nonetheless a person's height can take on any arbitrary real value within that range, and so height is a continuous random variable.

The techniques in the last chapter are very powerful, but they only work with one variable or dimension. Gaussians represent a mean and variance that are scalars - real numbers. They provide no way to represent multidimensional data, such as the position of a dog in a field. You may retort that you could use two Kalman filters from the last chapter.

One would track the x coordinate and the other the y coordinate. That does work, but suppose we want to track position, velocity, acceleration, and attitude. These values are related to each other, and as we learned in the g-h chapter we should never throw away information. Through one key insight we will achieve markedly better filter performance than was possible with the equations from the last chapter. In this chapter I will introduce you to multivariate Gaussians - Gaussians for more than one variable, and the key insight I mention above. Then, in the next chapter we will use the math from this chapter to write a complete filter in just a few lines of code.

In the last two chapters we used Gaussians for a scalar (one dimensional) variable, expressed as $\mathcal{N}(\mu, \sigma^2)$. A more formal term for this is univariate normal, where univariate means 'one variable'. The probability distribution of the Gaussian is known as the univariate normal distribution. What might a multivariate normal distribution be? Multivariate means multiple variables. Our goal is to be able to represent a normal distribution across multiple dimensions.
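Before moving on to multiple dimensions, a minimal sketch of working with a univariate normal via SciPy (the numbers are illustrative; note that SciPy's `scale` parameter is $\sigma$, not $\sigma^2$):

```python
from scipy.stats import norm

mu, sigma = 10.0, 2.0                 # illustrative values
dist = norm(loc=mu, scale=sigma)      # N(mu, sigma^2)

print(dist.pdf(mu))                                  # density at the mean
print(dist.cdf(mu + sigma) - dist.cdf(mu - sigma))   # ~0.683 within one sigma
```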

I don't necessarily mean spatial dimensions - it could be position, velocity, and acceleration. Consider a two dimensional case. Let's say we believe that $x = 2$ and $y = 17$. This might be the x and y coordinates for the position of our dog, it might be the position and velocity of our dog on the x-axis, or the temperature and wind speed at some location. It doesn't really matter. We can see that for $n$ dimensions we need $n$ means, which we will arrange in a column matrix (vector) like so:

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$$

Therefore for this example we would have

$$\mu = \begin{bmatrix} 2 \\ 17 \end{bmatrix}$$
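A minimal sketch of representing this belief with SciPy's multivariate normal; the identity covariance below is a placeholder of our own (independent, unit-variance dimensions), since covariance matrices are introduced later in the chapter:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([2.0, 17.0])   # mean vector from the example: x = 2, y = 17
Sigma = np.eye(2)            # placeholder covariance (identity), our assumption

dist = multivariate_normal(mean=mu, cov=Sigma)
print(dist.pdf([2.0, 17.0]))        # density at the mean
print(dist.rvs(random_state=3))     # one random sample near (2, 17)
```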
