Probabilistic Programming Languages

1. Probabilistic Programming Languages#

Graphical models#

The original lecture covers Bayesian models as graphical models. The professor briefly explains why they’re a useful way to think about the relationships between random variables, and touches on the Markov property and the conditional independence of nodes. A related concept is plate notation.

He cites Machine Learning: A Probabilistic Perspective, which is the predecessor of Probabilistic Machine Learning: An Introduction, mentioned in the other recommended resources section.

Starting with this unit, we will be using Probabilistic Programming Languages (PPLs) to run our models. Check out Chapter 10 of Bayesian Modeling and Computation in Python for a look at what goes into creating a PPL and the components involved.

Missing example#

In the original unit 6 code archive, there’s a file called DeMere.odc that doesn’t seem to have anything to do with the rest of the unit. You can find a Python implementation here (right-click and Save Link As…).

BUGS#

The course lecture examples use OpenBUGS from this point forward. I’ve mirrored the last version of OpenBUGS on this site:

Windows, version 3.2.3
Linux, version 3.2.3

I don’t recommend attempting to get BUGS working on a Mac.

If you do want to use BUGS-style models, I recommend NIMBLE or MultiBUGS. These can both handle most lecture models without modification. However, 95% of students successfully use PyMC to complete our course at this point.

PyMC#

Some history#

In 2003, Chris Fonnesbeck started writing PyMC (Wiecki et al. [2023]) as a graduate student at the University of Georgia, partly out of frustration with WinBUGS. Eventually, PyMC3 became popular as a Python-based alternative to Stan. Both used implementations of the NUTS algorithm for sampling (Hoffman and Gelman [2011]). PyMC3 was built on Theano, a tensor library for machine learning-related math. In 2016, PyMC became a sponsored project of NumFOCUS, an organization started in part by the authors of NumPy, Matplotlib, IPython, and other projects to promote open scientific computing.

I started converting the course examples to PyMC3 in 2022. Later that year, PyMC version 4.0 was released. The backend switched to Aesara, a fork of Theano that added JAX and Numba compilation backends. Towards the end of 2022, the PyMC devs forked Aesara to create PyTensor over some governance conflicts, and the current major version, PyMC v5, came out.

There have been lots of changes and I’ve had to update the code examples a few times. In fact, there are still some that need to be updated that used to work in PyMC3 or v4. I’m going to copy the practice of using the watermark extension as in the PyMC Example Gallery to show the exact versions each example was last run on.

Installation#

This site is currently using PyMC version 5. The installation instructions below are based on the ones at the PyMC website. I’m going to go into more detail here, though.

Virtual environment#

I highly recommend using Miniforge and Mamba to manage the PyMC environment. Students seem to have issues with Anaconda every semester. If you have Anaconda, Conda, or Mamba already, please make sure you only use the conda-forge channel as a source.

Note

For macOS, make sure you have the Xcode command line tools installed by running xcode-select --install in your terminal.

If you don’t have Conda already, install Miniforge. The Windows instructions are here.
Confirm that you aren’t using the Anaconda default channels.
Install pymc into a new environment. If you’re using Miniforge, you can leave off the -c conda-forge part, because it uses that channel by default. If you aren’t using Miniforge, keep -c conda-forge and also replace mamba with conda.

mamba create -c conda-forge -n your_pymc_env_name "pymc>=5"
mamba activate your_pymc_env_name

Optional: Install these useful packages to your new environment.

For using Jupyter Lab or Jupyter notebooks:

mamba install -c conda-forge jupyterlab ipywidgets 

For using nutpie, a much faster implementation of the NUTS sampling algorithm:

mamba install -c conda-forge numba nutpie 

If importing PyMC on Windows produces a warning that g++ is not available, run:

mamba install -c conda-forge m2w64-toolchain

For new or specialized PyMC features, try the PyMC Extras package.

mamba install -c conda-forge pymc-extras
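
Once everything is installed, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch that uses only the Python standard library. The helper name installed_versions and the default package list are my own choices for illustration; any optional package you skipped above is reported as None rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pymc", "pytensor", "nutpie")):
    """Return {package name: version string, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not present in the active environment.
            report[name] = None
    return report


for name, ver in installed_versions().items():
    print(f"{name:10s} {ver or 'not installed'}")
```

Run this with the new environment activated. If pymc shows up as not installed, the interpreter you are running is probably not the one from that environment.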

Troubleshooting#

If your model is running very slowly or you’re having other issues getting PyMC to work, there could be many causes. First, make sure you have followed the installation instructions above. Second, make sure you are actually using the newly created environment!

PyMC works great with Jupyter Notebook or Lab, but sometimes people have installation issues depending on how they’ve installed everything and how they’re opening the notebooks.

If you’re using Jupyter, make sure it’s pointing to the correct kernel. You’ll want to use your new pymc environment as the kernel.

To double-check that you’re using the right environment, try the following from your terminal with that environment activated. On Windows, the equivalent of which is where (in PowerShell, use where.exe or Get-Command, because where is an alias for Where-Object).

mamba activate your_pymc_env_name
which python

The output should be something like /Users/aaron/mambaforge/envs/pymc/bin/python. You can see from the folder structure that I’m using an environment named “pymc.”

Now if you start Jupyter Lab or Notebook from this environment, your kernel will show as Python 3 (ipykernel). You can confirm that you’re using the right environment by running the same command in a code cell, prefixed with an exclamation point.

!which python

/Users/aaron/mambaforge/envs/pymc/bin/python

You can see that for me, it points to the Python installation in my pymc environment folder, which is what I want. If you’re using Jupyter through another editor like VS Code, you will need to select the correct environment as your kernel manually.
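
If you’d rather not depend on which or where, the same check can be done from Python itself, which works the same way on every operating system and in any notebook front end. This is a small sketch using the standard library. The helper env_name_from_prefix is hypothetical, and it assumes the usual Conda layout in which environments live inside a folder named envs.

```python
import sys
from pathlib import PurePath


def env_name_from_prefix(prefix):
    """Guess the Conda environment name from an interpreter prefix path.

    Assumes the usual layout .../envs/<name>. Returns None for any other
    layout, for example the base environment.
    """
    parts = PurePath(prefix).parts
    if len(parts) >= 2 and parts[-2] == "envs":
        return parts[-1]
    return None


# sys.executable is the interpreter running this code;
# sys.prefix is the root folder of its environment.
print("Interpreter:", sys.executable)
print("Environment:", env_name_from_prefix(sys.prefix))
```

With the folder structure from the example above, the second line should report pymc. If it reports a different name or None, the notebook is running on a different kernel than you intended.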

Using scripts instead of notebooks#

If you prefer to use .py scripts run directly from the command line, be aware that PyMC can sample chains in parallel using multiprocessing. For that to work correctly, the code that starts sampling must sit under the if __name__ == '__main__': Python idiom.
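
To illustrate the idiom, here is a minimal sketch that uses the standard library’s multiprocessing module as a stand-in for a sampler that runs chains in separate processes. The function simulate_chain and the numbers in it are invented for illustration and are not part of PyMC. In a real script, the model definition and the sampling call would go inside main().

```python
import multiprocessing as mp
import random


def simulate_chain(seed):
    """Stand-in for one chain: draw 1,000 standard normal values, return their mean."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(draws) / len(draws)


def main():
    seeds = [1, 2, 3, 4]
    # Two worker processes share the four "chains".
    with mp.Pool(processes=2) as pool:
        means = pool.map(simulate_chain, seeds)
    for seed, mean in zip(seeds, means):
        print(f"chain seed={seed}: mean={mean:.3f}")


if __name__ == "__main__":
    # Only the process that was launched directly runs main(). Worker
    # processes that re-import this file skip it.
    main()
```

The guard matters most on Windows and macOS, where worker processes are started by re-importing the script by default. Without it, each worker would run the top-level code again and try to start workers of its own.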

Other debugging tips#

Try running a minimal model first, like the taste of cheese example, to make sure your installation is working okay.
Use Google Colab to see if it’s your model or your installation. You can open any of the notebooks on this site directly in Colab using the pop-up link from the rocket ship icon on the top of the page. Colab will likely be somewhat slower than your machine if you’re using the free version, but it works reliably well for newer versions of PyMC.
In the lecture examples, the professor often uses 100,000 or more samples. Don’t do that in PyMC! The NUTS sampler, which is PyMC’s default, explores the posterior much more efficiently than the samplers BUGS uses, so far fewer draws are needed. Start with 1,000 or fewer when first testing out your model, then increase to a comfortable number.