1. Probabilistic Programming Languages#
Graphical models#
The original lecture is about thinking of Bayesian models as graphical models. The professor talks a little bit about why they’re a useful way to think about the relationships between random variables, the Markovian property, and independence of nodes.
He cites Machine Learning: A Probabilistic Perspective, which is the old version of Probabilistic Machine Learning: An Introduction mentioned in the other recommended resources section.
Starting with this unit, we will be using Probabilistic Programming Languages (PPLs) to run our models. Check out Chapter 10 of Bayesian Modeling and Computation in Python for a look at what goes into creating a PPL and the components involved.
Missing example#
In the original unit 6 code archive, there’s a file called DeMere.odc that doesn’t seem to have anything to do with the rest of the unit. You can find a Python implementation here (right-click and Save Link As…).
PyMC#
Some history#
In 2003, Chris Fonnesbeck started writing PyMC (Wiecki et al. [2023]) as a graduate student at the University of Georgia—partly out of frustration with WinBUGS. Eventually, PyMC3 became popular as a Python-based alternative to Stan. Both used implementations of the NUTS algorithm for sampling (Hoffman and Gelman [2011]). PyMC3 was based on Theano, a tensor library for doing machine learning-related math. In 2016, PyMC was sponsored by NumFOCUS, an organization started in part by the authors of NumPy, Matplotlib, and IPython and others to promote open scientific computing.
I started converting the course examples to PyMC3 in 2022. Later that year, PyMC version 4.0 was released. The backend switched to Aesara, a fork of Theano that can also compile to JAX. Towards the end of 2022, the PyMC devs forked Aesara to create PyTensor over some governance conflicts, and the current major version, PyMC v5, came out.
There have been lots of changes and I’ve had to update the code examples a few times. In fact, there are still some that need to be updated that used to work in PyMC3 or v4. I’m going to copy the practice of using the watermark extension as in the PyMC Example Gallery to show the exact versions each example was last run on.
Installation#
This site is currently using PyMC version 5. Please follow the instructions at the PyMC website when installing. You don’t need nutpie, blackjax, or numpyro for this course’s examples, but you are welcome to try them!
Troubleshooting#
If your model is running very slowly or you’re having other issues getting PyMC to work, there could be many causes.
Missing dependencies#
This is by far the most common problem I’ve come across for ISYE 6420 students. Lots of people will just do pip install pymc, or use PyCharm, or some other installation method. You might get a warning like this:
WARNING (pytensor.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
WARNING (pytensor.configdefaults): g++ not detected! PyTensor will be unable to compile C-implementations and will default to Python. Performance may be severely degraded. To remove this warning, set PyTensor flags cxx to an empty string.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
I highly recommend installing using a Conda environment as in the official installation instructions. You don’t need to get the full Conda installation, let alone Anaconda. Try Miniconda or Mamba for a more minimal install.
If you’re having a similar issue, please try the following with a fresh environment:
conda create --name your-environment-name -c conda-forge -c nodefaults pytensor
conda activate your-environment-name
conda install -c conda-forge pymc
Installing PyTensor first seems to help make sure the correct dependencies are pulled in. The extra -c nodefaults option is just for making extra sure you’re using the conda-forge channel rather than the defaults. You can then activate the environment and install PyMC.
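If you want to confirm whether a C++ compiler is visible from your environment, a quick check along these lines may help. This is only a sketch: it looks for g++ or clang++ on your PATH using the standard library, and the helper name find_cxx_compiler is made up for illustration. PyTensor's own detection logic may differ, so treat the result as a hint rather than a guarantee.

```python
import shutil


def find_cxx_compiler(candidates=("g++", "clang++")):
    """Return the path of the first compiler found on PATH, or None.

    Hypothetical helper: it only inspects PATH and is not PyTensor's own check.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


compiler = find_cxx_compiler()
if compiler is None:
    print("No C++ compiler found on PATH; PyTensor will likely fall back to Python.")
else:
    print(f"Found a C++ compiler at {compiler}")
```

Run it with the environment activated, since activating a Conda environment changes your PATH.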
BLAS implementation issues#
BLAS (Basic Linear Algebra Subprograms) is a standard specification for low-level linear algebra routines. There are many implementations for different operating systems and hardware. NumPy or PyMC may give you a warning about the BLAS implementation in use. Some people have had success speeding up their PyMC runtimes by changing implementations.
For example, on ARM-based Macs (the ones using M1/M2 processors), you could try installing and switching to Apple’s BLAS implementation as in this PyMC forum post:
conda install "libblas=*=*accelerate"
Note
This may be unnecessary—this BLAS library should now be the default on ARM-based Macs.
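To see which BLAS library your NumPy build is linked against, you can print NumPy's build configuration. A minimal sketch, assuming NumPy is installed in the active environment; print_blas_info is a hypothetical helper name, and the exact output format varies between NumPy versions.

```python
import importlib.util


def print_blas_info() -> bool:
    """Print NumPy's build configuration; return False if NumPy is missing.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("numpy") is None:
        print("NumPy is not installed in this environment.")
        return False

    import numpy as np

    # show_config() prints the libraries NumPy was built against,
    # including the BLAS/LAPACK entries.
    np.show_config()
    return True


print_blas_info()
```

Look in the BLAS section of the output for a library name such as openblas, mkl, or accelerate.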
Jupyter Notebook/Lab issues#
PyMC works great with Jupyter Notebook or Lab, but sometimes people have installation issues depending on how they’ve installed everything and how they’re opening the notebooks.
If you’re using Jupyter, make sure it’s pointing to the correct kernel. You’ll want to use your pymc environment as the kernel. One way to do this is to install Jupyter in that same environment using
conda install jupyterlab
then launch using
jupyter notebook
or
jupyter lab
from the terminal with that environment activated. Your kernel will show as Python 3 (ipykernel), but you can try executing this in a code cell to see which Python installation you’re using (if you’re using Windows, I think the equivalent command is where):
!which python
/Users/aaron/mambaforge/envs/pymc/bin/python
You can see that for me, that points to the Python installation in my pymc environment folder which is what I want. If you’re using Jupyter through another editor like VSCode, you will need to select the correct environment as your kernel manually.
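A cross-platform alternative is to ask the running kernel directly, which sidesteps the which/where difference and reports the interpreter the kernel itself is using. A small sketch using only the standard library; describe_interpreter is a hypothetical helper name.

```python
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,        # path to the python binary in use
        "prefix": sys.prefix,                # root folder of the environment
        "version": sys.version.split()[0],   # e.g. "3.11.6"
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the executable path sits inside your pymc environment folder, the kernel is using the environment you intended.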
Using scripts instead of notebooks#
If you prefer to use .py scripts that will be run directly from the command line, be aware that for multiprocessing to work correctly, you must use the if __name__ == '__main__': Python idiom.
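Here is a sketch of what such a script might look like. The model, the made-up data, and the function names fit_model and main are all illustrative assumptions, not taken from the course. The reason for the guard is that on platforms where Python starts worker processes by re-importing the main script (the default on Windows and macOS), any sampling code at the top level would run again in every worker.

```python
import importlib.util


def fit_model(draws=1000, tune=1000, chains=4, cores=4, seed=42):
    """Fit a toy normal model and return the sampling results."""
    import pymc as pm  # imported lazily so the availability check in main() runs first

    data = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2]  # made-up observations for illustration
    with pm.Model():
        mu = pm.Normal("mu", mu=0, sigma=10)
        sigma = pm.HalfNormal("sigma", sigma=5)
        pm.Normal("y", mu=mu, sigma=sigma, observed=data)
        # With cores > 1, PyMC runs the chains in separate processes.
        return pm.sample(
            draws=draws, tune=tune, chains=chains, cores=cores, random_seed=seed
        )


def main():
    if importlib.util.find_spec("pymc") is None:
        print("PyMC is not installed in this environment.")
        return None
    idata = fit_model()
    print("Posterior mean of mu:", float(idata.posterior["mu"].mean()))
    return idata


if __name__ == "__main__":
    # Worker processes that re-import this file skip this block,
    # so sampling is only started once, by the parent process.
    main()
```

Save it under any name you like and run it with python from the activated environment. The import check is only there so the sketch fails gracefully on a broken install; in your own scripts a plain top-level import of pymc is fine.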
Other debugging tips#
Try running a minimal model first, like the taste of cheese example, to make sure your installation is working okay.
Use Google Colab to see if it’s your model or your installation. You can open any of the notebooks on this site directly in Colab using the pop-up link from the rocket ship icon on the top of the page. Colab will likely be somewhat slower than your machine if you’re using the free version, but it works reliably well for newer versions of PyMC.
Some students will directly translate BUGS models to PyMC and then use the same number of samples, like 100,000 or more. Don’t do that! You need far fewer samples when using the NUTS sampler, which is PyMC’s default. Start with 3,000 or fewer when first testing out your model.