Welcome to my field notes!

Field notes are notes I leave myself as I go through my day-to-day work. The hope is that other people will also find these notes useful. Note that these notes are unfiltered and unverified.

Model Interpretability


TJ Palanca


August 7, 2022

A discussion of black box models vs highly interpretable causal models, as well as the different conclusions and processes that underpin those models.

Limits of Machine Learning

Machine learning can be extremely useful, but only in the following (arguably limited) cases:

  1. All the relevant information is present in the training data, and the model is not expected to extrapolate beyond the bounds of that data (no surprises). Exploring the input space sufficiently therefore takes a lot of training data. This is also why Go and self-driving are areas where machine learning excels: all of the relevant information can be collected.
    • e.g. in a cat vs dog classifier, everything needed to determine whether a cat or a dog is in the picture is contained in the pixel data of the picture
    • if you train on economic data only from a boom period, you should be hesitant to make predictions during a recession. This can be hard to guard against, though, because it is not possible to envision every condition under which the data may be generated
  2. There is a constant feedback loop and the process is repeatable and learnable. You can afford to make mistakes. This means you are automating a process that takes place very often and is therefore very expensive for humans to carry out.
    • if there is only one shot to make a decision and get feedback, then machine learning is ill-suited
    • for decisions such as what products to launch, or what markets to launch in, machine learning is ill-suited for the same reason
  3. No human intervention. There are ways to incorporate human input into the solution, for example through the loss function, but a system in which the machine must learn from both a growing body of data and ongoing human input can be difficult to manage.
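
To make the extrapolation point in item 1 concrete, here is a minimal sketch. Everything in it is an assumption chosen for illustration: the data-generating process (`true_process`), the training range of 0 to 10, and the use of a 1-nearest-neighbour regressor as a stand-in for a flexible "black box" model. It is not a claim about any particular library or production model.

```python
def true_process(x):
    # Hypothetical data-generating process, chosen only for illustration.
    return x * x

# Training data covers only the interval [0, 10].
train_x = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
train_y = [true_process(x) for x in train_x]

def predict_nearest(x):
    """1-nearest-neighbour regression: return the target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def abs_error(x):
    """Absolute gap between the model's prediction and the true process at x."""
    return abs(predict_nearest(x) - true_process(x))

inside_error = abs_error(4.2)    # query inside the training range
outside_error = abs_error(20.0)  # query far outside the training range

print(f"error inside the training range:  {inside_error:.2f}")
print(f"error outside the training range: {outside_error:.2f}")
```

Inside the training range the prediction is close to the truth. At x = 20 the model can only repeat what it saw at the edge of its data (x = 10), so the error is far larger. A model with some theory built in (here, knowing the relationship is quadratic) would not have this problem, which is the motivation for the paragraph that follows.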

In other domains, modelling and/or theory can assist with extrapolating outside of the training space, and Bayesian analysis can help with quantifying uncertainty.
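
As a rough illustration of how Bayesian analysis carries uncertainty along with an estimate, here is a minimal sketch using a Beta-Binomial model. The setting (estimating a success rate from a handful of trials), the uniform Beta(1, 1) prior, the function name `beta_posterior`, and the counts are all assumptions made up for the example.

```python
import math

def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a success rate.

    Assumes a Beta(prior_a, prior_b) prior and binomial data, so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

# Hypothetical counts: the same observed rate (30%) with little vs. lots of data.
small_mean, small_sd = beta_posterior(successes=3, trials=10)
large_mean, large_sd = beta_posterior(successes=300, trials=1000)

print(f"10 trials:   mean={small_mean:.3f}, sd={small_sd:.3f}")
print(f"1000 trials: mean={large_mean:.3f}, sd={large_sd:.3f}")
```

Both cases give a similar point estimate, but the posterior standard deviation is much wider with 10 trials than with 1000. That width is the useful part when data is scarce: it says how much to trust the estimate before acting on it.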

Predictions for the future

The data collection power of our technology continues to grow, so I think ML will become relevant to more and more problems. But there is danger in being premature and “ignoring” statistical modelling when ML is not yet applicable, because it can lead to unmet expectations and loss of trust in the profession as a whole.