Field notes are notes I leave myself as I go through my day-to-day work. The hope is that other people will also find these notes useful. Note that these notes are unfiltered and unverified.
Causality
Correlation does not imply causation, but correlation does imply there is (common or actual) causation. Bias in models is a result of:
- Conditioning on a common effect (e.g. only viewing data for visitors of a site), or
- Not conditioning on a common cause (e.g. viewing data for situations where a confounder is present but not controlled for).
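A quick sketch of the first kind of bias (conditioning on a common effect), using only the standard library. Everything here is made up for illustration: `interest` and `ad_exposure` are two hypothetical independent traits, and a "visit" is assumed to happen when their sum crosses a threshold. In the full population the traits should be roughly uncorrelated, but looking only at visitors should induce a negative correlation between them.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 20000

# Two independent causes (hypothetical names, simulated data).
interest = [rng.gauss(0, 1) for _ in range(n)]
ad_exposure = [rng.gauss(0, 1) for _ in range(n)]

# The common effect: a person visits the site if either cause is strong enough.
visited = [i + a > 1.0 for i, a in zip(interest, ad_exposure)]

# Correlation in the whole population vs. among visitors only.
corr_all = pearson(interest, ad_exposure)
visitor_interest = [i for i, v in zip(interest, visited) if v]
visitor_exposure = [a for a, v in zip(ad_exposure, visited) if v]
corr_visitors = pearson(visitor_interest, visitor_exposure)

print(f"all users:     {corr_all:+.3f}")
print(f"visitors only: {corr_visitors:+.3f}")
```

The point of the sketch is that nothing causal links the two traits; the correlation among visitors comes purely from how the sample was selected.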
Uncertainty is a result of:
- Sampling uncertainty
- Modeling uncertainty
- Structural uncertainty
Correlation and causation
- If you want to select a population that can be observed to over-index for a trait, then selecting based on a correlated trait is okay
- If your strategy involves interventions with one of the correlated variables to change the other, then correlation alone is not sufficient. You need causation
- Correlation implies causation (and vice versa) whenever there is no bias
- If you need a causal result, and all you have is observational data, it’s okay to act on correlation alone if you’re sure there’s no bias. That is, estimation problems aside, you’re sure that there’s no confounding, and no selection bias.
Back door criterion
- This is the property for identifying the right variables to control for.
- You need to control for each variable that:
  - IS NOT an effect of the cause we’re interested in, AND
  - IS on the confounding path
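To make the criterion concrete, here is a minimal simulated sketch of adjusting for a single binary confounder by stratifying on it. The data-generating process is an assumption invented for this note: `z` is the confounder, `x` is the cause we care about, `y` is the outcome, and the true effect of `x` on `y` is fixed at 1.0. The naive difference in means should be inflated by the confounding path through `z`, while the stratified estimate should land close to the true effect.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def simulate(n, true_effect, seed):
    """Simulated rows (z, x, y): z causes both x and y; x causes y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # z makes treatment much more likely (confounding path z -> x).
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # z also raises the outcome directly (confounding path z -> y).
        y = true_effect * x + 3.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows

def naive_difference(rows):
    """Difference in mean outcome, ignoring the confounder."""
    treated = mean(y for _, x, y in rows if x == 1)
    control = mean(y for _, x, y in rows if x == 0)
    return treated - control

def stratified_difference(rows):
    """Within-stratum differences, weighted by the share of each stratum of z."""
    total = 0.0
    for stratum in (0, 1):
        subset = [(x, y) for z, x, y in rows if z == stratum]
        weight = len(subset) / len(rows)
        treated = mean(y for x, y in subset if x == 1)
        control = mean(y for x, y in subset if x == 0)
        total += weight * (treated - control)
    return total

rows = simulate(n=50000, true_effect=1.0, seed=1)
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Here `z` qualifies as a control variable under both conditions: it is not an effect of `x`, and it sits on the confounding path between `x` and `y`.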
Data Processing Inequality
- Words daily to represent the whole reality
- No feature engineering or change of representation will ever increase the amount of information in the data
- Data processing can at best preserve, and otherwise only reduce, the amount of information available to a model
- IC (Inductive Causation) algorithm: used for finding causal relationships, according to Pearl
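Going back to the data processing inequality bullets above, here is a small sketch that checks the claim on a toy discrete distribution. The joint table is hypothetical (a binary label and a raw feature with four buckets), and `coarsen` and `relabel` are illustrative stand-ins for feature engineering. Mutual information with the label should stay the same under an invertible relabeling and should not go up under a lossy coarsening.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits, for a dict mapping (a, b) -> probability."""
    pa = defaultdict(float)
    pb = defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

def apply_feature(joint, f):
    """Push the second variable through a deterministic function f."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(a, f(b))] += p
    return dict(out)

# Hypothetical joint distribution of (label, raw_bucket); sums to 1.
joint_raw = {
    (0, 0): 0.05, (0, 1): 0.10, (0, 2): 0.20, (0, 3): 0.25,
    (1, 0): 0.20, (1, 1): 0.10, (1, 2): 0.07, (1, 3): 0.03,
}

def coarsen(bucket):
    return bucket // 2   # lossy: merges buckets 0,1 and 2,3

def relabel(bucket):
    return 3 - bucket    # invertible: just renames the buckets

i_raw = mutual_information(joint_raw)
i_coarse = mutual_information(apply_feature(joint_raw, coarsen))
i_relabel = mutual_information(apply_feature(joint_raw, relabel))

print(f"I(label; raw)       = {i_raw:.4f} bits")
print(f"I(label; coarsened) = {i_coarse:.4f} bits")
print(f"I(label; relabeled) = {i_relabel:.4f} bits")
```

A transformed feature can still be easier for a particular model to use; the inequality only says the transformation cannot add information about the label that was not already in the raw data.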
Applied to a driver earnings model
We want to find the earnings values that would reduce churn. If we simply correlated, for example, a decrease in earnings volatility with churn, we might see a correlation, but we would not know whether reducing volatility would actually cause any change in churn. There may simply be a common cause. For example, casual drivers may have both high earnings volatility and high churn. If the entire effect on churn comes from the driver’s cohort, then reducing earnings volatility would not lead to any change in churn.
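A toy simulation of this scenario, under the extreme assumption that cohort is the only cause of churn. All numbers (cohort share, volatility levels, churn probabilities) and names such as `simulate_drivers` and `volatility_cut` are invented for illustration and are not taken from any real model. Observationally, high-volatility drivers should churn far more, yet an intervention that lowers everyone's volatility should leave the churn rate unchanged.

```python
import random

def simulate_drivers(n, seed, volatility_cut=0.0):
    """Return (casual, volatility, churned) per driver.

    volatility_cut models an intervention that lowers earnings volatility.
    """
    rng = random.Random(seed)
    drivers = []
    for _ in range(n):
        casual = rng.random() < 0.4
        # Cohort drives volatility...
        raw_volatility = rng.gauss(0.6 if casual else 0.2, 0.1)
        volatility = max(0.0, raw_volatility - volatility_cut)
        # ...and, by assumption, cohort alone drives churn.
        churned = rng.random() < (0.5 if casual else 0.1)
        drivers.append((casual, volatility, churned))
    return drivers

def churn_rate(drivers):
    return sum(churned for _, _, churned in drivers) / len(drivers)

observed = simulate_drivers(n=20000, seed=2)

# Observational view: split drivers by volatility and compare churn.
high_vol = [d for d in observed if d[1] > 0.4]
low_vol = [d for d in observed if d[1] <= 0.4]
high_rate = churn_rate(high_vol)
low_rate = churn_rate(low_vol)

# Interventional view: same drivers (same seed), volatility reduced for all.
intervened = simulate_drivers(n=20000, seed=2, volatility_cut=0.3)
rate_before = churn_rate(observed)
rate_after = churn_rate(intervened)

print(f"churn, high volatility: {high_rate:.3f}")
print(f"churn, low volatility:  {low_rate:.3f}")
print(f"churn before cut:       {rate_before:.3f}")
print(f"churn after cut:        {rate_after:.3f}")
```

In real data the effect would rarely be entirely due to cohort, so comparing churn within each cohort would be the next check before acting on the volatility correlation.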