
The Book of Why

  • Writer: Michael Connolly
  • Oct 11
  • 3 min read

Updated: Oct 31

The Book of Why by Judea Pearl with Dana MacKenzie, Basic Books, 2018. 


Correlation is Not Causation

It is widely known that correlation is not causation. Correlation comes from observation, and much effort has gone into trying to find some way to discover causation by analyzing observational data.


Interventions and Counterfactuals

The author says that observation, and the analysis of observational data, are not enough to understand causation. Two further steps are necessary: (a) intervention (doing something to change the situation, and then analyzing the effects), and (b) imagining counterfactuals: thinking about what would have happened if things had gone differently than they actually did.


Conditional Probabilities

Correlation carries less information than conditional probability. Conditional probabilities depend on direction: in general, P(A|B) ≠ P(B|A).
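The asymmetry is easy to check with a small joint distribution; the numbers below are made up purely for illustration:

```python
# Hypothetical joint distribution over two binary events A and B,
# given as P(A, B) for each of the four outcomes (assumed numbers).
joint = {
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.20,
    (False, False): 0.40,
}

def prob(event):
    """Probability that the predicate `event(a, b)` holds."""
    return sum(pr for (a, b), pr in joint.items() if event(a, b))

p_a_and_b = prob(lambda a, b: a and b)  # 0.30
p_a = prob(lambda a, b: a)              # 0.40
p_b = prob(lambda a, b: b)              # 0.50

p_a_given_b = p_a_and_b / p_b  # P(A|B)
p_b_given_a = p_a_and_b / p_a  # P(B|A)

print(round(p_a_given_b, 2), round(p_b_given_a, 2))  # 0.6 0.75
```

Swapping the two events changes the answer: P(A|B) = 0.6 but P(B|A) = 0.75.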


Bayesian Statistics

In Bayesian statistics, named after Thomas Bayes, one studies conditional probabilities: for example, the probability of B happening, given that event A has happened. This is written mathematically as P(B|A). A Bayesian diagram is a network of events connected by links with associated conditional probabilities. The links carry no causal meaning: you can see that A and B are correlated, but you cannot determine the direction of causality. Does A cause B, or does B cause A?

But in order to distinguish causality from correlation you must study P(A|do(B)), that is, the probability that A will happen after you do B (not merely observe B). Doing B is called an intervention. By intervening, we can determine the existence and direction of causality.
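A minimal sketch of the difference between observing and doing, under an assumed toy model in which a common cause C drives both A and B, and B has no causal effect on A (all numbers invented):

```python
# Assumed structural model: C -> B and C -> A, no arrow B -> A.
# Observing B=1 raises our belief in A (through C), but forcing
# B=1 does nothing to A, so P(A|B=1) differs from P(A|do(B=1)).

P_C = 0.5                       # P(C = 1)
P_B_given_C = {1: 0.8, 0: 0.2}  # P(B = 1 | C)
P_A_given_C = {1: 0.9, 0: 0.1}  # P(A = 1 | C)

def p_c(c):
    return P_C if c else 1 - P_C

# Observational: P(A=1 | B=1), weighting C by P(C | B=1).
p_b1 = sum(P_B_given_C[c] * p_c(c) for c in (0, 1))
p_a1_given_b1 = sum(
    P_A_given_C[c] * P_B_given_C[c] * p_c(c) for c in (0, 1)
) / p_b1

# Interventional: do(B=1) cuts the arrow C -> B and leaves A
# untouched, so P(A=1 | do(B=1)) is just the marginal P(A=1).
p_a1_do_b1 = sum(P_A_given_C[c] * p_c(c) for c in (0, 1))

print(round(p_a1_given_b1, 2))  # 0.74: seeing B=1 makes A likely
print(round(p_a1_do_b1, 2))     # 0.5: forcing B=1 changes nothing
```

The gap between 0.74 and 0.5 is exactly the difference between correlation and causation in this toy model.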


Causal Model

In order to analyze causality mathematically, one needs to define a causal model. A causal model is a Bayesian network of event nodes with arrows showing the direction and probability of causation. In an ordinary Bayesian network, an arrow from one node to another merely represents a conditional probability. In a causal diagram, arrows are drawn only for causal relations; correlation is not enough. Nodes that are correlated but not causally related are left unconnected.


Confounding Factors

Pearl discusses confounding factors. Consider this example:


  • Independent Variable: Amount of Exercise.

  • Dependent variable: Weight Gain.

  • Confounding Factor: Calorie Intake.


Variation in Calorie Intake makes it difficult to discern the causal relation between Amount of Exercise and Weight Gain. Pearl discusses early efforts to rigorously define what a confounding factor is. Before the Causal Revolution, people could not give a clear definition of what confounding meant, and Pearl shows that defining confounding using only correlations is inferior. His definition of a confounding variable: any variable that causes P(Y|X) to differ from P(Y|do(X)). Pearl also describes attempts to "control for" confounding variables, which means, roughly, to minimize their distortion of a causal relation.
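Pearl's definition can be seen at work in a toy computation (the numbers here are invented). Conditioning on X weights the confounder Z by P(Z|X); the interventional quantity, computed via Pearl's adjustment formula, weights Z by its marginal P(Z) instead:

```python
# Toy version of the book's example (assumed numbers):
#   Z = high calorie intake (confounder), X = exercises, Y = weight gain.
# Z influences both X and Y, so P(Y|X) != P(Y|do(X)).

P_Z = 0.5                       # P(Z = 1)
P_X_given_Z = {1: 0.7, 0: 0.3}  # P(X = 1 | Z)
P_Y_given_XZ = {                # P(Y = 1 | X, Z)
    (1, 1): 0.5, (1, 0): 0.1,
    (0, 1): 0.8, (0, 0): 0.3,
}

def p_z(z):
    return P_Z if z else 1 - P_Z

# Naive conditioning: P(Y=1 | X=1) weights Z by P(Z | X=1).
p_x1 = sum(P_X_given_Z[z] * p_z(z) for z in (0, 1))
naive = sum(
    P_Y_given_XZ[(1, z)] * P_X_given_Z[z] * p_z(z) for z in (0, 1)
) / p_x1

# Adjustment formula: P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) P(z).
adjusted = sum(P_Y_given_XZ[(1, z)] * p_z(z) for z in (0, 1))

print(round(naive, 2), round(adjusted, 2))  # 0.38 0.3
```

Since P(Y|X) = 0.38 but P(Y|do(X)) = 0.30, Z is a confounder by Pearl's definition, and adjusting for it is one way to "control for" it.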


Counterfactuals

The authors consider the issue of counterfactuals. They ask what the salary of a particular woman would be if she had an extra year of education. We are asking about a situation that never happened, which makes it a counterfactual question. Intervention (experiment) cannot handle this problem, because we cannot have two worlds, one where she had the extra year and one where she did not. If Salary is a function of years of Education and years of Experience, traditional multiple regression analysis can estimate a salary for a person whose salary is unknown but whose education and experience are known. The function must first be fitted on a set of people whose education, experience, and salary are all known.


Multiple Linear Regression

Multiple linear regression can give estimates for both situations, with and without the extra year of education, and we can compute the difference in salary between the two to see whether an extra year of education helps her salary. Note that the predicted salary for the situation without the extra year may not agree with her actual salary. Pearl argues that we should use our real-world knowledge that a person with more years of education is likely to have fewer years of work experience, because entering the work force later reduces your years of experience. Encoding this idea in a causal model gives a more accurate estimate of the effect of the extra year of education on her salary than multiple linear regression does. Using his sample data, Pearl shows that having the extra year of education decreases the years of experience, and so would reduce her salary, contrary to the multiple-regression estimate that more education increases one's salary.
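The contrast can be sketched with an assumed linear structural model; the coefficients and data below are invented for illustration and are not Pearl's actual numbers:

```python
import numpy as np

# Assumed structural model (illustrative, not Pearl's data):
#   experience = age - education - 6          (schooling delays entry)
#   salary     = 30 + 2*education + 3*experience   (in $1000s)
rows = []
for edu in (12, 14, 16):
    for age in (30, 40, 50):
        exp = age - edu - 6
        salary = 30 + 2 * edu + 3 * exp
        rows.append((edu, exp, salary))
data = np.array(rows, dtype=float)

# Multiple linear regression: salary ~ 1 + education + experience.
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
coef, *_ = np.linalg.lstsq(X, data[:, 2], rcond=None)
b_edu, b_exp = coef[1], coef[2]

# Regression's answer: one extra year of education, experience held
# fixed, raises salary by b_edu.
print(round(b_edu, 6))          # 2.0

# Causal answer: the extra school year also removes a year of
# experience, so the total effect is b_edu - b_exp.
print(round(b_edu - b_exp, 6))  # -1.0
```

With these made-up coefficients, regression says the extra year is worth +2, while the causal model, which routes the lost year of experience through the salary equation, says the net effect is -1: the same reversal Pearl demonstrates with his sample data.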
