Report by Zaiga Thomann with an introduction by Lizzie Silver
Zaiga Thomann recently completed a research internship at Silverpond. We asked her to showcase the use of causal models on an applied problem, and she chose algorithmic fairness: how do you ensure that your decision-making algorithm is fair?
One way to define “fairness” is via counterfactual questions, such as: “What decision would have been made, if this loan applicant had been male instead of female?” A fair algorithm should make the same decision in the existing situation as it would make in the counterfactual situation where the applicant is male instead of female, or white instead of black.
Causal models are powerful tools for representing causal relationships. They allow us to predict the effects of interventions, and they allow us to answer counterfactual questions. We can then use the counterfactual statements to measure fairness.
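To make the counterfactual question concrete, here is a minimal sketch of a counterfactual fairness check on a toy structural causal model. All variable names and coefficients are invented for illustration: gender affects income, and a simple income threshold stands in for the decision algorithm. The key move is to hold each applicant's individual noise term fixed while flipping gender, then re-simulate the decision.

```python
import numpy as np

# Toy structural causal model (all names and coefficients are invented):
# gender -> income -> loan decision, with independent noise per applicant.
rng = np.random.default_rng(0)

def simulate_income(gender, noise):
    # Income depends on gender (a source of bias) plus individual noise.
    return 40_000 + 10_000 * gender + noise

def decide(income):
    # A simple income threshold stands in for the decision algorithm.
    return income > 45_000

n = 1_000
gender = rng.integers(0, 2, size=n)   # 0 = female, 1 = male
noise = rng.normal(0, 5_000, size=n)  # each applicant keeps their own noise

factual = decide(simulate_income(gender, noise))
# Counterfactual: flip gender but hold each applicant's noise term fixed.
counterfactual = decide(simulate_income(1 - gender, noise))

# Fraction of applicants whose decision changes under a gender flip;
# this would be zero for a counterfactually fair decision rule.
unfair_rate = np.mean(factual != counterfactual)
print(f"decision flips for {unfair_rate:.1%} of applicants")
```

Because the toy decision rule depends on income, and income depends causally on gender, flipping gender changes the decision for a large fraction of applicants, which is exactly the unfairness the counterfactual definition detects.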
Examples of unfair algorithms
Algorithmic fairness has been in the news frequently over the past five years. Embarrassingly unfair algorithms include the COMPAS recidivism prediction algorithm, which tended to mislabel white offenders as low risk and black offenders as high risk; and the Apple credit card, which offered dramatically lower credit limits to women. Citizens and consumers are holding governments and corporations accountable for algorithmic bias.
COMPAS does not use race as an input, and the Apple credit limit algorithm does not use gender as an input. The US Equal Credit Opportunity Act prohibits lenders from discriminating on the basis of gender, and lenders have interpreted this to mean that they can’t use gender as an input to creditworthiness decisions. But an algorithm can still discern gender via other features. For example, if it knows that an applicant has student debt from Wellesley College, it can infer that the applicant is probably a woman. So-called “fairness through unawareness” does not work. We actually need to know gender if we want to make sure that the algorithm would make the same decision in the counterfactual gender-flipped situation.
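A small sketch shows why removing the protected attribute is not enough. Everything here is hypothetical: a "gender-blind" least-squares model is fit on a single proxy feature (think of the alma-mater signal above) that correlates with gender, and its predictions end up tracking gender anyway.

```python
import numpy as np

# Sketch of "fairness through unawareness" failing (all variables invented).
# Gender is excluded from the model, but a correlated proxy feature leaks it.
rng = np.random.default_rng(1)
n = 10_000

gender = rng.integers(0, 2, size=n).astype(float)    # protected attribute
proxy = gender + rng.normal(0, 0.3, size=n)          # e.g. an alma-mater signal
outcome = 2.0 * gender + rng.normal(0, 1.0, size=n)  # biased historical outcome

# Fit a one-feature least-squares model that never sees gender directly.
X = np.column_stack([np.ones(n), proxy])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
predictions = X @ beta

# The "gender-blind" model's predictions still track gender strongly.
corr = np.corrcoef(predictions, gender)[0, 1]
print(f"correlation between predictions and gender: {corr:.2f}")
```

The model reconstructs most of the gender signal from the proxy alone, so excluding the protected attribute from the inputs does not make the predictions independent of it.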
There are (at least) three things we might want an algorithmic fairness toolkit to do:
- Create a decision algorithm that is fair from the start
- Check whether an existing algorithm is fair or not / measure how unfair it is
- Adjust an existing algorithm to make it fair
Zaiga focused on the third task, adjusting an existing algorithm to make it fair, because this is most useful to companies that already have working decision algorithms. She applied a debiasing method introduced by Wang, Sridhar & Blei (2019).
The method requires two things:
- The existing decision algorithm
- A causal model of the relationships between features in the algorithm’s training data.
Silverpond doesn’t have any decision algorithms about individual people, so Zaiga used a public dataset to train a toy algorithm. The strategy was as follows:
- Train a decision algorithm that is known to be biased
- Apply Wang, Sridhar & Blei’s debiasing algorithm
- Test whether the debiasing worked
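The three steps above can be sketched end-to-end on synthetic data. To keep the sketch short, this is not Wang, Sridhar & Blei’s actual method: as a deliberately simplified stand-in for the debiasing step, the model’s score is averaged over the factual input and its gender-flipped counterfactual, using the known (invented) causal model. A fixed threshold rule stands in for the trained algorithm, and all names and numbers are hypothetical.

```python
import numpy as np

# Toy run of the three steps, with a simplified "debiasing" step
# (NOT Wang, Sridhar & Blei's method): average the score over the
# factual and gender-flipped counterfactual input.
rng = np.random.default_rng(2)
n = 5_000

# Step 1: synthetic data from a known causal model, plus a biased rule.
gender = rng.integers(0, 2, size=n)
noise = rng.normal(0, 5_000, size=n)

def income_given(gender, noise):
    return 40_000 + 10_000 * gender + noise  # gender causally affects income

def biased_score(gender, noise):
    # Scoring on raw income inherits the causal effect of gender.
    return income_given(gender, noise)

# Step 2: "debias" by averaging over the factual and counterfactual world,
# which requires knowing the causal model to recompute counterfactual income.
def debiased_score(gender, noise):
    factual = biased_score(gender, noise)
    flipped = biased_score(1 - gender, noise)
    return (factual + flipped) / 2

# Step 3: test whether the debiased decision survives a gender flip.
threshold = 45_000
original = debiased_score(gender, noise) > threshold
counterfactual = debiased_score(1 - gender, noise) > threshold
print("decisions invariant to gender flip:", bool(np.all(original == counterfactual)))
```

In this sketch the averaged score is symmetric in gender by construction, so the debiased decision is identical in the factual and counterfactual worlds; the point is that the debiasing step only works because the causal model used to compute the counterfactual incomes is correct.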
The debiasing only works if you have the right causal model. So in addition to the real dataset, Zaiga created a synthetic dataset for which she knew the exact structure of the causal model. That way she could confirm that the debiasing algorithm worked when given the correct causal model, before testing it on real data where there was some uncertainty about the causal relationships.
In fact, Zaiga found that the debiasing algorithm worked perfectly on the synthetic data with the known, correct causal model. However, it produced counterintuitive results on the real data where the causal model was uncertain. This shows that finding the correct causal model is crucial for producing fair algorithms.
Silverpond has significant expertise in learning causal models (Lizzie Silver did her PhD on causal discovery). We look forward to applying these capabilities to algorithmic fairness.