The fear of reproducing society's prejudices through computer algorithms is being hotly discussed in both academic publications and the popular press. Warnings about bias in predictive analytics have appeared in the New York Times, the Guardian, and the Harvard Business Review, and particularly in a famous and hotly contested ProPublica article on predictions of recidivism among criminal defendants.
In this Q&A, Zoomdata asks Andy Oram, co-author of its recent book collaboration with O'Reilly Media, Delivering Embedded Analytics in Modern Applications, what the fuss is about and what data scientists currently know about bias in algorithms.
Why are algorithms and predictive analytics used in sensitive decisions affecting human lives in the first place?
Many organizations in both commerce and government have come to appreciate the almost magical-seeming achievements of data analysis. Google's parent company, Alphabet, would not trade at record highs without Google's comprehensive analysis of user input, nor would Walmart take in nearly $500 billion a year without analyzing customer behavior. On the government side, data analysis contributes to many critical initiatives, such as tracing the spread of the Zika virus. It is natural, therefore, that these organizations start to trust data to help them make better decisions about matters that directly affect ordinary people: who gets a loan, who is accepted into college, even who goes on a date.
Insofar as these predictions are accurate, they can be good for all of us. If irresponsible people are denied loans, the rest of us get loans at a better rate because our interest payments don't have to cover their defaults. And nobody wants a violent criminal who is likely to re-offend on the loose. Still, when analytics enter areas as critical to human life as jail sentencing or how we fight wars, it is equally natural for people to ask questions.
We must remember that bias is a part of life. The first instance of bias I can find comes from the Biblical book of Samuel. In I Samuel 9, the prophet Samuel chooses Saul as a king largely on the basis of his height--a poor choice as it turns out. Later, in I Samuel 16, the prophet seeks out a mature man for his next king and overlooks the young David until told prophetically to choose him. We don't have that source of correction in modern times.
Decisions about housing, loans, education, and other key benefits have traditionally been made by people with bias. Many women and people of color could report being rejected in favor of less qualified white men. We may well hope that predictive analytics will be more objective than humans in such decisions.
As we shall see, though, many forces twist predictive analytics into reproducing human biases.
How do you define bias?
This is a surprisingly difficult and subtle question. As I explained in the previous section, sometimes you want to discriminate, as when you deny someone a loan because she has been shown to be financially irresponsible. But suppose that low-income people are less likely to succeed in college. There may be a host of social causes for this (poorer K-12 schooling, fewer extracurricular activities, or simply lack of funds for tuition and living expenses), but if a college prefers middle-income and high-income applicants, is that bias? Most colleges have effectively answered yes, and take strong measures to give low-income applicants a chance.
But people can have legitimate differences of opinion about whether a particular decision is biased. So the most widely accepted criterion for judging bias in analytics is to check how they treat legally "protected classes". Most jurisdictions define certain traits such as race, gender, and religion as protected classes and prohibit organizations from discriminating against people who fall into those classes. In many jurisdictions, these classes are joined by others, such as disability or sexual orientation. To generalize, a reasonable policy is that any discrimination that would be illegal or unethical when practiced by a person should be equally prohibited when practiced by a computer program.
What is the evidence that predictive analytics are sometimes biased?
It's almost inconceivable that a programmer would write code saying if applicant==woman then score -= 10. Bias is probably never explicit, but has to be deduced by running statistics on the results of the program over a large number of people. This is the irony of modern life: analytics can be used to police analytics.
One early research project was carried out by Latanya Sweeney, a privacy expert who divides her time between Harvard and MIT. In 2013, she found that a search on Google for a name commonly associated with African Americans (such as her own name) tended to turn up an ad offering arrest records for that person. Searches for white-sounding names were much less likely to turn up such ads. The ad does not in any sense mean that an arrest record exists. But any human resources manager, landlord, or other person doing a search on a potential candidate could easily be frightened by the mere appearance of such an ad. We'll look later at why this odd result occurs.
Many other muckrakers have found bias since then through "black box" testing (no pun intended) that tallies up how a particular analytics package treats people of different classes. In some cases--as with Sweeney's study--the researcher has generated large numbers of queries in order to get statistically significant results. In other cases, researchers have obtained access to actual results of analytics out in the field.
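To make the method concrete, here is a minimal sketch in Python of the kind of black-box tally involved. The scoring function, the field names, and the data are all hypothetical stand-ins; the point is that the auditor never looks inside the model, only at how its outcomes break down across groups.

    import random
    from collections import defaultdict

    def score_applicant(applicant):
        """Hypothetical stand-in for the opaque system under test."""
        # In a real audit, this would call the deployed model or web service.
        return random.random() < 0.5

    def black_box_audit(applicants, protected_attribute):
        """Tally the favorable-outcome rate for each value of a protected attribute."""
        favorable = defaultdict(int)
        totals = defaultdict(int)
        for applicant in applicants:
            group = applicant[protected_attribute]
            totals[group] += 1
            if score_applicant(applicant):
                favorable[group] += 1
        return {group: favorable[group] / totals[group] for group in totals}

    # A large synthetic population, so differences in rates are statistically meaningful.
    population = [{"gender": random.choice(["woman", "man"])} for _ in range(100_000)]
    print(black_box_audit(population, "gender"))

With the placeholder scorer the two rates come out nearly equal; in a real audit, a large gap between groups is the signal that invites further investigation.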
One study found that people in low-income neighborhoods pay more for automobile insurance than residents of high-income neighborhoods, while another found that people of color pay more than whites, even though in neither case do accident rates justify the difference. Online ads for managerial and high-paying jobs are more likely to be shown to men than to women with equivalent qualifications.
Who is researching this question?
Because strong statistical skills are required to prove bias, a number of data scientists have taken on the task; notable among them is Cathy O'Neil, a "quant" who worked at a hedge fund and later wrote the book Weapons of Math Destruction. As a mathematician and computer scientist, O'Neil is very persuasive when arguing that our current data and analytics are often completely inadequate for the decisions we ask them to make, such as judging the performance of a teacher by the small population of students in her classroom.
Other people investigating predictive analytics, like Sweeney, are known for their prior work on privacy. Cynthia Dwork, for instance, who was interviewed in the New York Times, has expanded her computer science research from privacy to algorithmic bias.
In addition to uncovering bias in existing algorithms, many researchers address remedies. Some suggestions attack the problem on a technical level, proposing thorough-going changes to the way data scientists design their algorithms. For instance, Dwork's "Fairness Through Awareness" adapts the doctrine of differential privacy to the problem of fairness in analytics.
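As a rough illustration of the spirit of that work (not Dwork's actual algorithm), individual fairness asks that people who are similar with respect to the task at hand receive similar outcomes. A minimal sketch, assuming a hypothetical task-specific similarity metric and a deterministic scoring model, could check that condition pair by pair:

    import itertools

    def similarity_distance(person_a, person_b):
        """Hypothetical task-specific metric: how different two applicants are (0 = identical)."""
        return abs(person_a["credit_history_years"] - person_b["credit_history_years"]) / 50.0

    def model_score(person):
        """Stand-in for the model being audited; returns a score between 0 and 1."""
        return min(person["credit_history_years"] / 30.0, 1.0)

    def individual_fairness_violations(people):
        """Flag pairs whose difference in scores exceeds the distance between the people.

        This is the spirit of the Lipschitz condition in "Fairness Through Awareness":
        similar individuals should receive similar outcomes.
        """
        violations = []
        for a, b in itertools.combinations(people, 2):
            if abs(model_score(a) - model_score(b)) > similarity_distance(a, b):
                violations.append((a, b))
        return violations

    applicants = [{"credit_history_years": years} for years in (2, 5, 10, 25)]
    print(len(individual_fairness_violations(applicants)))

The hard part in practice, as Dwork and her colleagues acknowledge, is agreeing on the similarity metric itself.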
Other researchers look for policy and regulatory solutions. Many organizations have released principles for the development and use of predictive analytics; one was recently developed by the US policy group in the Association for Computing Machinery (USACM), with which I've worked. Its seven principles include channels for redress, transparency about the way data is collected, auditability, and testing.
Law professor Frank Pasquale is illustrative of the policy experts who have taken on algorithmic bias. His book The Black Box Society lays out the difficulties in monitoring or regulating predictive analytics, but offers some suggestions, including a "just say no" approach in some situations where analytics aren't appropriate.
What are the sources of this bias?
When predictive analytics turn out to be biased, the cause usually lies in their input data. A simple example involves policing. Police have historically been concentrated in low-income and minority neighborhoods, so if you look for data about arrests, you will obviously turn up more low-income and minority people. Crimes occur in other neighborhoods too, but are less likely to be recorded. Predictive analytics may then drive police departments to devote yet more forces to low-income and minority neighborhoods, with no knowledge that unreported crimes are occurring elsewhere. The computer code may be perfectly fair, but the data is skewed.
Policing, job interviews, and other uses of analytics may also involve selection bias: ignoring factors that would improve the input's diversity and representativeness.
Suppose, for instance, that firms have historically done well by hiring candidates who attended Ivy League schools or other prestigious institutions. These institutions are not representative of the larger population (although many try hard to be more representative through their acceptance criteria). Either by human choice or by running analytics on historical data, an algorithm may be designed to favor candidates from the prestigious institutions. Other factors, of which the analysts might be totally ignorant, might have highlighted the achievements and potential of diverse candidates--for instance, achievements as volunteers in community organizations. But this data might not be collected, or might not be organized into a structured form that the analytics can consume, or might just never be considered by the people designing the analytics. In short, the criteria chosen to develop the analytics may cause bias.
How does human bias infect the data fed into predictive analytics?
The previous answer explained how data scientists and managers who design predictive analytics tend to look for types of data that are familiar to them, which translates into data that reflects their own upbringing and experience. This then produces bias.
Bias from the general population can also translate into bias in analytics. Although no one knows the reason for Sweeney's findings that searches for common African-American names are more likely to turn up ads for "arrest records", we can probably rule out bias by Google or by the firm that served up most of the ads, Instant Checkmate. Google and Instant Checkmate denied they were deliberately targeting African-American names, and they would be unlikely to do so because of the legal and publicity firestorm such targeting could cause.
The most believable explanation for Sweeney's findings is that ordinary Google visitors--millions of them--type in requests for arrest records more often for African-American names than for others. Google's algorithm for choosing AdSense ads picks up on what people ask for. So these millions of everyday Google visitors are (according to the most likely hypothesis) creating biased data.
Can predictive analytics avoid bias?
Perhaps the most confounding problem in attacking bias is that different worthy goals can be in irreconcilable conflict. This was revealed as data scientists reviewed the highly publicized ProPublica article on criminal sentencing. The company that produced the sentencing software, Northpointe, rebutted ProPublica by claiming that its software was fair and accurate because it had equal accuracy rates for predicting whether a black or white convict would commit another crime. (Actually, the analytics made a lot of mistakes on both populations.) But ProPublica focused on the "false positive" rate, showing that black people were more often rated as a risk for re-offending when they did not actually re-offend.
In other words, Northpointe compared black and white convicts directly to demonstrate fairness. ProPublica compared predictions against actual outcomes within each group, and then compared those error rates between black and white people, to demonstrate unfairness.
Other articles, such as one by The American Conservative, showed that Northpointe's claims and ProPublica's claims are both accurate--and even more problematically, that they cannot be reconciled. Black people in the population surveyed were more likely to re-offend, so any algorithm that predicts recidivism must choose between Northpointe's idea of fairness and ProPublica's idea of fairness. Other research has generalized these findings, showing that when two populations differ in their underlying rates of behavior, no algorithm can satisfy every criterion for fairness at once.
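The arithmetic behind the impasse is easy to reproduce. The following sketch uses made-up counts (not the actual COMPAS data) for two groups with different underlying rates of re-offense. It computes the measure Northpointe emphasized (accuracy within each group) and the measure ProPublica emphasized (the false positive rate within each group); with these numbers the accuracies come out equal while the false positive rates diverge, which is the heart of the conflict.

    def rates(outcomes):
        """outcomes: list of (predicted_reoffend, actually_reoffended) pairs for one group."""
        correct = sum(1 for pred, actual in outcomes if pred == actual)
        false_pos = sum(1 for pred, actual in outcomes if pred and not actual)
        negatives = sum(1 for _, actual in outcomes if not actual)
        return correct / len(outcomes), false_pos / negatives

    # Made-up counts, chosen only to show the arithmetic.
    # Group A has a higher base rate of re-offense than group B.
    group_a = [(True, True)] * 50 + [(False, True)] * 10 + [(True, False)] * 20 + [(False, False)] * 20
    group_b = [(True, True)] * 20 + [(False, True)] * 10 + [(True, False)] * 20 + [(False, False)] * 50

    for name, group in (("A", group_a), ("B", group_b)):
        accuracy, false_positive_rate = rates(group)
        print(f"Group {name}: accuracy={accuracy:.2f}, false positive rate={false_positive_rate:.2f}")

Both groups come out at 70 percent accuracy, yet group A's false positive rate is 0.50 against group B's 0.29: fair by one yardstick, unfair by the other.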
No one to my knowledge has resolved this dilemma. We all feel in our guts that fairness is important--it is a trait of all humans and many other species--but we have to choose what kind of fairness to pursue.
How can expert reviewers and product users check for bias?
This is also hard to do. The examples of bias cited in this article were discovered after the analytics were put into production, when they had already potentially affected thousands of people. But once the analytics are in the field, they can become self-fulfilling prophecies. For instance, if you give someone a loan and she defaults on it, you can acknowledge that your algorithm gave you a false result. But suppose your algorithm is biased against a whole class of people and you deny loans to them. How can you know whether they would have repaid the loans? Perhaps you could find other institutions that gave the same people loans and discover that the loans were repaid, but the chance of discovering such neat experimental results is minimal.
Most developers try to avoid letting bad analytics out into the field--checking for all kinds of errors, not just the kinds of discrimination covered in this article--by separating their input data into training and test data. But unless the developers specifically look for evidence of bias, they will not find it. As we saw earlier, developers may be blind to selection bias and other factors that distort input data. So the first task in establishing fairness is to set up special checks for bias affecting protected classes. Some researchers offer ways to audit analytics for bias.
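What such a check might look like in practice: the sketch below adds a group-by-group comparison to an ordinary train/test workflow. The data is synthetic and the protected attribute ("gender", coded 0/1) is a hypothetical placeholder; a real audit would apply richer fairness metrics to real holdout data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 10_000
    gender = rng.integers(0, 2, n)                    # hypothetical protected attribute
    income = rng.normal(50 + 5 * gender, 15, n)       # a "legitimate-looking" feature,
                                                      # correlated with the protected attribute
    approved = (income + rng.normal(0, 10, n) > 50).astype(int)  # synthetic label

    X = np.column_stack([income])
    X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
        X, approved, gender, test_size=0.3, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)

    # The extra step: compare approval rates and accuracy across the protected groups
    # on the held-out test data, not just overall error.
    for group in (0, 1):
        mask = g_test == group
        print(f"group {group}: approval rate {predictions[mask].mean():.2f}, "
              f"accuracy {(predictions[mask] == y_test[mask]).mean():.2f}")

Because the feature is correlated with the protected attribute, the two groups end up with noticeably different approval rates even though the code never mentions gender--exactly the kind of disparity an ordinary accuracy check would miss.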
In addition, many researchers (such as the Dwork team mentioned earlier) seek ways for data scientists and coders to discover bias during development. Because the world changes around us, and changes the data we consume along the way, these checks for bias should be repeated regularly, and the historical results should be reviewed as well.
Can we prevent bias through transparency and open source code?
Providing code for expert review is of dubious value. As we have seen, bias usually lies in the way input data was collected and organized rather than in the code. If there was a problem in the code, experts would have difficulty finding it, given the code's size and complexity. Furthermore, modern analytics often employ machine learning techniques that produce algorithms one or more layers removed from human coding. The humans do not explicitly tell the machine what to do, but provide guidelines for the analysis to find its own correlations and relationships. As the New York Times said about Google’s translation efforts, "Some of the stuff was not done in full consciousness. They didn’t know themselves why they worked."
What should governments do to help?
A common complaint is that laws and regulations move too slowly to keep up with modern technological change. And indeed, governments have not done much yet in response to the intense publicity around bias in predictive analytics.
The European Union's new General Data Protection Regulation calls on organizations using analytics to provide information about these analytics to people who want to challenge the findings returned by them. However, the regulation may not be strong enough to be applied in practice. Furthermore, as we have seen, it will be hard to adhere to the regulation and show how machine learning produces its precise results.
The Obama administration, in its investigation of the risks and value of data analysis, produced a report in May 2016 mentioning the issues of bias: Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights.
Direct interventions into analytics, such as those attempted by the European regulation, may prove unworkable. But governments can re-affirm the prohibition on discrimination against protected classes and emphasize that organizations retain responsibility for obeying these laws whether or not they use predictive analytics to make decisions.
As with cybersecurity and privacy, the government can set up standards bodies to help organizations develop good practices in the area of predictive analytics. It can encourage these organizations to police themselves and can provide safe harbors so that organizations using currently accepted best practices can cite them in court if accused of bias.
Where does the individual stand in relation to predictive analytics?
Bias in algorithms is an expression of the growing power gap between institutions and individuals. A merchant may take years of my purchase history into account, along with other personal information derived from a host of sources, when offering me a product or price, but I don't possess an equivalent knowledge of the merchant's product line, production costs, or inventory. If I knew as much about the merchant as it knew about me, and could run analytics just as sophisticated as the merchant's, our encounter around a price would be fair. As things are now, I may be able to use price-check sites to compare different merchants' offerings, but I can't evaluate them on the basis of the merchants' potential motivations and pressures.
In many ways, institutions are opaque to the people they affect. In the old days, a person might deny me a loan or reject me for a job, and I couldn't look inside his head to determine whether he was biased (although his speech or body language might give me clues). Nowadays, it may be an opaque algorithm rather than an opaque interviewer that I face. I may have no idea that predictive analytics were used at all. If I know analytics were used, I don't have the right to look at information about the analytics--and even if I did, the information would probably be insufficient to let me know whether bias was involved.
Thus, many factors go into the lack of accountability for bias in analytics. Transparency about their use can help. Increased privacy for the user would help even more. But ultimately, if analytics discriminate against us on the basis of race, gender, or other factors, it's because the same discrimination exists throughout everyday social interactions. We must ameliorate the underlying social discrimination in order to eliminate algorithmic bias.
_______________________________________________________________
Andy Oram is a writer and editor at O'Reilly Media. As editor, he brought to publication O'Reilly's Linux series, the ground-breaking book Peer-to-Peer, and the best-seller Beautiful Code. In print, his articles have appeared in The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, DebConf, and LibrePlanet. Andy participates in several groups in the Association for Computing Machinery policy organization, USACM. He also writes for various web sites about health IT and about issues in computing and policy.