Diversifying Data: Why Technology Continues to Ignore People of Color

As members of the California State Legislature convened to discuss whether facial recognition tools should be used in law enforcement, they were met with an unexpected dilemma. The American Civil Liberties Union of Northern California reported that Amazon’s facial recognition algorithm matched approximately 20 percent of them to criminal mugshots, and a majority of those falsely matched were people of color.

While protests continue over the deaths of George Floyd, Breonna Taylor, and dozens of other black Americans who lost their lives at the hands of law enforcement, many of the world’s largest technology companies have promised millions of dollars toward diversity measures and anti-racism initiatives. However, most have failed to acknowledge their complicity in the propagation of racial bias and profiling within their own industry, one that many deem an “objective” field.

The idea that the research and data sectors function outside the realms of systemic racism is a longstanding misconception. When phrases such as “data doesn’t lie” are constantly thrown around in academia and popular media, it’s easy to see why many would believe that the research coming out of these fields is free from the shackles of bias that entangle so many others. But a quick look at the frameworks behind these discoveries uncovers a lot about the false moral pedestals upon which the tech industry is built.

While many have been critical of the tech industry’s recent demonstrations of covert racism, racial bias is ingrained in the very beginnings of modern technology itself. Although some of the most groundbreaking discoveries in the tech sector came as early as 1904, the field wasn’t democratized until the 1990s, when the frameworks of the internet were introduced. With such a large gap between the creation of the field and its opening to the public, many racial groups were simply left out of its development entirely.

Before the ‘90s, most discoveries were made by the military and homogeneous research firms such as Bell Laboratories. Although Bell Laboratories was one of the few research organizations to offer fellowships for minorities following the civil rights movement, the barriers to entry for interacting with emerging technologies were too high. As a result, many immigrants and people of color simply had no opportunity to contribute to the blooming industry in any meaningful capacity.

We continue to see the lasting impact of this historical lack of representation in technological inventions today. Many phrases common in the computing space, including whitelist and blacklist as well as master and slave terminology, have continued to perpetuate racist and hateful language. Although these phrases may not have been intended to be explicitly racist, their continued use without regard for their racial connotations has persisted far too long in this space.

It took until 2018 for the programming language Python to remove master/slave terminology from its documentation, and the terms whitelist and blacklist remain commonplace in most web filtering platforms today.

As the industry begins to reflect on the oversights built into the foundations of its codebases, it faces many challenges in addressing the racial bias its tools may introduce as fields such as artificial intelligence continue to develop.

Artificial intelligence is based on the concept of giving machines the ability to perform and simulate actions that would generally require human intelligence. While we humans have the benefit of life experiences to aid our decision-making, computers can’t pull from that expansive knowledge. Instead, tech researchers and data scientists must find ways to collect this information and provide it to AI models. While the data being collected to inform these models is improving, it is far from representative of the general knowledge that we possess.

This is evident in the widespread use of facial recognition algorithms, which range from harmless applications such as Snapchat filters to potentially dangerous uses such as identifying criminal suspects. A government study from the National Institute of Standards and Technology (NIST) found that a majority of the 189 recognition algorithms it tested had higher false-positive rates for people of color than they did for Caucasians. This, however, was not true for some algorithms developed in Asian countries.

A closer look at this disparity shows that it isn’t the technology itself that produces these results; it is the people who create it. Dr. James Taylor, a philosophy professor at Duquesne University, highlights why this distinction matters.

“Technology is supposed to be morally neutral, since it is supposed to consist of nothing more than the manipulation or use of human tools and the goals to which humans direct such tools,” Taylor said in his Ph.D. dissertation. “If technology raises any ethical concerns, then those concerns should be related to the goals that humans have in using technology or in the improper use of technology.”

So where is this bias coming from?

Although these facial recognition tools rely on a multitude of algorithms to process images, those algorithms aren’t the sole source of the racial bias we see. It also comes from the information that researchers use while building AI models. When facial recognition models are trained on datasets featuring the faces of thousands of people, most of them Caucasian, they will likely have a fairly easy time identifying Caucasian individuals they have never seen before. But when that same model is given an image or video featuring people of color, it will have a much harder time discerning who those people are, simply because it has not been exposed to them as much as to other groups. However, this is not to say that simply providing algorithms with more data will fix the issue entirely.
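To make that mechanism concrete, here is a minimal, hypothetical sketch in Python of the kind of audit the NIST study performed: measuring a face-matching model’s false-positive (false match) rate separately for each demographic group. The group labels and records are invented for illustration; this is not any vendor’s actual system.

```python
# Hypothetical audit sketch: per-group false-positive rates for a face matcher.
from collections import defaultdict

# Toy evaluation records: (group, model_said_match, is_actually_same_person).
# In a real audit these would come from a labeled benchmark dataset.
records = [
    ("group_a", True,  False),   # false match
    ("group_a", False, False),
    ("group_a", True,  True),
    ("group_b", True,  False),   # false match
    ("group_b", True,  False),   # false match
    ("group_b", False, False),
    ("group_b", True,  True),
]

false_matches = defaultdict(int)
non_match_pairs = defaultdict(int)

for group, predicted_match, same_person in records:
    if not same_person:              # only true non-matches can produce false positives
        non_match_pairs[group] += 1
        if predicted_match:
            false_matches[group] += 1

for group in sorted(non_match_pairs):
    rate = false_matches[group] / non_match_pairs[group]
    print(f"{group}: false-positive rate = {rate:.2f}")
```

A gap between the printed rates is exactly the kind of disparity the NIST researchers reported, and it typically traces back to how much data from each group the model saw during training.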

We must understand that until that bias is eliminated from these systems through a thorough analysis of the algorithms themselves, people of color are put at greater risk of being exploited by these technologies. Multiple law enforcement departments across the country have continued to use facial recognition software to identify suspects in crowds and to look up the records of detainees who do not have an ID. While officers may believe this to be a useful tool, there is no way to verify the accuracy of such technologies when the lives of innocent people are at risk. All it takes is one misclassification by such a tool for a blameless citizen to be arrested for a crime they did not commit.

Despite the multitude of opportunities for technologists to implicitly apply bias against people of color, there are applications in which tech has the potential to remove racial bias. 

A key place that could benefit from these tools is the criminal justice system. Pioneering research has shown that automating the process of bail hearings could significantly reduce crime rates, failure-to-appear citations, and the unfair treatment African Americans receive for the same crimes as others.

Dr. Jon Kleinberg, a professor of computer science at Cornell University, showed the potential for a computer-based system that uses the same information as, if not less than, what is provided to a judge. By feeding an AI model all available data on prior judicial decisions as well as the criminal records of defendants, the resulting algorithm was able to predict the risk of a defendant committing a crime if they posted bail. An important distinction is that the model was never given the race of the defendant, leaving it truly separated from the bias that may come from a judge interacting with a defendant in the courtroom. Although the question of who to hold accountable is still unanswered, we may see implementations of similar algorithms in the future to aid with these decisions.
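The core idea can be sketched in a few lines of Python. This is a hedged simplification of the approach, not Kleinberg’s actual code or data: the column names and records below are hypothetical, and the key point is simply that race is recorded but deliberately excluded from the model’s inputs.

```python
# Simplified sketch: a pretrial risk model trained without the race feature.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical defendant records; "failed_to_appear" is the outcome to predict.
cases = pd.DataFrame({
    "prior_arrests":    [0, 3, 1, 5, 0, 2],
    "prior_failures":   [0, 1, 0, 2, 0, 1],
    "age":              [22, 35, 41, 28, 53, 30],
    "race":             ["a", "b", "a", "b", "a", "b"],  # recorded, never used
    "failed_to_appear": [0, 1, 0, 1, 0, 1],
})

# Drop race from the feature matrix so the model cannot condition on it directly.
features = cases.drop(columns=["race", "failed_to_appear"])
labels = cases["failed_to_appear"]

model = LogisticRegression().fit(features, labels)
print(model.predict_proba(features)[:, 1])  # estimated risk for each defendant
```

Even in this toy form, one caveat from the research carries over: excluding race from the inputs does not guarantee fairness if other features act as proxies for it, which is why the article stresses auditing the algorithms themselves.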

The COMPAS algorithm, a popular tool used to predict the likelihood of a convict reoffending, shows why omitting racial data in criminal justice hearings is so important for keeping racial bias out of these decisions. The algorithm was shown to identify black convicts as almost twice as likely to commit a future crime as white convicts with the same criminal history. Such predictive policing tools have shown time and time again that when models are trained on biased data, they will continue to act in favor of the biases perpetuated through systemic racism.

What is most important in light of this movement is to recognize that it is not enough for technology companies to simply donate to Black Lives Matter initiatives. While the money will undoubtedly help to a certain extent, it does nothing to address the constant threat that people of color face when technology is not made in the interests of everyone. At the end of the day, technology can only be as benevolent as its creators, and without an overhaul of the antiquated practices plaguing the industry itself, it will be impossible to create tools that are truly deemed “objective.”
