
An obscure app has scraped 3 billion photos to use for facial recognition and is selling its services to law enforcement

The photos you post on social media, the videos that you upload to sites like YouTube and Vimeo — it isn't just your friends who are looking at them. According to a report from the New York Times, a largely unknown company called Clearview AI has been scraping every inch of the internet for photos of faces in order to train facial recognition technology. It has in turn sold access to its massive and comprehensive database to hundreds of law enforcement agencies including the FBI and the Department of Homeland Security.

Clearview AI's origins and goals

If you haven't heard of Clearview AI before, you're not alone. The company operated largely in the shadows until it was brought to light by the Times. For some time now, the tiny startup has been accumulating a huge collection of faces — more than three billion images, all taken from sites like Facebook, YouTube, Venmo and any number of random sites that happen to host pictures of people. The result is an incredibly powerful facial recognition tool — one that greatly outstrips any of its competitors in the space. According to the New York Times, the database can be fed a single photo of a person and produce dozens of other images of them, complete with links to where those photos originally appear online.

Beyond simply hosting this unmatched database of public images, Clearview AI has a broader vision of how it plans to use its technology. It reportedly has structured its technology to be used with cameras, scanning people in real time and identifying them on the fly. There is also code within the company's application that would allow it to be paired with augmented reality glasses — think something like Google Glass or Microsoft's HoloLens. This would enable a member of law enforcement to walk down the street and identify just about every person they saw — providing information like names, addresses and other biographical details, all right through their lenses. The level of invasiveness that these use cases open up is previously unheard of. While there are plenty of efforts to create facial recognition technology that can interact with cameras, the ability to scan and identify people in real time — even people who are not suspected of any criminal act — represents the potential for a massive invasion of privacy.

The trouble with facial recognition technology

The pursuit of facial recognition technology has kicked into full gear over the last decade or so, as new technology has become available to make the concept more viable. Artificial intelligence-driven systems capable of identifying people automatically are something of a holy grail for police and law enforcement, who view the technology as a tool that can simplify their jobs. Under the most generous interpretation of its use case — one that assumes no misuse, abuse or failure — police could quickly locate a criminal by simply waiting for them to pass in front of a camera equipped with facial recognition technology. It could keep dangerous people from traveling or being in places where they may put others at risk, for example.

Of course, there are very few guardrails to ensure that facial recognition technology doesn't succumb to the many potential pitfalls that could turn it from a well-meaning tool into an invasive, privacy-destroying, guilt-presuming monstrosity that reinforces negative and false stereotypes about the people it monitors. Most facial recognition technology on the market today produces a high number of false positives, meaning it misidentifies people with alarming regularity. A report from The Independent found that systems used in the United Kingdom produced a 98 percent false positive rate. Even the most accurate systems produce about a 10 percent false positive rate, according to testing done by the United States' National Institute of Standards and Technology (NIST). That means roughly one in every 10 matches those systems return is a misidentification.
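To put those error rates in perspective, here is a minimal back-of-the-envelope sketch. The 10 percent and 98 percent figures are the NIST and Independent rates cited above; the crowd size is a hypothetical assumption chosen purely for illustration, and the sketch treats each scan as an independent comparison.

```python
# Back-of-the-envelope: how many innocent people a live camera would
# wrongly flag at the false positive rates cited above.
# The crowd size (10,000) is a hypothetical assumption for illustration.

def expected_false_positives(false_positive_rate: float, people_scanned: int) -> int:
    """Expected number of people incorrectly flagged, assuming each
    scan is an independent comparison at the given error rate."""
    return round(false_positive_rate * people_scanned)

# NIST's best-case rate (10%) applied to a crowd of 10,000 passers-by:
print(expected_false_positives(0.10, 10_000))  # 1000 wrong flags

# The 98% rate The Independent reported for UK systems:
print(expected_false_positives(0.98, 10_000))  # 9800 wrong flags
```

Even under the most charitable accuracy numbers, real-time scanning of public crowds would generate wrong matches by the hundreds, which is why the lack of auditing discussed below matters so much.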

These problems are largely driven by the tech's inability to accurately identify the faces of people from marginalized communities. NIST found that facial recognition systems produce false matches for Black women 10 times as often as they do for white women. Tests have also shown that these systems suffer from significant racial bias that results in falsely identifying people of color, particularly Black and Asian people. Similarly, these systems are largely incapable of accurately identifying trans and nonbinary people. A recent study found that while facial recognition technology accurately identifies the gender of cis men 97.6 percent of the time and cis women 98.4 percent of the time, it misgenders trans men nearly 40 percent of the time. The systems also fail to correctly identify agender, genderqueer and nonbinary people 100 percent of the time, as most systems do not account for anyone who does not fit binary gender identifiers.

Opportunities for abuse

Tools that misidentify people create the opportunity to falsely accuse people of crimes they didn't commit. Given that these tools regularly misidentify people of color and gender-nonconforming people — communities that have regularly been over-policed and exposed to violence and mistreatment at the hands of law enforcement — it is troubling to think that police may use facial recognition to automate part of their work when the technology has bias baked into it. But Clearview AI is promising a much more accurate database — one that presumably would avoid many of those false positives simply by being a comprehensive database of just about everyone who publicly shares images of themselves online.

The problem is that accuracy does not ensure proper use. According to the New York Times, Clearview AI's technology has not been vetted by experts to make sure that it abides by best practices for protecting individual privacy. There are also very few, if any, standards in place for this type of technology — and any guidelines that do exist are largely ignored. According to a Government Accountability Office (GAO) report, the FBI has chosen not to conduct testing and audits of its facial recognition technology. These audits are intended to verify how accurate the tool is and to ensure it complies with all internal rules, but the law enforcement agency has simply used the system without them. The ACLU has called for an "ethical framework" for facial recognition that includes principles intended to protect people's privacy at every step of the process, but actual adoption of such a framework is still a long way off.

As cities, states and law enforcement hash out the rules for facial recognition technology, the databases continue to be built. The FBI's own database now has more than 641 million photos, according to the GAO, while companies like Clearview AI are turning our penchant for sharing into tools that can be used to identify us on the fly. All of this is taking place without our explicit consent and is creating tools that monitor and attempt to identify us, whether we have done anything wrong or not. There is a presumption of guilt that comes along with this level of surveillance — a feeling that even if you haven't done anything wrong yet, you should be watched because you might at some point. Companies like Clearview AI and partners in law enforcement are moving ahead with systems that are largely untested, unproven and unregulated. At this rate, the public will be their guinea pigs and may suffer from invasions of privacy, bias and misidentifications while we wait for the law and technology to get up to speed.