How AI lets bigots and trolls flourish while censoring LGBTQ+ voices

Illustration by Lorenza Centi

In 2019, Thiago Dias Oliva gathered over 114,000 tweets from the accounts of drag queens featured on RuPaul’s Drag Race like Bianca Del Rio, white supremacists like David Duke, and political figures like Michelle Obama and Donald Trump. Then he threw them all into Perspective, an artificial intelligence tool launched by Google’s Jigsaw tech incubator in 2017 that assigns every piece of text it scans a “toxicity” score.

“Perspective is one of the most advanced tools of this sort on the market,” says Oliva. Dozens of sites, including major media outlets like The New York Times, use it to help them keep hateful and harassing user posts off of their platforms. In early 2019, Perspective also launched Tune, a Chrome browser plugin that automatically hides toxic-rated posts for users browsing Facebook, Reddit, Twitter, and other social media sites. Oliva, a researcher who studies anti-LGBTQ+ hate speech and efforts to control it online, wanted to see how the tool would react to drag queens’ mock impolite speech, like their constructive uses of reclaimed slurs, versus the dog whistles white supremacists use to make their hateful bile sound calm and reasonable. He threw in Obama and Trump’s tweets as well-known points of comparison for civil and toxic posts, respectively.
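For reference, sites reach Perspective through Google's Comment Analyzer REST endpoint. Below is a minimal sketch of how a platform might build a scoring request and read back the toxicity score; the payload shape follows Perspective's public API documentation, but the exact fields should be checked against Jigsaw's current docs rather than treated as definitive:

```python
# Sketch of a Perspective API round trip. A real caller would POST the
# request body to the endpoint below with an API key attached; here we
# only build the payload and parse a response, so nothing is sent.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text: str) -> dict:
    """Assemble the JSON body asking Perspective to score TOXICITY."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_score(response: dict) -> float:
    """Pull the 0-to-1 summary score out of a Perspective response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

A site would then hold or hide any comment whose score exceeds whatever threshold it chooses, which is why miscalibrated scores translate directly into removed posts.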

The results, which he and two colleagues published in the academic journal Sexuality & Culture late last year, were grim: Perspective rated over a dozen drag queens’ accounts on average more toxic than Duke’s. It rated just a fourth of the drag queen accounts Oliva fed it less toxic than leading neo-Nazi Richard Spencer. Notably, Perspective decided a tweet from the virulent white nationalist Stefan Molyneux claiming “the three major races have different brain volumes and different average IQs,” a blatantly hateful, ignorant, and inflammatory statement, was 21.7% toxic. Meanwhile, it decided that a tweet by drag queen Mayhem Miller that simply states, “I am black. I am gay. I am a man. I am a drag queen. If those are not enough for you… kindly, fuck off!!!” was 95.98% toxic.

The results didn’t surprise Oliva. He didn’t tell any drag queens about his plans to study their accounts or about his findings. But when Mic told Yuhua Hamasaki that Perspective ranked her one of the most toxic drag queens on Twitter — and far more toxic than Duke or Trump — she wasn’t surprised either. Queer activists and creators have long argued that many automatic content monitoring systems, not just those that run on Perspective, over-police them while under-policing most bigots and provocateurs. Even systems “created to curtail [anti-LGBTQ+] hate speech are often inherently problematic themselves,” Avatara Smith-Carrington, an attorney and researcher at the LGBTQ+ civil rights group Lambda Legal, tells Mic.

Artificial intelligence experts, who’ve collected many examples of harmful bias in AI systems in recent years, say such ironies are the unfortunate but inevitable result of a few major limitations of this sort of tech. “It’s important to bring these limitations to the attention of the general public and policy makers,” says Oliva, especially as “technocratic narratives are selling AI tools as bulletproof solutions to all sorts of everyday problems, including content moderation.”

Bigots and trolls often spout off about how they ought to be able to say whatever they want online with impunity, and how even so much as putting content warning labels on their posts will lead to the inevitable collapse of free and vibrant speech. But as Jigsaw’s Lucy Vasserman pointed out in a recent talk, unchecked digital hate and vitriol alienate most people online, narrowing the number and diversity of voices in debates and all too often resulting in real harm to marginalized communities. So, if we want the internet to be a place for constructive dialogue — the thing many hoped it would be in the late 20th century — we do need to curb actual toxicity online.

But that’s much easier said than done. Human content moderators struggle, often on low wages and long shifts, to keep up with the impossible deluge of content people post online every day. They may make mistakes, or the rules they’re trained on may contain ambiguities, gaps, and flaws. And they have to review so many horrible texts and images, including scenes of death and mutilation, on a daily basis that the job of moderating content often takes a grueling toll on their psyches.


In the face of mounting criticism about the spread of hate and disinformation online in recent years, tech companies have increasingly argued that AI tools will be able to solve everything. They can learn complex rules and apply them dispassionately at lightning speed, even potentially providing feedback on something a user is typing or uploading, before they’ve posted it. In some cases, they can learn as they go, developing sharper and more nuanced eyes than human moderators. And they won’t suffer when they review even the most heinous content.

But AI systems just reflect the things their creators plug into them. If you feed them biased data, they will learn and perpetuate the biases within it. If you ask them to look at something they didn’t see in their training data, they won’t know what to do with it. And they’re not good at picking up on context and nuance, unless you train them to recognize a given context or nuance.
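That failure mode is easy to reproduce in miniature. In the sketch below, a toy word-frequency scorer is trained on a skewed corpus in which an identity term happens to appear only inside comments labeled toxic; every comment and label here is invented for illustration, and real systems are far more sophisticated, but the underlying dynamic is the same:

```python
from collections import Counter

# A deliberately tiny, biased "training set": the word "gay" appears only
# in comments humans flagged as toxic (because bigots used it in insults),
# so a naive model learns the word itself as a toxicity signal.
training_data = [
    ("you are an idiot", 1),
    ("gay people are terrible", 1),
    ("what a gay disgrace", 1),
    ("have a lovely day", 0),
    ("great point, thanks", 0),
    ("see you at the game", 0),
]

def train(data):
    """Estimate P(toxic) per word, with add-one smoothing."""
    toxic, total = Counter(), Counter()
    for text, label in data:
        for word in text.split():
            total[word] += 1
            toxic[word] += label
    return {w: (toxic[w] + 1) / (total[w] + 2) for w in total}

def score(model, text):
    """Average the per-word toxicity rates; unseen words count as neutral (0.5)."""
    words = text.split()
    return sum(model.get(w, 0.5) for w in words) / len(words)

model = train(training_data)
# A neutral self-description inherits a high score purely because of the
# identity term, while a comparable sentence without it stays neutral.
print(score(model, "i am gay"))    # higher
print(score(model, "i am happy"))  # lower
```

Nothing about the word was toxic; the correlation in the data was, and the model faithfully memorized it.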

Yet hate speech and harassment are all about context, says Jillian York of the Electronic Frontier Foundation, a digital civil liberties group. The same string of words can carry a very different set of meanings depending on who is uttering them, why, and to whom. Marginalized groups often turn slurs directed against them into sources of empowerment, and general insults into terms of endearment. Meanwhile, bigots often display a talent for turning the most innocuous language into bilious cudgels — with just the right nods and winks to unspoken ideas or external events.

In theory, developers can account for all of this, feeding their systems balanced or unbiased data and tweaking their algorithms to account for specific nuances from the start. But only if they know to consider these biases and contexts, and believe that the often-substantial investment of accounting for them is worth it. The tech industry struggles with diversity, so many extant AI tools were designed without minority groups in the room to advocate for such considerations. Facebook in particular until recently rejected most calls for nuance, insisting it needed to use consistent rules that reflect its global operations.

In many cases, these tendencies resulted in “language from minoritized communities [being] poorly handled by [AI-enabled] content moderation systems,” says Su Lin Blodgett, a computer scientist who specializes in digital language analysis, “often resulting in disproportionate removals.”


“Members of minoritized communities need to be hyper-careful about language use,” she adds, even when they are speaking up about injustices or speaking constructively to each other.

“Most everyone in the content moderation world is aware of these massive blind spots, regarding how AI fails to understand context,” says tech ethicist David Ryan Polgar. Insiders tend to insist their systems will get better over time, as they consume more, and more nuanced, data and get tuned and tweaked in response to legitimate critiques. Many tech firms have, over the last couple of years, made public commitments to building more diverse staffs, hired social scientists to help them, and created anti-bias principles and auditing protocols to improve their AI systems.

Granted, these sorts of improvements will take time, Blodgett notes. “Finding people who may have been affected [by a system], building relationships of trust to be able to understand their experiences, bringing them in as research co-participants, and so forth ... is a long-term effort.”

Jigsaw claims it gave researchers open access to Perspective from day one to invite scrutiny for potential oversights. In her December talk, Vasserman noted that they quickly learned that Perspective had labeled neutral identity terms as toxic thanks to their inclusion in insulting comments, such that it deemed the sentence “I am a gay black woman” 87% toxic while “I am a man” came out fairly neutral. Early versions of Perspective also considered any use of the word “fuck” a sign of toxicity, York adds, “without differentiation between ‘fuck you’ and ‘fuck yeah’.”
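York’s example boils down to keyword matching with no model of context. A deliberately naive sketch (the blocklist and scoring rule here are invented, not Perspective’s actual mechanism) shows why such a filter cannot tell hostility from enthusiasm:

```python
# A context-blind keyword filter: it flags any use of a blocklisted word,
# so a hostile "fuck you" and an enthusiastic "fuck yeah" score identically.
BLOCKLIST = {"fuck"}

def naive_toxicity(text: str) -> float:
    """Fraction of words in the text that appear on the blocklist."""
    words = text.lower().split()
    return sum(w in BLOCKLIST for w in words) / len(words)

print(naive_toxicity("fuck you"))   # 0.5
print(naive_toxicity("fuck yeah"))  # 0.5 -- same score, opposite sentiment
```

Distinguishing the two requires modeling the surrounding words and intent, which is exactly the retraining work Jigsaw describes below.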

Vasserman also claimed that the Jigsaw team promptly started feeding new data into its system to make its scoring of identity terms much more neutral as soon as it learned about this issue.

Oliva says that his team shared their findings with a Google official in 2019, who passed them on to the Jigsaw team. They “contacted us to say they appreciated our work and have been discussing the issue we raised,” he adds. Vasserman specifically acknowledged Oliva’s work in her talk, admitting that it shows they “still have a lot of work to do” to control for system misjudgments.

A representative from Jigsaw tells Mic the team is “conducting several experiments to address some more nuanced challenges on this issue.” He offered to share more details, but when Mic took him up on that he replied: “We don’t have any additional details to share.” He later clarified that the team decided it would be better to share updates when they finish their experiments, adding that they’re “in partnership with leading academics and researchers in the field.”


Even this is more transparency than many AI teams offer. Big tech firms have an irksome habit of arguing that they need to protect their proprietary systems and therefore cannot let outsiders take a look under the hood, even to hear exactly how they’re addressing flaws and misfires like the one Oliva identified. That makes it incredibly hard for watchdogs to judge whether the effort AI teams say they’re putting into making their systems fair and unbiased is sincere and meaningful, relative to the scale of the challenges they face.

York for one doubts that most of these companies are really dedicated to investing meaningful time and energy into debiasing their systems. “The larger companies — Facebook, Google — have the capacity, the funding, and the ability to do so much more than they are doing now,” she says.

One of the drag queens whose accounts Oliva plugged into Perspective has her doubts as well. (She asked Mic to withhold her name, because she fears Google might retaliate against her by ramping up its policing of her content.) “If [Jigsaw] were honest about their intent to address their over-policing of our accounts, they would have told you the details of what they’re already doing about it,” she argues.

Jigsaw has stressed that because they’re still working on their system’s responses to nuance, no one should use it to fully automate content moderation. “It’s very important to have humans spot-checking, and to correct when there are mistakes,” Vasserman said in her talk.

Most sites that use AI tools like Perspective follow that rule of thumb — but not all of them. In fact, Jigsaw’s own Tune plugin arguably fully automates content moderation at the user browser level, applying Perspective’s self-admittedly limited analyses without human mediation. When Oliva used the plugin in 2019, it fully censored Hamasaki’s Twitter page, save for her name.

However, even keeping humans in the mix may not alleviate the harm that these tools can cause. After all, humans and the rules they operate under may also fail to recognize context or nuance and affirm bad AI decisions that get a post or an account taken down. Most sites allow users to appeal decisions like this, but doing so can be hard if, as is often the case, you don’t know how a site made a moderation decision, or when the appeal process itself is opaque and labyrinthine. And if a site temporarily removes a post, or hides it behind a warning, while a human reviews an AI system’s initial judgment, even if they eventually reinstate the post they’ll still have silenced meaningful speech for hours, days, weeks.

Those actions are all chilling and disruptive enough for queer folk trying to speak openly about their experiences and build community. They can be especially damaging for drag queens and others in the community who depend on social media to make a living. “Promotional brands look at engagements and likes,” Hamasaki tells Mic. “If all the work that we’re putting in is not being seen or reflected because of these systems’ decisions, that affects our money as content creators.”

Oliva urges Jigsaw and other teams working on AI content analysis tools to consider these harsh potential implications of their technology, and in light of them to commit to ever more substantive and transparent debiasing efforts. Hamasaki thinks that, at a bare minimum, every tech company should consult with LGBTQ+ individuals before launching future products.

But considering all of these pitfalls, Polgar, the tech ethicist, argues one thing is clear: No matter how much Big Tech wills it to be so, “AI will never be a silver bullet for content moderation.”