Trolls keep outsmarting anti-harassment tools. Will Twitter's new system actually work?


Brianna Wu had expected some blowback when she announced she was running for Congress. For the last three years, the game developer has been a frequent target of Gamergate, a rabid online subcommunity that protests political correctness by harassing women and minorities online. But this time, she faced trolling she'd never seen before: spambots that created and promoted fake scandals about her.

The bots were a 4chan creation, Wu said, and they would tweet the same messages over and over again, like "'Offensive in Every Way' Brianna Wu (D?) Is a Racist." They spread the hoaxes by spamming Massachusetts hashtags like #MaPoli and #BoPoli. When she reported these accounts and tweets to Twitter, Twitter said it found one account to be in violation of its policies. But the account is still active, she said, and the harassment has continued.

"As far as the kind of trolling I expected when running for Congress, I expected it to be a return to Gamergate, and it has been," Wu said in an email on Wednesday. "It's an attempt to assassinate my character with local voters."


A beacon of hope amid the cesspool of harassment

Stories like Wu's are a big reason Twitter's new anti-abuse tools appear so promising. Announced Tuesday, the new features will automatically surface "the most relevant conversations" in reply threads and remove "potentially sensitive content" from search results. The features also purport to target abusive users, not just their tweets. Twitter Vice President of Engineering Ed Ho said in a blog post on Tuesday that the company aims to stop permanently suspended repeat offenders from creating new accounts. This specifically targets users hatching a bunch of eggs (that is, new accounts with default "egg" avatars) with the sole intention of harassment, like the infantry of bots Wu mentioned.

This update, unlike Twitter's many other attempts to make its platform less of a nightmare for its most vulnerable users, shifts the burden away from the victims. Brittan Heller, the Director of Technology and Society at the Anti-Defamation League, is hopeful about the changes. 

"In my experience working with people who have been the victims of extraordinary online harassment, I think that shows that this will be a very useful improvement for that," Heller said. When she worked with users who were harassed on Twitter, "the army of eggs that would follow … would amplify the abuse and make it seem even worse. This seems like a direct way to curb that."

But not all tools are troll-proof

Two of the features Twitter rolled out — safer search results as well as the ability to collapse reply threads to hide "potentially abusive and low-quality" responses — are begging to be outfoxed.

"The pace of innovation for abuse far outstrips the pace of innovation for good," Heller said. "It's inevitable that trolls are going to try to find a way to game the system."


Heller noted that the usefulness of the two user-facing features Twitter announced on Tuesday will depend on two things: one, how the company defines what sensitive content is, and two, what counts as a relevant conversation. That information has not been revealed to the public.

"If you allow users to explain why something should be considered sensitive content, and not just check a box that says it's sensitive or not, that's the way they can get the benefit of diversity of their users," Heller said.

How trolls have dodged anti-abuse tools to harass women and minorities

When a company leans on automation to pinpoint harassment, trolls tend to find new ways to fly under its radar. 4chan users did this to circumvent Google's anti-harassment tool Conversation AI, which aimed to tackle hate speech online. To work around it, trolls developed a full list of slang using innocent-sounding words to covertly make racist statements without being flagged. For example, Skype, Yahoo and Skittle mean Jew, Mexican and Muslim, respectively. (A bot may not understand that a Pepe the Frog cartoon tweeting "Gas the Skypes" at someone is a discriminatory attack.)
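To see why code words defeat this kind of detection, consider a toy keyword filter. This is a hypothetical sketch, not Conversation AI's actual design; the blocklist and example phrases are drawn from the reporting above.

```python
# A naive filter that flags tweets containing known hateful phrases.
BLOCKLIST = {"gas the jews"}

def is_flagged(tweet: str) -> bool:
    text = tweet.lower()
    return any(phrase in text for phrase in BLOCKLIST)

print(is_flagged("Gas the Jews"))    # True — the literal phrase is caught
print(is_flagged("Gas the Skypes"))  # False — the code word slips through
```

Until a human adds "skype" and its siblings to the blocklist, the coded version of the same attack passes unflagged, and trolls can coin new code words faster than moderators can catalog them.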

In 2014, white nationalist podcast hosts coined a slur known as (((echoes))) — three parentheses placed around a Jewish person's surname online to out them to other neo-Nazis. The symbol is not searchable on most social networks, and search engines strip punctuation from results, so abusers could harass Jewish users and send dog-whistles to their followers, and their threats would be essentially invisible via the search bar.
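The reason (((echoes))) is invisible to search is mechanical: indexers typically strip punctuation before matching. A simplified, hypothetical tokenizer (not any platform's real pipeline) shows the effect.

```python
import string

def tokenize(text: str) -> list[str]:
    # Strip all punctuation, then lowercase and split on whitespace —
    # roughly what a simple search indexer does before matching.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().split()

print(tokenize("(((Goldberg)))"))  # ['goldberg'] — the parentheses vanish
```

Because the triple parentheses never make it into the index, searching for "(((" returns nothing, even though the symbol is plainly visible to the targeted user and their harassers.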


And when Twitter announced an anti-harassment feature that lets users mute specific words, phrases and entire conversations, it was met with some skepticism. Again, this puts the onus on users to fend off their own attackers. It asks them to put themselves in the heads of their abusers and guess which words or phrases might be spewed at them, adding each one to a snowballing list of slurs, including intentionally misspelled slurs.
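The weakness of word-muting is that it matches exactly what the victim typed. A minimal sketch, with "badword" standing in for a real slur, shows how a one-character misspelling slips past a mute list.

```python
# A user's mute list, matched word-for-word against incoming tweets.
muted = {"badword"}

def is_muted(tweet: str) -> bool:
    return any(word in muted for word in tweet.lower().split())

print(is_muted("you are a badword"))  # True — exact match is muted
print(is_muted("you are a b4dword"))  # False — one character dodges the mute
```

Every variant an abuser invents forces the victim to add yet another entry, which is exactly the snowballing burden critics describe.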

Can Twitter's new system stand up to these tactics? 

The technology exists — but we'll see how well it works.

Mic asked Leslie Miley, a former engineer at Twitter who started the product safety and security team that handled abuse, if Twitter is capable of handling the complex mechanisms of creative and dedicated abusers.

The answer: It's complicated but possible. Miley said that trolls can use code words and misspelled words, but that the tooling Twitter had in 2015 would be able to handle those.

"If you start using 'bob' as a code word for some racist term, then it gets really difficult," Miley said. "You can try to do a signal-to-noise ratio" so if accounts are flagged that are part of "affinity groups known for abusive behavior or racist views, the tooling is really easily modified to handle that."
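Miley's signal-to-noise idea can be sketched roughly: weight reports more heavily when the reported account belongs to a cluster already associated with abuse. The cluster names and multiplier below are invented for illustration, not Twitter's actual tooling.

```python
# Hypothetical set of account clusters already flagged for abusive behavior.
FLAGGED_CLUSTERS = {"harassment_ring_42"}

def report_weight(account_clusters: set[str]) -> float:
    """Weight a harassment report by the reported account's affiliations."""
    weight = 1.0
    if account_clusters & FLAGGED_CLUSTERS:
        weight *= 5.0  # reports against known-abusive clusters count more
    return weight

print(report_weight({"gaming", "harassment_ring_42"}))  # 5.0
print(report_weight({"gardening"}))                     # 1.0
```

Even when the words themselves are innocuous code like "bob," the account's affiliations boost the signal enough for the tooling to act.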

Even if these algorithms prove effective, there's still plenty of room for improvement.

Right now, Twitter needs a better reporting process…

Wu believes Twitter needs to have an appeal process that is taken more seriously. When the company didn't find the spambots spewing fake scandals about her to violate the Terms of Service, Wu asked Twitter to take a look at it a second time. Nothing happened. 

"I've never had anything happen because of that," Wu said, adding that Twitter should develop a more transparent policy about what happens when its machine learning makes a mistake. "There are things happening to me literally every day that are blatant violations of the ToS, and I know if Twitter just gives it a pass the first time, nothing is going to happen."

…And a more diverse team

Another step Twitter could take is diversifying the team that handles abuse, a problem that disproportionately impacts women and people of color.

"We have white and Asian men running abuse and harassment teams, engineering teams in particular," Miley said. "Are they going to create the right tools and the right systems to handle it? I think it's almost impossible. Because you fundamentally don't get what it's like to have somebody call you the N-word or somebody say they are going to come and rape you or somebody dox you and put your address and your friend's address out there. If that's never happened to you, how are you going to respond?"