Twitter’s algorithm misidentifies harmless tweet as “sensitive content” (April 2018)

Automated image detection leads to a harmless image being categorized as sensitive, prompting concerns over algorithmic accuracy and over-moderation.


While some Twitter users welcome the chance to view and interact with “sensitive” content, most do not. Twitter utilizes algorithms to detect content average users would like to avoid seeing, especially if they’ve opted in to Twitter’s content filtering via their user preferences.

Unfortunately, software can’t always distinguish what is offensive from what merely looks offensive to the automated eye that constantly scans uploads for anything that should be hidden from public view unless the viewer has expressed a preference to see it.

A long-running and well-respected Twitter account that focused on the weirder aspects of Nintendo’s history found itself caught in Twitter’s filters after tweeting an image of an actor putting on his Princess Peach costume. The image focused on the massive Princess Peach head, which apparently contained enough flesh color and “sensitive” shapes to get both the image and the Twitter account flagged as “sensitive.”

The user behind the account tested whether Twitter’s algorithm or something else was setting off the “sensitive” filter. Dummy accounts tweeting the same image were flagged almost immediately, indicating that the image itself, rather than other content in the user’s original account, had triggered the automatic moderation.

Unfortunately, the account was likely followed by several users who never expected it to suddenly shift to “sensitive” content. Thanks to the algorithm, the entire account was flagged as “sensitive,” possibly resulting in the account losing followers.

Twitter ultimately removed the flag, but the user was never directly contacted by Twitter about the alleged violation.

Decisions to be made by Twitter:

  • Are false positives common enough that a notification process should be implemented?
  • Should the process be backstopped by human moderators? If so, at what point does double-checking the algorithm become unprofitable?
  • Would a challenge process that involved affected users limit collateral damage caused by AI mistakes?
  • Does sensitive content negatively affect enough users that over-blocking/over-moderation is acceptable?

Questions and policy implications to consider:

  • Should Twitter change its content rules to further deter the posting of sensitive content?
  • Given Twitter’s reputation as a porn-friendly social media platform, would stricter moderation of sensitive content result in a noticeable loss of users?
  • Should Twitter remain one of the few social media outlets that welcomes “adult” content?
  • If users are able to opt out of filtering at any point, is Twitter doing anything to ensure younger users aren’t exposed to sensitive material?


Twitter removed the flag on the user’s account. According to the user behind the account, it took the work of an employee “behind the scenes” to remove the “sensitive content” warning. Since there was no communication between Twitter and the user, it’s unknown whether Twitter has implemented any measures to limit future mischaracterizations of uploaded content.

Written by The Copia Institute, August 2020
