Using AI to detect problematic edits on Wikipedia (2015)

Wikipedia implements a tool called ORES that scores content to help editors assess vandalism and improve quality on the platform. As with most automated tools, a number of considerations follow.


Wikipedia is well known as an online encyclopedia that anyone can edit. This openness has enabled the creation of a massive corpus of knowledge that has earned high marks for accuracy, although at any given moment some content may be inaccurate, since anyone may have made recent changes. Indeed, one of the key struggles Wikipedia has dealt with over the years is so-called “vandals,” who change a page not to improve an entry but to deliberately degrade it.

In late 2015, the Wikimedia Foundation, which runs Wikipedia, announced an artificial intelligence tool called ORES (Objective Revision Evaluation Service), which it hoped would pre-score edits for volunteer editors so they could catch vandalism more quickly.

ORES brings automated edit and article quality classification to everyone via a set of open Application Programming Interfaces (APIs). The system works by training models against edit- and article-quality assessments made by Wikipedians and generating automated scores for every single edit and article.

What’s the predicted probability that a specific edit be damaging? You can now get a quick answer to this question. ORES allows you to specify a project (e.g. English Wikipedia), a model (e.g. the damage detection model), and one or more revisions. The API returns an easily consumable response in JSON format:

[Image: Wikipedia page with code exposed]
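As a rough illustration of the request described above (a project, a model, and one or more revisions), the following minimal Python sketch builds a scoring URL and reads the probability out of a JSON response. The base URL and the response layout are assumptions modeled on the historical ORES v3 API, which may have changed or been retired. The revision ID and the probabilities are made up, and the helper names `build_score_url` and `damaging_probability` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: base URL and path follow the historical ORES v3 layout.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_score_url(project, model, rev_ids):
    """Build a scoring URL for one project, one model, and one or more revisions."""
    query = urlencode({
        "models": model,
        # Multiple revision IDs are joined with "|" (percent-encoded by urlencode).
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{ORES_BASE}/{project}/?{query}"


def damaging_probability(response, project, model, rev_id):
    """Read the probability of the 'true' class from a parsed response."""
    score = response[project]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]


# Illustrative response with made-up numbers, shaped like the assumed v3 format.
SAMPLE_RESPONSE = json.loads("""
{"enwiki": {"scores": {"123456": {"damaging": {"score": {
    "prediction": false,
    "probability": {"false": 0.92, "true": 0.08}
}}}}}}
""")

url = build_score_url("enwiki", "damaging", [123456])
prob = damaging_probability(SAMPLE_RESPONSE, "enwiki", "damaging", 123456)
print(url)
print(prob)
```

The sketch parses a canned response instead of calling the network, so it runs even if the service is unavailable.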

The system was not designed, necessarily, to be user-facing, but rather as a system that others could build tools on top of to help with the editing process. Thus it was designed to feed some of its output into other existing and future tools.

According to its creator, Aaron Halfaker, part of the goal of the system was to make it easier to teach newcomers how to be productive editors on Wikipedia. There was a concern that more and more of the site was controlled by an increasingly small number of volunteers, and that new entrants were being scared off, sometimes by the site’s arcane rules. Thus, rather than seeing ORES as a tool for automating content moderation or for “quality control” over edits, Halfaker saw it as a tool to help experienced editors guide new, well-meaning, but perhaps inexperienced editors toward better contributions.

The motivation for Mr. Halfaker and the Wikimedia Foundation wasn’t to smack contributors on the wrist for getting things wrong. “I think we who engineer tools for social communities, have a responsibility to the communities we are working with to empower them,” Mr. Halfaker said. After all, Wikipedia already has three AI systems working well on the site’s quality control, Huggle, STiki and ClueBot NG.

“I don’t want to build the next quality control tool. What I’d rather do is give people the signal and let them work with it,” Mr. Halfaker said.

The artificial intelligence essentially works on two axes. It gives edits two scores: first, the likelihood that it’s a damaging edit, and, second, the odds that it was an edit made in good faith or not. If contributors make bad edits in good faith, the hope is that someone more experienced in the community will reach out to them to help them understand the mistake.

“If you have a sequence of bad scores, then you’re probably a vandal,” Mr. Halfaker said. “If you have a sequence of good scores with a couple of bad ones, you’re probably a good faith contributor.”
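The two-axis idea described above can be sketched in a few lines of Python. This is only an illustration of how a downstream tool might combine a "damaging" score and a "good faith" score. The cut-off values, the `EditScore` type, and both function names are hypothetical and are not taken from ORES or from any Wikipedia tool.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen only for illustration; not values used by ORES.
DAMAGING_CUTOFF = 0.5
GOODFAITH_CUTOFF = 0.5


@dataclass
class EditScore:
    damaging: float   # predicted probability that the edit is damaging
    goodfaith: float  # predicted probability that the edit was made in good faith


def triage_edit(score):
    """Map the two scores to a suggested human follow-up."""
    if score.damaging < DAMAGING_CUTOFF:
        return "no action"
    if score.goodfaith >= GOODFAITH_CUTOFF:
        # Damaging but well-meant: a candidate for outreach, not punishment.
        return "reach out and mentor"
    return "review as possible vandalism"


def summarize_contributor(scores, max_bad_fraction=0.5):
    """Label a contributor from a sequence of scored edits."""
    if not scores:
        return "unknown"
    bad = sum(1 for s in scores if s.damaging >= DAMAGING_CUTOFF)
    if bad / len(scores) > max_bad_fraction:
        return "probably a vandal"
    return "probably a good-faith contributor"
```

In both functions the output is a suggestion for a human reviewer, in keeping with the stated aim of giving people a signal and leaving the decision with them.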

Decisions to be made by Wikipedia:

  • How useful is artificial intelligence in helping to determine the quality of edits?
  • How best to implement a tool like ORES?
    • Should it automatically revert likely “bad” edits?
    • Should it be used for quality control?
    • Should it be a tool to just highlight edits for volunteers to review?
  • What is likely to encourage more editors to help keep Wikipedia up to date and free of vandalism?
  • What data do you train ORES on? How do you validate the accuracy of the training data?

Questions and policy implications to consider:

  • Are there issues when, because the AI has scored something, the tendency is to assume the AI must be “correct”? How do you make sure the AI is accurate?
  • Does AI help bring on new editors or does it scare away new editors?
  • Are there ways to prevent inherent bias from being baked into any AI moderation system, especially one trained by existing moderators?


Halfaker, who later left Wikimedia to go to Microsoft Research, has published a few papers about ORES since it launched. In 2017, a paper by Halfaker and a few others noted that the tool was increasingly used over the previous three years.

The ORES service has been online since July 2015[27]. Since then, usage has steadily risen as we’ve developed and deployed new models and additional integrations are made by tool developers and researchers. Currently, ORES supports 78 different models and 37 different language-specific wikis.

Generally, we see 50 to 125 requests per minute from external tools that are using ORES’ predictions (excluding the MediaWiki extension that is more difficult to track). Sometimes these external requests will burst up to 400-500 requests per second.

One thing they noticed was that those using the ORES output often wanted to search through the metrics and set their own thresholds rather than accept the hard-coded ones in ORES:

Originally, when we developed ORES, we defined these threshold optimizations in our deployment configuration. But eventually, it became apparent that our users wanted to be able to search through fitness metrics to choose thresholds that matched their own operational concerns. Adding new optimizations and redeploying quickly became a burden on us and a delay for our users. In response, we developed a syntax for requesting an optimization from ORES in realtime using fitness statistics from the models tests.
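To make the idea of choosing a threshold from fitness metrics concrete, here is a minimal local sketch in Python. It does not reproduce the ORES request syntax, which is not shown in the excerpt. It assumes a small set of scored edits with known labels and picks the threshold that gives the highest recall while keeping precision at or above a chosen floor. The function names and the data format are hypothetical.

```python
def precision_recall(labeled_scores, threshold):
    """Compute (precision, recall) when flagging every score >= threshold.

    labeled_scores is a list of (probability, is_damaging) pairs.
    """
    tp = sum(1 for p, y in labeled_scores if p >= threshold and y)
    fp = sum(1 for p, y in labeled_scores if p >= threshold and not y)
    fn = sum(1 for p, y in labeled_scores if p < threshold and y)
    # With nothing flagged there are no false positives, so treat precision as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_threshold(labeled_scores, min_precision):
    """Return (threshold, precision, recall) with maximum recall subject to
    precision >= min_precision, or None if no threshold qualifies."""
    best = None
    # Only observed scores need to be tried as candidate thresholds.
    for threshold in sorted({p for p, _ in labeled_scores}):
        precision, recall = precision_recall(labeled_scores, threshold)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best
```

A patrolling tool that wants few false alarms might call `best_threshold(data, 0.9)`, while a tool that wants to catch as much damage as possible might lower the precision floor and accept more noise.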

The project also appeared to be successful in getting built into various editing tools, and possibly inspiring ideas for new editing quality tools:

Many tools for counter-vandalism in Wikipedia were already available when we developed ORES. Some of them made use of machine prediction (e.g. Huggle, STiki, ClueBot NG), but most did not. Soon after we deployed ORES, many developers that had not previously included their own prediction models in their tools were quick to adopt ORES. For example, RealTime Recent Changes includes ORES predictions along-side their realtime interface and FastButtons, a Portuguese Wikipedia gadget, began displaying ORES predictions next to their buttons for quick reviewing and reverting damaging edits.

Other tools that were not targeted at counter-vandalism also found ORES predictions, specifically that of article quality (wp10), useful. For example, RATER, a gadget for supporting the assessment of article quality, began to include ORES predictions to help their users assess the quality of articles, and SuggestBot [5], a robot for suggesting articles to an editor, began including ORES predictions in their tables of recommendations.

Many new tools have been developed since ORES was released that may not have been developed at all otherwise. For example, the Wikimedia Foundation product department developed a complete redesign on MediaWiki’s Special:RecentChanges interface that implements a set of powerful filters and highlighting. They took the ORES Review Tool to its logical conclusion with an initiative that they referred to as Edit Review Filters. In this interface, ORES scores are prominently featured at the top of the list of available features, and they have been highlighted as one of the main benefits of the new interface to the editing community.

In a later paper, Halfaker explored, among other things, concerns about how AI systems like ORES might reinforce inherent bias.

A 2016 ProPublica investigation [4] raised serious allegations of racial biases in a ML-based tool sold to criminal courts across the US. The COMPAS system by Northpointe, Inc. produced risk scores for defendants charged with a crime, to be used to assist judges in determining if defendants should be released on bail or held in jail until their trial. This exposé began a wave of academic research, legal challenges, journalism, and organizing about a range of similar commercial software tools that have saturated the criminal justice system. Academic debates followed over what it meant for such a system to be “fair” or “biased”. As Mulligan et al. discuss, debates over these “essentially contested concepts” often focused on competing mathematically-defined criteria, like equality of false positives between groups, etc.

When we examine COMPAS, we must admit that we feel an uneasy comparison between how it operates and how ORES is used for content moderation in Wikipedia. Of course, decisions about what is kept or removed from Wikipedia are of a different kind of social consequence than decisions about who is jailed by the state. However, just as ORES gives Wikipedia’s human patrollers a score intended to influence their gatekeeping decisions, so does COMPAS give judges a similarly functioning score. Both are trained on data that assumes a knowable ground truth for the question to be answered by the classifier. Often this data is taken from prior decisions, heavily relying on found traces produced by a multitude of different individuals, who brought quite different assumptions and frameworks to bear when originally making those decisions.
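One of the mathematically defined criteria mentioned in the excerpt, equality of false positives between groups, can be checked with a short calculation. The Python sketch below is a generic illustration, not a method taken from the papers quoted here. The record format, the group labels, and the function names are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Compute the false positive rate per group.

    records is an iterable of (group, flagged, actually_bad) tuples.
    The rate for a group is FP / (FP + TN), counted only over items
    that were not actually bad.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, flagged, actually_bad in records:
        if not actually_bad:
            negatives[group] += 1
            if flagged:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


def max_fpr_gap(records):
    """Largest difference in false positive rate between any two groups."""
    rates = false_positive_rates(records)
    return max(rates.values()) - min(rates.values()) if rates else 0.0
```

A gap near zero satisfies this one criterion only. As the excerpt notes, other definitions of fairness compete with it, and the labels used as ground truth may themselves carry the assumptions of the people who produced them.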

Written by The Copia Institute, September 2020
