On November 17th, 2005, an anonymous Wikipedia user deleted 15 paragraphs from an article on e-voting machine-vendor Diebold, excising an entire section critical of the company's machines. While anonymous, such changes typically leave behind digital fingerprints offering hints about the contributor, such as the location of the computer used to make the edits.
In this case, the changes came from an IP address reserved for the corporate offices of Diebold itself. And it is far from an isolated case. A new data-mining service launched Monday traces millions of Wikipedia entries to their corporate sources, and for the first time puts comprehensive data behind longstanding suspicions of manipulation, which until now have surfaced only piecemeal in investigations of specific allegations.
Wikipedia Scanner-- the brainchild of Cal Tech computation and neural-systems graduate student Virgil Griffith -- offers users a searchable database that ties millions of anonymous Wikipedia edits to organizations where those edits apparently originated, by cross-referencing the edits with data on who owns the associated block of internet IP addresses.
Inspired by news last year that Congress members' offices had been editing their own entries, Griffith says he got curious, and wanted to know whether big companies and other organizations were doing things in a similarly self-interested vein.
"Everything's better if you do it on a huge scale, and automate it," he says with a grin.
This database is possible thanks to a combination of Wikipedia policies and (mostly) publicly available information.
The online encyclopedia allows anyone to make edits, but keeps detailed logs of all these changes. Users who are logged in are tracked only by their user name, but anonymous changes leave a public record of their IP address.
Share Your Sleuthing!
Cornered any companies polishing up their Wikipedia entries? Spotted any government spooks rewriting history? Try Virgil Griffith's Wikipedia Scanner yourself, then submit your finds and vote on other readers' discoveries here.
The organization also allows downloads of the complete Wikipedia, including records of all these changes.
Griffith thus downloaded the entire encyclopedia, isolating the XML-based records of anonymous changes and IP addresses. He then correlated those IP addresses with public net-address lookup services such as ARIN, as well as private domain-name data provided by IP2Location.com.
The result: A database of 34.4 million edits, performed by 2.6 million organizations or individuals ranging from the CIA to Microsoft to Congressional offices, now linked to the edits they or someone at their organization's net address has made.
Some of this appears to be transparently self-interested, either adding positive, press release-like material to entries, or deleting whole swaths of critical material.
Voting-machine company Diebold provides a good example of the latter, with someone at the company's IP address apparently deleting long paragraphs detailing the security industry's concerns over the integrity of their voting machines, and information about the company's CEO's fund-raising for President Bush.
The text, deleted in November 2005, was quickly restored by another Wikipedia contributor, who advised the anonymous editor, "Please stop removing content from Wikipedia. It is considered vandalism."
A Diebold Election Systems spokesman said he'd look into the matter but could not comment by press time.
Wal-Mart has a series of relatively small changes in 2005 that that burnish the company's image on its own entry while often leaving criticism in, changing a line that its wages are less than other retail stores to a note that it pays nearly double the minimum wage, for example. Another leaves activist criticism on community impact intact, while citing a "definitive" study showing Wal-Mart raised the total number of jobs in a community.
By John Borland 08.14.07 | 2:00 AM