What Crowdsourcing Really Means

Last week, Reddit failed to find the Boston bombers. This failure has been widely decried as a ‘failure of crowdsourcing’, and not just a small failure but a gross one. Nothing that Reddit did helped to catch the bombers, and the product of its much-publicised efforts was to misidentify and denounce at least one innocent bystander. The results were obviously negative, both internal to the investigation (the police were forced to release pictures of the suspects earlier than intended to help quell the online furore) and external (in that entirely innocent people were wrongly labelled as terrorists). While what happened on Reddit was certainly a ‘crowd-failure’, it wasn’t a failure of crowdsourcing for one important reason – it wasn’t crowdsourcing at all.

Crowdsourcing uses a mass of separate individuals as an organic computer to solve a specific set of problems or questions. There are two distinct kinds of crowdsourcing. The first is active, which makes use of mass ‘cognitive surplus’ (Clay Shirky’s term for brain power not otherwise meaningfully employed) to solve a problem. The second is passive (like an app collecting data in the background during normal use), which repurposes existing behaviour to collect information without the agent having to do anything different. For either kind to function, the ‘organic computer’ must be issued with a set of instructions. Much like any computer, the ‘crowd’ needs a user.

Reddit set out to compile evidence and to form a collective judgement. This was Reddit functioning as a kind of public sphere (a place for deliberating over evidence in order to reach conclusions), rather than as an organic computer. To clarify the point, here is an example of how Reddit users could have been used to crowdsource information on the Boston bombings:

The FBI make a PSA asking for help going through photos of the marathon, looking for anyone carrying a backpack and wearing a backwards baseball cap. Tens of thousands of independent Redditors take up the call, trawling through photos, identifying men who fit these characteristics and sending the matching photos to the FBI.

This would have been active crowdsourcing working perfectly, using the cognitive surplus of thousands of people to analyse a data set. Even if none of the results helped to catch the bombers, the process would still have been a success. The crucial elements here that make it different from what actually happened, and therefore qualify it as crowdsourcing, are that there is an external source for the problem to be solved, and that there is no (or limited) discussion between the agents at work solving the problem.

Crowdsourcing works best when the crowd is completely atomised, when it is a collection of independent agents working to a clearly defined set of parameters. Once the crowd-computer can talk amongst itself it becomes less efficient (in that certain data points get echoed and overemphasised through the discussion), and the moment it starts setting itself the questions, becoming both user and processor, it stops being a computer at all. That’s the point at which crowdsourcing stops and a rough kind of public democratic discourse takes over.

What happened on Reddit wasn’t crowdsourcing because it wasn’t a crowd being used to solve an external problem. It was communicative action at work, a group discussion aimed at forming a positive conclusion. Crowdsourcing was created by the Internet; what happened on Reddit was simply enabled by it. What happened on Reddit was an old phenomenon using a newish medium; crowdsourcing is a newish phenomenon using a newish medium. It is important not to confuse the two.

Many thanks to /u/OhioFury for his post on Reddit and to Charles Arthur in the Guardian for stimulating some of the thoughts (and supplying some of the info) that led to my writing this post.

