Crowdsourcing For Science

Internet efforts digitize old data, classify images, and contribute information

by Cheryl Hogue
July 4, 2011 | A version of this story appeared in Volume 89, Issue 27

The website PatientsLikeMe allows people with chronic health problems to share information about their conditions, the drugs they take, and side effects. The company that runs the site also sells crowdsourced data to pharmaceutical companies.

Crowdsourcing—using an unidentified group of people to conduct a task online—may not be what traditionally springs to mind as a way to gather or improve scientific data. Yet crowdsourcing is contributing useful scientific knowledge for applications from drug discovery to reducing uncertainty about trends in climate change to speeding development of military vehicles.

The National Research Council is recognizing that the collection and refinement of scientific data is growing through websites and e-mail. To raise awareness about this trend, NRC’s Board on Research Data & Information held a symposium last month. There, speakers described how some organizations, including the U.S. government, use crowdsourcing to improve the quality of scientific information.

Through crowdsourcing, volunteers can make and record observations, make corrections, classify images, perform computations, and catalog information, said Michael Lesk, outgoing chair of the board. He is a professor at Rutgers University’s department of library and information science.

One key contribution of crowdsourcing is translating paper records into digital form, Lesk said at the symposium. For instance, through Project Gutenberg, which digitizes books in the public domain, volunteers proofread text that optical character recognition software has converted from the printed page and correct its errors, Lesk pointed out.

In other cases, volunteers key in information originally recorded on paper, rendering digital data that researchers can analyze. This includes weather observation data, said Scott A. Hausman, deputy director of the National Oceanic & Atmospheric Administration’s National Climatic Data Center. Many weather observations recorded before the 1990s are on paper, he noted.

Although scanning these records creates digital images that preserve the data, it doesn’t always yield information that researchers can access and use, Hausman continued. In addition, optical character recognition software is useless for digitizing records dating to the 1800s that are handwritten in ornate script. People need to view these records and key the data they contain into a computer.

To convert old weather observations into data that help strengthen the historic climate records, NOAA is working with the Citizen Science Alliance, a collaboration of universities and museums working to involve the public in the process of science, Hausman said. The alliance grew out of the highly successful Galaxy Zoo project that allows online volunteers to sort images of uncategorized galaxies captured by the Hubble Space Telescope. More than 300,000 participants have sorted through the images, determining whether they show an elliptical or spiral galaxy.

Galaxy Zoo is part of a crowdsourced science site called Zooniverse. Now, at a portion of the Zooniverse site called Old Weather, volunteers are poring over scanned images of the handwritten logs of Royal Navy ships around the time of World War I. They are keying in the numbers and written information the logs contain.

During these war years, weather observations over the oceans virtually stopped, Hausman explained. Thus, the ships’ logs contain some of the only data available for marine weather during this time period.

To help engage volunteers in conversion of this information, Old Weather allows participants to get “promoted” as they key in more and more records, Hausman explained. And there’s a ranking—and competition—among volunteers who have processed the most records.

A major strength of the project is redundancy. Five to 10 volunteers analyze the same records, Hausman said. Project managers can see whether volunteers agree or disagree on, for example, whether a digit in the original record depicts a slim seven or a fancily scripted numeral one.
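
The agreement check Hausman describes can be sketched as a simple majority vote over the values that several volunteers key in for the same field. This is an illustration only, not Old Weather's actual method: the function name `consensus`, the 0.6 agreement threshold, and the sample readings are all assumptions made for the example.

```python
from collections import Counter


def consensus(transcriptions, min_agreement=0.6):
    """Pick the most common reading of one field keyed in by several volunteers.

    Returns (value, agreement, needs_review). The threshold is a hypothetical
    cutoff: fields whose agreement falls below it are flagged so a project
    manager can look at the original image.
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")
    counts = Counter(transcriptions)
    value, votes = counts.most_common(1)[0]
    agreement = votes / len(transcriptions)
    return value, agreement, agreement < min_agreement


# Hypothetical example: five volunteers read the same handwritten digit,
# and one of them takes a slim seven for a numeral one.
readings = ["7", "7", "1", "7", "7"]
value, agreement, needs_review = consensus(readings)
print(value, agreement, needs_review)
```

With more volunteers per record, a single misreading has less influence on the result, which is one reason redundancy of this kind is useful.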

Old Weather will likely finish digitizing the ships’ weather logs within a year, Hausman said.

The volunteers’ work is extremely valuable, he emphasized. “If we paid for this, it’d cost millions.” Researchers will use these newly accessible data to refine climate-change models, Hausman added.

In an effort that blends the concepts behind Galaxy Zoo and Old Weather, NOAA is also turning to crowdsourced volunteers to classify satellite images of tropical cyclones, Hausman said. The agency has archived thousands of images of these storms since the 1970s—but they haven’t been processed to derive usable information.

Participants will view an image and then answer a series of questions about the size of the storm, its shape, and the size of its eye, Hausman said. These answers will allow researchers to estimate a tropical cyclone’s intensity solely on the basis of satellite images. Like the newly digitized information in the ships’ logs, estimates of the intensity of past tropical cyclones will help improve climate-change research.

Engaged members of the public also provide original observational data that can add to science. For instance, people for decades have recorded and shared their weather and wildlife observations, said Roberta Balstad, vice chair of the board and a senior research scientist at Columbia University’s Earth Institute. These data have helped give rise to phenology, the study of how plant and animal life-cycle events—such as flowering or nesting—are influenced by variations in climate.

Sometimes, information assembled online by a crowd can be leveraged into a revenue stream for the organization that is providing the venue for information gathering.

For instance, the company PatientsLikeMe runs a website that collects information voluntarily supplied by people with chronic or progressive medical conditions such as epilepsy, HIV infection, fibromyalgia, mental illness, and amyotrophic lateral sclerosis (Lou Gehrig’s disease).

On this site, patients share information on the type and degree of the symptoms they experience, David Clifford, the head of public health and government affairs for PatientsLikeMe, told symposium attendees. They also describe experiences with drugs to treat their condition or symptoms, along with any side effects. Participants can use the site to track their individual conditions over time and use the information from others grappling with the same disease to help manage their health. The goal for patients, Clifford said, is to determine their best health outcome—given the progression of their disease—and how to obtain it.

PatientsLikeMe generates revenue from its website by selling the contributed data to companies that are developing or selling pharmaceuticals, medical devices, and medical services. The company tells participants up front how it makes its money and how it protects personally identifiable information from patients, Clifford said.

Another novel R&D opportunity emerging through crowdsourcing involves military vehicle design, according to Gregory D. Phelan, chair of the chemistry department at the State University of New York, Cortland.

The Defense Advanced Research Projects Agency (DARPA) recently used crowdsourcing to design a new off-road armored vehicle, Phelan said at the symposium. Traditionally, it has taken one to three years to develop design proposals for vehicles such as this. But when the agency used a blog to let the public offer innovative suggestions that companies could incorporate into their plans, design development took only a few months. DARPA is building on this effort, saying it will next explore how crowdsourcing could contribute to other areas of military manufacturing.

The Board on Research Data & Information will release a summary report of the symposium later in the year.  

