Date: 2020-05-22

Follow these suggestions for digitizing and curating data to make the summer of COVID-19 as productive as possible.

1. Don’t be afraid to start. A spreadsheet with well-defined columns is a fine first step. Each experiment gets a row, and each property gets its own column. Plan on a separate column for everything that could vary in your experiment. Use consistent terminology for categories and names of things.

2. You don’t need full lab access. You can digitize and curate your data even if time in the lab is limited by social distancing. Use a cell phone or inexpensive USB document camera to take photos of notebook pages. It takes just one person to get the photos.

3. Capture every possible detail of your experiment. All the data and information on the notebook page should be captured. If everything makes it in, you’re less likely to have to re-enter data. Record raw data exactly as it is presented in the laboratory notebooks. Include a failed or incomplete reaction column that you can use to tag incompletely described experiments. Likewise, include a separate column to tag questionable reactions—for example, if the balance was faulty or reaction vessels leaked.

4. Use a systematic description scheme. Describe experiments in terms of the materials that are used (identities and quantities), the actions that are taken on them (types, durations, and settings), the human and machine actors involved (who performed the experiment and what type of instrument was used), observations made during the experiment, and the outcomes you are attempting to predict. Many such organizational schemata have been developed for the purpose of lab automation, but you don’t need robots to benefit from thinking about your results in this way.
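
One way to make such a scheme concrete is to write it down as a small set of record types. The sketch below is only an illustration: the class names, field names, and the example values are all hypothetical, not a published schema, and a real lab would adapt them to its own experiments.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Material:
    identity: str      # ideally a standardized identifier, e.g. a SMILES string
    amount: float
    unit: str


@dataclass
class Action:
    kind: str          # e.g. "stir" or "heat"
    duration_min: float
    setting: str = ""  # e.g. "300 rpm"


@dataclass
class ExperimentRecord:
    experiment_id: str
    materials: List[Material] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)
    performed_by: str = ""   # human actor
    instrument: str = ""     # machine actor
    observations: str = ""
    outcome: str = ""        # the property you hope to predict


# Made-up example entry, for illustration only.
record = ExperimentRecord(
    experiment_id="EXAMPLE-001",
    materials=[Material("CCO", 5.0, "mL")],
    actions=[Action("stir", 30.0, "300 rpm")],
    performed_by="A. Researcher",
    instrument="hot plate",
    observations="solution turned cloudy",
    outcome="crystals",
)

# asdict() converts the nested record into plain dicts and lists,
# which are easy to export to JSON or flatten into spreadsheet columns.
flat = asdict(record)
```

Even if the data end up in a spreadsheet, writing the scheme out this way forces a decision about which details belong to materials, actions, actors, observations, and outcomes.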

5. Missing data are OK. Record all entries, even those with missing data. You will need to decide how to code these missing entries. Whatever you decide, record it in your documentation.
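
As a minimal sketch of one possible coding choice, the example below assumes that missing entries are written as `NA` (or left blank) and converts them to `None` when the file is read. The code `NA`, the column names, and the sample rows are assumptions made for illustration.

```python
import csv
import io

MISSING_CODE = "NA"  # assumed convention; whatever you choose, document it


def parse_float(cell):
    """Return a float, or None if the cell is blank or holds the missing code."""
    cell = cell.strip()
    if cell == "" or cell == MISSING_CODE:
        return None
    return float(cell)


# Made-up rows standing in for a spreadsheet exported to CSV.
sample_csv = io.StringIO(
    "experiment_id,temperature_C,yield_pct\n"
    "EX-1,25,81.5\n"
    "EX-2,NA,40.0\n"
    "EX-3,60,NA\n"
)

rows = []
for row in csv.DictReader(sample_csv):
    rows.append({
        "experiment_id": row["experiment_id"],
        "temperature_C": parse_float(row["temperature_C"]),
        "yield_pct": parse_float(row["yield_pct"]),
    })

# Count missing values per column; every row is kept, none is dropped.
missing_counts = {
    column: sum(1 for r in rows if r[column] is None)
    for column in ("temperature_C", "yield_pct")
}
```

Using an explicit code rather than a number such as 0 or -1 keeps a missing measurement from being mistaken for a real one.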

6. Don’t forget about metadata. Provenance—information about who did the experiment, when they did it, and where they did it—can be as important as the primary data, and often these data allow one to find unexpected relations within the dataset or correlations with other datasets. Record lab-notebook page numbers, or attach a digital photo of the notebook page to facilitate recovering old data or tracking down errors. And don’t forget to record who entered the data in the database.

7. Adopt standard naming conventions for molecules. Few people write their lab notebooks using full IUPAC names. Be sure that molecular entities use a standardized, machine-readable representation, such as SMILES or InChI. If your lab uses a set of standard abbreviations, use them, and include a glossary that defines them.
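
A glossary can double as a lookup table that converts notebook abbreviations to a machine-readable form. The sketch below assumes a hypothetical three-entry glossary; the abbreviations are common ones, but your lab's list and the helper name `to_smiles` are yours to define.

```python
# Hypothetical lab glossary: abbreviation -> full name and SMILES string.
GLOSSARY = {
    "EtOH": {"name": "ethanol", "smiles": "CCO"},
    "THF": {"name": "tetrahydrofuran", "smiles": "C1CCOC1"},
    "DMF": {"name": "N,N-dimethylformamide", "smiles": "CN(C)C=O"},
}


def to_smiles(abbreviation):
    """Look up an abbreviation; fail loudly instead of guessing unknown ones."""
    try:
        return GLOSSARY[abbreviation]["smiles"]
    except KeyError:
        raise KeyError(
            f"{abbreviation!r} is not in the glossary; add it before use"
        ) from None


# Abbreviations as they might appear on a notebook page.
notebook_entries = ["EtOH", "DMF"]
standardized = [to_smiles(entry) for entry in notebook_entries]
```

Raising an error for an unknown abbreviation is a deliberate choice here: it pushes whoever is entering data to extend the glossary rather than invent a new spelling.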

8. Document your data. It may seem obvious to you what the column names mean, but it will not be obvious to your collaborators. Keep a parallel document describing each column name, its definition, expected minima and maxima, and units (if applicable). If categorical data are denoted with a coding scheme (for example, 1 = red and 2 = blue), document those choices. If units remain constant within a column, capture this in the documentation. If units vary within a column, include a separate column for the units.
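
If the parallel document is kept in a structured form, it can also be used to check entries automatically. The following is a sketch under assumptions: the column names, limits, and codes are invented for illustration, and `check_row` is a hypothetical helper rather than part of any library.

```python
# Hypothetical data dictionary: one entry per spreadsheet column.
DATA_DICTIONARY = {
    "temperature_C": {
        "definition": "Set point of the heating block",
        "unit": "degrees Celsius",
        "min": -80.0,
        "max": 300.0,
    },
    "product_color": {
        "definition": "Color of the isolated product",
        "codes": {1: "red", 2: "blue"},
    },
}


def check_row(row):
    """Return a list of messages for values that contradict the dictionary."""
    problems = []
    for column, spec in DATA_DICTIONARY.items():
        value = row.get(column)
        if value is None:  # missing entries are allowed and handled elsewhere
            continue
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{column}: unknown code {value!r}")
        if "min" in spec and not (spec["min"] <= value <= spec["max"]):
            problems.append(f"{column}: {value} outside documented range")
    return problems


good_row = {"temperature_C": 25.0, "product_color": 1}
bad_row = {"temperature_C": 2500.0, "product_color": 3}

good_problems = check_row(good_row)  # expected to be empty
bad_problems = check_row(bad_row)    # expected to report both columns
```

Keeping definitions, units, ranges, and codes in one place means the documentation and the checks cannot drift apart.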

9. Look for errors. It is easy to enter the wrong data. To catch data entry errors, visualize the distribution of values in each column; a transposed digit or missing decimal point often stands out as an outlier. You can also randomly pick reactions for verification against the primary record. Whatever your method, start looking for errors sooner rather than later.
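
Both checks can be roughed out in a few lines. The sketch below swaps in a simple numeric stand-in for a plot: it flags values that lie far from the column median, measured in units of the median absolute deviation, and then draws a random sample of rows to verify by hand. The threshold of 5, the function name `flag_outliers`, and the sample masses are assumptions chosen for illustration.

```python
import random
import statistics


def flag_outliers(values, threshold=5.0):
    """Return indices of values far from the median, scaled by the MAD."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to scale by; inspect such columns by eye
    return [
        i for i, v in enumerate(values)
        if abs(v - median) / mad > threshold
    ]


# Made-up masses in grams; the fifth entry mimics a misplaced decimal point.
masses_g = [1.02, 0.98, 1.05, 1.01, 10.3, 0.99, 1.03]
suspect_rows = flag_outliers(masses_g)

# Randomly pick rows to compare against the notebook page.
rng = random.Random(0)  # fixed seed so the selection can be reproduced
rows_to_verify = rng.sample(range(len(masses_g)), k=3)
```

A check like this catches only gross errors. A transposed digit that stays within the normal range will pass, which is why random verification against the primary record is still worth doing.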

10. Start learning about programming, data curation, and machine learning. It’s easier than you think. There are excellent free, self-paced online resources from places like the Molecular Sciences Software Institute and the Carpentries that can help you get started.

11. Disseminate your work. If you have a clear hypothesis, use your data to pursue that goal, and publish your dataset along with that work in a traditional publication or on a preprint server. It is also possible to publish and cite datasets without claiming a hypothesis via specialized publications such as Scientific Data or data archiving services such as the Materials Data Facility. Services such as DataCite can also provide a citable DOI and a searchable registry for materials uploaded to your own institutional repository.

12. Don’t be afraid to do it again. Accept the fact that you might miss something in the first attempt. You may have to go back and do it again. As Samuel Beckett wrote in his 1983 story “Worstward Ho,” “No matter. Try again. Fail again. Fail better.”