Springer Nature, the world’s second-largest academic publisher, has published the first scholarly book authored entirely by machines. The book, which is free to read and download, consists of four chapters summarizing studies about lithium-ion batteries. It is based on 150 papers published between 2016 and 2018 on SpringerLink, the publisher’s database of more than 1,200 scholarly journals.
“We thought it would be a nice way to provide people with an overview of an entire scientific field,” says Christian Chiarcos, a computational linguist at Goethe University, who created the algorithm that wrote the book. “It’s like a dream come true to every PhD student who ever needed to write a literature survey.”
The algorithm, which consists of several components, sifts through the studies and analyzes their keywords using similarity-based clustering, a computational technique often used in machine learning, pattern recognition, and image analysis. It groups together text on similar topics and produces succinct, paraphrased summaries of the material central to each topic. According to Chiarcos, the only thing users need to provide beforehand is the number of topical chapters and sections they want the generated book to have.
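To make the idea of similarity-based clustering concrete, here is a minimal Python sketch. It is not the algorithm Chiarcos built, whose details the article does not give. It assumes each paper is reduced to a set of keywords, measures similarity with the Jaccard index, and greedily merges the two most similar groups until the requested number of chapters remains. The function names, the choice of Jaccard similarity, and the sample papers are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword sets: shared keywords / all keywords."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_keywords(docs, n_chapters):
    """Group documents into n_chapters clusters by keyword overlap.

    docs maps a document id to its set of keywords (an assumed input format).
    Starting with one cluster per document, the two most similar clusters
    are merged repeatedly until only n_chapters clusters are left.
    """
    target = max(n_chapters, 1)
    # Each cluster holds its document ids and the union of their keywords.
    clusters = [([doc_id], set(keywords)) for doc_id, keywords in docs.items()]
    while len(clusters) > target:
        # Find the pair of clusters whose keyword sets overlap the most.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: jaccard(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        ids_i, kw_i = clusters[i]
        ids_j, kw_j = clusters[j]
        merged = (ids_i + ids_j, kw_i | kw_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(ids) for ids, _ in clusters]


# Hypothetical keyword sets, invented for illustration only.
papers = {
    "paper_a": {"anode", "graphite", "capacity"},
    "paper_b": {"anode", "silicon", "capacity"},
    "paper_c": {"cathode", "nickel", "stability"},
    "paper_d": {"cathode", "cobalt", "stability"},
}

chapters = cluster_by_keywords(papers, n_chapters=2)
print(chapters)
```

As in the description above, the only parameter the caller supplies is the number of chapters. A real system would also need to extract the keywords, split chapters into sections, and generate the paraphrased summaries, none of which this sketch attempts.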
“The book did a fairly good job in identifying the numerous materials [that have] been studied in the literature and provided a great summary of the research in each area across the field,” says Jun Liu, a chemical engineer at the University of Washington and the Pacific Northwest National Laboratory. “This is very impressive.”
Although a machine-generated book will never match the quality of a human-written one, it might be enough to offer some insights, Chiarcos says. “Of course, it cannot reflect actual intellect,” he notes, adding that he envisions researchers using tools like his when they want a quick overview of a new area.
Springer Nature and Chiarcos are also considering publishing a social sciences book using a tweaked algorithm. Chiarcos says the process will inevitably vary between disciplines. “For example, chemists didn’t want us to summarize experiments, which makes sense because if you miss some step in between, an experiment might fail,” he notes.
Copyright policies of the manuscripts used to write the book were not much of an issue, Chiarcos says: “We’re talking about facts so there’s very little originality involved.” For the newly generated book, Springer Nature retains the copyright.
Henning Schoenenberger, director of product data and metadata management at Springer Nature, who is leading the machine-generated book pilot, says the company deliberately did not manually polish or copyedit the book. “It was our intention to highlight the current status and the remaining boundaries of machine-generated content,” he explains.
Generating content using machines can save time and effort and help fix the problem of information overload by providing fast structured summaries of a field, notes Schoenenberger. “We believe that the future holds a wide range of options to create content – from entirely human-created content to a variety of blended man-machine text generation to entirely machine-generated text.”
Chiarcos thinks there’s now a market for these machine-generated books, especially because they are cheap to create. “Based on demand, we can create anything within an hour,” he says.