Sam Lemonick should be commended for a nice article on machine learning—well researched and written and quite readable (C&EN, Aug. 27, page 16).
However, it glosses over the foundation of such efforts that predates our current computer age: experimental design (ED). Data-analysis efforts without proper ED often run into the problem of GIGO (garbage in, garbage out).
Two classic books on ED are “The Design and Analysis of Industrial Experiments,” edited by Owen L. Davies, and “Statistics for Experimenters,” by George E. P. Box, William G. Hunter, and J. Stuart Hunter.
Anyone desiring to understand the artificial intelligence (AI) verbiage flowing around data analysis would be well advised to read the Davies volume to understand the absolute necessity of proper ED, and the Box, Hunter, and Hunter volume for a modernized treatment that includes the early application of computers to crunching all the numbers. Both volumes make a strong case that without proper ED, one will inevitably end up mired in the GIGO problem.
Sadly, the discussion does not adequately describe how essential that first step is. Without it, one is merely engaging in black-box crunching without any sanity checking. My 35 years in industrial chemical research benefited substantially from Davies and, later, from Box, Hunter, and Hunter once personal computers became available for routine use. While mainframes existed previously, they were not readily accessible to most of us.
In the course of my career, I often performed sanity checks by first using ED techniques and then the linear-algebra AI approach. However, because I always used the ED technique, even for the AI analysis, I obtained quite similar results from both (usually when looking for a sweet spot). Without ED, most people found the black-box approach of AI unsatisfying because it contributed no real learning to the process, just an “answer” that taught nothing about how to get there. We routinely found that for teaching purposes, the ED approach was superior because it allowed us to learn our way forward.
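The sweet-spot workflow mentioned above can be sketched in miniature. This is a hypothetical illustration, not the author's actual procedure: it assumes a 3² full factorial design in two coded factors (say, temperature and catalyst loading), simulated yields with a known optimum standing in for measured responses, a quadratic response surface fitted by ordinary least squares, and the sweet spot recovered as the stationary point of the fitted surface.

```python
import numpy as np

# Hypothetical example: a 3x3 full factorial design in two factors,
# coded to the range -1..+1 (a classic ED layout from Davies and from
# Box, Hunter, and Hunter for mapping a response surface).
x1, x2 = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
x1, x2 = x1.ravel(), x2.ravel()

# Simulated yields with a known sweet spot at (x1, x2) = (0.5, -0.25);
# in practice these would be the measured responses at each design point.
y = 80 - 4 * (x1 - 0.5) ** 2 - 6 * (x2 + 0.25) ** 2

# Fit a full quadratic model y = b0 + b1*x1 + b2*x2 + b12*x1*x2
#                               + b11*x1^2 + b22*x2^2 by least squares.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
b0, b1, b2, b12, b11, b22 = np.linalg.lstsq(X, y, rcond=None)[0]

# The sweet spot is the stationary point of the fitted surface,
# found by setting both partial derivatives to zero.
A = np.array([[2 * b11, b12], [b12, 2 * b22]])
opt = np.linalg.solve(A, [-b1, -b2])
print(opt)  # approximately [0.5, -0.25]
```

The design itself is the sanity check the letter describes: the same nine planned runs can feed either a classical response-surface analysis or a black-box fit, so the two answers can be compared directly.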
I am not trying to denigrate the use of advanced methods of computer analysis to shorten the computational steps necessary in the whole process. But to label that “machine learning” is misleading and perhaps even damaging to scientists trying to advance overall learning in their fields. Without understanding the ED step in the overall process, my view is that little actual learning will occur. And is not that the whole point of the scientific process?