DEALING WITH DATA OVERLOAD
Overwhelmed? Just wait a couple of years. That is as long as it will take, some futurists say, to generate as much information as has been accumulated throughout human history. In business, things move even faster. Industry analysts agree that overall corporate data storage needs are currently doubling on an annual basis. In pharmaceuticals, industry watchers estimate that it now takes only six to nine months to generate a volume of new information comparable to what is already stored in all drug industry libraries and computers worldwide.
Information overload generally stems from two very different sources: the influx of useful data resulting from scientific or technological breakthroughs, and the data overkill sparked by 1990s-vintage software designed to support the broadest possible array of user industries.
Both sources bear down heavily on the pharmaceutical industry, where the past five years have seen a revolution in data-intensive drug discovery technologies, and on chemicals, where continuous-process manufacturers have had to install enterprise resource planning (ERP) systems designed to work just as well for banks and car factories.
SYSTEM VENDORS, in response, are redesigning IT architecture yet again. In the 1990s, desktop networks emerged that allowed users to build global computer infrastructures for finance, research, and manufacturing; the goal now in system design is to underpin that infrastructure with a network of widely accessible databases. In addition, ERP vendors are debuting software tailored to specific markets such as chemicals and pharmaceuticals.
But IT budgets in the chemical industry are tight. According to AMR Research, chief information officers in commodity, specialty, and fine chemicals will be lucky this year to maintain budgets at their 2003 level, an average of 2% of total company revenue. Still, there are a lot of projects under way. A recent survey by AMR shows that chemical industry investments in areas such as database installation and physical IT infrastructure are double those of the average for all manufacturing industries.
Pharmaceutical companies, on the other hand, are spending at the high end of manufacturing--5% of total revenue, according to AMR. Industry sources agree that regulatory compliance and heightened concern with efficiency are driving investments in the sector and that data management is a target area.
Much of the activity in database system development addresses the deployment of data repositories. IBM, for example, recently introduced a data network called DB2 Information Integrator, based on a technique the company has dubbed "database federation." It employs a series of local databases on a single backbone, allowing users to store data locally but access it from anywhere on the system rapidly through a series of automated queries.
According to Laura Haas, product development engineer, the system employs the best features of what had been the two options for global data management: a totally centralized data hub and completely dispersed data storage. "It lets you consolidate data or move it around when necessary," Haas says. "It gives you the feeling of a virtual database."
IBM's network uses industry communications protocols to link with nearly any commercially available data repository to establish a customizable relational database, Haas says. It can also access public databases. The data can be translated into an XML format, and users develop their own querying regimes, she says.
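IBM's federation layer itself is proprietary, but the core idea — one query surface laid over several locally held data stores — can be sketched with ordinary tools. The sketch below uses Python's built-in SQLite support; the file names, tables, and values are hypothetical, and SQLite's ATTACH stands in loosely for what DB2 Information Integrator does across genuinely remote repositories.

```python
import sqlite3

# Two "local" databases, each owned by a different group.
# (Plain SQLite files stand in for remote stores; IBM's actual
# federation machinery and DB2 APIs are not modeled here.)
chem = sqlite3.connect("chem.db")
chem.execute("CREATE TABLE IF NOT EXISTS compounds (id INTEGER, name TEXT)")
chem.execute("DELETE FROM compounds")
chem.execute("INSERT INTO compounds VALUES (1, 'aspirin')")
chem.commit()
chem.close()

bio = sqlite3.connect("bio.db")
bio.execute("CREATE TABLE IF NOT EXISTS assays (compound_id INTEGER, result REAL)")
bio.execute("DELETE FROM assays")
bio.execute("INSERT INTO assays VALUES (1, 0.87)")
bio.commit()
bio.close()

# The "federated" view: one connection attaches both stores and
# joins across them as if they were a single database, while the
# data itself stays where each group put it.
hub = sqlite3.connect("chem.db")
hub.execute("ATTACH DATABASE 'bio.db' AS bio")
rows = hub.execute(
    "SELECT c.name, a.result FROM compounds c "
    "JOIN bio.assays a ON a.compound_id = c.id"
).fetchall()
print(rows)  # [('aspirin', 0.87)]
hub.close()
```

The point of the design, as Haas describes it, is that the query runs against the federated view; whether the underlying rows are consolidated or left dispersed is an operational decision, not something the user has to code around.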
Indiana University School of Medicine's biomedical department is using the DB2 system to integrate publicly available data and genomics data generated in its own laboratories, a process that previously took several hours to do manually through a series of search queries. The routine has been automated using the IBM system and now can be done in less than one minute, says Craig Stewart, director of research and academic computing at Indiana.
The university, which as a test user helped in the design of DB2, is currently storing 10 gigabytes of data per day. It expects that to rise to 100 GB per day in two years, according to Haas.
The establishment of this kind of user control over how data enters an IT network, where it's stored, who uses it, and how it is used is now a guiding principle in software design. At Spotfire, a fast-rising firm in the field of data mining, the goal is to deliver data in a recognizable and usable format and to provide common access to information.
David Butler, vice president of product strategy and marketing at Spotfire, says it is important to realize that IT system design is not some kind of puzzle for computer whizzes. It is all about the users' competitiveness in business.
"Decision-making is more important than data integration," Butler says. "You need to solve business problems." A data network, therefore, needs to draw from a common database to deliver information to chemists, biologists, quality control engineers, and others in "the correct environment," Butler says. Data must come with everything workers in particular jobs need to make business decisions.
According to Butler and other vendor sources, chemical and pharmaceutical company IT organizations generally have no idea whether people are getting the right data, because they don't know what users want. This disconnect won't work in any advanced strategy for data management. Sources say, however, that designing data networks will inevitably tie IT departments more closely into business concerns, just as it will pull workers into the nuts and bolts of network design.
These days, in fact, the people heading up system design often come from a business or science background rather than from a strictly IT background. Eric Milgram, for example, is a chemist who heads the bioanalytics group for high-throughput screening and absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiling at Pfizer's La Jolla, Calif., research facility. A self-described data ninja, Milgram previously held jobs in IT system design and programming at the Centers for Disease Control & Prevention (CDC) and in customer service at Kmart. He says both prepared him for setting up data analysis management for a tough internal customer group--researchers at Pfizer.
A BIG OBSTACLE, Milgram says, is the monolithic nature of data-handling IT. "There is a huge need for a system that can be applied to a global organization and that can be largely configured by the end user without code modifications and expensive IT investment every time there is a change in the scientific context," he says. "Let's say a drug metabolism group decides that instead of collecting four time-points on an in vitro metabolism study, it wants to collect six time-points. For some systems I've seen installed, that calls for a major change. It should be very easy to do without breaking your entire system."
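Milgram's time-point example boils down to a familiar design rule: assay parameters should live in user-editable configuration, not in code. A minimal hypothetical sketch — the assay name, time-points, and function are invented for illustration, not taken from any Pfizer system:

```python
# Assay parameters kept as configuration data, not hard-coded logic.
config = {
    "in_vitro_metabolism": {"time_points_min": [0, 15, 30, 60]},
}

def schedule_samples(assay: str, cfg: dict) -> list:
    """Return the sampling schedule for an assay from its configuration."""
    return cfg[assay]["time_points_min"]

print(schedule_samples("in_vitro_metabolism", config))

# The drug metabolism group decides it wants six time-points instead
# of four: the end user edits the configuration, and the system
# follows along with no code modification and no IT project.
config["in_vitro_metabolism"]["time_points_min"] = [0, 10, 20, 30, 45, 60]
print(schedule_samples("in_vitro_metabolism", config))
```

In the monolithic systems Milgram describes, the equivalent change would touch code, validation, and deployment; pushing it into data is what makes the system "largely configured by the end user."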
Tailoring data delivery to user needs has been a kind of holy grail for Milgram, dating back to his work collecting data from mass spectrometers on the UNIX system at CDC in 1997. "We used to process the data on UNIX, put it on a floppy disk, move it over to a PC, do manual reformatting, and then upload it to a mainframe," he says. "I was appalled."
Milgram says he convinced CDC's IT managers that they could do most of the processing right on the UNIX system. "At first they looked at me like I was talking science fiction," he says. "But I spent a few weeks and wrote the code, and it became the natural way to do things."
He found a similar situation when he took a job in the private sector. In fact, Pfizer's IT predicament is typical of that of most large companies. "There is always a specialized piece of software that does statistical calculations very well," he says. "But getting the data managed and archived and into a format that is ready to be used has not been addressed. Pfizer had a nice informatics system for keeping track of all the combinatorial compounds that were made and of all the different chemical structures. But as far as integrating analytical data like purification and analytical quality control, none of that was integrated."
Milgram spent two years tracking applications and sequence list generators in order to set up data stations to do the processing. "In low throughput, that can be done by hand," he says. "But not when you have 10,000 chromatograms a week. You need an automated system."
Data analysis management is not conceptually difficult, Milgram says, but it is hard for commercial software to keep up with the needs of large companies. This is a problem for Pfizer, which has grown through a sequence of big mergers and now finds itself with a huge IT logistics problem. "It's really hard," he says. "It's like having a bunch of subcompanies within a big company. Our management is trying to remedy that, but it's not a simple problem. You just find work-arounds."
Lately, Milgram says he is making heavy use of Spotfire's DecisionSite software, which generates a kind of multidimensional spreadsheet, allowing users to view an array of data simultaneously. This is an aid to scientists who know how to work with multiple dimensions of data--or are willing to learn, he says.
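DecisionSite is commercial software, but the "slicing and dicing" Milgram describes has a simple shape: records carry several dimensions, and a view is whatever subset survives a set of dimension filters. A toy sketch in plain Python — the field names and values are hypothetical:

```python
# Hypothetical records, each carrying several dimensions plus a measurement.
records = [
    {"compound": "A-1", "assay": "solubility", "site": "La Jolla", "value": 3.2},
    {"compound": "A-1", "assay": "toxicity",   "site": "Groton",   "value": 0.4},
    {"compound": "B-7", "assay": "solubility", "site": "La Jolla", "value": 1.8},
]

def slice_data(rows, **dims):
    """Keep only the rows matching every requested dimension value."""
    return [r for r in rows if all(r.get(k) == v for k, v in dims.items())]

# "Slice" along one dimension, or "dice" along two at once.
print(slice_data(records, assay="solubility"))
print(slice_data(records, assay="solubility", site="La Jolla"))
```

A tool like DecisionSite adds interactive visualization on top of this kind of filtering, so a scientist can pivot between dimensions without writing queries.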
The human factor is a major obstacle in science-based businesses where researchers are married to idiosyncratic computer routines and where there is particularly poor communication between IT systems staff and users, Milgram says. Scientists often give up on corporate IT, convinced they must fend for themselves. The Spotfire system, however, is gaining respect, he says, as users experience significant time reductions in experiments facilitated by customized "slicing and dicing" of data.
The DecisionSite software is also making headway in fine chemicals at Avecia, where its use is spreading from the computational chemistry arena to other parts of the company, according to Julian Cherryman, Avecia's head of computational chemistry. The firm's biologics business, for example, is taking a close look at it, he says.
Like Milgram, Cherryman points to the time it takes to prepare data before it can be worked with. "I reckon that half the effort is spent getting it into the right format," he says. "The other half is spent using it and testing it." The company has been chipping away at the up-front time by using Spotfire instead of "expert chemists and their intuition" to navigate databases. "It's hard to quantify," he says, "but there is a feeling that chemists are working more efficiently as far as time and coverage are concerned."
At BP Chemicals, DecisionSite is deployed at the company's plant in Hull, U.K., where it is used to manage data from manufacturing and research operations. According to Zaid Rawi, a process automation engineer at the plant, the rise of distributed intelligent controls--small microprocessors monitoring individual pumps and valves--has created an avalanche of data that is efficiently stored at most chemical plants, but not as easily accessed.
BP is running the DecisionSite software on a data warehouse that is hooked into the main digital control system running the plant. The software, which is also installed on operators' Windows-based desktop computers, is programmed with specific analysis routines that expedite and enhance monitoring and control functions. Data analysis management has improved researchers' understanding of lab results and helped detect and diagnose mechanical problems faster, Rawi says. It has also enhanced the lines of communication between lab and plant staff, he says. "It enables people to show others what they are talking about a lot faster. They can query the data in many dimensions."
Aegis, a supplier of data management systems for pharmaceutical producers, will introduce an upgrade to its Discoverant software this year that addresses the need to route plant information globally between manufacturing and business IT systems. Justin O. Neway, chief science officer at Aegis, calls version 3.0 "data agnostic"--able to monitor laboratory systems, manufacturing systems, ERP systems, and plant controls. He sees it as complementary to the Spotfire system in forming a link between manufacturing, research, and higher level business management IT.
Not that there is room for much more data at the high level. "I agree that the amount of data that is being captured is growing rapidly, but I don't agree that the amount of data that needs to be captured is growing rapidly," says Allen Look, director of global IT at privately held chemical maker Schenectady International.
"I think it's the result of a lot of companies running generic ERP systems that meet the needs of 25 different marketplaces--bakeries, engineering firms, financial management companies, discrete manufacturing operations. The fit is very loose because the system vendors are trying to serve many vertical markets."
Schenectady took rather drastic measures to remedy this last year when it pulled the plug on a major ERP project. Rather than complete installation of general-purpose software supplied by J.D. Edwards, at the time the second largest ERP supplier in the chemical industry after SAP, Schenectady decided to install a smaller, process industry-specific product supplied by Ross Systems.
"We realized the cost of capturing tremendous amounts of data that were not necessarily in line with what we wanted to do as a process manufacturer was extremely high for the value we got out of it," Look says. "We looked around for a focused system that met the needs of a chemical company and didn't do anything else."
Schenectady plans to install Ross's iRenaissance software at 34 locations in 14 countries over the next two-and-a-half years, Look says. This will cost approximately $75,000 per site, or $2.5 million, compared with the $3.5 million spent installing J.D. Edwards software at three sites: Freeport, Texas; Schenectady, N.Y.; and Toronto. The company had planned to install the J.D. Edwards software at all 34 locations. At the same time, it is switching from an Oracle server to Microsoft Windows 2003, which will lead to operational savings, Look says.
Schenectady pursued the J.D. Edwards project largely as a Y2K computer fix, Look says, and in 1997, there wasn't much choice in ERP software other than to buy a product overloaded with financial management applications. "I remember having to review 120 different options for tracking purchasing," he says. "We turned off all but 11 of them. Just having to get out of the way all the stuff that did not apply to us was 90 to 95% of the job. Features that didn't apply to us and shouldn't have been there to begin with were obtrusive during the implementation and even after."
The past three years, however, have seen the software market evolve toward simpler, more scalable products facilitated by advances in basic office computer software, such as Microsoft Windows NT. "We expected to keep J.D. Edwards for 10 to 15 years," Look says, "but based on how rapidly the much cheaper platforms came along, and the need after the Internet bubble burst to really get competitive at the things we're best at, we didn't have any choice but to simplify."
Look says Schenectady management did not hesitate to sign off on a new ERP project. "The system simplifies our lives and should save $20 million over the next seven years just in maintenance and personnel costs," he says. "It was a slam dunk."
SAP and J.D. Edwards are fairly well entrenched in the chemical industry, however. Companies like Rohm and Haas and Shell Chemicals are pushing ahead on worldwide SAP installations, and there is little chance they'll turn back. The major ERP software vendors are also making a greater effort to design and implement software specifically for vertical markets. And while Ross has drawn its bull's-eye exclusively around process manufacturing, the software is best suited for midsized companies like Schenectady, according to Scott McLeod, vice president of marketing at Ross.
J.D. Edwards has made significant moves in recent years. In 1999, it purchased Numetrix, an advanced planning and scheduling software firm working exclusively in process and batch manufacturing. J.D. Edwards, in turn, was purchased last year by PeopleSoft, an ERP vendor with little business in chemicals. Mark Wieber, director of strategy for process industries at PeopleSoft, says the acquisition will focus J.D. Edwards on developing better manufacturing software as its service industry products shift to the PeopleSoft line.
Wieber says some chemical firms have chosen J.D. Edwards over Ross for its strength in customer service, distribution, and equipment maintenance. Three years ago, Cabot unplugged a process manufacturing ERP system made by Marcam, now a division of Invensys, to install the J.D. Edwards system, Wieber says.
A scan of IT product development in recent years, in fact, reveals an unmistakable trend toward homing in on vertical markets.
For example, Sopheon, a knowledge management software firm that specializes in product development, introduced a system last October that touches all the bases. Version 5 of its Accolade software comes in configurations for several vertical markets, including chemicals, and incorporates a search engine that updates databases selectively, notifying appropriate personnel of significant changes.
The emergence of data analysis management and other methods of streamlining data will keep information managers busy in the years ahead, but chemical and pharmaceutical industry sources agree that recent advances are helping them concentrate on the real job at hand--improving competitiveness in tough markets. "I'm not losing sleep over IT issues," Schenectady's Look says. "I lose sleep over how we are going to differentiate ourselves in our customers' minds."