Protein Pieces. Credit: PLOS BIOLOGY © 2006

A California Institute of Technology team has created a family of nearly 3,000 artificial cytochrome P450 enzymes by recombining sections of three natural P450s (PLoS Biol., published online April 11, dx.doi.org/10.1371/journal.pbio.0040112). Cytochrome P450s are a large family of oxidative enzymes that are widespread in nature; in humans, they play a crucial role in metabolizing drugs and toxins.

"I'm hoping that this new family will contain cytochrome P450s that people would want to use, for example, to make the human metabolites of drugs or to synthesize complex, biologically active compounds," says Frances H. Arnold, a chemical engineering professor at Caltech. "It's my dream to make a whole library of cytochrome P450s that could hydroxylate anything."

Arnold, graduate student Christopher R. Otey, and coworkers used a computational method called SCHEMA to guide the creation of the new protein sequences and increase the likelihood that they will be useful.

P450s are so diverse that random DNA shuffling, the usual method, would generate a set of new sequences most of which would not fold, Arnold says. "You'd be screening garbage."

To generate P450s that catalyze new reactions, Arnold's team wanted the new proteins to differ from the starting proteins at 70 to 100 amino acid positions, yet still fold properly. They made the new proteins by recombining chunks of the P450s that nature has provided, Arnold says. "It's a dual optimization problem. We recombine them to preserve as many structural interactions as possible, while at the same time making lots of mutations."
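The "lots of mutations" half of that trade-off can be made concrete with a small sketch. Here a chimera's mutation level is taken to be its Hamming distance to the most similar parent, assuming all sequences are already aligned to equal length; the helper names and toy sequences are hypothetical illustrations, not the team's code.

```python
# Hypothetical helpers: measure how far a chimera sits from its parents.
# Assumes every sequence has already been aligned to the same length.

def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mutations_from_closest_parent(chimera: str, parents: list[str]) -> int:
    """Mutation level: differences from the most similar parent."""
    return min(hamming(chimera, p) for p in parents)


if __name__ == "__main__":
    # Toy 8-residue "parents"; real P450 domains are hundreds of residues long.
    parents = ["AAAAAAAA", "AAAACCCC", "GGGGCCCC"]
    chimera = "GGGG" + "AAAA"  # first half from parent 3, second half from parent 1
    print(mutations_from_closest_parent(chimera, parents))  # → 4
```

In this toy case the chimera differs from parents 1 and 3 at four positions each and from parent 2 at eight, so its mutation level is 4.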

Schemers. Credit: Courtesy of Frances Arnold

That's where SCHEMA comes in. SCHEMA's job is to improve the likelihood that a given sequence will fold by considering the structures of the parent proteins. The crystal structures of the parents are encoded mathematically so that counting the side-chain interactions a new sequence breaks becomes a simple calculation. "SCHEMA penalizes you for every broken contact. It says, 'Thou shalt make a library such that most of the members have few broken interactions,' " Arnold says. "At the end, you get a design that penalizes you the least."
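The contact-counting idea can be sketched in a few lines of Python. This assumes the parents are aligned sequences of equal length and that the crystal structure has already been reduced to a list of contacting position pairs (for example, residues whose side chains lie within some distance cutoff); a contact counts as "broken" when the chimera's residue pair at those two positions appears in none of the parents. The function name and toy data are hypothetical, and this is a simplified illustration rather than the published SCHEMA code.

```python
# Simplified SCHEMA-style disruption count (hypothetical helper).
# `contacts` is a list of (i, j) position pairs that touch in the parent
# structure; all sequences are assumed to be aligned to the same length.

def schema_energy(chimera: str, parents: list[str], contacts: list[tuple[int, int]]) -> int:
    """Count contacts whose residue pair in the chimera occurs in no parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # The contact is preserved if any parent carries the same pair here.
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


if __name__ == "__main__":
    parents = ["AKLE", "GKLR"]
    contacts = [(0, 3), (1, 2)]
    # Position 0 from parent 1 with position 3 from parent 2: pair (A, R) is new.
    print(schema_energy("AKLR", parents, contacts))  # → 1
    print(schema_energy("GKLR", parents, contacts))  # → 0
```

A library designer would then choose crossover points so that most chimeras score low on this count, which is the "penalizes you the least" design Arnold describes.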

The Caltech team chopped each of the three original enzymes into eight pieces and recombined them, yielding 3⁸ = 6,561 possible sequences. Of those, nearly half fold into properly functioning cytochrome P450s that can catalyze a reaction.
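The count follows because each of the eight blocks can come from any of three parents. A short sketch enumerates every parent choice and splices a sequence from it; the block boundaries and toy parent sequences here are hypothetical placeholders, not the actual crossover sites used in the study.

```python
from itertools import product

N_PARENTS = 3
N_BLOCKS = 8

# Each chimera is one parent index per block, e.g. (0, 2, 1, 1, 0, 2, 2, 0).
chimeras = list(product(range(N_PARENTS), repeat=N_BLOCKS))


def assemble(choice, parents, boundaries):
    """Splice a sequence: block k spans boundaries[k]:boundaries[k + 1] of parent choice[k]."""
    return "".join(
        parents[c][boundaries[k]:boundaries[k + 1]] for k, c in enumerate(choice)
    )


if __name__ == "__main__":
    print(len(chimeras))  # → 6561, i.e. 3**8

    # Toy 16-residue parents split into eight 2-residue blocks (hypothetical).
    parents = ["A" * 16, "C" * 16, "G" * 16]
    boundaries = list(range(0, 17, 2))  # [0, 2, 4, ..., 16]
    print(assemble((0, 1, 2, 0, 1, 2, 0, 1), parents, boundaries))  # → AACCGGAACCGGAACC
```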

But the nonfolding proteins serve a useful purpose, too. The team used logistic regression analysis, a statistical technique that needs examples of sequences that fail to fold and function as well as sequences that succeed, to glean information about why particular sequences fold. "If you're trying to understand what it is about a sequence of amino acids that makes it into a functional protein, it's nice to have ones that weren't successful with which to compare," Arnold says.
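To show why failures matter, here is a minimal logistic-regression sketch in NumPy. It assumes each chimera is encoded as one 0/1 feature per (block, parent) choice and labeled 1 if it folds and 0 if not; the encoding, the plain gradient-descent fitter, and the toy labels are illustrative assumptions, not the analysis the team actually ran. A block choice that shows up mostly in failures ends up with a strongly negative weight, and that contrast is only learnable when failures are in the data.

```python
from itertools import product

import numpy as np


def one_hot(choices, n_parents=3):
    """Encode each chimera (tuple of parent indices per block) as a 0/1 vector."""
    choices = np.asarray(choices)
    n, n_blocks = choices.shape
    X = np.zeros((n, n_blocks * n_parents))
    for b in range(n_blocks):
        X[np.arange(n), b * n_parents + choices[:, b]] = 1.0
    return X


def fit_logistic(X, y, lr=0.5, steps=2000, l2=1e-3):
    """Plain gradient-descent logistic regression with a small L2 penalty."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of folding
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


if __name__ == "__main__":
    # Made-up labels for illustration only (3 blocks, 3 parents): pretend any
    # chimera taking block 1 from parent 3 fails to fold.
    choices = list(product(range(3), repeat=3))
    y = np.array([0.0 if c[0] == 2 else 1.0 for c in choices])
    X = one_hot(choices)
    w, b = fit_logistic(X, y)
    print(np.argmin(w))  # feature index 2 = (block 1, parent 3)
```

Weights for blocks that carry no signal in this toy set come out equal across parents, so only the block 1, parent 3 feature stands out.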