It was not until 1957 that scientists gained special access to the third molecular dimension.
After 22 years of grueling experimentation, John Kendrew of the University of Cambridge finally uncovered the 3D structure of a protein. It was a twisted blueprint of myoglobin, the stringy chain of 154 amino acids that helps infuse our muscles with oxygen. As revolutionary as this discovery was, Kendrew did not quite open the floodgates of protein architecture. Over the next decade, fewer than a dozen more structures would be identified.
Fast forward to today, 65 years since that Nobel Prize-winning breakthrough.
On Thursday, Google’s sister company DeepMind announced that it has successfully used artificial intelligence to predict the 3D structures of nearly every cataloged protein known to science. That’s more than 200 million proteins found in plants, bacteria, animals and humans—just about anything you can imagine.
“Essentially, you can think of it as covering the entire protein universe,” Demis Hassabis, founder and CEO of DeepMind, told reporters this week.
It’s all thanks to AlphaFold, DeepMind’s AI system, whose open-source database lets scientists around the world draw on these structures in their research at will and for free. Since the official launch of AlphaFold in July of last year, when it had predicted only about 350,000 3D protein structures, the program has made a remarkable impact on the research landscape.
“More than 500,000 researchers and biologists have used the database to visualize more than 2 million structures,” Hassabis said. “And these predictive structures have helped scientists make brilliant new discoveries.”
In April, for example, scientists at Yale University called on AlphaFold’s database to help them in their goal of developing a new, highly effective malaria vaccine. And in July last year, scientists at the University of Portsmouth used the system to design enzymes that will fight single-use plastic pollution.
“This moved us a year ahead of where we were, if not two,” John McGeehan, director of the Portsmouth Center for Enzyme Innovation and a researcher on the latest study, told the New York Times.
The 3D structure of vitellogenin, which forms the egg yolk.
DeepMind
These efforts are just a small sample of AlphaFold’s ultimate reach.
“In the past year alone, there have been more than a thousand scientific papers on a wide range of research topics using AlphaFold structures; I’ve never seen anything like it,” Sameer Velankar, a DeepMind collaborator and team leader at the European Molecular Biology Laboratory’s Protein Data Bank, said in a press release.
Others who have used the database, according to Hassabis, include those trying to improve our understanding of Parkinson’s disease, people hoping to protect bee health and even some looking to gain valuable insight into human evolution.
“AlphaFold is already changing the way we think about the survival of molecules in the fossil record, and I can see it soon becoming a fundamental tool for researchers working not only in evolutionary biology but also in archeology and other paleosciences,” Beatrice Demarchi, associate professor at the University of Turin, who recently used the system in a study on an ancient egg controversy, said in a press release.
In the coming years, DeepMind also plans to partner with teams from the Drugs For Neglected Diseases Initiative and the World Health Organization, with the goal of finding cures for understudied but widespread tropical diseases, such as Chagas disease and leishmaniasis.
“It will get a lot of researchers around the world thinking about what experiments they could do,” Ewan Birney, a DeepMind collaborator and EMBL deputy director, told reporters. “And think about what’s happening in the organisms and systems they study.”
Locks and keys
So why do so many scientific advances depend on this treasure trove of 3D protein modeling? Let’s explain.
Let’s say you’re trying to make a key that fits perfectly into a lock. But you have no way to see the structure of this lock. All you know is that this lock exists, some data about its materials, and maybe some numerical information about the size of each ridge and where those ridges should be.
Developing such a key would not be impossible, perhaps, but it would be quite difficult. Keys must be accurate or they won’t work. So before you start, you’d probably do your best to model a few different mock locks with the information you have so you can make your key.
In this analogy, the lock is a protein and the key is a small molecule that binds to that protein.
For scientists, whether they are doctors trying to develop new medicines or botanists dissecting the anatomy of plants to make fertilizers, the interaction between certain molecules and proteins is crucial.
With drugs, for example, the specific way a drug molecule binds to a protein could be the tipping point for whether it works. This interaction is complicated by the fact that although proteins are just chains of amino acids, they are not straight or flat. They inevitably get twisted, bent, and sometimes tangled around themselves, like headphone cables in your pocket.
In fact, the unique folds of a protein dictate how it functions, and even the smallest misfolding errors in the human body can lead to disease.
But back to small-molecule drugs: sometimes pieces of a folded protein can’t bind to a drug. They might be folded in a weird way that makes them inaccessible, for example. Details like this are very important for scientists trying to get their drug molecule to stick.
“I think it’s true that almost every drug that has come on the market in recent years has been designed, in part, using knowledge of protein structures,” said Janet Thornton, a research scientist at EMBL, at the conference.
This is why researchers typically spend an incredible amount of time and effort decoding the folded 3D structure of a protein they are working with, much as you would begin your key-making journey by building a model of the lock. If you know the exact structure, it is much easier to know where and how a molecule would bind to a given protein, as well as how that binding might affect the folding of the protein in response.
But this effort is not simple. Or cheap.
“The cost of solving a new, unique structure is on the order of $100,000,” Steve Darnell, a structural and computational biologist at the University of Wisconsin and a researcher at the bioinformatics company DNAStar, said in a statement.
This is because the solution usually comes from super complicated lab experiments.
Kendrew, for example, took advantage of a technique called X-ray crystallography back in the day. Basically, this method requires you to take solid crystals of the protein of interest, place them in an X-ray beam, and record the pattern the beam makes. From this pattern you can work out the positions of thousands of atoms within the crystal. Only then can you use those positions to deduce the structure of the protein.
There is also the newer technique known as cryo-electron microscopy. This is similar to X-ray crystallography, except that the protein sample is directly imaged with electrons instead of an X-ray beam. And although it is often credited with very high resolution, it can’t exactly penetrate everything. Beyond the lab, some have attempted to predict protein folding structures digitally. But the first attempts, such as a few in the 1980s and 1990s, were not great. As you can imagine, laboratory methods are also tedious and difficult.
Over the years, these barriers have led to what is called the “protein folding problem.” Scientists simply don’t know how proteins fold, and they have faced significant hurdles in trying to find out.
AlphaFold’s AI could be a game changer.
A diagram provided by DeepMind of the explosive growth of the AlphaFold database, by species.
DeepMind
Solving the “folding problem”
In short, AlphaFold was trained by DeepMind engineers to predict protein structures without requiring a laboratory presence. No crystals, no shooting electrons, no $100,000 experiments.
To get AlphaFold to where it is today, according to the company’s website, the system was first exposed to 100,000 known protein folding structures. Then, over time, it began to learn how to decode the rest.
It’s really as simple as that. (Well, apart from the talent that went into coding the AI.)
“It takes, I don’t know, a minimum of $20,000 and a huge amount of time to crystallize a…