Google’s DeepMind, Chan Zuckerberg Biohub hail separate scientific breakthroughs in AI

Following the expansion, the AlphaFold database, which DeepMind (a Google sister company) created with EMBL-EBI, contains more than 200 million protein structures, up from fewer than 1 million previously.

DeepMind and EMBL-EBI shared details of the AlphaFold expansion last week, shortly after researchers at the Chan Zuckerberg Biohub published a paper on a deep learning method that provides information about the location and function of proteins within a cell. The researchers say the method could shorten research time for cell biologists and accelerate drug discovery and screening.

The DeepMind-EMBL-EBI project builds on the launch of the AlphaFold database in July 2021. At launch, the database featured more than 350,000 protein structure predictions, including all human proteins. The collaborators later added 27 new proteomes, including 17 related to neglected tropical diseases.

The partners have now expanded the database to cover almost all protein sequences in the UniProt database, meaning it includes almost every organism on Earth that has had its genome sequenced. The expanded database covers proteins made by plants, bacteria, animals and other organisms that may be relevant to research on sustainability, food insecurity and neglected diseases.

“We launched AlphaFold with the hope that other teams could learn from and leverage the advances we made, and it’s been exciting to see that happen so quickly. Many other AI research organizations have entered the field and are building on the advances of AlphaFold to create more breakthroughs. This is truly a new era in structural biology, and AI-based methods will drive incredible progress,” John Jumper, research scientist and AlphaFold lead at DeepMind, said in a statement.

Separately, the Chan Zuckerberg Biohub project, details of which were published in Nature Methods, involved the development of a machine learning method for quantitative analysis and comparison of protein microscopy images without prior knowledge.

Instead of training the algorithm by showing it examples, the researchers developed a system capable of supervising its own learning. This self-supervised approach reduces the manual work involved in setting up an algorithm and addresses the bias that can occur when humans choose the images used to train the model. It worked better than expected.

“The degree of detail in protein localization was much higher than we would have thought,” said Manuel Leonetti, PhD, study co-author and group leader at CZ Biohub, in a statement. “The machine transforms each protein image into a mathematical vector. So you can start to classify images that look alike. We realized that by doing this we could predict, with high specificity, proteins that work together in the cell just by comparing their pictures, which was a little surprising.”
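The idea Leonetti describes, turning each image into a vector and then comparing vectors, can be illustrated with a minimal sketch. This is not the study's code: the protein names and three-number vectors below are hypothetical placeholders, and cosine similarity is assumed here only as one common way to compare vectors. A real model would produce much longer vectors from actual microscopy images.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings):
    """Rank every other entry by how closely its vector matches the query's."""
    q = embeddings[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for image embeddings (not real data).
embeddings = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.8, 0.2, 0.1],
    "protein_C": [0.0, 0.1, 0.9],
}

ranked = most_similar("protein_A", embeddings)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup, protein_B ranks above protein_C because its vector points in nearly the same direction as protein_A's. That is the sense in which images that "look alike" end up grouped together.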

While there has been previous research on protein imaging using self-supervised or unsupervised models, the study authors say this is the first time self-supervised learning has been used so successfully on such a large dataset of more than 1 million images covering more than 1,300 proteins measured from living human cells.
