New machine learning algorithm finds a characteristic genetic signature of tumors

How do cancer cells differ from healthy ones? A new machine learning algorithm called “ikarus” knows the answer, reports a team led by MDC bioinformatician Altuna Akalin in the journal Genome Biology. The AI program has found a characteristic genetic signature of tumors.

When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find regularities in datasets, whether for stock market analysis, image and voice recognition, or cell classification. To reliably distinguish cancer cells from healthy ones, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine of the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is common to different types of cancer and consists of a characteristic combination of genes. According to the team’s paper in Genome Biology, the algorithm also detected genes in this pattern that had never before been clearly linked to cancer.

Machine learning essentially means that an algorithm uses training data to learn how to answer certain questions on its own. It does this by looking for patterns in the data that help it solve the problem at hand. After the training phase, the system can generalize from what it has learned to evaluate data it has not seen before.
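To make the idea of "training, then generalizing" concrete, here is a minimal sketch in Python of one of the simplest supervised learners, a nearest-centroid classifier. The function names, the two-gene expression values and the labels are invented for illustration; this is not how ikarus works internally.

```python
from statistics import mean

def train_centroids(samples, labels):
    """Training phase: compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # Average each feature (column) over all training rows of this class.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Generalization: assign an unseen sample to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

# Hypothetical toy data: expression of two made-up genes per cell.
train_x = [[5.0, 1.0], [6.0, 0.5], [1.0, 4.0], [0.5, 5.0]]
train_y = ["cancerous", "cancerous", "healthy", "healthy"]

model = train_centroids(train_x, train_y)
print(predict(model, [5.5, 0.8]))  # an unseen cell → cancerous
```

The "pattern" learned here is just one average profile per class; real single-cell classifiers work with thousands of genes and far more elaborate models, but the split between a training step and a prediction step on unseen data is the same.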

“It was a major challenge to obtain adequate training data in which experts had already clearly distinguished between ‘healthy’ and ‘cancerous’ cells,” says Jan Dohmen, first author of the paper.

A surprisingly high success rate

In addition, single-cell sequencing datasets are often noisy. This means that the information they contain about the molecular characteristics of individual cells is not very accurate, for example because different numbers of genes are detected in each cell or because samples are not always processed in the same way. According to Dohmen and his colleague Dr. Vedran Franke, who co-led the study, the team combed through countless publications and contacted many research groups to obtain suitable datasets. In the end, they trained the algorithm on data from lung and colorectal cancer cells before applying it to datasets from other tumor types.
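One common way to reduce cell-to-cell differences in sequencing depth, one of the noise sources described above, is library-size normalization followed by a log transform. The sketch below assumes a plain list-of-lists count matrix (one row per cell, one column per gene) and a target of 10,000 counts per cell, a conventional but arbitrary choice; the function name is hypothetical, and the article does not say which preprocessing the ikarus team used.

```python
import math

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell's counts to the same total, then apply log(1 + x).

    counts: list of rows, one row per cell, one column per gene (raw read counts).
    """
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A cell with no detected transcripts carries no information; keep zeros.
            normalized.append([0.0] * len(row))
            continue
        scale = target_sum / total
        normalized.append([math.log1p(c * scale) for c in row])
    return normalized

# Hypothetical cells: the second was sequenced twice as deeply as the first.
cells = [[10, 30, 0], [20, 60, 0]]
result = normalize_log1p(cells)
print(result[0] == result[1])  # → True
```

After scaling, the two cells, which differ only in sequencing depth, end up with identical profiles. Normalization cannot fix genes that were never detected or batch effects from different lab protocols, which is part of why well-curated training data mattered so much.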

During the training phase, ikarus had to find a list of characteristic genes, which it then used to categorize the cells. “We tried out and refined several approaches,” Dohmen says. According to the three scientists, it was time-consuming work. “The key was for ikarus to ultimately use two lists: one for cancer genes and one for genes from other cells,” says Franke. After the learning phase, the algorithm was also able to reliably distinguish between healthy and tumor cells in other types of cancer, such as tissue samples from patients with liver cancer or neuroblastoma. Its success rate was exceptionally high, which surprised even the research group. “We didn’t expect a common signature to define tumor cells from different types of cancer so accurately,” says Akalin. “But we still can’t say whether the method works for all types of cancer,” Dohmen adds. To make ikarus a reliable tool for diagnosing cancer, the researchers now want to test it on other tumor types.
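The "two lists" idea can be illustrated with a simple gene-set score: average a cell's expression over the tumor-gene list, subtract its average over the list of genes from other cells, and call the cell a tumor cell if the difference is positive. This is a deliberately simplified sketch: the gene names, the zero threshold and the scoring rule are hypothetical, and the published ikarus pipeline is more involved than this.

```python
from statistics import mean

# Hypothetical gene lists; these are placeholders, not the signatures found by ikarus.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2"]

def signature_score(cell, genes):
    """Mean normalized expression of the given genes; genes absent from the cell count as 0."""
    return mean(cell.get(g, 0.0) for g in genes)

def classify(cell, threshold=0.0):
    """Label a cell 'tumor' if its tumor-signature score exceeds its normal-signature score."""
    diff = signature_score(cell, TUMOR_GENES) - signature_score(cell, NORMAL_GENES)
    return "tumor" if diff > threshold else "normal"

# Hypothetical cells as gene -> normalized expression mappings.
cell_a = {"GENE_T1": 3.0, "GENE_T2": 2.5, "GENE_T3": 2.0, "GENE_N1": 0.5}
cell_b = {"GENE_T1": 0.2, "GENE_N1": 2.0, "GENE_N2": 3.0}
print(classify(cell_a), classify(cell_b))  # → tumor normal
```

Because the rule depends only on which genes appear in the two lists, it can be applied unchanged to cells from a tumor type that was absent from the training data. In a very simplified way, this mirrors testing a signature learned on lung and colorectal cancer against liver cancer or neuroblastoma samples.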

AI as a fully automated diagnostic tool

The project aims to go far beyond classifying cells as “healthy” or “cancerous”. In initial tests, ikarus already showed that the method can also distinguish other cell types, as well as certain subtypes of tumor cells. “We want to make the approach more comprehensive,” says Akalin, “by developing it further so that it can distinguish between all possible cell types in a biopsy.”

In hospitals, pathologists usually examine tumor tissue samples under a microscope to identify the different cell types, which is laborious and time-consuming work. With ikarus, this step could one day become a fully automated process. In addition, Akalin points out, the data could be used to draw conclusions about the tumor’s immediate environment, which could help doctors choose the best therapy: the composition of cancerous tissue and its microenvironment often indicates whether a particular treatment or drug will be effective. AI could also help in developing new drugs. “Ikarus allows us to identify genes that are potential cancer drivers,” says Akalin. New therapeutic agents could then be directed against these molecular structures.

Home-office collaboration

A notable aspect of the publication is that it was prepared entirely during the COVID-19 pandemic. Many of those involved were not at their usual desks at the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC, but worked from home and communicated only digitally. As a result, says Franke, “the project demonstrates that a digital structure can be created to facilitate scientific work under these conditions.”

Source:

Max Delbrück Center for Molecular Medicine of the Helmholtz Association

Journal reference:

Dohmen, J., et al. (2022) Identifying tumor cells at the single-cell level using machine learning. Genome Biology. doi.org/10.1186/s13059-022-02683-1.
