Decentralized Medicine Research
How could the development of medicines also be decentralized, not just the production?
The current way of creating new biosimilars, or any molecules produced with cells, is based on trial and error and is extremely costly. It works by taking some cellular material and generating mutations in future generations, in the hope that one of these mutations will produce the result we want. In principle nature works exactly this way too, but here the process is sped up. Variation in the DNA can be created with chemicals or, say, with sloppy PCR (as mentioned in the previous post, sloppy PCR is a “failing” PCR process causing up to 2% errors, here used deliberately to generate and study mutations). Then we see what happened. Successful changes are selected for further iterations. The overwhelming majority of mutations do not survive or produce anything usable.
Basically, this works by trusting that some random DNA change will produce the new molecules with the wanted properties.
A better approach combines machine learning with the tools of synthetic biology to enable the creation of new molecules that have the wanted properties.
This is how it works:
Training the machine learning models starts with existing gene libraries, potentially augmented with your own research. The needed data includes the molecular structures of various proteins and antigens and the reactions they cause in the body (antigens are molecules that trigger an immune response, for example proteins on a virus surface; antibodies are the proteins the immune system produces in response to them). This data includes features like bond angles, atoms and molecular descriptors (molecular descriptors are ways of representing molecules as numbers; the same molecule can be described in multiple ways, for example as a one-dimensional list of the atoms that make it up, or as a 3D representation). The data is potentially further preprocessed and transformed into features suitable for training.
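To make the “molecule to numbers” step concrete, here is a minimal sketch of one very simple descriptor: the amino-acid composition of a protein sequence. This is only an illustration of featurization; real pipelines use far richer descriptors (3D structure, bond angles, physicochemical properties), and the example sequence is invented.

```python
# Toy featurization: turn a protein sequence into a fixed-length numeric
# vector (the fraction of each of the 20 standard amino acids).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(sequence: str) -> list:
    """One-dimensional descriptor: fraction of each amino acid in the sequence."""
    sequence = sequence.upper()
    total = len(sequence)
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]

features = composition_vector("MKTAYIAKQR")  # invented example sequence
print(len(features))  # 20 numbers, one per amino acid
```

However the descriptor is computed, the result is the same kind of object: a fixed-length vector of numbers that a model can learn from.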
Machine learning models are then trained on this data. The algorithms learn to identify patterns and relationships between the molecular features (the shapes of the molecule) and the reactions they cause. Reactions can be desired or unwanted.
Models need to be validated using separate datasets to understand how well they predict previously unseen data. Models that perform well are selected for further analysis and optimisation until they are accurate enough.
Finally, validated ML models can be used in production to predict the biological activities of new molecules. Researchers focus on the best candidates, the ones the models estimate to have desirable therapeutic properties.
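The train, validate, predict workflow above can be sketched end to end with a deliberately tiny from-scratch model. Here a 1-nearest-neighbour classifier stands in for a real ML model, and the descriptor vectors and “active”/“inactive” labels are invented for illustration; a real project would use gene and protein libraries and a proper ML framework.

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(train_set, features):
    """Label of the closest training example (1-nearest neighbour)."""
    return min(train_set, key=lambda item: distance(item[0], features))[1]

# Toy descriptor vectors with known activity labels (training data).
train = [([0.1, 0.9], "active"), ([0.2, 0.8], "active"),
         ([0.9, 0.1], "inactive"), ([0.8, 0.2], "inactive")]

# Held-out validation data the model has not seen during training.
validation = [([0.15, 0.85], "active"), ([0.85, 0.15], "inactive")]
accuracy = sum(predict(train, f) == label for f, label in validation) / len(validation)
print(accuracy)  # 1.0 on this toy split

# Only after validation is the model used to rank new candidate molecules.
print(predict(train, [0.3, 0.7]))  # "active"
```

The shape of the workflow is the point here: fit on one dataset, measure on a separate one, and only then trust the model's predictions on new candidates.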
One goal is also to reduce the risk that the new drug causes an immune response (this is called immunogenicity). ML models can also predict the parts of a protein structure that can cause this. Once identified, researchers can try to change the structure to avoid it.
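One common way to localise such risky regions is a sliding-window scan over the sequence. The sketch below is hypothetical: the per-residue score table and threshold are placeholders standing in for a trained immunogenicity model, not a real epitope predictor.

```python
# Placeholder per-residue scores (invented), standing in for model output.
SCORE = {aa: s for aa, s in zip("ACDEFGHIKLMNPQRSTVWY", range(20))}

def flag_windows(sequence, width=9, threshold=100):
    """Return (start, window) pairs whose summed score exceeds the threshold."""
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if sum(SCORE[aa] for aa in window) > threshold:
            hits.append((i, window))
    return hits

# Windows overlapping the high-scoring stretch in the middle get flagged.
print(flag_windows("AAAA" + "WYWYWYWYW" + "AAAA"))
```

The flagged positions tell the researchers where in the structure to attempt changes.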
Synthetic biology, meaning synthesising the DNA sequence and adding it to a production microbe with gene “scissors”, can then be applied to prepare for mass production.
Basically, we have a machine learning model (software) that can be used to quickly find DNA patterns that can be added to living cells, so that the cells produce a molecule that works as a medicine in the human body, treating diseases.
As a formula this is simple enough, doing this is naturally much harder:
Synthetic Biology + Machine Learning => accelerate design of materials and biosimilars
The aim here is to enable the production of superior medicines and biomaterials. The created materials can be, for example, biosimilar drugs, medical implants (heart valves, stents, ligaments, tendons), sutures and staples for wound closure, biosensors, or simply stretchable fabrics for sportswear (elastane-like materials, etc.).
How then to know where to start the process? It would be natural to start with a gene that is known to produce the needed molecule and iterate from it. Make small changes, for example by simplifying it, changing the order of bases, adding parts from other genes, combining parts from several sequences known to affect the same end result, or starting from sequences that produce molecules with a similar but not quite right shape and iterating on those. At each step, evaluate the results and continue with the most promising candidates, while adding some completely new randomness to the mix as well. This is in essence how generative design works.
As a diagram:
Seed proteins→generate variants→predict properties→pick best→iterate until good enough→verify
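The loop in the diagram can be sketched in a few dozen lines. Here a toy fitness function (fraction of G/C bases) stands in for the trained ML model, and all parameters (mutation rate, population size, number of generations) are invented for illustration; a real system would score predicted structure and activity instead.

```python
import random

random.seed(42)  # deterministic for the example
BASES = "ACGT"

def mutate(seq, rate=0.1):
    """Generate a variant: copy the sequence, substituting bases at the given rate."""
    return "".join(random.choice(BASES) if random.random() < rate else b for b in seq)

def score(seq, target="GC"):
    """Toy fitness: fraction of bases in the target set (stand-in for the ML model)."""
    return sum(b in target for b in seq) / len(seq)

def evolve(seed, generations=20, population=30, keep=5):
    """Seed -> generate variants -> predict -> pick best -> iterate."""
    pool = [seed]
    for _ in range(generations):
        # Generate variants from the current best candidates...
        variants = [mutate(p) for p in pool for _ in range(population // len(pool))]
        # ...score them, and keep only the most promising ones.
        pool = sorted(pool + variants, key=score, reverse=True)[:keep]
    return pool[0]

best = evolve("ATATATATATATATAT")
print(score(best))  # well above the seed's score of 0.0
```

The final "verify" step in the diagram happens in the lab: the top-ranked candidates are synthesised and tested, because the model's predictions are only estimates.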
If the predictive models are open sourced, this would allow research into new medicines and also production of all kinds of materials with living cells to be decentralised and done anywhere in the world.
And as most of the research is done in universities and research institutes funded by the happy taxpayer, why wouldn’t they be?
Next: Role of Doctors