Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI's impact on biology research: Part I, today, published by octopocta on December 27, 2023 on LessWrong.
I'm a biology PhD, and have been working in tech for a number of years. I want to show why I believe that biological research is the most near-term, high-value application of machine learning. This has profound implications for human health, industrial development, and the fate of the world.
In this article I explain the current discoveries that machine learning has enabled in biology. In the next article I will consider what this implies will happen in the near term without major improvements in AI, along with my speculation about how the expectations underlying our regulatory and business norms will fail. Finally, my last article will examine the longer-term possibilities for machine learning and biology, including crazy but plausible sci-fi speculation.
TL;DR
Biology is complex, and the potential space of biological solutions to chemical, environmental, and other challenges is incredibly large. Biological research generates huge, well-labeled datasets at low cost. This is a perfect fit with current machine learning approaches. Without computational assistance, humans have very limited ability to understand biological systems well enough to simulate, manipulate, and generate them. However, machine learning is giving us tools to do all of the above. This means that things that have been constrained by human limits, such as drug discovery or protein structure determination, are suddenly unconstrained, turning a paucity of results into a superabundance in one step.
Biology and data
Biological research has been using technology to collect vast datasets since the bioinformatics revolution of the 1990s. DNA sequencing costs have dropped by 5 orders of magnitude in 20 years ($100,000,000 per human genome to $1,000 per genome)[1]. Microarrays allowed researchers to measure changes in mRNA expression in response to different experimental conditions across the entire genome of many species. High-throughput cell sorting, robotic multi-well assays, proteomics chips, automated microscopy, and many more technologies generate petabytes of data.
As a result, biologists have been using computational tools to analyze and manipulate big datasets for over 30 years. Labs create, use, and share programs. Grad students are quick to adopt open-source software, and lead researchers have been investing in powerful computational resources. There is a strong culture of adopting new technology, and this extends to machine learning.
Leading machine learning experts want to solve biology
Computer researchers have long been interested in applying computational resources to solve biological problems. Billionaire David E. Shaw intentionally started a hedge fund so that he could fund computational biology research[2]. Demis Hassabis, DeepMind's founder, holds a PhD in neuroscience. Under his leadership DeepMind has made biological research a major priority, spinning off Isomorphic Labs[3], which focuses on drug discovery.
The Chan Zuckerberg Initiative is devoted to enabling computational research in biology and medicine to "cure, prevent, or manage all diseases by the end of this century"[4]. These examples show that machine learning research at the highest level is being devoted to biological problems.
What have we discovered so far?
In 2020, at the CASP14 protein structure prediction contest, DeepMind's AlphaFold2 program showed accuracy comparable to the best physical methods of protein structure measurement.[5] This result "solved the protein folding problem"[6] for the large majority of proteins, showing that AlphaFold2 could generate a high-quality, biologically accurate 3D protein structure given the protein's amino acid sequence, which is itself determined by the DNA sequence that encodes the protein.
DeepMind then used AlphaFold2 to generate structures for all proteins kn...