AlphaFold 3
This case study was written as a part of a group project for my Management of Technology Enterprises class in fall 2024.
Part I: Introduction
Underlying the biological functions of all living things are proteins. Google DeepMind, in collaboration with Isomorphic Labs, has developed AlphaFold, an AI tool that predicts protein structures so that researchers can use them in many kinds of protein research. Two of AlphaFold’s developers, John Jumper and Demis Hassabis, shared the 2024 Nobel Prize in Chemistry for this game-changing contribution to science.
AlphaFold is available through two resources: AlphaFold Server, a tool for running new predictions, and AlphaFold DB, a database of predicted structures. In this case study, we cover the basics of how proteins work and how AlphaFold contributes to research, then look at how antibiotic research is using it to keep pace with the growing number of antibiotic-resistant infections.
Part II: What are proteins? And why are they researched?
Proteins are the molecular machines inside every cell of every living thing. They underpin every biological process, keeping cells running by carrying signals and driving chemical reactions. A protein’s structure determines its function, so being able to model these structures improves our understanding of health and disease.
A protein is essentially a string of amino acids that folds into a 3-D shape made of curls, loops, and pleats, a process sometimes described as “spontaneous origami.” Each twist and turn serves a specific purpose in the larger biological process. The string is built from 20 different kinds of amino acids, and the order of those amino acids determines the shape the protein folds into. Proteins also bind to other molecules, such as DNA, RNA, ligands, and ions, to form larger biological structures.
Protein research has had a significant impact on how we discover and treat disease. For example, researchers were able to develop treatments for sickle-cell anemia by determining the structure of hemoglobin, the oxygen-carrying protein in red blood cells, and in turn understanding the mutation responsible for the disease. Vaccine development is another example. Researchers determined the structures of SARS-CoV-2 viral proteins, which helped them understand the virus, develop treatments, and eventually develop a vaccine (EMBL-EBI, 2022).
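To make the “string of amino acids” idea concrete, the sketch below treats a protein as a string over the 20 standard one-letter amino-acid codes and reports the positions where two sequences differ. It uses the first ten residues of human hemoglobin subunit beta (MVHLTPEEKS) and the same stretch with the sickle-cell substitution of glutamic acid (E) by valine (V). The function name `point_mutations` is hypothetical, and this is only an illustration of the concept, not a tool used by the researchers cited above.

```python
# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def point_mutations(reference: str, variant: str) -> list[str]:
    """List single-residue substitutions as <ref><position><alt>, counting from 1."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    for seq in (reference, variant):
        invalid = set(seq) - AMINO_ACIDS
        if invalid:
            raise ValueError(f"unknown amino-acid codes: {sorted(invalid)}")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


NORMAL_BETA_START = "MVHLTPEEKS"  # first 10 residues of human beta globin
SICKLE_BETA_START = "MVHLTPVEKS"  # same stretch with the sickle-cell change

print(point_mutations(NORMAL_BETA_START, SICKLE_BETA_START))  # ['E7V']
```

The result reads as position 7 here because the string includes the initial methionine; the literature usually calls the sickle-cell mutation Glu6Val (E6V) because its numbering skips that first residue. A single changed letter in a 147-residue chain is enough to alter how hemoglobin behaves.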
Over the past 60 years, experimental research has determined the structures of over 180,000 proteins in atomic detail (EMBL-EBI, 2022). Databases containing this information have long been available to researchers, but their interfaces have not been the easiest to use. AI and AlphaFold are helping to speed up that research by providing a platform that is open to the public and user-friendly. As of 2024, the AlphaFold Protein Structure Database (AlphaFold DB) contains predicted structures for over 200 million proteins (AlphaFold, 2024). In 2021, the original AlphaFold contained 100,000 protein structures (Jumper et al., 2021). That is a 2,000-fold increase in about three years, compared with roughly 180,000 structures determined experimentally over six decades.
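The growth figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted in the paragraph and assumes a three-year span (2021 to 2024) and steady compound growth; two data points cannot prove that growth is exponential, so this only measures how large the jump is. The helper names are hypothetical.

```python
def fold_change(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def implied_annual_factor(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that turns `start` into `end` over `years`.

    Assumes steady compound growth, which two data points cannot confirm.
    """
    return (end / start) ** (1 / years)


# Figures quoted in the text; the three-year span is an assumption.
STRUCTURES_2021 = 100_000
STRUCTURES_2024 = 200_000_000

print(fold_change(STRUCTURES_2021, STRUCTURES_2024))  # 2000.0
print(round(implied_annual_factor(STRUCTURES_2021, STRUCTURES_2024, 3), 1))  # 12.6
print(180_000 / 60)  # 3000.0 experimental structures per year, on average
```

Roughly a twelvefold increase per year for predicted structures contrasts with an average of about 3,000 experimentally determined structures per year over six decades.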
Part III: How does AlphaFold work?
At the core of AlphaFold is deep learning, a form of machine learning built on neural networks. Neural networks are statistical models made up of layers of computation, and they can produce highly accurate predictions. Each layer’s output becomes the input to the next layer, loosely similar to how signals pass between neurons. During training, the parameters of these layers are adjusted: the error between a prediction and the correct answer is propagated backward through the layers (backpropagation), and each parameter is nudged to reduce that error (Stewart, 2020).
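The sketch below is a generic toy network, nothing like AlphaFold’s actual architecture, that shows the two mechanics described above: a forward pass in which each layer’s output feeds the next, and backpropagation that adjusts the parameters to reduce error. It assumes a made-up task (fitting five points on the line y = 2x + 1) with one hidden layer of tanh units, and the function name `train_tiny_network` is hypothetical. Only the Python standard library is used.

```python
import math
import random


def train_tiny_network(epochs=500, learning_rate=0.05, hidden_units=3, seed=0):
    """Fit y = 2x + 1 with one hidden tanh layer using full-batch gradient descent.

    Returns the mean-squared error measured at the start of each epoch.
    """
    rng = random.Random(seed)
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]  # toy inputs (an assumption)
    ys = [2 * x + 1 for x in xs]      # toy targets
    n = len(xs)

    # Layer 1 (input -> hidden) and layer 2 (hidden -> output) parameters.
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]
    b1 = [0.0] * hidden_units
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]
    b2 = 0.0

    losses = []
    for _ in range(epochs):
        g_w1 = [0.0] * hidden_units
        g_b1 = [0.0] * hidden_units
        g_w2 = [0.0] * hidden_units
        g_b2 = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: the hidden layer's output is the output layer's input.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden_units)]
            y_hat = sum(w2[j] * h[j] for j in range(hidden_units)) + b2
            error = y_hat - y
            loss += error ** 2 / n

            # Backward pass: propagate the error from the output back to layer 1.
            d_out = 2 * error / n  # derivative of the loss with respect to y_hat
            g_b2 += d_out
            for j in range(hidden_units):
                g_w2[j] += d_out * h[j]
                d_hidden = d_out * w2[j] * (1 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                g_w1[j] += d_hidden * x
                g_b1[j] += d_hidden
        losses.append(loss)

        # Update step: move every parameter against its gradient.
        for j in range(hidden_units):
            w1[j] -= learning_rate * g_w1[j]
            b1[j] -= learning_rate * g_b1[j]
            w2[j] -= learning_rate * g_w2[j]
        b2 -= learning_rate * g_b2
    return losses
```

Calling `losses = train_tiny_network()` and comparing `losses[0]` with `losses[-1]` shows the error shrinking as training proceeds. Plain loops are used for readability; real models are trained with dedicated libraries and have vastly more parameters.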
Different neural network architectures suit different kinds of data, such as audio or images. For AlphaFold, Jumper et al. describe the approach this way: “Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm” (Jumper et al., 2021).
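A multiple sequence alignment (MSA) lines up the sequences of related proteins from different organisms so that corresponding positions sit in the same column. One long-standing idea is that columns which mutate together across species often belong to residues that touch in the folded structure. The sketch below measures that co-variation with mutual information between pairs of columns. This is a classic, much simpler signal, not what AlphaFold computes (AlphaFold learns from the alignment with its neural network), and the toy alignment and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# A made-up alignment, for illustration only.
# Columns 1 and 4 co-vary: K always pairs with A, and R always pairs with E.
TOY_ALIGNMENT = [
    "MKVLA",
    "MRVLE",
    "MKILA",
    "MRIVE",
]


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    left = Counter(seq[i] for seq in alignment)
    right = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = left[a] / n
        p_b = right[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def most_covarying_pair(alignment):
    """Return the pair of 0-based column indices with the highest mutual information."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return max(
        combinations(range(length), 2),
        key=lambda pair: column_mutual_information(alignment, *pair),
    )


print(most_covarying_pair(TOY_ALIGNMENT))  # (1, 4)
```

A fully conserved column (column 0, always M) scores zero with every other column, because it carries no information about how its partner varies.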
AlphaFold DB is a searchable collection of protein structures that AlphaFold has already predicted. AlphaFold Server, by contrast, is an interactive tool that runs new predictions: researchers enter molecular entities such as a protein’s amino acid sequence, DNA, RNA, ligands, and ions, and the server models the structure those entities form together. The tool is easy to use and does not require any understanding of the underlying model or math (DeepMind, 2024).
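Besides the web interface, AlphaFold DB can be queried programmatically. The sketch below assumes the database’s public REST endpoint of the form `https://alphafold.ebi.ac.uk/api/prediction/<UniProt accession>`, which returns JSON metadata for the predicted structure; check the current AlphaFold DB API documentation before relying on it, since the response fields are not spelled out here. The helper names are hypothetical, and only standard-library modules are used.

```python
import json
import urllib.request
from typing import Any

# Assumed base URL of the AlphaFold DB REST API; verify against its API docs.
ALPHAFOLD_DB_API = "https://alphafold.ebi.ac.uk/api/prediction/"


def prediction_url(uniprot_accession: str) -> str:
    """Build the AlphaFold DB lookup URL for one UniProt accession."""
    return ALPHAFOLD_DB_API + uniprot_accession.strip().upper()


def fetch_prediction_metadata(uniprot_accession: str, timeout: float = 30.0) -> Any:
    """Download and parse the JSON metadata for one protein (needs network access)."""
    with urllib.request.urlopen(prediction_url(uniprot_accession), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_prediction_metadata("P68871")` would look up human hemoglobin subunit beta (UniProt accession P68871); inspecting the returned records shows which fields, such as links to the model files, the API currently provides.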
Figure 2 shows the final product: the predicted structure, with all its twists and turns, modeled from each entity and the way the entities interact. The colors indicate how confident the model is in each part of the predicted structure.
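AlphaFold expresses per-residue confidence as pLDDT, a score from 0 to 100, and its structure viewers color residues in four bands: very high (above 90), confident (70 to 90), low (50 to 70), and very low (below 50). AlphaFold’s PDB files store this score in the B-factor column. The sketch below maps scores to those bands and summarizes how much of a structure falls in each; how exact boundary values (90, 70, 50) are assigned is an assumption here, and the function names are hypothetical.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Map a pLDDT score (0 to 100) to AlphaFold's four confidence bands.

    Boundary values are assigned to the lower band, which is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError("pLDDT scores range from 0 to 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Fraction of residues in each confidence band."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


print(band_fractions([95.0, 92.5, 81.0, 40.0]))
# {'very high': 0.5, 'confident': 0.25, 'low': 0.0, 'very low': 0.25}
```

A summary like this is a quick way to see whether most of a prediction is reliable or whether large regions, often flexible loops or disordered segments, should be treated with caution.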
AlphaFold’s accuracy was tested in the 14th Critical Assessment of protein Structure Prediction (CASP14). Its predictions were competitive with experimentally determined structures in a majority of cases and greatly outperformed the other methods; Jumper et al. (2021) compare it against the top 15 of the 146 entries. AlphaFold’s median backbone error was 0.96 Å, with a 95% confidence interval of 0.85 to 1.16 Å, compared with 2.8 Å for the next-best method (Jumper et al., 2021). To illustrate what a 95% confidence interval means: “…if a point estimate is generated from a statistical model of 10.00 with a 95% confidence interval of 9.50 to 10.50, it means one is 95% confident that the true value falls within that range” (Hayes, 2024).
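A confidence interval like the one in the quotation can be computed from data. The sketch below uses the common large-sample normal approximation, mean ± z × (standard deviation / √n), with z ≈ 1.96 for 95%. The sample values are hypothetical, the function name is illustrative, and this is not the method Jumper et al. used for their CASP14 figures. With very small samples, an interval based on the t-distribution would be wider and more appropriate.

```python
from statistics import NormalDist, mean, stdev


def normal_confidence_interval(samples, level=0.95):
    """Approximate confidence interval for the mean, using the normal distribution."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    center = mean(samples)
    standard_error = stdev(samples) / len(samples) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * standard_error
    return center - margin, center + margin


# Hypothetical measurements, for illustration only.
low, high = normal_confidence_interval([9.0, 10.0, 11.0])
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 8.87 to 11.13
```

More data shrinks the standard error, which is why an interval like AlphaFold’s 0.85 to 1.16 Å can be fairly narrow around its 0.96 Å median.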
A search in PubMed for “AlphaFold” produces 1,418 results (PubMed, 2024), a mix of research done with AlphaFold and research about AlphaFold. That is substantial uptake in the scientific literature. As with any AI tool, AlphaFold is not 100% accurate, and acting on research that has not been fully tested could have fatal consequences. The scientific community’s standards of repeated testing and validation reduce the probability of such pitfalls.
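The PubMed count can be reproduced programmatically through NCBI’s E-utilities ESearch service, which returns a JSON summary that includes the total number of matching records. The sketch below builds the query URL and reads the count; the helper names are hypothetical, a live call will return today’s count rather than the 1,418 reported in 2024, and NCBI asks users without an API key to keep to a few requests per second.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pubmed") -> str:
    """Build an ESearch query that returns only the match count, as JSON."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmode": "json", "retmax": 0})
    return f"{ESEARCH_URL}?{query}"


def pubmed_count(term: str, timeout: float = 30.0) -> int:
    """Return the number of PubMed records matching `term` (needs network access)."""
    with urllib.request.urlopen(esearch_url(term), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return int(payload["esearchresult"]["count"])
```

Calling `pubmed_count("AlphaFold")` at different dates is a simple way to track how quickly the literature around a tool is growing.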
Part IV: Antibiotic Research Using AlphaFold
Antibiotic-resistant bacteria are quickly becoming a global problem. In the US alone, there are more than 2.8 million antibiotic-resistant infections each year. Scientists, such as those in the Sousa Lab at the University of Colorado Boulder, are researching ways to develop new antibiotics to keep up with the growing need.
Using AlphaFold, the Sousa Lab can model bacterial proteins, including those behind the mechanisms that make bacteria resistant to treatment. The general hypothesis is that by blocking the enzyme that drives a resistance mechanism, antibiotics should be able to work again and prevent infections and disease. The lab then tests these predictions in vitro and in vivo (Sousa, 2024). AlphaFold is speeding up this research by determining these structures much faster than before, ultimately helping researchers keep up with our evolving biological world (DeepMind, 2022).
Part V: Key Insights and Recommendations
AI is a means to an end: a tool that will always need human involvement at some point in the process. Human experts must monitor its output, analyze the results, make decisions about real-world implications, and consider the ethics. Clinical research processes, such as double-blind trials and long-term testing, should remain a standard part of any AI-assisted research.
AlphaFold is an accessible tool and database that is accelerating research. Molecular biology is a wide field, and technology that predicts molecular structures faster lets researchers focus their time and energy on the bigger questions those structures bear on. As with any data-driven model, AI improves with more and better data. AlphaFold both contributes quality data and makes quality data accessible, which is why it is driving significant progress in the scientific community and is well deserving of a Nobel Prize.
References
AlphaFold. (2024, May 8). AlphaFold 3 predicts the structure and interactions of all of life’s molecules. Google. https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/#life-molecules
DeepMind. (2022, July 28). Accelerating the race against antibiotic resistance. Google DeepMind. https://deepmind.google/discover/blog/accelerating-the-race-against-antibiotic-resistance/
DeepMind. (2024). AlphaFold Server. Google DeepMind. https://deepmind.google/technologies/alphafold/alphafold-server/
EMBL-EBI. (2022). About. AlphaFold Protein Structure Database. https://alphafold.com/about
Hayes, A. (2024, June 6). What is a confidence interval and how do you calculate it? Investopedia. https://www.investopedia.com/terms/c/confidenceinterval.asp
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021, July 15). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589. https://www.nature.com/articles/s41586-021-03819-2
PubMed. (2024). AlphaFold: Search results. National Center for Biotechnology Information. https://pubmed.ncbi.nlm.nih.gov/?term=alphafold
Sousa, M. (2024). Structural Biology and Protein Biophysics. Sousa Research Group. https://www.colorado.edu/lab/sousa/
Stewart, M. (2020, July 29). Simple introduction to neural networks. Medium. https://towardsdatascience.com/simple-introduction-to-neural-networks-ac1d7c3d7a2c