Automating the translation of science to continually improve genetic tests

Alexandre Colavin was recently announced as a winner of MIT Technology Review’s 2019 Innovators Under 35 Europe award in recognition of his work to improve variant interpretation for genetic testing. In this post, Alex recounts the motivation behind translating his research, the challenges along the way, and some next steps for variant interpretation at Invitae.

Road, meet rubber. Rubber, road.

Just under four years ago, as I was finishing my PhD in biophysics at Stanford studying bacterial proteins, I developed a new method for analyzing protein evolution. I found myself deeply interested in how this method could help other scientists understand the development of genetic diseases. Before long, it became abundantly clear that unless I shepherded the translation of my research, it might never have a positive impact on a patient.

My background was in basic science: science for the sake of science. I had virtually no expertise in genomics or clinical genetics. Fortuitously, I met two like-minded postdocs in Stanford’s Department of Genetics, Carlos L. Araya and Jason Reuter, who were also interested in pursuing the translation of their research. In February 2016, we founded Jungla Inc. with the purpose of translating our research to the clinic to improve the value patients can expect from their genomes.

In stepping out of academia into industry, we rapidly realized that our research could help improve clinical genetic tests by increasing the quality and efficiency of interpretation of the rare variants present in everyone’s genome. It was at this point that we faced a question from commercial testing laboratories that we had never encountered in a research setting: “Why should we trust your product?”

To understand why we were asked this question, consider this: hundreds of academic publications purport to have developed best-in-class “predictors” — algorithms or machine learning models that attempt to predict the clinical effect of mutations. There is no established gold standard for evaluating these methods, so each publication measures its performance against one of dozens of incompatible standards. Moreover, many published predictors have historically had unacceptably low accuracy. It’s unsurprising that the overwhelming majority of clinical geneticists agree: predictors are unreliable in a clinical setting.

Why should clinical geneticists trust our work over anyone else’s? And how many other academic algorithms and experiments end up never being considered in the clinic?

A platform for universal evaluation of evidence

We drew two incredibly important lessons from our early conversations with clinical genetic labs and key opinion leaders:

1. Don’t build a silver bullet: There is rarely a single piece of genomic evidence that can drive a patient’s diagnosis. In practice, it’s the accumulation of several independent lines of evidence that gives a physician confidence that one or several mutations are responsible for disease.

2. Build quantitative trust: Even if a silver bullet algorithm existed, it would never be properly leveraged in a clinical setting unless we could measure exactly how accurate we expect it to be.

To address the first lesson, we began rapidly expanding our methodologies beyond our research at Stanford to generate new types of evidence based on a range of different hypotheses. We built out a wet lab to run high-throughput assays and test the effects of individual mutations in mammalian cell lines. In parallel, we began implementing and testing a variety of mechanistically motivated computational biology hypotheses: systematic measurement and evaluation of protein stability; gene- and disease-specific machine learning predictors; mapping of knowledge between paralogous genes of interest; and systematic identification of hotspots — recurrent regions of mutation. We are always searching for new methods to incorporate into the platform (more on that later).
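To give a flavor of just one of these evidence types, here is a minimal sketch of what hotspot identification could look like: count reported pathogenic variants in sliding windows along a protein and flag windows whose burden crosses a threshold. The window size, threshold, and positions below are illustrative assumptions, not our production parameters.

```python
from collections import Counter

def find_hotspots(variant_positions, window=10, min_count=5):
    """Flag protein windows with a high burden of reported pathogenic
    variants. Window size and threshold are illustrative assumptions."""
    counts = Counter(variant_positions)
    hotspots = []
    for start in range(1, max(variant_positions) + 1):
        burden = sum(counts[p] for p in range(start, start + window))
        if burden >= min_count:
            hotspots.append((start, start + window - 1, burden))
    return hotspots

# Residue positions of (hypothetical) pathogenic missense variants in one gene
positions = [12, 13, 13, 15, 17, 18, 90, 91, 200]
print(find_hotspots(positions))  # overlapping windows around residues 12-18
```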

This brought us to our second insight: with many distinct methods in hand, we realized that the only way to evaluate the efficacy of each method for each gene and disease would be to evaluate everything in the same way. We spent two years building an end-to-end cloud platform for the universal, systematic generation and evaluation of evidence for interpreting variants. The “Functional Modeling Platform”—as we call it—provides a way not only to identify which methods are performant but also to quantitatively assess how well we can expect them to perform in a clinical setting.
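As a heavily simplified illustration of what “evaluating everything in the same way” can mean: reduce every method to per-variant scores, then measure each against one shared gold standard with one shared metric. The method names, variant identifiers, and choice of ROC AUC here are assumptions made for the sake of the sketch.

```python
from sklearn.metrics import roc_auc_score

def evaluate_methods(gold_standard, method_scores):
    """Score every evidence-generating method against the same gold
    standard (variant -> 1 for pathogenic, 0 for benign)."""
    results = {}
    for method, scores in method_scores.items():
        shared = [v for v in gold_standard if v in scores]  # variants this method scored
        y_true = [gold_standard[v] for v in shared]
        y_score = [scores[v] for v in shared]
        results[method] = roc_auc_score(y_true, y_score)
    return results

gold = {"GENE:p.R12W": 1, "GENE:p.A15T": 1, "GENE:p.G90S": 0, "GENE:p.L200F": 0}
scores = {
    "stability_model": {"GENE:p.R12W": 0.9, "GENE:p.A15T": 0.7,
                        "GENE:p.G90S": 0.3, "GENE:p.L200F": 0.2},
    "hotspot_model":   {"GENE:p.R12W": 0.8, "GENE:p.A15T": 0.6,
                        "GENE:p.G90S": 0.4, "GENE:p.L200F": 0.7},
}
print(evaluate_methods(gold, scores))  # {'stability_model': 1.0, 'hotspot_model': 0.75}
```

The same harness slices naturally by gene or by disease, which is what makes the comparison between methods apples-to-apples.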

We had one other major challenge that couldn’t be addressed by academic publications: knowledge is constantly changing. Every day, clinical genetic labs share their interpretations of variants in a central public repository called ClinVar. As a result, the field’s understanding of disease-causing variants changes over time — the inexorable march of progress. We had to show that our methods perform well not just on today’s data but on tomorrow’s.

To address this challenge, we began time-stamping our predictions by cryptographically linking the predictions of every model from every method to the Bitcoin blockchain. This let us irrefutably prove to third parties which predictions we had made in the past, and demonstrate the performance of our models on variants whose clinical significance was only established later — that is, we could passively measure the prospective performance of all our methods.
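The underlying idea is a standard cryptographic commitment: hash each prediction record, combine the hashes into a single Merkle root, and anchor that one digest in a Bitcoin transaction (for instance via an OP_RETURN output, or a timestamping service like OpenTimestamps). The details of our scheme are beyond the scope of this post, but a minimal sketch of the commitment step, using an invented record schema, might look like this:

```python
import hashlib
import json

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise-hash leaf digests up a binary tree to a single root.
    Committing the root on-chain timestamps every leaf at once."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical prediction records; the schema is invented for this sketch
predictions = [
    {"variant": "GENE:p.R12W", "model": "stability_v3", "score": 0.91},
    {"variant": "GENE:p.G90S", "model": "stability_v3", "score": 0.12},
]
leaves = [json.dumps(p, sort_keys=True).encode() for p in predictions]
print(merkle_root(leaves).hex())  # the one digest that gets anchored on-chain
```

A nice property of committing only the root is that nothing proprietary is revealed on-chain: the predictions themselves can be disclosed later, and any third party can then verify their inclusion against the anchored digest.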

Acid test 

In 2018, we began a multi-month pilot with Invitae. In a highly collaborative fashion, our teams set out with the singular goal of measuring whether Jungla’s evidence could actually improve variant interpretation in a clinical setting. Importantly, we worked together to refine our universal gold standard, based on trusted clinical labs, to evaluate the performance of our evidence. Just as importantly, we offset our model development and evaluation by six months so we could measure performance on knowledge that was published to ClinVar in the intervening time (and which did not exist at the time of training).
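In effect, this is a temporal train/test split: freeze the models at a cutoff date, then evaluate only on variants whose classifications first appeared in ClinVar during the following six months. The dates and record format in this sketch are invented for illustration:

```python
from datetime import date

# Hypothetical ClinVar-style records: (variant, label, date first classified)
clinvar = [
    ("GENE:p.R12W", 1, date(2017, 11, 3)),
    ("GENE:p.A15T", 1, date(2018, 4, 20)),
    ("GENE:p.G90S", 0, date(2018, 6, 2)),
]

TRAIN_CUTOFF = date(2018, 1, 1)  # models frozen on this date
EVAL_DATE = date(2018, 7, 1)     # performance measured six months later

train = [(v, y) for v, y, d in clinvar if d < TRAIN_CUTOFF]
prospective = [(v, y) for v, y, d in clinvar if TRAIN_CUTOFF <= d < EVAL_DATE]
# `prospective` labels did not exist when the models were trained, so
# accuracy on them approximates true prospective performance.
```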

The results were extremely heartening: Jungla’s platform was able to drive a double-digit increase in the percentage of variants that could be classified, without any cost to quality. In the process of running the pilot, we developed a deep respect for Invitae’s culture and recognized an uncanny alignment in our companies’ missions. Although it had not been our intention at the outset of the pilot, we were thrilled to announce that Invitae acquired Jungla in July 2019.

Outlook and vision

It’s been just a few months since Jungla’s acquisition; a lot has changed, but more has stayed the same. The entire Jungla team helped seed a new team at Invitae: the Clinical Science and Interpretation team. Already, the new team has plugged our platform into Invitae’s production pipeline, driving clinically impactful improvements to thousands of patients’ tests.

We’ve redoubled our commitment to our mission: to translate genetic information into personal understanding, however we can. We’re growing our team across a spectrum of expertise: genomic scientists, biophysicists, cellular engineers, data scientists, AI engineers, computational biologists, and software engineers, all of whom work closely together.

I can’t wait to see the innovative, creative ideas we’ll develop to keep improving genetic testing and patient health. There’s a lot left to do to improve the value we can all expect from genetic testing. If you’re interested in being part of this next chapter, I highly encourage you to check out the current opportunities at Invitae!