Like many people deeply immersed in DNA sequencing, I was first introduced to next-generation sequencing (NGS) at a packed, standing-room-only panel discussion at the 2003 Genomes, Medicine and Environment conference (née GSAC). It was two years before any NGS instrument was commercially available, and yet we had all gathered there because, although the price of sequencing a human genome that year was $10M (at best), the buzz was all about $1,000 genomes, and we could sense that the world was about to change irreversibly. While the individual panel presentations focused on technology, each presenter couldn’t help but share a personal view on how widely available genomic sequencing would transform our lives and how we manage our health.
Since then I’ve been heavily involved with every stage of bringing the next-generation sequencing dream to fruition, and it’s thrilling to see the final piece of that problem coming to market: transformative change in our access to genetic information will create transformative change in how we incorporate genetic information into routine healthcare.
But as we embark on that last stage, it’s interesting to see where the use of next-generation sequencing has gotten to and where it needs to go.
What worked for research does not always work for the clinic
When gigabases of DNA sequence first became routinely available seven years ago, it was natural that the early adopters of NGS were the research institutes and genome centers specializing in DNA sequencing, with their existing lab infrastructure, bioinformatics personnel and desire to apply cutting-edge technologies to genomics research. As next-generation sequencing methods rapidly developed and matured, the research setting was the first focus.
Now as we take next-generation sequencing to the clinic, we can see how what worked well for genomic research doesn’t immediately slot into clinical diagnostics.
To make a few sweeping contrasts:
1) Research often puts a premium on throughput and quantity; clinical application often puts quality first.
2) Research can accept false negatives and missing data if it is finding enough of what it is looking for; clinical applications need to cover all of the loci of relevance with sufficient sensitivity to answer the clinical question.
3) Research can cope with ever-changing technologies, changing lab methods, and changing software pipelines; clinical application needs consistency and reliability.
4) Research is okay with ambiguous results and soft indications of quality and accuracy; clinical application desires to bring every question down to a yes/no answer if possible.
All of these needs are addressable as we move to clinical next-generation sequencing. But they have required that—as a clinical community—we critically examine the sample prep, sequencing and bioinformatics methods (targeted enrichment, barcoding, sequencing instruments and protocols, aligners and variant callers) which have gotten NGS to its current state and address any remaining issues of accuracy, reproducibility, and reliability in our clinical setting. The good news is that, while the off-the-shelf components do not by themselves create a clinical pipeline, with the right modifications and controls, NGS can be tamed into routine analytical use in the clinical lab.
Isn’t clinical genetics just a list of variants and some data pulled from the web?
With the explosion in biological databases in the last decade, there’s good cause to hope that interpreting genomic variation is a simple matter of finding the variation (the sequencing analysis pipeline) and then combining information from various computational resources to provide a clinical annotation and interpretation. My personal experience with this hope is that, in practice, the software development 80/20 rule applies here: while the majority of our needs can be met with a relatively straightforward combination of data sources, the vast majority of our time as an organization is spent making sure all of the difficult exceptions are handled transparently for our customers.
Why is this? Each of the topics here is worth its own blog entry! But here are a few examples:
Biological information is messy. Bioinformaticians have long chanted this—mostly in laments to their fellow lab mates—but we’re now in a world where entirely new consumers of biological information are about to see all of the special exceptions and gotchas in how we link up our combined information sources. Variation in a gene causes variation in the transcript causes a variation in the protein which then causes a functional effect in the cell, right? What if that variation is a difference from the reference genome, but not a difference from the reference transcriptome? (This happens quite naturally: transcript databases inherently contain population polymorphisms due to their larger set of sources, and little overlap with the sources that comprise the reference human genome. The community is also interested in fixing this.) What if the genetics literature prefers to refer to the transcript by an older name and older version of the human reference sequence? What if there’s a common indel polymorphism in the gene and the codon numbering is consistently skewed?
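To make the genome-versus-transcript question concrete, here is a minimal Python sketch. The function name, the labels, and the example bases are hypothetical illustrations rather than anything from a production pipeline, and the sketch assumes all three alleles are already expressed on the same strand at a single aligned position.

```python
def classify_against_references(genome_ref: str, transcript_ref: str, observed: str) -> str:
    """Label an observed allele relative to two reference sequences.

    Assumption: all alleles are given on the same strand at one aligned position.
    """
    differs_from_genome = observed != genome_ref
    differs_from_transcript = observed != transcript_ref
    if differs_from_genome and differs_from_transcript:
        return "variant against both references"
    if differs_from_genome:
        return "variant against genome only"
    if differs_from_transcript:
        return "variant against transcript only"
    return "matches both references"


# Hypothetical position where the transcript record carries a population polymorphism.
print(classify_against_references(genome_ref="G", transcript_ref="C", observed="C"))
```

In this made-up case the sample differs from the genome but is identical to the transcript, so a report keyed only to transcript coordinates would show no change at all. Deciding which answer the clinician should see is exactly the kind of exception that takes the time.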
No data source is perfect… but they are getting better! At Invitae, we routinely sequence the gene TMEM216 (causative for a form of Joubert syndrome) and find the variant NM_001173990.2:c.432-1G>C in our control and clinical samples. The current release of the 1000 Genomes Project data does not contain an entry for this variant, so on that evidence alone it would de facto be considered a rare variant. Yet data at dbSNP suggest an allele frequency of 60% and the Exome Sequencing Project finds 70%, both consistent with how frequently we see this variant. Knowing the strengths and weaknesses of your data sources, and being able to create accurate and informative clinical reports despite these issues, is critical.
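A cross-check of this kind can be sketched in a few lines of Python. This is only an illustration: the function name and the 5% cutoff for "common" are my assumptions, not a clinical standard, and the frequencies are simply the ones quoted above.

```python
from typing import Dict, List, Optional

COMMON_THRESHOLD = 0.05  # assumption: illustrative cutoff for "common", not a clinical standard


def find_frequency_conflicts(freqs: Dict[str, Optional[float]],
                             common_threshold: float = COMMON_THRESHOLD) -> List[str]:
    """Return sources that lack a variant other sources report as common.

    `freqs` maps a source name to its allele frequency, or None when the
    source has no entry for the variant.
    """
    reported = [f for f in freqs.values() if f is not None]
    if not reported or max(reported) < common_threshold:
        return []  # nothing reports the variant as common, so absence is not suspicious
    return sorted(name for name, f in freqs.items() if f is None)


# Frequencies quoted in the text for NM_001173990.2:c.432-1G>C.
tmem216_variant = {"1000 Genomes": None, "dbSNP": 0.60, "ESP": 0.70}
print(find_frequency_conflicts(tmem216_variant))
```

For the TMEM216 example this flags the 1000 Genomes entry as the outlier, which is a prompt to investigate the source rather than to call the variant rare.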
Understanding the real meaning of “risk” in genetics. Simple Mendelian genetics tells us that there are recessive diseases and there are dominant diseases. Two pathogenic variants in a recessive disease cause the disease state, while only one pathogenic variant is required in a dominant disease. What if the condition in question has been observed to have both recessive and dominant inheritance? What if multiple variants in conjunction (haplotypes) are required to cause the disease? What if the risk conferred by variants is best captured with quantitative risk estimates rather than yes/no views of pathogenicity? What if it’s known that loss-of-function variants, normally considered pathogenic, are not pathogenic for this genetic condition? At Invitae, these considerations and many more have gone into our approach to systematic clinical annotation, and we’ve spent a significant fraction of the last two years creating a system for handling all of this genetic complexity.
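For contrast, the simple Mendelian rule itself fits in a few lines. The sketch below is a deliberately naive Python illustration with hypothetical names; it counts pathogenic alleles only, ignores phase, and is not how our annotation system works.

```python
# Minimum number of pathogenic alleles under the simple Mendelian view described above.
REQUIRED_PATHOGENIC_ALLELES = {"dominant": 1, "recessive": 2}


def meets_simple_mendelian_rule(inheritance: str, pathogenic_alleles: int) -> bool:
    """Apply the textbook rule: one allele for dominant, two for recessive.

    Assumption: `pathogenic_alleles` is a plain count; phase and haplotypes are ignored.
    """
    if inheritance not in REQUIRED_PATHOGENIC_ALLELES:
        raise ValueError(f"unsupported inheritance mode: {inheritance!r}")
    if pathogenic_alleles < 0:
        raise ValueError("pathogenic_alleles must be non-negative")
    return pathogenic_alleles >= REQUIRED_PATHOGENIC_ALLELES[inheritance]


print(meets_simple_mendelian_rule("recessive", 1))
print(meets_simple_mendelian_rule("dominant", 1))
```

Every "what if" above breaks this function in a different way: a condition with both inheritance modes has no single key in the table, haplotype effects cannot be expressed as a count, and quantitative risk does not fit a boolean return value.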
Do all this and prove it’s right. Next-generation sequencing and the large-scale reporting of genetic findings are very new techniques in clinical diagnostics. Ensuring high quality in every step of our process is a central concern in how we approach our development. Producing accurate variant calls from next-generation sequencing results has matured greatly in the last several years, and we apply quality measures specifically designed to reduce common sources of false positive indications, while not reducing our analytical sensitivity. Validating that this all works on a genomic scale is a critical theme in our community and we’re not alone in looking for the best approaches to NGS validation.
Bringing it all together
Seeing next-generation sequencing grow up to the point of being able to deliver actionable results to patients around the world has been an exciting moment in my life in this field. The challenges I raise here span molecular biology, molecular genetics, computer science, bioinformatics, clinical genetics, and genetic counseling, to name a few of the many disciplines involved! At Invitae we’ve brought together a best-in-class team of individuals from all of these domains, stirred in a seasoned team of business leaders, and we are armed to face these challenges head on. Now’s the time to focus on making genomics truly clinical!
-Jon Sorenson, Invitae