Sequencing Coverage and Throughput | Thermo Fisher Scientific

The importance of coverage

Coverage describes the number of sequencing reads that are uniquely mapped to a reference and “cover” a known part of the genome. Ideally, uniquely aligned reads are distributed evenly across the reference genome and therefore provide uniform coverage. In practice, coverage is not uniform, and genetic regions of interest may be underrepresented for a variety of reasons (see table below). One reason is that the genome itself is complex: it contains genes, noncoding DNA, repetitive sequences, and other elements that can make it difficult to align a sequencing read to its correct genomic coordinates.
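As a rough illustration of how coverage and uniformity can be quantified, the Python sketch below counts per-base depth from the start and end positions of uniquely mapped reads. It then reports mean depth, breadth (the fraction of positions with at least one read), and the fraction of positions covered at 0.2× the mean depth or more, a uniformity threshold taken here as an assumption. The read coordinates and the 100-base toy reference are invented for illustration.

```python
from typing import Dict, List, Tuple

def per_base_depth(reads: List[Tuple[int, int]], ref_len: int) -> List[int]:
    """Depth at each reference position, given half-open [start, end)
    coordinates of uniquely mapped reads (difference-array approach)."""
    diff = [0] * (ref_len + 1)
    for start, end in reads:
        diff[start] += 1  # read begins covering at this position
        diff[end] -= 1    # read stops covering at this position
    depth: List[int] = []
    running = 0
    for pos in range(ref_len):
        running += diff[pos]
        depth.append(running)
    return depth

def coverage_summary(depth: List[int], uniformity_fraction: float = 0.2) -> Dict[str, float]:
    """Mean depth, breadth at >=1x, and share of bases at >= uniformity_fraction * mean.
    The 0.2 default is an assumed threshold, not a value from this article."""
    n = len(depth)
    mean = sum(depth) / n
    return {
        "mean_depth": mean,
        "breadth_1x": sum(d >= 1 for d in depth) / n,
        "uniformity": sum(d >= uniformity_fraction * mean for d in depth) / n,
    }

# Hypothetical reads piled up on the first 75 bases of a 100-base reference
reads = [(0, 50), (10, 60), (20, 70), (20, 70), (25, 75)]
depth = per_base_depth(reads, ref_len=100)
print(coverage_summary(depth))
```

In this toy case the last 25 bases receive no reads, so breadth and uniformity are both 0.75 even though the mean depth is 2.5×; a mean depth on its own can hide such gaps.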

The number of sequencing reads that map to a known region is also an important part of coverage. A sufficient number of correctly mapped reads is required to detect and accurately identify genetic mutations. With high sequencing coverage, researchers can find the proverbial ‘needle in the haystack’: they can identify low-frequency mutations or discover mutations in a heterogeneous sample such as a tumor biopsy. Poor coverage, whether caused by too few reads or by reads that are mapped incorrectly, will leave the variants of interest undetected.
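To see why depth matters for low-frequency variants, one simple model (an assumption here, not the article's method) treats each read covering a position as an independent draw that carries the variant with probability equal to the variant allele fraction, and ignores sequencing and mapping errors. The probability of seeing at least a minimum number of variant-supporting reads then follows a binomial distribution. The 5% allele fraction and the five-read threshold below are hypothetical.

```python
from math import comb

def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant-supporting reads at this depth).

    Binomial model: assumes independent reads and no sequencing or mapping errors.
    """
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer

# A hypothetical 5% variant (e.g., a subclone in a tumor sample),
# reported only when at least 5 reads support it
for depth in (20, 100, 500, 1000):
    p = detection_probability(depth, allele_fraction=0.05, min_alt_reads=5)
    print(f"{depth:>5}x depth: P(detect) = {p:.3f}")
```

Under this model the detection probability rises steeply with depth, which is the intuition behind the ‘needle in the haystack’ point above.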

White Paper: The importance of coverage: advantages of amplicon-based approaches in next-generation sequencing

How does throughput relate to sequencing coverage?

Adequate coverage is clearly important for studying a genomic region of interest with high confidence. For regions with little or no coverage, researchers frequently increase the sequencing throughput of their studies; that is, they generate more sequencing reads and data to raise the coverage of a genetic region by brute force. However, this approach is inefficient, increases costs, and does not address the underlying causes of the poor coverage. Increasing throughput makes regions that already had sufficient coverage over-represented, so the additional reads there are, in effect, wasted. Regions that had zero coverage before may still have none, no matter how much more of the sample is sequenced.
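The throughput trade-off can be made concrete with the Lander-Waterman relationship for mean depth, C = N × L / G, where N is the number of reads, L the read length, and G the size of the sequenced region. The toy model below goes one step further and, as a hypothetical assumption, gives each region a relative efficiency (how readily reads from it are generated and mapped) and splits reads in proportion to size × efficiency. The region names, sizes, and efficiencies are invented for illustration.

```python
from typing import Dict, Tuple

def mean_depth(num_reads: float, read_length: int, target_size: int) -> float:
    """Lander-Waterman mean depth: C = N * L / G."""
    return num_reads * read_length / target_size

def regional_depth(
    num_reads: float, read_length: int, regions: Dict[str, Tuple[int, float]]
) -> Dict[str, float]:
    """Toy model: reads land in each region in proportion to size * relative efficiency.

    regions maps a name to (size_in_bases, relative_efficiency); the
    efficiency values are hypothetical, not measured.
    """
    weights = {name: size * eff for name, (size, eff) in regions.items()}
    total_weight = sum(weights.values())
    depths = {}
    for name, (size, _eff) in regions.items():
        reads_here = num_reads * weights[name] / total_weight
        depths[name] = mean_depth(reads_here, read_length, size)
    return depths

regions = {
    "well_behaved": (900_000, 1.0),
    "gc_rich": (90_000, 0.2),
    "unmappable_repeat": (10_000, 0.0),
}
for n in (1_000_000, 2_000_000):
    depths = regional_depth(n, read_length=150, regions=regions)
    print(n, {name: round(d, 1) for name, d in depths.items()})
```

Doubling the read count doubles the depth of the well-behaved region, where coverage was already ample, while the unmappable region stays at 0×.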

A more efficient way to address coverage is to use a targeted sequencing approach. With targeted sequencing, researchers focus on just their regions of interest instead of sequencing the entire genome. Because sequencing capacity is concentrated on those regions, this helps ensure sufficient coverage, including in parts of the genome that may not have been accessible previously, at a lower sequencing cost.
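The cost advantage of targeting can be estimated by rearranging the Lander-Waterman mean-depth relationship (C = N × L / G) to solve for the number of reads, with an added factor f for the fraction of reads that land on target: N = C × G / (L × f). The sketch below compares a whole human genome (roughly 3.1 billion bases) with a hypothetical 500-kb panel; the read length, depth, and on-target fraction are illustrative assumptions, not product specifications.

```python
import math

def reads_required(
    target_depth: float,
    target_size: int,
    read_length: int,
    on_target_fraction: float = 1.0,
) -> int:
    """Reads needed for a mean depth over the target: N = C * G / (L * f)."""
    return math.ceil(target_depth * target_size / (read_length * on_target_fraction))

# Illustrative numbers only
wgs = reads_required(target_depth=500, target_size=3_100_000_000, read_length=150)
panel = reads_required(
    target_depth=500, target_size=500_000, read_length=150, on_target_fraction=0.9
)
print(f"Whole genome: {wgs:,} reads")
print(f"Targeted panel: {panel:,} reads ({wgs / panel:,.0f}x fewer)")
```

Even after discounting for off-target reads, the smaller target size dominates the read budget in this calculation.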

Potential reasons for poor sequencing coverage and uniformity

Reasons for poor coverage	Why this can affect coverage
Sample quality	Degraded samples yield shorter DNA fragments and therefore shorter sequencing reads. Shorter reads are more difficult to map to the correct region because they are less likely to be unique.
Sample input	With too little input, there may not be enough DNA to sequence, and the DNA that is sequenced may not be representative of the entire genome.
Homologous regions	Homologous regions have similar sequences, which makes it more difficult to map a read to the correct portion of the reference genome.
Regions of low complexity	Low-complexity sequencing reads may be mapped to the wrong part of the genome, resulting in coverage bias.
Hypervariable regions	Because of the high number of variants, a sequencing read can look very different from the reference genome and may not be mapped correctly.
GC content	The percentage of guanine-cytosine (GC) nucleotides in a region can introduce sequencing bias.
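Several factors in the table can be checked directly on read sequences. The sketch below computes GC fraction, uses Shannon entropy of the base composition as a simple, assumed proxy for sequence complexity (dedicated filters such as the DUST algorithm are more thorough), and counts exact matches of a read in a toy reference to show how a short read from a repeated motif is less unique than a longer read spanning it. The reference sequence is invented for illustration, and real aligners tolerate mismatches, which this exact-match search does not.

```python
from collections import Counter
from math import log2
from typing import List

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def shannon_entropy(seq: str) -> float:
    """Bits per base of the base composition: 0.0 for a homopolymer,
    up to 2.0 when A, C, G, and T are equally frequent."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    return sum((c / n) * log2(n / c) for c in counts.values())

def exact_match_positions(read: str, reference: str) -> List[int]:
    """Every start position where the read occurs exactly (overlaps allowed)."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits

# Toy reference with the motif GATTACA repeated twice (invented sequence)
reference = "TTGACC" + "GATTACA" + "GGCAT" + "GATTACA" + "GCTA"

print(gc_fraction("GGCCGCGGAT"), shannon_entropy("AAAAAAAAAA"), shannon_entropy("ACGTACGTAC"))
print(exact_match_positions("GATTACA", reference))      # [6, 18]: two candidate loci
print(exact_match_positions("CCGATTACAGG", reference))  # [4]: one unique locus
```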

Additional resources

NGS ebook: Introduction to next-generation sequencing
Ion Torrent Genexus System: Fast, easy, automated NGS workflow
Ion AmpliSeq Designer Tool for creation of custom panels
Ion Torrent focused NGS applications and solutions
Ion AmpliSeq Targeted Sequencing Technology

For Research Use Only. Not for use in diagnostic procedures.