The growth of genomics sequencing

The last 20 years have seen tremendous growth in human genomics. To put things in perspective, the Human Genome Project began in 1990 as an international scientific research project with the goal of creating the first human genome sequence.

The project took 10 years to produce its first working draft of the genome sequence, and 13 years to complete. Today, the same process can be completed in under 24 hours.1

Due to these incredible advances in technology, millions of genomes have been sequenced so far, making it easier to study diseases associated with mutations in a single gene.

But as genomic sequencing analysis becomes more accessible, the amount of data being produced is expanding rapidly, with the volume of raw genomic data generated around the world doubling every seven months.2

By 2025 it is estimated that:

Between 100M and 2B genomes will have been sequenced.3

Between 2 and 40 exabytes of storage capacity will be required to store the entire globe's human genomic data.4
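The growth rate cited above implies exponential scaling. As a rough illustration of how a seven-month doubling time compounds (a minimal sketch with an arbitrary starting volume, not a forecast from the cited reports):

```python
def projected_volume(initial_volume: float, months: float,
                     doubling_months: float = 7.0) -> float:
    """Project data volume under exponential growth with a fixed doubling time."""
    return initial_volume * 2 ** (months / doubling_months)

# Starting from an arbitrary 1 exabyte, a single year of growth at this
# rate more than triples the volume (2 ** (12 / 7) is roughly 3.28).
print(f"{projected_volume(1.0, 12):.2f} exabytes after 12 months")
```

At this pace, volume grows more than tenfold every two years, which is why the storage estimates above span such a wide range: small differences in assumed growth rate compound dramatically by 2025.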

This data, combined with ever-growing amounts of single-cell and functional genomics data, digital medical records, and other critical biomedical data, has the potential to substantially enhance our understanding of the fundamental processes for healthy life and revolutionise the treatment of disease. But this doesn't come without its challenges.

Data challenges – overcoming research bottlenecks

Genomics data output is increasing all the time. But while these ever-larger scientific datasets may be a goldmine for discovery, analysing them within on-premise legacy environments – which many life-science organisations still use – has become a bottleneck in genomics research. Massive processing power and scalability are required, presenting a challenge to organisations of all sizes whose on-premise storage systems are based on outdated legacy infrastructure.

For example, to gain insights from huge archived datasets, a researcher must secure sufficient storage space and perform large, time-consuming downloads, followed by compute-intensive re-analysis of the data from scratch.

Many labs aren't equipped for this, so valuable data goes unused. In addition, the velocity and volume of genomic data continue to rise in response to reduced sequencing costs and broader adoption. Eventually, single organisations may struggle to independently manage, sequence, process and analyse all the insight available from a particular data set.

Instead, we may see smaller, agile groups capable of tackling specific problems that drive the development of insight. They will, however, require the ability to access this information easily and securely – which is only achievable with modern cloud-based technology.


Let's look at some of the other key areas of consideration when it comes to the most common data challenges.


Key areas of consideration

The need for greater data sharing

The National Human Genome Research Institute (NHGRI) strongly encourages studies involving human data to use data generated from sources with participant consent, for unrestricted access, or for general research use with controlled access.

At the same time, life-science organisations seek to explore genomic science and its impact on research, health, and society – an effort further accelerated by an increasing number of government-funded genome projects. EMBL's European Bioinformatics Institute (EMBL-EBI), for example, offers data that is used extensively across the world by more than five million researchers in academia and industry, with some 64 million data requests made daily to its websites.5

This demands close collaboration across the global scientific community, along with fast, secure, open access to biological research. However, uploading data into on-premise repositories can be time-consuming and may result in sparse datasets if researchers deposit only what's required to remain compliant. Data may also be stored in more than one place, which again creates challenges around the time spent locating and accessing information.

Advancing technology

Wearable devices powered by the Internet of Things (IoT) have the potential to greatly impact biomedical research by providing scientists around the world with access to data that supports the advancement of precision medicine.

For example, in August 2020, Quest Diagnostics launched an automated next-generation sequencing (NGS) engine to power Ancestry Health. This now enables people to access precise genetic testing and gain insights into inherited diseases, including cancers of the colon, and other conditions.6

However, each new device or app generates gigabytes – potentially even terabytes – of data every day. This data needs its own back-end capability for sending, requesting, and processing information at massive scale, which is stretching the limits of hardware, software, and data centres.


The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.