The European Union has launched a new portal for researchers worldwide to share their work on coronavirus, as stewards of biological data struggle to make a tsunami of new information about the virus easily accessible and shareable.
“Scientists around the world have already produced a wealth of knowledge on coronavirus,” said Ursula von der Leyen, president of the European Commission, announcing the initiative on 21 April. “But no research, lab or country can find a solution alone.”
The portal is designed to give researchers a central point of access to everything from genomic data about the virus to epidemiological studies that track how far it has spread through the population.
Previously, “you had to go to different databases to get different pieces of information”, explained Rolf Apweiler, co-director of the European Bioinformatics Institute (EBI), one of the world’s key repositories for biological data.
Now, with the introduction of the portal, the hope is that researchers “don’t need to trawl the internet to go to maybe dozens of sites”, he said.
The scale of the organisational challenge is potentially enormous: the UK-based EBI, one of the organisations helping to create the site, has an overall storage capacity for more than 300 petabytes of data – the equivalent of about 300 million gigabytes – and its website gets approximately 60 million hits a day.
Labs across the world have been sending in thousands of sequenced coronavirus genomes. These can be used to track how the virus spreads by looking at its mutations, and they allow drug and vaccine makers to understand which parts of the genome are mutating and which are not.
This sharing of virus genomes has been done “much better than for everything else”, said Frank Aarestrup, head of the research group for genomic epidemiology at the Technical University of Denmark, one of the institutions that has helped create the portal.
Right from the beginning of the outbreak, he said, Chinese scientists deposited sequences with GISAID, an internationally run database set up in 2008 to store virus data in the aftermath of sharing failures during the avian flu outbreak two years earlier.
Other types of data that could also be useful in the effort against coronavirus are more fragmented, however. For example, sets of patient imaging and genetic data collected by hospitals can be used to discover whether certain genetic markers make some people more vulnerable to Covid-19, explained Dr Apweiler.
However, “a lot of the data is not collected in a way that is useful outside the hospital situation”, he said. The hope is that the portal “can help to make progress in making data gathered in a healthcare setting faster and more useful for research”, he added.
This “decentralised” patchwork of patient data, collected at the national or even regional level and often protected by privacy barriers, is “Europe’s biggest challenge and our biggest opportunity”, said Niklas Blomberg, director of Elixir, an intergovernmental organisation attempting to create a single infrastructure for life sciences data and resources, and another of the organisations working on the portal.
The aim is to link scientists up with higher-level patient metadata from national and regional sources, although researchers would probably need permission to access it more directly, he said.
Overall, the pandemic could spur the continent into making its patient data more accessible and shareable, he said, enabling medical advances beyond the virus. “The Covid-19 pandemic shows this is something we need to address in Europe,” he said.