Chaos game representation dataset of SARS-CoV-2 genome
Author(s)
Barbosa, Raquel de M.; Fernandes, Marcelo A.C.
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
As of April 16, 2020, the novel coronavirus disease (COVID-19) had spread to more than 185 countries/regions, with more than 142,000 deaths and more than 2,000,000 confirmed cases. In bioinformatics, a crucial task is the analysis of the virus nucleotide sequences using data stream, digital signal processing, and machine learning techniques and algorithms. However, to make these approaches feasible, the nucleotide sequence strings must first be transformed into a numerical representation. This dataset therefore provides a chaos game representation (CGR) of SARS-CoV-2 virus nucleotide sequences. It contains the CGR of 100 instances of SARS-CoV-2, 11,540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
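To illustrate how a nucleotide string can become numerical values, the following is a minimal sketch of the classic chaos game iteration, in which each new point is the midpoint between the previous point and the corner assigned to the current nucleotide. The corner layout, the starting point at the center of the unit square, the handling of ambiguous symbols, and the function name `cgr_points` are assumptions made for illustration; the abstract does not state which conventions the dataset uses.

```python
from typing import List, Tuple

# Assumed corner assignment on the unit square. This is one common
# convention, not necessarily the one used to build the dataset.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
    "U": (1.0, 0.0),  # assumption: treat RNA uracil like thymine
}


def cgr_points(sequence: str) -> List[Tuple[float, float]]:
    """Return one (x, y) point per recognized nucleotide in `sequence`."""
    x, y = 0.5, 0.5  # assumed starting point: center of the square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Assumption: skip ambiguous or unknown symbols such as N.
            continue
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

Each nucleotide contributes one coordinate pair, so a genome of length n yields a sequence of up to n points that can be fed to signal processing or machine learning methods.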
Date issued
2020-06
Department
Massachusetts Institute of Technology. Department of Chemical Engineering
Journal
Data in Brief
Publisher
Elsevier BV
Citation
Barbosa, Raquel de M. and Marcelo A.C. Fernandes. "Chaos game representation dataset of SARS-CoV-2 genome." Data in Brief 30 (June 2020): 105618 © 2020 Elsevier
Version: Final published version
ISSN
2352-3409