COVID-19 Resources

This page is continuously updated with resources for applying AI to COVID-19. The data are intended for research purposes, and researchers who use them should acknowledge the original sources using the information provided in the associated links.

| Sources | Data Type | Descriptions | Links |
| --- | --- | --- | --- |
| Johns Hopkins University [Ref1] | Web-based mapping of global cases | A dashboard showing the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries in real time, from January 22, 2020 onward. The data can be downloaded in CSV format and used to analyse and predict the spread of the virus. | Link |
| C. R. Wells's GitHub [Ref2] | Daily number of cases in China | The data cover mainland China only, from December 8, 2019 to February 15, 2020, and are available in MATLAB format. They can be used to evaluate the impact of international travel and border control measures on the global spread of COVID-19. | Link |
| DataHub | Time series data on cases | These data are sourced from Johns Hopkins University but have been cleaned and normalized, e.g., by tidying dates and consolidating several files into normalized time series. They consist of confirmed cases, reported deaths, and reported recoveries, are updated daily, and can be downloaded in CSV format. | Link |
| China CDC (CCDC) | Daily number of cases in China | Daily updates of new cases, asymptomatic cases, recoveries, and deaths in China only, available from January 20, 2020 onward. The data are published as webpages, so extra effort is needed to extract them in bulk. | Link |
| U.S. CDC | Cases in the U.S. | Daily COVID-19 cases, deaths, and test volume in the U.S. as reported to the CDC, by state/territory, available from January 2020 onward. The data can be downloaded in CSV format for each state/territory. There are also downloadable maps and charts tracking cases, deaths, and trends of COVID-19 in the U.S. | Link Link |
| J. P. Cohen's GitHub [Ref3] | Chest X-ray and CT images | About 470 images of COVID-19 and 180 images of other viral and bacterial pneumonias, including MERS, SARS, and acute respiratory distress syndrome. The data can be used to develop AI approaches to predict and understand COVID-19 infection. | Link |
| European Society of Radiology | Chest X-ray and CT images | About 850 chest images, including 60 related to COVID-19. Each image comes with a well-documented clinical history, imaging findings, extensive discussion, and diagnosis, and can be downloaded as a PDF. Although the number of images is limited, the collection is useful for studying explainable imaging features of COVID-19. | Link |
| Italian Society of Medical Radiology (SIRM) | Chest X-ray and CT images | Includes chest images of 115 COVID-19 patients with detailed health record data and discussion for each case. The images are embedded in webpages and can be downloaded individually. | Link |
| British Society of Thoracic Imaging (BSTI) | Chest X-ray and CT images | Includes chest images of 59 COVID-19 patients with clinical details for each case. The images are embedded in webpages and can be downloaded individually. | Link |
| Kaggle | Chest X-ray and CT images | Contains images of 204 patients, including 168 COVID-19 cases; the rest are cases of MERS, SARS, and acute respiratory distress syndrome. Each case has metadata with clinical details, and all images can be downloaded together in a single folder. | Link |
| UCSD-AI4H [Ref4] | CT images | Includes 349 CT images with clinical findings of COVID-19 from 216 patients, with details of gender, age, medical history, severity, etc. All images can be downloaded in a folder. There is also a folder of 463 non-COVID-19 CT scans. | Link |
| MedSeg (medseg.ai) | CT images | Two datasets are available. The first contains 100 axial CT images from more than 40 COVID-19 patients with age and gender details. The second contains 829 CT images, of which 373 are of COVID-19-positive cases. All can be downloaded in separate folders. | Link |
| Point-of-Care Ultrasound (POCUS) [Ref5] | Lung ultrasound images and videos | Includes ultrasound images acquired with convex and linear probes. It comprises 202 videos, of which 70 are of COVID-19, 57 are of bacterial and viral pneumonia, and 75 are of healthy cases. It also contains 22 images of COVID-19, 22 images of bacterial pneumonia, and 15 images of healthy cases. | Link |
| COVID-19 Radiography Database [Ref6] | Chest X-ray images | Contains 3,616 chest X-ray images of COVID-19-positive cases along with 10,192 normal, 6,012 lung opacity (non-COVID lung infection), and 1,345 other viral pneumonia images. All can be downloaded in separate folders. | Link |
| Actualmed COVID-19 Initiative | Chest X-ray images | Includes 238 chest X-ray images of 215 patients, of which 49 are of COVID-19, 116 are of normal cases, and the rest are inconclusive. No clinical details are available for the infected cases. | Link |
| Georgia State University's Panacea Lab [Ref7] | Twitter chatter dataset in many languages | Contains tweets related to COVID-19 chatter acquired from the Twitter Stream, capturing all languages, with English, Spanish, and French the most prevalent. There are more than 990 million unique tweets and retweets; a cleaned version without retweets includes 252 million unique tweets. The data can be downloaded as TSV files. | Link |
| COVID-19 Open Research Dataset (CORD-19) [Ref8] | Scholarly articles about COVID-19 and related coronaviruses | Contains over 500,000 scholarly articles, including over 200,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. The metadata comprise title, DOI, publication time, authors, journal, URL, etc. This is a large dataset of more than 50 GB, which can be downloaded in folders. | Link Link |
| World Health Organization | Latest scientific findings and knowledge on COVID-19 | This database is updated daily and comprises scholarly articles on the latest international multilingual scientific findings and knowledge on COVID-19. It currently contains more than 318,000 articles and nearly 25,000 preprints, mostly in English, Spanish, and Chinese. Users can search and export metadata (e.g., title, authors, journal, DOI) to a CSV file. | Link |
| NCBI GenBank | SARS-CoV-2 sequences | This database is updated daily. It currently contains more than 1.6 million nucleotide records and nearly 9 million protein records. Each record is well documented with information about collection date, country, submitting authors, assembly method, sequencing technology, etc. Users can download multiple sequences in FASTA format. | Link |
| The GISAID Initiative | SARS-CoV-2 sequences | Like NCBI GenBank, this database is updated daily. It currently contains approximately 3.9 million nucleotide records. Each record contains useful metadata such as collection date, location, gender, age, and patient status. Users need to register before they can download single or multiple records in FASTA format. | Link |
| European COVID-19 Data Platform (EMBL-EBI) | SARS-CoV-2 sequences | Currently contains nearly 1.2 million nucleotide records from many countries with essential metadata, such as sample tracking identifiers, sampling time, geographical location, method of sampling, health status of the host, and sequencing platform/strategy. Users can download multiple sequences in FASTA or EMBL format. | Link |

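Several of the sequence sources listed above (NCBI GenBank, GISAID, EMBL-EBI) offer downloads in FASTA format. As a minimal sketch, assuming the standard FASTA layout (a `>` header line followed by one or more sequence lines), downloaded records could be read in Python roughly as follows. The names `read_fasta` and `gc_content` and the two example records are hypothetical illustrations, not part of any of the listed databases or their tooling.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over several following lines.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy records, not real SARS-CoV-2 data.
EXAMPLE = """>record_1 hypothetical example
ATGC
GGCC
>record_2 hypothetical example
ATAT
"""

records = dict(read_fasta(EXAMPLE.splitlines()))
for name, seq in records.items():
    print(name, len(seq), gc_content(seq))
```
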
References

[Ref1] Dong, E., Du, H., Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases, doi: https://doi.org/10.1016/S1473-3099(20)30120-1.

[Ref2] Wells, C. R., Sah, P., Moghadas, S. M., Pandey, A., Shoukat, A., Wang, Y., ... & Galvani, A. P. (2020). Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak. Proceedings of the National Academy of Sciences, 117(13), 7504-7509.

[Ref3] Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., & Ghassemi, M. (2020). COVID-19 image data collection: prospective predictions are the future. Journal of Machine Learning for Biomedical Imaging, 1, 1-38.

[Ref4] Zhao, J., Zhang, Y., He, X., & Xie, P. (2020). COVID-CT-Dataset: a CT scan dataset about COVID-19. arXiv preprint arXiv:2003.13865.

[Ref5] Born, J., Brandle, G., Cossio, M., Disdier, M., Goulet, J., Roulin, J., & Wiedemann, N. (2020). POCOVID-Net: automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS). arXiv preprint arXiv:2004.12084.

[Ref6] Chowdhury, M. E., Rahman, T., Khandakar, A., Mazhar, R., Kadir, M. A., Mahbub, Z. B., ... & Islam, M. T. (2020). Can AI help in screening viral and COVID-19 pneumonia?. IEEE Access, 8, 132665-132676.

[Ref7] Banda, J. M., Tekumalla, R., Wang, G., Yu, J., Liu, T., Ding, Y., ... & Chowell, G. (2021). A large-scale COVID-19 Twitter chatter dataset for open scientific research—an international collaboration. Epidemiologia, 2(3), 315-324.

[Ref8] Wang, L. L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide, D., ... & Kohlmeier, S. (2020). CORD-19: The COVID-19 open research dataset. ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), (pp. 1-12).
