Datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Code for fine-tuning the Platypus family of LLMs using LoRA
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Datasets used in Plotly examples and documentation
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
Techniques for deep learning with satellite & aerial imagery
Training Materials for R and Microsoft R Server
Data archive of identifiable COVID-19 related public projects on GitHub
Repository containing the reproducibility material for "Bayesian Transfer Learning for Artificially Intelligent Geospatial Systems: A Predictive Stacking Approach" (Presicce and Banerjee, 2024).
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
The full dataset behind paperswithcode.com
COVID-19 data from the repo CSSEGISandData/COVID-19: https://github.com/CSSEGISandData/COVID-19
Maps following the #MapPromptMonday social mapping prompts
https://github.com/CSSEGISandData/COVID-19
Visualization of confirmed and recovered Corona cases, data from https://github.com/CSSEGISandData/COVID-19
CSV files of total daily confirmed COVID-19 cases and deaths in the USA by state and county. All data from Johns Hopkins & NYT.
Replication files for "The Effects of Historical Pandemics: The Black Death". Published in the Journal of Economic Literature.
This R file investigates genes that are differentially expressed between cocaine-addict deaths and non-cocaine-addict deaths.
R Markdown file analyzing the leading causes of death in the United States