dsfsi-datasets
Here are 19 public repositories matching this topic...
DSFSI South African Terminology Lists and Lexicon Project
Updated Aug 22, 2024 - HTML
IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora - za-isizulu-siswati-news-2022
Updated Oct 26, 2023
Curated corpora for Setswana. Used to train PuoBERTa.
Updated Oct 26, 2023
Zondo Commission or State Capture Commission Transcripts
Updated Oct 26, 2023
Educational Assessment using LLMs
Updated Nov 6, 2023 - Python
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
Updated May 10, 2024 - Jupyter Notebook
Categorised isiZulu News. Source data is the isiZulu news from the SABC social media posts.
Updated Oct 26, 2023
Updated Nov 6, 2023 - Python
This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks, with the aim of identifying financial risk. It builds on the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al., which created a corpus of …
Updated Oct 26, 2023 - Jupyter Notebook
Embedding Evaluation Data for South African Languages
Updated Oct 26, 2023
A RoBERTa-based language model designed specifically for Setswana, trained on the new PuoData dataset.
Updated Dec 4, 2023 - Makefile
South African Health Facility map. Created to aid in covid19za responses
Updated Oct 26, 2023 - JavaScript
An exploratory data analysis (EDA) Git repository for education researchers and practitioners
Updated Sep 16, 2024 - Jupyter Notebook
StatsSA statistical language glossary in machine-readable format
Updated Oct 26, 2023 - Jupyter Notebook
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
Updated Dec 6, 2023 - Jupyter Notebook
Dataset of South African Disinformation [Fake News] Website Data collected in 2020
Updated Oct 26, 2023
South African Member of Parliament Data
Updated Oct 26, 2023 - Python
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Updated Dec 6, 2023 - Jupyter Notebook