dask

Here are 10 public repositories matching this topic...

Developed for "Management and Analysis of Physics Dataset Mod. B", this project uses Dask and CloudVeneto VMs to process a 250 GB dataset. To cluster 800k RCV1 articles, it first reduces the dataset by macrocategory and then applies cosine similarity, a common choice in natural language processing, to improve the clustering.

  • Updated Jan 25, 2024
  • HTML
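
As a rough illustration of the cosine similarity mentioned in the description above, here is a minimal sketch in plain Python. It is not taken from the repository: the function names (`l2_normalize`, `cosine_similarity`), the sparse term-to-weight dictionaries, and the toy documents are all assumptions made for the example, and the project itself may compute this differently (for instance on Dask arrays).

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector (term -> weight dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        # An all-zero vector has no direction; return a copy unchanged.
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    # Iterate over the shorter vector; terms missing from the other contribute 0.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Hypothetical toy documents, represented as term weights.
doc_a = {"dask": 2.0, "cluster": 1.0}
doc_b = {"dask": 4.0, "cluster": 2.0}   # same direction as doc_a, twice as long
doc_c = {"physics": 3.0}                # shares no terms with doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Because the vectors are normalized first, document length does not affect the score: `doc_a` and `doc_b` point in the same direction and score close to 1, while documents with no terms in common score 0.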