Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Updated Apr 5, 2024 - Python
Google, Naver multiprocess image web crawler (Selenium)
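Study notes, along with eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.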
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Apache Spark 3 - Structured Streaming Course Material
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Interview coding questions and experiences for several companies merged into one repository
Apache Airavata Django Portal Framework
This repository gathers a curated list of resources for learning Hadoop for FREE.