Utils for streaming large files (S3, HDFS, gzip, bz2...)
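The core idea behind such utilities is layering decompression over a byte stream so a large file is read incrementally rather than loaded whole. A minimal sketch using only Python's standard library (an illustration of the technique, not this repository's actual code), with an in-memory buffer standing in for an S3/HDFS object stream:

```python
import gzip
import io

def stream_lines(fileobj):
    """Yield decoded lines from a binary file-like object,
    decompressing gzip on the fly so the whole file never
    sits in memory at once."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as text_stream:
        for line in text_stream:
            yield line.rstrip("\n")

# An in-memory gzip "file" standing in for a remote object stream.
raw = "\n".join(f"record {i}" for i in range(3)).encode("utf-8")
buf = io.BytesIO(gzip.compress(raw))

print(list(stream_lines(buf)))  # → ['record 0', 'record 1', 'record 2']
```

In practice the same pattern generalizes: any object exposing a binary file interface (an S3 download stream, an HDFS client handle) can be wrapped the same way, with the compression layer chosen by file extension.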
Updated May 31, 2024 · Python
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
ElasticCTR, the PaddlePaddle elastic-compute recommendation system, is an open-source, Kubernetes-based enterprise recommendation solution. It combines a high-accuracy CTR model refined in Baidu's production workloads, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, letting users deploy a recommendation system in a Kubernetes environment with a single command. It offers high performance, production-grade deployment, and an end-to-end experience, and as an open-source suite it supports deep secondary development.
A tool and library for easily deploying applications on Apache YARN
Analysis scripts for log data sets used in anomaly detection.
Insight Data Engineering project: a platform built on HDFS, Spark, and Airflow to help you find social influencers in the GitHub network.
Minikube for big data with Scala and Spark
Distributed File System written in Python