yesdata

ZhenYi.Ma yesdata

Docker + Harbor + Kubernetes, CDH + Hadoop/Spark maintenance

Spark

Latest News: Spark 2.3.3 released (Feb 15, 2019); Spark 2.2.3 released (Jan 11, 2019); Spark+AI Summit (April 23-25th, 2019, San Francisco) agenda posted (Dec 19, 2018); Spark 2.4.0 released (Nov 02, 2…

Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing…
Hive

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storag…
Kafka

Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
HBase

HBase™ is the Hadoop database, a distributed, scalable, big data store. Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of ve…
Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming d…