Introduction to Yosegi

What does this project do?

Yosegi is a schema-less columnar storage format. It provides a flexible, JSON-like data representation together with read efficiency comparable to other columnar storage formats.

Why is this project useful?

In the Big Data era, data volumes became too large to store as-is. To improve compression ratio and read performance, several columnar data formats were proposed, for example Apache ORC and Apache Parquet. Because values within the same column tend to be similar, storing them together yields a high compression ratio, and grouping data by column improves read performance when the data is used.
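The compression argument above can be made concrete with run-length encoding, one common technique a columnar layout benefits from. The following is a minimal sketch under that assumption: `ColumnRunLength` is a hypothetical class for illustration only and does not describe Yosegi's actual encoders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration, not Yosegi's API: shows why storing a column's
// values contiguously makes repeated values easy to compress.
public class ColumnRunLength {

  // Run-length encodes a column: each run of consecutive equal values
  // becomes one "value:count" entry.
  public static List<String> encode(List<String> column) {
    List<String> runs = new ArrayList<>();
    int start = 0;
    while (start < column.size()) {
      String value = column.get(start);
      int end = start;
      // Extend the run while the values are equal (null-safe comparison).
      while (end < column.size() && Objects.equals(column.get(end), value)) {
        end++;
      }
      runs.add(value + ":" + (end - start));
      start = end;
    }
    return runs;
  }

  public static void main(String[] args) {
    // Values of a single "country" column, stored contiguously.
    List<String> country = Arrays.asList("JP", "JP", "JP", "JP", "US", "US");
    System.out.println(encode(country)); // [JP:4, US:2]
  }
}
```

In a row-oriented layout the same values would be interleaved with other fields of each record, so runs of identical values would be much shorter.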

However, these data formats require the structure of each row (or record) to be defined as a schema before the data is saved. Users therefore had to decide how the data would be used at storage time, and it was often difficult to know in advance what kind of data would be needed.

This project provides a new columnar format that does not require a schema at storage time, with compression ratio and read performance equal to, or in some cases higher than, those of other formats.
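To illustrate what "no schema at storage time" can mean for a columnar layout, here is a minimal sketch that turns JSON-like records with differing keys into aligned columns, discovering columns as they appear and filling gaps with nulls. `SchemaLessColumnizer` is a hypothetical name; this is not Yosegi's API or its internal algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Yosegi: pivots schema-less records into columns.
public class SchemaLessColumnizer {

  // Columns are discovered while reading records, so no schema is declared up front.
  public static Map<String, List<Object>> toColumns(List<Map<String, Object>> records) {
    Map<String, List<Object>> columns = new LinkedHashMap<>();
    for (int row = 0; row < records.size(); row++) {
      for (Map.Entry<String, Object> field : records.get(row).entrySet()) {
        List<Object> column = columns.get(field.getKey());
        if (column == null) {
          // New column: back-fill nulls for the rows already processed.
          column = new ArrayList<>();
          for (int i = 0; i < row; i++) {
            column.add(null);
          }
          columns.put(field.getKey(), column);
        }
        column.add(field.getValue());
      }
      // Pad columns missing from this record so all columns stay row-aligned.
      for (List<Object> column : columns.values()) {
        if (column.size() < row + 1) {
          column.add(null);
        }
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    Map<String, Object> r1 = new LinkedHashMap<>();
    r1.put("id", 1);
    r1.put("name", "a");
    Map<String, Object> r2 = new LinkedHashMap<>();
    r2.put("id", 2);
    r2.put("score", 9.5);
    List<Map<String, Object>> records = new ArrayList<>();
    records.add(r1);
    records.add(r2);
    System.out.println(toColumns(records)); // {id=[1, 2], name=[a, null], score=[null, 9.5]}
  }
}
```

The second record introduces a `score` field that the first one lacks; the layout simply grows a new column instead of rejecting the record.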

Use cases

Data Analysis

Analyzing big data requires storing data compactly and reading it efficiently. As a columnar format, Yosegi is well suited to these needs.

Data Lake

A data lake is a data pool that does not require the structure of each row (a schema) to be defined when data is stored. Stored data can then be used by defining its schema at analysis time. See DataLake.
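Defining the schema at analysis time ("schema on read") can be sketched as follows: the analyst picks only the columns they need and the type to read each one as. This is a minimal sketch under assumed conventions; `SchemaOnRead`, its `project` method, and the rule that missing or unconvertible values become null are all hypothetical and do not describe how Yosegi or any data lake product works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a Yosegi class: applies a schema chosen at read time.
public class SchemaOnRead {

  // Reads only the requested columns and converts values to the requested types.
  // Columns absent from storage come back as all-null (an assumed convention).
  public static Map<String, List<Object>> project(
      Map<String, List<Object>> stored, Map<String, Class<?>> schema, int rowCount) {
    Map<String, List<Object>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
      List<Object> source = stored.get(field.getKey());
      List<Object> converted = new ArrayList<>();
      for (int i = 0; i < rowCount; i++) {
        Object value = source == null ? null : source.get(i);
        converted.add(convert(value, field.getValue()));
      }
      result.put(field.getKey(), converted);
    }
    return result;
  }

  // Minimal type conversion; unsupported combinations are treated as missing.
  private static Object convert(Object value, Class<?> type) {
    if (value == null) return null;
    if (type == String.class) return value.toString();
    if (type == Long.class && value instanceof Number) return ((Number) value).longValue();
    if (type == Double.class && value instanceof Number) return ((Number) value).doubleValue();
    return null;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> stored = new LinkedHashMap<>();
    stored.put("id", Arrays.<Object>asList(1, 2));
    stored.put("name", Arrays.<Object>asList("a", null));
    Map<String, Class<?>> schema = new LinkedHashMap<>();
    schema.put("id", Long.class);
    schema.put("score", Double.class);
    System.out.println(project(stored, schema, 2)); // {id=[1, 2], score=[null, null]}
  }
}
```

Because data is stored by column, the unrequested `name` column never has to be touched when this schema is applied.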

License

This project is released under the Apache License. Please use it in accordance with the terms of that license.

How do I get started?

Java

To get started quickly, see the quick start.

CLI

Please see the repository of yosegi-tools for details.

For a list of the available functions, see the command list.

Apache Hadoop

Yosegi supports Apache Hadoop. Please see the repository of yosegi-hadoop for details.

To get started quickly, see the quick start.

Apache Hive

Yosegi supports Apache Hive. Please see the repository of yosegi-hive for details.

To get started quickly, see the quick start.

Apache Spark

Yosegi supports Apache Spark. Please see the repository of yosegi-spark for details.

To get started quickly, see the quick start.

Where can I get more help, if I need it?

Support and discussion of Yosegi will take place on the mailing list.

Mailing list: yosegi@googlegroups.com
Bug tracker: JIRA

The mailing list is not open yet; until it is, please contact us via GitHub.

How to contribute

Everyone is welcome to join this project.

For information on how to start contributing to the project, please refer to the Yosegi contribution guide.

Building

System requirement

The following environment is required:

Mac OS X or Linux
Java 8 Update 92 or higher (8u92+), 64-bit
Maven 3.3.9 or later (for building)

Maven

Yosegi can be obtained from the Maven repository.

Compile sources

Compile the sources with the following command:

$ mvn clean install
