# spark-etl

[![License](https://img.shields.io/github/license/mashape/apistatus.svg)](http://badges.mit-license.org)
[![Build Status](https://travis-ci.org/vngrs/spark-etl.svg?branch=master)](https://travis-ci.org/vngrs/spark-etl)
[![Coverage Status](https://coveralls.io/repos/github/vngrs/spark-etl/badge.svg?branch=master)](https://coveralls.io/github/vngrs/spark-etl?branch=master)
[![Join the chat at https://gitter.im/vngrs/spark-etl](https://badges.gitter.im/vngrs/spark-etl.svg)](https://gitter.im/vngrs/spark-etl?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

### What is spark-etl?

[The ETL(Extract-Transform-Load)] process is a key component of many data management operations, including moving data and transforming it from one format to another. To support these operations effectively, spark-etl provides a distributed solution.

spark-etl is a Scala-based project developed with Spark, so it is scalable and distributed. spark-etl can process data from N sources to N databases.

Differences from other ETL projects:
- parallel ETL at the cluster level
- synchronisation of data
- open source

### Example Scenario

We want to get data from multiple sources such as MySQL and CSV files. While extracting the data, we also want to filter and merge some fields/tables. In the transform layer, we want to run an SQL query. Then we want to write the transformed data to multiple targets such as S3 and Redshift.

![etl](http://goo.gl/nZMmIE)

spark-etl is the easiest way to implement this scenario!
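
To make the scenario concrete, here is a rough sketch of the same flow written directly against the Apache Spark API (this is plain Spark, not spark-etl's own API; the connection URLs, credentials, table, column, and path names are hypothetical placeholders):

```scala
import org.apache.spark.sql.SparkSession

object EtlScenarioSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("etl-scenario-sketch")
      .getOrCreate()

    // Extract: read one table from MySQL over JDBC and one CSV file.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/shop")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", "secret")
      .load()

    val customers = spark.read
      .option("header", "true")
      .csv("/data/customers.csv")

    // Transform: filter and merge the two sources with an SQL query.
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")
    val totals = spark.sql(
      """SELECT c.id, c.name, SUM(o.amount) AS total
        |FROM orders o
        |JOIN customers c ON o.customer_id = c.id
        |WHERE o.status = 'PAID'
        |GROUP BY c.id, c.name""".stripMargin)

    // Load: write the result to S3 as Parquet and to Redshift over JDBC.
    totals.write.mode("overwrite").parquet("s3a://my-bucket/output/totals")
    totals.write
      .format("jdbc")
      .option("url", "jdbc:redshift://example.redshift.amazonaws.com:5439/warehouse")
      .option("dbtable", "totals")
      .option("user", "etl_user")
      .option("password", "secret")
      .mode("append")
      .save()

    spark.stop()
  }
}
```

Running the sketch requires the MySQL and Redshift JDBC drivers (and S3 credentials) on the classpath; this multi-source, multi-target wiring is exactly what spark-etl is meant to simplify.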

### Version
0.0.1-SNAPSHOT

### Tech
* [Scala] - Functional Programming Language
* [ScalaTest] - ScalaTest is a testing tool in the Scala ecosystem.
* [wartremover] - WartRemover is a flexible Scala code linting tool.
* [scalastyle] - Scalastyle examines Scala code and indicates potential problems with it.
* [scoverage] - Scoverage is a code coverage tool for scala that offers statement and branch coverage.
* [Apache Spark] - Apache Spark is a fast and general engine for large-scale data processing.
* [travis-ci] - Travis CI is a hosted, distributed continuous integration service used to build and test software projects.
* [coveralls] - Coveralls is a web service to help you track your code coverage over time, and ensure that all your new code is fully covered.

### How to become a committer

Want to contribute? Great!
Let's say "Hello" on [gitter].

Committers have write access to the project’s repositories, i.e., they can modify the code and documentation themselves and also accept contributions from others.

There is no strict protocol for becoming a committer.

Being an active committer means participating in [gitter] discussions and helping to answer questions.
Of course, contributing code and documentation to the project is important as well. A good way to start is by contributing improvements, new features, or bug fixes. You need to show that you take responsibility for the code that you contribute, add tests and documentation, and help to maintain it.

If you would like to become a committer, please write on [gitter].

### Development

spark-etl is developed in Scala and uses Apache Spark for distributed processing.

Minimal requirements (a sample build definition is sketched after this list):

- Support for Scala (scala plugin)
- Sbt (sbt.version = 0.13.12)
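
For illustration, a minimal sbt setup along these lines would cover the requirements above (the version numbers and dependency set are assumptions, not the project's actual build definition; `sbt.version = 0.13.12` goes in `project/build.properties`):

```scala
// build.sbt -- hypothetical minimal build; the real project's build may differ.
name := "spark-etl"

version := "0.0.1-SNAPSHOT"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided",
  "org.scalatest"    %% "scalatest"  % "2.2.6" % "test"
)
```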

### Todos


License
----

[MIT License]

[etl]: https://github.com/vngrs/spark-etl
[The ETL(Extract-Transform-Load)]: https://en.wikipedia.org/wiki/Extract,_transform,_load
[gitter]: https://gitter.im/vngrs/spark-etl?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
[MIT License]: https://github.com/vngrs/spark-etl/blob/master/LICENSE
[Scala]: https://github.com/scala/scala
[ScalaTest]: https://github.com/scalatest/scalatest
[wartremover]: https://github.com/puffnfresh/wartremover
[scalastyle]: https://github.com/scalastyle/scalastyle
[travis-ci]: https://github.com/travis-ci/travis-ci
[coveralls]: https://coveralls.io/
[Apache Spark]: https://github.com/apache/spark
