Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions Code-of-Conduct.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Oath Open Source Code of Conduct

## Summary
This Code of Conduct is our way to encourage good behavior and discourage bad behavior in our open source community. We invite participation from many people to bring different perspectives to support this project. We pledge to do our part to foster a welcoming and professional environment free of harassment. We expect participants to communicate professionally and thoughtfully during their involvement with this project.

Participants may lose their good standing by engaging in misconduct. For example: insulting, threatening, or conveying unwelcome sexual content. We ask participants who observe conduct issues to report the incident directly to the project's Response Team at opensource-conduct@oath.com. Oath will assign a respondent to address the issue. We may remove harassers from this project.

This code does not replace the terms of service or acceptable use policies of the websites used to support this project. We acknowledge that participants may be subject to additional conduct terms based on their employment which may govern their online expressions.

## Details
This Code of Conduct makes our expectations of participants in this community explicit.
* We forbid harassment and abusive speech within this community.
* We request participants to report misconduct to the project’s Response Team.
* We urge participants to refrain from using discussion forums to play out a fight.

### Expected Behaviors
We expect participants in this community to conduct themselves professionally. Since our primary mode of communication is text on an online forum (e.g. issues, pull requests, comments, emails, or chats) devoid of vocal tone, gestures, or other context that is often vital to understanding, it is important that participants are attentive to their interaction style.

* **Assume positive intent.** We ask community members to assume positive intent on the part of other people’s communications. We may disagree on details, but we expect all suggestions to be supportive of the community goals.
* **Respect participants.** We expect participants will occasionally disagree. Even if we reject an idea, we welcome everyone’s participation. Open Source projects are learning experiences. Ask, explore, challenge, and then respectfully assert if you agree or disagree. If your idea is rejected, be more persuasive not bitter.
* **Welcoming to new members.** New members bring new perspectives. Some may raise questions that have been addressed before. Kindly point them to existing discussions. Everyone is new to every project once.
* **Be kind to beginners.** Beginners use open source projects to get experience. They might not be talented coders yet, and projects should not accept poor quality code. But we were all beginners once, and we need to engage kindly.
* **Consider your impact on others.** Your work will be used by others, and you depend on the work of others. We expect community members to be considerate and establish a balance their self-interest with communal interest.
* **Use words carefully.** We may not understand intent when you say something ironic. Poe’s Law suggests that without an emoticon people will misinterpret sarcasm. We ask community members to communicate plainly.
* **Leave with class.** When you wish to resign from participating in this project for any reason, you are free to fork the code and create a competitive project. Open Source explicitly allows this. Your exit should not be dramatic or bitter.

### Unacceptable Behaviors
Participants remain in good standing when they do not engage in misconduct or harassment. To elaborate:
* **Don't be a bigot.** Calling out project members by their identity or background in a negative or insulting manner. This includes, but is not limited to, slurs or insinuations related to protected or suspect classes e.g. race, color, citizenship, national origin, political belief, religion, sexual orientation, gender identity and expression, age, size, culture, ethnicity, genetic features, language, profession, national minority statue, mental or physical ability.
* **Don't insult.** Insulting remarks about a person’s lifestyle practices.
* **Don't dox.** Revealing private information about other participants without explicit permission.
* **Don't intimidate.** Threats of violence or intimidation of any project member.
* **Don't creep.** Unwanted sexual attention or content unsuited for the subject of this project.
* **Don't disrupt.** Sustained disruptions in a discussion.
* **Let us help.** Refusal to assist the Response Team to resolve an issue in the community.

We do not list all forms of harassment, nor imply some forms of harassment are not worthy of action. Any participant who *feels* harassed or *observes* harassment, should report the incident. Victim of harassment should not address grievances in the public forum, as this often intensifies the problem. Report it, and let us address it off-line.

### Reporting Issues
If you experience or witness misconduct, or have any other concerns about the conduct of members of this project, please report it by contacting our Response Team at opensource-conduct@oath.com who will handle your report with discretion. Your report should include:
* Your preferred contact information. We cannot process anonymous reports.
* Names (real or usernames) of those involved in the incident.
* Your account of what occurred, and if the incident is ongoing. Please provide links to or transcripts of the publicly available records (e.g. a mailing list archive or a public IRC logger), so that we can review it.
* Any additional information that may be helpful to achieve resolution.

After filing a report, a representative will contact you directly to review the incident and ask additional questions. If a member of the Oath Response Team is named in an incident report, that member will be recused from handling your incident. If the complaint originates from a member of the Response Team, it will be addressed by a different member of the Response Team. We will consider reports to be confidential for the purpose of protecting victims of abuse.

### Scope
Oath will assign a Response Team member with admin rights on the project and legal rights on the project copyright. The Response Team is empowered to restrict some privileges to the project as needed. Since this project is governed by an open source license, any participant may fork the code under the terms of the project license. The Response Team’s goal is to preserve the project if possible, and will restrict or remove participation from those who disrupt the project.

This code does not replace the terms of service or acceptable use policies that are provided by the websites used to support this community. Nor does this code apply to communications or actions that take place outside of the context of this community. Many participants in this project are also subject to codes of conduct based on their employment. This code is a social-contract that informs participants of our social expectations. It is not a terms of service or legal contract.

## License and Acknowledgment.
This text is shared under the [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/). This code is based on a study conducted by the [TODO Group](https://todogroup.org/) of many codes used in the open source community. If you have feedback about this code, contact our Response Team at the address listed above.
26 changes: 26 additions & 0 deletions Contributing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# How to contribute
First, thanks for taking the time to contribute to our project! The following information provides a guide for making contributions.

## Code of Conduct

By participating in this project, you agree to abide by the [Oath Code of Conduct](Code-of-Conduct.md). Everyone is welcome to submit a pull request or open an issue to improve the documentation, add improvements, or report bugs.

## How to Ask a Question

If you simply have a question that needs an answer, [create an issue](https://help.github.com/articles/creating-an-issue/), and label it as a question.

## How To Contribute

### Report a Bug or Request a Feature

If you encounter any bugs while using this software, or want to request a new feature or enhancement, feel free to [create an issue](https://help.github.com/articles/creating-an-issue/) to report it, make sure you add a label to indicate what type of issue it is.

### Contribute Code
Pull requests are welcome for bug fixes. If you want to implement something new, please [request a feature first](#report-a-bug-or-request-a-feature) so we can discuss it.

#### Creating a Pull Request
Before you submit any code, we need you to agree to our [Contributor License Agreement](https://yahoocla.herokuapp.com/); this ensures we can continue to protect your contributions under an open source license well into the future.

Please follow [best practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices) for creating git commits.

When your code is ready to be submitted, you can [submit a pull request](https://help.github.com/articles/creating-a-pull-request/) to begin the code review process.
61 changes: 38 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,35 +3,40 @@ Copyright 2017 Yahoo Inc.
Licensed under the terms of the Apache 2.0 license.
Please see LICENSE file in the project root for terms.
-->
[![Build Status](https://travis-ci.org/yahoo/TensorFlowOnSpark.svg?branch=master)](https://travis-ci.org/yahoo/TensorFlowOnSpark) [![PyPI version](https://badge.fury.io/py/tensorflowonspark.svg)](https://badge.fury.io/py/tensorflowonspark)

# TensorFlowOnSpark
> _TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark
clusters._

## What's TensorFlowOnSpark?
[![Build Status](https://travis-ci.org/yahoo/TensorFlowOnSpark.svg?branch=master)](https://travis-ci.org/yahoo/TensorFlowOnSpark) [![PyPI version](https://badge.fury.io/py/tensorflowonspark.svg)](https://badge.fury.io/py/tensorflowonspark)

TensorFlowOnSpark brings scalable deep learning to [Apache Hadoop](http://hadoop.apache.org) and [Apache Spark](http://spark.apache.org)
clusters. By combining salient features from deep learning framework
[TensorFlow](https://www.tensorflow.org) and big-data frameworks
Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed
By combining salient features from the [TensorFlow](https://www.tensorflow.org) deep learning framework with [Apache Spark](http://spark.apache.org) and [Apache Hadoop](http://hadoop.apache.org), TensorFlowOnSpark enables distributed
deep learning on a cluster of GPU and CPU servers.

TensorFlowOnSpark enables distributed TensorFlow training and
inference on Apache Spark clusters. It seeks to minimize the amount
It enables both distributed TensorFlow training and
inferencing on Spark clusters, with a goal to minimize the amount
of code changes required to run existing TensorFlow programs on a
shared grid. Its Spark-compatible API helps manage the TensorFlow
cluster with the following steps:

1. **Startup** - launches the Tensorflow main function on the executors, along with listeners for data/control messages.
1. **Data ingestion**
- **Readers & QueueRunners** - leverages TensorFlow's [Reader](https://www.tensorflow.org/how_tos/reading_data/#reading_from_files) mechanism to read data files directly from HDFS.
- **Feeding** - sends Spark RDD data into the TensorFlow nodes using the [feed_dict](https://www.tensorflow.org/how_tos/reading_data/#feeding) mechanism. Note that we leverage the [Hadoop Input/Output Format](https://github.com/tensorflow/ecosystem/tree/master/hadoop) for access to TFRecords on HDFS.
- **InputMode.TENSORFLOW** - leverages TensorFlow's built-in APIs to read data files directly from HDFS.
- **InputMode.SPARK** - sends Spark RDD data to the TensorFlow nodes via the [feed_dict](https://www.tensorflow.org/how_tos/reading_data/#feeding) mechanism. Note that we leverage the [Hadoop Input/Output Format](https://github.com/tensorflow/ecosystem/tree/master/hadoop) to access TFRecords on HDFS.
1. **Shutdown** - shuts down the Tensorflow workers and PS nodes on the executors.

TensorFlowOnSpark was developed by Yahoo for large-scale distributed
deep learning on our Hadoop clusters in Yahoo's private cloud.
## Table of Contents

- [Background](#background)
- [Install](#install)
- [Usage](#usage)
- [API](#api)
- [Contribute](#contribute)
- [License](#license)

## Background

## Why TensorFlowOnSpark?
TensorFlowOnSpark was developed by Yahoo for large-scale distributed
deep learning on our Hadoop clusters in Yahoo's private cloud.

TensorFlowOnSpark provides some important benefits (see [our
blog](http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-tensorflowonspark-distributed-deep))
Expand All @@ -43,19 +48,29 @@ over alternative deep learning solutions.
* Easily integrate with your existing data processing pipelines and machine learning algorithms (ex. MLlib, CaffeOnSpark);
* Easily deployed on cloud or on-premise: CPU & GPU, Ethernet and Infiniband.

## Install

TensorFlowOnSpark is provided as a pip package, which can be installed on single machines via:
```
pip install tensorflowonspark
```

For distributed clusters, please see our [wiki site](../../wiki) for detailed documentation for specific environments, such as our getting started guides for [single-node Spark Standalone](https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_Standalone), [YARN clusters](../../wiki/GetStarted_YARN) and [AWS EC2](../../wiki/GetStarted_EC2).

## Usage

To use TensorFlowOnSpark with an existing TensorFlow application, you can follow our [Conversion Guide](../../wiki/Conversion-Guide) to describe the required changes. Additionally, our [wiki site](../../wiki) has pointers to some presentations which provide an overview of the platform.

## API

[API Documentation](https://yahoo.github.io/TensorFlowOnSpark/) is automatically generated from the code.

## Using TensorFlowOnSpark
## Contribute

Please check TensorFlowOnSpark [wiki site](../../wiki) for detailed
documentations such as getting started guides for [YARN
cluster](../../wiki/GetStarted_YARN) and [AWS EC2
cluster](../../wiki/GetStarted_EC2). A [Conversion
Guide](../../wiki/Conversion-Guide) has been provided to help you convert
your TensorFlow programs.
Please join the [TensorFlowOnSpark user group](https://groups.google.com/forum/#!forum/TensorFlowOnSpark-users) for discussions and questions. If you have a question, please review our [FAQ](../../wiki/Frequently-Asked-Questions) before posting.

## Mailing List
Contributions are always welcome. For more information, please see our [guide for getting involved](Contributing.md).

Please join [TensorFlowOnSpark user group](https://groups.google.com/forum/#!forum/TensorFlowOnSpark-users) for discussions and questions.

## License

Expand Down