forked from Azure/acs-engine
Showing 386 changed files with 45,953 additions and 20,141 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Microsoft Azure Container Service Engine - Kubernetes Multi-GPU support Walkthrough | ||
|
||
## Deployment | ||
|
||
Here are the steps to deploy a simple Kubernetes cluster with multi-GPU support: | ||
|
||
1. [Install a Kubernetes cluster](kubernetes.md): the Kubernetes Walkthrough shows how to create a Kubernetes cluster. | ||
> NOTE: Make sure to configure the agent nodes with VM size `Standard_NC12` or above to utilize the GPUs. | ||
2. Install drivers: | ||
* SSH into each node and run the `install-nvidia-driver.sh` script: | ||
``` | ||
curl -L -sf https://raw.githubusercontent.com/ritazh/acs-k8s-gpu/master/install-nvidia-driver.sh | sudo sh | ||
``` | ||
|
||
To verify, when you run `kubectl describe node <node-name>`, you should get something like the following: | ||
|
||
``` | ||
Capacity: | ||
alpha.kubernetes.io/nvidia-gpu: 2 | ||
cpu: 12 | ||
memory: 115505744Ki | ||
pods: 110 | ||
``` | ||
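The check above can also be scripted. The following is a minimal sketch, assuming the `kubectl describe node` output has already been captured as text; the `gpu_capacity` helper and the sample string are illustrative and not part of acs-engine.

```python
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_capacity(describe_output: str) -> int:
    """Return the GPU count listed under 'Capacity:', or 0 if none is listed."""
    in_capacity = False
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped == "Capacity:":
            in_capacity = True
        elif in_capacity:
            if stripped.endswith(":"):
                break  # a new section header (e.g. "Allocatable:") ends Capacity
            name, _, value = stripped.partition(":")
            if name.strip() == GPU_RESOURCE:
                return int(value)
    return 0


# Sample text shaped like the output shown above (hypothetical capture).
sample = """\
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  2
 cpu:  12
 memory:  115505744Ki
 pods:  110
"""
print(gpu_capacity(sample))  # prints 2
```

A result of 0 on a `Standard_NC12` node would suggest that the driver installation in step 2 did not complete on that node.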
|
||
3. Schedule a multi-GPU container: | ||
|
||
* You need to specify `alpha.kubernetes.io/nvidia-gpu: 2` as a limit. | ||
* You need to expose the drivers to the container as a volume. The original TensorFlow Docker image is based on Ubuntu 16.04, just like your cluster's VMs, so you can simply mount `/usr/bin` and `/usr/lib/x86_64-linux-gnu`. This is a bit dirty, but it works. Ideally, improve the previous script to install the driver in a specific directory and expose only that directory. | ||
|
||
``` yaml | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
name: gpu-test | ||
labels: | ||
app: gpu-test | ||
spec: | ||
volumes: | ||
- name: binaries | ||
hostPath: | ||
path: /usr/bin/ | ||
- name: libraries | ||
hostPath: | ||
path: /usr/lib/x86_64-linux-gnu | ||
containers: | ||
- name: tensorflow | ||
image: gcr.io/tensorflow/tensorflow:latest-gpu | ||
ports: | ||
- containerPort: 8888 | ||
resources: | ||
limits: | ||
alpha.kubernetes.io/nvidia-gpu: 2 | ||
volumeMounts: | ||
- mountPath: /usr/bin/ | ||
name: binaries | ||
- mountPath: /usr/lib/x86_64-linux-gnu | ||
name: libraries | ||
``` | ||
To verify, when you run `kubectl describe pod <pod-name>`, you should see the following: | ||
|
||
``` | ||
Successfully assigned gpu-test to k8s-agentpool1-10960440-1 | ||
``` |
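If the GPU count varies between workloads, the pod manifest above can be generated instead of edited by hand. This is a sketch only: `gpu_pod_manifest` is a hypothetical helper, and it simply mirrors the fields of the YAML manifest shown earlier, with the pod name and GPU limit as parameters.

```python
import json

GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_pod_manifest(name: str, gpus: int) -> dict:
    """Build a Pod manifest that requests `gpus` GPUs and mounts the host driver paths."""
    # Host directories exposed to the container, as in the walkthrough.
    host_dirs = {
        "binaries": "/usr/bin/",
        "libraries": "/usr/lib/x86_64-linux-gnu",
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "volumes": [
                {"name": vol, "hostPath": {"path": path}}
                for vol, path in host_dirs.items()
            ],
            "containers": [{
                "name": "tensorflow",
                "image": "gcr.io/tensorflow/tensorflow:latest-gpu",
                "ports": [{"containerPort": 8888}],
                "resources": {"limits": {GPU_RESOURCE: gpus}},
                "volumeMounts": [
                    {"name": vol, "mountPath": path}
                    for vol, path in host_dirs.items()
                ],
            }],
        },
    }


if __name__ == "__main__":
    # JSON is accepted wherever kubectl accepts YAML.
    print(json.dumps(gpu_pod_manifest("gpu-test", 2), indent=2))
```

The printed JSON can be piped to `kubectl create -f -`, since `kubectl` reads manifests from standard input in either JSON or YAML.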
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Planning Process | ||
|
||
acs-engine features a lightweight process that emphasizes openness and ensures every community member can see project goals for the future. | ||
|
||
## The Role of Maintainers | ||
|
||
[Maintainers][] lead the acs-engine project. Their duties include proposing the Roadmap, reviewing and integrating contributions and maintaining the vision of the project. | ||
|
||
## Open Roadmap | ||
|
||
The [acs-engine Roadmap](roadmap.md) is a community document. While Maintainers propose the Roadmap, it gets discussed and refined in Release Planning Meetings. | ||
|
||
## Contributing to the Roadmap | ||
|
||
Proposals and issues can be opened by anyone. Every member of the community is welcome to participate in the discussion by providing feedback and/or offering counter-proposals. | ||
|
||
## Release Milestones | ||
|
||
The Roadmap gets delivered progressively via the [Release Schedule][]. Releases are defined during Release Planning Meetings and managed using GitHub Milestones which track specific deliverables and work-in-progress. | ||
|
||
## Release Planning Meetings | ||
|
||
Major decisions affecting the Roadmap are discussed during Release Planning Meetings on the first Thursday of each month, aligned with the [Release Schedule][] and monthly objectives for the Microsoft ACS team. | ||
|
||
Release Planning Meetings are not currently open to non-Microsoft contributors, but we may change this in the future. | ||
|
||
[Maintainers]: https://github.com/Azure/acs-engine/blob/master/OWNERS | ||
[Release Schedule]: releases.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# Releases | ||
|
||
acs-engine uses a [continuous delivery][] approach for creating releases. Every merged commit that passes | ||
testing results in a deliverable that can be given a [semantic version][] tag and shipped. | ||
|
||
## Release as Needed | ||
|
||
The master `git` branch of a project should always work. Only changes considered ready to be | ||
released publicly are merged. | ||
|
||
acs-engine depends on components that release new versions as often as needed. Fixing | ||
a high priority bug requires the project maintainer to create a new patch release. | ||
Merging a backward-compatible feature implies a minor release. | ||
|
||
By releasing often, each component release becomes a safe and routine event. This makes it faster | ||
and easier for users to obtain specific fixes. Continuous delivery also reduces the work | ||
necessary to release a product such as acs-engine, which depends on several external projects. | ||
|
||
"Components" applies not just to ACS projects, but also to development and release | ||
tools, orchestrator versions (Kubernetes, DC/OS, Swarm), to Docker base images, and to other Azure | ||
projects that do [semantic version][] releases. | ||
|
||
## acs-engine Releases Each Month | ||
|
||
acs-engine has a regular, public release cadence. From v0.1.0 onward, new acs-engine feature | ||
releases arrive on the first Thursday of each month. Patch releases are created at any time, | ||
as needed. GitHub milestones are used to communicate the content and timing of major and minor | ||
releases, and longer-term planning is visible at [the Roadmap](roadmap.md). | ||
|
||
acs-engine release timing is not linked to specific features. If a feature is merged before the | ||
release date, it is included in the next release. | ||
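To make the cadence concrete, the first Thursday of any month can be computed with the standard library. This is an illustrative sketch, not project tooling; `first_thursday` is a hypothetical helper name.

```python
import datetime

THURSDAY = 3  # datetime.date.weekday() counts Monday as 0


def first_thursday(year: int, month: int) -> datetime.date:
    """Return the date of the first Thursday in the given month."""
    first = datetime.date(year, month, 1)
    # Days to move forward from the 1st to reach a Thursday (0 if the 1st is one).
    offset = (THURSDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


print(first_thursday(2017, 7))  # prints 2017-07-06
```

A feature merged on or before the returned date would land in that month's release; anything merged later waits for the following month.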
|
||
See "[How to Release acs-engine](#how-to-release-acs-engine)" for more detail. | ||
|
||
## Semantic Versioning | ||
|
||
acs-engine releases comply with [semantic versioning][semantic version], with the "public API" broadly | ||
defined as: | ||
|
||
- REST, gRPC, or other API that is network-accessible | ||
- Library or framework API intended for public use | ||
- "Pluggable" socket-level protocols users can redirect | ||
- CLI commands and output formats | ||
- Integration with Azure public APIs such as ARM | ||
|
||
In general, changes to anything a user might reasonably link to, customize, or integrate with should | ||
be backward-compatible, or else require a major release. acs-engine users can be confident that upgrading | ||
to a patch or to a minor release will not break anything. | ||
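The mapping from change type to version bump can be sketched as follows. The `next_version` function and the change labels `"fix"`, `"feature"`, and `"breaking"` are assumptions made for illustration; acs-engine does not define them.

```python
def next_version(current: str, change: str) -> str:
    """Return the next semantic version for a given kind of change.

    change is one of:
      "fix"      -> patch release (bug fix)
      "feature"  -> minor release (backward-compatible feature)
      "breaking" -> major release (incompatible change to the public API)
    """
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("0.1.0", "fix"))       # prints 0.1.1
print(next_version("0.1.0", "feature"))   # prints 0.2.0
print(next_version("1.4.2", "breaking"))  # prints 2.0.0
```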
|
||
## How to Release acs-engine | ||
|
||
This section leads a maintainer through creating an acs-engine release. | ||
|
||
### Step 1: Assemble Master Changelog | ||
A change log is a file which contains a curated, chronologically ordered list of changes | ||
for each version of acs-engine, which helps users and contributors see what notable changes | ||
have been made between each version of the project. | ||
|
||
The CHANGELOG should be driven by release milestones defined on GitHub, which track specific deliverables and | ||
work-in-progress. | ||
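As a rough illustration of the shape such a file might take, the sketch below groups a chronologically ordered list of changes by version. The `render_changelog` helper, the Markdown layout, and the sample entries are all hypothetical; the project does not prescribe a format here.

```python
def render_changelog(entries):
    """Render (version, description) pairs, given in chronological order, as Markdown."""
    sections = {}  # dicts keep insertion order, so versions stay chronological
    for version, description in entries:
        sections.setdefault(version, []).append(description)

    lines = []
    for version, changes in sections.items():
        lines.append(f"## {version}")
        lines.extend(f"- {change}" for change in changes)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical entries, e.g. collected by hand from closed milestone items.
example = [
    ("v0.1.0", "Example change A"),
    ("v0.1.0", "Example change B"),
    ("v0.2.0", "Example change C"),
]
print(render_changelog(example))
```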
|
||
### Step 2: Manual Testing | ||
|
||
Now it's time to go above and beyond current CI tests. Create a testing matrix spreadsheet (copying | ||
from the previous document is a good start) and sign up testers to cover all permutations. | ||
|
||
Testers should pay special attention to the overall user experience, make sure upgrading from | ||
earlier versions is smooth, and cover various storage configurations and Kubernetes versions and | ||
infrastructure providers. | ||
|
||
When showstopper-level bugs are found, the process is as follows: | ||
|
||
1. Create an issue that describes the bug. | ||
1. Create a PR that fixes the bug. | ||
- PRs should always include tests (unit or e2e as appropriate) to add | ||
automated coverage for the bug. | ||
1. Once the PR passes and is reviewed, merge it and update the CHANGELOG. | ||
|
||
|
||
### Step 3: Tag and Create a Release | ||
|
||
TBD | ||
|
||
|
||
### Step 4: Close GitHub Milestones | ||
|
||
TBD | ||
|
||
### Step 5: Let Everyone Know | ||
|
||
Let the rest of the team know they can start blogging and tweeting about the new acs-engine release. | ||
Post a message to the #company channel on Slack. Include a link to the release and to the | ||
master CHANGELOG: | ||
|
||
``` | ||
@here acs-engine 0.1.0 is here! | ||
Master CHANGELOG: https://github.com/Azure/acs-engine/CHANGELOG.md | ||
``` | ||
|
||
You're done with the release. Nice job! | ||
|
||
[continuous delivery]: https://en.wikipedia.org/wiki/Continuous_delivery | ||
[semantic version]: http://semver.org |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
MARATHON_JSON=marathon-slave-public.json |