Turing AI Cloud Quick Start

Workflow Overview

The workflow diagram illustrates the submission and debugging workflows of a TACC job.

Creating a TACC account

Before using the tcloud SDK, make sure that you have applied for a TACC account and submitted your SSH public key to TACC. You can generate an SSH public key by following the steps. To apply for a TACC account, please visit our website.

Installing `tcloud` SDK

Download tcloud SDK: Download the latest tcloud SDK from tags.
Install tcloud SDK: Place setup.sh and tcloud in the same directory, then run setup.sh.

Submitting Your First TACC Job

CLI Tool Initialization

First, you need to configure your TACC credentials. You can do this by running the tcloud config command:
```
$ tcloud config [-u/--username] MYUSERNAME
$ tcloud config [-f/--file] MYPRIVATEFILEPATH
```

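Then run the tcloud init command to obtain the latest cluster hardware information from the TACC cluster. The output lists each partition with its availability, node count, node state, and node list: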

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
tacc*        up   infinite      5  alloc 10-0-7-[18-19],10-0-8-[18-19]
tacc*        up   infinite     19   idle 10-0-2-[18-19],10-0-3-[10-13]
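If you want to check how many nodes are free before submitting, the listing above can be summarized with a few lines of Python. This is only a sketch that parses the sample text shown above; the function name `nodes_by_state` is hypothetical and not part of tcloud, and it assumes the column layout stays the same as in the sample.

```python
from collections import defaultdict

# Sample listing copied from the tcloud init output above.
SAMPLE = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
tacc*        up   infinite      5  alloc 10-0-7-[18-19],10-0-8-[18-19]
tacc*        up   infinite     19   idle 10-0-2-[18-19],10-0-3-[10-13]
"""


def nodes_by_state(listing: str) -> dict:
    """Sum the NODES column per STATE value (hypothetical helper)."""
    lines = listing.strip().splitlines()
    header = lines[0].split()
    nodes_col = header.index("NODES")
    state_col = header.index("STATE")

    totals = defaultdict(int)
    for line in lines[1:]:
        fields = line.split()
        # Skip rows that are too short to contain both columns.
        if len(fields) <= max(nodes_col, state_col):
            continue
        totals[fields[state_col]] += int(fields[nodes_col])
    return dict(totals)


summary = nodes_by_state(SAMPLE)  # {'alloc': 5, 'idle': 19}
```

For the sample above, 19 nodes are idle and 5 are allocated.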

Download Sample Job

You can use this link to download our example code.

Submit a Job

Each job requires a main.py and a tuxiv.conf:

main.py: Your machine learning training code.
tuxiv.conf: Your job configuration. See the tuxiv.conf documentation for details.
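To make the role of main.py concrete, here is a minimal stand-in written for illustration. It is not the code shipped in the example folder; it fits a single weight with plain gradient descent using only the standard library, so it runs without any dependencies listed in tuxiv.conf. The function name `train` and the toy data are assumptions.

```python
"""Hypothetical main.py: fits y = w * x with plain gradient descent."""


def train(xs, ys, lr=0.01, epochs=200):
    """Return the weight w that minimizes the mean squared error of w * x."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated from y = 2 * x
    print(f"learned weight: {train(xs, ys):.3f}")
```

A real job would replace the toy loop with your own training code and declare its packages in the environment section of tuxiv.conf.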

After tcloud is configured correctly, you can try to submit your first job.

Go to the example folder in your terminal.

Run the tcloud submit command.

~/Dow/quickstart-master/example/helloworld ❯ tcloud submit
Start parsing tuxiv.conf...
building file list ...
8 files to consider
helloworld/
helloworld/run.sh
        151 100%    0.00kB/s    0:00:00 (xfer#1, to-check=5/8)
helloworld/configurations/
helloworld/configurations/citynet.sh
          12 100%   11.72kB/s    0:00:00 (xfer#2, to-check=2/8)
helloworld/configurations/conda.yaml
        107 100%  104.49kB/s    0:00:00 (xfer#3, to-check=1/8)
helloworld/configurations/run.slurm
        278 100%  271.48kB/s    0:00:00 (xfer#4, to-check=0/8)

sent 429 bytes  received 144 bytes  382.00 bytes/sec
total size is 1071  speedup is 1.87
Submitted batch job 2000
Job helloworld submitted.
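The job ID in the "Submitted batch job" line is what you need later to locate the default log file. As a sketch, assuming the output keeps the format shown above, it can be extracted like this (`extract_job_id` is a hypothetical helper, not part of tcloud):

```python
import re

# Tail of the tcloud submit output shown above.
SUBMIT_OUTPUT = """\
sent 429 bytes  received 144 bytes  382.00 bytes/sec
total size is 1071  speedup is 1.87
Submitted batch job 2000
Job helloworld submitted.
"""


def extract_job_id(output: str):
    """Return the numeric job ID, or None if no submission line is found."""
    match = re.search(r"^Submitted batch job (\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


job_id = extract_job_id(SUBMIT_OUTPUT)  # 2000
```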

Retrieve Your Job Status and Output

In this section, we provide two methods to monitor the job log: cat and download.

After training, you can use tcloud ls [filepath] to find the output files.

cat

You can configure the log path in tuxiv.conf. The default path is slurm_log/slurm-jobid.out.
```
tcloud cat slurm_log/slurm-jobid.out
```
In the helloworld example, the tuxiv.conf file specifies the log path as slurm_log/hello.log.

download

You can use tcloud download [filepath] to download a file.

Note that you can only read and download files in USERDIR, and the files in WORKDIR may be removed after the job is finished.
```
tcloud download slurm_log/slurm-jobid.out
```
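The two commands above can also be assembled from a script. The sketch below only builds the argument lists and does not run anything. It assumes that "jobid" in the default path stands for the numeric job ID reported at submission; both helper names are hypothetical.

```python
ALLOWED_ACTIONS = ("ls", "cat", "download")  # subcommands mentioned in this guide


def default_log_path(job_id: int) -> str:
    """Build the default log path.

    Assumption: 'jobid' in slurm_log/slurm-jobid.out is the numeric job ID.
    """
    return f"slurm_log/slurm-{job_id}.out"


def tcloud_command(action: str, filepath: str) -> list:
    """Return the argument list for `tcloud <action> <filepath>`."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["tcloud", action, filepath]


log_path = default_log_path(2000)
cat_cmd = tcloud_command("cat", log_path)
download_cmd = tcloud_command("download", log_path)
```

To execute one of these, pass the list to `subprocess.run(cat_cmd, check=True)` on a machine where tcloud is installed and configured.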

Manage your environment

tcloud uses Conda to manage your dependencies. All dependencies will be installed through conda. Please specify the required conda channel to meet the installation requirements. In tcloud, we offer two ways of environment management:

One-off Environment. By default, tcloud creates a new environment each time you submit a task to TACC. If you do not specify an environment name and the dependency configuration in tuxiv.conf does not change between two consecutive submissions, tcloud reuses the previous environment to save time.
```
environment:
  # name:       # do not specify environment name
  dependencies:
      - pytorch=1.6.0
      - torchvision=0.7.0
  channels: pytorch
```
Persistent Environment. You can create a dedicated environment for each project by setting a different environment name in tuxiv.conf for each project. When you change the dependency configuration of an existing environment, tcloud updates that environment instead of creating a new one. To learn how to do this, see the environment part of the tuxiv.conf documentation.
```
environment:
  name: torch-env   # dedicated environment name
  dependencies:
      - pytorch=1.6.0
      - torchvision=0.7.0
  channels: pytorch
```
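The difference between the two modes can be modeled as a small decision function. This is a sketch of the behavior described above, not tcloud's actual implementation: the fingerprinting scheme, the function names, and the treatment of an unchanged named environment as "reuse" are all assumptions.

```python
import hashlib
import json


def fingerprint(dependencies, channels) -> str:
    """Hash a dependency configuration (order-insensitive, an assumption)."""
    payload = json.dumps(
        {"dependencies": sorted(dependencies), "channels": sorted(channels)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_environment(name, dependencies, channels, previous: dict) -> str:
    """Decide what happens to the environment on submission.

    `previous` maps an environment name to the fingerprint used last time;
    the key None stands for the last unnamed (one-off) environment.
    """
    current = fingerprint(dependencies, channels)
    if name not in previous:
        return "create"   # nothing to reuse or update yet
    if previous[name] == current:
        return "reuse"    # configuration unchanged
    if name is None:
        return "create"   # one-off: changed dependencies get a new environment
    return "update"       # persistent: change the existing named environment


deps = ["pytorch=1.6.0", "torchvision=0.7.0"]
channels = ["pytorch"]
seen = {None: fingerprint(deps, channels), "torch-env": fingerprint(deps, channels)}
```

With `seen` as above, an unnamed submission with unchanged dependencies is planned as "reuse", while adding a package to `torch-env` is planned as "update".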

Demo video

The following video will help you use the tcloud CLI to begin your TACC journey: demo video.

Examples

Basic examples are provided under the example folder. These examples include: HelloWorld, TensorFlow, PyTorch and MXNet.

FAQ
