/
cloudtask-faq.txt
198 lines (144 loc) · 7.57 KB
/
cloudtask-faq.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
=============
Cloudtask-FAQ
=============
--------------------------
Frequently Asked Questions
--------------------------
:Author: Liraz Siri <liraz@turnkeylinux.org>
:Date: 2011-08-11
:Manual section: 7
:Manual group: misc
What is a task job?
===================
A task is a sequence of jobs. Each job is essentially just a shell
command which cloudtask creates by appending the task command to the job
input arguments. For example, consider the following cloudtask::
seq 3 | cloudtask echo
`seq 3` prints out a sequence of numbers from 1 to 3, each on a separate
line which cloudtask appends to the `echo` command to create three
commands::
echo 1
echo 2
echo 3
Each job command should be independent, which means it shouldn't rely on
any other job command being run before or after it on a particular
worker. The execution order and distribution of job commands is up to
cloudtask. If a task is split up amongst multiple workers (e.g.,
--split=3) each job command is likely be executed on a different server.
Job commands may not require any user interaction. Cloudtask can not
interact with job commands so any attempt at user interaction (e.g., a
confirmation dialog) will hang the job until the configured job
--timeout elapses (1 hour by default).
How do I prepare a worker for a job?
====================================
On a fresh TurnKey Core deployment install and test all the software
(e.g., packages, custom scripts, etc.) that your job command depend on.
This is your master worker.
Backup the master using TKLBAM, and pass its backup id to cloudtask so
that it can restore this backup on any worker it launches automatically.
You can substitute or supplement a TKLBAM restore with the --pre command
(e.g., to install a package) and/or apply an --overlay to the worker's
filesystem.
How do I provide job commands with required input data?
=======================================================
Small amounts of input data may be stored in the TKLBAM backup or
transfered over to the worker in the overlay.
For more substantial amounts of input data, it is recommended to pull in
data over the network (e.g., from a file server or Amazon S3).
Where do I store the useful end-products of a job command?
==========================================================
Jobs should squirrel away useful end-products such as files to an
external storage resource on the network.
Any hard disk storage space on the worker should be considered temporary
as any automatically launched worker will be destroyed at the end of the
task along with the contents of its temporary storage space.
For example if a job creates files on the local filesystem those would
be lost when the worker is destroyed unless the they are first uploaded
over the network to a file server, or to Amazon S3, etc.
Any console output (e.g., print statements) from a job is automatically
logged by Cloudtask.
What happens if a job fails?
============================
A job is considered to have failed if the job command returns a non-zero
exitcode. Failed jobs are not retried. They are simply logged and the
total number of job failures reported at the end of the session. The
worker then continues executing the next job.
Are jobs divided equally amongst workers?
=========================================
Not necessarily. Workers pull job commands from a queue of jobs on a
first come first served basis. A worker will grab the next job from the
queue as soon as it is finished with the previous job. A fast worker or
a worker that has received shorter jobs may execute more jobs than a
slow worker or a worker that has received longer jobs.
How does cloudtask authenticate to workers?
===========================================
Cloudtask logs into remote servers over SSH. It assumes it can do this
without a password using SSH key authentication (e.g., your SSH key has
been added to the worker's authorized keys). Password authentication is
not supported.
In the User Profile section the Hub allows you to configure one or more
SSH public keys which will be added to the authorized keys of any cloud
server launched.
So I need to put my private SSH key on any remote server I run cloudtask on?
============================================================================
That's one way to do it. Another, more secure alternative would be to
use SSH agent forwarding to log into the remote server::
ssh -A remote-server
Forwarding the local SSH agent will let remote-server authentiate with
your SSH keys without them ever leaving the security of your personal
computer.
What if a worker fails?
=======================
Cloudtask does not depend on the reliability of any single worker. If a
worker fails while it is running a job, the job will be re-routed to one
of the remaining workers.
A worker is considered to have failed when cloudtask detects that it is
no longer capable of executing commands over SSH (I.e., cloudtask pings
workers periodically).
It doesn't matter if this is because of a network routing problem which
makes the worker unreachable, a software problem (e.g., kernel panic) or
a critical performance issue such as the worker running out of memory
and thrashing so badly into swap that it can't even accept commands over
SSH.
As usual Cloudtask takes responsibility for the destruction of workers
it launches. A worker that has failed will be destroyed immediately.
Do I have to use the Hub to launch workers?
===========================================
No that's just the easiest way to do it. Cloudtask can accept an
arbitrary list of worker IP addresses via the --workers option.
Can I mix pre-launched workers with automatically launched workers?
===================================================================
Yes. If the --split is greater than the number of pre-launched workers
you provide via the --workers option then Cloudtask will launch
additional workers to satisfy the configured split.
For example, if you provide a list of 5 pre-launched worker IP addresses
and specify a task split of 15 then Cloudtask will launch an additional
10 workers automatically.
When are workers automatically destroyed?
=========================================
To minimize cloud server usage fees, Cloudtask destroys workers it
launches as soon as it runs out of work for them to do.
But Cloudtask only takes responsibility for the destruction of workers
it launches automatically. You can also launch workers by hand using the
cloudtask-launch-workers command and pass them to cloudtask using the
--workers option. In that case you are responsibile for worker
destruction (e.g., using the cloudtask-destroy-workers command).
How do I abort a task?
======================
You can abort a task safely at any time by either:
1) Pressing CTRL-C on the console in which cloudtask is executing.
2) Use kill to send the TERM signal to cloudtask session pid.
What happens when I abort a task?
=================================
The execution of all currently running jobs is immediately aborted. Any
worker instance that was automatically launched by cloudtask is
destroyed as soon as possible.
To allow an aborted session to be later resumed, the current state of
the task is saved in the task session. The state describes which jobs
have finished executing and which jobs are still in the pending state.
When the task is resumed any aborted jobs will be re-executed along with
the other pending jobs.
Aborting a task is not immediate because it can take anywhere from a few
seconds to to a few minutes to safely shut down a task. For example EC2
instances in the pending state can not be destroyed so cloudtask has to
wait for them to reach the running state first.