Question on restarting ec-2 cluster #11

Open
jortiz16 opened this issue Apr 2, 2015 · 7 comments
Comments

@jortiz16

jortiz16 commented Apr 2, 2015

I was trying out the ability to stop and restart myria-ec2 instances. Last night I started a cluster of 5 on-demand instances and then stopped them. This morning I tried to start them up again by running "starcluster start -x mycluster". The cluster did come up, but there was an error when it tried to start Myria.

From the logs, it seems like starcluster was trying to rsync to /mnt/myria_ec2_deployment, which does not exist. Am I setting up or restarting the cluster incorrectly?

@BrandonHaynes
Member

What instance types are you using? Are they EBS-backed?

If you still have the log available, you might try pasting the relevant bits here so that I can better diagnose the problem.

Also -- I have several restart-related bugfixes that I'm expecting to push out today, and I suspect that one of them might be impacting you.

@jortiz16
Author

jortiz16 commented Apr 2, 2015

Ah okay, that is most likely what I missed. So how would I go about disabling the EBS backend? I was reading through the starcluster docs (http://star.mit.edu/cluster/docs/0.92rc2/manual/configuration.html) and it seems like you would enable EBS volumes through a [volume] section, is that right? I don't see a volume mentioned anywhere in myriacluster.config. In myriaplugin.py, there is a path parameter that is set to path='/mnt/myria_ec2_deployment'. Is that what would need to be modified in the config to disable EBS?

On the other hand, if I did disable the EBS backend, I'd be left with less storage, is that true? Or am I completely misunderstanding the point of the EBS volumes? Below is the error that I got when I tried to restart the EBS-backed cluster:

DEBUG:starcluster:executing remote command: source /etc/profile && cd ~/myria/myriadeploy && sudo ./launch_cluster.sh ~/deployment.cfg.ec2
!!! ERROR - Error occured while running plugin 'myriaplugin.MyriaInstaller':
ERROR:starcluster:Error occured while running plugin 'myriaplugin.MyriaInstaller':
!!! ERROR - remote command 'source /etc/profile && cd
!!! ERROR - ~/myria/myriadeploy && sudo ./launch_cluster.sh
!!! ERROR - ~/deployment.cfg.ec2' failed with status 1:
!!! ERROR - starting master
!!! ERROR - sending incremental file list
!!! ERROR - Error starting master
!!! ERROR - rsync: change_dir#3 "/mnt/myria_ec2_deployment" failed: No
!!! ERROR - such file or directory (2)
!!! ERROR - rsync error: errors selecting input/output files, dirs (code
!!! ERROR - 3) at main.c(643) [Receiver=3.0.9]
!!! ERROR - rsync: connection unexpectedly closed (9 bytes received so
!!! ERROR - far) [sender]
!!! ERROR - rsync error: error in rsync protocol data stream (code 12)
!!! ERROR - at io.c(605) [sender=3.0.9]
!!! ERROR - Exception in thread "main" java.lang.RuntimeException: Error
!!! ERROR - 12 executing command: rsync -rtLDvz /root/deployment.cfg.ec2
!!! ERROR - mycluster-master:/mnt/myria_ec2_deployment/MyriaEC2-files
!!! ERROR - at edu.washington.escience.myria.util.DeploymentUtils.startA
!!! ERROR - Process(DeploymentUtils.java:396)
!!! ERROR - at edu.washington.escience.myria.util.DeploymentUtils.startA
!!! ERROR - Process(DeploymentUtils.java:380)
!!! ERROR - at edu.washington.escience.myria.util.DeploymentUtils.rsyncF
!!! ERROR - ileToRemote(DeploymentUtils.java:361)
!!! ERROR - at edu.washington.escience.myria.util.DeploymentUtils.rsyncF
!!! ERROR - ileToRemote(DeploymentUtils.java:340)
!!! ERROR - at edu.washington.escience.myria.util.DeploymentUtils.main(D
!!! ERROR - eploymentUtils.java:136)
ERROR:starcluster:eploymentUtils.java:136)

Thanks again! I'll check out the new bugfixes once they are out. This isn't super urgent for me, but I just wanted to check it out to test and know what I could do with this feature.

@senderista

"EBS-backed" just refers to the boot volume, not to any other EBS volumes that may be attached to the instance.
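To make that distinction concrete, here is a minimal sketch that reads the root device type of each instance out of an EC2 `describe_instances` response. It assumes the response has the shape returned by boto3's `describe_instances` (an assumption, since this thread does not say which AWS client is in use). The function name and the sample data are hypothetical; `RootDeviceType` and `BlockDeviceMappings` are the field names used in that response.

```python
def root_device_types(response):
    """Map instance ID -> (root device type, number of EBS mappings).

    `response` is assumed to be a dict shaped like the result of boto3's
    ec2_client.describe_instances(). RootDeviceType is "ebs" for an
    EBS-backed instance and "instance-store" otherwise. The EBS mapping
    count includes the root volume when the instance is EBS-backed.
    """
    summary = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ebs_mappings = [
                mapping
                for mapping in instance.get("BlockDeviceMappings", [])
                if "Ebs" in mapping
            ]
            summary[instance["InstanceId"]] = (
                instance.get("RootDeviceType"),
                len(ebs_mappings),
            )
    return summary


# Hypothetical sample data for illustration only (not taken from this cluster).
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-00000001",
                    "RootDeviceType": "ebs",
                    "BlockDeviceMappings": [
                        {"DeviceName": "/dev/sda1", "Ebs": {"VolumeId": "vol-00000001"}}
                    ],
                }
            ]
        }
    ]
}

print(root_device_types(sample_response))
```

An instance can therefore be EBS-backed (its boot volume is on EBS) while having no additional EBS volumes attached for data.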


@jortiz16
Author

jortiz16 commented Apr 2, 2015

And I was using 5 m1.large on-demand instances.

@jortiz16
Author

jortiz16 commented Apr 2, 2015

Thanks @senderista! That makes sense. I was only thinking of the EBS volumes directly attached to the instances.

@BrandonHaynes
Member

This is super weird. The Myria deployment script should be automatically creating /mnt/myria_ec2_deployment, and this leads me to believe that a volume isn't being mounted properly at /mnt. Can you log into the cluster (starcluster sshmaster mycluster) and check to see if /mnt exists? If it does, are you able to manually create /mnt/myria_ec2_deployment?
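For anyone who wants to script that check, here is a rough sketch to run on the master node after logging in. It assumes Python 3 is available there (not confirmed in this thread), and the function name `check_deploy_dir` is hypothetical. It reports whether /mnt exists, whether a volume is actually mounted there, and whether the deployment directory can be created.

```python
import os


def check_deploy_dir(mount_point="/mnt", subdir="myria_ec2_deployment"):
    """Report on the mount point and try to create the deployment directory."""
    report = {
        "exists": os.path.isdir(mount_point),
        # True only if a filesystem is mounted at mount_point itself.
        "is_mount": os.path.ismount(mount_point),
        "created": False,
    }
    if report["exists"]:
        target = os.path.join(mount_point, subdir)
        try:
            os.makedirs(target, exist_ok=True)
            report["created"] = os.path.isdir(target)
        except OSError as exc:
            # For example, permission denied or a read-only filesystem.
            report["error"] = str(exc)
    return report


# Usage on the master node (creates the directory if it can):
#     print(check_deploy_dir())
```

If `exists` is true but `is_mount` is false, /mnt is just a directory on the root volume, which would be consistent with a volume not being mounted there after the restart.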

@jortiz16
Author

jortiz16 commented Apr 3, 2015

So I shut down that cluster earlier, but I saved my config file. I can retrace my steps and retry this to check whether /mnt exists or whether I can create a folder. I will get back to you once I know. Thanks!
