Ver 0.2.2 error in single-host distributed mode #10

Open
frank-y-liu opened this issue May 30, 2018 · 6 comments
Comments

@frank-y-liu

Updated to ver. 0.2.2. Local mode works fine, but I encountered the following error in single-host distributed mode. Error message from the master:

frankliu@arlz010:~/Projects/DtCraft$ sudo ./bin/dtc-master           
I  6016 2018-05-29 20:22:09 master.cpp:200] Master @127.0.0.1 [agent:9909|graph:9910|webui:9912]                                          
I  6016 2018-05-29 20:22:42 master.cpp:271] Agent 0 connected @127.0.0.1 [cpu:0|mem:135081242624|disk:17340207104]                        
I  6016 2018-05-29 20:23:20 master.cpp:300] Graph 0 connected @arlz010 [vertex:4|stream:6|container:4]                                    
W  6016 2018-05-29 20:23:20 master.cpp:303] Graph 0 doesn't fit with available resources                                                  
I  6016 2018-05-29 20:23:20 master.cpp:142] Graph 0 is removed from the master                                                            
I  6016 2018-05-29 20:23:30 master.cpp:300] Graph 1 connected @arlz010 [vertex:2|stream:2|container:2]                                    
W  6016 2018-05-29 20:23:30 master.cpp:303] Graph 1 doesn't fit with available resources                                                  
I  6016 2018-05-29 20:23:30 master.cpp:142] Graph 1 is removed from the master       

Error message in the submission window:

frankliu@arlz010:~/Projects/DtCraft$ sbin/submit.sh --master=127.0.0.1 /home/frankliu/Projects/DtCraft/example/hello_world                
I  1920 2018-05-29 20:26:33 executor.cpp:159] Executor @arlz010 [stdout:38211|stderr:38045]                                               
I  1920 2018-05-29 20:26:33 executor.cpp:161] Submit graph to master @127.0.0.1:9910                                                      
I 55040 2018-05-29 20:26:33 executor.cpp:173] Solution received      
[Graph 4]                         
+----+-----+------+-----------+-------------------+                  
|Task|Agent|Status|Elapsed (s)|Memory (peak/limit)|                  
+----+-----+------+-----------+-------------------+                  
Graph finished with 1 error(s): Resource request doesn't fit in cluster

OS: Ubuntu 17.10

@frank-y-liu
Author

Addendum: the master branch works fine.

@tsung-wei-huang
Owner

Yes. Please try the master branch. Note that we have added support for cgroup, so you now need privileges to launch the master and agents. Please follow the instructions here to start the cluster.

It looks like no CPU is configured for the agent, which is why you get the "Resource request doesn't fit in cluster" error. This should be resolved in the master branch.
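To make the diagnosis above easier to repeat, here is a minimal Python sketch that pulls the bracketed resource summary out of a master log line and flags an agent that reports `cpu:0`. The helper name `parse_resources` and the regular expression are illustrative assumptions based only on the log format shown in this thread; they are not part of DtCraft.

```python
import re

# Matches a bracketed resource summary as it appears in the master log above,
# e.g. "[cpu:0|mem:135081242624|disk:17340207104]".
# The pattern is an assumption derived from the log lines in this issue.
RESOURCE_RE = re.compile(r"\[([a-z]+:\d+(?:\|[a-z]+:\d+)*)\]")


def parse_resources(log_line):
    """Return the key:value pairs of the last bracketed group as a dict of ints.

    Hypothetical helper for reading the log format; returns {} if no group is found.
    """
    matches = RESOURCE_RE.findall(log_line)
    if not matches:
        return {}
    fields = matches[-1].split("|")
    return {key: int(value) for key, value in (f.split(":") for f in fields)}


line = ("I  6016 2018-05-29 20:22:42 master.cpp:271] Agent 0 connected "
        "@127.0.0.1 [cpu:0|mem:135081242624|disk:17340207104]")
resources = parse_resources(line)
if resources.get("cpu", 0) == 0:
    print("agent reports cpu:0")
```

Running the same check over every "Agent N connected" line shows which agents registered without a usable CPU value before any graph is submitted.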

@frank-y-liu
Author

Thanks for getting back to me on this. I tried the master branch in the upstream repo but still have the same problem. Error message from dtc-master:

I 14208 2018-05-30 14:13:06 master.cpp:200] Master @127.0.0.1 [agent:9909|graph:9910|webui:9912]                      
I 14208 2018-05-30 14:13:50 master.cpp:271] Agent 0 connected @127.0.0.1 [cpu:0|mem:135081242624|disk:17343295488]    
I 14208 2018-05-30 14:14:24 master.cpp:300] Graph 0 connected @arlz009 [vertex:2|stream:2|container:2]                
W 14208 2018-05-30 14:14:24 master.cpp:303] Graph 0 doesn't fit with available resources                              
I 14208 2018-05-30 14:14:24 master.cpp:142] Graph 0 is removed from the master                 

Any suggestions on how to turn on debug output?

@frank-y-liu
Author

frank-y-liu commented May 30, 2018

Added log message from dtc-agent:

I 46976 2018-05-30 14:13:50 agent.cpp:135] Agent @127.0.0.1 [frontier:9913]                                           
I 46976 2018-05-30 14:13:50 agent.cpp:138] cg-subsys.memory "/sys/fs/cgroup/memory/dtc" [limit:135081242624]          
I 46976 2018-05-30 14:13:50 agent.cpp:139] cg-subsys.cpuset "/sys/fs/cgroup/cpuset/dtc" [cpus:0]                      
I 46976 2018-05-30 14:13:50 agent.cpp:140] cg-subsys.blkio "/sys/fs/cgroup/blkio/dtc" [weight:500]          

Does this mean dtc-agent didn't get any CPUs allocated?

@tsung-wei-huang
Owner

Could you please run cat /sys/fs/cgroup/cpuset/dtc/cpuset.cpus and let me know what you get?
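For reading the result of that command: `cpuset.cpus` uses the kernel's list format, a comma-separated set of CPU ids and ranges such as `0-3,8`. A file containing `0` therefore means one CPU (id 0), while an empty file means no CPUs are assigned to the cgroup. The sketch below expands such a string into CPU ids. The function names are hypothetical, and how DtCraft itself derives the `cpus:0` value in the agent log is not shown in this thread.

```python
def parse_cpu_list(text):
    """Expand a cpuset list string such as "0-3,8" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma contributes no CPUs
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpuset(path="/sys/fs/cgroup/cpuset/dtc/cpuset.cpus"):
    """Read and parse the cgroup file named in this thread (may require privileges)."""
    with open(path) as f:
        return parse_cpu_list(f.read())


print(len(parse_cpu_list("0-3,8")))  # 5 CPUs: 0, 1, 2, 3, 8
print(len(parse_cpu_list("")))       # 0 CPUs: the cpuset is empty
```

If `read_cpuset()` returns an empty list on the agent host, the cpuset was created without any CPUs assigned, which would be consistent with the master rejecting every graph.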

@tsung-wei-huang
Owner

I have fixed a minor bug in the cgroup handling that might have caused this problem. Please update to the latest master branch and try again. Let me know if the problem still exists.
