
Running out of memory with 12GB available #116

Closed
thoellrich opened this issue Jun 30, 2017 · 25 comments
@thoellrich

Running awless -e sync with 12GB of memory available ends with the process being killed due to an OOM condition. I would have expected 12GB to be plenty of memory for this task.

~/go/src/github.com/wallix/awless$ uname -a
Linux ubuntu 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

~/go/src/github.com/wallix/awless$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

~/go/src/github.com/wallix/awless$ free
             total       used       free     shared    buffers     cached
Mem:      16427688    2612424   13815264        652        260      27940
-/+ buffers/cache:    2584224   13843464
Swap:      1046524    1046524          0

~/go/src/github.com/wallix/awless$ ./awless --version
awless version=v0.1.1

~/go/src/github.com/wallix/awless$ rm -rf ~/.awless

~/go/src/github.com/wallix/awless$ ./awless -e sync
First install. Welcome!

Found existing AWS region 'us-east-1'. Setting it as your default region.
Region updated to 'us-east-1'.
You might want to update your default AMI with `awless config set instance.image $(awless search images amazonlinux --id-only --silent)`
Syncing new region...
[extra]   sync: fetched lambda service took 406.591864ms
[extra]   sync: fetched cloudformation service took 408.829508ms
[extra]   sync: fetched cdn service took 489.701489ms
[extra]   sync: fetched dns service took 895.959112ms
[extra]   sync: fetched messaging service took 1.280953266s
[extra]   sync: fetched infra service took 3.58098968s
Killed

~/go/src/github.com/wallix/awless$ tail /var/log/syslog
Jun 30 13:01:00 localhost kernel: [77434.411864] [21886]     0 21886      376        0       5       16             0 sh
Jun 30 13:01:00 localhost kernel: [77434.411865] [21949]     0 21949  1489483    64503     295    17131             0 java
Jun 30 13:01:00 localhost kernel: [77434.411866] [25321]  1000 25321     7042     1902      20        0             0 bash
Jun 30 13:01:00 localhost kernel: [77434.411867] [27581]  1000 27581     2386       43      11        0             0 less
Jun 30 13:01:00 localhost kernel: [77434.411868] [27596]  1000 27596  9187110  3412059   14185        0             0 awless
Jun 30 13:01:00 localhost kernel: [77434.411869] Out of memory: Kill process 27596 (awless) score 784 or sacrifice child
Jun 30 13:01:00 localhost kernel: [77434.411889] Killed process 27596 (awless) total-vm:36748440kB, anon-rss:13648236kB, file-rss:0kB
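A note for readers parsing the OOM-killer table above: the total_vm and rss columns are counts of 4 kB pages, which lines up exactly with the kB figures in the final "Killed process" line. A quick sketch of the conversion for the awless row:

```go
package main

import "fmt"

func main() {
	const pageKB = 4 // x86_64 uses 4 kB pages

	totalVMPages := 9187110 // total_vm column for pid 27596 (awless)
	rssPages := 3412059     // rss column

	fmt.Printf("total-vm: %d kB (~%.1f GB)\n",
		totalVMPages*pageKB, float64(totalVMPages)*pageKB/(1024*1024))
	fmt.Printf("anon-rss: %d kB (~%.1f GB)\n",
		rssPages*pageKB, float64(rssPages)*pageKB/(1024*1024))
	// total-vm: 36748440 kB (~35.0 GB)
	// anon-rss: 13648236 kB (~13.0 GB)
}
```

So the process had committed roughly 35 GB of virtual memory and was holding ~13 GB resident when the kernel killed it.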
@simcap
Contributor

simcap commented Jun 30, 2017

Thanks for reporting!

Let us see if the size of your infra is causing the OOM. First we are going to sync just the infra:

> rm -rf ~/.awless
> ./awless -e sync --infra

Then we can deactivate the sync of the infra and sync again:

> rm -rf ~/.awless
> ./awless config set aws.infra.sync false
> ./awless -e sync

Can you run that and output the results? Cheers

@thoellrich
Author

Hi Simon - I just ran the steps and the 2nd chunk ran into OOM again. However, I think that's because you had one too many rm -rf ~/.awless in there. If I run the sequence without the 2nd rm then infra syncs fine and !infra also syncs correctly.

Here it is with just --infra using /usr/bin/time:

~/go/src/github.com/wallix/awless$ rm -rf ~/.awless
~/go/src/github.com/wallix/awless$ /usr/bin/time ./awless -e sync --infra
First install. Welcome!
...
[info]    sync took 2.334560243s
20.49user 7.84system 0:19.68elapsed 143%CPU (0avgtext+0avgdata 1456280maxresident)k
0inputs+5016outputs (0major+293772minor)pagefaults 0swaps
~/go/src/github.com/wallix/awless$

I think we can close this now, as there is a work-around.

@simcap
Contributor

simcap commented Jun 30, 2017

Thanks. Indeed my steps were not clear enough; I did want to run 2 distinct scenarios:

  • the first one was to see if syncing just the infra was ok.
  • the second one was to see if syncing everything except the infra was ok.

Also I remember now that the first install always triggers a new sync without any log (it just says Syncing new region), which does not help us investigate.

So are you saying that your last 2 commands rm -rf ~/.awless, /usr/bin/time ./awless -e sync --infra did not trigger an OOM (so it seems from your output, at least)?

Also, what do you consider a work-around? It seems to me that awless might still have an intermittent issue when syncing your infra, and something is taking too much memory.

Would you mind sharing in -e extra verbose mode the entities count in your infra? This would be a line at the end of a sync looking like:

[info]    -> infra: 6 snapshots, 2 dbsubnetgroups, 0 database, 1 launchconfiguration, 0 scalingpolicy, 0 containercluster, 3 vpcs, 4 images, 0 loadbalancer, 2 availabilityzones, 7 keypairs, 0 volume, 5 routetables, 0 containertask, 4 internetgateways, 0 importimagetask, 0 scalinggroup, 0 instance, 0 container, 0 containerinstance, 7 securitygroups, 0 targetgroup, 0 listener, 0 repository, 7 subnets, 0 natgateway, 1 elasticip

@thoellrich
Author

Thanks for taking the time to look in more detail into this.

I believe I understood what the intention of your previous 2 commands was: sync just infra, then sync just non-infra. That's why I removed the rm -rf between those two steps during my last test (after realizing that ./awless config set aws.infra.sync false would do a full sync if ~/.awless was missing).

I considered it a work-around, b/c I was able to do at least one error-free sync using the 2 steps.

Now that I try to get a clean run with ~/.awless removed, it always seems to fail:

~/go/src/github.com/wallix/awless$ rm -rf ~/.awless && free && /usr/bin/time --verbose ./awless -e sync --infra
             total       used       free     shared    buffers     cached
Mem:      16427688     737560   15690128          4        424      11976
-/+ buffers/cache:     725160   15702528
Swap:      1046524     316260     730264
First install. Welcome!

Found existing AWS region 'us-east-1'. Setting it as your default region.
Region updated to 'us-east-1'.
You might want to update your default AMI with `awless config set instance.image $(awless search images amazonlinux --id-only --silent)`
Syncing new region...
[extra]   sync: fetched cdn service took 427.295469ms
[extra]   sync: fetched lambda service took 437.358012ms
[extra]   sync: fetched cloudformation service took 485.616298ms
[extra]   sync: fetched dns service took 852.88484ms
[extra]   sync: fetched messaging service took 892.474792ms
Command terminated by signal 9
        Command being timed: "./awless -e sync --infra"
        User time (seconds): 30.61
        System time (seconds): 31.14
        Percent of CPU this job got: 295%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.88
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 15719488
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 4200
        Minor (reclaiming a frame) page faults: 874526
        Voluntary context switches: 150976
        Involuntary context switches: 11412
        Swaps: 0
        File system inputs: 291152
        File system outputs: 88
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

If I repeat the same sync command without removing ~/.awless, it seems to succeed and shows you the entity count - this time also with much more reasonable memory consumption:

~/go/src/github.com/wallix/awless$ free && /usr/bin/time --verbose ./awless -e sync --infra
             total       used       free     shared    buffers     cached
Mem:      16427688     534404   15893284          4        236      10928
-/+ buffers/cache:     523240   15904448
Swap:      1046524     534320     512204
[verbose] loading AWS session with profile 'default' and region 'us-east-1'
[info]    running sync: fetching remote resources for local store
[extra]   sync: fetched infra service took 3.22465402s
[info]    -> infra: 118 securitygroups, 19 routetables, 6 availabilityzones, 18 images, 0 listener, 15 containertasks, 51 snapshots, 27 subnets, 7 databases, 126 volumes, 4 dbsubnetgroups, 0 natgateway, 0 importimagetask, 10 elasticips, 6 internetgateways, 0 loadbalancer, 0 targetgroup, 2 containerclusters, 2 launchconfigurations, 2 scalinggroups, 27 repositories, 72 instances, 7 vpcs, 27 keypairs, 0 scalingpolicy, 10 containers, 1 containerinstance
[info]    sync took 3.296068554s
        Command being timed: "./awless -e sync --infra"
        User time (seconds): 9.21
        System time (seconds): 6.12
        Percent of CPU this job got: 353%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.33
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1192892
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 183
        Minor (reclaiming a frame) page faults: 267455
        Voluntary context switches: 12518
        Involuntary context switches: 891
        Swaps: 0
        File system inputs: 42464
        File system outputs: 1640
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I'm positive that during the previous attempts I ran rm -rf ~/.awless && ./awless -e sync --infra without an OOM error, but now I can't seem to make it happen ...

@simcap
Contributor

simcap commented Jun 30, 2017

Yeap. In my opinion this is definitely due to the size/amount of what is fetched:

  • for a full sync (syncing all services) it fails when the infra comes in (just after messaging)
  • for an infra sync alone it is ok

So adding the infra (which, as your output shows, is not of negligible size) on top of the previously synced services makes it blow up.

For some services there are actually a lot of parallel calls. For instance, retrieving containertasks properly (not the AWS way ;) ) is quite greedy. The ecs service sync should actually be deactivated by default in the config (as the s3objects sync is, for instance), but we have not pushed that to the code yet.

Anyway, we will have a look at how to mitigate and improve that in the coming weeks. If you have any ideas to share on this issue or more comments on the CLI do not hesitate.

I will keep this open until we have a kind of resolution.

Thanks.

@simcap
Contributor

simcap commented Jul 13, 2017

@thoellrich I have been running some memory benchmarks locally. So far the greediest part is actually fetching & resolving the access info (basically IAM users, groups, roles, ...).

In your case, I notice that syncing the access takes the longest, as it never displays the time it took (when in verbose mode). I am curious what happens if you only sync the access with the following:

/usr/bin/time --verbose ./awless -e sync --access

(it does not have to be a first install)

Would you mind outputting the result of the command? Cheers.

@thoellrich
Author

Here you go:

~/go/src/github.com/wallix/awless$ git pull && go build && rm -rf ~/.awless && ./awless --version && /usr/bin/time --verbose ./awless -e sync --access
awless version=v0.1.2
Welcome to awless! Resolving environment data...

Found existing AWS region 'us-east-1'. Setting it as your default region.
[verbose] loading AWS session with profile 'default' and region 'us-east-1'
[extra]   no valid cached credentials, getting new credentials
[info]    Syncing new region 'us-east-1'
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched cloudformation service took 407.658411ms
[extra]   sync: fetched cdn service took 483.8939ms
[extra]   sync: fetched lambda service took 542.587173ms
[extra]   sync: fetched messaging service took 872.469067ms
[extra]   sync: fetched dns service took 2.311372997s
[extra]   sync: fetched infra service took 10.040019966s
[extra]   sync: fetched storage service took 10.320055411s
[extra]   sync: fetched access service took 14.548521254s

All done. Enjoy!
You can review and configure awless with `awless config`

Now running: `awless sync`
[info]    running sync for region 'us-east-1'
[extra]   sync: fetched access service took 7.302027021s
[info]    -> access: 38 policies, 12 groups, 81 roles, 71 users, 1 accesskey, 67 instanceprofiles
[info]    sync took 7.327083553s
        Command being timed: "./awless -e sync --access"
        User time (seconds): 11.77
        System time (seconds): 35.27
        Percent of CPU this job got: 205%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.88
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1687772
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 351369
        Voluntary context switches: 32561
        Involuntary context switches: 4237
        Swaps: 0
        File system inputs: 0
        File system outputs: 3984
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
~/go/src/github.com/wallix/awless$

@simcap
Contributor

simcap commented Jul 17, 2017

Thanks a lot @thoellrich. So here a full sync was done (due to first install), followed by the command to only sync access.

The funny thing is that you did not have the OOM issue!

I have also just fixed the fact that on first install we did not inject a proper logger into the Sync, hence it did not properly log all info. For instance, in your case we should have seen [verbose] sync: *disabled* for service monitoring.

Anyway, roughly we can see that each of your services takes a minimum of 1GB of memory, which is a lot, and we will have to figure out how to improve that.

Also, a sync in your case takes more than 10 seconds. I am wondering if that makes one-liner creation and the awless {run,show,revert} commands impractical in your case, as those commands run an underlying sync (either pre or post command). For instance, after running a one-liner like awless create instance ..., you might have to wait 10 seconds for the command to return.

As we try to focus awless on broad practical usage, it would be great if you could have a look at this https://goo.gl/forms/1lQFPEIxdt37aDn43

Cheers.

@gauravarora

I'm on a mac with 8GB RAM and I can see that's clearly not enough. kernel_task takes almost double the CPU of awless when running sync presumably because of swapping. I've tried running sync --infra but that still syncs everything. Going to open a separate bug for that.

@simcap
Contributor

simcap commented Jul 21, 2017

@gauravarora Thanks for reporting the issue.

The sync time is obviously proportional to the size of each service's resources to fetch and resolve. Given the sizes you have:

...
[extra]   sync: fetched storage service took 3m31.735965746s
[extra]   sync: fetched access service took 10m12.25207038s
...

... you are indeed running into a limitation of local resources. And let me be clear, this is not acceptable from our point of view. Your infra should be considered a normal infra (tending towards large, maybe) and awless should not have trouble working with it.

Also, the sync (per service) is run at different times when using awless (see doc). So 3 min or 10 min to resolve a service is definitely a no-no.

With awless we have so far spent time getting the foundations/paradigm out there. We have a lot more to come, and tailoring awless to deal better with all kinds of infrastructure sizes is part of that.

PS: Thanks for opening a ticket for the separate issue of first install + syncing one service only. It might indeed help, but as you guessed it will not be a solution, especially since the first install is tailored for newcomers who would not know there is a flag to sync per service.

Note: As said in the Getting Started doc, you can disable autosync with awless config set autosync false

simcap added a commit that referenced this issue Jul 28, 2017
snapshot instead of graph to avoid numerous in loop
call to triplestore.Source.Snapshot() (call that
pre allocates memory)

Tackle part of issue #116

Overall rough improvements:
- CPU: 50% better on fetching mainly Access resources
- Mem alloc_object: from ~70% to ~40% of cumulated alloc_objects
      for Snapshot() method calls
@simcap
Contributor

simcap commented Jul 28, 2017

I have reduced the usage of memory during the sync (see commit 417a0a9).

This is only available on master for now (go get -u github.com/wallix/awless) but will be in 0.1.2 when released.

@thoellrich @gauravarora More importantly, the flag --profile-sync has been added to the awless sync command. So by running:

awless sync --profile-sync

# or for a specific service

awless sync --profile-sync --infra

... it will dump the Go profiling files mem-sync.prof and cpu-sync.prof for later inspection.

Then to inspect memory, enter the interactive pprof and type the web command like so:

$ go tool pprof -alloc_space mem-sync.prof
Entering interactive mode (type "help" for commands)
(pprof) web

The svg file can be viewed or sent to us to find the culprits ;)

Cheers!

@simcap
Contributor

simcap commented Jul 28, 2017

Overall, independently of local or small improvements, the main issue is that we hold all the fetched cloud resources in memory. They are represented as RDF triples and indexed in a map before being flushed and written to their corresponding service's local file (under ~/.awless/aws/rdf).

My take is that in order NOT to use that amount of memory, we will have to stream (i.e. channel) the triples from creation down to being written to their respective files. That would avoid holding them all in memory, while still closing the file when all triples for a service are done.

Obviously, the interesting challenge is that cloud resources are held in memory to reconcile them amongst themselves and build up their relations.

simcap added a commit that referenced this issue Aug 4, 2017
(issue #116)

Running with /usr/bin/time --verbose we get a decrease
of 2.5 in Maximum Resident Size memory.

Profiling give us a decrease in alloc_space from ~90%
to below 50%. See profiling results:

NEW
Showing top 10 nodes out of 292 (cum >= 2.51MB)
      flat  flat%   sum%        cum   cum%
   20.87MB 20.87% 20.87%    20.87MB 20.87%  runtime.makemap
   12.50MB 12.51% 33.38%    48.87MB 48.88%  github.com/wallix/awless/vendor/github.com/wallix/triplestore.(*source).Snapshot
   11.50MB 11.50% 44.88%    11.50MB 11.50%  runtime.rawstringtmp

OLD
Showing top 10 nodes out of 80 (cum >= 2MB)
      flat  flat%   sum%        cum   cum%
  165.34MB 46.51% 46.51%   165.34MB 46.51%  runtime.makemap
   80.51MB 22.65% 69.16%    80.51MB 22.65%  runtime.rawstringtmp
   55.54MB 15.62% 84.78%   323.87MB 91.10%  github.com/wallix/awless/vendor/github.com/wallix/triplestore.(*source).Snapshot
@simcap
Contributor

simcap commented Aug 4, 2017

To see the improvements from the commit above (8bc14ce) and see what still takes memory with a big infra, one can run:

$ go get -u github.com/wallix/awless  # (get latest master locally)
$ awless sync --profile-sync
$ go tool pprof -alloc_space mem-sync.prof
Entering interactive mode (type "help" for commands)
(pprof) web

@deinspanjer
Contributor

I was getting the OOM problem on my MBP with 16GB of memory.
I just tried running with the latest head and sync completed quickly!
I did run with the profiling, and if it is of use to you, I can share the data it collected. Here is the output of a fresh sync -e:

➜  aws git:(master) ✗ awless -e sync
[verbose] loading AWS session with profile 'default' and region 'us-east-1'
[extra]   no valid cached credentials, getting new credentials
[info]    running sync for region 'us-east-1'
[verbose] sync: *disabled* for service monitoring
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched cdn service took 273.726287ms
[extra]   sync: fetched cloudformation service took 303.912209ms
[extra]   sync: fetched messaging service took 378.269565ms
[extra]   sync: fetched access service took 394.650609ms
[extra]   sync: fetched dns service took 511.214863ms
[extra]   sync: fetched lambda service took 555.809695ms
[extra]   sync: fetched storage service took 3.558290217s
[extra]   sync: fetched infra service took 5.960590517s
[info]    -> lambda: 3 functions
[info]    -> storage: 0 s3object, 126 buckets
[info]    -> infra: 17 repositories, 5 keypairs, 3 routetables, 62 images, 4865 snapshots, 0 loadbalancer, 1 launchconfiguration, 127 volumes, 3 internetgateways, 0 importimagetask, 0 targetgroup, 3 vpcs, 1 scalinggroup, 0 scalingpolicy, 0 containertask, 1 database, 65 instances, 0 listener, 6 availabilityzones, 5 dbsubnetgroups, 0 containercluster, 0 container, 38 elasticips, 0 containerinstance, 11 subnets, 40 securitygroups, 0 natgateway
[info]    -> cdn: 1 distribution
[info]    -> cloudformation: 3 stacks
[info]    -> messaging: 2 queues, 8 topics, 2 subscriptions
[info]    -> access: 2 instanceprofiles, 23 policies, 17 roles, 2 accesskeys, 18 users, 7 groups
[info]    -> dns: 3 zones, 255 records
[info]    sync took 7.207120851s

@simcap
Contributor

simcap commented Aug 4, 2017

@deinspanjer Thanks for running the latest head and reporting your findings! That is very useful.

So you had the OOM issue on your 16GB MBP. All 16GB used up! Yikes!

Indeed, if you do not mind, the profiling *.prof files would be useful (procedure explained above, as you might know). I am wondering if we could try out the new Mozilla Send service to exchange the files; you would drop the URL in this issue. Also, if you do not mind, you can add to the issue the output of: /usr/bin/time --verbose awless sync -e

Many thanks!

@deinspanjer
Contributor

We can try the Send service; you'd just better hope you are the first one to try downloading the file. :)
https://send.firefox.com/download/87978b4cf9/#kZU_-jw2kjsbuyww1Q1VLw
The zip contains the text output of the -e run, which has some timing information in it. I'm sorry I didn't run it with time --verbose, but if I get a chance I'll try to do so soon.

simcap added a commit that referenced this issue Aug 7, 2017
Avoid snapshotting datastore for querying when not
necessary when building relations

(issue #116)

* Go profiling:
- making Snapshotting not on top 5 CPU anymore (now
below top25)
- making Snaphotting not on top 5 Mem anymore (now
below top30)
@simcap
Contributor

simcap commented Aug 8, 2017

@deinspanjer I got them OK through the Send service. But the prof files do not contain any metrics. After unzipping I got:

$ ll -h *{prof,txt}
-rw-r--r-- 1 simon simon 6,6K août   4 13:00 cpu-sync.prof
-rw-r--r-- 1 simon simon 6,7K août   4 13:00 mem-sync.prof
-rw-r--r-- 1 simon simon 1,1K août   4 13:01 sync-profile-stdout.txt

Doing a go tool pprof on the files, I get empty metrics. For the sync-profile-stdout.txt I think there was some confusion: I was talking about running the command preceded with /usr/bin/time --verbose.

... anyway, let's leave the prof files aside.

To validate that we have an improvement and a fix for you, simply run:

/usr/bin/time --verbose `which awless` sync -e

... and output the result here. We will basically check that, instead of using 8GB of RAM, we have a lower Maximum Resident Set Size.

(you can even pull the latest master before doing that to get the latest improvement, see commit above)

Thanks.

@deinspanjer
Contributor

Sorry, it seems the BSD (macOS) version of time doesn't support the verbose flag.

Here are the results of the -e run with the latest head:

➜  awless git:(master) /usr/bin/time awless --aws-profile default sync -e
[verbose] loading AWS session with profile 'default' and region 'us-east-1'
[extra]   no valid cached credentials, getting new credentials
[info]    running sync for region 'us-east-1'
[verbose] sync: *disabled* for service monitoring
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched lambda service took 244.386895ms
[extra]   sync: fetched cdn service took 247.503701ms
[extra]   sync: fetched cloudformation service took 283.385906ms
[extra]   sync: fetched messaging service took 411.939093ms
[extra]   sync: fetched dns service took 705.141725ms
[extra]   sync: fetched storage service took 1.314146758s
[extra]   sync: fetched infra service took 2.506289269s
[extra]   sync: fetched access service took 2.742180083s
[info]    -> access: 18 users, 7 groups, 23 policies, 2 accesskeys, 2 instanceprofiles, 17 roles
[info]    -> lambda: 3 functions
[info]    -> cdn: 1 distribution
[info]    -> cloudformation: 3 stacks
[info]    -> messaging: 8 topics, 2 subscriptions, 2 queues
[info]    -> dns: 255 records, 3 zones
[info]    -> storage: 0 s3object, 126 buckets
[info]    -> infra: 0 loadbalancer, 0 listener, 0 natgateway, 3 routetables, 62 images, 17 repositories, 0 containercluster, 0 container, 0 containertask, 5 keypairs, 0 importimagetask, 0 scalingpolicy, 0 targetgroup, 1 database, 1 scalinggroup, 65 instances, 11 subnets, 6 availabilityzones, 1 launchconfiguration, 40 securitygroups, 127 volumes, 38 elasticips, 3 internetgateways, 4573 snapshots, 3 vpcs, 5 dbsubnetgroups, 0 containerinstance
[info]    sync took 4.113720758s
        5.01 real         5.15 user         1.33 sys

@simcap
Contributor

simcap commented Aug 8, 2017

@deinspanjer Ok, too bad. I am on macOS 10.12.1 and I do have the --verbose flag for /usr/bin/time.

Anyway, to close the issue and validate a reduction in memory consumption (even though I know there is an important one), I wanted demonstrative figures. In your case it seems the *.prof files and /usr/bin/time --verbose are not helping, but at least we know awless is now syncing well for you.

I will re-ping the other users to have them re-sync with the new version and make sure it is working for them as well.

Thanks.

@deinspanjer
Contributor

Ugh, I don't know what is up with my time. Maybe I overrode it with a brew coreutils version or something?

Sorry my run isn't the demonstrative figure you were looking for, but I am certainly happy that it went from not working at all (failing after almost a minute) to being nice and snappy in just a few seconds. :)

@cmcconnell1

cmcconnell1 commented Aug 9, 2017

Somehow my ~/.awless data also seems to have gotten corrupted at some point.
I tried backing out to the last stable version, etc., to no avail.

Before purging the ~/.awless dir, I could never get past the infra sync stage (CPU steadily increasing way out of control on the way to crashing the system); after deleting the dir and its contents, awless is happy and functional again.

uname -a
Darwin foo 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun  4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64

pre purge:

awless -e sync
[verbose] loading AWS session with profile '<nil>' and region 'us-west-1'
[info]    running sync: fetching remote resources for local store
[extra]   sync: fetched lambda service took 657.955305ms
[extra]   sync: fetched messaging service took 730.175856ms
[extra]   sync: fetched cloudformation service took 789.867006ms
[extra]   sync: fetched cdn service took 2.127735293s
[extra]   sync: fetched dns service took 5.06643227s
[extra]   sync: fetched infra service took 7.235963111s
^C^C^C^C

post purge:

rm ~/.awless
time  awless -e sync --access
[verbose] loading AWS session with profile 'default' and region 'us-west-1'
[info]    running sync: fetching remote resources for local store
[extra]   sync: fetched access service took 4.102510104s
[info]    -> access: 14 groups, 28 roles, 1 accesskey, 35 users, 39 policies
[info]    sync took 4.11772139s

real	0m4.155s
user	0m0.357s
sys	0m0.057s
time awless -e sync
[verbose] loading AWS session with profile 'default' and region 'us-west-1'
[info]    running sync: fetching remote resources for local store
[verbose] sync: *disabled* for service monitoring
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched cloudformation service took 321.981286ms
[extra]   sync: fetched cdn service took 540.884345ms
[extra]   sync: fetched lambda service took 695.038356ms
[extra]   sync: fetched messaging service took 711.331443ms
[extra]   sync: fetched access service took 976.478333ms
[extra]   sync: fetched dns service took 5.954210667s
[extra]   sync: fetched infra service took 7.327878909s
[extra]   sync: fetched storage service took 7.671635252s
[info]    -> access: 39 policies, 1 accesskey, 28 roles, 35 users, 14 groups
[info]    -> dns: 387 records, 25 zones
[info]    -> infra: 1 internetgateway, 9 elasticips, 5 targetgroups, 0 containercluster, 13 subnets, 4 loadbalancers, 3 databases, 0 container, 0 importimagetask, 1 dbsubnetgroup, 10 repositories, 2 availabilityzones, 1 vpc, 2 scalingpolicies, 80 instances, 0 natgateway, 7 listeners, 62 securitygroups, 40 images, 557 snapshots, 11 scalinggroups, 0 containerinstance, 12 routetables, 128 volumes, 11 launchconfigurations, 0 containertask, 6 keypairs
[info]    -> storage: 24 buckets, 0 s3object
[info]    -> cloudformation: 3 stacks
[info]    -> cdn: 0 distribution
[info]    -> lambda: 0 function
[info]    -> messaging: 8 topics, 3 subscriptions, 1 queue
[info]    sync took 7.833538888s

real	0m8.437s
user	0m40.595s
sys	0m3.936s

Thanks!

@simcap
Contributor

simcap commented Aug 9, 2017

@cmcconnell1 I notice on pre-purge that your profile was <nil>, meaning something went wrong when loading the AWS env data.

Anyway happy it is working now.

Side note on awless dirs removal if needed:

  • rm ~/.awless removes all cloud data and the awless config. awless will go through a first install again on the next run.
  • rm ~/.awless/awless.db removes the awless config only. awless will go through a first install again on the next run.
  • rm ~/.awless/aws/rdf removes cloud data only.

@thoellrich
Author

And one more from OP. Great progress guys! Thanks! Should we close it?

~/go/src/github.com/wallix/awless$ rm -rf ~/.awless && go version && git show-ref HEAD && ./awless --version && free && /usr/bin/time --verbose ./awless -e sync
go version go1.8.1 linux/amd64
7ed6ea57ceb7f2287b847af4aeda9273ffb1554b refs/remotes/origin/HEAD
awless version=v0.1.2
             total       used       free     shared    buffers     cached
Mem:      16427688   11711456    4716232       1664     544896    8290932
-/+ buffers/cache:    2875628   13552060
Swap:      1046524          0    1046524
Welcome to awless! Resolving environment data...

Found existing AWS region 'us-east-1'. Setting it as your default region.
[verbose] loading AWS session with profile 'default' and region 'us-east-1'
[extra]   no valid cached credentials, getting new credentials
[info]    Syncing new region 'us-east-1'
[verbose] sync: *disabled* for service monitoring
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched lambda service took 441.862818ms
[extra]   sync: fetched cloudformation service took 460.366438ms
[extra]   sync: fetched dns service took 808.839415ms
[extra]   sync: fetched messaging service took 1.036584999s
[extra]   sync: fetched infra service took 1.040123339s
[extra]   sync: fetched cdn service took 5.394647883s
[extra]   sync: fetched access service took 11.118848225s
[extra]   sync: fetched storage service took 11.584488014s

All done. Enjoy!
You can review and configure awless with `awless config`

Now running: `awless sync`
[info]    running sync for region 'us-east-1'
[verbose] sync: *disabled* for service monitoring
[verbose] sync: *disabled* for resource storage[s3object]
[extra]   sync: fetched cdn service took 97.301351ms
[extra]   sync: fetched lambda service took 327.449264ms
[extra]   sync: fetched cloudformation service took 418.94218ms
[extra]   sync: fetched messaging service took 840.053734ms
[extra]   sync: fetched dns service took 939.850119ms
[extra]   sync: fetched infra service took 5.463647987s
[extra]   sync: fetched access service took 7.347877532s
[extra]   sync: fetched storage service took 12.277901076s
[info]    [I removed the counts - Tobias]
[info]    sync took 12.481333316s
        Command being timed: "./awless -e sync"
        User time (seconds): 1.69
        System time (seconds): 1.66
        Percent of CPU this job got: 13%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:25.12
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 102008
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 28051
        Voluntary context switches: 20621
        Involuntary context switches: 1217
        Swaps: 0
        File system inputs: 0
        File system outputs: 6392
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@simcap
Contributor

simcap commented Aug 9, 2017

Thanks @thoellrich .

If, after those fixes, you now find awless usable in your daily tasks, we will indeed close this issue, since you had the most serious OOM.

@thoellrich
Author

No need to keep it open, because I no longer see the OOM. If I find other stuff I'll open another issue.
