
Memory usage and Garbage Collector #7

Closed
menardorama opened this issue Jun 1, 2017 · 14 comments

@menardorama

Hi,
after fixing the long historical data issue (even if I wouldn't have done it like that), I am facing another issue.

The process consumes all the memory on the server while waiting for the result from the DB.

It's as if the GC is not working at all.

The result for me is that I can't transfer the data, as it consumes the whole 32 GB of RAM and the OOM killer kills the process.

@zensqlmonitor
Owner

zensqlmonitor commented Jun 1, 2017

I ran different loads during the day and the memory footprint of the process stayed under 2 GB for the 4 tables, with 100k and 200k rows per batch.
It looks like the GC works properly.
Again, please create the indexes to avoid scanning the complete table, and think about separation of concerns: the ETL process should run on a separate server to avoid resource contention.
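As a concrete starting point, here is a minimal sketch of what "create the indexes" could look like, as a one-off Go helper run against the Zabbix database. The connection string and index names are placeholders; the table and column names come from the standard Zabbix history schema.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
	// Placeholder DSN: point it at your Zabbix database.
	db, err := sql.Open("postgres", "postgres://zabbix:secret@localhost/zabbix?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Index the clock column of each history table so per-batch SELECTs
	// read only their time range instead of scanning the whole table.
	for _, tbl := range []string{"history", "history_uint", "history_str", "history_text"} {
		// IF NOT EXISTS requires PostgreSQL >= 9.5 (the thread uses 9.6).
		ddl := "CREATE INDEX IF NOT EXISTS " + tbl + "_clock_idx ON " + tbl + " (clock)"
		if _, err := db.Exec(ddl); err != nil {
			log.Fatalf("%s: %v", tbl, err)
		}
	}
}
```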

@zensqlmonitor
Owner

zensqlmonitor commented Jun 2, 2017

Here are some stats for a 12-hour run:

influxdb-zabbix process

  • RSS: Avg 194 MB - Max 337 MB
  • VSZ: Avg 594 MB - Max 785 MB
  • I/O Reads/Writes: 0
  • Threads: 9

PostgreSQL backend

  • Fetched per sec: Avg 7 K - Max 585 K
  • Blocks I/O Reads: Avg 114 K - Max 43 M
  • Blocks I/O Hits: Avg 1.9 M - Max 56 M
  • Temporary file bytes: Avg 837 MB - Max 18 GB

@menardorama
Author

Hi

Here it is after 3 minutes:
[screenshot: capture d'écran 2017-06-02 à 11:49:51]

@zensqlmonitor
Owner

What's your configuration?

@menardorama
Author

Basically the server has the same specs:

  • 2× Xeon 2.§ GHz
  • 32 GB of RAM
  • PostgreSQL 9.6 with partitioning

Latest version of InfluxDB
CentOS 7

@zensqlmonitor
Owner

zensqlmonitor commented Jun 2, 2017

Which Go version? Try updating to the latest version.
About your config file: how many input rows per batch?
Have you created the indexes?

@menardorama
Author

I am using Go 1.7.4, and the indexes have not been created.

Regarding the config:
inputrowsperbatch=50000
outputrowsperbatch=50000
interval=60

But regarding the indexes: they are just a nice-to-have and should not have any impact on the memory consumption on the client side.

The thing is, I have a 500 GB Zabbix DB (most of the data is in the history table) and I don't want to add more index weight.

Having a limit on the rows to return is a workaround, but not the real solution (from a DBA point of view...) for me.
A moving window based on the clock column would be lighter on the DB side, as the ORDER BY forces it to materialize all the results in memory, or worse, in a temp file.
For one year of history, the load on the DB is just too much.
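To make the suggestion concrete, here is a minimal sketch of the kind of clock-based window query meant here, assuming the standard Zabbix history columns (itemid, clock, value); the function name is illustrative, not the tool's actual code.

```go
package etl

import "database/sql"

// fetchWindow pulls one half-open clock window [start, end[ from history.
// Unlike "WHERE clock > $1 ORDER BY clock LIMIT n", a pure range predicate
// can be answered straight from an index on clock: no sort, no temp file.
// The caller advances the window (start += step) until it reaches "now".
func fetchWindow(db *sql.DB, start, end int64) (*sql.Rows, error) {
	return db.Query(
		`SELECT itemid, clock, value FROM history WHERE clock >= $1 AND clock < $2`,
		start, end)
}
```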

@zensqlmonitor
Owner

@menardorama a moving window based on the number of days is now implemented

@menardorama
Author

Hi

Thanks for your feedback, it's much better now on the DB side.

But there is still something wrong, which I think I've pointed out; I am not good enough in Go to propose a patch.

I'll try to explain my observation.

From what I understand, your app works in these steps:

  • Extract from the Zabbix DB
  • Process all rows retrieved by the SQL query and store them in memory
  • Load them into InfluxDB

My concern is that I have 57 million rows per week, and they do not fit in memory.

Another approach could be to process a batch of rows (at the fetch level) and insert them into InfluxDB as you go.

This would be more scalable than waiting for the full fetch.
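A minimal sketch of that idea, with illustrative names (writeToInflux stands in for whatever HTTP write the tool already does); database/sql already streams rows from the driver, so memory stays bounded by one batch rather than the whole result set:

```go
package etl

import (
	"database/sql"
	"fmt"
)

// streamToInflux scans rows as they arrive from the driver and flushes a
// batch of InfluxDB line-protocol points every batchSize rows, instead of
// buffering the full result set in memory first.
func streamToInflux(db *sql.DB, query string, batchSize int,
	writeToInflux func(lines []string) error) error {

	rows, err := db.Query(query)
	if err != nil {
		return err
	}
	defer rows.Close()

	batch := make([]string, 0, batchSize)
	for rows.Next() {
		var itemid, clock int64
		var value float64
		if err := rows.Scan(&itemid, &clock, &value); err != nil {
			return err
		}
		// One line-protocol point per row (timestamp in seconds,
		// so the write would use precision=s).
		batch = append(batch, fmt.Sprintf("history,itemid=%d value=%g %d", itemid, value, clock))
		if len(batch) == batchSize {
			if err := writeToInflux(batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		if err := writeToInflux(batch); err != nil {
			return err
		}
	}
	return rows.Err()
}
```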

Another idea would be to spool the result to a temp file at the fetch level and pass the filename to the InfluxDB processor.

Once again, I'm sorry, I am not good enough in Go to do it myself.

What do you think?

@zensqlmonitor
Owner

> My concern is that I have 57 million rows per week, and they do not fit in memory.

You can now split the dataset into multiple datasets with the config parameter daysperbatch.
For example, in the configuration file you set:

startdate="2017-01-01T00:00:00"
daysperbatch=15

=> the process will start with data whose timestamp falls in [2017-01-01, 2017-01-16[, then continue in 15-day increments: [2017-01-16, 2017-01-31[, etc.
This way, depending on the number of rows you get per day, you can adjust the batch size.
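For illustration, here is how those half-open windows advance (a standalone sketch, not the tool's code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// startdate and daysperbatch as in the configuration above.
	start, _ := time.Parse("2006-01-02T15:04:05", "2017-01-01T00:00:00")
	daysPerBatch := 15

	for i := 0; i < 3; i++ {
		end := start.AddDate(0, 0, daysPerBatch)
		fmt.Printf("[%s, %s[\n", start.Format("2006-01-02"), end.Format("2006-01-02"))
		start = end
	}
	// Output:
	// [2017-01-01, 2017-01-16[
	// [2017-01-16, 2017-01-31[
	// [2017-01-31, 2017-02-15[
}
```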

Have you tested the latest version?

@menardorama
Author

Yes, my comment was regarding the latest version.

And I can't put more RAM in my server (it already has 32 GB).

Setting daysperbatch=1 helps a bit, but that is already 57 million rows and it consumes all the memory.

@zensqlmonitor
Owner

57 million for 1 day? You just said it was for 1 week. Anyway, that's huge...
I can't do better, and sorry, but I won't spool the result to disk.

@menardorama
Author

menardorama commented Jun 9, 2017 via email

@zensqlmonitor
Owner

zensqlmonitor commented Jun 9, 2017

Let's make it more granular.
I've just committed a moving window based on hours -> new parameter: hours per batch.
@menardorama could you please have a look?
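For the volumes discussed above, the configuration would presumably look something like the lines below (the exact parameter name follows the daysperbatch convention and is an assumption); at ~57 million rows per day, a 4-hour window comes to roughly 9.5 million rows per batch:

startdate="2017-01-01T00:00:00"
hoursperbatch=4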
