Question about retention/data overwrite - no issue #4

Closed
HSkul opened this issue Nov 30, 2022 · 25 comments

@HSkul

HSkul commented Nov 30, 2022

First of all, thanks a lot for this code. I have been using this on a Raspberry Pi with InfluxDB and Grafana and it has been working really well (I copied your graphs, really liked the way you presented the data). I'm in the process of moving the setup to a home-built NAS running OpenMediaVault and putting InfluxDB and Grafana into Docker containers, so I want to make sure I do this properly for the long run.

On the RPi I have been running ForecastMetrics as a cron job every hour. It was just something I figured would work. But now that I'm reading up on this a little more I'm confused as to how much data will accumulate in InfluxDB since it doesn't look like old data is being overwritten and there is no retention policy (I'm using 1 year in InfluxDB). Originally when I was setting this up it didn't look like unnecessary data was being accumulated but now I'm not sure. What is your suggestion on how to populate an InfluxDB database?
If the answer is just a cron job I may see if it makes sense to put it inside a container (not sure why, just to see if I can do it as I'm learning to use containers). Any obvious issues with that?

Thanks

H

@tedpearson
Owner

tedpearson commented Dec 1, 2022

Hi @HSkul! I'm so happy that you're making use of this project. I'm still using it as my primary forecasting tool, and plan to maintain it indefinitely.

As for the best way to run this, I use an hourly systemd timer, which is pretty similar to a cron job, just with better logging via journalctl and easier disabling/enabling.

You ask how data will accumulate in InfluxDB. Well, you are right to ask. You will accumulate a new series every hour: 24 per day, or 8760 series per year. In InfluxDB, high series cardinality can be an issue, especially for memory usage. If it would be useful to you, I could look into adding retention policy support back in as an optional flag. (I could also look into an optional flag to overwrite data on a single time series if that would be interesting.)
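
To put rough numbers on that, here is a small Python sketch of the arithmetic. It is only an illustration: `series_created` and its `series_per_run` parameter are made-up names, and the real count depends on how many locations and forecast sources are configured.

```python
# Rough estimate of how many distinct forecast series pile up when every
# hourly run writes a new series instead of overwriting the previous one.
RUNS_PER_DAY = 24      # one run per hour, as described above
DAYS_PER_YEAR = 365


def series_created(days: int, series_per_run: int = 1, overwrite: bool = False) -> int:
    """Return the number of distinct series after `days` of hourly runs.

    `series_per_run` is an assumption for illustration only.
    """
    if overwrite:
        # Each run rewrites the same series, so the count never grows.
        return series_per_run
    return days * RUNS_PER_DAY * series_per_run


if __name__ == "__main__":
    print(series_created(1))                              # 24
    print(series_created(DAYS_PER_YEAR))                  # 8760
    print(series_created(DAYS_PER_YEAR, overwrite=True))  # 1
```

With a retention period in place, the growing case levels off at whatever fits inside the retention window rather than growing forever.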

Insert plug here for VictoriaMetrics, a drop-in replacement for InfluxDB or Prometheus. It's storing more time series in smaller space with less memory and cpu usage. I switched about 1 year ago after InfluxDB corrupted its database and crashed on every startup on my 4GB Pi 4.

Let me know your thoughts and we can continue this conversation.

Ted

@tedpearson
Owner

Related: just added my VictoriaMetrics dashboard definition to the repo, since I noticed I only had the Influx version there.

@HSkul
Author

HSkul commented Dec 1, 2022

Ted,
Fantastic. It would be great if either data overwrite or retention could be implemented. I always like to have things 'clean'. But only if it isn't too much work.
The comment on VictoriaMetrics and InfluxDB is very helpful. I was wondering why you switched. Based on a quick search it looks like InfluxDB has a propensity for corruption (not sure if it is significant or not). I'm assuming writing to VictoriaMetrics is identical to InfluxDB? I have scripts in HomeKit writing various values (temperature, humidity, radon, motion) into InfluxDB and I'm planning an expansion. But now would be a good time to switch to VictoriaMetrics since I'm moving the database (and Grafana) to an (over-specified) OpenMediaVault NAS.

Thanks,

Hjalti

@tedpearson
Owner

I'll have a look then, I've created a new issue to track it.

You indeed can write identical influx line protocol to VictoriaMetrics, plus it also can scrape prometheus endpoints, and more.
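
For scripts that already write to InfluxDB, a minimal sketch of what that looks like, assuming float fields only and VictoriaMetrics listening on its default port 8428. The helper names (`to_line_protocol`, `write_lines`) and the example measurement are hypothetical. `/write` is the Influx 1.x-style endpoint; InfluxDB 1.x needs the `db` query parameter, while VictoriaMetrics accepts the request with or without it.

```python
from typing import Iterable, Mapping, Optional
from urllib.parse import urlencode
import urllib.request


def _escape(value: str, special: str) -> str:
    """Backslash-escape characters that line protocol treats as separators."""
    for ch in special:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement: str, tags: Mapping[str, str],
                     fields: Mapping[str, float], timestamp_ns: int) -> str:
    """Format one point as Influx line protocol (float fields only)."""
    name = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(str(v), ',= ')}"
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ',= ')}={float(v)}" for k, v in sorted(fields.items())
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


def write_lines(base_url: str, lines: Iterable[str], db: Optional[str] = None) -> int:
    """POST line protocol to `<base_url>/write` and return the HTTP status code."""
    query = "?" + urlencode({"db": db}) if db else ""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/write{query}",
        data="\n".join(lines).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    line = to_line_protocol(
        "temperature", {"room": "living room"}, {"value": 21.5},
        1670000000000000000,
    )
    print(line)
    # Same payload, different target (hosts here are placeholders):
    #   write_lines("http://localhost:8086", [line], db="mydb")  # InfluxDB 1.x
    #   write_lines("http://localhost:8428", [line])             # VictoriaMetrics
```

The point of the sketch is that only the base URL changes between the two databases; the payload stays the same.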

@tedpearson
Owner

Looks like in InfluxDB 2.x, each "bucket" has its own retention period: https://docs.influxdata.com/influxdb/v2.5/reference/internals/data-retention/#bucket-retention-period

So I don't think there's any need to add retention policy support, since you should be able to configure whatever bucket retention needed already.

I'll look into overwriting next.

tedpearson added a commit that referenced this issue Dec 2, 2022
@tedpearson
Owner

I've added a new config option, overwrite_data, which when set to true will again result in a single metric for forecast data, with new forecasts overwriting the old.

I'm going to also upgrade the influx client to 2.x before releasing a new version.

@tedpearson
Owner

You know, I should point out one thing about VictoriaMetrics: by default it only supports metrics up to 2 days into the future. I actually maintain my own fork with that limit raised to 30 days.

Issue: VictoriaMetrics/VictoriaMetrics#827 (comment)
My fork: https://github.com/tedpearson/VictoriaMetrics

@HSkul
Author

HSkul commented Dec 3, 2022

Thanks a lot for all the information. This is really helpful. I'm actually running InfluxDB 1.8 at the moment, as the authentication seemed to be simpler (I can avoid it). I have InfluxDB 1.8 and Grafana running in containers and I'm working on figuring out how to run ForecastMetrics as a cron job in a container as well (which port does it use?). I think what I'm going to do is try to get this going so I can free up my Raspberry Pi 2 for other uses (wall-based Google calendar with Grafana graphs; my RPi 1b is struggling there). Then later I can rather easily get a separate container running with either InfluxDB 2 or VictoriaMetrics and swap by switching external ports.

Thanks again

@tedpearson
Owner

Good news is that even when using the Influx 2.x client, authentication still works with Influx 1.8 by setting the "authentication token" to "username:password", leaving the organization blank, and setting the bucket to "database[/retention-policy]". https://github.com/influxdata/influxdb-client-go#influxdb-18-api-compatibility
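
As a sketch of that mapping (Python, purely illustrative; `v1_compat_settings` is a hypothetical helper, not part of ForecastMetrics or of any Influx client), the three 2.x-style values can be derived from the usual 1.8 credentials like this:

```python
from typing import Dict, Optional


def v1_compat_settings(username: str, password: str, database: str,
                       retention_policy: Optional[str] = None) -> Dict[str, str]:
    """Translate InfluxDB 1.8 credentials into the values a 2.x client expects.

    token  -> "username:password"
    org    -> left blank
    bucket -> "database" or "database/retention-policy"
    """
    bucket = f"{database}/{retention_policy}" if retention_policy else database
    return {
        "auth_token": f"{username}:{password}",
        "org": "",
        "bucket": bucket,
    }


if __name__ == "__main__":
    # Placeholder credentials and database name, for illustration only.
    print(v1_compat_settings("user", "pass", "weather"))
    print(v1_compat_settings("user", "pass", "weather", "one_year"))
```

Leaving the retention policy off means the database's default policy is used.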

ForecastMetrics doesn't have any incoming connections - it makes http GET requests to NWS/Visualcrossing if enabled, and then makes http POST requests to influx/VM.

Sounds like you have interesting projects! Enjoy :)

@tedpearson
Owner

One other comment about using VictoriaMetrics: with the upgrade to the influx 2.x client, it will require using vmauth (a simple auth proxy), because VM doesn't support the "Authorization: Token ..." header: VictoriaMetrics/VictoriaMetrics#1897

@tedpearson
Owner

Just released v3.2.0 with two new features:

  • overwrite_data: true option available in the config yaml. This will only write one forecast series instead of one per hour, and overwrite the previous forecast data each time.
  • Influx v2 client. The ForecastMetrics config has changed to match the new client:
    • org: "" - don't set this for influx 1.8
    • auth_token: user:pass - set it like this for influx 1.8
    • bucket: database[/retention-policy] - set to your db name if you want to just use the default autogen policy

Let me know if this release helps or you have any trouble.

@HSkul
Author

HSkul commented Dec 3, 2022

This is great. Let me take it for a spin. I got my container running with v3.1.2 but the data is looking a bit different from the data that is being pulled by the RPi (at least what is stored in the InfluxDB container). I'll try v3.2.0 and see if there is any change with the overwrite. I'll keep you posted.

Thanks again.

@tedpearson
Owner

I forgot to ask - do you happen to know what version you were using on the RPi? There's no embedded version or anything, so you'd have to remember what you downloaded or compare the bytes, sorry.
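
One way to "compare the bytes" is to fingerprint the installed binary and a freshly downloaded release binary and see whether they match. A minimal Python sketch, assuming both files are available locally (`fingerprint` and `same_binary` are hypothetical helper names):

```python
import hashlib
import sys
from pathlib import Path
from typing import Tuple


def fingerprint(path: str) -> Tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) of the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_binary(path_a: str, path_b: str) -> bool:
    """True when both files have identical size and SHA-256 digest."""
    return fingerprint(path_a) == fingerprint(path_b)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        installed, downloaded = sys.argv[1], sys.argv[2]
        print(fingerprint(installed))
        print(fingerprint(downloaded))
        print("match" if same_binary(installed, downloaded) else "different")
    else:
        print("usage: compare.py <installed-binary> <downloaded-binary>")
```

A match only identifies the version if the downloaded file is built for the same platform as the installed one.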

@HSkul
Author

HSkul commented Dec 4, 2022

Something is not right. I'm pretty sure I did everything the same when I built the image with v3.2.0 as what I did with v3.1.2 and now I'm getting this when I run forecastmetrics:

2022/12/03 23:24:27 Looking up NWS location
2022/12/03 23:24:27 Getting NWS forecast
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x72433e]

goroutine 1 [running]:
main.App.RunForecast({0xc0001807b0, {0x861a18, 0xc00005e8c0}, {0xc000180600}, {{0xc0001804e0, 0x1, 0x1}, {{0xc00001a5e8, 0x16}, {0xc00001ce70, ...}, ...}, ...}}, ...)
        /home/runner/work/ForecastMetrics/ForecastMetrics/main.go:144 +0x3be
main.main()
        /home/runner/work/ForecastMetrics/ForecastMetrics/main.go:103 +0x673

Here is my config file. I'm wondering if it is not set up correctly:

locations:
  - name: My Home
    latitude: xx.xxxxxx
    longitude: -yy.yyyyy

influxdb:
  # this is the IP of the InfluxDB container
  host: http://172.19.0.2:8086
  # for influx 1.8/VictoriaMetrics, use "user:password"
  auth_token: influxuser:psswd
  # for influx 1.8/VictoriaMetrics, use blank
  org: ""
  # for influx 1.8/VictoriaMetrics, use "database" or "database/retention-policy"
  bucket: myDBname

forecast:
  measurement_name: forecast

astronomy:
  enabled: true
  measurement_name: astronomy

sources:
  enabled:
    - nws
    - visualcrossing
  visualcrossing:
    key: __my_key__

http_cache_dir: /var/lib/forecastmetrics/cache
state_dir: /var/lib/forecastmetrics/state
# overwrite_data will write a single series of forecast data,
# instead of a new series each time the program runs. This works
# with influxdb but not with VictoriaMetrics.
overwrite_data: true

Thanks,
H

@tedpearson
Owner

Nope, that's a bug I didn't notice because gasp I didn't try it with overwrite_data set to true. Releasing 3.2.1 now.

@tedpearson
Owner

Added version information to the binary :)

@HSkul
Author

HSkul commented Dec 4, 2022

OK, it is working now. Or at least it behaves the same as the v3.1.2 I ran in the container. There is still some issue with the data coming in compared to the RPi. My cloud cover is coming in at 100% (7 days out) and it never was predicted to be 100%. The humidity and temperature don't look the same as what the RPi is pulling in.
The version running on the RPi was installed in February and the size is 8012066 bytes. I think it is 3.1.2?

I need to have it run overnight and see what it looks like tomorrow.

Thanks!

@tedpearson
Owner

I just started running the latest version and am also seeing some weirdness where refreshing the dashboard has metrics appearing and disappearing. Will need to investigate further tomorrow.

@tedpearson
Owner

So I don't think there's anything inherently wrong here, at least with the non-overwrite version. The forecast metrics I'm generating still seem to be good. On my VictoriaMetrics system, I'm no longer getting the database as part of the metric, because it's now sent on the query string like /api/v2/write?bucket=ecobee&org=&precision=ns. So that's why my metrics were appearing and disappearing. I can fix that easy.
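
To make the difference concrete, here is a small Python sketch that builds both styles of write URL. The function names are hypothetical, and the statement that the v1-style `db` parameter ends up as part of the metric while the v2-style `bucket` does not is taken from the behavior described above rather than verified here.

```python
from urllib.parse import urlencode


def v1_write_url(host: str, database: str) -> str:
    """Influx 1.x-style write URL: the database travels as the `db` parameter."""
    return f"{host.rstrip('/')}/write?{urlencode({'db': database})}"


def v2_write_url(host: str, bucket: str, org: str = "", precision: str = "ns") -> str:
    """Influx 2.x-style write URL: the database name moves into `bucket`."""
    query = urlencode({"bucket": bucket, "org": org, "precision": precision})
    return f"{host.rstrip('/')}/api/v2/write?{query}"


if __name__ == "__main__":
    host = "http://localhost:8428"  # placeholder host
    print(v1_write_url(host, "ecobee"))
    print(v2_write_url(host, "ecobee"))
```

Dashboard queries that filter on the database name would then match old points but not new ones, which fits the appearing and disappearing metrics.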

Any further thoughts on what you're seeing on your side?

@HSkul
Author

HSkul commented Dec 6, 2022

Here are snippets of the graphs of two data streams:
Original running on RPi:
image

New one running in Docker container:
image

The top is the temperature and the scale is 0 to 35C. Both are actually showing warmer than it is right now (I'm in southern PA, close to Philly. I used my coordinates from Google Maps; I'm wondering if that is not correct?).

Anyway, let me play with this some more this weekend. I'm starting to think I messed up somewhere along the way.

H

@tedpearson
Owner

Philly coords would look something like

locations:
  - name: Philly Phanatic
    latitude: 39.9056
    longitude: -75.1666

So if you're close to that you're probably good.

The cloud cover does look different between the two, you're sure they're set to the same location?

If the yellow/green dot graphs are temperature, they look more like wind directions to me. Temperatures don't jump like that but wind directions would jump from 350 degrees to 0 degrees.
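
A quick Python sketch of why a bearing series looks so jumpy when plotted on a linear axis (the function name is made up for illustration): the plotted jump from 350 to 0 is 350 units, while the actual change in direction is only 10 degrees.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two compass bearings, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    previous, current = 350, 0
    print(abs(current - previous))                # 350: the jump seen on the graph
    print(angular_difference(previous, current))  # 10.0: the real change in direction
```

A temperature series has no such wraparound, so consecutive forecast hours should differ by a few degrees at most.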

@HSkul
Author

HSkul commented Dec 11, 2022

Long story. I figured out that there was something wrong with the InfluxDB database (probably because I copied over the folders and data from the Pi to the Docker container). So I switched to VictoriaMetrics, got it set up in a container and just got everything going (ForecastMetrics running in a separate container, values from HomeKit being sent to VM periodically and Grafana now plotting data). I need to let it run for a while to ensure everything is working right. But my first impression is that the forecast data now makes sense. I don't know what the issue was with InfluxDB, but everything looks right now: temperature, precipitation probability and cloud cover (the data streams I have been using). The only thing is that I'm only getting ~3 days' worth of forecast. Like I said, I'm letting it run for a while to see how this works out. I also need to figure out minor things like changing the retention to 1 year instead of the default of 1 month.

@tedpearson
Owner

Hey that's awesome! As for 3 days of forecast, I can tell you what's up there - VM only supports 2 days into the future as I mentioned above. I rebuild it every once in a while allowing 30 days into the future, feel free to install a binary from here: https://github.com/tedpearson/VictoriaMetrics/releases/tag/v1.84.0-54741f6

Once you do that you should be good to go (other than precipitation amount forecasts, which are only forecast about 3 days out by NWS).

For retention, it's just a flag to the VM process: -retentionPeriod 1y https://docs.victoriametrics.com/#retention

@HSkul
Author

HSkul commented Dec 12, 2022

Ahh yes, got it. I will recreate the VM container with your binary. Thanks a lot.

So far it looks like it is working, except for the HomeKit automations, but that is a (common) HomeKit issue.

Thanks again.

@HSkul
Author

HSkul commented Dec 13, 2022

The victoriametrics binary that I got from the link above is not running on the plain vanilla Debian image. I get the following error when I log into the container:

victoria-metrics: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found

I'll try the Ubuntu image and if that doesn't work, I guess I'll have to compile it.

Edit: NM, it works with the Ubuntu image.


2 participants