Skip to content
This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Drop retention policy before creating it #2051

Merged
merged 7 commits into from Jul 14, 2020
Merged

Drop retention policy before creating it #2051

merged 7 commits into from Jul 14, 2020

Conversation

mkpankov
Copy link
Contributor

@mkpankov mkpankov commented Jul 7, 2020

Otherwise it fails to setup Influx if it's already setup

Signed-off-by: Michael Pankov work@michaelpankov.com


This change is Reviewable

Signed-off-by: Michael Pankov <work@michaelpankov.com>
@mkpankov mkpankov added the bug label Jul 7, 2020
@mkpankov mkpankov added this to the IML EX V3 milestone Jul 7, 2020
@mkpankov mkpankov requested review from jgrund, utopiabound and a team July 7, 2020 11:35
@mkpankov mkpankov self-assigned this Jul 7, 2020
@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

Why don't we need the same for CREATE DATABASE?

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

Looks like here begins a discrepancy with docker :)

When changing any of the following also change: docker/influxdb/setup-influxdb.sh

Signed-off-by: Michael Pankov <work@michaelpankov.com>
@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 7, 2020

@ip1981 fixed the docker.

We don't need it for CREATE DATABASE because it doesn't fail when DB already exists, apparently.

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

I don't see it failing. What do I do wrong?

[root@adm ~]# influx -execute 'CREATE RETENTION POLICY "long_term" ON "iml_stats" DURATION 1w REPLICATION 1 SHARD DURATION 5d'
[root@adm ~]# echo $?
0
[root@adm ~]# influx -execute 'CREATE RETENTION POLICY "long_term" ON "iml_stats" DURATION 1w REPLICATION 1 SHARD DURATION 5d'
[root@adm ~]# echo $?
0

@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 7, 2020

I don't know, for me it looked like this

    adm: ++ chroma-config setup admin lustre localhost --no-dbspace-check
    adm: Starting setup...
    adm: Clearing all sessions...
    adm: Starting InfluxDB...
    adm: Creating InfluxDB database...
    adm: Traceback (most recent call last):
    adm:   File "/bin/chroma-config", line 10, in <module>
    adm:     
    adm: load_entry_point('iml-manager==6.1.0', 'console_scripts', 'chroma-config')()
    adm:   File "/usr/share/chroma-manager/chroma_core/lib/service_config.py", line 1029, in chroma_config
    adm:     errors = service_config.setup(username, password, ntpserver, check_db_space)
    adm:   File "/usr/share/chroma-manager/chroma_core/lib/service_config.py", line 767, in setup
    adm:     self._setup_influxdb()
    adm:   File "/usr/share/chroma-manager/chroma_core/lib/service_config.py", line 347, in _setup_influxdb
    adm:     settings.INFLUXDB_IML_STATS_DB, settings.INFLUXDB_IML_STATS_LONG_DURATION,
    adm:   File "/usr/share/chroma-manager/chroma_core/lib/util.py", line 192, in try_shell
    adm:     raise CommandError(cmdline, rc, out, err)
    adm: chroma_core.lib.util.CommandError
    adm: : Command failed: ['influx', '-database', 'iml_stats', '-execute', 'CREATE RETENTION POLICY "long_term" ON "iml_stats" DURATION 52w REPLICATION 1 SHARD DURATION 5d']
    adm:     return code 1
    adm:     stdout: ERR: retention policy already exists
    adm: 
    adm:     stderr: retention policy already exists
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

Oh, I get it. Somehow DURATION is involved.

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

So, you can't create a policy with different duration.

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

One concern remains:

Dropping a retention policy will permanently delete all measurements and data stored in the retention policy.

It looks like the data might be lost accidentally.

@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 7, 2020

Isn't it the same with CONTINUOUS QUERY? We drop them too

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

Isn't it the same with CONTINUOUS QUERY? We drop them too

We drop the queries, but not the data they produce, AFAIK:

Continuous queries (CQ) are InfluxQL queries that run automatically and periodically on realtime data and store query results in a specified measurement.

Copy link
Member

@jgrund jgrund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more investigation / clarification.

According to the influx docs (https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/#create-retention-policies-with-create-retention-policy)

A successful CREATE RETENTION POLICY query returns an empty response. If you attempt to create a retention policy identical to one that already exists, InfluxDB does not return an error. If you attempt to create a retention policy with the same name as an existing retention policy but with differing attributes, InfluxDB returns an error.

@ip1981
Copy link
Member

ip1981 commented Jul 7, 2020

I think we need CREATE or ALTER if already exists.

@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 7, 2020

@jgrund I think we want to allow the attributes to be different. In my case it's probably because I was testing tunable retention policy locally, but I'd say this is useful overall.
If this drops produced data though it's not really viable.

@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 7, 2020

Ok I'll try to determine if it's already present, and ALTER in that case.

@jgrund
Copy link
Member

jgrund commented Jul 7, 2020

Ok I'll try to determine if it's already present, and ALTER in that case.

What happens to existing data when a RP is altered?

@mkpankov
Copy link
Contributor Author

mkpankov commented Jul 8, 2020

Seems existing data is intact after altering RP

> show retention policies
name      duration  shardGroupDuration replicaN default
----      --------  ------------------ -------- -------
autogen   24h0m0s   2h0m0s             1        true
long_term 8736h0m0s 120h0m0s           1        false
> select * from "iml_stats"."long_term"."host"
name: host
time                host       mean_lnet_mem_used
----                ----       ------------------
1594121400000000000 mds1.local 1765505
1594121400000000000 mds2.local 1765505
1594121400000000000 oss1.local 1765505
1594121400000000000 oss2.local 1765505
1594123200000000000 mds1.local 1765505
1594123200000000000 mds2.local 1765505
1594123200000000000 oss1.local 1765505
1594123200000000000 oss2.local 1765505
1594125000000000000 mds1.local 1765505
1594125000000000000 mds2.local 1765505
1594125000000000000 oss1.local 1765505
1594125000000000000 oss2.local 1765505
1594126800000000000 mds1.local 1765505
1594126800000000000 mds2.local 1765505
1594126800000000000 oss1.local 1765505
1594126800000000000 oss2.local 1765505
1594195200000000000 mds1.local 1764875.2564102565
1594195200000000000 mds2.local 1765505
1594195200000000000 oss1.local 1765505
1594195200000000000 oss2.local 1765505
1594197000000000000 mds1.local 1765505
1594197000000000000 mds2.local 1765505
1594197000000000000 oss1.local 1765505
1594197000000000000 oss2.local 1765505
> alter retention policy long_term on iml_stats duration 1w
> show retention policies
name      duration shardGroupDuration replicaN default
----      -------- ------------------ -------- -------
autogen   24h0m0s  2h0m0s             1        true
long_term 168h0m0s 120h0m0s           1        false
> select * from "iml_stats"."long_term"."host"
name: host
time                host       mean_lnet_mem_used
----                ----       ------------------
1594121400000000000 mds1.local 1765505
1594121400000000000 mds2.local 1765505
1594121400000000000 oss1.local 1765505
1594121400000000000 oss2.local 1765505
1594123200000000000 mds1.local 1765505
1594123200000000000 mds2.local 1765505
1594123200000000000 oss1.local 1765505
1594123200000000000 oss2.local 1765505
1594125000000000000 mds1.local 1765505
1594125000000000000 mds2.local 1765505
1594125000000000000 oss1.local 1765505
1594125000000000000 oss2.local 1765505
1594126800000000000 mds1.local 1765505
1594126800000000000 mds2.local 1765505
1594126800000000000 oss1.local 1765505
1594126800000000000 oss2.local 1765505
1594195200000000000 mds1.local 1764875.2564102565
1594195200000000000 mds2.local 1765505
1594195200000000000 oss1.local 1765505
1594195200000000000 oss2.local 1765505
1594197000000000000 mds1.local 1765505
1594197000000000000 mds2.local 1765505
1594197000000000000 oss1.local 1765505
1594197000000000000 oss2.local 1765505

@ip1981
Copy link
Member

ip1981 commented Jul 8, 2020

I guess that's the point why ALTER exist. Meanwhile continuos queries cannot be altered, only dropped.

@jgrund
Copy link
Member

jgrund commented Jul 8, 2020

Ok, so we should move this patch to use alter on create failure due to existing RP then

Signed-off-by: Michael Pankov <work@michaelpankov.com>
Signed-off-by: Michael Pankov <work@michaelpankov.com>
@mkpankov mkpankov requested a review from jgrund July 10, 2020 06:33
@mkpankov
Copy link
Contributor Author

ALTER on CREATE failure implemented

docker/influxdb/setup-influxdb.sh Outdated Show resolved Hide resolved
chroma_core/lib/service_config.py Outdated Show resolved Hide resolved
Signed-off-by: Michael Pankov <work@michaelpankov.com>
@mkpankov mkpankov requested a review from jgrund July 14, 2020 07:31
Signed-off-by: Michael Pankov <work@michaelpankov.com>
@mkpankov mkpankov requested a review from a team July 14, 2020 10:59
@jgrund jgrund merged commit f5f980a into master Jul 14, 2020
@jgrund jgrund deleted the drop branch July 14, 2020 13:04
beevans pushed a commit to beevans/integrated-manager-for-lustre that referenced this pull request Aug 6, 2020
* Drop retention policy before creating it

Signed-off-by: Michael Pankov <work@michaelpankov.com>

* Do the same for docker

Signed-off-by: Michael Pankov <work@michaelpankov.com>

* Alter on error

Signed-off-by: Michael Pankov <work@michaelpankov.com>

* Remove extra format

Signed-off-by: Michael Pankov <work@michaelpankov.com>

* Don't compare error message

Signed-off-by: Michael Pankov <work@michaelpankov.com>

* Fix second script

Signed-off-by: Michael Pankov <work@michaelpankov.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
3 participants