
Proposal: Delete series for object storage #2780

Merged
merged 17 commits into thanos-io:master from proposal-deletions on Dec 29, 2020

Conversation

@Harshitha1234 (Contributor) commented Jun 18, 2020:

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Design proposal for delete series #1598

Verification

@Harshitha1234 (Contributor Author):

cc @metalmatze @bwplotka :)

@GiedriusS (Member) left a comment:

Some comments/suggestions but overall good work! Concise and easy to read. I don't think we should delve too much into the exact API contracts as that will most likely be done when you are going to code this. Also, I like the "undeletions" functionality. It's definitely a very nice touch!

@@ -269,7 +269,7 @@ func (c *memcachedClient) Stop() {
c.workers.Wait()
}

func (c *memcachedClient) SetAsync(ctx context.Context, key string, value []byte, ttl time.Duration) error {
func (c *memcachedClient) SetAsync(_ context.Context, key string, value []byte, ttl time.Duration) error {
Member:

Seems like the changes in this file are unrelated? 👁️

Contributor:

This was done by me in PR #2775. I think this proposal branch was created from that PR branch or something like that.

* **Perspectives to deal with Downsampling of blocks having tombstones:**
* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
* **Block with tombstones and max duration still hasn’t passed:** Perform compaction.
* **Performing deletes on already compacted blocks:** Have a threshold to perform deletions on the compacted blocks (Prometheus has 5%)
Member:

Could we elaborate more on what this threshold is?

Member:

This threshold is something that Prometheus does as well. We need to check the exact value and maybe we should link it here?

Member:

For now we want to start with the 5% that Prometheus has.
We want to follow up on that threshold on IRC/Matrix and ask why it was chosen. In the future, we can always be smarter about this.

Contributor Author:

Adding this to the Prometheus community meeting, happening next week :)
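
For illustration, the decision flow in the quoted bullets could look roughly like the following Go sketch; the function name, its inputs, and the 5% constant are assumptions for discussion, not the actual implementation:

```go
// planBlock sketches the per-block decision described in the proposal
// bullets above. All names and the 5% constant are illustrative.
func planBlock(hasTombstones, maxWaitPassed, fullyCompacted bool, deletedRatio float64) string {
	switch {
	case !hasTombstones:
		return "compact/downsample as usual"
	case fullyCompacted:
		// Already-compacted blocks are only rewritten when enough data
		// is deleted to justify the extra I/O (Prometheus uses 5%).
		if deletedRatio > 0.05 {
			return "rewrite block to apply deletions"
		}
		return "keep tombstones attached for now"
	case maxWaitPassed:
		return "apply deletions first, then downsample"
	default:
		return "perform compaction, carrying tombstones forward"
	}
}
```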




1. External tool operating to perform deletions on the object storage.(Details still to be discussed)
Member:

It's probably unrealistic to have this since the tool would have to coordinate its actions with potentially more than 1 compactor running on a bucket. IMHO it's not worth considering this as a feasible option.

Member:

You think we should remove it from the proposal entirely?

* The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor to create a tombstone file, if there’s a match for the details entered. Afterwards the tombstone file is uploaded to object storage making it accessible to other components.
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
Member:

I guess it's also worth noting that we shouldn't consider blocks for the deletion API operations that have a deletion-mark.json, i.e. blocks that are about to be completely deleted themselves.

Contributor Author:

Yeah! That's definitely a good point.

* **Block with tombstones and max duration still hasn’t passed:** Perform compaction.
* **Performing deletes on already compacted blocks:** Have a threshold to perform deletions on the compacted blocks (Prometheus has 5%)
* For undoing deletions of a time series there are two proposed ways
* API to undelete a time series - maybe delete the whole tombstones file?
Member:

What if a deletion happens during a compaction pass? In that case, we most likely should return an error to the caller, right?

Member:

The compactor is continuously running compactions, and returning errors to the caller might result in a lot of errors all the time, which would be annoying.

For now, we are going to check how Prometheus handles this.




* Tombstones should be append only, so that we can solve these during compaction.
Member:

Solve these ... what is "these"? :P

Member:

tombstones, will update.



* Tombstones should be append only, so that we can solve these during compaction.
* We don’t want to add this feature to the sidecar. The sidecar is expected to be kept lightweight.
Member:

👍 concur with this

(Outdated, resolved thread on docs/proposals/deletions_object_storage.md)

* We propose to implement deletions using the tombstones approach.
* A user is expected to enter the following details for performing deletions:
* **external labels**
Contributor:

Why external labels and not metric labels?

Member:

Yes, that is absolutely correct. We actually want to use label matchers here, so that we can match metric labels as well as external labels.
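
As a sketch of what that could mean for the API input, a deletion request based on matchers might carry fields like these; the struct and field names are assumptions, not the final contract:

```go
import "github.com/prometheus/prometheus/pkg/labels"

// DeletionRequest is a hypothetical shape for the deletion input
// discussed above; matchers cover metric labels as well as external
// labels, unlike a plain external-label selector.
type DeletionRequest struct {
	Matchers []*labels.Matcher // e.g. cluster="eu1", __name__=~"http_.*"
	MinTime  int64             // start timestamp, milliseconds
	MaxTime  int64             // end timestamp, milliseconds
}
```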

* **start timestamp**
* **end timestamp** (start and end timestamps of the series data the user expects to be deleted)
* **maximum waiting duration for performing deletions** where a default value is considered if explicitly not specified by the user. (only after this time passes the deletions are performed by the compactor in the next compaction cycle)
* The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor to create a tombstone file, if there’s a match for the details entered. Afterwards the tombstone file is uploaded to object storage making it accessible to other components.
Contributor:

> The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor

This means the API is exposed by the compactor. I personally see the compactor more as an "off-the-grid" component, and I'm not sure that, design-wise, it's good for it to expose this API.

Member:

The compactor is one of the very few components (next to Sidecar, Receiver, Ruler) that has write capabilities to object storage. At the same time it is only ever running once per object storage, so that we have a single writer to the deletion tombstones.
Additionally, by design, we thought that having the compactor "off-the-grid" and thus only accessible by an administrator solves a few permission problems right away.

We'll add this point to the proposal.

(Outdated, resolved thread on docs/proposals/deletions_object_storage.md)

* Add the deletion API (probably compactor) that only creates tombstones
* Store Gateway should be able to mask based on the tombstones from object storage
* A web UI for the deletion API?
Contributor:

I would keep this proposal smaller and remove the UI part from this proposal.

Member:

Totally, this proposal in itself is quite complex, so we will remove the UI for now. Later, we will follow up on this.


@Harshitha1234 changed the title from "Proposal: Delete series for object storage" to "[WIP] Proposal: Delete series for object storage" on Jun 24, 2020
@@ -0,0 +1,86 @@
# Delete series for object storage

**Date:** <2020-06-17>
Member:

It has to have a certain format; please see the other proposals, something like:

---
title: Thanos Remote Write
type: proposal
menu: proposals
status: accepted
owner: brancz
---

@metalmatze (Member) left a comment:

@Harshitha1234, @bwplotka and I went through these comments during our weekly meeting. :)



@Harshitha1234 (Contributor Author):

Thank you for the feedback, all the points were really worth noting 💪


@Harshitha1234 force-pushed the proposal-deletions branch 3 times, most recently from 4c95b50 to 0dbf838 on July 8, 2020 10:48
Harshitha1234 and others added 6 commits on July 8, 2020 16:42, each signed off by Harshitha1234 <harshithachowdary.t17@iiits.in>.
* **Why Compactor**? : The compactor is one of the very few components (next to Sidecar, Receiver, Ruler) that has write capabilities to object storage. At the same time it is only ever running once per object storage, so that we have a single writer to the deletion tombstones.
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
Member:

**Perspectives to deal with Downsampling of blocks having tombstones:** Unfinished?

Member:

Oh you mean Compactions maybe?

Contributor Author:

My bad! 🤦

* **Why Compactor**? : The compactor is one of the very few components (next to Sidecar, Receiver, Ruler) that has write capabilities to object storage. At the same time it is only ever running once per object storage, so that we have a single writer to the deletion tombstones.
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
Member:

Suggested change
* **Perspectives to deal with Downsampling of blocks having tombstones:**
* **Perspectives to deal with `compaction` of blocks having tombstones:**

* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
Member:

Suggested change
* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the compaction occurs.

* **label matchers**
* **start timestamp**
* **end timestamp** (start and end timestamps of the series data the user expects to be deleted)
* **maximum waiting duration for performing deletions** where a default value is considered if explicitly not specified by the user. (only after this time passes the deletions are performed by the compactor in the next compaction cycle)
Member:

I think I am missing what this is, can we reword?

Contributor Author:

Moved this feature to the future work :)

Harshitha1234 added 2 commits, signed off by Harshitha1234 <harshithachowdary.t17@iiits.in>.
@Harshitha1234 (Contributor Author):

Made new changes as per the discussions in my weekly meeting with @metalmatze @bwplotka. :)

TODOs from our meeting:

  • Add more info about what we do with compactions
  • What to do if compaction level is max and there are tombstones
  • Postpone the undelete feature, the max-waiting-period feature for performing actual deletions, and the separate-component idea to a later proposal
  • Keep the design clear about what's now and what's later.
  • Downsampling: copy tombstones

Harshitha1234 added 3 commits, signed off by Harshitha1234 <harshithachowdary.t17@iiits.in>.
@gouthamve (Contributor) left a comment:

Hi everyone! Thanks for this proposal, it seems super well thought out! Did leave some thoughts around the proposed approach, which I hope are useful :)

Also, I have some ideas on how to improve the structure: I see we are mixing the alternatives into the "Proposed Approach" section while also having a separate "Alternatives" section. Can we move all the info regarding alternatives to the "Alternatives" section, and keep the approach, the problems with the approach, and the action plan together?

This way, we can read everything about each approach separately, but I am not super familiar with the Thanos style guide so feel free to ignore my suggestions :)

* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Compaction of blocks having tombstones:**
* **Block with tombstones:** Have a threshold to perform deletions on the compacted blocks ([In Prometheus](https://github.com/prometheus/prometheus/blob/f0a439bfc5d1f49cec113ee9202993be4b002b1b/tsdb/compact.go#L213), blocks with a big enough time range that have >5% tombstones are considered for compaction.) We resolve the tombstones if they exceed the threshold and then perform compaction. If not, we attach the tombstone file to the new block. If multiple blocks are being compacted, we merge the tombstone files of the blocks whose threshold is not met.
Contributor:

> If not we attach the tombstone file to the new block. If multiple blocks are being compacted, we merge the tombstone files of the blocks whose threshold is not met.

Hrm, curious why this is different from how Prometheus does it. In Prometheus, if we are compacting multiple blocks together, then we always remove the data covered by tombstones, regardless of the threshold. https://github.com/prometheus/prometheus/blob/348ff4285ffa59907c9d7fd7eb3cb7d748f42758/tsdb/compact.go#L777-L806
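
To make the Prometheus behaviour concrete: when blocks are compacted together, samples covered by a tombstone interval are simply not written to the output block. A simplified stand-in (these types are illustrative, not Prometheus's actual ones):

```go
type sample struct {
	t int64 // timestamp, milliseconds
	v float64
}

type interval struct{ mint, maxt int64 }

// dropDeleted rewrites a series without the samples covered by
// tombstone intervals, in the spirit of Prometheus's populateBlock.
func dropDeleted(samples []sample, deleted []interval) []sample {
	out := make([]sample, 0, len(samples))
	for _, s := range samples {
		covered := false
		for _, iv := range deleted {
			if s.t >= iv.mint && s.t <= iv.maxt {
				covered = true
				break
			}
		}
		if !covered {
			out = append(out, s)
		}
	}
	return out
}
```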

Member:

Yes. I think the 5% rule should apply only to blocks that have the max level of compaction. Otherwise, deletions should all be applied during compaction, as it's essentially almost free to apply them then.

The only caveat was about some delay time for tombstones to ensure undelete. That functionality was, however, decided to be handled in later efforts, right?
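
That rule could be as small as the following sketch, assuming a hypothetical maxLevel; only fully compacted blocks consult the threshold:

```go
// shouldApplyTombstones sketches the idea above: applying deletions is
// essentially free while a block will still be compacted, so the 5%
// threshold only matters once the block has reached the maximum level.
func shouldApplyTombstones(level, maxLevel int, deletedRatio float64) bool {
	if level < maxLevel {
		return true // the block gets rewritten during compaction anyway
	}
	return deletedRatio > 0.05 // rewrite max-level blocks only past the threshold
}
```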

* **Blocks without tombstones:** Downsampling happens...

##### Problems during implementations and possible ideas to solve :
During the implementation phase of this proposal one of the first problems we came across was that we had to pull the index of all the blocks for creating or appending a tombstone.
Contributor:

Why is this a problem? This is what we need to do if we get a query with the matchers as well right?

@bwplotka (Member) commented Jul 28, 2020:

Yea, the Compactor is not responsible for that. It does not load, or need to load, all the blocks in the system in order to craft tombstones. That's why we cannot use the Prometheus tombstone format and have to figure out something else. If we were required to load indexes for ALL blocks in order to create a deletion request, this would never scale.

We need to delegate the logic of figuring out which series IDs to delete to the leafs for querying, and to compaction for the rewrite.

The tricky part is knowing which blocks have the data to be deleted. We thought about a simple Series call with the skipChunks option.
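
A rough sketch of that Series call through the Store API; the client wiring around it is assumed, and response handling is reduced to the minimum:

```go
import (
	"context"
	"io"

	"github.com/thanos-io/thanos/pkg/store/storepb"
)

// findMatchingSeries asks a Store API endpoint which series match the
// deletion request, without transferring chunk data.
func findMatchingSeries(ctx context.Context, client storepb.StoreClient, matchers []storepb.LabelMatcher, minT, maxT int64) error {
	stream, err := client.Series(ctx, &storepb.SeriesRequest{
		MinTime:    minT,
		MaxTime:    maxT,
		Matchers:   matchers,
		SkipChunks: true, // series labels only, no chunk bytes over the wire
	})
	if err != nil {
		return err
	}
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		_ = resp.GetSeries() // record which series (and hence blocks) are affected
	}
}
```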

* **Idea 3:** If we want to have an API for deletions, one way is to have the Store API configured in such a way that the compactor can use it to check for a match.
*Edge case:* Do we rebuild all the rows of a tombstone? Because we need to be aware that the tombstone is being included(or not) in the given Store API call.

*We're starting with (2) Idea 2*
Contributor:

What does this mean? Are we implementing (2) Idea 2?

Member:

Yes, let's be clear about this.


*We're starting with (2) Idea 2*

#### Considerations :
Contributor:

Curious what this Considerations section refers to. Is this for the alternative (2, Idea 2) or for the original proposal?

Contributor Author:

It was intended for the original proposal :)

(Resolved thread on docs/proposals/deletions_object_storage.md)
@bwplotka (Member) left a comment:

Nice, amazing! Thanks for the massive input on your side, @pracucci @gouthamve. Hopefully we can make something that will be useful for Cortex as well, so let's collab 🤗

## Proposed Approach

* We propose to implement deletions using the tombstones approach.
* To start off we use the Prometheus tombstone file format.
Member:

Let's make sure to point out why this is not possible, and let's put that in the Alternatives section.


##### Problems during implementations and possible ideas to solve :
During the implementation phase of this proposal one of the first problems we came across was that we had to pull the index of all the blocks for creating or appending a tombstone.

##### Proposed Alternative Approaches:
Member:

Normally we would have an ## Alternatives section with a couple of alternatives, but with clear separation, e.g.:

## Proposed Solution

* CLI.
* Custom format global - single file per request

.. elaborate on how and what challenges / open questions are.
 
## Alternatives

### Where Deletion API sits
 
1. Compactor:
  * Pros: Easiest, as it's the single writer in Thanos (but not for Cortex)
  * Cons: Does not have quick access to querying or block indexes
1. Store GW:
  * Pros: Does have quick access to querying or block indexes
  * Cons: There would be many writers, and the store was never meant to write things; it only needs read access

### How we store tombstone and in what format

1. Prometheus Format per block
1. Prometheus Format global
1. Custom format per block
1. Custom format global - appendable file

Member:

We can explain that the "Custom format global - single file per request" idea is good because:

Challenges:

  • How to "compact" / "remove" those files once applied.
  • Do we want those deletions to be applied only during compaction, or also to already compacted blocks? If yes, how? 5%? And how do we tell when it's 5%? What component would do that?
  • Any rate limiting? What if there are too many files? Would that scale?

Member:

Plus other ideas to talk about later like:

Optimization for file names?

Harshitha1234 added 2 commits, signed off by Harshitha1234 <harshithachowdary.t17@iiits.in>.
@bwplotka (Member):

Hey, can we finalize this and make sure it's clean for other ppl? We just shared it with the Cortex devs, they would be happy to contribute (:

@Harshitha1234 (Contributor Author):

Yes, sure. PTAL, I've made the latest changes based on our last discussion about the proposal; if something doesn't seem right please let me know 🙂

Signed-off-by: Harshitha1234 <harshithachowdary.t17@iiits.in>
@pracucci (Contributor) left a comment:

I understand this is a first step, but to me it leaves more open questions than answers. I understand the need to "start" with something, so LGTM! In particular:

  • I agree with global tombstones (well, I proposed it, so I'm just agreeing with myself)
  • It's likely we'll find out that the file name format may open up optimisations, but I would leave this decision to the implementation phase

Let's go! 🚀


## Considerations

* The old blocks and the tombstones are deleted during compaction time.
Contributor:

I think this opens up a race condition.

Above it's mentioned this:

> Store Gateway masks the series on processing the global tombstone files from the object storage.

If the store gateway scans deleted tombstones before the old blocks are effectively offloaded by the store gateway (we soft delete first, apply a delay, and then hard delete them), we may open a time window during which the tombstone is deleted, the store gateway has removed it from its local map, the old block is still queried, and we re-expose deleted series.

Member:

Yes, it has to be delay-deleted or something else... It's not easy, because:

  • How do we know there is no more stuff to delete in the mentioned block?
  • What if someone backfills a block that touches the deleted period?
  • What if min-time is... not specified? Does it have to be?

The behaviour for all those questions has to be defined in this proposal, explicit and simple to understand for the user of the deletion API, I think (:
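
One possible shape of the delayed deletion mentioned above, as a sketch only; the grace-period value and names are assumptions, mirroring the soft-delete delay Thanos already applies to blocks:

```go
// canHardDeleteTombstone keeps an applied tombstone around until every
// store gateway has had time to drop the old (soft-deleted) blocks, so
// deleted series are never re-exposed by the race described above.
func canHardDeleteTombstone(appliedAt, now time.Time, gracePeriod time.Duration) bool {
	return now.Sub(appliedAt) > gracePeriod
}
```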

## Proposed Approach

* We propose to implement deletions via the tombstones approach using a CLI tool.
* A tombstone is proposed to be a **Custom format global - single file per request**.
Contributor:

I haven't found the exact format of this file documented. Anyway, I would add the timestamp of the deletion request inside as well. That would allow building hard-deletion grace periods without having to read the object creation time from the object storage.
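
For discussion, such a single-file-per-request tombstone could be a small JSON document along these lines; every field name here is an assumption, including the creation timestamp suggested above:

```go
// Tombstone sketches one possible "custom format global - single file
// per request" layout. Field names are illustrative, not a spec.
type Tombstone struct {
	Matchers     []string `json:"matchers"`     // e.g. {job="billing", region=~"eu-.*"}
	MinTime      int64    `json:"minTime"`      // ms, inclusive
	MaxTime      int64    `json:"maxTime"`      // ms, inclusive
	CreationTime int64    `json:"creationTime"` // ms; enables hard-deletion grace periods
	Author       string   `json:"author,omitempty"`
	Reason       string   `json:"reason,omitempty"`
}
```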

Member:

still missing @Harshitha1234

@metalmatze (Member) left a comment:

Would be nice to finally merge. Seems like we at least agree on the first steps so far 😊

@bwplotka (Member) left a comment:

Thanks!

Some further comments. I think we have some further questions to define in this proposal. Let's try to find a simple solution that will work (: and simplify things for the first iteration. What matters is that it has to be simple to use.

## Proposed Approach

* We propose to implement deletions via the tombstones approach using a CLI tool.
* A tombstone is proposed to be a **Custom format global - single file per request**.
Member:

still missing @Harshitha1234

* We propose to implement deletions via the tombstones approach using a CLI tool.
* A tombstone is proposed to be a **Custom format global - single file per request**.
* **Why Custom format global - single file per request**? :
* https://cloud-native.slack.com/archives/CL25937SP/p1595954850430300
Member:

?

We need more details, can we use full sentences?

Member:

Also, the design doc is there to make sure we have the full view, so I would rather explain here instead of linking to some further reading (:

(Resolved thread on docs/proposals/deletions_object_storage.md)
* During compaction or downsampling, we check if the blocks being considered have series corresponding to a tombstone file, if so we delete the data and continue with the process.

## Open Questions
* Optimization for file names?
Member:

Can you describe more why we would need this? (:


## Open Questions
* Optimization for file names?
* Do we create a tombstone irrespective of the presence of the series with the given matcher? Or should we check for the existence of series that corresponds to the deletion request?
Member:

I think this should be moved to the proposal; we would love to avoid doing this.

Member:

We could have this in alternatives

## Open Questions
* Optimization for file names?
* Do we create a tombstone irrespective of the presence of the series with the given matcher? Or should we check for the existence of series that corresponds to the deletion request?
* Should we consider having a threshold for compaction or downsampling where we check all the tombstones and calculate the percentage of deletions?
Member:

Yes, this should be covered here as well 🤔

As per the "how to delete stuff from already compacted blocks" section.



Harshitha1234 added 3 commits, signed off by Harshitha1234 <harshithachowdary.t17@iiits.in>.
@bwplotka (Member):

Let's get back to this... is it ready for review? 🤗

@Harshitha1234 (Contributor Author):

Yes, it is :)

stale bot commented Dec 29, 2020:

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Dec 29, 2020
@Harshitha1234 (Contributor Author):

Still valid! @bwplotka Are there any key changes that we expect in this proposal? :)

@stale stale bot removed the stale label Dec 29, 2020
@bwplotka (Member) left a comment:

No, just implement this 🤗

@bwplotka bwplotka merged commit a680903 into thanos-io:master Dec 29, 2020
@bwplotka (Member):

Merged! Thanks @Harshitha1234 🤗

@Harshitha1234 (Contributor Author):

> No, just implement this 🤗

Oh yeah! Let's do this then!! 💪


## Future Work

* Performing actual deletions in the object storage and rewriting the blocks based on tombstones by the compactor.
Contributor:

This question is still unknown to me. How does compactor planning work with global tombstones?
Or do we just not do anything in the planner, and only delete data when compaction triggers automatically or when performing a block rewrite using the rewrite tool?
