Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for data streams #8

Merged
merged 2 commits into from
Oct 22, 2023
Merged

Add support for data streams #8

merged 2 commits into from
Oct 22, 2023

Conversation

eyadmba
Copy link

@eyadmba eyadmba commented Oct 15, 2023

PR for issue #7

This PR demonstrates what I think should change to support data streams but I made those changes quickly on GitHub's online code editor, so maybe I forget to add comma somewhere. It's only to demonstrate. If we like it, I'll pull it down and properly run it and test it.

@vduseev
Copy link
Owner

vduseev commented Oct 16, 2023

Thanks a lot for the PR @eyadmba 👏

I agree with your observations on this issue. To be honest, I had no idea data streams require explicit action/op-type specification for bulk actions.

I can't come up with a better way to add this than the is_data_stream flag. I thought about adding a new enum called IndexType with two values: Normal and DataStream. But that's confusing to old school Elasticsearch users because Index Type used to mean something else. There is also an option to add another value to RotateFrequency called DATA_STREAM. But it looks weird since "Data Stream" is not a rotation frequency. Just trying to figure out a good way to account for "what if they add another index mode called Data Hose in the future". A boolean flag seems to be the best choice right now.

Inside the code I would actually make use of the self._get_index() function but modify it to use the is_data_stream variable. So that if is_data_stream == True it returns self._get_never_index_name().

In that case the code will look like this

                index = self._get_index()
                actions = [
                  {
                    '_index': index,
                    '_source': record,
                    # op_type must be explicitly set to 'create' for bulk operations
                    # on data streams. See issue #7.
                    '_op_type': 'create' if self.is_data_stream else 'index'
                  } for record in logs_buffer
                ]

If you could please make these changes and update the main README as well I would appreciate it a lot. I will add a test that explicitly covers this scenario before the end of the year.

@eyadmba
Copy link
Author

eyadmba commented Oct 16, 2023

Thanks for the quick response! I did incorporate the suggestions, and I ran the tests locally to make sure I didn't affect existing functionality.
I did also test it with a data stream as well but manually. I don't know how you wanna go about automating the data stream integration tests because they require the creation of an indexing template specific for data streams.

Here's what I did, I created an index template that will create a data stream when a document is indexed using it:

PUT /_index_template/org-logs
Content-Type: application/json

{
  "index_patterns": "org-logs-*",
  "data_stream": {
    // this data_stream object here is what differentiates a 
    // data stream-specific index template from a regular one.
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "priority": 200,
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}

and then I set the logging handler's index_name="org-logs-prod" and confirmed that the log record does get indexed into that data stream.

@vduseev vduseev merged commit ccd29c5 into vduseev:main Oct 22, 2023
@vduseev
Copy link
Owner

vduseev commented Oct 22, 2023

Thank you for your contribution @eyadmba! 🎉 I appreciate it a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants