Decouple time macro in source path from time spec #84

Closed
shawncao opened this issue Jan 25, 2021 · 3 comments · Fixed by #86

@shawncao
Collaborator

Currently we support roll specs with time macros (date, hour, min, second) by specifying the macro pattern in the time spec. I think we should decouple the two to get better flexibility.

For example, the spec below should be a valid spec:

test-table:
  ...
  source: s3://xxx/{date}/{hour}/
  time:
    type: column
    column: col2
    pattern: UNIXTIME_MS

This spec asks us to scan S3 file paths that contain the supported macros, while the time itself comes from an existing column. My understanding is that we currently don't parse macros in the path unless they are specified in the time spec.
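To make the proposal concrete, here is a minimal sketch of path macro handling that does not look at the time spec at all. The names `hasPathMacro`, `expandPathMacros` and `replaceAll` are hypothetical and not taken from the Nebula code base, and the output formats (`YYYY-MM-DD` for `{date}`, two-digit UTC hour for `{hour}`) are assumptions made for illustration only.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: replace every occurrence of `token` in `text` with `value`.
static std::string replaceAll(std::string text, const std::string& token, const std::string& value) {
  for (std::size_t pos = text.find(token); pos != std::string::npos;
       pos = text.find(token, pos + value.size())) {
    text.replace(pos, token.size(), value);
  }
  return text;
}

// Decide whether a source path needs macro expansion by inspecting the path
// itself, not the time spec (this is the decoupling the issue asks for).
bool hasPathMacro(const std::string& path) {
  return path.find("{date}") != std::string::npos ||
         path.find("{hour}") != std::string::npos;
}

// Expand {date} and {hour} for one batch watermark (Unix seconds, UTC).
// Assumed formats: {date} -> YYYY-MM-DD, {hour} -> HH.
std::string expandPathMacros(const std::string& path, std::time_t watermark) {
  // std::gmtime is not thread-safe; acceptable for a sketch.
  const std::tm utc = *std::gmtime(&watermark);

  char date[16];
  char hour[8];
  std::strftime(date, sizeof(date), "%Y-%m-%d", &utc);
  std::strftime(hour, sizeof(hour), "%H", &utc);

  return replaceAll(replaceAll(path, "{date}", date), "{hour}", hour);
}
```

With this split, a spec whose `time.type` is `column` can still have its `source` path expanded, because the check only depends on the path string.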

cc @chenqin

@shawncao shawncao added the enhancement New feature or request label Jan 25, 2021
@shawncao shawncao added this to the Community Ready milestone Jan 25, 2021
@chenqin
Contributor

chenqin commented Jan 27, 2021

True, we should consider extracting the timestamp from the record instead of using the batch time macro.

For swap, we don't have a timestamp:

genSpecPerFile(table, version, files, specs, 0);

For roll, we use the time bucket to infer the batch timestamp, aka the watermark:

const auto watermark = now - i * curUnitInSeconds;
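As a rough illustration of the expression above, the sketch below produces one watermark per time bucket, newest first. The function name `rollWatermarks` and the `buckets` parameter are hypothetical; whether `now` is aligned to a bucket boundary in the real code is not shown in this thread, so no alignment is done here.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// One watermark per bucket: now - i * unitSeconds, for i = 0 .. buckets - 1.
std::vector<std::time_t> rollWatermarks(std::time_t now,
                                        std::time_t unitSeconds,
                                        std::size_t buckets) {
  assert(unitSeconds > 0);  // a bucket must span a positive number of seconds

  std::vector<std::time_t> marks;
  marks.reserve(buckets);
  for (std::size_t i = 0; i < buckets; ++i) {
    marks.push_back(now - static_cast<std::time_t>(i) * unitSeconds);
  }
  return marks;
}
```

Each returned value could then be passed as the `watermark` argument when generating ingest specs for that bucket.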

void genSpecPerFile(const TableSpecPtr& table,
                    const std::string& version,
                    const std::vector<FileInfo>& files,
                    std::vector<std::shared_ptr<IngestSpec>>& specs,
                    size_t watermark) noexcept {
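One way to read the decoupling is that the watermark only drives path expansion, while the row time is chosen by the time spec. The sketch below is an assumption-heavy illustration of that choice: `TimeType`, `TimeSpec` and `rowTimeSeconds` are hypothetical names, and treating the final row time as Unix seconds (so that a `UNIXTIME_MS` column value is divided by 1000) is also an assumption.

```cpp
#include <cstdint>

// Hypothetical mirror of the `time.type` field in the table spec.
enum class TimeType { Column, Macro };

struct TimeSpec {
  TimeType type;
};

// Pick the time for a row:
// - Column: use the record's own column value (assumed UNIXTIME_MS, converted to seconds).
// - Macro:  fall back to the batch watermark (Unix seconds).
std::int64_t rowTimeSeconds(const TimeSpec& spec,
                            std::int64_t columnUnixMs,
                            std::int64_t watermarkSeconds) {
  if (spec.type == TimeType::Column) {
    return columnUnixMs / 1000;
  }
  return watermarkSeconds;
}
```

Under this reading, the example spec in the issue would use the watermark to resolve `{date}` and `{hour}` in the path and `col2` to timestamp each row.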

@chenqin
Contributor

chenqin commented Jan 27, 2021

If there are no objections, I will prepare a PR shortly.

@shawncao
Collaborator Author

shawncao commented Jan 27, 2021 via email

@shawncao shawncao self-assigned this Jan 27, 2021
@shawncao shawncao linked a pull request Jan 27, 2021 that will close this issue