Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mb-707 add partitions to athena based on year and month #3274

Merged
merged 4 commits into from Jan 8, 2020

Conversation

@rdhariwal
Copy link
Contributor

rdhariwal commented Jan 6, 2020

Description

Athena queries have been slowing down due to lack of partitioning. This PR extends the table creation logic and adds partitioning capability so we can run queries like:

SELECT actions_executed,
         elb_status_code,
         *
FROM alb_logs
WHERE elb_status_code = 500
        AND month =12
        AND year=2019 limit 100

Setup

Add any steps or code to run in this section to help others prepare to run your code:

make build_tools
# delete table alb_logs
query-lb-logs -v true

Then navigate to athena in aws and run

SHOW Partitions `alb_logs`;

Code Review Verification Steps

  • Code follows the guidelines for Logging
  • Tested in the Experimental environment (for changes to containers, app startup, or connection to data stores)
  • Request review from a member of a different team.
  • Have the Jira acceptance criteria been met for this change?

References

Screenshots

image

@rdhariwal rdhariwal requested review from mr337 and chrisgilmerproj Jan 6, 2020
@rdhariwal rdhariwal marked this pull request as ready for review Jan 6, 2020
@rdhariwal

This comment has been minimized.

Copy link
Contributor Author

rdhariwal commented Jan 6, 2020

This tool has been ran for experimental, staging and prod. So all logs are partitioned by month and year in all environments.

Copy link
Contributor

mr337 left a comment

Just to clarify, is it implied that if the alb table exists it will be in partitions already? If already exists and not in partitions do is there a manual process?

@rdhariwal

This comment has been minimized.

Copy link
Contributor Author

rdhariwal commented Jan 7, 2020

Just to clarify, is it implied that if the alb table exists it will be in partitions already? If already exists and not in partitions do is there a manual process?

That is correct, tables in athena are just scaffolding to query the data from the bucket. So my view is they are throwaway. So I am leaning towards drop and re-create them and partitioning made sense as part of setup. Have any concerns Lee?

@mr337

This comment has been minimized.

Copy link
Contributor

mr337 commented Jan 7, 2020

So I am leaning towards drop and re-create them and partitioning made sense as part of setup. Have any concerns Lee?

Nope just wanted to see if we run this the query will magically take care of partitioning. If not, which I believe I understand that it wont, we may need some docs :)

@rdhariwal

This comment has been minimized.

Copy link
Contributor Author

rdhariwal commented Jan 7, 2020

So I am leaning towards drop and re-create them and partitioning made sense as part of setup. Have any concerns Lee?

Nope just wanted to see if we run this the query will magically take care of partitioning. If not, which I believe I understand that it wont, we may need some docs :)

👍 I can also make a query param that will just add partitions. I already have an if exists clause in partition creation. Documenting makes sense, let me take care of that!

Copy link
Contributor

chrisgilmerproj left a comment

🚀 - I'm approving but I'd love for you to have more doc updates in this PR.

… value for athena logs table
@rdhariwal rdhariwal changed the title add partitions to athena based on year and month mb-707 add partitions to athena based on year and month Jan 8, 2020
@rdhariwal

This comment has been minimized.

Copy link
Contributor Author

rdhariwal commented Jan 8, 2020

🚀 - I'm approving but I'd love for you to have more doc updates in this PR.

more docs here: transcom/ppp-infra#786

@@ -243,10 +247,22 @@ func CheckEnv(serviceAthena *athena.Athena, logger, infoLogger *log.Logger, v *v

if !isTableFound {
logger.Println("Table not found, creating table....")
err := createLogTable(serviceAthena, logger, infoLogger, dbName, logBucket)
err = createLogTable(serviceAthena, logger, infoLogger, dbName, logBucket)

This comment has been minimized.

Copy link
@chrisgilmerproj

chrisgilmerproj Jan 8, 2020

Contributor

This seems to be changing the value of err outside the scope. I am not sure you wanted to do that. Maybe a better setup here is to be specific with the error name so you don't trigger the shadow variables linter.

Suggested change
err = createLogTable(serviceAthena, logger, infoLogger, dbName, logBucket)
errCreateLogTable := createLogTable(serviceAthena, logger, infoLogger, dbName, logBucket)
Copy link
Contributor

chrisgilmerproj left a comment

🚀 - this looks good but I was curious about one bit of code that looked odd to me. Can you double check?

@rdhariwal

This comment has been minimized.

Copy link
Contributor Author

rdhariwal commented Jan 8, 2020

🚀 - this looks good but I was curious about one bit of code that looked odd to me. Can you double check?

refactored err variable usage. I went back and forth on it and landed on avoiding re-using the var when possible is a good practice.

@rdhariwal rdhariwal merged commit 019378b into master Jan 8, 2020
16 checks passed
16 checks passed
auto-approve dependabot PRs
Details
ci/circleci: acceptance_tests_experimental Your tests passed on CircleCI!
Details
ci/circleci: acceptance_tests_local Your tests passed on CircleCI!
Details
ci/circleci: acceptance_tests_staging Your tests passed on CircleCI!
Details
ci/circleci: build_app Your tests passed on CircleCI!
Details
ci/circleci: build_migrations Your tests passed on CircleCI!
Details
ci/circleci: build_storybook_app Your tests passed on CircleCI!
Details
ci/circleci: build_tasks Your tests passed on CircleCI!
Details
ci/circleci: build_tools Your tests passed on CircleCI!
Details
ci/circleci: check_generated_code Your tests passed on CircleCI!
Details
ci/circleci: client_test Your tests passed on CircleCI!
Details
ci/circleci: integration_tests Your tests passed on CircleCI!
Details
ci/circleci: pre_deps_golang Your tests passed on CircleCI!
Details
ci/circleci: pre_deps_yarn Your tests passed on CircleCI!
Details
ci/circleci: pre_test Your tests passed on CircleCI!
Details
ci/circleci: server_test Your tests passed on CircleCI!
Details
@rdhariwal rdhariwal deleted the rd-partition-athena branch Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.