Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch insert post values and other migrations to speed up post import #3453

Merged
merged 5 commits into from
Dec 14, 2018

Conversation

rjmackay
Copy link
Contributor

This pull request makes the following changes:

  • Increase batch size to 200
  • Merge in: feat: Cache form attributes #3451
  • Bulk insert post values
  • Clean up weird behaviour when inserting post tags. We were looking up every single tag by tag, then by ID. Now I'm assuming a numeric value == an id. (And validation happens elsewhere)

Test checklist:

  • composer test
  • I certify that I ran my checklist

Ping @ushahidi/platform

@rjmackay
Copy link
Contributor Author

It's worth noting that this and bits of #3439 could easily be applied to csv imports and make them much much faster.

@coveralls
Copy link

coveralls commented Dec 11, 2018

Coverage Status

Coverage increased (+0.006%) to 22.753% when pulling 182b621 on migration/batch-insert-post-values into 888f4a5 on migration-tool.

@rjmackay rjmackay mentioned this pull request Dec 11, 2018
4 tasks
@@ -26,7 +26,14 @@ class ImportMappingRepository /*extends EloquentRepository*/ implements ImportMa

public function create(ImportMapping $model) : int
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my profiling, it looks a bit like adding the cache-on-write actually made this slower not faster, but I need to find a larger dataset to test on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting we should definitely check to see why that might be

return $tag['id'];
}

if (is_numeric($tag)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a new assumptions that a numeric tag is always the ID. I did consider dropping this parsing entirely, but I think we might be relying on it for CSV imports to map tag names in the CSV to an ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are definitely relying on it so it needs to remain sadly, it's actually hugely useful in relation to tagging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it needed to remain as it was? or is this partial refactor (assuming numeric = tag id) ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of the importer I could pass tags as ['id' => 1] style and that'd be completely safe to skip the lookup on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no I think this is fine

if ($entity->values) {
$this->updatePostValues($id, $entity->values);
$postsById = collect($newPostIds)->combine($collection);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this doesn't make sense, I can try to make it more readable. It's an ugly transformation that might be clearer with a foreach instead of all the map/flatten/groupby stuff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this style but can you expand the comment to be more like:
"reformat set of Posts as...group by 'key' bulk insert attribitues then return X..."

@rjmackay rjmackay force-pushed the migration/batch-insert-post-values branch from 7897634 to 313836f Compare December 11, 2018 20:55
@rjmackay rjmackay closed this Dec 11, 2018
@rjmackay rjmackay changed the base branch from migration/import-incidents to migration-tool December 12, 2018 02:27
@rjmackay rjmackay reopened this Dec 12, 2018
Copy link
Contributor

@willdoran willdoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to test the cache weirdness but I think this can merge with that understanding

- Add createManyValues to UpdatePostValueRepository interface
- Implement createManyValues for various value repos
- Refactor PostRepository->createMany() to aggregate values and then send them to repo in bulk
@rjmackay rjmackay force-pushed the migration/batch-insert-post-values branch from 313836f to c020099 Compare December 13, 2018 22:30
@rjmackay rjmackay merged commit 136ca32 into migration-tool Dec 14, 2018
@rjmackay rjmackay deleted the migration/batch-insert-post-values branch December 14, 2018 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants