Alloc genotype memory from mmap #808
Speedy! My only option on the cluster at the moment is to try this out on a network drive. I will also do the same thing on my machine, using an NVMe drive with the same RAM/disk ratio, to get a feel for whether the performance is viable and how much the network affects it.
Top-level interface for Python is
Some performance numbers:
I'm not sure why the magnetic-disk run had so many more page faults. I'll do some more testing once it's easier to set the mmap location.
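For comparable page-fault counts from inside Python (Unix only), `resource.getrusage` reports major and minor faults for the current process. A minimal sketch, not how the numbers above were collected; `count_page_faults` and `touch_mmap` are hypothetical helpers, and the workload simply dirties one byte per page of a file-backed shared mapping:

```python
import mmap
import os
import resource
import tempfile


def count_page_faults(func, *args):
    """Run func(*args); return (result, major_faults, minor_faults) incurred by this process."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (
        result,
        after.ru_majflt - before.ru_majflt,
        after.ru_minflt - before.ru_minflt,
    )


def touch_mmap(temp_dir, size):
    """Hypothetical workload: write one byte per page of a file-backed shared mapping."""
    with tempfile.TemporaryFile(dir=temp_dir) as f:
        os.ftruncate(f.fileno(), size)  # extend the empty file to `size` zero bytes
        with mmap.mmap(f.fileno(), size) as buf:  # Unix default: MAP_SHARED, read/write
            for offset in range(0, size, mmap.PAGESIZE):
                buf[offset] = 1  # the first touch of each page typically faults
            # Read back the marker bytes: equals the number of pages touched
            return sum(buf[offset] for offset in range(0, size, mmap.PAGESIZE))


pages, major, minor = count_page_faults(touch_mmap, None, 64 * mmap.PAGESIZE)
print(f"pages touched={pages} major faults={major} minor faults={minor}")
```

Passing a directory on each device as `temp_dir` runs the same workload against each one, which keeps the comparison like-for-like.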
That's looking feasible, right? What is the non-mmap time?
First line of the table: 8:11.
Oops. Didn't realise. Doh! Thanks 🤦
Codecov Report
@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   93.50%   93.21%   -0.30%
==========================================
  Files          17       17
  Lines        5821     5835      +14
  Branches     1041     1044       +3
==========================================
- Hits         5443     5439       -4
- Misses        248      265      +17
- Partials      130      131       +1
... and 1 file with indirect coverage changes
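The report's numbers are internally consistent: Lines = Hits + Misses + Partials, and Codecov's coverage ratio is hits divided by that total (partial lines count as not covered). The sketch below reproduces the three percentages, assuming Codecov's default rounding (round down to two decimals); `codecov_pct` and `round_down` are illustrative names, not part of any Codecov API:

```python
import math


def codecov_pct(hits, misses, partials):
    """Coverage in percent, computed as hits / (hits + misses + partials)."""
    return 100.0 * hits / (hits + misses + partials)


def round_down(value, precision=2):
    """Floor to `precision` decimals (assumed to mirror Codecov's default `round: down`)."""
    factor = 10 ** precision
    return math.floor(value * factor) / factor


# (hits, misses, partials) copied from the report above
base_counts = (5443, 248, 130)  # main
head_counts = (5439, 265, 131)  # #808
assert sum(base_counts) == 5821 and sum(head_counts) == 5835  # the "Lines" row

base = codecov_pct(*base_counts)
head = codecov_pct(*head_counts)
print(round_down(base), round_down(head), round_down(head - base))  # 93.5 93.21 -0.3
```

Flooring also explains the -0.30% change: the unrounded drop is about 0.293 points, and flooring a negative number rounds it away from zero.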
This is ready for review, stacked on #808.
I'll take a look at this once #808 is in and the diff is easy to review.
Branch updated from 99fb936 to c126563.
Updated
Awesome, just a couple of comments.
    goto out;
}
self->mmap_buffer = mmap(
    NULL, self->mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, self->mmap_fd, 0);
This is probably unnecessary complexity, but once the genotypes are loaded into the mmap, could we speed things up by re-opening it with only PROT_READ?
Good idea. I think the benefit would be marginal, and it might cause a full flush of the file.
Worth following up if you like, though; shall we open an issue?
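On the PROT_READ idea: a Python analogue of the allocation in the hunk above, assuming the backing file is created in a temporary directory and mapped shared read/write, then filled and re-mapped read-only. This is not the PR's C code; `write_then_map_readonly` is a hypothetical name, and the flags come from Python's `mmap` module (Unix only):

```python
import mmap
import os
import tempfile


def write_then_map_readonly(temp_dir, data):
    """Hypothetical helper: fill a shared RW mapping, then return a read-only mapping
    of the same bytes. `data` must be non-empty (mmap cannot map zero bytes)."""
    fd, path = tempfile.mkstemp(dir=temp_dir, suffix=".mmap")
    try:
        os.ftruncate(fd, len(data))
        # Same flags as mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
        with mmap.mmap(fd, len(data), flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as rw:
            rw[:] = data
            # Optional msync: an explicit full flush, the cost the reply worries about
            rw.flush()
        # Re-map the same file read-only; writes through it raise TypeError
        ro = mmap.mmap(fd, len(data), access=mmap.ACCESS_READ)
    finally:
        # The mapping stays valid after the fd is closed and the file is unlinked
        os.close(fd)
        os.unlink(path)
    return ro


ro = write_then_map_readonly(None, b"\x00\x01\x01\x00")
print(ro[:])  # b'\x00\x01\x01\x00'
ro.close()
```

In C the same downgrade could be done in place with `mprotect(self->mmap_buffer, self->mmap_size, PROT_READ)` rather than unmapping and re-mapping; whether either is measurably faster is the open question above.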
)
a1.assert_data_equal(a2)

def test_mmap_missing_dir(self):
Could we also test mmap to a directory with the wrong permissions, to check that we get a helpful error message?
Done
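A sketch of the kind of check such a test could make, assuming the backing file is created with `tempfile.mkstemp` in the requested directory and OS errors are re-raised with the directory name attached. `create_mmap_file` and `mmap_file_error` are hypothetical helpers, not the PR's API:

```python
import os
import stat
import tempfile


def create_mmap_file(temp_dir):
    """Hypothetical helper: create the backing file, re-raising OS errors with context."""
    try:
        return tempfile.mkstemp(dir=temp_dir, suffix=".mmap")
    except OSError as e:
        raise ValueError(
            f"Cannot create mmap file in directory {temp_dir!r}: {e.strerror}"
        ) from e


def mmap_file_error(temp_dir):
    """Return the error message for temp_dir, or None if the file could be created."""
    try:
        fd, path = create_mmap_file(temp_dir)
    except ValueError as e:
        return str(e)
    os.close(fd)
    os.unlink(path)
    return None


with tempfile.TemporaryDirectory() as parent:
    # Missing directory: FileNotFoundError becomes a ValueError naming the path
    print(mmap_file_error(os.path.join(parent, "missing")))
    read_only = os.path.join(parent, "read_only")
    os.mkdir(read_only)
    os.chmod(read_only, stat.S_IRUSR | stat.S_IXUSR)  # r-x: no write permission
    try:
        # root bypasses directory permissions, so only expect an error otherwise
        if os.geteuid() != 0:
            assert mmap_file_error(read_only) is not None
    finally:
        os.chmod(read_only, stat.S_IRWXU)  # restore owner permissions before cleanup
```

The `geteuid` guard matters on CI runners that execute as root, where the permissions case would otherwise succeed and fail the test.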
    num_threads=num_threads,
    mmap_temp_dir=tmpdir,
)
a1.assert_data_equal(a2)
Can we also test that the mmap file has been removed after generating the ancestors?
Nice, done
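For the clean-up check, one approach is to scope the backing file with a context manager and assert that the temporary directory is empty afterwards. A sketch assuming one backing file per run; `mmap_buffer` is a hypothetical helper, not the PR's code:

```python
import contextlib
import mmap
import os
import tempfile


@contextlib.contextmanager
def mmap_buffer(temp_dir, size):
    """Hypothetical helper: yield a shared RW mapping backed by a file in temp_dir,
    deleting the backing file on exit even if the body raises."""
    fd, path = tempfile.mkstemp(dir=temp_dir, suffix=".mmap")
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as buf:  # Unix default: MAP_SHARED, read/write
            yield buf
    finally:
        os.close(fd)
        os.unlink(path)


with tempfile.TemporaryDirectory() as tmpdir:
    with mmap_buffer(tmpdir, mmap.PAGESIZE) as buf:
        buf[0] = 1
        print(os.listdir(tmpdir))  # one *.mmap file while in use
    print(os.listdir(tmpdir))  # [] once the block exits
```

Unlinking the file straight after mapping it would also cover crashes, since on POSIX the data stays reachable through the mapping until it is unmapped; the trade-off is that the file no longer appears in the directory while in use.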
Comments addressed; should be ready to go.
Here's a quick stab that should basically work; fancy trying it out @benjeffery?