
WT-3093 Padding the WT_RWLOCK structure grew the WT_PAGE structure. #3212

Merged: michaelcahill merged 11 commits into develop from wt-3093-wt_page-grow on Dec 23, 2016

Conversation

keithbostic
Contributor

Padding the WT_RWLOCK structure grew the WT_PAGE structure.

…inlocks.

WiredTiger currently has no need for a separate allocation; I suspect the API is the way it is because it was translated from the original POSIX pthread implementation.

Additionally, remove the lock name field from the read/write lock structure, shrinking the lock from 16B to 8B. The name field was never used, and it should be easy to identify a read/write lock's purpose from the enclosing structure. This means we no longer need two separate structures (the lock, and the lock plus name), which simplifies the actual implementation.
keithbostic self-assigned this Dec 20, 2016
…e-length

column-store RLE array off-page into a separate allocation (instead of just
the array itself), and moving the number-of-entries for the leaf pages out of
the per page-type union.

The latter change simplifies a bunch of stuff: row-store and fixed-length
column-store no longer require a structure in the union at all, and lots
of the #defines to handle that go away.
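
A rough sketch of the union change being described; the structure and member names are invented for illustration and do not match the real WT_PAGE fields:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "before": each leaf page type keeps its own entry count inside
 * the per page-type union, so generic code needs per-type #define accessors
 * to find it. */
struct page_before {
    union {
        struct { void *rows; uint32_t entries; } row_leaf;
        struct { void *recs; uint32_t entries; } col_var;
        struct { uint32_t entries; } col_fix;
    } u;
};

/* Hypothetical "after": the entry count is shared by all leaf page types, the
 * RLE ("repeats") array lives in its own allocation, and row-store and
 * fixed-length column-store need nothing in the union at all. */
struct page_after {
    uint32_t entries;                   /* one field, no per-type accessors */
    union {
        struct { void *repeats; } col_var;      /* separate allocation */
    } u;
};

int
main(void)
{
    struct page_after p = { .entries = 100 };

    /* Generic code can read the count without knowing the page type. */
    printf("entries: %u\n", (unsigned)p.entries);
    return (0);
}
```
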
Make gcc5 happy with how I'm declaring the structures.
@keithbostic
Contributor Author

@michaelcahill, @agorrod:

In summary, I first backed out the change that grew the WT_PAGE structure.

Second, I inlined the WT_RWLOCK structures inside the structures using them, instead of allocating separate memory (that was always true for the read/write lock in the WT_PAGE structure, and it matches the WT_SPINLOCK structures). I can't think of any reason to allocate WT_RWLOCK structures separately; I suspect it was done that way because it made sense for the old POSIX pthread read/write lock implementation, and the API wasn't changed when we wrote our own read/write locks. At the same time, I dropped the WT_RWLOCK.name field (which we never used), which makes the WT_RWLOCK structure 8B and simplifies the implementation.
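
For concreteness, here is a sketch of the pointer-versus-embedded difference; the names are made up and this is not the WiredTiger API, just the shape of the change:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { volatile uint64_t state; } rwlock;    /* stand-in for WT_RWLOCK */

/* Hypothetical "before": the enclosing structure points at a separately
 * allocated lock, the shape a pthread_rwlock_t-era API would suggest. */
struct handle_alloc {
    rwlock *lock;               /* calloc'd at init, freed at teardown */
};

/* Hypothetical "after": the lock is embedded, matching how spinlocks are
 * held; initialization is just zeroing and there is no pointer chase. */
struct handle_inline {
    rwlock lock;
};

int
main(void)
{
    struct handle_alloc a;
    struct handle_inline b = { .lock = { 0 } };

    if ((a.lock = calloc(1, sizeof(*a.lock))) == NULL)  /* the extra step */
        return (ENOMEM);

    printf("embedded handle: %zuB, pointer handle: %zuB plus a separate %zuB lock\n",
        sizeof(b), sizeof(a), sizeof(*a.lock));
    free(a.lock);
    return (0);
}
```

Embedding also ties the lock's lifetime to the enclosing structure, so there is no separate allocation or free path to get wrong.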

In hindsight, I suspect I should instead have created two versions of the WT_RWLOCK macro: one that's not padded (for WT_PAGE) and one that is padded (for everywhere else), although that approach has some obvious problems with typecasting.

The reason I'm thinking about that is that we have similar issues with WT_SPINLOCK. Read/write and spinlock padding adds 128B per WT_DATA_HANDLE structure. I can argue that's OK, but it's also a chunk of space a data handle doesn't need, unless you think either of the locks in that structure is a likely sharing problem. Of course, we could introduce sharing problems in the future, which is why I thought of having padded and not-padded versions of the structures, so we'd call out where we're padding and where we're not.
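
One way the padded/unpadded split could look. This is a sketch only: the 64B line size, the names, and the layout are assumptions rather than WiredTiger code, and the typecasting problem shows up as soon as a function written against the unpadded type has to accept the padded one:

```c
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE 64              /* assumed cache-line size */

typedef struct { volatile uint64_t state; } rwlock;     /* 8B, for WT_PAGE-like uses */

typedef struct {
    rwlock lock;
    uint8_t pad[CACHE_LINE_SIZE - sizeof(rwlock)];      /* keep the lock on its own line */
} rwlock_padded;                        /* 64B, for data handles and other shared structures */

/* The catch: code written against "rwlock *" can't take a rwlock_padded
 * directly; callers have to pass &padded.lock (or cast), so every call site
 * ends up documenting whether its lock is padded. */
static void
rwlock_init(rwlock *l)
{
    l->state = 0;
}

int
main(void)
{
    rwlock_padded dhandle_lock;

    rwlock_init(&dhandle_lock.lock);
    printf("unpadded: %zuB, padded: %zuB\n",
        sizeof(rwlock), sizeof(rwlock_padded));
    return (0);
}
```

The point of having both variants is to keep hot, shared locks on their own cache line while letting size-sensitive structures like WT_PAGE keep the 8B version.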

Anyway, back to the branch: third, I realized I was within shooting distance of getting the WT_PAGE structure down to 64B. There was 8B I'd known was available there for a while, so I restructured to reclaim it, and we're now at 72B.

It's a proof of concept and can easily be discarded; there aren't any bug fixes here.

@keithbostic
Contributor Author

keithbostic commented Dec 21, 2016

@agorrod, @michaelcahill, I took a look at the numbers, both with a default tcmalloc library and with the MongoDB tcmalloc library.

I allocated 20 million objects and calculated the overhead. The WT_PAGE structure in this branch is 72B; it was previously 88B, and the recent change bumped it to 136B.

object size (B)  object base (KB)  actual size (KB)  overhead (KB)
 56              1093750           1257472           163722 (+15%, 8B/object)
 64              1250000           1257472             7472 (+0.5%, 0.4B/object)
 72              1406250           1607428           201178 (+14%, 10B/object)
 88              1718750           1920772           202022 (+11%, 10B/object)
136              2656250           3191556           535306 (+20%, 27B/object)
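
For reference, the "object base (KB)" column is object size * 20 million / 1024, and the per-object overhead is the overhead column divided by the object count. A quick sketch of that arithmetic (the "actual size" numbers are measurements from the allocator and are hard-coded here; the rounding differs slightly from the table above):

```c
#include <stdio.h>

int
main(void)
{
    const long long nobjects = 20LL * 1000 * 1000;
    const struct { int size; long long actual_kb; } rows[] = {
        { 56, 1257472 }, { 64, 1257472 }, { 72, 1607428 },
        { 88, 1920772 }, { 136, 3191556 },      /* measured with tcmalloc */
    };
    size_t i;

    for (i = 0; i < sizeof(rows) / sizeof(rows[0]); ++i) {
        long long base_kb = nobjects * rows[i].size / 1024;
        long long over_kb = rows[i].actual_kb - base_kb;
        printf("%3dB: base %7lldKB, overhead %6lldKB (%.1f%%, %.1fB/object)\n",
            rows[i].size, base_kb, over_kb,
            100.0 * over_kb / base_kb,
            over_kb * 1024.0 / nobjects);
    }
    return (0);
}
```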

So, there's a clear 64B bucket effect, but the absolute overhead cost of 72B vs. 64B, for a 32KB (or even a 4KB) page, is close to zero.

I wouldn't turn down a 64B WT_PAGE size, and I suspect there will be some efficiencies in hitting the tcmalloc bucket dead on, but there's clearly not enough space savings to push for it.

@michaelcahill
Contributor

Thanks @keithbostic, this change LGTM. I'll merge.

michaelcahill merged commit 9216a5b into develop Dec 23, 2016
michaelcahill deleted the wt-3093-wt_page-grow branch December 23, 2016 04:12