WT-2381: dump utility discards table config #2493
The loadtext command requires a URI argument.
Rewrite the command-line tool entries that refer to "tables or files" to refer simply to tables: it's simpler and less confusing, and users are unlikely to be using file URIs.
Fix for WiredTiger "simple" table handling. Simple tables have column-group entries, but they aren't listed in the metadata's table entry. Figure out if it's a simple table and in that case, retrieve the column-group entry and use the value from its "source" file.
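To illustrate the simple-table case described above, here is a minimal sketch in Python rather than the C used by the dump code. It models the metadata as a plain dict; the function name `table_sources`, the dict layout, and the assumption that a simple table's column-group entry is keyed as `colgroup:<name>` (with complex tables using `colgroup:<name>:<group>`) are illustrative, not taken from the actual change.

```python
# Toy, in-memory stand-in for the metadata: URI -> dict of config values.
# The layout and key naming here are assumptions made for illustration.

def table_sources(metadata, table_uri):
    """Return the "source" values of the column groups backing a table."""
    name = table_uri[len("table:"):]
    colgroups = metadata[table_uri].get("colgroups", [])
    if not colgroups:
        # Simple table: no column groups are listed in the table entry,
        # so look up the single implicit column-group entry directly.
        keys = ["colgroup:" + name]
    else:
        # Complex table: one entry per column group listed in the table entry.
        keys = ["colgroup:%s:%s" % (name, cg) for cg in colgroups]
    return [metadata[key]["source"] for key in keys]


example_metadata = {
    "table:simple": {"colgroups": []},
    "colgroup:simple": {"source": "file:simple.wt"},
    "table:complex": {"colgroups": ["cg1", "cg2"]},
    "colgroup:complex:cg1": {"source": "file:complex_cg1.wt"},
    "colgroup:complex:cg2": {"source": "file:complex_cg2.wt"},
}
```

The point of the sketch is only that the simple-table branch cannot rely on the table entry to enumerate its column groups, so it has to go and fetch the column-group entry itself.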
@keithbostic I added a new unit test. It fails 3 out of 5 scenarios. The good news is that your change on this branch does help - on develop it fails 4 out of 5 scenarios.
complex tables.
indent/whitespace
Include all of the WT_SESSION::create config in the ordinary LSM metadata so it is merged correctly into the dump header. Provide an upgrade path for LSM metadata in the old format. **Backwards-breaking change for LSM:** once metadata is upgraded to the new format, LSM trees cannot be opened with older versions of WiredTiger.
Allow any URI through with an underlying type of file, that allows the creation of top-level lsm:XXX objects, that is, LSM objects that aren't underneath tables.
LSM doesn't support column-store keys, don't try to test that combination.
Parenthesize a macro argument.
Add LSM tables to the dump test. After dumping/re-loading the database, confirm that the contents of the database are the same by comparing the objects returned by wt list. Replace simple_populate_check_cursor/complex_populate_check_cursor with simple_populate_check/complex_populate_check, then we don't have to open a cursor.
Rework the dump utility and the dump library support.
Previously, the metadata cursor returned a full set of WT_SESSION.create configuration values: the requested configuration values plus the default values, with the requested values overriding the defaults. This doesn't work because dump takes a few configuration strings and collapses them into a single string, and the default values in later strings start overriding real configuration values from earlier ones.
Change the metadata cursor to return only the requested configuration values instead, and change dump to add in the default values when it's collapsing the strings into a single string.
Add a function __wt_schema_create_final; it takes a set of configuration strings, adds in the default WT_SESSION.create configuration values, and collapses them into a single string.
Add a function __wt_config_strip_and_collapse; it behaves similarly to __wt_config_collapse, except it doesn't add in the default strings.
Fix bugs where we weren't copying returned metadata strings into local memory.
Fix bugs where we weren't correctly parsing the URIs in the metadata file.
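The override problem described above can be shown with a small sketch. This is Python over flat `key=value` strings, not the C implementation: `parse_flat`, `collapse` and `create_final` are hypothetical helpers, real WiredTiger configuration strings can contain nested parenthesized values that this toy parser does not handle, and the default values used are made up for the example.

```python
def parse_flat(config):
    """Parse a flat 'k=v,k=v' string into a dict (no nested values)."""
    result = {}
    for item in config.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        result[key] = value
    return result


def collapse(configs):
    """Merge config strings in order; later strings override earlier ones."""
    merged = {}
    for cfg in configs:
        merged.update(parse_flat(cfg))
    return merged


def create_final(defaults, configs):
    """Apply defaults once, then layer the requested values on top."""
    merged = parse_flat(defaults)
    merged.update(collapse(configs))
    return ",".join("%s=%s" % kv for kv in merged.items())


# Illustrative values only.
defaults = "key_format=u,leaf_page_max=32KB"
table_requested = "key_format=S"
file_requested = "leaf_page_max=4KB"

# Old behavior: each string already has the defaults merged in, so the
# defaults carried by the second string clobber the first string's values.
table_full = "key_format=S,leaf_page_max=32KB"
file_full = "key_format=u,leaf_page_max=4KB"
old_result = collapse([table_full, file_full])

# New behavior: collapse only the requested values, add defaults at the end.
new_result = create_final(defaults, [table_requested, file_requested])
```

In the old-style merge the table's requested `key_format=S` is lost to the default carried by the file string; applying the defaults once, underneath the collapsed requested values, keeps both requested settings.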
One of the changes in 77ac147 changed the test for column-group and index names, and the changed version matches both simple and complex entries. Leave the changed test alone, instead don't look for separate column-group and index entries in the case of a simple table.
WT-2381 Rewrite LSM metadata to fix dump / load.
And it removes the need for |
Whitespace.
Michael notes dump no longer needs to use the metadata:create URI, that simplifies the change, most importantly, we no longer need two versions of config_collapse.
@michaelcahill, I've gone ahead and pushed the change you suggested. As far as I'm concerned, this one is ready for merge.
config retention.
Thanks, @keithbostic, lgtm. I'll merge.
@sueloverso, I ended up waiting for a bunch more tests to run today, so I went a little further on this one.
I think what's going on here is that simple tables are a special case: there's no colgroup entry in a simple table's metadata entry, and we're not finding the table's underlying file information.
This works for complex tables because their column-groups are listed in the table's metadata entry: even though we will write the same (incorrect) information for the complex table that we write for the simple table, it will be overridden by the correct information stored for the specific column-groups.
I did a "fix" by special-casing simple tables in the dump code, but that could be completely wrong. I don't have a handle on how this "ought" to behave, it's just the only path to victory I saw.
I thought a little bit about testing: the only thing that came to mind was changing test_dump to compare the metadata before and after the dump and re-load (dump the WiredTiger.wt file before and after the dump/re-load and assert it's the same). Obviously, you have to strip all of the checkpoint information and maybe some other stuff before that will work -- I didn't go far down that path. We'd need some way to parse the dump output in Python, too. There's a Python package to do that kind of parsing (pyparsing, or here), but we'd have to make sure it's installed to use it?
Anyway, hope this is useful to you, just toss the branch if it's not.
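As a rough sketch of the before/after metadata comparison idea, the following uses a small hand-written scanner instead of pyparsing, so nothing extra needs to be installed. Everything here is an assumption for illustration: metadata is modeled as a dict of URI to config string, `strip_key` and `normalize` are hypothetical helpers, only the `checkpoint` key is stripped, and quoted values that themselves contain commas or parentheses are not handled.

```python
def strip_key(config, key):
    """Remove every 'key=value' item from a config string.

    The value may be a parenthesized group with nested parentheses.
    """
    needle = key + "="
    out = []
    i = 0
    while i < len(config):
        at_item_start = i == 0 or config[i - 1] in ",("
        if at_item_start and config.startswith(needle, i):
            i += len(needle)
            depth = 0
            # Skip the value: stop at a top-level comma or an enclosing ')'.
            while i < len(config):
                c = config[i]
                if c == "(":
                    depth += 1
                elif c == ")":
                    if depth == 0:
                        break
                    depth -= 1
                elif c == "," and depth == 0:
                    break
                i += 1
            # Drop the comma that separated the removed item from the next one.
            if i < len(config) and config[i] == ",":
                i += 1
            continue
        out.append(config[i])
        i += 1
    return "".join(out).rstrip(",")


def normalize(metadata, volatile=("checkpoint",)):
    """Return a copy of the metadata with volatile keys stripped."""
    cleaned = {}
    for uri, cfg in metadata.items():
        for key in volatile:
            cfg = strip_key(cfg, key)
        cleaned[uri] = cfg
    return cleaned


# Made-up metadata snapshots that differ only in checkpoint information.
before = {"file:a.wt": 'key_format=S,checkpoint=(ckpt.1=(addr="01",order=1)),value_format=S'}
after = {"file:a.wt": 'key_format=S,checkpoint=(ckpt.2=(addr="02",order=2)),value_format=S'}
```

A test along these lines would assert `normalize(before) == normalize(after)`; any other keys that legitimately change across a dump and re-load would need to be added to `volatile`.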