
WT-147: Dynamic Index creation. Use bulk=unordered #2189

Merged
merged 26 commits into develop from index-create-lsm3 on Sep 22, 2015

Conversation

ddanderson
Member

No description provided.

ddanderson and others added 18 commits May 29, 2015 08:59
the exclusive handle to be reopened by the same session multiple times;
the handle is unlocked when the reference count drops to zero.
Needed for WT-147.
to insert a single index entry into apply_single_idx().
WT-147.
a given year, not just the first one found.
…ocked,

using session flags.  This does not yet work (fails some tests).
btrees that have "shared locking".  Only enable this feature selectively,
controlled by a session flag.  For now, only creating and filling
an index needs it.
The feature allows us to stop worrying about schema lock
interactions when building LSM indexes on existing tables.
The regular worker-thread switch semantic causes problems when
creating an index on an existing table.
It means the index creation code can treat LSM and non-LSM tables
consistently.
WT-147 Add undocumented bulk=unordered for LSM cursors.
so that LSM files are filled in the application thread to avoid locking
issues.  Index cursors know that bulk=unordered only applies for the
index file, not for column group files in the main table.
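The re-entrant exclusive-handle behavior described in the commit messages above (a handle that the same session may reopen multiple times, and that is unlocked only when its reference count drops to zero) can be sketched as follows. This is an illustrative model only; the class and method names are hypothetical, not WiredTiger's actual dhandle API:

```python
import threading

class ExclusiveHandle:
    """Toy sketch of a re-entrant exclusive handle: the same session may
    reopen it repeatedly; it is released only when the reference count
    drops back to zero (hypothetical names, not the WiredTiger API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None
        self._refcount = 0

    def open_exclusive(self, session):
        with self._lock:
            if self._owner == session:
                self._refcount += 1       # re-open by the owning session
                return True
            if self._owner is None:
                self._owner = session     # first open takes ownership
                self._refcount = 1
                return True
            return False                  # held exclusively by another session

    def close(self, session):
        with self._lock:
            assert self._owner == session and self._refcount > 0
            self._refcount -= 1
            if self._refcount == 0:
                self._owner = None        # fully unlocked only now

h = ExclusiveHandle()
assert h.open_exclusive("s1")
assert h.open_exclusive("s1")        # same session reopens: refcount is 2
assert not h.open_exclusive("s2")    # another session is refused
h.close("s1")
assert not h.open_exclusive("s2")    # still held: one reference remains
h.close("s1")
assert h.open_exclusive("s2")        # refcount hit zero, now available
```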
@ddanderson
Member Author

@agorrod, here's my latest attempt. Unfortunately, there's a different problem. This hangs in test_backup02.py during the populate. Here's a reduced test case:

#!/usr/bin/env python
#

import wiredtiger, wttest
from helper import compare_files,\
    complex_populate, complex_populate_lsm, simple_populate

# test_backup03.py
#    Utilities: wt backup
# Test cursor backup with target URIs
class test_zz(wttest.WiredTigerTestCase):

    # This test is written to test LSM hot backups: we test a simple LSM object
    # and a complex LSM object, but we can't test them both at the same time
    # because we need to load fast enough that the merge threads can't catch
    # up; that way we test the real database, not what the database might look
    # like after the merging settles down.
    #
    # The way it works is we create 4 objects, only one of which is large, then
    # we do a hot backup of one or more of the objects and compare the original
    # to the backup to confirm the backup is correct.
    pfx = 'test_zz'

    # Create a large cache, otherwise this test runs quite slowly.
    def setUpConnectionOpen(self, dir):
        wtopen_args = 'create,cache_size=1G'
        conn = wiredtiger.wiredtiger_open(dir, wtopen_args)
        self.pr(repr(conn))
        return conn

    # Test backup with targets.
    def test_backup_target(self):
        self.big = 1
        self.list = [1]
        self.tty('--')
        self.tty('PART 0')
        simple_populate(self, 'table:test_zz.1', 'key_format=S', 1000)
        self.tty('PART 1')
        simple_populate(self, 'lsm:test_zz.2', 'key_format=S', 200000)
        self.tty('PART 2')
        complex_populate(self, 'table:test_zz.3', 'key_format=S', 1000)
        self.tty('PART 3')
        complex_populate_lsm(self, 'table:test_zz.4', 'key_format=S', 1000)
        self.tty('DONE populate')

        # Backup needs a checkpoint
        self.session.checkpoint(None)


if __name__ == '__main__':
    wttest.run()

Here are backtraces from the relevant threads. Thread 1 is the application thread:

* thread #1: tid = 0x18f4f78, 0x00007fff8d9df166 libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8d9df166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff97e35696 libsystem_pthread.dylib`_pthread_mutex_lock + 480
    frame #2: 0x00000001045123c9 libwiredtiger-2.6.2.dylib`__wt_spin_lock(session=0x0000000100874600, t=0x0000000103804a40) + 25 at mutex.i:159
    frame #3: 0x000000010451220a libwiredtiger-2.6.2.dylib`__wt_curfile_open(session=0x0000000100874600, uri=0x00000001067720a0, owner=0x0000000106772cb0, cfg=0x00007fff5fbfc240, cursorp=0x0000000106772e10) + 634 at cur_file.c:531

   528           * failing with EBUSY due to a database-wide checkpoint.
   529           */
   530          if (LF_ISSET(WT_DHANDLE_EXCLUSIVE))
-> 531              WT_WITH_CHECKPOINT_LOCK(session, ret =
   532                  __wt_session_get_btree_ckpt(
   533                  session, uri, cfg, flags));

    frame #4: 0x00000001045a5660 libwiredtiger-2.6.2.dylib`__wt_open_cursor(session=0x0000000100874600, uri=0x00000001067720a0, owner=0x0000000106772cb0, cfg=0x00007fff5fbfc240, cursorp=0x0000000106772e10) + 992 at session_api.c:281
    frame #5: 0x0000000104512cde libwiredtiger-2.6.2.dylib`__wt_curindex_open(session=0x0000000100874600, uri=0x0000000104683414, owner=0x0000000000000000, cfg=0x00007fff5fbfc240, cursorp=0x00007fff5fbfc210) + 1550 at cur_index.c:497
    frame #6: 0x00000001045a54ea libwiredtiger-2.6.2.dylib`__wt_open_cursor(session=0x0000000100874600, uri=0x0000000104683414, owner=0x0000000000000000, cfg=0x00007fff5fbfc240, cursorp=0x00007fff5fbfc210) + 618 at session_api.c:265
    frame #7: 0x00000001045a784d libwiredtiger-2.6.2.dylib`__session_open_cursor(wt_session=0x0000000100874600, uri=0x0000000104683414, to_dup=0x0000000000000000, config=0x00000001045d8367, cursorp=0x00007fff5fbfc2b8) + 973 at session_api.c:359
    frame #8: 0x0000000104595413 libwiredtiger-2.6.2.dylib`__fill_index(session=0x0000000100874600, table=0x0000000106d08ba0, name=0x0000000104683414) + 179 at schema_create.c:350
    frame #9: 0x00000001045944fb libwiredtiger-2.6.2.dylib`__create_index(session=0x0000000100874600, name=0x0000000104683414, exclusive=0, config=0x000000010445494c) + 3179 at schema_create.c:542
    frame #10: 0x0000000104592943 libwiredtiger-2.6.2.dylib`__wt_schema_create(session=0x0000000100874600, uri=0x0000000104683414, config=0x000000010445494c) + 467 at schema_create.c:699
    frame #11: 0x00000001045a5d30 libwiredtiger-2.6.2.dylib`__wt_session_create(session=0x0000000100874600, uri=0x0000000104683414, config=0x000000010445494c) + 736 at session_api.c:395

   393      WT_DECL_RET;
   394
-> 395      WT_WITH_SCHEMA_LOCK(session,
   396          WT_WITH_TABLE_LOCK(session,
   397          ret = __wt_schema_create(session, uri, config)));
   398      return (ret);


    frame #12: 0x00000001045a7d50 libwiredtiger-2.6.2.dylib`__session_create(wt_session=0x0000000100874600, uri=0x0000000104683414, config=0x000000010445494c) + 800 at session_api.c:442
    frame #13: 0x000000010445fa73 _wiredtiger.so`_wrap_Session_create(self=<unavailable>, args=<unavailable>) + 252 at wiredtiger_wrap.c:6187

The application is holding the SCHEMA_LOCK (frame 11) and wants the CHECKPOINT_LOCK (in frame 3).

Thread #7 is the LSM worker thread:


  thread #7: tid = 0x18f4fd1, 0x00007fff8d9df166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff8d9df166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff97e35696 libsystem_pthread.dylib`_pthread_mutex_lock + 480
    frame #2: 0x00000001045af129 libwiredtiger-2.6.2.dylib`__wt_spin_lock(session=0x0000000100875200, t=0x0000000103804c40) + 25 at mutex.i:159
    frame #3: 0x00000001045aec69 libwiredtiger-2.6.2.dylib`__wt_session_get_btree(session=0x0000000100875200, uri=0x0000000106c266b0, checkpoint=0x0000000000000000, cfg=0x0000000105209c90, flags=264) + 953 at session_dhandle.c:496

   493              F_CLR(dhandle, WT_DHANDLE_EXCLUSIVE);
   494              WT_RET(__wt_writeunlock(session, dhandle->rwlock));
   495
-> 496              WT_WITH_SCHEMA_LOCK(session,
   497                  WT_WITH_HANDLE_LIST_LOCK(session, ret =
   498                  __wt_session_get_btree(
   499                  session, uri, checkpoint, cfg, flags)));

   frame #4: 0x00000001045ae845 libwiredtiger-2.6.2.dylib`__wt_session_get_btree_ckpt(session=0x0000000100875200, uri=0x0000000106c266b0, cfg=0x0000000105209c90, flags=264) + 373 at session_dhandle.c:317
    frame #5: 0x0000000104512234 libwiredtiger-2.6.2.dylib`__wt_curfile_open(session=0x0000000100875200, uri=0x0000000106c266b0, owner=0x0000000000000000, cfg=0x0000000105209c90, cursorp=0x0000000105209c60) + 676 at cur_file.c:531

   528           * failing with EBUSY due to a database-wide checkpoint.
   529           */
   530          if (LF_ISSET(WT_DHANDLE_EXCLUSIVE))
-> 531              WT_WITH_CHECKPOINT_LOCK(session, ret =
   532                  __wt_session_get_btree_ckpt(
   533                  session, uri, cfg, flags));
   534          else

    frame #6: 0x00000001045a5660 libwiredtiger-2.6.2.dylib`__wt_open_cursor(session=0x0000000100875200, uri=0x0000000106c266b0, owner=0x0000000000000000, cfg=0x0000000105209c90, cursorp=0x0000000105209c60) + 992 at session_api.c:281
    frame #7: 0x00000001045a784d libwiredtiger-2.6.2.dylib`__session_open_cursor(wt_session=0x0000000100875200, uri=0x0000000106c266b0, to_dup=0x0000000000000000, config=0x00000001045cd6af, cursorp=0x0000000105209d28) + 973 at session_api.c:359
    frame #8: 0x000000010448dfb7 libwiredtiger-2.6.2.dylib`__wt_bloom_finalize(bloom=0x0000000106c38830) + 167 at bloom.c:213
    frame #9: 0x000000010455d482 libwiredtiger-2.6.2.dylib`__lsm_bloom_create(session=0x0000000100875200, lsm_tree=0x0000000100384340, chunk=0x0000000105dcf760, chunk_off=1) + 690 at lsm_work_unit.c:425
    frame #10: 0x000000010455ce3e libwiredtiger-2.6.2.dylib`__wt_lsm_work_bloom(session=0x0000000100875200, lsm_tree=0x0000000100384340) + 366 at lsm_work_unit.c:228
    frame #11: 0x000000010455f340 libwiredtiger-2.6.2.dylib`__lsm_worker_general_op(session=0x0000000100875200, cookie=0x0000000103811fc8, completed=0x0000000105209eb8) + 592 at lsm_worker.c:74
    frame #12: 0x000000010455eea2 libwiredtiger-2.6.2.dylib`__lsm_worker(arg=0x0000000103811fc8) + 338 at lsm_worker.c:122
    frame #13: 0x00007fff97e3805a libsystem_pthread.dylib`_pthread_body + 131
    frame #14: 0x00007fff97e37fd7 libsystem_pthread.dylib`_pthread_start + 176
    frame #15: 0x00007fff97e353ed libsystem_pthread.dylib`thread_start + 13

The LSM worker thread has obtained the CHECKPOINT_LOCK (frame 5) and needs the SCHEMA_LOCK (frame 3).

I suppose we could always grab the checkpoint lock before taking the schema lock in __wt_session_create (maybe we only need to do that when creating an index). Any other ideas?
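The hang described above is a classic lock-order inversion: the application thread holds the SCHEMA_LOCK and wants the CHECKPOINT_LOCK, while the LSM worker holds the CHECKPOINT_LOCK and wants the SCHEMA_LOCK. The fix Dave suggests, a single global acquisition order, can be sketched with plain Python mutexes standing in for the WT_WITH_* macros (all names here are illustrative, not WiredTiger code):

```python
import threading

# Two locks standing in for WiredTiger's checkpoint and schema locks.
checkpoint_lock = threading.Lock()
schema_lock = threading.Lock()

def with_locks_in_order(work):
    # Every thread acquires checkpoint before schema, so no cycle of
    # "holds one, waits for the other" can form between threads.
    with checkpoint_lock:
        with schema_lock:
            return work()

results = []
threads = [
    threading.Thread(
        target=lambda: results.append(with_locks_in_order(lambda: "create index"))),
    threading.Thread(
        target=lambda: results.append(with_locks_in_order(lambda: "bloom finalize"))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(results) == ["bloom finalize", "create index"]
```

With opposite acquisition orders in the two threads, the same program could deadlock exactly as in the backtraces; imposing one order is the conventional remedy when neither lock can simply be dropped.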

Unfortunately, building an index and doing a checkpoint are, in the worst case, both long operations. I wonder if there's a way to build the index file first as a sort of temp file (outside the WT schema), and once that's finished, take the schema lock and officially insert it. We'd want to keep a lock on the main file so inserts couldn't happen in the meantime. That's a bigger project, perhaps a followup.

@agorrod
Member

agorrod commented Sep 16, 2015

@ddanderson I wasn't able to reproduce the hang you reported. I've pushed a change that relaxes the bulk=unordered semantic, meaning we don't get exclusive access any longer. Can you retest to see if it resolves the issue for you?

@ddanderson
Member Author

@agorrod Thanks! Perhaps you didn't see the same hang because of the timing of asynchronous checkpoints. I didn't see it when I changed the line simple_populate(self, 'lsm:test_zz.2', 'key_format=S', 200000) to use a lower entry count.

At any rate, your change fixes my hang; I'm bumping the entry counts way up now to make sure.

@ddanderson
Member Author

@agorrod I've done some extra testing and I'm satisfied with this.

@@ -331,6 +331,11 @@ __wt_conn_btree_open(
F_SET(btree, LF_ISSET(WT_BTREE_SPECIAL_FLAGS));

WT_ERR(__wt_btree_open(session, cfg));
if (F_ISSET(dhandle, WT_DHANDLE_EXCLUSIVE) &&
Member

@ddanderson I don't really understand the condition here. Could you add a comment to explain why it's necessary?

@agorrod
Member

agorrod commented Sep 17, 2015

@ddanderson I had a couple of minor questions, but this is generally looking good to me now. Well done!

@ddanderson
Member Author

@agorrod, I tightened up the conditional and commented it as best I could. If a dhandle used by a bulk cursor is allowed to be shared, then test_bulk02.test_bulkload_backup.test_bulk_backup fails with an assertion at txn_ckpt.c:1152 in __wt_checkpoint_sync:

    WT_ASSERT(session, bm != NULL);

I haven't gotten very far in looking at this, and I don't have a deeper understanding of why a bulk dhandle cannot use the sharing code. If you do, maybe we can make the comment more insightful.

@agorrod
Member

agorrod commented Sep 18, 2015

@ddanderson Sorry I didn't get back to this today. It's top of my list next week.

…nglements with the WT_CURSOR_INDEX implementation, which is concerned with reading from indexes.
@michaelcahill
Contributor

@ddanderson, after reviewing where this branch ended up, I pushed cfd31ce in an attempt to simplify things. This populates the index source directly (usually "file:foo" or "lsm:foo" rather than "index:foo"). I think this means we don't need to filter out flags because we avoid the issue of an index cursor opening the column groups.

Can you please review? @agorrod is otherwise happy, I'll do a final review and merge if you agree with this approach.

@ddanderson
Member Author

@michaelcahill LGTM, that's a very nice simplification.

@agorrod
Member

agorrod commented Sep 21, 2015

@ddanderson I'm happy with this change now. @michaelcahill Would you like to do a final review before merging this change?

@michaelcahill
Contributor

Thanks @ddanderson -- congratulations on seeing this through! I'll merge now...

michaelcahill added a commit that referenced this pull request Sep 22, 2015
WT-147: Dynamic Index creation.  Use bulk=unordered
@michaelcahill michaelcahill merged commit 959376c into develop Sep 22, 2015
@michaelcahill michaelcahill deleted the index-create-lsm3 branch September 22, 2015 05:59
@agorrod
Member

agorrod commented Sep 22, 2015

Woo! Thanks for your patience with my review turnarounds @ddanderson

@ddanderson
Member Author

@agorrod and @michaelcahill, thanks guys! Feels good to get it done.
