Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit close+fix panics #12

Merged
merged 5 commits into from Oct 17, 2015

Conversation

Projects
None yet
2 participants
@zenhack
Copy link
Owner

zenhack commented Oct 7, 2015

@kalbasit, this is the branch on which I'm addressing #11. It's not done yet, but I've made progress; the panics from *_destroy are all gone, but there's some other stuff going on. So, not ready for a merge, but maybe an opinion.

@zenhack zenhack added the bug label Oct 7, 2015

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 7, 2015

This also builds atop some of your work, so we should get that finished up and then I should rebase.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 7, 2015

alright cool, I'll address your comment on #10 and review this one tonight.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 7, 2015

In the meantime, it looks like you have a data race https://travis-ci.org/zenhack/go.notmuch/jobs/84025268#L600

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 7, 2015

Thanks for pointing out the race. That isn't surprising, I'll have to think about the best way to address it.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 11, 2015

@zenhack about the data race perhaps atomic.LoadPointer might help. Mutexes could be another way to go, but not sure if we want to put Mutex logic in a SetFinalizer, that might break GC.

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 11, 2015

The race condition is actually a higher-level problem than just a non-atomic memory read. The failure case is this:

  1. In the gc, live() reports that an object is still live.
  2. Outside the gc, somebody calls Close() on a parent object (invalidating the result from (1)).
  3. The caller of live() releases the underlying object (which has already been freed in (2)).

I think we'd actually have to use a sync.RWMutex; the dependency graph isn't even a tree (though it is a DAG, thank god) so trying to use plain old locks would be really messy. It should work to do the following in Close():

  1. Acquire a write lock on the receiver.
  2. Recursively acquire read locks on the parents.
  3. Check if the object is live, free if so (what we're doing now).
  4. Release all the locks.

I've got some other stuff I need to work on today. If I don't hear back from with you with better ideas (which I unfortunately don't have) and/or problems I'll go ahead with that.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 12, 2015

The sync.RWMutex is what I meant earlier and I feel it's the only solution. However, I fear that, if we did end up in a dead lock situation, it will this time be the GC that is dead locked.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 12, 2015

<off-topic>
Today I'm wondering if it would be a much better investment to write the core of notmuch (indexing and searching) using bleve or ElasticSearch (Best would be a swappable backend). Thoughts? Let's talk in #14 about this.
</off-topic>

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 12, 2015

I'll go ahead with the rwmutex solution then -- it'll be a bit tricky, but I don't have any better ideas.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 12, 2015

agree, I can stress-test the bindings within my project, let me know when your PR is ready to be tested.

@zenhack zenhack force-pushed the add-explicit-close+fix-panics branch from 6405dec to 08d0bef Oct 12, 2015

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 15, 2015

@kalbasit, fixed the panics. I should probably squash some of the history, but barring any comments from you, this is what it'll look like. PTAL.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 15, 2015

@zenhack I'll have a look tomorrow morning, thank you.

cstruct.go Outdated
"unsafe"
)

// cStruct is the common representation of almost all of our wrapper types.

This comment has been minimized.

Copy link
@kalbasit

kalbasit Oct 16, 2015

Contributor

minor nitpick, you've got two spaces between all and of

cstruct.go Outdated
cptr unsafe.Pointer

// Parent object. Holding a pointer to this in Go-land makes the reference
// visible to the garbage collector, and thus prevents it from being

This comment has been minimized.

Copy link
@kalbasit

kalbasit Oct 16, 2015

Contributor

ditto between it and from

cstruct.go Outdated
if c.live() {
err = f()
}
if c.parent != nil {

This comment has been minimized.

Copy link
@kalbasit

kalbasit Oct 16, 2015

Contributor

you should probably defer this entire block (the unlock block), in case f() panics for some reason, defering the unlock logic will make it always run no matter what. Most servers tends to recover from panics and the current implementation will lead to a deadlock.

@kalbasit

This comment has been minimized.

Copy link
Contributor

kalbasit commented Oct 16, 2015

@zenhack minor comments but otherwise LGTM!

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 17, 2015

Cool, I'll address those and clean up the history soon, thanks.

zenhack added some commits Oct 7, 2015

Do major refactoring re: resource management.
Changes to the API:

* Most types have a Close() method, which can be used to explicitly free
  the memory used by the type. The garbage collector will still pick
  them up eventually, but since the go GC sometimes doesn't make the
  smartest decisions about foreign objects, this is available.

Functional changes to the implementation:

* Close() methods (which are called by GC) are now much safer and more
  robust:
  * They are now thread-safe.
  * It is safe to call them more than once.
  * It is safe to call Close() on a child object whose parent has
    already been Close()d.

The above issues were causing runtime panics under various
circumstances. There are a few of these still to hunt down.

The details of the internals are described in the file
README-INTERNALS.md

Organizational changes to the implementation:

* Everything with a Close() method is defined in terms of a type
  cStruct, which captures most of the common functionality between them.
  There's generally less boilerplate, and allows only writing the tricky
  synchronization stuff for use with Close()/GC once.
Fix erroneous parent assignments.
I'm not really sure why this was like this in the first place? This
seems to have fixed the panics.
Defer post-finalizer actions.
This could prevent a deadlock, in the event that f panics and something
tries to recover.

@zenhack zenhack force-pushed the add-explicit-close+fix-panics branch from 6cd72c7 to 8046a7b Oct 17, 2015

@zenhack

This comment has been minimized.

Copy link
Owner Author

zenhack commented Oct 17, 2015

Alright, addressed your comments and cleaned up the history. I'm going to let travis do its thing before merging.

@zenhack zenhack merged commit 8046a7b into master Oct 17, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.