Application *locks* has stopped on double write-lock #19

Closed
ddosia opened this issue Oct 21, 2015 · 8 comments

ddosia commented Oct 21, 2015

I have two actors which run at approximately the same time. Each of them begins a transaction. Each acquires a read lock on the same oid(). Then the first tries to upgrade its read lock to a write lock; the second does the same, and the application crashes immediately.
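Condensed into a single shell, the interleaving looks roughly like this (just a sketch, not my actual two-node setup below: the spawned funs stand in for the two actors, and the sleep only approximates the "same time" scheduling):

{ok, _} = application:ensure_all_started(locks),
Upgrade = fun() ->
              {Agent, {ok, []}} = locks:begin_transaction(),
              {ok, _} = locks:lock(Agent, [table], read),
              timer:sleep(100),                  %% let both actors hold read locks
              locks:lock(Agent, [table], write)  %% the upgrade attempt
          end,
spawn(Upgrade),
spawn(Upgrade).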

Logs of the first actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> application:ensure_all_started(locks).
{ok,[locks]}
(n1@dch-mbp)2> {Agent, TrRes} = locks:begin_transaction().
{<0.46.0>,{ok,[]}}
(n1@dch-mbp)3> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)4> locks:lock(Agent, [table], write).
=ERROR REPORT==== 21-Oct-2015::14:45:19 ===
** Generic server locks_server terminating 
** Last message in was {'$gen_cast',{surrender,[table],<0.55.0>}}
** When Server state == {st,{locks_server_locks,locks_server_agents},
                            {dict,2,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]},
                                  {{[],[],[],[],[],[],[],
                                    [[<0.55.0>|#Ref<0.0.0.76>]],
                                    [],[],[],[],[],[],
                                    [[<0.46.0>|#Ref<0.0.0.69>]],
                                    []}}},
                            <0.44.0>}
** Reason for termination == 
** {function_clause,[{locks_server,queue_entries_,
                                   [[{entry,<0.55.0>,<0.53.0>,4,direct}]],
                                   [{file,"src/locks_server.erl"},{line,211}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,212}]},
                     {locks_server,queue_entries,1,
                                   [{file,"src/locks_server.erl"},{line,207}]},
                     {locks_server,notify,3,
                                   [{file,"src/locks_server.erl"},{line,193}]},
                     {locks_server,handle_cast,2,
                                   [{file,"src/locks_server.erl"},{line,142}]},
                     {gen_server,handle_msg,5,
                                 [{file,"gen_server.erl"},{line,604}]}]}

=INFO REPORT==== 21-Oct-2015::14:45:19 ===
    application: locks
    exited: shutdown
    type: temporary
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)
(n1@dch-mbp)5> application:which_applications().
[{stdlib,"ERTS  CXC 138 10","1.19.4"},
 {kernel,"ERTS  CXC 138 10","2.16.4"}]

Logs of the second actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n2@dch-mbp)1> 
User switch command
 --> r 'n1@dch-mbp'
 --> c
Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> {Agent, TrRes} = locks:begin_transaction().
{<0.55.0>,{ok,[]}}
(n1@dch-mbp)2> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)3> locks:lock(Agent, [table], write).
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)

I am new to locks, so I am trying to learn how it works. I do need lock upgrade functionality, which is why I was curious about this. Maybe I am missing something, and what I did goes against the very basics of what locks should do.

uwiger commented Oct 21, 2015

Could you try the PR above (#20)? I added a test case, which failed before this fix.

ddosia commented Oct 22, 2015

It doesn't crash any more, but now it hangs forever on both sides when I try to acquire the write lock.
My naive understanding of a lock upgrade is this: both agents acquire a read lock; the first tries to upgrade to a write lock, which implies that it releases its read lock and moves to the end of the queue; the second does the same and queues up behind the first. Maybe I misunderstand how lock upgrade works? And why doesn't the deadlock detection mechanism prevent this?
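Spelled out as a timeline, the behaviour I expected (this is only my assumption of the semantics, not necessarily what locks promises):

t0  A1 read([table])   -> granted            queue: A1:r
t1  A2 read([table])   -> granted            queue: A1:r, A2:r
t2  A1 write([table])  -> drops its read,    queue: A2:r, A1:w
                          goes to the back
t3  A2 write([table])  -> drops its read,    queue: A1:w, A2:w
                          goes to the back
t4  A1 is granted the write lock; when it is done, A2 gets it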

uwiger commented Oct 31, 2015

It's not a question of the deadlock resolution algorithm, but rather of the lock upgrade semantics. Specifically, the locks_server handles the trivial case of upgrade when there's one read lock, but when there are several read locks, it can't differentiate between agents that want nothing more than a read lock and agents that are holding a read lock but hoping to upgrade.
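Schematically (simplified queue states, not the actual locks_server representation):

One read lock held (the trivial case):

    [table] holders: A1:read
    A1 requests write  -> no other holder, so the upgrade can be granted in place

Several read locks held (the ambiguous case):

    [table] holders: A1:read, A2:read
    A1 requests write  -> the server must wait for A2's read lock to go away,
                          but it cannot tell whether A2 is a plain reader that
                          will release, or an upgrader waiting for A1's read
                          lock in turn; if both upgrade, each waits on the other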

uwiger commented Oct 31, 2015

I'm at a Halloween party, so probably not sober enough to tackle the issue right now, nor would it likely be socially acceptable. ;-)

If contribs are offered, I'll gratefully review them. Otherwise, I'll take a look at this later.

uwiger commented Oct 31, 2015

Another problem is that the test case needs to verify that the two write lock requests reach different results (currently, they both time out, which is wrong).
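Roughly, the check might look like this (a sketch only; the function name and the message-passing scaffolding here are assumptions, not the actual test suite):

double_upgrade(_Config) ->
    Parent = self(),
    Run = fun() ->
              {A, {ok, []}} = locks:begin_transaction(),
              {ok, _} = locks:lock(A, [table], read),
              timer:sleep(100),  %% make sure both hold read locks first
              Parent ! {self(), catch locks:lock(A, [table], write)}
          end,
    P1 = spawn(Run),
    P2 = spawn(Run),
    R1 = receive {P1, Res1} -> Res1 end,
    R2 = receive {P2, Res2} -> Res2 end,
    %% exactly one of the two upgrades should succeed; neither should
    %% simply time out
    [_] = [R || {ok, _} = R <- [R1, R2]],
    ok.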

uwiger commented Nov 1, 2015

I've pushed some fixes to the uw-lock_upgrade3 branch. They seem to fix the problem.

Could you try to verify at your end?

ddosia commented Nov 3, 2015

It works now: the first actor obtains the write lock immediately after the second one tries to acquire its write lock.
Thanks!

uwiger commented Nov 3, 2015

Thanks! I've merged PR #20 into master.

uwiger closed this as completed Nov 3, 2015