Bug in the tb_semaphore_wait implementation in platform/posix/semaphore.c #252

Closed
duyanning opened this issue Dec 5, 2023 · 2 comments

duyanning commented Dec 5, 2023

085: tb_long_t tb_semaphore_wait(tb_semaphore_ref_t self, tb_long_t timeout)
086: {
087:     // check
088:     tb_atomic32_t* semaphore = (tb_atomic32_t*)self;
089:     tb_assert_and_check_return_val(semaphore, -1);
090: 
091:     // init
092:     tb_long_t   r = 0;
093:     tb_hong_t   base = tb_cache_time_spak();
094: 
095:     // wait
096:     while (1)
097:     {
098:         // get post
099:         tb_long_t post = (tb_long_t)tb_atomic32_get(semaphore);
100: 
101:         // has signal?
102:         if (post > 0)
103:         {
104:             // semaphore--
105:             tb_atomic32_fetch_and_sub(semaphore, 1);
106: 
107:             // ok
108:             r = post;
109:             break;
110:         }
111:         // no signal?
112:         else if (!post)
113:         {
114:             // timeout?
115:             if (timeout >= 0 && tb_cache_time_spak() - base >= timeout) break;
116:             else tb_msleep(200);
117:         }
118:         // error
119:         else
120:         {
121:             r = -1;
122:             break;
123:         }
124:     }
125: 
126:     return r;
127: }

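Problem scenario:
Suppose there is a semaphore:
tb_semaphore_ref_t sem;
whose value is 1.

Two threads, A and B, call tb_semaphore_wait concurrently.
A executes line 99 and reads post as 1.
A thread switch occurs here; B executes line 99 and also reads post as 1.
After another switch, A passes the if on line 102; after another switch, B passes the if on line 102 as well.
After another switch, A's tb_atomic32_fetch_and_sub on line 105 decrements the semaphore to 0.
After another switch, B's tb_atomic32_fetch_and_sub on line 105 decrements the semaphore to -1.
After another switch, A executes lines 108-109, leaves the loop, and returns 1.
After another switch, B executes lines 108-109, leaves the loop, and returns 1.
Both waits succeed even though only one unit was available.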
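
The interleaving can be replayed deterministically on a single thread, which makes the outcome easy to check. This is only a sketch: it uses C11 `<stdatomic.h>` in place of tbox's `tb_atomic32_*` calls, and `waiter_t`, `step_read`, `step_check_and_sub`, and `replay_race` are hypothetical names that stand in for the read on line 99 and the check-and-decrement on lines 102-108.

```c
#include <stdatomic.h>

/* Hypothetical per-thread state for the replay. */
typedef struct {
    long post; /* value this "thread" read on line 99 */
    long ret;  /* value it ends up returning */
} waiter_t;

/* Stand-in for line 99: read the current semaphore value. */
static void step_read(atomic_int *sem, waiter_t *w) {
    w->post = atomic_load(sem);
}

/* Stand-in for lines 102-108: test the stale value, then decrement.
 * Returns 1 if the waiter believes it acquired the semaphore. */
static int step_check_and_sub(atomic_int *sem, waiter_t *w) {
    if (w->post > 0) {
        atomic_fetch_sub(sem, 1);
        w->ret = w->post;
        return 1;
    }
    return 0;
}

/* Replays the A/B schedule from the scenario; returns the final value. */
static int replay_race(waiter_t *a, waiter_t *b) {
    atomic_int sem;
    atomic_init(&sem, 1);
    step_read(&sem, a);          /* A: line 99, sees 1 */
    step_read(&sem, b);          /* B: line 99, also sees 1 */
    step_check_and_sub(&sem, a); /* A: 1 -> 0 */
    step_check_and_sub(&sem, b); /* B: 0 -> -1 */
    return atomic_load(&sem);
}
```

Calling `replay_race` with two zero-initialised waiters leaves the counter at -1 while both waiters record a return value of 1, which matches the sequence described above.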

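How did I find this problem?
I use a semaphore as a lock to protect an output stream,
and every now and then something strange happens:
in tb_stream_sync, tb_queue_buffer_pull_init reports the number of elements in the buffer as, say, 200,
but by the time tb_queue_buffer_pull_exit runs, the assert on the first line of that function fails:
tb_assert_and_check_return(buffer && buffer->head && size <= buffer->size);
The cause is that the last condition, size <= buffer->size, does not hold.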

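I checked repeatedly that every operation on this output stream is protected by the same semaphore.
So I began to suspect the semaphore itself,
and found that the semaphore in use was not the Linux semaphore implementation but tbox's own implementation.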

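The deeper root cause of this problem is:
a semaphore cannot be implemented in user mode; it has to be implemented in kernel mode.
In user mode, only a spinlock can be implemented.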
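
On the scope of the problem: the double acquisition in the scenario comes from the read on line 99 and the decrement on line 105 being two separate steps. One common way to fuse them is a compare-and-swap loop. The sketch below uses C11 `<stdatomic.h>` rather than tbox's `tb_atomic32_*` API, and `sem_try_acquire` is a hypothetical name, not a tbox function. It only prevents two waiters from consuming the same unit; putting a waiter to sleep without polling still needs help from the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not part of tbox: take one unit only if the value
 * being decremented is still the positive value that was just observed. */
static bool sem_try_acquire(atomic_int *sem) {
    int post = atomic_load(sem);
    while (post > 0) {
        /* On failure, post is refreshed with the current value and the
         * post > 0 check runs again, so a stale read is never acted on. */
        if (atomic_compare_exchange_weak(sem, &post, post - 1))
            return true;
    }
    return false; /* nothing to take; the caller decides how to wait */
}
```

With the semaphore at 1, the first call succeeds and the second fails, leaving the value at 0 instead of -1.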

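I see that tbox also has a semaphore implementation that relies on the Linux system semaphore. How can I get that implementation to be used?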
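
For reference, a timed wait built directly on the POSIX `sem_t` API from `<semaphore.h>` might look like the sketch below. It does not answer the build-configuration question and is not tbox's code: `posix_sem_wait_ms` is a hypothetical wrapper, it assumes a non-negative timeout in milliseconds, and it mirrors the return convention of the listing above (positive on success, 0 on timeout, -1 on error).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical wrapper: wait up to timeout_ms (>= 0) on a POSIX semaphore.
 * Returns 1 if acquired, 0 on timeout, -1 on error. */
static int posix_sem_wait_ms(sem_t *sem, long timeout_ms) {
    struct timespec ts;

    /* sem_timedwait takes an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) return -1;
    ts.tv_sec  += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(sem, &ts) != 0) {
        if (errno == EINTR) continue;        /* interrupted: retry */
        return errno == ETIMEDOUT ? 0 : -1;  /* timeout or real error */
    }
    return 1;
}
```

Here the check and the decrement happen inside `sem_timedwait`, so the interleaving from the scenario cannot occur, and a waiter blocks instead of polling with a sleep.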

@Issues-translate-bot

Bot detected the issue body's language is not English, translate it automatically.


Title: platform/posix/semaphore.c

@duyanning

I found the fix: when the system is Linux,
xmake/check_interfaces.lua does not check for the existence of semaphore.h.

I copied one line over for it, and that fixed it:
[screenshot of the added line]
