Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fiber.top() too inaccurately measures information about time consumption of the fiber #10018

Open
ochaton opened this issue May 16, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@ochaton
Copy link
Member

ochaton commented May 16, 2024

Tarantool version: 3.0.2

Context is provided to @sergepetrenko

Repro: https://gist.github.com/ochaton/21ffce3128a1f52b1e3b34cdc9e5be90

TLDR:

This code is evaluated correctly

local do_stress = function()
	fiber.yield()
	fiber.name("stress")
	local N = 1000
	while N > 0 do
		for _ in box.space._space:pairs() do
			N = N - 1
			if N % 100 == 0 then fiber.yield() end
			if N <= 0 then break end
		end
	end
end

This code not:

		local do_replace = function(n)
			local fib = require 'fiber'
			fib.yield()
			local limit = 100
			box.begin()
				for i = 1, 1000 do
					limit = limit -1
					box.space.test:replace({(n + i)%1e6, fib.id(), fib.time()})
					if limit <= 0 then
						box.commit()
						box.begin()
						limit = 100
					end
				end
			box.commit()
		end

This is awful:

			local fib = require 'fiber'
			box.begin()
			for i = 1, 100 do
				box.space.test:replace({(n + i)%1e6, fib.id(), fib.time()})
			end
			box.commit()
			fib.yield()

This code works ok:

			local fib = require 'fiber'
			fib.yield()
			local limit = 100
			box.begin()
				for i = 1, 1000 do
					limit = limit -1
					box.space.test:replace({(n + i)%1e6, fib.id(), fib.time()})
					if limit <= 0 then
						box.commit()
						fib.yield()
						box.begin()
						limit = 100
					end
				end
			box.commit()
@ochaton ochaton added the bug Something isn't working label May 16, 2024
@sergepetrenko sergepetrenko self-assigned this May 16, 2024
@ochaton ochaton changed the title fiber.top() provides too inaccurately counts information about time consumption of the fiber fiber.top() too inaccurately measures information about time consumption of the fiber May 17, 2024
@sergepetrenko
Copy link
Collaborator

It seems our cbus endpoint fetch callback (used in fiber_pool as well) may be scheduled before EV_CHECK watchers, because it uses an EV_ASYNC watcher. And all the time spent inside the EV_ASYNC watcher is lost, because we assume an event loop iteration starts with a EV_CHECK watcher execution and ends with EV_PREPARE one.

Example code with no yields:

                        local fib = require 'fiber'                              
                         local limit = 100                                        
                         fib.name("worker")                                       
                         box.begin()                                              
                                 for i = 1, 1000000 do                            
                                         limit = limit -1                         
                                         box.tuple.new{(n + i)%1e6, fib.id(), fib.time()}
                                         if limit <= 0 then                       
                                                 box.commit()                     
                                                 box.begin()                      
                                                 limit = 100                      
                                         end                                      
                                 end                                              
                         box.commit()

With it one event loop iteration takes ~18 seconds on my pc, but fiber.top() registers only a fraction of this time (18500 microseconds)

@sergepetrenko
Copy link
Collaborator

Note how cbus_endpoint accepts a fetch_cb callback invoked on an async watcher:

ev_async_init(&endpoint->async,
(void (*)(ev_loop *, struct ev_async *, int)) fetch_cb);

And fiber_pool passes fiber_pool_cb() as fetch_cb to endpoint_create:
cbus_endpoint_create(&pool->endpoint, name, fiber_pool_cb, pool);

fiber_pool_cb immediately calls an idle fiber, if there is one in the pool, resulting in some work being done inside the EV_ASYNC watcher before the EV_CHECK watcher. And essentially losing track of the time it took.
fiber_pool_cb(ev_loop *loop, struct ev_watcher *watcher, int events)
{
(void) loop;
(void) events;
struct fiber_pool *pool = (struct fiber_pool *) watcher->data;
/** Fetch messages */
cbus_endpoint_fetch(&pool->endpoint, &pool->output);
struct stailq *output = &pool->output;
while (! stailq_empty(output)) {
struct fiber *f;
if (! rlist_empty(&pool->idle)) {
f = rlist_shift_entry(&pool->idle, struct fiber, state);
fiber_call(f);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants