
Arrayqueue #5

Closed

wants to merge 4 commits into from

Conversation


@DallasC DallasC commented Apr 2, 2021

This is still an early draft; I wanted to get your opinion on it.

Summary

I used Crossbeam's ArrayQueue to create a multi-threaded buffer. This allows any thread to write log messages into the buffer without having to acquire mutex locks.

When the ArrayQueue is full, the thread that would overfill it:

  1. Acquires a lock on the underlying Writer
  2. Drains the queue
  3. Writes the drained messages to the file using OS I/O

While this is happening, other threads can continue to do work and write to the buffer. If logs are written to the buffer faster than it can be drained, those threads will also attempt to acquire the file I/O lock and wait, but I don't expect this to happen often.
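The steps above can be sketched in std-only Rust. This is a minimal illustration, not the PR's actual code: `BoundedQueue` is a stand-in for crossbeam's `ArrayQueue` (push fails when full), and `Logger`, `CAP`, and the in-memory writer are all hypothetical names chosen for the sketch:

```rust
use std::collections::VecDeque;
use std::io::Write;
use std::sync::Mutex;

const CAP: usize = 4; // illustrative capacity; the PR picks its own sizes

// Stand-in for crossbeam::queue::ArrayQueue<String>: push fails when full.
struct BoundedQueue(Mutex<VecDeque<String>>);

impl BoundedQueue {
    fn new() -> Self { BoundedQueue(Mutex::new(VecDeque::new())) }
    fn push(&self, msg: String) -> Result<(), String> {
        let mut q = self.0.lock().unwrap();
        if q.len() >= CAP { Err(msg) } else { q.push_back(msg); Ok(()) }
    }
    fn pop(&self) -> Option<String> { self.0.lock().unwrap().pop_front() }
}

struct Logger<W: Write> {
    queue: BoundedQueue,
    writer: Mutex<W>, // the underlying file; locked only while draining
}

impl<W: Write> Logger<W> {
    fn log(&self, msg: String) {
        if let Err(rejected) = self.queue.push(msg) {
            // Queue full: this thread takes the writer lock and drains,
            // while other threads keep pushing into the queue.
            let mut w = self.writer.lock().unwrap();
            while let Some(m) = self.queue.pop() {
                writeln!(w, "{}", m).unwrap();
            }
            writeln!(w, "{}", rejected).unwrap();
        }
    }
}

fn main() {
    // An in-memory Vec<u8> stands in for the log file.
    let logger = Logger { queue: BoundedQueue::new(), writer: Mutex::new(Vec::new()) };
    for i in 0..10 {
        logger.log(format!("msg {}", i));
    }
    let written = String::from_utf8(logger.writer.lock().unwrap().clone()).unwrap();
    println!("{} lines flushed", written.lines().count());
}
```

The key property is that only the draining thread ever holds the writer lock, and only when the queue overflows; the common path is a plain queue push.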

Comparison Overview

| Old (Mutex) | New (ArrayQueue) |
| --- | --- |
| New log! | New log! |
| 1. Acquire mutex lock | 1. Check level |
| 2. Check level (warn, err, ...) | 2. Construct msg |
| 3. Construct msg based on config | 3. Write msg to ArrayQueue |
| 4. Write msg to underlying Writer | 4. If the queue is full, this thread gets a mutex lock, drains the queue, and writes to the file; other threads can still write during this time |
| 5. Free lock | |

With the normal simple log there can be lock contention when multiple threads want to write a log entry at the same time. In addition, the lock is held not just while writing the log message to the Writer but also while the message is being constructed.

With the new version there is no need to acquire a lock when writing a log entry. When the ArrayQueue is close to full, it is drained by whichever thread puts it over its limit. While this is happening, other threads can still write without waiting.

Benchmarks

I added some simple benchmarks using criterion. There are three benchmarks:

  1. arrayqueue: the new ArrayQueue-based log
  2. simple: the simple log writing to a file (as in the example)
  3. simplebuf: same as simple but with a BufWriter around the file

The benchmark spawns threads, and each thread just loops sending a simple log message that includes the thread# and msg#.
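The real benches use criterion, but the workload shape can be sketched with std threads alone. `NMSGS` and the `Vec<String>` sink are illustrative stand-ins (the actual benches log through the logger under test to files in `log/`):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

const NTHREADS: u32 = 4; // matches the constant at the top of each bench file
const NMSGS: u32 = 100;  // illustrative message count per thread

fn main() {
    // Stand-in sink for the logger under test.
    let sink = Arc::new(Mutex::new(Vec::<String>::new()));
    let handles: Vec<_> = (0..NTHREADS).map(|t| {
        let sink = Arc::clone(&sink);
        thread::spawn(move || {
            for m in 0..NMSGS {
                // Same shape as the bench workload: thread# and msg# in each entry.
                sink.lock().unwrap().push(format!("thread {} msg {}", t, m));
            }
        })
    }).collect();
    for h in handles { h.join().unwrap(); }
    println!("total messages: {}", sink.lock().unwrap().len());
}
```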

Running the benchmarks yourself

In the benches folder, at the top of each file, there is `const NTHREADS: u32 = 4;`. For now you have to manually set this in each file to the number of threads you want to spawn.

After that is set, run from the repository root:

```
mkdir log
cargo bench
```

`log` is where the files from the bench get written.

Right now the logs aren't automatically cleared, so don't forget to clear the log folder each time before running the bench.
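The clear-before-each-run step can be scripted with a small helper (not part of the PR; it just resets the `log/` dir the benches write to):

```shell
# Reset the log dir so stale bench output doesn't skew the next run
mkdir -p log
rm -f log/*
echo "log/ cleared"
```

Run this before each `cargo bench` invocation until automatic cleanup is added.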

Things still to do

  • Flush the current buffer if there's an error
  • Find a good size for the buffers; right now I just picked some numbers arbitrarily
  • Do some profiling

Resolves #1

Rust RFC 565 deprecated the need to implement description().
Now we can rely on fmt::Display and just use to_string() directly.
New method: We use Crossbeam ArrayQueue as a multithreaded buffer.
We can write log messages into this buffer from any thread without
locking. When the queue is close to full we get a lock to the underlying
file, drain the queue, and write to the file. During this time other
threads are still able to commit messages to the buffer.

Old method: when a log had to be written, the thread would take a lock
on the Writer, build the log entry, write it to the Writer, and release
the lock. If multiple threads had to write at the same time, this caused
lock contention.
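The RFC 565 change mentioned in the commit message above can be illustrated with a small sketch. `LogError` is a hypothetical error type, not one from the PR; the point is that implementing `fmt::Display` is enough, since `Error::description()` now has a default implementation and `to_string()` comes for free:

```rust
use std::fmt;

// Hypothetical error type illustrating the commit's point.
#[derive(Debug)]
struct LogError(String);

impl fmt::Display for LogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "log error: {}", self.0)
    }
}

// No description() needed: it has a default impl since RFC 565.
impl std::error::Error for LogError {}

fn main() {
    let e = LogError("queue full".into());
    // to_string() is provided automatically via the Display impl.
    println!("{}", e.to_string());
}
```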
@jinmingjian
Contributor

@DallasC I really appreciate your contribution!

Under normal circumstances, developers mostly ignore the performance side of logging. The truth is, locking can cause hiccups in a multi-threaded system, which is harmful while the system is running. However, this harm can be solved with small effort.

Your way is, in fact, a slightly hard one. But I am happy to review your PR. Let's see if we can get to a more interesting solution.

@jinmingjian
Contributor

jinmingjian commented Apr 3, 2021

Really good, from my point of view:

  1. Very clear explanations and instructions, and attention to dependencies
  2. Generally nice engineering practice

Some things could be improved:

  1. The bench panics when I run it! A first look suggests you are new to concurrent programming. The contended "add" in your ThreadWriter causes several logic mistakes in your code. For example, self.buf may not be full at the if check (so it passes), but may be full inside the Ok(_) branch, which then panics.

  2. Another engineering flaw (shared by similar solutions elsewhere): almost 100% of the logs in the buffer are lost when a crash happens.

  3. A wild performance spoiler: one performance flaw is solved by introducing another :) And your solution will not always be the winner (although batching is good).
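The check-then-act hazard from point 1 can be sketched with a stand-in bounded queue (`Bounded` and `CAP` are illustrative names, not code from the PR). Testing fullness first and then pushing is racy, because another thread can fill the queue between the check and the push; the robust pattern branches on the push result itself, which is atomic:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

const CAP: usize = 8; // small on purpose so the queue fills often

// Stand-in for a bounded queue whose push fails when full
// (crossbeam's ArrayQueue behaves the same way).
struct Bounded(Mutex<VecDeque<u32>>);

impl Bounded {
    fn push(&self, v: u32) -> Result<(), u32> {
        let mut q = self.0.lock().unwrap();
        if q.len() >= CAP { Err(v) } else { q.push_back(v); Ok(()) }
    }
    fn pop(&self) -> Option<u32> { self.0.lock().unwrap().pop_front() }
}

fn main() {
    // RACY (the bug described above): `if !is_full { push(v).unwrap() }`
    // — another thread can fill the queue between the check and the push.
    // ROBUST: skip the pre-check and branch on push's Result instead.
    let q = Arc::new(Bounded(Mutex::new(VecDeque::new())));
    let handled = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4).map(|_| {
        let (q, handled) = (Arc::clone(&q), Arc::clone(&handled));
        thread::spawn(move || {
            for v in 0..100 {
                match q.push(v) {
                    Ok(()) => *handled.lock().unwrap() += 1,
                    Err(v) => {
                        // Full: drain, then retry once (the retry may still
                        // fail if another thread refills; ignored in this sketch).
                        while q.pop().is_some() {}
                        let _ = q.push(v);
                        *handled.lock().unwrap() += 1;
                    }
                }
            }
        })
    }).collect();
    for h in handles { h.join().unwrap(); }
    println!("handled {} messages without panicking", *handled.lock().unwrap());
}
```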

Would you like to try some DB work? :) I invite you to join my TensorBase BE project. Once you have some good practice there, you will come back to find that all the problems here are easy to avoid, and that the solution is simple and concise to achieve!

@jinmingjian
Contributor

Just leaving this project alone for a while; hope to see you in TensorBase.

@jinmingjian jinmingjian closed this Apr 3, 2021
Successfully merging this pull request may close these issues: lock-free-ized (#1)