If you split the algorithm into fewer tasks than cores (coarse-grained granularity), you are not taking advantage of all the resources (link)
It's based on the one presented by Intel in their Threading Methodology: Principles and Practices document. (link)
Unless it is imperative, don't include blocking operations inside a critical section. (link)
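One way to apply this tip is to perform the blocking work before acquiring the lock, and enter the critical section only to update shared state. A minimal sketch (the class and field names here are illustrative, and the blocking call is simulated with Thread.sleep()):

```java
public class AvoidBlockingInLock {
    private static final Object lock = new Object();
    private static int shared = 0;

    static void process() throws InterruptedException {
        Thread.sleep(5);          // blocking work done BEFORE taking the lock
        synchronized (lock) {     // critical section kept short
            shared++;
        }
    }

    static int getShared() {
        synchronized (lock) {
            return shared;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        process();
        System.out.println(getShared());
    }
}
```

Keeping the blocking call outside the synchronized block means other tasks are not stalled behind it while they wait for the lock.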
An elegant solution to this problem exists: the Initialization-on-demand holder idiom (https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom). (link)
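A minimal sketch of the idiom (the Singleton class name is illustrative): the JVM loads the nested Holder class, and therefore creates INSTANCE, only on the first call to getInstance(), and class initialization is guaranteed to be thread-safe.

```java
public class Singleton {
    private Singleton() { }

    // Loaded lazily and exactly once, on first access to Holder.INSTANCE.
    private static class Holder {
        static final Singleton INSTANCE = new Singleton();
    }

    public static Singleton getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Every call returns the same lazily created instance.
        System.out.println(getInstance() == getInstance());
    }
}
```

This gives lazy, thread-safe initialization with no explicit locking or volatile fields.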
A good example of such documentation can be found in the compute() method of the ConcurrentHashMap class. (link)
Avoid executing code you don't control inside a critical section. (link)
These variables are instances of classes that support atomic operations on single variables. They provide a method, compareAndSet(expectedValue, newValue), that assigns the new value to the variable in a single atomic step, but only if the variable currently holds the expected value. (link)
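For example, with java.util.concurrent.atomic.AtomicInteger, a compareAndSet() call succeeds only when the current value matches the expected one:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasExample {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(5);
        // Succeeds: the current value matches the expected 5.
        boolean first = counter.compareAndSet(5, 6);
        // Fails: the value is now 6, so the expected 5 no longer matches.
        boolean second = counter.compareAndSet(5, 7);
        System.out.println(first + " " + second + " " + counter.get()); // true false 6
    }
}
```

A failed compareAndSet() is how a concurrent update by another task is detected; the usual pattern is to re-read the value and retry in a loop.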
If only one of the tasks modifies the data and the rest of the tasks only read it, you can declare the variable with the volatile keyword without any other synchronization or data-inconsistency problems. (link)
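A minimal sketch of that single-writer pattern (the class and field names are illustrative): the volatile write establishes a happens-before relationship with every subsequent read, so the reader is guaranteed to eventually see the update without any lock.

```java
public class VolatileFlag {
    // Written by exactly one task; all other tasks only read it.
    private static volatile boolean done = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!done) { } // spins until the writer's update becomes visible
            System.out.println("reader observed done");
        });
        reader.start();
        Thread.sleep(10);
        done = true; // the only write, performed by the single writer task
        reader.join();
    }
}
```

Without volatile, the reader thread could legally spin forever on a stale cached value of done.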
Another option is to use something like ConcurrentHashMap&lt;Thread, MyType&gt; and access it with var.get(Thread.currentThread()) or var.put(Thread.currentThread(), newValue) (link)
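A sketch of that idea, using String as the value type for simplicity: each task stores and reads only the entry keyed by its own Thread object, so values are never shared between tasks.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PerThreadState {
    // One entry per thread, keyed by the Thread object itself.
    static final ConcurrentHashMap<Thread, String> var = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
                var.put(Thread.currentThread(), Thread.currentThread().getName());
        Thread t1 = new Thread(task, "task-1");
        Thread t2 = new Thread(task, "task-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Each thread stored its own value under its own key.
        System.out.println(var.get(t1) + " " + var.get(t2)); // task-1 task-2
    }
}
```

Note that, unlike ThreadLocal, entries in such a map are not removed automatically when a thread dies, so long-lived maps need explicit cleanup.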
Get the information about the system dynamically (for example, in Java you can get it with the method Runtime.getRuntime().availableProcessors()) and make your algorithm use that information to calculate the number of tasks it's going to execute. (link)
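A short sketch of that approach (the sizing rule is an illustrative assumption; the right multiplier depends on whether the workload is CPU- or I/O-bound):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskCount {
    public static void main(String[] args) {
        // Query the hardware concurrency at run time instead of hard-coding it.
        int cores = Runtime.getRuntime().availableProcessors();
        // For a CPU-bound workload, one task per core is a reasonable default.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        System.out.println("cores=" + cores);
        pool.shutdown();
    }
}
```

The same program then scales from a 2-core laptop to a 64-core server without recompilation.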
They are Coffman's conditions: mutual exclusion, hold and wait, no preemption, and circular wait. (link)
This design pattern defines how to use global or static variables locally to tasks. (link)
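In Java, this pattern is provided directly by the ThreadLocal class; a minimal sketch (the class and variable names are illustrative):

```java
public class ThreadLocalExample {
    // Each thread transparently gets its own copy of this "shared" variable.
    private static final ThreadLocal<StringBuilder> buffer =
            ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            buffer.get().append(Thread.currentThread().getName());
            // Each thread sees only what it appended itself.
            System.out.println(Thread.currentThread().getName()
                    + " sees: " + buffer.get());
        };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start(); b.start();
        a.join(); b.join();
    }
}
```

Because every task works on its own copy, no synchronization is needed to access the variable.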
In this circumstance, an exclusive lock provides poor performance, because all the read operations could safely be made concurrently without any problem. (link)
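For this read-mostly case, java.util.concurrent.locks.ReentrantReadWriteLock lets any number of readers proceed in parallel while still giving writers exclusive access; a minimal sketch (the class name and int payload are illustrative):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyValue {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    // Many threads may hold the read lock simultaneously.
    public int read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock excludes all readers and other writers.
    public void write(int v) {
        lock.writeLock().lock();
        try {
            value = v;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadMostlyValue v = new ReadMostlyValue();
        v.write(42);
        System.out.println(v.read()); // 42
    }
}
```

The unlock calls sit in finally blocks so the lock is always released, even if the guarded code throws.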
A memory model describes how individual tasks interact with each other through memory and when changes made by one task will be visible to another. (link)