# Parallelism on JVM I

In this lecture, we study the basic primitives used to express parallel computation on the JVM. There are many forms of parallelism:

* GPU
* Custom Parallel Hardware
* Multi-core CPU
* Multi processors
* Distributed systems

We study a specific parallel programming environment, but we strive to be general. The ideas and algorithms you see generalize to other models easily.

**Our parallel programming model assumption**

- **Multicore or multiprocessor system with shared memory.**
- **Our programs run on the JVM, which itself executes on top of an operating system.**

Operating system – software that manages hardware and software
resources, and schedules program execution.

Process – an instance of a program that is executing in the OS.

The same program can be started as a process more than once, or even
simultaneously in the same OS.

Each time a process is started, the operating system assigns it resources for the duration of its execution, such as execution time on a CPU, file handles, and network ports.

Each process is assigned a unique identifier.

A process is the most coarse-grained unit of concurrency on a shared-memory system.

The operating system multiplexes many different processes onto a limited number of CPUs, so that each process gets _time slices_ of execution. This mechanism is called _multitasking_.

Two different processes cannot access each other’s memory directly: they are isolated. For us, this means that processes cannot easily communicate. While operating system primitives such as pipes allow two processes to exchange information, interprocess communication is usually not straightforward.

We therefore need a more fine-grained programming primitive.

Each process can contain multiple independent concurrency units called
**threads**.

Threads can be started from within the same program, and they share the same memory address space.

Each thread has a program counter and a program stack. The program stack is a region of memory that holds the sequence of method invocations currently being executed. The program counter describes the current position within the method being executed.

JVM threads cannot modify each other’s stack memory. Stack entries correspond to local variables, which are accessible only to the thread that owns the stack. The only memory that threads share is the heap, so to communicate, JVM threads must read and write heap memory.

Each JVM process starts with a main thread, which executes the `main` method of the Scala program. In a normal sequential program, only the main thread executes the program. In a parallel program, however, we must start multiple threads, and the operating system assigns them to the available CPUs.
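As a small sketch of the points above, the following inspects the thread that is currently executing and the number of processors the JVM reports. The object name `MainThreadDemo` and its helper methods are hypothetical, introduced only for illustration; the underlying calls are `Thread.currentThread()` and `Runtime.getRuntime().availableProcessors()` from the Java standard library.

```scala
// Hypothetical helper object, for illustration only.
object MainThreadDemo {
  // Name of whichever thread calls this method.
  def currentThreadName(): String = Thread.currentThread().getName

  // Number of processors the JVM reports as available to it.
  def cpuCount(): Int = Runtime.getRuntime().availableProcessors()

  def main(args: Array[String]): Unit = {
    println(s"running on thread '${currentThreadName()}'")
    println(s"available processors: ${cpuCount()}")
  }
}
```

Run as a standalone program, the first line typically reports the `main` thread; inside a notebook or REPL the calling thread may have a different name.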

To start additional threads:

1. Define a `Thread` subclass.
2. Instantiate a `new Thread` object.
3. Call `start` on the `Thread` object.

The `Thread` subclass defines the code that the thread will execute. The same custom `Thread` subclass can be used to start multiple threads.

In [1]:
```scala
class HelloThread extends Thread {
  override def run(): Unit = {
    println("Hello world!")
  }
}
val t = new HelloThread
t.start()
t.join()
```

```
Hello world!

defined class HelloThread
t: HelloThread = Thread[Thread-0,5,]
```

When the main thread reaches `t.start()`, it starts a new thread that executes the `run` method of `HelloThread`. The two threads then execute in parallel. When the main thread calls `join`, it halts its execution until the `HelloThread` completes. Only after that can the main thread proceed.
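Since threads can communicate only through heap memory, a result computed by a thread has to be stored in a heap object before another thread can read it. The following is a minimal sketch of that pattern, assuming a hypothetical `Mailbox` class and `JoinDemo` object (neither comes from the lecture). It relies on `join`, which guarantees that the worker has finished and that its writes are visible to the caller.

```scala
// Hypothetical names, for illustration only.
class Mailbox {
  var message: String = "" // field of a heap object: reachable by any thread holding a reference
}

object JoinDemo {
  def demo(): String = {
    val box = new Mailbox // allocated on the heap and shared with the worker below
    val worker = new Thread {
      override def run(): Unit = {
        val local = "computed by worker" // local variable: lives on the worker's own stack
        box.message = local              // publish the result through the heap
      }
    }
    worker.start()
    worker.join() // block until the worker finishes; its writes are visible afterwards
    box.message
  }
}
```

Reading `box.message` before the call to `join` would not be reliable, because the worker might not have written it yet.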

Let's do another experiment. Define a new `HelloThread` like this.

In [2]:
```scala
class HelloThread extends Thread {
  override def run(): Unit = {
    println("Hello")
    println("world!")
  }
}
```

```
defined class HelloThread
```

Let's run two `HelloThread` instances in parallel.

In [3]:
```scala
def main(): Unit = {
  val t = new HelloThread
  val s = new HelloThread
  t.start()
  s.start()
  t.join()
  s.join()
}
```

```
defined function main
```

Let's run the `main` method several times. 

In [5]:
```scala
main()
```

```
Hello
world!
Hello
world!
```

In [6]:
```scala
main()
```

```
Hello
world!
Hello
world!
```


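The demo above runs the same two statements in two threads. In the runs shown, each thread happened to print both of its lines before the other one printed anything, but nothing guarantees this: the statements of the two threads can interleave arbitrarily, so an output such as `Hello`, `Hello`, `world!`, `world!` is also possible. Sometimes we want a sequence of statements to execute as if it were a single statement, so that two such sequences in different threads can never overlap: either `t` or `s` executes all of its statements first. In concurrent programming, this property is called atomicity.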

* The previous demo showed that separate statements in two threads can overlap.
* In some cases, we want to ensure that a sequence of statements in a specific thread executes all at once.
* An operation is atomic if it appears as if it occurred instantaneously from the point of view of other threads.

To see why atomicity is important, consider the following method, which is meant to hand out unique IDs.
```scala
private var uidCount = 0L
def getUniqueId(): Long = {
  uidCount = uidCount + 1
  uidCount
}
```
The intent is that once a thread obtains a value from `getUniqueId()`, that value is never returned to any other thread.

We define a new method that starts a new thread that uses `getUniqueId()`.
```scala
def startThread() = {
  // Anonymous Thread subclass: the braces are required to override run().
  val t = new Thread {
    override def run(): Unit = {
      val uids = for (i <- 1 to 10) yield getUniqueId()
      println(uids)
    }
  }
  t.start()
  t
}
```

We start two such threads and see what happens.

```scala
startThread()
startThread()
```

In [7]:
```scala
private var uidCount = 0L
def getUniqueId(): Long = {
  uidCount = uidCount + 1
  uidCount
}
def startThread() = {
  val t = new Thread {
    override def run(): Unit = {
      val uids = for (i <- 1 to 10) yield getUniqueId()
      println(uids)
    }
  }
  t.start()
  t
}
startThread()
startThread()
```

```
Vector(1, 2, 4, 5, 7, 8, 9, 10, 12, 13)
Vector(1, 3, 6, 9, 11, 14, 15, 16, 17, 18)

defined function getUniqueId
defined function startThread
res6_3: Thread = Thread[Thread-7,5,main]
res6_4: Thread = Thread[Thread-8,5,]
```

The IDs obtained by the two threads are not unique: in particular, 1 and 9 each appear in both vectors. `getUniqueId()` does not execute atomically. The separate steps in its body (read `uidCount`, add 1, write the result back) can interleave arbitrarily when executed by different threads. As a consequence, invocations of `getUniqueId()` do not always return unique values.

For example, one thread reads `uidCount` (initially 0) and adds 1 to it, but before it performs the assignment, a second thread reads the same value and does the same computation. Both threads then write the value 1 back into `uidCount`, and as a result both return 1.
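The interleaving described above can be replayed deterministically in a single thread, which makes the lost update easy to see. This is only a sketch: the object `LostUpdateDemo` and the names `readByT0` and `readByT1` are hypothetical, and the two "threads" are simulated by ordering the statements by hand rather than by real scheduling.

```scala
// Hypothetical single-threaded replay of the problematic schedule.
object LostUpdateDemo {
  // Returns (id seen by T0, id seen by T1, final counter value).
  def replay(): (Long, Long, Long) = {
    var uidCount = 0L
    val readByT0 = uidCount // T0 reads 0
    val readByT1 = uidCount // T1 reads 0 before T0 has written anything
    uidCount = readByT0 + 1 // T0 writes 1
    val idT0 = uidCount     // T0 returns 1
    uidCount = readByT1 + 1 // T1 also writes 1, so T0's update is lost
    val idT1 = uidCount     // T1 returns 1 as well
    (idT0, idT1, uidCount)
  }
}
```

Calling `LostUpdateDemo.replay()` yields `(1, 1, 1)`: two calls were made, both received the ID 1, and the counter advanced only once.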


Scala and Java achieve atomicity with the `synchronized` block. The block of code that follows a `synchronized` call on an object `x` is never executed by two threads at the same time.
The JVM ensures this by associating a monitor with each object. At most one thread can own a monitor at a time. For example, if thread `T0` owns the monitor of object `x`, another thread `T1` cannot acquire that monitor until `T0` releases it.
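To make the monitor rule concrete, here is a small sketch that revisits the earlier `Hello` / `world!` experiment. It assumes a hypothetical `AtomicHello` object with a `Greeter` thread class; instead of printing, each thread appends its two words to a shared buffer while holding the monitor of `lock`, so the pair can be checked afterwards.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical names, for illustration only.
object AtomicHello {
  private val lock = new AnyRef               // any object can serve as the monitor
  private val log = ArrayBuffer.empty[String] // shared heap state, only touched while holding `lock`

  private class Greeter extends Thread {
    // Both appends run while this thread owns the monitor of `lock`,
    // so no other Greeter can insert a word between them.
    override def run(): Unit = lock.synchronized {
      log += "Hello"
      log += "world!"
      ()
    }
  }

  def demo(threads: Int): List[String] = {
    lock.synchronized { log.clear() }
    val ts = List.fill(threads)(new Greeter)
    ts.foreach(_.start())
    ts.foreach(_.join())
    lock.synchronized { log.toList }
  }
}
```

With this sketch, `AtomicHello.demo(8)` returns 16 words in which every `Hello` is immediately followed by its own `world!`. The order in which the threads take their turn is still up to the scheduler.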

The `synchronized` method must be invoked on some object. Let's see a demo.

In [10]:
```scala
private val x = new AnyRef {}
private var uidCount = 0L
def getUniqueId(): Long = x.synchronized {
  uidCount = uidCount + 1
  uidCount
}
```

```
defined function getUniqueId
```

The body of `getUniqueId` is now wrapped in a `synchronized` block on object `x`. To verify atomicity, we start two threads again:

In [11]:
```scala
def startThread() = {
  val t = new Thread {
    override def run(): Unit = {
      val uids = for (i <- 1 to 10) yield getUniqueId()
      println(uids)
    }
  }
  t.start()
  t
}
startThread()
startThread()
```

```
Vector(11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

defined function startThread
res10_1: Thread = Thread[Thread-9,5,main]
res10_2: Thread = Thread[Thread-10,5,]
```

Now we get unique IDs: no value appears in both vectors. Which thread receives which IDs still depends on scheduling.
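Eyeballing two vectors is a weak check, so the following sketch runs more threads and verifies uniqueness programmatically. The object `UniqueIdCheck` and its `collect` method are hypothetical names introduced for this example; `getUniqueId` is the synchronized version from above, restated inside the object so that the block is self-contained.

```scala
// Hypothetical test harness, for illustration only.
object UniqueIdCheck {
  private val lock = new AnyRef
  private var uidCount = 0L

  // Same synchronized counter as in the demo above.
  def getUniqueId(): Long = lock.synchronized {
    uidCount = uidCount + 1
    uidCount
  }

  // Starts `threads` workers, each requesting `perThread` IDs, and returns all IDs obtained.
  def collect(threads: Int, perThread: Int): Seq[Long] = {
    val results = new Array[Seq[Long]](threads) // one slot per worker, so workers never share a slot
    val workers = (0 until threads).map { k =>
      new Thread {
        override def run(): Unit = {
          results(k) = (1 to perThread).map(_ => getUniqueId())
        }
      }
    }
    workers.foreach(_.start())
    workers.foreach(_.join()) // after join, every worker's slot is filled and visible
    results.toSeq.flatten
  }
}
```

For instance, `UniqueIdCheck.collect(4, 1000)` should return 4000 values with no duplicates, which can be checked by comparing `ids.distinct.size` with `ids.size`. Removing `lock.synchronized` from `getUniqueId` would make that check likely, though not certain, to fail.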