What are concurrency and parallelism?
What is a data race and a deadlock, and how to avoid them?
What is a goroutine, a channel, a select statement, a wait group, a mutex?
How to write a concurrent program with those tools?
Mutex
Channel
Concurrency
Parallelism
Wait group
Deadlock
Data Race
Select statement
A process is an instance of a program that is being executed by a computer.

If you want to see the list of processes running on your machine, you can open a terminal and run the following command (on UNIX systems):

$ ps xau

You will see the complete list of processes running on your machine. This command outputs the following information:

USER : the user that launched the process
PID : the id of the process (each process on a computer has a unique id, which makes it easy to identify a single process)
%CPU : the percentage of CPU used by the process
%MEM : the percentage of memory used
COMMAND : the command of the process, including its arguments
Here we have two processes:

They were both started with the same command ./programFoo
They have two different process ids: 65865 and 65689.
Both were launched today, at 11:37 and 11:38.

On Windows you can type:

$ tasklist

A thread represents the sequential execution of a set of instructions. A single process can create multiple threads.
For each process we have at least one thread of execution.
By creating more than one thread we create multiple execution streams that may share some data.

“Concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome”1. Concurrency refers to an ability. Go supports concurrency: we can write a Go program that will run a set of tasks concurrently.

Concurrency should not be mixed up with parallelism. “Concurrency is not parallelism” (Rob Pike)2. Parallelism refers to tasks that are executed simultaneously (at the same time). A concurrent program might run tasks in parallel.

A data race may occur if two (or more) threads of execution access a shared variable
AND when one (or more) thread(s) want to update the variable
AND there is no “explicit mechanism to prevent the accesses from being simultaneous” in the threads

Imagine that you have developed an e-commerce website that handles requests concurrently.
You have two clients at the same time who are interested in the same product: a computer screen. The current stock of the product is stored in a database.

| Time | John | Jeanne | Stock in database |
|---|---|---|---|
| 0 | Loads the product page | Loads the product page | 1 |
| 1 | Presses the order button | Waits | 1 |
| 2 | Stock OK? => Yes | Presses the order button | 1 |
| 3 | Updates the stock to 0 | Stock OK? => Yes | 1 |
| 4 | | Updates the stock to 0 | 0 |
| 5 | | Displays confirmation page | 0 |
| 6 | ... | Displays confirmation page | 0 |
Let’s take our example step by step:

At time 0, John and Jeanne load the same product page.
At time 1, John presses the order button.
When John clicks on the order button our script checks the stock in the database.
At that time it’s OK, we have 1 in stock (see the last column). We launch the update process in the database, which will take two units of time.
At times 3 and 4 the stock is being updated. At the beginning of time 5 the product stock is therefore 0 in the database...
But Jeanne has also pressed the order button. Her update of the stock is launched at time 4.
We have sold something that we do not have in stock. This is a data race. It happened because we have two threads (John and Jeanne) that want write access to the same shared variable, and we had no mechanism to prevent the simultaneous accesses.

When you build and test your program you can use the Go race detector. It will detect potential data races:

$ go build -race -o myProgramName main.go
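
To make this more concrete, here is a minimal sketch of a program containing a data race (the counter variable and the loop bounds are purely illustrative, and it uses goroutines, which are introduced later in this chapter). Built or run with the -race flag, the detector reports the two conflicting accesses:

// A data race: two goroutines write to the same variable
// with no synchronization. Run with: go run -race main.go
package main

import (
    "fmt"
    "time"
)

var counter int

func main() {
    go func() {
        for i := 0; i < 1000; i++ {
            counter++ // write from the first goroutine
        }
    }()
    go func() {
        for i := 0; i < 1000; i++ {
            counter++ // concurrent write from the second goroutine
        }
    }()
    // crude wait so both goroutines get a chance to run
    time.Sleep(time.Second)
    fmt.Println("counter:", counter)
}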

A deadlock occurs when two components of a program are waiting for each other. In such a case the whole system is blocked. No progress can be made in the execution of the program.

The dining philosophers problem was originally created by Edsger W. Dijkstra.
The problem is stated as follows:

We have a round table.
5 philosophers are seated around it.
Each philosopher has a plate of spaghetti in front of them.
Between each pair of plates there is a fork.

The rules of the dinner are the following:

To eat, a guest must have two forks.
A guest can only use the fork to their left and the fork to their right.
We need to design a program for each philosopher.
A possible solution is the following:

Repeat until the dinner is finished:

When the right fork is available, take it
When the left fork is available, take it
When you have both forks, eat 100 g of spaghetti
Release the two forks
Let’s load this program into the brain of each philosopher and start the dinner. Each philosopher will be a thread in our main “dinner” program.
When you start the dinner the following actions are triggered:

Philosopher 1 takes fork I (since it’s available)
Philosopher 2 takes fork II
Philosopher 3 takes fork III
Philosopher 4 takes fork IV
Philosopher 5 takes fork V

In this situation each philosopher has executed the first instruction of our program:

When the right fork is available, take it

Then the second instruction:

When the left fork is available, take it

cannot be executed because the left fork is not available. The program is blocked. It’s a deadlock!
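
If you want to reproduce this deadlock in Go, here is a minimal sketch. It relies on goroutines and mutexes, which are both introduced later in this chapter, and the way forks and philosophers are modeled is only an illustration of the naive program above:

// Naive dining philosophers: each philosopher takes the right fork,
// then the left fork. When all of them hold their right fork at the
// same time, nobody can take a left fork and the program hangs
// (the Go runtime will usually report the deadlock).
package main

import (
    "sync"
    "time"
)

func main() {
    const philosophers = 5
    forks := make([]sync.Mutex, philosophers)
    var wg sync.WaitGroup
    wg.Add(philosophers)
    for i := 0; i < philosophers; i++ {
        go func(i int) {
            defer wg.Done()
            right := &forks[i]
            left := &forks[(i+1)%philosophers]
            right.Lock()                      // take the right fork
            time.Sleep(10 * time.Millisecond) // give everyone time to grab a fork
            left.Lock()                       // take the left fork: blocked forever
            // eat 100 g of spaghetti...
            left.Unlock()
            right.Unlock()
        }(i)
    }
    wg.Wait()
}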

The main tool for creating concurrent systems in Go is the goroutine.

Every program has at least one goroutine: the main goroutine. To demonstrate it, let's build a simple program and make it panic. We will see in the stack trace (the error messages) that there is a goroutine behind it:

// concurrency/main-goroutine/main.go
package main

func main() {
    panic("show me the goroutine")
}
Let’s build the program (go build) and launch the executable. We get the following stack trace:

panic: show me the goroutine

goroutine 1 [running]:
main.main()

We have a single goroutine: the main goroutine, which has the index 1.

A goroutine is a function that executes independently from the rest of the program. The basic element of a goroutine is a function: any function can become a goroutine. Launching a goroutine is as simple as calling a function, except that you add the go keyword just before the function call.

We define a function printNumber:

func printNumber() {
    i := 0
    for {
        time.Sleep(1 * time.Second)
        i++
        fmt.Println(i)
    }
}

It begins by initializing a variable i to the value 0.
Then it starts an infinite loop (with the for instruction).
In this infinite loop we pause the program for one second with time.Sleep(1 * time.Second).
After that we increment i (i++) and then we print i.
After that we declare our main function (the main goroutine):

func main() {
    fmt.Println("launch goroutine")
    go printNumber()
    fmt.Println("launch goroutine")
    go printNumber()
    time.Sleep(1 * time.Minute)
}

In this program we launch 2 goroutines sequentially.
At the end we pause the program for 1 minute.
Why this last pause? Without it, the program would stop right after the two go statements because the thread of execution has reached the end of the main function. If we want to see our goroutines execute, we have to wait for them. Launching a goroutine does not block the main thread.
Here is the complete program:

// concurrency/goroutine-example/main.go
package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Println("launch first goroutine")
    go printNumber()
    fmt.Println("launch second goroutine")
    go printNumber()
    time.Sleep(1 * time.Minute)
}

func printNumber() {
    i := 0
    for {
        time.Sleep(1 * time.Second)
        i++
        fmt.Println(i)
    }
}

The execution of this program will output the following:

launch first goroutine
launch second goroutine
1
1
2
2
3
3
4
4

Goroutines can communicate with each other through channels. A channel can be seen as a pipeline of data between two goroutines. A given channel can only carry values of a single type.
Channels can be:

Send only
Receive only
Bidirectional (it can send and receive)

A channel that can only be used to send values of type T is denoted: chan<- T
A channel that can only be used to receive values of type T is denoted: <-chan T
A channel that can be used to send and receive values of type T is denoted: chan T
The zero value of a channel type is nil.
Channels are initialized with the make built-in:
To initialize a bidirectional unbuffered channel of int you can use the following code:

ch1 := make(chan int)

To initialize a bidirectional buffered channel of string you can use the following code:

ch2 := make(chan string, 3)

3 is the capacity of the channel. This is the space allocated by Go to store the values sent to the channel.

A channel is unbuffered when you do not specify its capacity when you create it. A channel with a capacity of zero is also unbuffered.
To create an unbuffered channel you can use the following code:

ch3 := make(chan float64)

Or by explicitly specifying a capacity of 0 (which is equivalent to the previous notation):

ch4 := make(chan float64, 0)

A buffered channel is a channel for which you specify the buffer size when you create it:

ch6 := make(chan float64, 16)

Here we have created a buffered channel with a capacity of 16.
Let’s say that we have a channel called ch5. To send an element to this channel we use the arrow syntax: <-.
Those two characters convey the idea of data that flows from the right to the left.

package main

func main() {
    ch5 := make(chan int, 2)
    ch5 <- 42
}

In the previous snippet:

We initialize a bidirectional buffered channel of integers (ch5)
Then we send the number 42 into the channel with the syntax ch5 <- 42

Send statements have some specific rules:

The channel and the expression are evaluated before communication.
You can send on a channel only if it is open. If you send on a closed channel your program will panic!
If you send on a nil channel it blocks forever.
A channel can be closed with the built-in close. Closing a channel indicates that “no more values will be sent on the channel”3.

You cannot close a receive only channel.
You cannot send data on a closed channel.
You cannot close a channel that is already closed.
You can receive data on a closed channel (see next section).
package main

func main() {
    ch5 := make(chan int, 2)
    ch5 <- 42
    close(ch5)
}

Here we send the value 42 into the channel, then we close it with the close built-in: close(ch5).
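
To illustrate the second rule (you cannot send data on a closed channel), here is a small sketch, separate from the book's numbered examples, that triggers the panic:

package main

func main() {
    ch5 := make(chan int, 2)
    ch5 <- 42
    close(ch5)
    ch5 <- 43 // panic: send on closed channel
}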
To receive something from a channel you can use two types of syntax.

package main

import "log"

func main() {
    var received int
    ch5 := make(chan int, 2)
    ch5 <- 42
    ch5 <- 41
    received = <-ch5
    log.Println(received)
}

In this code snippet:

We create a buffered channel of integers: ch5
We send the values 42 and 41 into the channel
We set the value of received to <-ch5
We print the value of received

The previous program will output: 42.

This syntax is used to check whether we actually received a value from the channel:

x, ok = <-ch
if !ok {
    log.Println("channel closed and drained")
}

In this example the variable x will hold the value received from the channel, and the additional ok variable tells us whether that value was really sent by another goroutine.
The value of ok is a boolean equal to:

true: everything is OK, we received a value that was sent on the channel (nominal case)
false: the channel is closed and its buffer is empty; x then holds the zero value of the channel's element type.
package main

import "fmt"

func main() {
    ch5 := make(chan int, 2)
    ch5 <- 42
    close(ch5)
    received, ok := <-ch5
    fmt.Println(received, ok)
}

This program will output:

42 true

Even though the channel is closed, ok is true because 42 was sent before the close and was still waiting in the buffer.
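
For comparison, once the buffered value has been consumed, a receive on the closed channel returns immediately with the zero value and ok set to false. A small sketch:

package main

import "fmt"

func main() {
    ch5 := make(chan int, 2)
    ch5 <- 42
    close(ch5)
    first, ok := <-ch5
    fmt.Println(first, ok) // 42 true : the value was sent before the close
    second, ok := <-ch5
    fmt.Println(second, ok) // 0 false : the channel is closed and drained
}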

When the arrow points to the channel, we send data into the channel.
Otherwise, we receive data from the channel.

For an unbuffered channel there is no buffer allocated to store the message sent; as a consequence the sender blocks until another goroutine receives the message.

// concurrency/channel-capacity-send/main.go
package main

import (
    "log"
    "time"
)

func main() {
    ch := make(chan int)
    go dummy(ch)
    log.Println("waiting for reception...")
    ch <- 45
    log.Println("received")
}

func dummy(c chan int) {
    time.Sleep(3 * time.Second)
    <-c
}

We create an unbuffered channel of integers ch.
A second goroutine is launched with go dummy(ch).
The goroutine waits 3 seconds, then receives data on the input channel.
In the main goroutine we send the value 45 into the channel.
This program outputs:

2021/02/15 21:53:27 waiting for reception...
2021/02/15 21:53:30 received

The send operation blocks the main goroutine.
It is unblocked when the dummy goroutine receives the value sent.
A buffered channel has a buffer allocated. Therefore the sender only needs to wait until the data has effectively been copied to the internal buffer.
Let's take the previous program and add a capacity to the channel:

// concurrency/channel-capacity-send-buffered/main.go
package main

import (
    "log"
    "time"
)

func main() {
    ch := make(chan int, 1)
    go dummy(ch)
    log.Println("waiting for reception...")
    ch <- 45
    log.Println("received")
}

func dummy(c chan int) {
    time.Sleep(3 * time.Second)
    <-c
}

The program output is:

2021/02/15 21:56:05 waiting for reception...
2021/02/15 21:56:05 received

You can note that the send operation is not blocking: 45 is written to the channel buffer and then the next line of code is executed.

For buffered and unbuffered channels alike, a receive operation blocks the goroutine until a value is available (a sender has sent a value, or the buffer contains at least one element).

The builtin len returns the number of elements queued in the channel buffer.
The builtin cap returns the buffer capacity.
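
A quick illustrative sketch of len and cap on a buffered channel:

package main

import "fmt"

func main() {
    ch := make(chan int, 3) // capacity of 3
    ch <- 1
    ch <- 2
    fmt.Println(len(ch)) // 2 : two values are queued in the buffer
    fmt.Println(cap(ch)) // 3 : the buffer can hold three values
    <-ch
    fmt.Println(len(ch)) // 1 : one value left after a receive
}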

| | Unbuffered channel | Buffered channel |
|---|---|---|
| Init. | make(chan float64) | make(chan float64, 16) |
| Send is blocking... | until the recipient has received the value | until the data is copied to the buffer (when the buffer is full, until a slot is free) |
| Receive is blocking... | until a sender has sent a value | until the buffer contains at least one value |
Here is an example4:

// concurrency/channel-usecase/main.go
package main

import (
    "time"
)

func main() {
    syncCh := make(chan bool)
    // launch a second goroutine
    go func() {
        longTask2()
        // finished
        syncCh <- true
    }()
    longTask()
    // blocks until the second goroutine has finished
    <-syncCh
}

func longTask2() {
    time.Sleep(1 * time.Second)
}

func longTask() {
    time.Sleep(3 * time.Second)
}

The unbuffered channel is used here to synchronize the main goroutine with the second goroutine. The receive operation <-syncCh blocks until the other goroutine has finished. To signal that it has finished, the second goroutine sends the value true into the channel.
Buffered channels are used to limit the throughput between goroutines.
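
One common way to use this property is as a semaphore: the buffer size caps how many goroutines can work at the same time. Here is a sketch (the number of tokens, the number of tasks and the fake workload are arbitrary choices for the example):

// A buffered channel used as a semaphore: at most 3 tasks run at once,
// because only 3 tokens fit in the buffer.
package main

import (
    "log"
    "sync"
    "time"
)

func main() {
    tokens := make(chan struct{}, 3)
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            tokens <- struct{}{}        // acquire a token; blocks if 3 tasks are already running
            defer func() { <-tokens }() // release the token when done
            log.Printf("task %d running", id)
            time.Sleep(500 * time.Millisecond) // simulate some work
        }(i)
    }
    wg.Wait()
}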

// concurrency/deadlock/main.go
package main

import (
    "log"
)

func main() {
    ch := make(chan int, 1)
    go dummy(ch)
    log.Println("waiting for reception...")
    ch <- 45
    ch <- 58
    ch <- 100
}

func dummy(c chan int) {
    smth := <-c
    log.Println("has received something", smth)
}

This code will cause a deadlock. Here is the execution result:

2021/02/16 11:19:57 waiting for reception...
2021/02/16 11:19:57 has received something 45
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
main.main()
    /Users/maximilienandile/Documents/DEV/goBook/concurrency/sendBlocking/main.go:15 +0xea

Process finished with exit code 2

We have created a buffered channel with a capacity equal to 1.
This channel is passed to a new goroutine that will receive data on the channel (the dummy function).
We send three values on the channel: 45, 58 and 100.
The first value is received by the dummy goroutine, and the second one fills the buffer.
When we send the third value on the channel the main goroutine is blocked.
The program would wait indefinitely; the Go runtime detects it and reports a fatal error.

// concurrency/deadlock-unbuffered/main.go
package main

func main() {
    ch := make(chan int)
    ch <- 5
}

This simple program will also cause a deadlock. The main goroutine waits indefinitely for a recipient to receive the data sent.

Select statements are used to choose “which of a set of possible send or receive operations will proceed”5.
A select statement is similar to a switch statement, but for communication operations.
In a select statement you have cases and an optional default case.

The first non-blocking case is chosen.
If 2 or more cases are not blocking, a single one is chosen via a “uniform pseudo-random” selection.
If all cases are blocking, the default case (if present) is chosen.
// concurrency/select-without-default/main.go
package main

import "log"

func main() {

    ch1 := make(chan string, 1)
    ch2 := make(chan string, 1)
    ch1 <- "test"

    select {
    case rec, ok := <-ch1:
        if ok {
            log.Printf("received on ch1 : %s", rec)
        }
    case rec, ok := <-ch2:
        if ok {
            log.Printf("received on ch2 : %s", rec)
        }
    }
    log.Println("end")
}

We have a select statement with 2 receive operations.
We are waiting for reception on ch1 and ch2.
The first non-blocking case is chosen: here the first case (case rec, ok := <-ch1:), because ch1 is the only channel that has received a message. Here is the output of this program:

2021/02/16 17:05:27 received on ch1 : test
2021/02/16 17:05:27 end

// concurrency/select-with-default/main.go
package main

import "log"

func main() {

    ch1 := make(chan string, 1)
    ch2 := make(chan string, 1)
    select {
    case rec, ok := <-ch1:
        if ok {
            log.Printf("received on ch1 : %s", rec)
        }
    case rec, ok := <-ch2:
        if ok {
            log.Printf("received on ch2 : %s", rec)
        }
    default:
        log.Println("default case")
    }
    log.Println("end")
}

Here the default case is chosen:

Nothing has been sent on the two channels.
The two cases of the select statement are blocking; the communication operations cannot proceed.

An empty select statement blocks the goroutine indefinitely.

package main

func main() {
    select {}
}

This program will cause a deadlock.

In this section I will detail some common use cases of channels and selects that I find interesting.

A program can stop in a clean and orderly way if it implements a “graceful shutdown” mechanism.
This mechanism has to detect that the program has been interrupted. The os/signal package has a dedicated function called Notify that can detect OS signals such as an “interrupt”.
After detecting such a signal the program needs to run its specific shutdown logic (save some state, delete some temporary objects, close some connections...).
In this section we will take the example of an HTTP server:

// concurrency/server-graceful-shutdown/main.go
package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // create the notification channel
    bye := make(chan os.Signal)
    signal.Notify(bye, os.Interrupt, syscall.SIGTERM)

    mux := http.NewServeMux()
    mux.Handle("/status", http.HandlerFunc(
        func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "OK")
        },
    ))
    srv := &http.Server{
        Addr:    ":8081",
        Handler: mux,
    }
    // launch the server in another goroutine
    go func() {
        // launch the server
        err := srv.ListenAndServe()
        if err != nil && err != http.ErrServerClosed {
            log.Fatalf("server: %q\n", err)
        }
    }()
    // wait for os signal
    sig := <-bye
    // the code below is executed when we receive an os.Signal
    log.Printf("detected os signal %s \n", sig)
    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    err := srv.Shutdown(ctx)
    cancel()
    if err != nil {
        log.Fatal(err)
    }
}

signal.Notify will use the channel bye to notify our program when it receives:

an interruption signal (os.Interrupt)
a termination signal (syscall.SIGTERM)

A classical server with one route "/status" is created.
The server is launched in a new goroutine.
In the main goroutine we wait for a potential OS signal with sig := <-bye.
Once the signal has been received we execute this code:

log.Printf("detected os signal %s \n", sig)
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
err := srv.Shutdown(ctx)
cancel()
if err != nil {
    log.Fatal(err)
}

We call srv.Shutdown, which gracefully shuts down the server.
It will “first close all open listeners, then close all idle connections, and then wait indefinitely for connections to return to idle and then shut down”6
A timeout is the maximum amount of time that we allow before an event takes place. It can be interesting to implement a timeout when your program relies on external resources that might become unavailable. Without a timeout your program might wait indefinitely for the resource.
With a select statement we can add a timeout case:

// concurrency/timeout/main.go
package main

import (
    "log"
    "time"
)

func main() {
    ch := make(chan int, 1)
    select {
    case rec, ok := <-ch:
        if ok {
            log.Printf("received %d", rec)
        }
    case rec, ok := <-time.After(time.Second * 3):
        if ok {
            log.Printf("operation timed out at %s", rec)
        }
    }
}

time.After returns a receive-only channel of time.Time elements.
After 3 seconds the timeout occurs.
The program output is:

2021/02/16 17:44:12 operation timed out at 2021-02-16 17:44:12.316518 +0100 CET m=+3.00403174
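
The same pattern can be wrapped in a small helper. The function below, receiveWithTimeout, is a hypothetical name (it is not part of the standard library or of the book's examples):

package main

import (
    "log"
    "time"
)

// receiveWithTimeout waits for a value on ch for at most d.
// The boolean is false if the timeout fired (or if the channel
// was closed and empty).
func receiveWithTimeout(ch <-chan int, d time.Duration) (int, bool) {
    select {
    case v, ok := <-ch:
        return v, ok
    case <-time.After(d):
        return 0, false
    }
}

func main() {
    ch := make(chan int, 1)
    if v, ok := receiveWithTimeout(ch, 2*time.Second); ok {
        log.Printf("received %d", v)
    } else {
        log.Println("timed out (or channel closed)")
    }
}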

A wait group is a synchronization tool provided by the standard library. It can be used to wait for a group of goroutines to finish their tasks.

// concurrency/wait-group-without/main.go
package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Printf("Program start \n")
    for i := 0; i < 10; i++ {
        go concurrentTaks(i)
    }
    finishTask()
    fmt.Printf("Program end \n")

}

func finishTask() {
    fmt.Println("Executing finish task")
}

func concurrentTaks(taskNumber int) {
    fmt.Printf("BEGIN Execute task number %d \n", taskNumber)
    time.Sleep(100 * time.Millisecond)
    fmt.Printf("END Execute task number %d \n", taskNumber)
}

We launch 10 goroutines that call the function concurrentTaks(i), where i is the index of the loop counter.
After launching those goroutines we call a function finishTask.
The previous program will output:

Program start
Executing finish task
Program end

It seems that we are not even launching our goroutines! Why? That's because launching a goroutine does not block the main goroutine, and the program exits before the goroutines get a chance to run.
We can modify our source code to add a wait group:

// concurrency/wait-group/main.go
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    fmt.Printf("Program start \n")
    // initialize the wait group
    var waitGroup sync.WaitGroup
    waitGroup.Add(10)
    for i := 0; i < 10; i++ {
        go concurrentTaks(i, &waitGroup)
    }
    waitGroup.Wait()
    finishTask()
    fmt.Printf("Program end \n")

}

func finishTask() {
    fmt.Println("Executing finish task")
}

func concurrentTaks(taskNumber int, waitGroup *sync.WaitGroup) {
    fmt.Printf("BEGIN Execute task number %d \n", taskNumber)
    time.Sleep(100 * time.Millisecond)
    fmt.Printf("END Execute task number %d \n", taskNumber)
    waitGroup.Done()
}

This code snippet will output the following:

Program start
BEGIN Execute task number 3
BEGIN Execute task number 9
BEGIN Execute task number 7
...
END Execute task number 9
END Execute task number 7
END Execute task number 3
Executing finish task
Program end

We have the expected result: our goroutines have executed. Let's see what has changed.
At the beginning of the program we create a variable of type sync.WaitGroup.
We indicate to the wait group that it needs to wait for 10 units of work to be done (waitGroup.Add(10)).
The internal counter of the wait group is incremented by 10.
Then we pass a pointer to the wait group to each of our goroutines:

go concurrentTaks(i, &waitGroup)

The function concurrentTaks is also modified (its signature, but also its body):

func concurrentTaks(taskNumber int, waitGroup *sync.WaitGroup) {
    fmt.Printf("BEGIN Execute task number %d \n", taskNumber)
    time.Sleep(100 * time.Millisecond)
    fmt.Printf("END Execute task number %d \n", taskNumber)
    waitGroup.Done()
}

We call the method waitGroup.Done() at the end of the function.
The Done method decrements the internal counter.
When the internal counter reaches 0 (meaning that all our goroutines have finished) the main goroutine is released7.

for i := 0; i < 10; i++ {
    //...
}
waitGroup.Wait() // we block the main goroutine (to wait for the previous goroutines to finish)
finishTask()
fmt.Printf("Program end \n")

var waitGroup sync.WaitGroup // declare a wait group (the zero value is ready to use)

waitGroup.Add(1)  // increment the internal counter by 1

waitGroup.Done()  // decrement the internal counter by 1

waitGroup.Wait()  // block until the internal counter reaches 0
An e-commerce website saves each visit in a database. For each visit we have:

A unique id (UUID v4)
The page visited (string)
The date of the visit
A session hash

ex: 8722b6f78a69aeac3736bfcaa1dd7e4e7a77834dec2adf27e007ed1c998b34df
It is used to group page views by session.

The marketing team wants to list the most popular products and their related performance.
They want to have, for each date, the number of visits for each page:

Date: 25 January 2020:

/product-45 : 254 visits
/product-22 : 2345 visits
/home : 2000 visits
...

Date: 30 January 2020:

/product-45 : 125 visits
/home : 2000 visits
...
After querying the database of visits you have extracted a JSON file with the following format:

{
  "20-02-2019": [
    {
      "id": "b5c92427-7095-470e-985b-8df7ebe7ed2f",
      "page": "/product-8",
      "sessionHash": "48cd2ad413df8fd263d745275a6a9a95b9cad5374b67d6877c764fb40f29fd0c"
    },
    {
      "id": "bdbd3948-b336-4cb1-badd-390dfe8c0459",
      "page": "/product-1",
      "sessionHash": "c02793f8fb7bb09efb7721aa5962c1b8de809e7ac720463f6ea66c0f4f7957b2"
    },
    {
      "id": "b9a70a12-f6b1-4d7c-ae61-73fcd80b9992",
      "page": "/product-8",
      "sessionHash": "774f6b0f07c2ee4bce93ee36a9678b2f1022cda646ce36b1fc3b3205dee5646a"
    }
  ],
  "19-02-2019": [
    {
      "id": "9c2887c4-19ed-4cce-8998-3fe8b3662fba",
      "page": "/product-6",
      "sessionHash": "5d91fe5022b6186dc1fb81d8f31122fccf0a036f5f381d7dfc298b6aaf889e35"
    },
    {
      "id": "adbf1651-f6bd-453a-97af-c860d153e981",
      "page": "/product-3",
      "sessionHash": "48cd2ad413df8fd263d745275a6a9a95b9cad5374b67d6877c764fb40f29fd0c"
    },
    {
      "id": "80a09881-675f-4569-bdb0-28c6983aaee4",
      "page": "/basket",
      "sessionHash": "5d91fe5022b6186dc1fb81d8f31122fccf0a036f5f381d7dfc298b6aaf889e35"
    }
  ]
}

You are asked to build a program with the following input/output:

INPUT: a JSON file similar to the previous snippet.
OUTPUT: the statistics for each day, formatted in JSON.

Example OUTPUT:

[
  {
    "date": "20-02-2019",
    "byPage": {
      "/product-8": 2,
      "/product-1": 1
    }
  },
  {
    "date": "19-02-2019",
    "byPage": {
      "/basket": 1,
      "/product-3": 1,
      "/product-6": 1
    }
  }
]

Your program should use channels and wait groups.

You will need to load the contents of the INPUT file into memory.
You will also need to unmarshal the JSON data loaded.
Think about how you can split the work into multiple independent tasks.
Create a worker function that will handle a single task.
Your workers can communicate with your main goroutine through channels.

Create a directory.
Then launch the command go mod init name/of/your/module to initialize the module.
First we create a struct that will allow us to unmarshal the given input:

// concurrency/application1/visit/visit.go
package visit

type Visit struct {
    ID          string `json:"id"`
    Page        string `json:"page"`
    SessionHash string `json:"sessionHash"`
}

This struct type is placed into a new package visit.

// concurrency/application1/main.go

data, err := ioutil.ReadFile("data.json")
if err != nil {
    log.Fatal(err)
}
dayStats := make(map[string][]visit.Visit)
err = json.Unmarshal(data, &dayStats)
if err != nil {
    log.Fatal(err)
}

The file data.json is loaded.
Next we initialize a map dayStats that will contain the unmarshaled data: the keys are dates, the values are slices of visit.Visit.
json.Unmarshal parses the input JSON and stores the result into dayStats.
We can split the work by date: each daily batch of visits can be treated independently. We create a Task struct:

type Task struct {
    Date   string
    Visits []visit.Visit
}

We will give each worker a variable of type Task through an input channel.
The output of our program is a slice of daily statistics. We create a struct type DailyStat to hold the stats for a particular day:

type DailyStat struct {
    Date   string         `json:"date"`
    ByPage map[string]int `json:"byPage"`
}

The final output has the type []DailyStat (a slice of the previous struct).
var w8 sync.WaitGroup
w8.Add(numberOfWorkers) // one unit of work per worker: each worker calls Done() once

We initialize a wait group: w8.
Right after initialization we add the value numberOfWorkers to its internal counter.
By doing so we prepare our program to wait for each worker to finish.
inputCh := make(chan Task, 10)
outputCh := make(chan DailyStat, len(dayStats))

inputCh will be used to send the tasks to the workers.
This channel has a buffer of 10.

By setting the buffer size to 10 we control the “throughput” (the rate at which our tasks are handed to the workers).
The main goroutine can have at most 10 pending tasks in the channel.
When 10 tasks are waiting in the buffer, the main goroutine blocks until it can add a new task.

outputCh will be used to communicate the results from the workers to the main goroutine.
// concurrency/application1/main.go

func worker(in chan Task, workerId int, out chan DailyStat, w8 *sync.WaitGroup) {
    for received := range in {
        m := make(map[string]int)
        for _, v := range received.Visits {
            m[v.Page]++
        }
        out <- DailyStat{
            Date:   received.Date,
            ByPage: m,
        }
        fmt.Printf("[worker %d] finished task \n", workerId)
    }
    // when the channel is closed the for loop is exited
    log.Println("worker quit")
    w8.Done()
}

The worker is a function. It takes as parameters an input channel and an output channel.
With a for loop we range over the messages received from the input channel.
Each message is a variable of type Task.
With a map we count the visits for each page:

Key: the page name (ex: /product-8)
Value: the number of visits (ex: 458)

Once the visits are counted we send the result to the output channel.
The for loop exits when the input channel is closed.
At the end the wait group counter is decremented by one: w8.Done().
We signal to the main goroutine that one worker has finished its mission.
// create the workers
for k := 0; k < numberOfWorkers; k++ {
    go worker(inputCh, k, outputCh, &w8)
}
// send the tasks
for date, visits := range dayStats {
    inputCh <- Task{
        Date:   date,
        Visits: visits,
    }
}
// we say that we will not send any new data on the input channel
close(inputCh)
// wait for all tasks to be completed
w8.Wait()

We create our workers with a for loop.
We iterate over dayStats to send the tasks to the input channel.
We close the input channel when all tasks are sent.
w8.Wait() waits for all workers to finish.
// when all treatment is finished we close the output channel
close(outputCh)
// collect the result
done := make([]DailyStat, 0, len(dayStats))
for out := range outputCh {
    done = append(done, out)
}

res, err := json.Marshal(done)
if err != nil {
    log.Fatal(err)
}
err = ioutil.WriteFile("results.json", res, 0644)
if err != nil {
    log.Fatal(err)
}
log.Println("done !")

The last part is simple:

We close the output channel.
Then we collect the results from the output channel (with a for loop).
Each result is appended to the done slice.
done is then marshaled to JSON and the result is persisted to the results.json file.
A mutex is a synchronization tool. Mutex is the abbreviation of Mutual Exclusion. Let's make a detour via logic theory to better understand this concept.
We say that two events are mutually exclusive if they cannot occur at the same time. For example the two events “the bike turns left” and “the bike turns right” are mutually exclusive: the biker cannot turn left and right at the same time.
This mutual exclusion property is interesting for us, because when we build concurrent programs a data race may occur.
As a reminder, a data race may occur if two (or more) threads of execution access a shared variable
AND when one (or more) thread(s) want to update the variable
AND there is no “explicit mechanism to prevent the accesses from being simultaneous” in the threads.
We can avoid a data race by using a mutex. I'll show you how:

// concurrency/mutex-bug/main.go
package main

import (
    "fmt"
    "log"
    "net/http"
)

var requestCount int

func main() {
    http.HandleFunc("/status", status)
    err := http.ListenAndServe(":8090", nil)
    if err != nil {
        log.Fatal(err)
    }
}

func status(w http.ResponseWriter, req *http.Request) {
    requestCount++
    fmt.Fprintf(w, "OK - count : %d \n", requestCount)
}

We created here a simple HTTP web server.
It has one single route: "/status".
We created a global variable requestCount of type int.
This variable is incremented each time a request is sent to the route "/status".
Let's create another program that will call this server...

// concurrency/mutex-server-call/main.go
package main

import (
    "io/ioutil"
    "log"
    "net/http"
    "sync"
)

func main() {
    var w8 sync.WaitGroup
    w8.Add(10)
    for k := 0; k < 10; k++ {
        go caller(&w8)
    }
    w8.Wait()
}

func caller(w8 *sync.WaitGroup) {
    for k := 0; k < 100; k++ {
        res, err := http.Get("http://localhost:8090/status")
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()
        s, err := ioutil.ReadAll(res.Body)
        if err != nil {
            log.Fatal(err)
        }
        log.Println(string(s))
    }
    w8.Done()
}

This program launches 10 goroutines that will each send 100 requests to the server concurrently (10 * 100 = 1,000 requests). Note that we use a wait group to wait for all goroutines to finish.
This program prints the body received for each request. Here are the 3 last lines:

//...
2021/02/17 18:43:49 OK - count : 989 
2021/02/17 18:43:49 OK - count : 990 
2021/02/17 18:43:49 OK - count : 991

Wow! We made 1,000 requests but our counter displays 991... Some increments were lost because of the data race on requestCount.

Let's add a mutex to our previous code:

// concurrency/mutex-added/main.go
package main

import (
    "fmt"
    "log"
    "net/http"
    "sync"
)

var mu sync.Mutex
var requestCount int

func main() {
    http.HandleFunc("/status", status)
    err := http.ListenAndServe(":8091", nil)
    if err != nil {
        log.Fatal(err)
    }
}

func status(w http.ResponseWriter, req *http.Request) {
    mu.Lock()
    requestCount++
    mu.Unlock()
    fmt.Fprintf(w, "OK - count : %d \n", requestCount)
}

I created a new global variable named mu of type sync.Mutex.
In the status function we added 2 new lines around the incrementation:

Before incrementing we call mu.Lock()
After incrementing we call mu.Unlock()
If we test the program the counter works as expected this time. (Strictly speaking, the read of requestCount passed to fmt.Fprintf should also happen while the lock is held; copying the value before calling mu.Unlock() would make the handler completely race-free.)

The sync.Mutex API is simple. It has 2 exported methods:

Lock(): this method attempts to acquire the lock. If the lock is already in use by another goroutine, it blocks until the lock is free.
Unlock(): this method releases the lock. It generates a runtime error if the mutex was not locked before.
There is a pretty common idiom called the “mutex hat”8. In our previous example we created a global mutex. Doing so is not advised: mutexes are usually embedded in struct types.
It is common to see a field of type sync.Mutex in a struct. Generally it is placed on top of the shared resources it has to protect.
Here is an example from the standard library (file database/sql/sql.go, package sql):

type DB struct {
    //...

    mu           sync.Mutex // protects following fields
    freeConn     []*driverConn
    connRequests map[uint64]chan connRequest
    nextRequest  uint64 // Next key to use in connRequests.
    numOpen      int    // number of opened and pending open connections

    //...
}

Here the mutex mu protects 4 fields. It is placed on top of those fields (like a hat).
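
Applied to our request counter, the same idiom could look like the sketch below. The Counter type and its methods are illustrative (they are not part of the server example above); note the use of defer so that the mutex is always released:

package main

import (
    "fmt"
    "sync"
)

// Counter groups the mutex with the value it protects (the mutex hat).
type Counter struct {
    mu    sync.Mutex // protects count
    count int
}

func (c *Counter) Increment() {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.count++
}

func (c *Counter) Value() int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.count
}

func main() {
    var c Counter
    var wg sync.WaitGroup
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            c.Increment()
        }()
    }
    wg.Wait()
    fmt.Println(c.Value()) // always 1000
}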

Designing concurrent programs is not easy. It requires some experience, but also some methodology. In this section we detail some tips that you can use to compose concurrent programs.
We will use some concepts and ideas developed by T. Mattson in his excellent book.

A task is an operation (which can itself be composed of sub-operations). A task can be composed of:

Input/Output operations:

Reading a file
Opening a TCP connection

A “computational” operation:

Getting the remainder of an integer division
Iterating over a slice to find a specific element index
A task can be a single function, a method, or a group of functions and methods.

When considering a task it can be helpful to clearly isolate:

The input: the data used by the task.
Ex:

A slice of integers
A map
A pointer to a variable

The output: what is returned by the task.
Ex:

A modified version of the input data
The result of a computation
A date, a string, a duration
A dependency is a link between two or more tasks. We can distinguish two types of dependencies:

Order dependency: the task needs to be performed before or after another one.
Data dependency: the execution of two tasks depends on the same piece of data.

Usually an order dependency is quite easy to understand and detect. For instance, let's take the example of an e-commerce website. We have isolated the following two tasks:

Processing the payment of the order
Shipping the parcel

It's obvious that we need to process the payment before sending the goods. There is an order dependency between those two tasks: one task needs to be done before the other one.

Let's keep the example of the e-commerce software. Usually this kind of application stores the product details and the stock for each product. You have the two following tasks:

Decreasing the stock when an order has been placed
Reading the stock to display it on the front end of the site

Those two tasks depend on the stock data. In the first task (decreasing the stock) we update it. In the second one we just read it. Both operations depend on the stock variable.
What happens if you read the stock and update it at the exact same time? You might display wrong stock information to the client. Data dependencies cause data races if you do not implement mechanisms to prevent them.

The first step is to identify:

Tasks
Task inputs / outputs
Dependencies between tasks

The next step is to use this information to build your program.

If you have no dependencies between your tasks you are in a dream situation: all your tasks can be executed concurrently.
If you notice an order dependency then you can build a chain of workers:

Let's say you have 3 groups of tasks.
We can create 3 workers.
The workers will execute concurrently, but you have to make sure that tasks will be executed in the given order.
This can be achieved with 3 channels linking the 3 workers (see the sketch after this list).
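
A minimal sketch of such a chain, with three hypothetical stages connected by channels (the stage logic is arbitrary; the point is the wiring):

// Three workers chained by channels: stage 1 -> stage 2 -> stage 3.
// Each stage receives from the previous channel and sends to the next,
// so the order dependency between the groups of tasks is preserved.
package main

import "fmt"

func main() {
    c1 := make(chan int)
    c2 := make(chan int)
    c3 := make(chan int)

    go func() { // worker 1: produces values
        for i := 1; i <= 3; i++ {
            c1 <- i
        }
        close(c1)
    }()
    go func() { // worker 2: first transformation
        for v := range c1 {
            c2 <- v * 10
        }
        close(c2)
    }()
    go func() { // worker 3: second transformation
        for v := range c2 {
            c3 <- v + 1
        }
        close(c3)
    }()

    for v := range c3 { // the main goroutine collects the results
        fmt.Println(v) // 11, 21, 31
    }
}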
When you spot a data dependency your program can be subject to data races.
Of course those 3 situations may appear at the same time. As a consequence, writing a concurrent program can be difficult. However, mapping tasks and dependencies certainly helps in the design process.

// concurrency/test-question-1/main.go
package main

import "fmt"

func main() {
    go count()
}

func count() {
    for k := 0; k < 10; k++ {
        fmt.Println(k)
    }
}

What is the output of program 1?
True or False? In a Go program you have at least one goroutine.
How do you initialize an unbuffered channel of uint8?
In which case(s) is a send operation blocking?
In a select statement, in which case(s) is the default case executed?
True or False? You should always pass a copy of a wait group to a goroutine, otherwise a data race may occur.
Fill in the blank. The method ____ from sync.WaitGroup decrements its internal counter.
Describe (with your own words) what a mutex is.
Can you detect data races when you build your programs?
What is the output of program 1?

Nothing.
In this program we launch a goroutine.
The main goroutine will NOT WAIT for the second goroutine to return, so the program exits before count prints anything.
You can use a wait group, for instance, to wait for the end of the second goroutine.

True or False? In a Go program you have at least one goroutine.

True.
It is called the “main goroutine”.

How do you initialize an unbuffered channel of uint8?

With the make builtin:

make(chan uint8)

In which case(s) is a send operation blocking?

For unbuffered channels: the send blocks until another goroutine receives the value.
For buffered channels: the send blocks when the buffer is full, until a slot becomes free.

In a select statement, in which case(s) is the default case executed?

A select statement lists communication operations.
When no communication operation can proceed, the default case is executed.

True or False? You should always pass a copy of a wait group to a goroutine, otherwise a data race may occur.

False.
You should always pass a pointer to a wait group to your goroutines.

Fill in the blank. The method ____ from sync.WaitGroup decrements its internal counter.

Done()
Add(-1) is also valid.

Describe (with your own words) what a mutex is.

A mutex is a synchronization primitive that is used to synchronize the usage of a shared resource.
It is a protection against data races.

Can you detect data races when you build your programs?

Yes!
Go can detect them.
Use the -race flag when you build your program.
A goroutine can be created with the go statement.
Any function can be executed in a goroutine:

go myFunc(a,b)

A data race may occur if two (or more) threads of execution access a shared variable
AND when one (or more) thread(s) want to update the variable
AND there is no “explicit mechanism to prevent the accesses from being simultaneous” in the threads.
To detect data races you can use the -race flag when building or testing your programs.
To protect a shared resource you can use a mutex:

var mu sync.Mutex
A deadlock occurs when two components of a program are waiting for each other.
A channel is a typed communication pipeline.
Goroutines can communicate together thanks to channels.
Channels are allocated with the make builtin:

Bidirectional:

ch := make(chan int)

Send only:

ch := make(chan<- int)

Receive only:

ch := make(<-chan int)
A channel can be buffered or unbuffered. The buffer size is set when the channel is created.

Unbuffered:

ch := make(chan int)

Buffered:

ch := make(chan int, 8)

The arrow operator is used to send/receive data to/from a channel.

Send:

syncCh <- true

Receive:

received := <-syncCh

Closing a channel = no more values will be sent on the channel:

close(ch)

The builtin len returns the number of elements queued in a channel.
Select statements are used to choose “which of a set of possible send or receive operations will proceed”9.

select {
case rec, ok := <-ch1:
    if ok {
        log.Printf("received on ch1 : %s", rec)
    }
case rec, ok := <-ch2:
    if ok {
        log.Printf("received on ch2 : %s", rec)
    }
default:
    log.Println("default case")
}

The first non-blocking operation will be executed.
When all operations are blocking, the default case is executed (if present).

sync.WaitGroup is a handy struct type that can be used to wait for a group of goroutines to finish:

// in the main goroutine
var w8 sync.WaitGroup
w8.Add(numberOfWorkers)

// when a worker has finished
w8.Done()

// blocks until all goroutines are done
w8.Wait()

1. Source: https://en.wikipedia.org/wiki/Concurrency_(computer_science)
2. https://vimeo.com/49718712
3. https://golang.org/ref/spec#Close
4. Taken from https://golang.org/doc/effective_go#channels
5. Source: https://golang.org/ref/spec#Select_statements
6. Source: Go source code documentation
7. https://golang.org/src/sync/waitgroup.go
8. Identified by Dmitri Shuralyov: https://dmitri.shuralyov.com/idiomatic-go#mutex-hat
9. Source: https://golang.org/ref/spec#Select_statements