Exploring Concurrency Models: Introduction
Preface: The software development has changed a lot in last 10 years, where the big focus is on performance and advent of multicore cpus, has supported the need/advent of concurrent programs. This blog series is an attempt to understand, and to benchmark various concurrency models prevelant in the industry. I have used Java as my primary language, but some of the concurrency models are not natively present in Java(some of them still can be written, but they will be pain to reproduce and unoptimised), so I had to use other languages too someplaces.
Benchmarking
I have written microbenchmarks for the codes in this series. All the codes are available at github at:-https://github.com/mangatmodi. Please note that the microbenchmarks are seldom accurate and are impacted by many environment factors like - current stress on the machine, network latency, caching of data at many levels. Affects of these factors can be minimised to some extent by following practices:-
- Its a good practice to warm up the machine for Java and other JIT based languages, as it will eliminate the compilation time in the computation.
- Run the tests multiple times at different times and then take average, it will help to normalise noise in the numbers due to external factors.
- Remove the results which lies outside 95% confidence interval, to eliminate the outliers.
Java Performance: The Definitive Guide by Scott Oaks is an excellent text covering the basics and setup of the microbenchmarks. Again I have not tried to go very technical with the accuracy of the benchmarks as the focus of this blog series is to study and implement difference between the concurrency models. A comparative result of benchmarks will give a rought estimate of what kind of model excels in which problems.
Finding the perfect problem!
In theory any problem which can solved with divide and conquer strategy can be exploited for concurrency. I have selected following two problems to implement :-
MergeSort: It is a classic divide and conquer algorithm, where you simply divide the array into smaller arrays until they are trivially sorted, and then you merge. The worst time complexity is O(nlog n), however with perfect concurrency model, it is equal to the number of levels of recursion, as each the steps in the same recursive level are . So the complexity reduces to O(log n). Below figure shows the recursion tree here.
WebScrap + Count: Another popular problem with concurrent engineers, where give a url(root) —
- We download the webpage and count the number of words.
- Extract the links in the text referring to same domain.
- Do it recursively for the newly found links in the previous step.
The tasks -1,2 can be done parallely with each other and between the links which don’t exist in sam heirarchy. The solution will have recursion tree significantly wider than mergesort, and a single step for merge(count).
This blog is first part of the series, in next blog we will be looking at our first concurrency model also we will see the challenges and hidden issues in the solution of the above problems.