February 11, 2008
Say you have a file with a huge number of records to process, and processing one record does not need data from the processing of previous records (i.e. a perfectly parallelizable situation). What can you do to speed things up? Well, here’s what I did when a client recently requested statistical data for 300K records instead of his usual 20. The program we had written earlier for this purpose wasn’t really made to run fast and wasn’t multi-threaded.
Use the “split” command to split the file by number of lines into an appropriate number of smaller files, depending on the configuration of your hardware. The “-l” option specifies the number of lines in each file. Then run multiple instances of your program to process the different files in parallel, putting each one in the background with “&”. Use “wait” to wait for all background tasks to end. Finally, when things are done, merge the different output files together using “cat” with the “>>” append redirection. Voila! Things finish much, much faster.
I used the above steps to make 20 files with 15K lines each, since the server I was running the script on was a Sun T2000 running Solaris 10, with 32 GB RAM and an eight-core processor supposedly capable of running 32 threads in parallel. It worked like a charm!
Sample script follows:
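# Split the input into files of 15000 lines each (chunk_aa, chunk_ab, ...).
# An explicit prefix keeps the globs below from matching unrelated files.
split -l 15000 originalFile.txt chunk_
# Start one background instance of the processor per chunk.
for f in chunk_*
do
  runDataProcessor "$f" > "$f.out" &
done
# Block until every background job has finished.
wait
# Append the per-chunk outputs to the combined file, in chunk order.
for k in chunk_*.out
do
  cat "$k" >> combinedResult.txt
done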
Code, Tips | Tagged: Linux, Shell script, Solaris, Threads, Unix |
Posted by Onkar Joshi
February 1, 2008
Here is a very simple and straightforward benchmark to demonstrate how synchronization in Java can affect speed of execution to different extents in Java 1.4 and Java 6.

In the attached image, you can see that for Java 1.4 the synchronized method needs over 700% more time to do its work than the non-synchronized method. For Java 6 the difference is smaller, with the synchronized method needing only 400% more time than the non-synchronized method.
Remember, JVMs, especially newer ones, do some nifty runtime optimizations that most developers are not aware of at all – like “biased locking”. This feature reduces the time taken to reacquire a lock if the same thread is taking the lock repeatedly. There are other features like “adaptive spinning” and “lock coarsening” as well. Of course, it might just be other optimizations that are at work here instead.
Such a simple benchmark does not deserve to have any conclusions drawn from it. In fact, the effect may have been greatly amplified here because the program does only an increment within the function, so locking becomes the bottleneck. Such a situation may of course occur in “real world” programs as well. But quite often, it does not.
I obtained this result on the single-core 2.8 GHz Pentium 4 system at my desk at work.
Ideally, one should be using a multi-processor machine and multiple threads to demonstrate such effects. I would expect Java 5 to show results between those for 1.4 and 6.
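The source of the benchmark is not reproduced in this post. A minimal sketch of the kind of test described above, assuming a counter that is incremented once through a plain method and once through a synchronized one, might look like the following. The class name SyncBenchmark, the method names and the iteration count are all hypothetical. It sticks to System.currentTimeMillis() and plain loops so that the same source should also compile on Java 1.4.

```java
// Hypothetical reconstruction; the original benchmark source is not shown in the post.
public class SyncBenchmark {
    // Assumed iteration count: large enough to take a measurable amount of time.
    static final int ITERATIONS = 100000000;

    private long plainCount = 0;
    private long syncCount = 0;

    // Non-synchronized method: does only an increment.
    void incrementPlain() {
        plainCount++;
    }

    // Synchronized method: same work, but takes the object's monitor on every call.
    synchronized void incrementSync() {
        syncCount++;
    }

    long getPlainCount() {
        return plainCount;
    }

    long getSyncCount() {
        return syncCount;
    }

    // Returns the elapsed milliseconds for n calls to the non-synchronized method.
    static long timePlain(SyncBenchmark b, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            b.incrementPlain();
        }
        return System.currentTimeMillis() - start;
    }

    // Returns the elapsed milliseconds for n calls to the synchronized method.
    static long timeSync(SyncBenchmark b, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            b.incrementSync();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        SyncBenchmark b = new SyncBenchmark();

        // Short warm-up pass to give the JIT a chance to compile both methods.
        timePlain(b, ITERATIONS / 10);
        timeSync(b, ITERATIONS / 10);

        long plainMs = timePlain(b, ITERATIONS);
        long syncMs = timeSync(b, ITERATIONS);

        System.out.println("non-synchronized: " + plainMs + " ms");
        System.out.println("synchronized:     " + syncMs + " ms");
        // Printing the counters makes it harder for the JIT to discard the loops.
        System.out.println("counters: " + b.getPlainCount() + ", " + b.getSyncCount());
    }
}
```

Compile it with javac SyncBenchmark.java and run java SyncBenchmark under each JVM you want to compare. Since only one thread ever takes the lock, a loop like this measures uncontended locking only, which is exactly the case the caveats above apply to.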
Code, Java, Tips | Tagged: Benchmark, Java, JVM, Lock, Optimization, Speed, Synchronization, Threads |
Posted by Onkar Joshi