Plasma GitLab Archive
Projects Blog Knowledge

Plasma Project:
Home 
Manual 
FAQ 
Slides 
Perf 
Releases 
======================================================================
Performance updates 2011-10-18
======================================================================

These updates refer to revision 483 of plasma.

The questions here:
 1. How many db commits can the system do per second?
 2. How many db commits are done in a typical map/red run?
 3. What follows for the total capacity of the system?

----------------------------------------------------------------------
1. How many db commits can the system do per second?
----------------------------------------------------------------------

In this Plasma version commits can be parallelized so far different
files are modified by the concurrently running commits.

This continues perf111013, test 5. We run it again with eight
namenode workers serving requests, and seven parallel test clients
generating requests (the other parameters are unchanged):

office3 $ time { ./ps_create -n 10000 /dir1 & ./ps_create -n 10000 /dir2 & ./ps_create -n 10000 /dir3 & ./ps_create -n 10000 /dir4 & ./ps_create -n 10000 /dir5 & ./ps_create -n 10000 /dir6 & ./ps_create -n 10000 /dir7 & wait; }
[1] 4975
[2] 4976
[3] 4977
[4] 4978
[5] 4979
[6] 4980
[7] 4981
Done
Done
[1]   Done                    ./ps_create -n 10000 /dir1
[2]   Done                    ./ps_create -n 10000 /dir2
Done
Done
Done
Done
[3]   Done                    ./ps_create -n 10000 /dir3
[4]   Done                    ./ps_create -n 10000 /dir4
[6]-  Done                    ./ps_create -n 10000 /dir6
[7]+  Done                    ./ps_create -n 10000 /dir7
Done
[5]+  Done                    ./ps_create -n 10000 /dir5

real	3m13.861s
user	0m39.702s
sys	0m6.744s

Each ps_create client creates 10000 files in the given directory, and
each file creation is immediately committed. We get a total
performance of

70000 files / 193.861s = 362 commits/s

That's over the whole distance. In the log file we can find much
higher peak rates of around 550 commits/s! (By counting the commit
messages for the period of one second.)


----------------------------------------------------------------------
2. How many db commits are done in a typical map/red run?
----------------------------------------------------------------------

This continues perf111010. I repeated the run (with parameters according
to run #4), and counted the number of commits by analyzing the log file.

We got:

--------------------------------------------------------
Job name:            mr_cffa64eecebfd154c85ecea909570356
Job ID:              mr_cffa64eecebfd154c85ecea909570356
PlasmaFS input dir:  /input
PlasmaFS output dir: /output
PlasmaFS work dir:   /work
PlasmaFS log dir:    /log
Number task nodes:   2
Task server dir:     /tmp/mapred
--------------------------------------------------------
Checking...
Planning...
Pre-start hook...
Starting...
[Tue Oct 18 16:12:01 2011] [info] Checking executable versions
...
[Tue Oct 18 17:07:49 2011] [info] Finished job
[Tue Oct 18 17:07:49 2011] [info] Number successful tasks:      437
[Tue Oct 18 17:07:49 2011] [info] Number misallocated tasks:    9
[Tue Oct 18 17:07:49 2011] [info] Number input files:           830
[Tue Oct 18 17:07:49 2011] [info] Number non-local input files: 157
[Tue Oct 18 17:07:49 2011] [info] Avg number of running tasks:  10.098553
[Tue Oct 18 17:07:49 2011] [info] Wallclock runtime:            3348.170546 seconds
Job_status: successful


The important number is now: This run needed 7399 commits. This means
(we have two nodes with four CPU cores each):

Number of commits:                 7399
Number of commits/second:          2.21
Number of commits/second and core: 0.28

The number of transactions depends on the amount of RAM. The machines
had 8 G of RAM, and we used a value for buffer_size of 256 M. By
extending the buffers one can reduce the number of transactions
because more data can be read and written in one go. Note that there
is practically no dependency on the size of the data blocks.

Also note that this run created 2195 log files, and according to the
code each log file is written in 2 transactions. This means more than
half of the commits was used for log files. (This will be improved in
a future Plasma version.)

----------------------------------------------------------------------
3. What follows for the total capacity of the system?
----------------------------------------------------------------------

So the total capacity of this system under ideal conditions would be:

 - 550 commits/s namenode capacity
 - 0.28 commits/s needed for each CPU core
 - ==> 550/0.28 = 1964 cores

Or around 500 of the used quad-core machines!

It is very likely that one runs into other bottlenecks first,
especially network bandwidth becomes quickly a scarce resource when a
network is scaled up.

This web site is published by Informatikbüro Gerd Stolpmann
Powered by Caml