Puppet and Cfengine compared: time and resource consumption

Jarle Bjørgeengen has been working with UNIXes, storage, and HA clusters for about a decade. He is now a master's student at Oslo University College and works for the UNIX group of the central IT department (USIT) of the University of Oslo. [email protected]

During the master's program at Oslo University College, I was required to define and carry out a set of scientific experiments. As both a graduate student and a member of a project evaluating configuration management tools for the University of Oslo's IT department [9], I was looking for an experiment that would serve both of these roles. The project evaluated Cfengine [3] and Puppet [5] against our existing script-based solution. My experiment needed to be scientifically measurable and to bring a new and valuable perspective to the decision-making process.

I chose to compare the time and resource consumption of Puppet and Cfengine 3 when carrying out identical configuration checks and actions. This is useful information for a tool that will be used at large scale, since the time and resources consumed limit how much configuration can be managed within a given period. Time usage for the state compliance verification process is of particular interest, since it affects the frequency and/or scope of the configuration verified.

Equipment and Tool Setup

The experiment was run on a PC with the following specifications:

■■ CentOS 5.2, kernel 2.6.18-92.1.22.el5, i686
■■ MSI MS-6380E (VIA KT333 based) motherboard
■■ 1024 MB RAM
■■ AMD Athlon XP 2200+ (1.8GHz / 266MHz FSB), 256KB cache
■■ Quantum Fireball, 7200 rpm / 30GB / 58169 cylinders / 16 heads / 63 sectors

In both tools it is the configuration agent that does the job of converging to the desired state. In Puppet a server component is mandatory, since it is the server that provides the agent with its configuration. The latest stable version of Puppet at the time of the experiment was 0.24.7, and that is the version used here. Cfengine's configuration agent is independent of a server component, but can be configured to use one if desired. Cfengine version 3.0.1b3 was downloaded and compiled according to the installation part of the reference manual [2].


Installing Puppet is easy on CentOS 5 using the EPEL [1] yum repository:

# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
# yum install puppet
# yum install puppet-server
# service puppetmaster start
# chkconfig puppetmaster on
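As a quick sanity check (not part of the measured workload, and assuming the packages above installed cleanly), the agent and its fact library can be asked to report:

# puppetd --version
# facter | head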

This also automatically resolves and installs the dependencies needed for Puppet to work (ruby-libs, ruby, facter, ruby-shadow, augeas-libs, ruby-augeas). Installing Cfengine 3 took a bit more effort. Dependencies:

# yum install db4-devel
# yum install byacc
# yum install openssl-devel
# yum install flex
# yum install gcc

Cfengine:

# wget http://www.cfengine.org/downloads/cfengine-3.0.1b3.tar.gz
# tar zxvf cfengine-3.0.1b3.tar.gz
# cd cfengine-3.0.1b3
# ./configure && make && make install && cp /usr/local/sbin/cf-* /var/cfengine/bin/
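A quick check that the freshly built binary runs (assuming it was copied to /var/cfengine/bin as above):

# /var/cfengine/bin/cf-agent --version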

Methodology

The workload chosen for the measurements reflects comparable activities of a configuration management tool. There are three main categories of work:

■■ File permissions
■■ File contents
■■ /etc/hosts entries

The host entries measurement differs a little from the file permission and file content workloads. In Puppet, host entries are a special resource type that ultimately results in edits to the /etc/hosts file, whereas in Cfengine this is just one form of file-editing operation expressed directly in the configuration language. This difference might make the /etc/hosts entries workload less directly comparable than the other two.

Each type of work was measured in two ways:

1. With a known deviation from the tool's configuration applied up front, so the tool must converge the deviation to compliance.
2. With the tool's configuration applied up front, so the tool does verification only.

This expands to six measurements for each tool, and each measurement was repeated 40 times. All measurements, and the application of preconditions for each measurement, were driven by a shell script which ran the same measurements 40 times. All workload combinations measured are summarized in the following table:


No. | Type          | Converge or Verify | Tool   | Config file
  1 | Permissions   | Converge           | cf3    | cf3-perms.cf
  2 | Permissions   | Converge           | puppet | puppet-perms.cf
  3 | Content       | Converge           | cf3    | cf3-content.cf
  4 | Content       | Converge           | puppet | puppet-content.cf
  5 | Hosts records | Converge           | cf3    | cf3-hosts.cf
  6 | Hosts records | Converge           | puppet | puppet-hosts.cf
  7 | Permissions   | Verify             | cf3    | cf3-perms.cf
  8 | Permissions   | Verify             | puppet | puppet-perms.cf
  9 | Content       | Verify             | cf3    | cf3-content.cf
 10 | Content       | Verify             | puppet | puppet-content.cf
 11 | Hosts records | Verify             | cf3    | cf3-hosts.cf
 12 | Hosts records | Verify             | puppet | puppet-hosts.cf

Each amount of work was scaled up so that measurable time and resource consumption could actually be observed for both tools. For the file permissions and file content tests, a file tree with known content and permissions was created under a test directory, /var/tmp/file_tests. The test file tree was recreated before each run as follows:

TESTDIR=/var/tmp/file_tests
if [[ -d $TESTDIR ]]
then
    rm -rf $TESTDIR
fi
for i in `seq 1 10`
do
    mkdir -p $TESTDIR/dir_$i
    for j in `seq 1 10`
    do
        echo "Some testline" > $TESTDIR/dir_$i/file_$j
    done
done

Run as root, this creates directories with ownership root:root and mode 755, and files with ownership root:root and mode 644. For the file permissions workload, the tools' work was to converge from the default permissions to ownership root:bin and mode 755 for all files and directories. Here's the Puppet configuration for applying file permissions (puppet-perms.cf):

class fix_perms {
    file { "/var/tmp/file_tests":
        owner => "root",
        group => "bin",
        mode => 0755,
        recurse => inf,
        backup => false,
    }
}

node localhost {
    include fix_perms
}

And here's the Cfengine configuration for applying file permissions (cf3-perms.cf):

body common control
{
    bundlesequence => { "perms" };
}

bundle agent perms
{
    files:
        "/var/tmp/file_tests/"
            pathtype => "literal",
            perms => passthrough("0755","root","bin"),
            depth_search => recurse("inf");
}

body depth_search recurse(d)
{
    depth => "$(d)";
}

body perms passthrough(m,o,g)
{
    mode => "$(m)";
    owners => { "$(o)" };
    groups => { "$(g)" };
}
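Before timing a run it can be worth syntax-checking a policy file; a minimal sketch, assuming the same install paths as above:

# /var/cfengine/bin/cf-promises -f ./cf3-perms.cf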

For file content, the tools' work was to ensure some particular content in the same files. Here's the Puppet configuration for applying file content (puppet-content.cf):

class fix_contents {
    file { "/var/tmp/file_tests":
        content => "Trallala\n",
        recurse => inf,
        backup => false,
    }
}

node localhost {
    include fix_contents
}

And here's the Cfengine configuration for applying file content (cf3-content.cf):

body common control
{
    bundlesequence => { "content" };
}

bundle agent content
{
    files:
        "/var/tmp/file_tests/.*/.*"
            edit_defaults => no_backup,
            edit_line => en_liten_trall;
}

bundle edit_line en_liten_trall
{
    delete_lines:
        ".*";
    insert_lines:
        "Trallala";
}

body edit_defaults no_backup
{
    edit_backup => "false";
}

For host entries, the tools' work was to ensure that all specified host-to-IP mappings were present in /etc/hosts. Here's the Puppet configuration for applying host entries (puppet-hosts.cf):

class my_hosts {
    host { "private1.localdomain.com":
        ip => "10.0.0.1",
        ensure => present,
    }
    host { "private2.localdomain.com":
        ip => "10.0.0.2",
        ensure => present,
    }
    (...)
    host { "private254.localdomain.com":
        ip => "10.0.0.254",
        ensure => present,
    }
}

node localhost {
    include my_hosts
}

And here's the Cfengine configuration for applying host entries (cf3-hosts.cf):

body common control
{
    bundlesequence => { "hosts" };
}

bundle agent hosts
{
    vars:
        "my_hosts" slist => {
            "10.0.0.1 private1.localdomain.com",
            (...)
            "10.0.0.253 private253.localdomain.com",
            "10.0.0.254 private254.localdomain.com"
        };

    files:
        "/etc/hosts"
            edit_defaults => no_backup,
            edit_line => host_ensure("@(hosts.my_hosts)");
}

bundle edit_line host_ensure(record)
{
    insert_lines:
        "$(record)";
}

body edit_defaults no_backup
{
    edit_backup => "false";
}
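The (...) elisions above stand for 254 near-identical entries. For reference, a loop like this hypothetical helper could generate the Puppet host stanzas (a similar loop would produce the Cfengine slist lines):

# Illustrative generator for the 254 host resources elided as (...) above
for i in `seq 1 254`
do
    printf 'host { "private%s.localdomain.com": ip => "10.0.0.%s", ensure => present, }\n' $i $i
done > puppet-hosts-snippet.pp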

When measuring verification time, the script converged the configuration with the tool before taking the measurements. This makes it unlikely that the precondition is anything other than the desired state expressed in the tool's configuration, so the values measured this way reflect verification only. Application of the configurations with Puppet was done like this:

/usr/sbin/puppetd --no-daemonize --onetime

Application of the configurations with Cfengine was done like this:

/var/cfengine/bin/cf-agent --no-lock

To have starting points as close to equal as possible for the tests, all block device buffers were dropped before each test by using:

sysctl -w vm.drop_caches=3
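Putting the pieces together, one iteration of a converge measurement could look roughly like the sketch below. This is a reconstruction under stated assumptions, not the script actually used: recreate_test_tree is a hypothetical helper wrapping the file-tree script shown earlier, and GNU time is assumed to be installed as /usr/bin/time.

# Sketch of one converge-mode iteration for the Puppet permissions workload
recreate_test_tree            # reapply the known deviation (file tree above)
sysctl -w vm.drop_caches=3    # equal starting point: drop block device buffers
/usr/bin/time -a -o perms-converge-puppet.log /usr/sbin/puppetd --no-daemonize --onetime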

The Scientific Method

When measuring alternatives there will be uncertainty in the measurements. This uncertainty is defined as error, or noise. The total error was quantified using repeated measurements and statistical methods for finding the confidence intervals of the differences between alternatives. A confidence interval of a difference comes with a probability that the true value lies inside the interval. For the confidence interval to have any utility, this probability must be high; a higher probability widens the interval, and vice versa. A commonly chosen value for this probability is 0.95. If the confidence intervals of the two alternatives overlap, it is impossible to say that the difference is not caused by random fluctuations. If they do not overlap, there is a statistically significant difference, and the chosen probability quantifies the certainty of being right in assuming that a true difference exists [10, p. 43].


Plotting the values shows random variation, but the true standard deviation of the underlying population is not known. The Student's t distribution takes this into account and is commonly used for detecting statistically significant differences between two alternatives [8]. The t-test, which is based on this distribution, was used to identify statistically significant differences in the experiment results.

The tool GNU time [4] (version 1.7, release 27.2.2) was used for measuring the time and resource consumption of each operation. For each measurement, three metrics were logged (captured along the lines of the sketch below):

■■ Total time used
■■ Number of CPU seconds (system + user)
■■ Number of involuntary context switches
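The article does not give the exact GNU time invocation; a plausible form that yields one semicolon-separated record per run (the format string and file name are assumptions) is:

# %e: elapsed wall-clock seconds; %S and %U: system and user CPU seconds
# (summed afterwards); %c: involuntary context switches -- all documented
# GNU time format specifiers (see time(1))
/usr/bin/time -f '%e;%S;%U;%c' -a -o content-verify-cf3.csv /var/cfengine/bin/cf-agent --no-lock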

The values were logged in semicolon-separated files, one per $measurement-$tool combination, each containing 40 lines of data. Each column of measurements then represents one metric (time, CPU, or context switches) repeated 40 times. These vectors were used for calculating sample means and confidence intervals with the free statistics program R [6]. For all 12x3 metrics, R's t-test function was used to compare pairs of vectors of 40 numbers, each representing the measured values of one tool. The outcome of each comparison is the sample mean of the difference between the vectors and its confidence interval, given a certain probability that the true value is within the interval. The probability used for the tests was 0.99, meaning there is a 99% probability that the true value of the mean difference is within the confidence interval produced by the t-test. All t-tests were done as follows:

diff = t.test(puppet_vector, cfengine_vector, conf.level = 0.99)

and the return values were fetched out as follows in R:

c(diff$estimate[1], diff$estimate[2], diff$conf.int[1],
  diff$estimate[1] - diff$estimate[2], diff$conf.int[2])

The five values produced from the two statements above correspond to columns 3–7 in the results table.

Results

The output of the eighteen t-tests in R is summarized in the following table. Table legend:

1. Type of workload: Permission/Content/Host, Converge/Verify
2. Resource/time measurement
3. Sample mean value for Puppet
4. Sample mean value for Cfengine
5. Start of the confidence interval of the sample mean difference (C1)
6. Sample mean difference
7. End of the confidence interval of the sample mean difference (C2)


Workload          | Measurement    | Puppet mean | Cfengine mean | C1     | Mean difference | C2
Permissions conv. | Execution time | 14.80s      | 1.18s         | 13.54s | 13.63s          | 13.72s
Content conv.     | Execution time | 14.85s      | 1.43s         | 13.24s | 13.42s          | 13.6s
Hosts conv.       | Execution time | 28.44s      | 1.21s         | 26.19s | 27.23s          | 28.27s
Permissions ver.  | Execution time | 14.28s      | 1.25s         | 12.82s | 13.04s          | 13.25s
Content ver.      | Execution time | 14.32s      | 1.23s         | 12.92s | 13.09s          | 13.27s
Hosts ver.        | Execution time | 21.52s      | 1.11s         | 20.30s | 20.41s          | 20.53s
Permissions conv. | CPU seconds    | 11.03s      | 0.24s         | 10.77s | 10.79s          | 10.81s
Content conv.     | CPU seconds    | 11.03s      | 0.36s         | 10.64s | 10.67s          | 10.69s
Hosts conv.       | CPU seconds    | 10.50s      | 0.32s         | 10.16s | 10.18s          | 10.19s
Permissions ver.  | CPU seconds    | 11.04s      | 0.23s         | 10.78s | 10.8s           | 10.8s
Content ver.      | CPU seconds    | 11.03s      | 0.34s         | 10.67s | 10.69s          | 10.72s
Hosts ver.        | CPU seconds    | 17.97s      | 0.32s         | 17.61s | 17.65s          | 17.68s
Permissions conv. | Forced cswitch | 1861        | 150           | 1595   | 1711            | 1827
Content conv.     | Forced cswitch | 1918        | 159           | 1644   | 1759            | 1874
Hosts conv.       | Forced cswitch | 3194        | 155           | 2929   | 3039            | 3148
Permissions ver.  | Forced cswitch | 1933        | 151           | 1675   | 1782            | 1889
Content ver.      | Forced cswitch | 1967        | 160           | 1708   | 1808            | 1907
Hosts ver.        | Forced cswitch | 3186        | 159           | 2913   | 3027            | 3142
The following graphs were produced with the sciplot package [7] in R. The error bars show the 99% confidence interval of each sample mean.

Figure 1: Time usage for the six tasks

Figure 2: CPU seconds used

Figure 3: Number of involuntary context switches

Conclusion

The results show that Puppet uses considerably more time and resources than Cfengine 3 for all the measurements included in the experiment. Time and resource usage matter particularly in the verification phase, since verification is done every time the agent runs, regardless of compliance. The time used for each verification bounds the maximum frequency of verifications. The scope of the configuration verified in the experiment is small compared to what can be the case in real production environments; clearly, verification time will limit verification frequency as the scope grows. Generally, it is also desirable to have low resource consumption for administrative processes that run regularly, both from an environmental and from a capacity point of view.


The experiment shows that using Puppet involves a major tradeoff in time and resource consumption compared to Cfengine 3 for the operations that were measured. Of course, there are many factors to consider when choosing a configuration management tool, and the differences in time and resource consumption may be negligible to some. The results of this experiment serve as a supplement, broadening the understanding of the differences between these two popular configuration management tools.

References

[1] http://fedoraproject.org/wiki/EPEL.
[2] http://www.cfengine.org/manuals/cf3-reference.html.
[3] http://cfengine.com/pages/whatIsCfengine.
[4] http://www.gnu.org/software/time/.
[5] http://reductivelabs.com/products/puppet/.
[6] http://www.r-project.org.
[7] http://cran.r-project.org/web/packages/sciplot/index.html.
[8] http://en.wikipedia.org/wiki/Student's_t-distribution.
[9] http://www.usit.uio.no.
[10] D.J. Lilja, Measuring Computer Performance: A Practitioner's Guide (Cambridge University Press, 2000).
