Recently I’ve been thinking about storing records on disk quickly. For my general use case, an RDBMS isn’t quite fast enough. My first thought, probably like many a NoSQL person before me, is how fast can I go if I give up ACID?
Even if I intend to do the final implementation in C++, I’ll often experiment with Perl first; it’s usually possible to use the same C libraries anyway.
First up, how about serialising records with Storable into an on-disk hash table like Berkeley DB?
(Aside: I’m probably going to appear obsessed with benchmarking now, but really I’m just sticking a finger in the air to get an idea about how various approaches perform. I can estimate 90cm given a metre stick. I don’t need a more precise way to do a rough estimate.)
use Storable;
use Benchmark qw(:hireswallclock);
use BerkeleyDB;
I need to make a random record to store in the DB.
sub make_record
{
    my ($order_id, $fields, $key_len, $val_len) = @_;
    my %record;

    $record{'order_id'} = $order_id;

    # order_id plus ($fields - 1) random fields gives $fields fields in total
    for my $field_no (1..$fields-1) {
        my $key = 'key';
        my $val = 'val';

        # Pad key and value with random upper-case letters to the requested lengths
        $key .= chr(65 + rand(26)) for (1..$key_len - length($key));
        $val .= chr(65 + rand(26)) for (1..$val_len - length($val));

        $record{$key} = $val;
        print "$key -> $val\n";    # debug output; the record is only built once
    }
    return \%record;
}
And a wrapper handles the general case I’ll be testing: keys and values of equal length, with order_id starting at 1.
sub rec
{
    my ($fields, $len) = @_;
    return make_record(1, $fields, $len, $len);
}
I’ll compare serialisation alone against actually storing the data to disk, to see the upper limit I could achieve if, for example, I were using an SSD.
sub freeze_only
{
    my ($db, $ref_record, $no_sync) = @_;
    $no_sync //= 0;

    # Same signature and key construction as store_record, but the database
    # is never touched - only the serialisation cost is measured.
    my $key = "/order/$ref_record->{'order_id'}";
    Storable::freeze($ref_record);
}
And I’m curious to know how much overhead syncing to disk adds.
my $ORDER_ID = 0;

sub store_record
{
    my ($db, $ref_record, $no_sync) = @_;
    $no_sync //= 0;

    # Give each record a fresh key so we don't simply overwrite one entry
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";

    # Serialise with Storable, write to Berkeley DB, and optionally flush to disk
    $db->db_put($key, Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;
}
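Reading a record back isn’t part of the benchmark, but for completeness the reverse path is just db_get followed by Storable::thaw. A minimal sketch, assuming the same "/order/<id>" key scheme as store_record (fetch_record is a hypothetical helper, not used below):

sub fetch_record
{
    my ($db, $order_id) = @_;

    # Look the record up by the same key store_record uses; db_get fills in
    # $frozen and returns 0 on success.
    my $frozen;
    return undef unless $db->db_get("/order/$order_id", $frozen) == 0;

    # Deserialise back into a hash reference
    return Storable::thaw($frozen);
}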
The Test Program
A record with 50 fields, each of size 50, seems reasonable.
my $filename = "$ENV{HOME}/test.db";
unlink $filename;

my $db = new BerkeleyDB::Hash
    -Filename => $filename,
    -Flags    => DB_CREATE
    or die "Cannot open file $filename: $! $BerkeleyDB::Error\n";

my $rec_50_50 = rec(50, 50);

Benchmark::cmpthese(-1, {
    'freeze-only-50/50'    => sub { freeze_only($db, $rec_50_50) },
    'freeze-sync-50/50'    => sub { store_record($db, $rec_50_50) },
    'freeze-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1) },
});
The Results
                         Rate  freeze-sync-50/50  freeze-no-sync-50/50  freeze-only-50/50
freeze-sync-50/50      1543/s                 --                  -80%               -93%
freeze-no-sync-50/50   7696/s               399%                    --               -63%
freeze-only-50/50     21081/s              1267%                  174%                 --
Conclusion
Unsurprisingly, syncing is expensive – it adds roughly 400% overhead. Even with the sync, though, 1,543 records a second works out to about 5.5 million records an hour. Is that fast enough for me? (I do need some level of reliability.) It might well be.
Berkeley DB is fast: writing to the database (without syncing) adds only around 170% overhead on top of the serialisation itself. I’m impressed.
In case anyone is interested, I ran a more comprehensive set of benchmarks.
freeze-sync-50/50 1846/s
freeze-sync-50/05 2262/s
freeze-sync-05/50 2546/s
freeze-sync-05/05 2799/s
freeze-no-sync-50/50 7313/s
freeze-no-sync-50/05 9514/s
freeze-no-sync-05/50 11395/s
freeze-no-sync-05/05 12589/s
freeze-only-50/50 20031/s
freeze-only-50/05 21920/s
freeze-only-05/05 26547/s
freeze-only-05/50 26547/s
fcall-only 2975364/s
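(fcall-only is presumably the baseline cost of calling a subroutine that does no work, i.e. the overhead of the benchmark harness itself. That benchmark isn’t shown above; a minimal sketch of what it might look like, where the empty sub is my assumption:)

# Sketch of an "fcall-only" baseline: time calls to an (assumed) empty
# subroutine, so the figure reflects only function-call overhead.
sub do_nothing { }

Benchmark::cmpthese(-1, {
    'fcall-only' => sub { do_nothing() },
});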