13
\$\begingroup\$

I have written the Rabin-Miller test in Python and ported it to C. But somehow, the C version is ~20 times slower on my system, and I am not sure how. Anyway, here is my code.

Python

import random
import time

def rabin_miller(n):
    if n in [2, 3, 5]:
        return True
    if n < 2 or n % 2 == 0:
        return False

    s = 0
    d = n - 1
    while d % 2 == 0:
        s += 1
        d >>= 1

    for k in range(2 ** 4):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        c = False
        for r in range(s - 1):
            x = pow(x, 2, n)
            if x == 1:
                return False
            if x == n - 1:
                break
        else:
            return False
    return True

for i in range(10000):
    rabin_miller(i)

C

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long mod_pow(long b, long e, long m){
    if(m == 1)
        return 0;
    long c = 1;
    for(long i = 0; i < e; i++)
        c = (c * b) % m;
    return c;
}

int rabin_miller(long n){
    if(n == 2 || n == 3)
        return 1;
    if(n < 2 || n % 2 == 0)
        return 0;

    long s = 0, d = n - 1, k = 10, a, x, stop;
    while(d % 2 == 0)
        s += 1, d >>= 1;

    srand(time(NULL));
    for(long i = 0; i < k; i++){
        a = (long) (((float) rand() / RAND_MAX) * (n - 3)) + 2;
        x = mod_pow(a, d, n);
        if(x == 1 || x == n - 1)
            continue;
        stop = 1;
        for(long r = 0; r < s - 1; r++){
            x = mod_pow(x, 2, n);
            if(x == 1)
                return 0;
            if(x == n - 1){
                stop = 0;
                break;
            }
        }
        if(stop){
            return 0;
        }
    }
    return 1;
}

int main(int argc, char *argv[]){
    for(long i = 1; i < 10000; i++){
        rabin_miller(i);
    }
    return 0;
}
\$\endgroup\$
4
  • \$\begingroup\$ So you are looking to improve the C code, I think ...? If so, the Python code is unnecessary and could be removed \$\endgroup\$ Commented Oct 2, 2017 at 14:07
  • 3
    \$\begingroup\$ Just put it there as a point of reference \$\endgroup\$ Commented Oct 2, 2017 at 14:09
  • \$\begingroup\$ You could at least improve the while (d % 2 == 0) With I think while (~t & 1) \$\endgroup\$ Commented Oct 2, 2017 at 14:47
  • 1
    \$\begingroup\$ I also assume you know this, but while Raben Miller is an efficient prime test, it isn't a good generator, as a sieve would be much faster for either. \$\endgroup\$ Commented Oct 2, 2017 at 22:09

2 Answers 2

13
\$\begingroup\$

I believe the problem is that long mod_pow is very inefficient, as it takes e steps to complete, as compared to Python's approach which likely only takes log(e) steps. To fix this, you should use an algorithm similar to the following

function modular_pow(base, exponent, modulus)
if modulus = 1 then return 0
Assert :: (modulus - 1) * (modulus - 1) does not overflow base
result := 1
base := base mod modulus
while exponent > 0
    if (exponent mod 2 == 1):
       result := (result * base) mod modulus
    exponent := exponent >> 1
    base := (base * base) mod modulus
return result
\$\endgroup\$
3
  • \$\begingroup\$ I see the value of if modulus = 1 then return 0, yet not base := base mod modulus aside from reduced chance of overflow in base * base. Any other reason for it? \$\endgroup\$ Commented Oct 2, 2017 at 22:33
  • 1
    \$\begingroup\$ That's it. No reason to chance overflow if you don't need to. This way, it works for any base that fits the type, if the modulus is small enough. \$\endgroup\$ Commented Oct 2, 2017 at 22:41
  • \$\begingroup\$ It also doesn't take a ton of time, as it is one mod statement compared to up to 64 for the loop. (and some conditional branches) \$\endgroup\$ Commented Oct 2, 2017 at 22:43
11
\$\begingroup\$

I see a number of things that may help you improve your C program in addition to the review from Oscar Smith which correctly identifies the reason for the major performance difference in the two versions.

Use better variable names

Single letter variable names are not descriptive and not helpful to the reader of the code.

Only call srand once

The call to srand should only be done once. Calling it multiple times does not improve randomness.

Use bool where appropriate

Since C99, the C language supports bool, and the rabin_miller() function should return a bool rather than an int. You will need to #include <stdbool.h> to do this.

Use the appropriate data type

In the original code k and i are just used as part of a loop count. An int or unsigned would be a more appropriate type than long and may confer a performance advantage. The same goes for r and s.

Prefer for to while

The code which calculates s and d could be written a bit more succinctly and clearly, in my view, like this:

for(s = 0; d % 2 == 0; ++s) {
    d >>= 1;
}

Simplify for loop termination condition

The point to the for (r ... loop is to simply iterate s-1 times. Rather than comparing r to s-1 each iteration, it makes more sense to simply start the loop from 1 rather than 0 and compare with s directly:

    for(unsigned r = 1; r < s; r++) {

Even simpler, count down instead. Some processors have a special instruction which does both a decrement and test for zero so here again, there may be a performance advantage on some architectures:

    for(unsigned r = s; r; --r) {

Declare variables in as small a scope as practical

The variables a and x are only used inside the big loop and so could easily be declared only there. Keeping the scope of variables as small as practical makes the code easier to read and understand and may give the compiler some better hints as to register allocation.

\$\endgroup\$
18
  • \$\begingroup\$ Would there be any benefit to using !(n & 0x1) over n % 2 == 0? You missed your usual point about symbolic constants over raw numbers (main). \$\endgroup\$ Commented Oct 2, 2017 at 16:51
  • 2
    \$\begingroup\$ For the first question, no, there's usually no benefit because compilers are generally smart enough to generate the same optimal code for either. For the second, other than k in the original, I think that having hard coded numbers (e.g. 2, 3) in this intrinsically mathematical function is fine. \$\endgroup\$ Commented Oct 2, 2017 at 17:02
  • \$\begingroup\$ Pedantic corner !(n & 0x1) is not the same as n % 2 == 0 in the archaic one's complement. when n<0. \$\endgroup\$ Commented Oct 2, 2017 at 21:23
  • \$\begingroup\$ I'll concede the point if you can name any currently used compiler which uses one's complement. :) \$\endgroup\$ Commented Oct 2, 2017 at 21:29
  • \$\begingroup\$ Should have waited 5 minutes -- grumble grumble. \$\endgroup\$ Commented Oct 2, 2017 at 21:35

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.