Shortly after my post about speeding up Python with Cython, I was contacted by Mark Dufour, the creator of ShedSkin, a Python-to-C++ compiler, who wanted to try my code with his compiler. I had heard of ShedSkin before, but I had filed it away as something to try later, or something too hard to try (C is not my forte).

After Mark contacted me, I decided to give it a go on the code from that post and, to my great surprise, it performed a bit better than Cython without any changes to my code. ShedSkin does require that you write in a restricted subset of Python, but most of my scientific code is written in that style anyway (it's not really that restrictive). Since then, I have used ShedSkin for all my other assignments, and now I'm writing about it.
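As a rough illustration of that restricted subset: as I understand ShedSkin's documentation, it only accepts programs that are implicitly statically typed, meaning every variable keeps a single type and every container holds elements of one type. The sketch below is ordinary Python written in that style; `column_means` is just an illustrative name, and the commented lines at the end show constructs CPython accepts but that fall outside the subset.

```python
# Written in the implicitly statically typed style ShedSkin expects:
# `rows` is always a list of lists of floats, and every local keeps one type.
def column_means(rows):
    totals = [0.0] * len(rows[0])  # a list of floats from the start
    for row in rows:
        for k in range(len(row)):
            totals[k] += row[k]
    n = float(len(rows))
    return [total / n for total in totals]


if __name__ == "__main__":
    print(column_means([[1.0, 2.0], [3.0, 4.0]]))

# Valid CPython, but outside the subset (kept as comments so this file runs):
#   x = 1
#   x = "one"                 # the same variable would need two types
#   mixed = [1, "two", 3.0]   # a list whose elements have different types
```

Code like the first function is what most numeric loops already look like, which is why the restriction rarely gets in the way.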

A few days ago I had a bioinformatics assignment whose goal was to predict a protein's location from its structure. I wrote an SVM to classify the proteins, compiled it with ShedSkin and ran it. I will give you a sample of the Python code and the same code modified for ShedSkin.
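The `train_adatron` function in the sample takes a precomputed `kernel_matrix` (one row and one column per training example) and a `label_matrix` (one row per example, one column per class). The post doesn't say how these were built, so the following is only a sketch under two assumptions: a Gaussian (RBF) kernel with a width parameter `sigma`, and one-vs-rest labels of +1.0/-1.0, which fits the way the sample multiplies the prediction by `label_matrix[col_counter][klass]` in its margin term. Both helper names are hypothetical.

```python
import math


def gaussian_kernel_matrix(points, sigma):
    """Hypothetical helper: K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = 0.0
            for a, b in zip(points[i], points[j]):
                sq_dist += (a - b) * (a - b)
            value = math.exp(-sq_dist / (2.0 * sigma * sigma))
            kernel[i][j] = value
            kernel[j][i] = value  # the kernel matrix is symmetric
    return kernel


def one_vs_rest_labels(classes, n_classes):
    """Hypothetical helper: row i is +1.0 in column classes[i], -1.0 elsewhere."""
    return [[1.0 if k == cls else -1.0 for k in range(n_classes)] for cls in classes]
```

With these, `len(kernel_matrix)` is the number of training examples and `len(label_matrix[0])` is the number of classes, which is how the sample sizes its `alphas`, `betas` and `bias` lists.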

Before:

def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    # One row of dual weights (alphas) and proposed updates (betas) per class
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10*len(kernel_matrix)):
        if not iteration % 100:
            print("Starting iteration %s..." % iteration)
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            # Precompute label * alpha for every training example
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                # Kernel-weighted sum over all examples: current output for col_counter
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                # Clipped gradient step keeps each beta within [0, c]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                # Track the largest change for this class and where it happened
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
    # ... (rest of train_adatron not shown in this sample)

After:

def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    # One row of dual weights (alphas) and proposed updates (betas) per class
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10*len(kernel_matrix)):
        if not iteration % 100:
            print("Starting iteration %s..." % iteration)
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            # Precompute label * alpha for every training example
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                # Kernel-weighted sum over all examples: current output for col_counter
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                # Clipped gradient step keeps each beta within [0, c]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                # Track the largest change for this class and where it happened
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
    # ... (rest of train_adatron not shown in this sample)

You might notice that the two snippets are identical. That’s how awesome ShedSkin is. It didn’t need a single change, and on top of that, it gave me compile-time errors when I messed up my code.
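Since the sample stops partway through the training loop, here is a complete but much smaller program in the same style, for anyone who wants something to feed to ShedSkin themselves. It is a sketch of the standard kernel Adatron update for a single binary class, not the assignment code from this post: the clipped step `min(max(alpha + h * g, 0.0), c)` mirrors the sample, while dropping the bias term, stopping once no alpha moves by more than `tolerance`, and the toy 1-D data with a linear kernel are all assumptions. The driver at the bottom matters because ShedSkin infers types from how functions are actually called in the program.

```python
# Hypothetical single-class kernel Adatron in the same type-stable style as the
# sample. Assumptions: labels are +1.0 / -1.0, there is no bias term, and
# training stops once no alpha changes by more than `tolerance` in a sweep.


def train_binary_adatron(kernel, labels, h, c, tolerance, max_iterations):
    n = len(kernel)
    alphas = [0.0] * n
    betas = [0.0] * n
    for _ in range(max_iterations):
        # label * alpha for every example, computed once per sweep
        labelalphas = [labels[j] * alphas[j] for j in range(n)]
        max_difference = 0.0
        for i in range(n):
            prediction = 0.0
            for j in range(n):
                prediction += kernel[i][j] * labelalphas[j]
            g = 1.0 - prediction * labels[i]
            # Clipped gradient step keeps every alpha inside [0, c]
            betas[i] = min(max(alphas[i] + h * g, 0.0), c)
            max_difference = max(max_difference, abs(alphas[i] - betas[i]))
        alphas, betas = betas, alphas  # accept the whole sweep at once
        if max_difference < tolerance:
            break
    return alphas


def decision_value(kernel_row, labels, alphas):
    # f(x) = sum_j alpha_j * y_j * k(x_j, x); its sign is the predicted label
    total = 0.0
    for j in range(len(alphas)):
        total += alphas[j] * labels[j] * kernel_row[j]
    return total


if __name__ == "__main__":
    # Toy 1-D data and a linear kernel k(a, b) = a * b, for illustration only
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [-1.0, -1.0, 1.0, 1.0]
    kernel = [[a * b for b in xs] for a in xs]
    alphas = train_binary_adatron(kernel, ys, 0.1, 10.0, 1e-6, 1000)
    for i in range(len(xs)):
        f = decision_value(kernel[i], ys, alphas)
        print("x=%5.1f  alpha=%.4f  f=%.4f" % (xs[i], alphas[i], f))
```

On this toy data the two inner points (x = -1 and x = 1) end up as the support vectors with alphas close to 0.5, the outer points are clipped to 0, and every decision value has the same sign as its label.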

The timings of the pure Python and ShedSkin compiled code are:

Pure Python   ShedSkin
------------- ------------
4841.94 sec   103.30 sec

You can find my code in the ShedSkin repository.

That is a 47x speedup (not 47%, 47 times), just from running two commands: one to compile my Python code to C++ and one to compile the C++ to machine code. Needless to say, I will be using ShedSkin a lot more in the future.
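The 47x figure is simply the ratio of the two timings in the table. Expressed as a percentage instead, the same numbers mean the compiled version removed roughly 98% of the runtime, which is why "47 times" and "47%" are very different claims. The arithmetic, using only the reported timings:

```python
# The two timings reported in the table, in seconds.
python_seconds = 4841.94
shedskin_seconds = 103.30

speedup = python_seconds / shedskin_seconds          # how many times faster
time_saved = 1.0 - shedskin_seconds / python_seconds  # fraction of runtime removed

print("speedup: %.1fx" % speedup)
print("runtime reduced by %.1f%%" % (100.0 * time_saved))
```

Running it prints a speedup of 46.9x, which rounds to the 47x quoted above, and a runtime reduction of 97.9%.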