Speeding up Python code with ShedSkin

Posted on 27 Apr 2010.

Shortly after my post about speeding up Python with Cython, I was contacted by Mark Dufour, creator of ShedSkin, a Python-to-C++ compiler, who wanted to try my code with his compiler. I had heard of ShedSkin before, but I chalked it up as something to try later, or something too hard to try (C++ is not my forte).

After Mark contacted me, I decided to give it a go on the code from that post, and, to my great surprise, it performed a bit better than Cython with no changes to my code. ShedSkin does require that you program in a restricted subset of Python, but most of my scientific code is written in that style anyway, so it's not really that restricting. From that point on, I used ShedSkin for all my other assignments, and now I'm writing about it.
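To give a feel for what "restricted subset" means in practice, here is a minimal sketch. It assumes the usual description of ShedSkin's restriction: each variable keeps a single type, and each container holds elements of one type, so that types can be inferred statically. The function name `dot` and the sample data are made up for illustration and do not come from my assignment.

```python
# Hypothetical example, not from the assignment: code written in the
# implicitly statically typed style that ShedSkin expects.

def dot(xs, ys):
    # xs and ys are always lists of floats, and total is always a float.
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

# Homogeneous containers: every element has the same type.
weights = [0.5, 1.5, 2.0]
inputs = [2.0, 4.0, 1.0]
result = dot(weights, inputs)
print(result)

# What the restriction rules out (left commented out on purpose):
# mixed = [1, "two", 3.0]   # elements of different types in one list
# total = 0.0
# total = "done"            # one variable switching from float to str
```

Numeric code built from plain loops over lists of floats tends to look like this already, which is why the restriction rarely gets in the way.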

A few days ago I had a bioinformatics assignment whose goal was to recognize a protein's location from its structure. I wrote an SVM to classify the proteins, compiled it with ShedSkin, and ran it. Here is a sample of the Python code, followed by the same code modified for ShedSkin.

Before:

    def train_adatron(kernel_matrix, label_matrix, h, c):
        tolerance = 0.5
        alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
        betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
        bias = [0.0] * len(label_matrix[0])
        labelalphas = [0.0] * len(kernel_matrix)
        max_differences = [(0.0, 0)] * len(label_matrix[0])
        for iteration in range(10*len(kernel_matrix)):
            if not iteration % 100:
                print "Starting iteration %s..." % iteration
            for klass in range(len(label_matrix[0])):
                max_differences[klass] = (0.0, 0)
                for elem in range(len(kernel_matrix)):
                    labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
                for col_counter in range(len(kernel_matrix)):
                    prediction = 0.0
                    for row_counter in range(len(kernel_matrix)):
                        prediction += kernel_matrix[col_counter][row_counter] * \
                                      labelalphas[row_counter]
                    g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                    betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                    difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                    if difference > max_differences[klass][0]:
                        max_differences[klass] = (difference, col_counter)

After:

    def train_adatron(kernel_matrix, label_matrix, h, c):
        tolerance = 0.5
        alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
        betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
        bias = [0.0] * len(label_matrix[0])
        labelalphas = [0.0] * len(kernel_matrix)
        max_differences = [(0.0, 0)] * len(label_matrix[0])
        for iteration in range(10*len(kernel_matrix)):
            if not iteration % 100:
                print "Starting iteration %s..." % iteration
            for klass in range(len(label_matrix[0])):
                max_differences[klass] = (0.0, 0)
                for elem in range(len(kernel_matrix)):
                    labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
                for col_counter in range(len(kernel_matrix)):
                    prediction = 0.0
                    for row_counter in range(len(kernel_matrix)):
                        prediction += kernel_matrix[col_counter][row_counter] * \
                                      labelalphas[row_counter]
                    g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                    betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                    difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                    if difference > max_differences[klass][0]:
                        max_differences[klass] = (difference, col_counter)

You might notice that the two snippets are identical. That’s how awesome ShedSkin is. It didn’t need a single change, and on top of that, it gave me compile-time errors when I messed up my code.

The timings of the pure Python and ShedSkin compiled code are:

    python          shedskin
    4841.94 sec     103.30 sec

You can find my code in the ShedSkin repository.

The timings above work out to roughly a 47x speedup (47 times, not 47%), just from running two commands: one to compile my Python code to C++, and one to compile the C++ to machine code. Needless to say, I will be using ShedSkin a lot more in the future.
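For anyone who wants to check the arithmetic, here is a small sketch that derives the speedup from the two timings reported above. The variable names are mine, chosen only for illustration.

```python
# Timings reported in this post, in seconds.
python_seconds = 4841.94
shedskin_seconds = 103.30

# Speedup factor: how many times faster the compiled version ran.
speedup = python_seconds / shedskin_seconds

# The same result expressed as a reduction in running time.
time_saved_percent = (1.0 - shedskin_seconds / python_seconds) * 100.0

print("speedup: %.1fx" % speedup)                    # about 46.9x
print("time saved: %.1f%%" % time_saved_percent)     # about 97.9%
```

The two figures describe the same result in different ways: a factor of about 47 in speed corresponds to cutting the running time by almost 98%.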