Optimization with Cython

Why? When? How?

Andrew Svetlov

http://asvetlov.blogspot.com
andrew.svetlov@gmail.com
http://asvetlov.github.io/optimization-moscow-2016/

Bio

  • Using Python for more than 16 years
  • Python Core Developer since 2012
  • asyncio committer
  • aiohttp maintainer
  • Author of a dozen libraries under the aio-libs umbrella

Why?

  • It's cool!!!
  • Learning new tech
  • Spending time on a non-business task
  • ...

To speed up your code

Optimization techniques

  • Improve algorithms
  • Use Python tweaks
  • Rewrite in C
  • Use Cython

Prerequisites

  • Python implementation exists
  • 100% test coverage
  • Bottleneck found with a profiler
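Here is a minimal sketch of finding the bottleneck with the standard library's cProfile and pstats modules. The functions `workload`, `slow_part` and `fast_part` are hypothetical stand-ins for real code. Only the profiling calls matter here.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical hot spot: a Python-level loop that dominates the run time.
    return sum(i * i for i in range(n))


def fast_part(n):
    # Hypothetical cheap helper that should barely show up in the report.
    return n * 2


def workload():
    total = 0
    for _ in range(10):
        total += slow_part(10_000) + fast_part(10)
    return total


def profile_report(func, limit=5):
    """Run func under cProfile and return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

Sorting by cumulative time puts the call chain that owns most of the run time at the top. That chain is the candidate for Cython, not the whole program.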

Cythonize all

10% speedup

Fails for many reasons: Cython != Python

Distribution

setup.py


from setuptools import setup
from Cython.Build import cythonize

setup(
    # Translate helloworld.pyx to C and build it as an extension module
    ext_modules=cythonize("helloworld.pyx")
)
            

pyximport


import pyximport
pyximport.install()  # compile .pyx modules transparently on import

import helloworld  # built from helloworld.pyx on first import
            

Step by step

Example: applying a websocket mask


import sys

native_byteorder = sys.byteorder

def _websocket_mask_python(mask, data):
    assert isinstance(data, bytearray), data
    assert len(mask) == 4, mask
    datalen = len(data)
    if datalen == 0:
        return bytearray()
    # XOR the whole payload as one big integer against the mask repeated
    # to the same length; both use the same byte order, so bytes line up
    data = int.from_bytes(data, native_byteorder)
    mask = int.from_bytes(mask * (datalen // 4) + mask[: datalen % 4],
                          native_byteorder)
    return (data ^ mask).to_bytes(datalen, native_byteorder)

            

Naive cythonizing: 9% boost


$ cython -a module.pyx      # writes module.c plus an annotated module.html
$ xdg-open module.html      # yellow lines interact with the Python C-API
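Speedup figures like the one above come from timing the code before and after a change. Below is a minimal sketch using the standard `timeit` module. It assumes two interchangeable implementations; `mask_int`, `mask_bytewise` and `speedup_percent` are hypothetical names. A compiled extension function would be passed in the same way.

```python
import os
import sys
import timeit


def mask_int(mask, data):
    # Big-integer XOR, the same trick as _websocket_mask_python.
    n = len(data)
    full = mask * (n // 4) + mask[: n % 4]
    value = int.from_bytes(data, sys.byteorder) ^ int.from_bytes(full, sys.byteorder)
    return value.to_bytes(n, sys.byteorder)


def mask_bytewise(mask, data):
    # Byte-by-byte reference, deliberately slow.
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


def best_time(func, *args, number=20, repeat=3):
    """Best-of-`repeat` average seconds per call of func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup_percent(baseline, candidate, *args):
    """How much faster `candidate` is than `baseline`, in percent."""
    before = best_time(baseline, *args)
    after = best_time(candidate, *args)
    return (before / after - 1) * 100


if __name__ == "__main__":
    key = os.urandom(4)
    payload = bytearray(os.urandom(64 * 1024))
    print(f"{speedup_percent(mask_bytewise, mask_int, key, payload):.0f}% faster")
```

Taking the minimum over several repeats filters out noise from other processes better than averaging does.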
            

Add types


from cpython cimport PyBytes_AsString
from libc.stdint cimport uint32_t, uint64_t

# Declare the bytearray C-API function directly from Python.h
cdef extern from "Python.h":
    char* PyByteArray_AsString(bytearray ba) except NULL

def _websocket_mask_cython(bytes mask, bytearray data):
    # Note: masks `data` in place and returns it
    cdef:
        Py_ssize_t data_len, i
        unsigned char * in_buf          # writable pointer into data
        const unsigned char * mask_buf  # read-only pointer into mask
        uint32_t uint32_msk
        uint64_t uint64_msk

            

Work with raw data buffers


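    assert len(mask) == 4

    data_len = len(data)
    # Raw pointers into the existing buffers: no copies are made
    in_buf = <unsigned char*>PyByteArray_AsString(data)
    mask_buf = <const unsigned char*>PyBytes_AsString(mask)
    # Read all 4 mask bytes at once as a single 32-bit word
    uint32_msk = (<uint32_t*>mask_buf)[0]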

            

Convert in-place


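    # XOR 4 bytes per iteration directly in the bytearray's memory
    while data_len >= 4:
        (<uint32_t*>in_buf)[0] ^= uint32_msk
        in_buf += 4
        data_len -= 4

    # Fewer than 4 bytes remain: finish byte by byte
    for i in range(0, data_len):
        in_buf[i] ^= mask_buf[i]

    return data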
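The same word-at-a-time idea can be mirrored in pure Python with `memoryview.cast`. It reinterprets the bytearray's memory as 4-byte unsigned integers without copying. This sketch only illustrates the control flow of the Cython loop: whole words in place, then the tail. It is not fast. It assumes the native C `unsigned int` is 4 bytes wide, and `mask_inplace_words` is a hypothetical name.

```python
import struct
import sys


def mask_inplace_words(mask, data):
    """Pure-Python mirror of the Cython loop; mutates ``data`` and returns it."""
    assert isinstance(data, bytearray), data
    assert len(mask) == 4, mask
    # Assumption: a native unsigned int is 4 bytes (true on mainstream platforms).
    assert struct.calcsize("I") == 4
    # Same value as (<uint32_t*>mask_buf)[0]: the 4 mask bytes in native order.
    mask_word = int.from_bytes(mask, sys.byteorder)
    whole = len(data) // 4 * 4
    if whole:
        # View the 4-byte-aligned prefix as unsigned ints (no copy), XOR in place.
        with memoryview(data)[:whole].cast("I") as words:
            for i in range(len(words)):
                words[i] ^= mask_word
    # At most 3 bytes are left; `whole` is a multiple of 4, so tail byte j
    # lines up with mask[j], like in_buf[i] ^= mask_buf[i].
    for j in range(whole, len(data)):
        data[j] ^= mask[j - whole]
    return data
```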
            

64 bit optimization


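    # Goes before the 4-byte loop, which then handles the remainder (< 8 bytes)
    if sizeof(size_t) >= 8:
        # Duplicate the 32-bit mask into both halves of a 64-bit word
        uint64_msk = uint32_msk
        uint64_msk = (uint64_msk << 32) | uint32_msk

        while data_len >= 8:
            (<uint64_t*>in_buf)[0] ^= uint64_msk
            in_buf += 8
            data_len -= 8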
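Shifting and OR-ing is a valid way to build the 64-bit mask because the widened word must hold the same bytes in memory as the 4-byte mask written twice. Both halves carry the same value, so this holds on little- and big-endian machines alike, as the check below shows. `widen_mask` and `xor_8_bytes` are hypothetical helpers that model the Cython expressions with Python integers.

```python
import sys


def widen_mask(mask_word32):
    # Same construction as the Cython code: the 32-bit word in both halves.
    return (mask_word32 << 32) | mask_word32


def xor_8_bytes(chunk, mask):
    """XOR one 8-byte chunk with a 4-byte mask via a single 64-bit operation."""
    assert len(chunk) == 8 and len(mask) == 4
    mask64 = widen_mask(int.from_bytes(mask, sys.byteorder))
    value = int.from_bytes(chunk, sys.byteorder) ^ mask64
    return value.to_bytes(8, sys.byteorder)


mask = b"\x12\x34\x56\x78"
# The widened word equals the mask repeated twice, whatever the byte order.
assert widen_mask(int.from_bytes(mask, sys.byteorder)) == int.from_bytes(mask * 2, sys.byteorder)
```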
            

Aware cythonizing: 20x boost


$ cython -a module.pyx
$ xdg-open module.html
            

Usage and testing

Import


import os

def _websocket_mask_python(mask, data):
    ...

if os.environ.get('AIOHTTP_NO_EXTENSIONS'):
    websocket_mask = _websocket_mask_python
else:
    try:
        from ._websocket import _websocket_mask_cython
        websocket_mask = _websocket_mask_cython
    except ImportError:  # pragma: no cover
        websocket_mask = _websocket_mask_python
            

Testing


import unittest

class WSTestMixin:
    mask = None

    def test_a(self):
        self.assertEqual(self.mask(..., ...), ...)

# staticmethod() keeps a plain function from being bound to the test instance
class TestCython(WSTestMixin, unittest.TestCase):
    mask = staticmethod(_websocket_mask_cython)

class TestPython(WSTestMixin, unittest.TestCase):
    mask = staticmethod(_websocket_mask_python)
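Here is a self-contained, runnable version of the mixin pattern. The Cython module is not available here, so a hypothetical in-place `_mask_bytewise` stands in for `_websocket_mask_cython`. The mixin holds the shared test, and each subclass plugs in one implementation through `staticmethod`.

```python
import os
import sys
import unittest

native_byteorder = sys.byteorder


def _websocket_mask_python(mask, data):
    # The big-integer implementation from the slides (asserts omitted).
    datalen = len(data)
    if datalen == 0:
        return bytearray()
    data = int.from_bytes(data, native_byteorder)
    mask = int.from_bytes(mask * (datalen // 4) + mask[: datalen % 4],
                          native_byteorder)
    return (data ^ mask).to_bytes(datalen, native_byteorder)


def _mask_bytewise(mask, data):
    # Hypothetical stand-in for the Cython version: mutates data in place.
    for i in range(len(data)):
        data[i] ^= mask[i % 4]
    return data


class WSTestMixin:
    mask = None  # set by subclasses, wrapped in staticmethod

    def test_matches_reference(self):
        key = b"\x01\x02\x03\x04"
        for size in range(20):
            data = bytearray(os.urandom(size))
            expected = bytes(b ^ key[i % 4] for i, b in enumerate(data))
            # Pass a copy: in-place implementations mutate their argument.
            self.assertEqual(bytes(self.mask(key, bytearray(data))), expected)


class TestPython(WSTestMixin, unittest.TestCase):
    mask = staticmethod(_websocket_mask_python)


class TestInPlace(WSTestMixin, unittest.TestCase):
    mask = staticmethod(_mask_bytewise)


if __name__ == "__main__":
    unittest.main()
```

Because the mixin is not a `TestCase` itself, the loader does not collect it on its own. Each concrete subclass runs the shared test once against its implementation.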
            

Cython profiling and coverage
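Below is a hedged sketch of a build configured for profiling and coverage. It assumes the Cython `profile` and `linetrace` compiler directives and the `CYTHON_TRACE` C macro described in the Cython documentation. `module` and `module.pyx` are placeholder names. Check the documentation for your Cython version, since tracing support has changed between releases.

```python
# setup.py sketch: a tracing-enabled build, meant for profiling/coverage runs only
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        "module",                  # placeholder module name
        ["module.pyx"],
        # Line tracing code is only compiled in when this C macro is defined
        define_macros=[("CYTHON_TRACE", "1")],
    )
]

setup(
    ext_modules=cythonize(
        extensions,
        # profile: let cProfile see Cython functions;
        # linetrace: emit per-line events that coverage tools can record
        compiler_directives={"profile": True, "linetrace": True},
    )
)
```

coverage.py also needs Cython's plugin, enabled in `.coveragerc` with `plugins = Cython.Coverage` under the `[run]` section. Tracing adds overhead, so keep these settings out of release builds.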

Questions?
