Tuesday, November 29, 2011

PyPy 1.7 on Win32

Hi all,

We have fixed _continuation on Win32 (thanks Stakkars), and so we have now a Win32 version of PyPy 1.7.

Monday, November 21, 2011

PyPy 1.7 - widening the sweet spot

We're pleased to announce the 1.7 release of PyPy. As became a habit, this release brings a lot of bugfixes and performance improvements over the 1.6 release. However, unlike the previous releases, the focus has been on widening the "sweet spot" of PyPy. That is, classes of Python code that PyPy can greatly speed up should be vastly improved with this release. You can download the 1.7 release here:

http://pypy.org/download.html

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 1.7 and cpython 2.7.1 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 32/64 or Windows 32. Windows 64 work is ongoing, but not yet natively supported.

The main topic of this release is widening the range of code which PyPy can greatly speed up. On average on our benchmark suite, PyPy 1.7 is around 30% faster than PyPy 1.6 and up to 20 times faster on some benchmarks.

Highlights

  • Numerous performance improvements. There are too many examples which python constructs now should behave faster to list them.

  • Bugfixes and compatibility fixes with CPython.

  • Windows fixes.

  • PyPy now comes with stackless features enabled by default. However, any loop using stackless features will interrupt the JIT for now, so no real performance improvement for stackless-based programs. Contact pypy-dev for info how to help on removing this restriction.

  • NumPy effort in PyPy was renamed numpypy. In order to try using it, simply write:

    import numpypy as numpy
    

    at the beginning of your program. There is a huge progress on numpy in PyPy since 1.6, the main feature being implementation of dtypes.

  • JSON encoder (but not decoder) has been replaced with a new one. This one is written in pure Python, but is known to outperform CPython's C extension up to 2 times in some cases. It's about 20 times faster than the one that we had in 1.6.

  • The memory footprint of some of our RPython modules has been drastically improved. This should impact any applications using for example cryptography, like tornado.

  • There was some progress in exposing even more CPython C API via cpyext.

Things that didn't make it, expect in 1.8 soon

There is an ongoing work, which while didn't make it to the release, is probably worth mentioning here. This is what you should probably expect in 1.8 some time soon:

  • Specialized list implementation. There is a branch that implements lists of integers/floats/strings as compactly as array.array. This should drastically improve performance/memory impact of some applications
  • NumPy effort is progressing forward, with multi-dimensional arrays coming soon.
  • There are two brand new JIT assembler backends, notably for the PowerPC and ARM processors.

Fundraising

It's maybe worth mentioning that we're running fundraising campaigns for NumPy effort in PyPy and for Python 3 in PyPy. In case you want to see any of those happen faster, we urge you to donate to numpy proposal or py3k proposal. In case you want PyPy to progress, but you trust us with the general direction, you can always donate to the general pot.

Cheers,
Maciej Fijałkowki, Armin Rigo and the entire PyPy team

Monday, November 14, 2011

Gothenburg sprint report

In the past week, we have been busy hacking on PyPy at the Gothenburg sprint, the second of this 2011. The sprint was hold at Laura's and Jacob's place, and here is a brief report of what happened.


In the first day we welcomed Mark Pearse, who was new to PyPy and at his first sprint. Mark worked the whole sprint in the new SpecialisedTuple branch, whose aim is to have a special implementation for small 2-items and 3-items tuples of primitive types (e.g., ints or floats) to save memory. Mark paired with Antonio for a couple of days, then he continued alone and did an amazing job. He even learned how to properly do Test Driven Development :-).

Antonio spent a couple of days investigating whether it is possible to use application checkpoint libraries such as BLCR and DMTCP to save the state of the PyPy interpreter between subsequent runs, thus saving also the JIT-compiled code to reduce the warmup time. The conclusion is that these are interesting technologies, but more work would be needed (either on the PyPy side or on the checkpoint library side) before it can have a practical usage for PyPy users.

Then, Antonio spent most of the rest of the sprint working on his ffistruct branch, whose aim is to provide a very JIT-friendly way to interact with C structures, and eventually implement ctypes.Structure on top of that. The "cool part" of the branch is already done, and the JIT already can compile set/get of fields into a single fast assembly instruction, about 400 times faster than the corresponding ctypes code. What is still left to do is to add a nicer syntax (which is easy) and to implement all the ctypes peculiarities (which is tedious, at best :-)).

As usual, Armin did tons of different stuff, including fixing a JIT bug, improving the performance of file.readlines() and working on the STM branch (for Software Transactional Memory), which is now able to run RPython multithreaded programs using software transaction (as long as they don't fill up all the memory, because support for the GC is still missing :-)). Finally, he worked on improving the Windows version of PyPy. While doing so he discovered together with Anto a terrible bug which lead to a continuous leak of stack space because the JIT called some functions using the wrong calling convention.

Håkan, with some help from Armin, worked on the jit-targets branch, whose goal is to heavily refactor the way the traces are internally represented by the JIT, so that in the end we can produce (even :-)) better code than what we do nowadays. More details in this mail.

Andrew Dalke worked on a way to integrate PyPy with FORTRAN libraries, and in particular the ones which are wrapped by Numpy and Scipy: in doing so, he wrote f2pypy, which is similar to the existing f2py but instead of producing a CPython extension module it produces a pure python modules based on ctypes. More work is needed before it can be considered complete, but f2pypy is already able to produce a wrapper for BLAS which passes most of the tests under CPython, although there's still work left to get it working for PyPy.

Armin and Håkan with Laura's "5x faster" cake
Christian Tismer worked the whole sprint on the branch to make PyPy compatible with Windows 64 bit. This needs a lot of work because a lot of PyPy is written under the assumption that the long type in C has the same bit size than void*, which is not true on Win64. Christian says that in the past Genova-Pegli sprint he completed 90% of the work, and in this sprint he did the other 90% of the work. Obviously, what is left to complete the task is the third 90% :-). More seriously, he estimated a total of 2-4 person-weeks of work to finish it.

But, all in all, the best part of the sprint has been the cake that Laura baked to celebrate the "5x faster than CPython" achievement. Well, actually our speed page reports "only" 4.7x, but that's because in the meantime we switched from comparing against CPython 2.6 to comparing against CPython 2.7, which is slightly faster. We are confident that we will reach the 5x goal again, and that will be the perfect excuse to eat another cake :-)