DRAFT Introduce CFFI-based Python bindings

Floris Bruynooghe flub at devork.be
Tue Nov 28 12:46:07 PST 2017


Hi all,

Here are the beginnings of CFFI-based Python bindings, rather
than the ctypes-based ones currently available.  I started this
work in order to get faster bindings on pypy since a script of
mine was running slower on pypy than CPython.  I initially aimed
for a drop-in replacement of the existing bindings, but ended up
abandoning that goal to help enforce correct usage of the API.

The benefits of this approach are:
- More "Pythonic" API: e.g. tags behave like sets, iterators
  that get consumed can easily be re-created, as is usual with
  collections, and invalid combinations of arguments and calls
  are prevented at the Python API level.
- CFFI: this works on both CPython and PyPy.  On the latter it
  is (supposed to be) a lot faster, as the JIT can cross the
  boundary between C and Python code, where it otherwise has
  extra overhead to emulate the CPython C API.  Additionally
  it is safer to use than ctypes: it works on the API level,
  using the compiler to figure out the correct details of the
  platform.  ctypes, by contrast, only works on the ABI level,
  so you need to rely on knowing the binary layout of the C
  library when writing the Python bindings.
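
To illustrate the ABI-level point, here is a minimal sketch that
uses only ctypes from the standard library.  The struct and its
field names are hypothetical and are not taken from libnotmuch.
It shows that a hand-written layout with a mistyped field width
is accepted silently, since nothing compares it to the C header.

```python
import ctypes

# Hypothetical C declaration that the bindings have to mirror by hand:
#     struct item { uint32_t flags; uint64_t count; };


class ItemCorrect(ctypes.Structure):
    # Hand-written copy of the header; nothing checks it against the C code.
    _fields_ = [("flags", ctypes.c_uint32), ("count", ctypes.c_uint64)]


class ItemStale(ctypes.Structure):
    # Same struct with one field width mistyped; ctypes accepts it silently.
    _fields_ = [("flags", ctypes.c_uint32), ("count", ctypes.c_uint32)]


def describe(struct_type):
    """Return (total size, {field name: byte offset}) for a ctypes Structure."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ctype in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets


correct_size, correct_offsets = describe(ItemCorrect)
stale_size, stale_offsets = describe(ItemStale)

# The two declarations disagree about the size of the struct, so reads
# through the stale one would interpret the library's memory wrongly.
print("correct:", correct_size, correct_offsets)
print("stale:  ", stale_size, stale_offsets)
```

In CFFI's API mode the equivalent declaration is checked against
the real header by the C compiler at build time, which is the
safety benefit described above.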

Additionally I believe these bindings fix a memory safety issue:
certain situations in my test suite would lead to a core dump,
which is not something that should be possible from within
Python.  I believe I have seen similar reports in the list
archives, so I am not the only one seeing these.  Sadly they are
hard to isolate and I have not managed to re-create this in a
nice minimal example.  However, I believe the root cause is that
in some situations, mostly interpreter shutdown, the __del__
method can be called on an object while there are still
references to it and while its child objects are still alive.
This effectively results in double frees, as the child object
frees memory already freed by the parent.  These bindings solve
this by adding the .alive property and using it to check that
parent objects are still alive before a child destroys itself.
This is somewhat expensive, but works and is easy to implement.
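
The idea can be sketched in pure Python as follows.  This is an
illustration only, not the code from the patch: the class names
and the "freed" list are hypothetical, and the list stands in
for the real C destructor calls so the order of frees is visible.

```python
class Parent:
    """Stand-in for an object owning C memory, e.g. a database handle."""

    def __init__(self):
        self._open = True
        self.freed = []  # records frees for the demo instead of calling C

    @property
    def alive(self):
        return self._open

    def destroy(self):
        if self._open:
            # Real bindings would call the C destructor here, which
            # also releases the memory of any child objects.
            self.freed.append("parent")
            self._open = False

    def __del__(self):
        self.destroy()


class Child:
    """Stand-in for an object whose C memory is owned by its parent."""

    def __init__(self, parent):
        self._parent = parent  # keep the parent reachable from the child
        self._open = True

    @property
    def alive(self):
        # A child is only usable while its parent has not been destroyed.
        return self._open and self._parent.alive

    def destroy(self):
        if self.alive:
            # Only free if the parent has not already freed this memory.
            self._parent.freed.append("child")
        self._open = False

    def __del__(self):
        self.destroy()


parent = Parent()
child = Child(parent)
parent.destroy()  # simulate the parent's __del__ running first
child.destroy()   # sees the dead parent and skips its own free
print(parent.freed)  # ['parent']
```

The cost mentioned above comes from every child having to walk
up to its parent on each check.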

There are also some downsides to the choices I made:
- I ended up going squarely for CPython 3.6+.  Choosing Python
  3 allowed better API design, e.g. with keyword-only parameters
  etc.  Choosing CPython 3.4+ restricts the madness that can
  happen with __del__ and gives some newer (though now unused)
  features in weakref.finalize.
- This is no longer drop-in compatible.
- I haven't got to a stage where my initial goal of speed has
  been proven yet.

In theory I think it's possible to create a CFFI-based drop-in
replacement for the bindings, only adding the memory-safety fixes
and keeping the Python 2.7 compatibility.  It would then be
possible to build the API proposed in these bindings on top of
this, but once I was making these bindings safer it felt strange
to still allow the API to be misused.
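
As a rough illustration of such layering, the sketch below wraps
a low-level object in a set-like view of its tags using
collections.abc.MutableSet.  LowLevelMessage and its get_tags,
add_tag and remove_tag methods are hypothetical stand-ins for
whatever the lower layer would provide; TagSet is likewise an
invented name, not the class from the patch.

```python
from collections.abc import MutableSet


class LowLevelMessage:
    """Hypothetical stand-in for an object from a low-level binding layer."""

    def __init__(self, tags=()):
        self._tags = list(tags)

    def get_tags(self):
        # Returns a one-shot iterator, as a thin C wrapper typically would.
        return iter(list(self._tags))

    def add_tag(self, tag):
        if tag not in self._tags:
            self._tags.append(tag)

    def remove_tag(self, tag):
        if tag in self._tags:
            self._tags.remove(tag)


class TagSet(MutableSet):
    """Set-like view of a message's tags, built on the low-level calls."""

    def __init__(self, message):
        self._message = message

    @classmethod
    def _from_iterable(cls, iterable):
        # Results of operators such as & and | are plain sets, since a
        # TagSet cannot exist without a backing message.
        return set(iterable)

    def __contains__(self, tag):
        return any(tag == existing for existing in self._message.get_tags())

    def __iter__(self):
        # A fresh iterator on every call, so the tags can be iterated again.
        return self._message.get_tags()

    def __len__(self):
        return sum(1 for _ in self._message.get_tags())

    def add(self, tag):
        self._message.add_tag(tag)

    def discard(self, tag):
        self._message.remove_tag(tag)


msg = LowLevelMessage(["inbox", "unread"])
tags = TagSet(msg)
tags.add("todo")
tags.discard("unread")
print(sorted(tags))  # ['inbox', 'todo']
```

MutableSet supplies the comparison and set operators, so only
the five abstract methods need to touch the lower layer.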


There are a lot of details about this which can be discussed,
also many finer implementation points and even just getting the
proposed API right (you'll notice large gaps for now).  But
this mail is already too long.  I look forward to your comments
and feedback on the approach taken and on whether some form
of this could make it into the main repo.


Lastly, a small note on the AUTHORS file patch: due to my own
unfortunate choice of employer I have strict rules to follow
on how to submit patches, one of which is to add this line if
an AUTHORS file exists.  Given that clearly not everyone is
listed there, though, maybe this is not appropriate.  I would
also prefer to receive email on flub at devork.be rather than
the address I have to use in the git commits.


Kind Regards,
Floris



