My Python Toolbox

After a recent Christchurch Python meetup, I was asked to create a list of Python libraries and tools that I tend to gravitate towards when working on Python projects. The request was aimed at helping newer Pythonistas find a way through the massive Python ecosystem that exists today.

This is my attempt at such a list. It's highly subjective, being coloured by my personal journey. I've done a lot of Python in my career but that doesn't mean that I've necessarily picked the best tool for each task. Still, I hope it's useful as a starting point (I don't think there are any big surprises here).

Black

Many programming language communities are waking up to the fact that having a tool which takes care of code formatting for you is a productivity booster and avoids pointless arguments within teams. Go can take much of the credit for the recent interest in code formatters with the gofmt tool that ships with the standard Go toolchain. Rust has rustfmt, most JavaScript projects seem to prefer Prettier, and an automatic formatter now seems to be de rigueur for any new language.

A few formatters exist for Python but Black seems to be fast becoming the default choice, and rightly so. It makes sensible formatting decisions (the way it handles line length is particularly smart) and has few configuration options so everyone's code ends up looking the same across projects.

All Python projects should be using Black!

argparse

Python has a number of options for processing command line arguments but I prefer good old argparse which has been in the standard library since Python 3.2 (and 2.7). It has a logical API and is capable enough for the needs of most programs.
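
Here's a minimal sketch of what that API looks like. The greet program and its options are made up purely for illustration:

```python
import argparse


def build_parser():
    # "greet" and its options are hypothetical, chosen only to show the API.
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("-n", "--count", type=int, default=1,
                        help="number of times to print the greeting")
    parser.add_argument("--shout", action="store_true",
                        help="print the greeting in upper case")
    return parser


def main(argv=None):
    # With argv=None argparse reads sys.argv[1:]; passing a list eases testing.
    args = build_parser().parse_args(argv)
    greeting = f"Hello, {args.name}!"
    if args.shout:
        greeting = greeting.upper()
    for _ in range(args.count):
        print(greeting)
    return args


# Equivalent to running: greet World --count 2 --shout
args = main(["World", "--count", "2", "--shout"])
```

Help output (-h/--help), type conversion and error messages for bad input all come for free from the declarations above.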

pytest

The standard library has a perfectly fine xUnit style testing package in the form of unittest but pytest requires less ceremony and is just more fun. I really like the detailed output when tests fail and the fixtures mechanism, which provides a more powerful and clearer way of reusing common test functionality than the classic setup and teardown approach. It also encourages composition in tests over inheritance.

pytest's extension mechanisms are great too. We have a handy custom test report hook for the API server tests at The Cacophony Project which includes recent API server logs in the output when tests fail.

csv

So much data ends up being available in CSV or similarly formatted files and I've done my fair share of extracting data out of them or producing CSV files for consumption by other software. The csv package in the standard library is a well designed and flexible workhorse that deserves more praise.
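
As a small sketch of the read-then-write pattern I mean (the column names and values are invented, and io.StringIO stands in for real files):

```python
import csv
import io

# In-memory stand-in for a real CSV file; the data is made up.
raw = io.StringIO("name,qty\nwidget,3\ngadget,5\n")

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(raw))
total = sum(int(row["qty"]) for row in rows)

# DictWriter goes the other way; here each row gains a computed column.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "qty", "share"])
writer.writeheader()
for row in rows:
    writer.writerow({**row, "share": round(int(row["qty"]) / total, 3)})

print(out.getvalue())
```

When working with real files, open them with newline="" as the csv documentation recommends, so the module can handle line endings itself.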

Dates and times

The datetime package from the standard library is excellent and ends up getting used in almost every Python program I work on. It provides convenient ways to represent and manipulate timestamps, time intervals and time zones. I frequently pop open a Python shell just to do some quick ad hoc date calculations.
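
The kind of quick calculation I have in mind looks something like this (the dates are arbitrary, picked only for the example):

```python
from datetime import date, datetime, timedelta, timezone

# Subtracting two dates gives a timedelta.
release = date(2019, 10, 14)
meetup = date(2019, 11, 5)
gap = meetup - release
print(gap.days)  # 22

# weekday() counts from Monday = 0.
print(meetup.weekday())

# Adding a timedelta moves a timestamp forward.
start = datetime(2019, 11, 5, 18, 30, tzinfo=timezone.utc)
end = start + timedelta(hours=1, minutes=45)
print(end.isoformat())  # 2019-11-05T20:15:00+00:00
```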

datetime intentionally doesn't try to get too involved with the vagaries of time zones. If you need to represent timestamps in specific time zones or convert between them, the pytz package is your friend.

There are times when you need to do more complicated things with timestamps and that's where dateutil comes in. It supports generic date parsing, complex recurrence rules and relative delta calculations (e.g. "what is next Monday?"). It also has a complete time zone database built in so you don't need pytz if you're using dateutil.

plumbum

Shell scripts are great for what they are but there are also real benefits to using a more rigorous programming language for the tasks that shell scripts are typically used for, especially once a script gets beyond a certain size or complexity. One way forward is to use Python for its expressiveness and cleanliness and the plumbum package to provide the shell-like ease of running and chaining commands together that Python lacks on its own.

Here's a somewhat contrived example showing plumbum's command chaining capabilities combined with some Python to extract the first 5 lines:

from plumbum.cmd import find, grep, sort

output = (find['-name', '*.py'] | grep['-v', 'python2.7'] | sort)()
for line in output.splitlines()[:5]:
    print(line)

In case you're wondering, the name is Latin for lead, which is what pipes used to be made from (and also why we have plumbers).

attrs

Python class creation with a lot less boilerplate. attrs turns up all over the place and with good reason - you end up with classes that require fewer lines to define and behave correctly in terms of Python's comparison operators.

Here's a quick example of some of the things that attrs gives you:

>>> import attr
>>> @attr.s
... class Point:
...     x = attr.ib(default=0)
...     y = attr.ib(default=0)
...

>>> p0 = Point()     # using default values
>>> p1 = Point(0, 0) # specifying attribute values

# equality implemented by comparing attributes
>>> p0 == p1
True
>>> p2 = Point(3, 4)
>>> p0 == p2
False

>>> repr(p2)         # nice repr values
'Point(x=3, y=4)'

There's a lot more to attrs than this example covers, and most default behaviour is customisable.

It's worth noting that data classes in Python 3.7 and later offer some of the features of attrs, so you could use those if you want to stick to the standard library. attrs offers a richer feature set though.
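
For comparison, here's a rough standard library equivalent of the Point class using dataclasses. It's only a sketch of the overlap and doesn't try to cover what attrs adds on top:

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Type annotations are required; they are what mark these as fields.
    x: int = 0
    y: int = 0


p0 = Point()      # using default values
p1 = Point(0, 0)  # specifying attribute values
p2 = Point(3, 4)

print(p0 == p1)   # True: equality compares field values
print(p0 == p2)   # False
print(repr(p2))   # Point(x=3, y=4)
```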

requests

If you're making HTTP 1.0/1.1 requests with Python then you should almost certainly be using requests. It can do everything you need and then some, and has a lovely API.

As far as HTTP/2 goes, it seems that requests 3 will have that covered, but it's a work in progress at time of writing.

pew

Effective use of virtual environments is crucial for a happy Python development experience. After trying out a few approaches for managing virtualenvs, I've settled on pew as my tool of choice. It feels clean and fits the way I work.


That's what is in my Python toolbox. What's in yours? I'd love to hear your thoughts in the comments.