My Python Toolbox
After a recent Christchurch Python meetup, I was asked to create a list of Python libraries and tools that I tend to gravitate towards when working on Python projects. The request was aimed at helping newer Pythonistas find a way through the massive Python ecosystem that exists today.
This is my attempt at such a list. It's highly subjective, being coloured by my personal journey. I've done a lot of Python in my career but that doesn't mean that I've necessarily picked the best tool for each task. Still, I hope it's useful as a starting point (I don't think there are any big surprises here).
Many programming language communities are waking up to the fact that having a tool which takes care of code formatting for you is a productivity booster and avoids pointless arguments within teams. Go can take much of the credit for the recent interest in code formatters with the gofmt tool that ships with the standard Go toolchain. Rust has rustfmt, JavaScript projects seem to prefer prettier and an automatic formatter seems to now be de rigueur for any new language.
A few formatters exist for Python but Black seems to be fast becoming the default choice, and rightly so. It makes sensible formatting decisions (the way it handles line length is particularly smart) and has few configuration options so everyone's code ends up looking the same across projects.
All Python projects should be using Black!
Python has a number of options for processing command line arguments but I prefer good old argparse which has been in the standard library since Python 3.2 (and 2.7). It has a logical API and is capable enough for the needs of most programs.
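As a rough sketch of what that API looks like, here is a small parser for a made-up tool. The program name, the `path` argument and the `--count` and `--verbose` options are all invented for illustration. Passing an explicit list to `parse_args` just makes the example easy to try in a Python shell.

```python
import argparse


def build_parser():
    # "linecount" and all of its options are hypothetical, purely for illustration.
    parser = argparse.ArgumentParser(
        prog="linecount", description="Show the first lines of a file."
    )
    parser.add_argument("path", help="file to read")
    parser.add_argument("-n", "--count", type=int, default=5,
                        help="number of lines to show (default: 5)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra detail")
    return parser


# A real program would normally call parse_args() with no arguments,
# which reads the options from sys.argv.
args = build_parser().parse_args(["--count", "3", "notes.txt"])
print(args.path, args.count, args.verbose)  # notes.txt 3 False
```

You also get `--help` output and error messages for bad input without writing any extra code.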
The standard library has a perfectly fine xUnit style testing package in the form of unittest but pytest requires less ceremony and is just more fun. I really like the detailed output when tests fail and the fixtures mechanism, which provides a more powerful and clearer way of reusing common test functionality than the classic setup and teardown approach. It also encourages composition in tests over inheritance.
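To give a flavour of fixtures, here is a minimal sketch. The `inventory` fixture, the `in_stock` helper and the test names are all made up for this example. The code before the `yield` plays the role of setup and the code after it plays the role of teardown.

```python
import pytest


def in_stock(stock, item):
    # Hypothetical function under test.
    return stock.get(item, 0) > 0


@pytest.fixture
def inventory():
    # Setup: build the data the tests need.
    stock = {"apples": 3, "pears": 0}
    yield stock
    # Teardown: runs after each test that used the fixture.
    stock.clear()


# Tests ask for a fixture simply by naming it as a parameter.
def test_apples_in_stock(inventory):
    assert in_stock(inventory, "apples")


def test_pears_out_of_stock(inventory):
    assert not in_stock(inventory, "pears")
```

Saved in a file such as test_inventory.py (an arbitrary name), this can be run with the `pytest` command. Each test gets a fresh `inventory`, and a test that needs several fixtures just lists them all as parameters.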
pytest's extension mechanisms are great too. We have a handy custom test report hook for the API server tests at The Cacophony Project which includes recent API server logs in the output when tests fail.
So much data ends up being available in CSV or similarly formatted files and I've done my fair share of extracting data out of them or producing CSV files for consumption by other software. The csv package in the standard library is a well designed and flexible workhorse that deserves more praise.
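Here's a small sketch of writing and then reading CSV data with the csv package. The column names and rows are invented sample data, and an in-memory buffer stands in for a real file to keep the example self-contained.

```python
import csv
import io

# Hypothetical sample data. A real program would usually open a file instead
# (with newline="" as the csv documentation recommends).
rows = [
    {"species": "kiwi", "count": 4},
    {"species": "morepork", "count": 2},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["species", "count"])
writer.writeheader()
writer.writerows(rows)

# Read the same data back. Every value comes back as a string,
# so numeric columns need converting explicitly.
buffer.seek(0)
records = list(csv.DictReader(buffer))
for record in records:
    print(record["species"], int(record["count"]))
```

DictReader and DictWriter are handy when files have a header row. The plain `csv.reader` and `csv.writer` work with lists of values instead.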
Dates and times
The datetime package from the standard library is excellent and ends up getting used in almost every Python program I work on. It provides convenient ways to represent and manipulate timestamps, time intervals and time zones. I frequently pop open a Python shell just to do some quick ad hoc date calculations.
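These are the kinds of quick calculations I mean. The dates are arbitrary examples.

```python
from datetime import date, datetime, timedelta, timezone

# How many days between two dates? Subtracting dates gives a timedelta.
days_left = (date(2019, 12, 25) - date(2019, 7, 1)).days
print(days_left)  # 177

# What is the time 36 hours after a given moment?
start = datetime(2019, 7, 1, 9, 0)
later = start + timedelta(hours=36)
print(later)  # 2019-07-02 21:00:00

# A timezone-aware "now" in UTC, using only the standard library.
now_utc = datetime.now(timezone.utc)
print(now_utc.isoformat())
```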
datetime intentionally doesn't try to get too involved with the vagaries of time zones. If you need to represent timestamps in specific time zones or convert between them, the pytz package is your friend.
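A short sketch of both tasks, assuming pytz is installed. The meetup time is made up.

```python
from datetime import datetime

import pytz

auckland = pytz.timezone("Pacific/Auckland")

# Attach a time zone to a naive datetime with localize(). The pytz docs advise
# this rather than passing tzinfo= to the datetime constructor.
meetup = auckland.localize(datetime(2019, 7, 1, 18, 30))

# Convert to other zones with astimezone().
meetup_utc = meetup.astimezone(pytz.utc)
meetup_london = meetup.astimezone(pytz.timezone("Europe/London"))

print(meetup.isoformat())         # 2019-07-01T18:30:00+12:00
print(meetup_utc.isoformat())     # 2019-07-01T06:30:00+00:00
print(meetup_london.isoformat())  # 2019-07-01T07:30:00+01:00
```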
There are times when you need to do more complicated things with timestamps and that's where dateutil comes in. It supports generic date parsing, complex recurrence rules and relative delta calculations (e.g. "what is next Monday?"). It also has a complete timezone database built in so you don't need pytz if you're using dateutil.
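Here is a brief sketch of those three dateutil features, assuming the package is installed (it's published as python-dateutil). The starting date is arbitrary and happens to be a Monday.

```python
from datetime import datetime

from dateutil import parser
from dateutil.relativedelta import MO, relativedelta
from dateutil.rrule import WEEKLY, rrule

# Generic parsing: no format string needed.
start = parser.parse("1 July 2019 09:30")

# Relative deltas: "what is next Monday?". Adding a day first means that a
# start date which is already a Monday doesn't count as the answer.
next_monday = start + relativedelta(days=+1, weekday=MO(+1))

# Recurrence rules: three weekly occurrences beginning at the start time.
weekly = list(rrule(WEEKLY, dtstart=start, count=3))

print(start)        # 2019-07-01 09:30:00
print(next_monday)  # 2019-07-08 09:30:00
for occurrence in weekly:
    print(occurrence)
```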
Shell scripts are great for what they are but there are also real benefits to using a more rigorous programming language for the tasks that shell scripts are typically used for, especially once a script gets beyond a certain size or complexity. One way forward is to use Python for its expressiveness and cleanliness and the plumbum package to provide the shell-like ease of running and chaining commands together that Python lacks on its own.
Here's a somewhat contrived example showing
plumbum's command chaining
capabilities combined with some Python to extract the first 5 lines:
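
    from plumbum.cmd import find, grep, sort

    # Build the pipeline find | grep | sort, then call it to run it and capture stdout.
    output = (find['-name', '*.py'] | grep['-v', 'python2.7'] | sort)()

    # Print only the first 5 lines of the sorted output.
    for line in output.splitlines()[:5]:
        print(line)
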
In case you're wondering, the name is Latin for lead, which is what pipes used to be made from (and also why we have plumbers).
Python class creation with a lot less boilerplate. attrs turns up all over the place and with good reason - you end up with classes that require fewer lines to define and behave correctly in terms of Python's comparison operators.
Here's a quick example of some of the things that attrs gives you:

    >>> import attr
    >>> @attr.s
    ... class Point:
    ...     x = attr.ib(default=0)
    ...     y = attr.ib(default=0)
    ...
    >>> p0 = Point()      # using default values
    >>> p1 = Point(0, 0)  # specifying attribute values
    >>> # equality implemented by comparing attributes
    >>> p0 == p1
    True
    >>> p2 = Point(3, 4)
    >>> p0 == p2
    False
    >>> repr(p2)          # nice repr values
    'Point(x=3, y=4)'

There's a lot more to attrs than this example covers, and most default behaviour is customisable.
It's worth noting that data classes in Python 3.7 and later offer some of the features of attrs, so you could use those if you want to stick to the standard library. attrs offers a richer feature set though.
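For comparison, here's roughly how the same `Point` class might look as a standard library data class. The `int` annotations are my own addition, since data classes need type annotations to identify fields.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0


p0 = Point()       # using default values
p1 = Point(0, 0)   # specifying attribute values
p2 = Point(3, 4)

print(p0 == p1)    # True: equality compares the fields
print(p0 == p2)    # False
print(repr(p2))    # Point(x=3, y=4)
```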
If you're making HTTP 1.0/1.1 requests with Python then you should almost certainly be using requests. It can do everything you need and then some, and has a lovely API.
As far as HTTP 2.0 goes, it seems that requests 3 will have that covered, but it's a work in progress at time of writing.
Effective use of virtual environments is crucial for a happy Python development experience. After trying out a few approaches for managing virtualenvs, I've settled on pew as my preferred tool. It feels clean and fits the way I work.
That's what is in my Python toolbox. What's in yours? I'd love to hear your thoughts in the comments.