Introducing influx-spout
As well as my main gig, I do some work with the excellent folks at Jump Trading. My main focus there so far has been finalising and open sourcing a project, implemented in Go, called influx-spout.
influx-spout will primarily be of interest to you if:
- you use the InfluxDB time series database
- you have a lot of data going into InfluxDB
- you want flexibility in how incoming InfluxDB data is handled
influx-spout sits between an incoming firehose of InfluxDB measurements (potentially from thousands of hosts) and one or more InfluxDB instances (or other services which accept the InfluxDB Line Protocol, such as Kapacitor). It accepts InfluxDB measurements over either UDP or HTTP, applies sanity checks, batching, filtering and routing, and then writes out the measurements to their final destinations.
influx-spout provides flexibility: measurements can be sharded across backends, some classes of measurements can be duplicated to multiple backends, and measurements which are no longer important can be dropped before they get near a disk.
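To make the routing idea concrete, here's a minimal Go sketch of how a line of InfluxDB Line Protocol might be mapped to a backend by its measurement name. The function and backend names are hypothetical, chosen for illustration; this is not influx-spout's actual API, and real Line Protocol parsing must also handle escaped commas and spaces.

```go
package main

import (
	"bytes"
	"fmt"
)

// measurementName extracts the measurement name from a line of
// InfluxDB Line Protocol: the text before the first ',' or ' '.
// (Escaped delimiters are ignored here for brevity.)
func measurementName(line []byte) []byte {
	i := bytes.IndexAny(line, ", ")
	if i == -1 {
		return line
	}
	return line[:i]
}

// routeLine decides which backend should receive a line. The
// backend names are made up for this example.
func routeLine(line []byte) string {
	switch string(measurementName(line)) {
	case "cpu", "mem":
		return "influxdb-system"
	case "requests":
		return "influxdb-apps"
	default:
		return "drop" // unwanted measurements never reach a disk
	}
}

func main() {
	fmt.Println(routeLine([]byte("cpu,host=nyc01 usage=0.64 1530000000000000000")))
	fmt.Println(routeLine([]byte("debug_timer value=3 1530000000000000000")))
}
```

Sharding and duplication follow the same shape: instead of returning a single backend name, the router returns zero, one, or several destinations for each line.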
influx-spout also allows the way measurements are handled to be changed without touching the systems producing them. External systems are configured to send measurements to a single, static endpoint (i.e. influx-spout); any change to the way measurements are handled is then just a change to influx-spout's configuration.
As a Go developer, influx-spout is interesting because of the high volumes of data it needs to support. Here are a few things we've done to ensure that influx-spout can handle large data volumes:
- The data path is entirely in RAM: no disk is involved.
- Scale out: the various functions of influx-spout are implemented as separate processes which can run on a single machine or be scaled out across multiple machines.
- Custom algorithms for parsing and data conversion where required.
- Minimal memory allocation: wherever possible buffers are allocated once and reused.
- Minimal copying: wherever possible data is read into a buffer once and used from there.
- Performance regression testing: there are automated benchmarks which compare the performance of key areas of the code against an earlier reference revision. The checks are run for every pull request to detect regressions.
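The "allocate once, reuse" and "read once, use in place" points can be sketched in a few lines of Go. The type and method names below are illustrative only (not influx-spout's actual internals): a batch buffer is allocated once with a fixed capacity, filled, flushed, and then reset without releasing its backing array.

```go
package main

import "fmt"

// batchBuffer is a reusable output buffer: allocated once up front,
// then refilled for each batch. This mirrors the "minimal memory
// allocation" pattern described above; the names are hypothetical.
type batchBuffer struct {
	buf []byte
}

func newBatchBuffer(capacity int) *batchBuffer {
	return &batchBuffer{buf: make([]byte, 0, capacity)}
}

// add appends a line (plus newline) to the batch. As long as the
// initial capacity isn't exceeded, no new allocation occurs.
func (b *batchBuffer) add(line []byte) {
	b.buf = append(b.buf, line...)
	b.buf = append(b.buf, '\n')
}

// reset empties the buffer while keeping the underlying array,
// so the next batch reuses the same memory.
func (b *batchBuffer) reset() {
	b.buf = b.buf[:0]
}

func main() {
	b := newBatchBuffer(1 << 16) // 64 KiB, allocated once
	b.add([]byte("cpu,host=nyc01 usage=0.64 1530000000000000000"))
	fmt.Printf("%d bytes batched, capacity %d\n", len(b.buf), cap(b.buf))
	b.reset() // length drops to zero; capacity is retained for reuse
	fmt.Printf("%d bytes batched, capacity %d\n", len(b.buf), cap(b.buf))
}
```

Combined with reading network data directly into long-lived buffers like this one, the steady-state hot path can run with essentially no garbage collector pressure.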
If you're into Go development and performance sensitive software, the influx-spout code is worth studying.
influx-spout v2.0.0 has just been released and can be downloaded from the project's Releases page. There's lots more information about the project in the project launch post and in the README.