Listing S3 objects with NodeJS

I recently had to write some NodeJS code that uses the AWS SDK to list all the objects in an S3 bucket which potentially contains many objects (currently over 80,000 in production). The S3 listObjects API will only return up to 1,000 keys at a time, so you have to make multiple calls, setting the Marker field to page through all the keys.
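To see that paging contract in isolation, here is a hypothetical in-memory stand-in for the one call this post relies on. `makeFakeS3` is an illustrative name, not part of the AWS SDK. It assumes the SDK v2 request shape, `listObjects(params).promise()`, and mimics three behaviours: at most `pageSize` keys per response (1,000 by default, like the real service), listing resumes strictly after the key named in `Marker`, and `IsTruncated` stays `true` while more keys remain. It leaves out `NextMarker`, because S3 only returns that field when a `Delimiter` is specified.

```javascript
// Hypothetical in-memory stand-in for the one S3 call used in this post,
// modelled on the AWS SDK v2 shape: s3.listObjects(params).promise().
// Only meant for experimenting with the paging loop offline.
function makeFakeS3(allKeys, pageSize = 1000) {
  // S3 returns keys in UTF-8 binary order; the default sort matches that
  // for plain ASCII keys.
  const sorted = [...allKeys].sort();
  return {
    listObjects(params) {
      return {
        promise: async () => {
          // Resume strictly after the Marker key, or from the start.
          let from = 0;
          if (params.Marker !== undefined) {
            from = sorted.findIndex((k) => k > params.Marker);
            if (from === -1) {
              from = sorted.length;
            }
          }
          const page = sorted.slice(from, from + pageSize);
          return {
            Contents: page.map((Key) => ({ Key })),
            IsTruncated: from + pageSize < sorted.length,
            // No NextMarker: S3 only includes it when a Delimiter is set.
          };
        },
      };
    },
  };
}
```

With `makeFakeS3(["a", "b", "c", "d", "e"], 2)`, a first call returns `a` and `b` with `IsTruncated: true`, and a call with `Marker: "d"` returns just `e` with `IsTruncated: false`.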

It turns out there are a lot of suboptimal examples out there for how to do this, which often involve global state and complicated recursive callbacks. I'm also a fan of the clarity that JavaScript's newer async/await syntax brings to asynchronous code, so I was keen on a solution that uses that style.

Here's what I came up with:

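async function allBucketKeys(s3, bucket) {
  const params = {
    Bucket: bucket,
  };

  const keys = [];
  for (;;) {
    const data = await s3.listObjects(params).promise();

    // push avoids re-copying the whole array for every key, as concat would.
    for (const elem of data.Contents) {
      keys.push(elem.Key);
    }

    if (!data.IsTruncated) {
      break;
    }
    // S3 only returns NextMarker when a Delimiter is specified; otherwise
    // the last key on this page is the marker for the next request.
    params.Marker = data.NextMarker || data.Contents[data.Contents.length - 1].Key;
  }

  return keys;
}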

It's called like this:

// Remember to catch exceptions somewhere: the promise rejects if any page request fails.
const s3 = connectToS3Somehow();
const keys = await allBucketKeys(s3, "my_bucket");

This solution is clean, concise and hopefully straightforward.
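A variation on the same idea, not taken from the code above: if you would rather process keys as they arrive than wait for all 80,000+ to be collected, the loop can become an async generator. In this sketch `bucketKeys` is a hypothetical name, and the client is assumed to be an AWS SDK v2 `S3` instance or anything with the same `listObjects(params).promise()` shape. Because S3 omits `NextMarker` unless a `Delimiter` is set, it falls back to the last key of each page as the next `Marker`.

```javascript
// Hypothetical async-generator variant: yields keys one at a time
// instead of building the whole array up front.
// Assumes an AWS SDK v2-style client exposing listObjects(params).promise().
async function* bucketKeys(s3, bucket) {
  const params = { Bucket: bucket };
  for (;;) {
    const data = await s3.listObjects(params).promise();
    for (const obj of data.Contents) {
      yield obj.Key;
    }
    if (!data.IsTruncated) {
      return;
    }
    // Prefer NextMarker when S3 sends it; otherwise use the last key
    // on this page.
    params.Marker = data.NextMarker || data.Contents[data.Contents.length - 1].Key;
  }
}
```

Callers then iterate with `for await (const key of bucketKeys(s3, "my_bucket")) { ... }`. Only one page of results is held at a time, and a failed request surfaces as an exception thrown from the loop, so the same advice about catching exceptions applies.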

What makes this solution work is that the AWS SDK can return a Promise for a request (via .promise()), which can then be used with await. Because listObjects has to be called an unknown number of times, with each call depending on the previous response, a plain loop with await gives an arguably clearer structure than recursive callbacks.
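For comparison, here is roughly what the callback style looks like for the same job. It is a sketch, assuming the AWS SDK v2 callback form `s3.listObjects(params, (err, data) => { ... })`; `allBucketKeysCb` and its `done(err, keys)` callback are hypothetical names. Each response handler has to trigger the next request itself, and every error has to be passed along explicitly rather than caught once with try/catch.

```javascript
// For comparison only: a callback-style equivalent of allBucketKeys.
// Assumes the AWS SDK v2 callback form s3.listObjects(params, (err, data) => ...).
function allBucketKeysCb(s3, bucket, done) {
  const keys = [];

  // Each page's handler must schedule the next request itself,
  // which is what makes the control flow recursive.
  function fetchPage(marker) {
    const params = { Bucket: bucket };
    if (marker !== undefined) {
      params.Marker = marker;
    }
    s3.listObjects(params, (err, data) => {
      if (err) {
        // Errors must be forwarded by hand; there is no surrounding try/catch.
        done(err);
        return;
      }
      for (const obj of data.Contents) {
        keys.push(obj.Key);
      }
      if (!data.IsTruncated) {
        done(null, keys);
        return;
      }
      fetchPage(data.NextMarker || data.Contents[data.Contents.length - 1].Key);
    });
  }

  fetchPage(undefined);
}
```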
