Listing S3 objects with NodeJS
I recently had to write some NodeJS code which uses the AWS SDK to
list all the objects in an S3 bucket which potentially contains many
objects (currently over 80,000 in production). The S3 listObjects
API will only return up to 1,000 keys at a time, so you have to make
multiple calls, setting the Marker field to page through all the keys.
It turns out there are a lot of sub-optimal examples out there for how to do this, which often involve global state and complicated recursive callbacks. I'm also a fan of the clarity of JavaScript's newer async/await feature for handling asynchronous code, so I was keen on a solution which uses that style.
Here's what I came up with:
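async function allBucketKeys(s3, bucket) {
  const params = {
    Bucket: bucket,
  };
  const keys = [];
  for (;;) {
    const data = await s3.listObjects(params).promise();
    for (const elem of data.Contents) {
      keys.push(elem.Key);
    }
    if (!data.IsTruncated) {
      break;
    }
    // NextMarker is only returned when a Delimiter is set, so fall
    // back to the last key on the page.
    const last = data.Contents[data.Contents.length - 1];
    params.Marker = data.NextMarker || last.Key;
  }
  return keys;
}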
It's called like this:
// Remember to catch exceptions somewhere...
const s3 = connectToS3Somehow();
var keys = await allBucketKeys(s3, "my_bucket");
This solution is clean, concise and hopefully straightforward.
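To try the paging logic without touching a real bucket, here is a minimal sketch using an in-memory stand-in for the S3 client. The `fakeS3` helper, its page size and the `demo` function are hypothetical and exist only for illustration; the only thing borrowed from the AWS SDK is the `listObjects(params).promise()` call shape. Like the real API when no Delimiter is set, the fake never returns NextMarker, so the loop falls back to the last key of each page.

```javascript
// Hypothetical in-memory stand-in for an S3 client (not part of the AWS SDK).
// It mimics only the parts of listObjects that the paging loop relies on.
function fakeS3(allKeys, pageSize) {
  const sorted = [...allKeys].sort();
  return {
    listObjects(params) {
      // Marker semantics: return keys strictly after the marker.
      const start = params.Marker
        ? sorted.findIndex((k) => k > params.Marker)
        : 0;
      const from = start === -1 ? sorted.length : start;
      const page = sorted.slice(from, from + pageSize);
      const data = {
        Contents: page.map((Key) => ({ Key })),
        IsTruncated: from + pageSize < sorted.length,
      };
      return { promise: () => Promise.resolve(data) };
    },
  };
}

// Same loop shape as the function above, with a fallback for a
// missing NextMarker.
async function allBucketKeys(s3, bucket) {
  const params = { Bucket: bucket };
  const keys = [];
  for (;;) {
    const data = await s3.listObjects(params).promise();
    for (const elem of data.Contents) {
      keys.push(elem.Key);
    }
    if (!data.IsTruncated) {
      break;
    }
    const last = data.Contents[data.Contents.length - 1];
    params.Marker = data.NextMarker || last.Key;
  }
  return keys;
}

// Five keys served two per page, so the loop makes three calls.
async function demo() {
  const s3 = fakeS3(["a", "b", "c", "d", "e"], 2);
  return allBucketKeys(s3, "my_bucket");
}
```

Calling `demo()` resolves to all five keys in order, which is a quick way to check that no page is skipped or repeated.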
An important aspect that supports this solution is that the AWS API
can return a Promise for a call (via .promise()) which can then be
used with await. Given the need to conditionally call listObjects
multiple times, an arguably clearer code structure can be achieved using
await instead of callbacks.
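For contrast, here is a rough sketch of the same paging written with callbacks. The names `allBucketKeysCb` and `stubS3` are hypothetical; the stub just serves two canned pages so the snippet runs without AWS. It assumes the Node-style `(err, data)` callback form of `listObjects` from version 2 of the AWS SDK for JavaScript.

```javascript
// Callback-style paging, for comparison with the async/await version.
function allBucketKeysCb(s3, bucket, done) {
  const params = { Bucket: bucket };
  const keys = [];
  function nextPage() {
    s3.listObjects(params, (err, data) => {
      if (err) {
        return done(err);
      }
      for (const elem of data.Contents) {
        keys.push(elem.Key);
      }
      if (!data.IsTruncated) {
        return done(null, keys);
      }
      const last = data.Contents[data.Contents.length - 1];
      params.Marker = data.NextMarker || last.Key;
      nextPage(); // the loop has to become recursion
    });
  }
  nextPage();
}

// Hypothetical stub serving two canned pages (not the real SDK).
const stubS3 = {
  listObjects(params, cb) {
    const data = params.Marker
      ? { Contents: [{ Key: "c" }], IsTruncated: false }
      : { Contents: [{ Key: "a" }, { Key: "b" }], IsTruncated: true };
    setImmediate(cb, null, data);
  },
};
```

The logic is the same, but the for loop turns into a recursive inner function and errors have to be passed along by hand, which is the kind of structure the await version avoids.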