
Benchmarking Node, Bun, and Deno on AWS Lambda

I've noticed myself gravitating towards Bun lately for my own personal projects. Bun was recently acquired by Anthropic; this can be cause for either worry or enthusiasm. I try to stay positive and see it as an opportunity for Bun to keep focusing on DX and its strengths, where the other runtimes are starting to fall behind. Bun is also a pretty nice script engine, since it removes a lot of the boilerplate friction that I normally associate with TypeScript: I can just run bun init project -y and get sensible defaults, which can be run with bun run project/index.ts without faffing about installing tsc, esbuild, minify, dotenv... There are other niceties too, like automatic reading of .env files and a built-in test runner, and with genuinely great documentation there's a lot to like.

But as a self-proclaimed serverless zealot, I have to ask: does Bun make sense to run in Lambda? The Bun documentation casually explains how to package it as a Docker image, which gives us a nice side bonus: we're handed a recipe for running our service locally without relying on tools like sam local invoke. I have some reservations about running Docker images on Lambda, but if there's a meaningful difference in performance, then by just accepting the default that is the Node runtime we could be paying extra and giving users a worse experience for nothing.

For good measure we're also comparing against Deno, but I admit I have very little experience with it. Deno has some oddities when running in Lambda, like wanting to precompile its dependencies into a cache, with an ugly timeout that adds significant build time to the Docker build. But I thought, why not throw it in there to compare?

Benchmarks

All benchmarks are available on GitHub for scrutiny (I'll be more than happy to take suggestions for improvements!). I've tried to reuse the same code for all runtimes by using a common/ pattern, simply wrapping the benchmark code with the respective runtime's recommended handler implementation. I use zod extensively for parsing requests; depending on how purist/pragmatic you lean you may or may not agree with that in a benchmark of runtimes, but I think it represents real-world usage and is relevant to include to see how the runtimes handle it. Each benchmark ran 100 iterations.

  1. JWT Signing & validation. Standard stuff for a Lambda custom authorizer - how fast can we sign, verify, and parse claims? I'm using jose for all three runtimes. Each iteration invokes two separate Lambda functions (one to sign, one to validate) and the reported time is their sum.
  2. JSON parsing and transformation. With faker I've generated a relatively large JSON file with objects simulating an e-commerce data model. We're measuring how fast the built-in capabilities of the runtimes can parse it, make some modifications, and then do garbage-collection-heavy object cloning with a ton of spread operations throughout.
  3. Gzip roundtrip. A medium-sized base64 blob is decoded, gzipped, gunzipped, and asserted. I believe all runtimes call the same underlying zlib C libraries, but the overhead of the JS <-> native boundary and buffer handling is measured.
  4. Large array operations. Arguably the most interesting benchmark from a runtime perspective, where I try to measure pure JS CPU/GC pressure with no I/O or native calls. The caller sends size and seed parameters; the array is generated inside the Lambda. To avoid inline caching I try to generate random values for the array, and use a predefined seed for each run for determinism (deterministic randomness, one of my favorite oxymorons). Then we filter, map, and sort the array. Array operations are among the most optimized parts of V8 (which powers Node and Deno; Bun uses JavaScriptCore), so this is particularly interesting.
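
As a rough illustration of the fourth benchmark, here is a minimal sketch of a seeded filter/map/sort workload. The generator (a classic linear congruential generator), the specific predicates, and the names seededRandom and runArrayBenchmark are all assumptions made for illustration; the actual benchmark code in the repository may well differ.

```typescript
// A tiny seeded PRNG (linear congruential generator, Numerical Recipes constants).
// Assumption: the real benchmark may use a different generator.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    // The product stays below 2^53, so plain double arithmetic is exact here.
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Hypothetical benchmark body: generate the array inside the function,
// then filter, map, and sort it.
function runArrayBenchmark(size: number, seed: number): number {
  const random = seededRandom(seed);
  const values: number[] = [];
  for (let i = 0; i < size; i++) {
    values.push(Math.floor(random() * 1000000));
  }

  const result = values
    .filter((v) => v % 3 !== 0)
    .map((v) => v * 2 + 1)
    .sort((a, b) => a - b);

  // Return a checksum so the work is observable and runs can be compared.
  return result.reduce((acc, v) => (acc + v) % 1000000007, 0);
}
```

Because the seed fully determines the input, two calls with the same size and seed should produce the same checksum, which makes it easy to confirm that every runtime did the same work.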

Each Lambda invocation self-reports its timing as part of the response to the benchmark caller; I'm only interested in the execution time inside the handler. I run a "warmup" before the actual measurement iterations. Cold starts are not measured as part of the benchmark (I believe it's a problem blown out of proportion): for runtime optimization to make sense we'd need significant Lambda usage anyway, where the number of cold starts would be vastly outnumbered by the number of invocations of "hot" Lambdas.
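
The self-reporting idea could be sketched roughly like this. The names timeSync, TimedResult, and BenchmarkEvent are hypothetical, and the summing loop is only a stand-in for a real benchmark body; the point is that the clock starts and stops inside the handler, so network latency and invocation overhead stay out of the number.

```typescript
interface TimedResult<T> {
  result: T;
  elapsedMs: number;
}

// Time only the benchmark body, using the high-resolution clock that
// Node, Bun, and Deno all expose as a global.
function timeSync<T>(work: () => T): TimedResult<T> {
  const start = performance.now();
  const result = work();
  return { result, elapsedMs: performance.now() - start };
}

// Hypothetical event shape sent by the benchmark caller.
interface BenchmarkEvent {
  size: number;
}

// Lambda-style handler; in a deployed function this would be exported.
async function handler(event: BenchmarkEvent): Promise<TimedResult<number>> {
  return timeSync(() => {
    // Placeholder workload standing in for the real benchmark code.
    let sum = 0;
    for (let i = 0; i < event.size; i++) sum += i;
    return sum;
  });
}
```

An asynchronous benchmark body (such as JWT signing) would need a variant that awaits the work before reading the clock a second time.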

Benchmark discussion and result

All times are in milliseconds per iteration (computation only) with a 1,024MB memory allocation for Lambda. Because I'm lazy and can't be bothered to fight with Docker when my workstation is x86_64, the Lambda architecture is also x86_64 (but you really should run ARM64 for the best price performance when you can).

Iterations: 100
Warmup:     3
Runtimes:   bun, deno, nodejs

Results (JWT Sign/Validate):
  bun:      mean=0.9ms  p50=0.8ms  p95=1.3ms  p99=2.6ms
  deno:     mean=1.1ms  p50=1.1ms  p95=1.5ms  p99=1.6ms
  nodejs:   mean=1.0ms  p50=0.9ms  p95=1.4ms  p99=2.0ms
  
Results (JSON Process):
  bun:      mean=10.8ms  p50=7.9ms   p95=19.5ms  p99=20.2ms
  deno:     mean=12.8ms  p50=13.0ms  p95=18.0ms  p99=21.8ms
  nodejs:   mean=13.5ms  p50=13.8ms  p95=19.5ms  p99=22.6ms

Results (Compression):
  bun:      mean=3.4ms  p50=3.4ms  p95=3.6ms  p99=3.8ms
  deno:     mean=3.0ms  p50=2.9ms  p95=3.0ms  p99=3.1ms
  nodejs:   mean=4.2ms  p50=4.1ms  p95=4.4ms  p99=4.7ms

Results (Array Operations):
  bun:      mean=110.0ms  p50=111.9ms  p95=123.1ms  p99=132.8ms
  deno:     mean=241.0ms  p50=240.9ms  p95=259.3ms  p99=280.2ms
  nodejs:   mean=316.5ms  p50=306.3ms  p95=370.9ms  p99=379.5ms
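
For reference, summary figures like the ones above could be derived from the self-reported timings along these lines. This is a sketch that assumes nearest-rank percentiles and assumes the warmup samples are simply dropped from the front; the names summarize and percentile are made up here, and the benchmark caller may compute its statistics differently.

```typescript
interface Summary {
  mean: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile on an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  // Multiply before dividing to keep the rank computation in exact integers.
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Drop the first `warmup` samples, then summarize the rest.
function summarize(samples: number[], warmup: number): Summary {
  const measured = samples.slice(warmup);
  if (measured.length === 0) {
    throw new Error("no measured samples left after warmup");
  }
  const sorted = measured.slice().sort((a, b) => a - b);
  const mean = sorted.reduce((acc, v) => acc + v, 0) / sorted.length;
  return {
    mean,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

With only 100 measured iterations, p99 is effectively the second-slowest sample, so the tail numbers are best read as indicative rather than precise.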

I'm very surprised to see that the AWS native runtime for Node v24 is outperformed by both Deno and Bun for meaningful memory allocation. I can only theorize as to why Node performs so poorly in the array operations benchmark, but I'm guessing it's GC related. What I can say is that for Lambda, Bun seems to provide better price performance than the native Node v24 runtime.

A slight apropos regarding memory allocation of Lambda

For completeness I ran the benchmarks for 128, 256, 512, 1,024, 1,769, and 3,072MB of RAM allocation. We can see one interesting thing right away: Yan Cui is probably right and you should never run your Lambda functions at 128MB. Even for trivial workloads like JWT sign/verify it seems like the sweet spot is around 512/1,024MB. To show how we're limited by the performance of the single CPU core utilized by the runtimes (and my crappy benchmark code) we only need to look at the 1,769 and 3,072MB tests; the second core allowed by the higher memory allocation does nothing really meaningful to improve the results of the test, except for Node. Notably, Bun actually regresses for array operations at 3,072MB (74.1ms → 79.8ms, a ~8% slowdown; scheduling overhead?). I'm not sure what's up with Node specifically, and why it keeps improving when more cores are added. GC offloading?

All values below are mean ms/iteration, columns are MB of RAM allocation.

JWT Sign/validate

Runtime     128     256     512    1024    1769    3072
Bun         9.5     2.8     1.4     1.2     1.2     1.1
Deno        7.6     2.6     1.9     1.6     1.7     1.6
Node       11.7     4.7     2.6     1.6     1.5     1.3

JSON Process

Runtime     128     256     512    1024    1769    3072
Bun       131.8    59.2      32    12.4     7.7     7.8
Deno      235.8   126.1    33.8    13.9      10     9.6
Node      235.4    68.9    38.4    14.4    10.5    10.3

Compression

Runtime     128     256     512    1024    1769    3072
Bun        34.6     9.4     3.7     3.6     3.7     3.6
Deno       25.4     8.3     3.9       3     3.1       3
Node       36.4    11.6     5.4     4.2     4.2     4.2

Array operations

Runtime     128     256     512    1024    1769    3072
Bun      1294.1   605.3   296.5     131    74.1    79.8
Deno     3614.8  1753.2   586.2   230.7   109.6   107.3
Node     4511.7  1735.1   981.9     325     186   115.3
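
To relate these tables back to price performance: Lambda charges for duration in proportion to the configured memory, so memory (GB) multiplied by duration (ms) gives a rough relative cost index. The sketch below applies that to the Bun array operations row. The names gbMilliseconds and cheapestMemory are hypothetical, the per-request fee is ignored, and the inputs are handler-only compute times rather than billed durations, so this is an approximation at best.

```typescript
// [memory in MB, mean ms/iteration] for Bun, array operations, from the table above.
const bunArrayOps: Array<[number, number]> = [
  [128, 1294.1],
  [256, 605.3],
  [512, 296.5],
  [1024, 131],
  [1769, 74.1],
  [3072, 79.8],
];

// Relative cost index: configured memory in GB times duration in ms.
function gbMilliseconds(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * durationMs;
}

// Return the memory size with the lowest cost index.
function cheapestMemory(rows: Array<[number, number]>): number {
  let bestMb = NaN;
  let bestCost = Infinity;
  for (const row of rows) {
    const cost = gbMilliseconds(row[0], row[1]);
    if (cost < bestCost) {
      bestCost = cost;
      bestMb = row[0];
    }
  }
  return bestMb;
}

console.log(cheapestMemory(bunArrayOps));
```

With these particular numbers the index comes out lowest at 1,769MB, narrowly ahead of 1,024MB, while 3,072MB is the most expensive option because the extra memory buys no additional speed.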