Pete Freitag

Burst Throttling on AWS API Gateway Explained

Updated on April 08, 2022
By Pete Freitag

One nice feature of AWS API Gateway is that you can throttle both the Burst (requests) and the Rate (requests per second). The first time I looked at it, it was not really clear to me how the Burst (requests) throttling works. So here is an explanation of what the Burst and the Rate are, and how they work together.

What is the Burst?

The Burst limit is quite simply the maximum number of concurrent requests that API Gateway will serve at any given point, so it is your maximum concurrency for the API.

What is the API Gateway Rate (requests per second)?

The Rate is a little easier to understand: it is the maximum number of requests that can occur within one second.

How do the Rate and Burst Throttle work together?

The Burst setting and Rate setting work together to control how many requests can be processed by your API.

Let's assume you set the throttle to Rate = 100 (requests per second) and Burst = 50 (requests). With those settings, if 100 concurrent requests are sent at the exact same millisecond, only 50 would be processed due to the Burst setting; the remaining 50 requests would get a 429 Too Many Requests response. Assuming the first 50 requests completed in 100ms each, your client could then retry the remaining 50 requests.
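One common way to model this interaction (and the way AWS documents API Gateway's throttling) is a token bucket: the Burst is the bucket's capacity and the Rate is how fast tokens refill. Here is a minimal sketch, under that assumption, reproducing the scenario above; the class and names are illustrative, not an AWS API.

```python
class TokenBucket:
    """Illustrative token-bucket throttle: Burst = bucket capacity,
    Rate = tokens added back per second (capped at the capacity)."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # the bucket starts full
        self.last = 0.0      # timestamp of the last request, in seconds

    def allow(self, now):
        # Refill tokens for the time elapsed since the last request,
        # never exceeding the Burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request is served
        return False      # request gets a 429 Too Many Requests

# Rate = 100 req/s, Burst = 50: fire 100 requests in the same millisecond.
bucket = TokenBucket(rate=100, burst=50)
results = [bucket.allow(now=0.0) for _ in range(100)]
print(results.count(True), results.count(False))  # 50 served, 50 throttled
```

Note that 100ms later the bucket has refilled 10 tokens (0.1s x 100 req/s), so the first 10 retries would succeed immediately, which is why clients should retry 429s with backoff.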

What does this mean for API Gateways that invoke Lambda?

AWS Lambda functions have a default maximum concurrency of 1000 (you can request to have this increased if you need to), but the default Burst limit on AWS API Gateway is much higher than that. So if you are using API Gateway with Lambda, you will want to make sure you set a Burst throttle value that makes sense for your Lambda concurrency level.
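If I recall correctly, stage-level throttling can be set with the API Gateway `update_stage` call, which takes a list of patch operations. Here is a hedged sketch of building that payload with boto3; the `/*/*/throttling/...` paths apply the limits to all methods on the stage, and the actual `update_stage` call is left commented out since it needs a real API ID and credentials. `throttle_patch_ops` is a helper name I made up for this example.

```python
def throttle_patch_ops(burst, rate):
    """Build the patchOperations payload for apigateway.update_stage.
    Values are passed as strings, which is what the API expects."""
    return [
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": str(burst)},
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": str(rate)},
    ]

# Keep the Burst at or below your Lambda concurrency limit so API Gateway
# rejects excess requests instead of piling them onto Lambda.
LAMBDA_CONCURRENCY = 1000
ops = throttle_patch_ops(burst=LAMBDA_CONCURRENCY, rate=2000)

# Hypothetical usage (requires AWS credentials and real IDs):
# import boto3
# client = boto3.client("apigateway")
# client.update_stage(restApiId="abc123", stageName="prod", patchOperations=ops)
```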

aws lambda apigateway

Burst Throttling on AWS API Gateway Explained was first published on December 07, 2018.




Thanks for the simple explanation, which is so hard to find!!!
by Magno de Araujo on 03/12/2020 at 2:44:56 AM UTC
Have been trying to understand this for last two days. Your post has made me to understand in less than 2 minutes. Thank you.
by MD on 05/17/2020 at 6:55:01 AM UTC
I agree with the above poster, this was an easy to understand explanation on Rate Limiting and Burst throttling. So appreciated!
by Dawn Lassen on 01/05/2021 at 8:05:49 PM UTC
The best explanation I've seen. I still find it difficult to know what is a reasonable balance between the rate, burst and lambda concurrency. At least I know what it all means now. Thank you!
by Richard on 04/30/2021 at 2:39:50 PM UTC
Thanks all! Glad to hear this is still useful. I know I've had to come back here and reference this myself a few times.
by Pete Freitag on 04/30/2021 at 2:54:30 PM UTC
Great article!

"The Rate is a little easier to understand, it is the maximum number of requests that can occurs within one second."

^This sentence seems a bit confusing to me. My understanding is that if the burst is 200 and the rate is 100, there can be 200 requests occurring within the first second if the service had not been taking any requests for some time.
by Alex on 03/21/2022 at 7:12:49 PM UTC