Commit be309fd ("docs: add rate limiting readme and blog")
Signed-off-by: Alex Aizman <alex.aizman@gmail.com>
3 files changed: +296 −2

---
layout: post
title: "Opportunities Galore: Using Rate Limits to Enhance Reliability and Performance"
date: Mar 19, 2025
author: Alex Aizman
categories: reliability performance aistore
---

AIStore v3.28 introduces a unified **rate-limiting** capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limits and reactive handling when limits are encountered — all configurable at both the cluster and bucket levels, with zero performance overhead when disabled.

![Dual-layer Rate Limiting](/assets/rate-limit-60pct.png)

This text explains how it all fits together.

## 1. Background

The original motivation was to **gracefully handle** rate-limited cloud storage such as Amazon S3, Google Cloud Storage (GCS), and other remote [backends](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#at-a-glance).

One common misconception is that integrating with systems that impose their own rate constraints boils down to simply retrying failed requests with exponential backoff.

Not true! In reality, such integrations are always a balancing act: the goal is to **minimize retries** while running at the **maximum allowed speed** — which further requires configuration, runtime state, and a few more elements explained below.

---

## 2. Key Elements

### 2.1 Proactive vs. Reactive

**Proactive Rate Limiting** aims to keep requests within permitted throughput before hitting any system-imposed limits—cloud or otherwise. By governing the flow of requests in real time, we reduce the need for retries.

**Reactive Rate Limiting** comes into play when the external service or remote storage actually enforces a limit (returning [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429) or [503](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/503)).

When that happens, we respond with adaptive backoff and retry logic — **but only** if the corresponding bucket has its rate-limiting policy enabled. This gives us a self-adjusting mechanism that converges on the maximum permissible speed with minimal overhead.

---

### 2.2 Frontend vs. Backend

AIStore has a unique dual identity: it acts as **reliable distributed storage** (managing local and remote buckets) while also serving as a **fast tiering layer** for other systems.

1. **Bursty (Frontend)**
   - Frontend rate limiting is provided by each AIS proxy (or gateway).
   - The system allows short-term spikes but enforces an overall maximum request rate.
   - When clients exceed the allowed rate, they immediately receive HTTP `429`.

2. **Adaptive or Shaping (Backend)**
   - Backend rate limiting is handled by each AIS target node.
   - There is a stateful rate-limiter instance per `(bucket, verb)` pair, where the verbs are `GET`, `PUT`, and `DELETE`.
   - This logic dynamically adjusts request rates based on current usage and responses from the remote service.
   - If a remote backend returns `429` or `503`, the AIS target may engage exponential backoff to stay under the cloud provider's limits.

---

## 3. Configuration

Version 3.28 introduces **per-bucket configuration**, with **inheritable defaults** set at the **cluster level**. Buckets automatically inherit the global settings, but each bucket can override them at creation time or at any point thereafter. For reference, see the updated configuration definitions in [cmn/config.go (lines 676–733)](https://github.com/NVIDIA/aistore/blob/main/cmn/config.go#L676-L733).

```console
$ ais config cluster rate_limit --json
```
```json
"rate_limit": {
    "backend": {
        "num_retries": 3,
        "interval": "1m",
        "max_tokens": 1000,
        "enabled": false
    },
    "frontend": {
        "burst_size": 375,
        "interval": "1m",
        "max_tokens": 1000,
        "enabled": false
    }
}
```

This includes:

- **Rate-Limit Policies**: Separate policies for frontend (bursty) and backend (adaptive or shaping) rate limiting.
- **Performance Requirement**: If rate limiting is **not** enabled for a particular bucket or cluster, there should be **no** performance penalty.
- **Proportional Distribution**:
  - On the front, each AIS proxy assumes it handles `1 / nap` share of incoming requests, where `nap` is the current number of active proxies in the cluster.
  - On the back, each target node assumes it is one of `nat` targets accessing the same remote bucket in parallel, and computes its share accordingly.
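
The proportional split amounts to one line of arithmetic per node. The sketch below (illustrative names; the integer rounding is an assumption) shows the per-node share computation:

```go
package main

import "fmt"

// perNodeShare sketches the proportional distribution described above:
// given a cluster-wide budget of maxTokens per interval and n active
// nodes (proxies or targets), each node independently enforces a 1/n
// share. The integer rounding here is an illustrative assumption.
func perNodeShare(maxTokens, n int) int {
	if n <= 0 {
		return maxTokens
	}
	return maxTokens / n
}

func main() {
	// e.g., a 20,000 requests-per-interval budget across 10 active proxies
	fmt.Println(perNodeShare(20000, 10)) // each proxy enforces 2000
}
```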

For a given bucket, the configuration may look as follows:

```console
$ ais bucket props set s3://abc rate_limit

PROPERTY                                VALUE
rate_limit.backend.enabled              true    <<<< (values that differ from cluster defaults are highlighted)
rate_limit.backend.interval             10s     <<<<
rate_limit.backend.max_tokens           35000   <<<<
rate_limit.backend.num_retries          5
rate_limit.backend.per_op_max_tokens

rate_limit.frontend.burst_size          375
rate_limit.frontend.enabled             false
rate_limit.frontend.interval            1m
rate_limit.frontend.max_tokens          1000
rate_limit.frontend.per_op_max_tokens
```

### 3.1 Configuration Parameters Explained

| Parameter | Description |
|-----------|-------------|
| `enabled` | Enables/disables rate limiting for frontend or backend |
| `interval` | Time window for token replenishment (e.g., "1m", "10s") |
| `max_tokens` | Maximum number of operations allowed in the interval |
| `burst_size` | (Frontend only) Maximum burst allowed above the steady rate |
| `num_retries` | (Backend only) Maximum number of retry attempts when handling `429` or `503` |
| `per_op_max_tokens` | Optional per-operation (`GET`/`PUT`/`DELETE`) token configuration |

---

## 4. Unified Rate-Limiting Logic

Although the frontend and backend differ in their specific mechanisms (bursty vs. shaping), the underlying logic is unified:

- **Frontend**: Each proxy enforces a configurable rate limit on a per-bucket and per-operation (verb) basis.

- **Backend**: Each target enforces the configured limit for outbound calls. This is wrapped by a dedicated `ais/rlbackend` layer that shapes traffic to remote AIS clusters or external clouds (e.g., S3, GCS).

Here's a simplified snippet of the logic, with four inline comments:
```go
func (bp *rlbackend) GetObj() (int, error) {
	// 1. find or create the rate-limiter instance; proactively apply it
	//    to optimize out error handling
	arl := bp.acquire(bucket, http.MethodGet)

	// 2. fast path
	ecode, err := bp.Backend.GetObj()
	if !IsErrTooManyRequests(err) {
		return ecode, err
	}

	// 3. generic retry with a given backend and method-specific callback
	cb := func() (int, error) {
		return bp.Backend.GetObj()
	}
	total, ecode, err := bp.retry(arl, cb)

	// 4. increment the retry count; add `total` waiting time to retry latency
	bp.stats(total)

	return ecode, err
}
```

- **Lazy Pruning**: Over time, not all buckets remain active, so the system utilizes the common [housekeeping](https://github.com/NVIDIA/aistore/blob/main/hk/common_durations.go) mechanism to lazily prune stale rate-limiter instances.

---

## 5. Handling Batch Jobs

AIStore supports numerous batch jobs that read and transform data across buckets. For example, a job might read data from one bucket, apply a user-defined transformation, and then write the results to another bucket. Multiple rate-limiting scenarios can arise:

- **Source Bucket**:
  - May or may not be rate-limited.
  - Could be limited on the frontend, the backend, or both.
- **Destination Bucket**:
  - Could also have its own rate-limiting policies.
  - Might be a remote system with enforced limits (e.g., writing to an S3 bucket).

At first, the permutations may seem too numerous, but in reality a single rule covers them:

- Frontend rate limiting is enforced by the running job itself and is _conjunctive_ (source- and/or destination-wise). When both limits are defined, one of them will most likely "absorb" the other.
- Backend rate limiting is controlled by the corresponding `(bucket, verb)` rate limiter, which keeps adjusting its runtime state based on responses from the remote storage.
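
The conjunctive rule can be illustrated minimally: a job's request proceeds only if every applicable limiter admits it, so the stricter limit effectively absorbs the other. The types below are illustrative, not AIStore code.

```go
package main

import "fmt"

// limiter is any admission check. conjunctiveAllow admits a request
// only if every applicable limiter (for instance, the source bucket's
// and the destination bucket's frontend limits) admits it, mirroring
// the "conjunctive" rule. Names and types here are illustrative.
type limiter func() bool

func conjunctiveAllow(limiters ...limiter) bool {
	for _, allow := range limiters {
		if !allow() {
			return false
		}
	}
	return true
}

func main() {
	src := limiter(func() bool { return true })  // source still has tokens
	dst := limiter(func() bool { return false }) // destination is throttled
	fmt.Println(conjunctiveAllow(src, dst)) // false: the stricter limit wins
}
```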

---

## 6. Use Cases

### 6.1 Handling Backend-Imposed Limits

**Scenario**: You have an S3 bucket with a known rate limit of 3500 requests per second.

**Configuration**:
```console
$ ais bucket props set s3://abc rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=35000
```

### 6.2 Limiting User Traffic

**Scenario**: You want to limit the maximum number of client requests to a specific bucket to 20,000 per minute. Further, the cluster in question happens to have 10 AIS gateways (and a load balancer on the front).

**Configuration**:
```console
$ ais bucket props set ais://nnn rate_limit.frontend.enabled=true rate_limit.frontend.interval=1m \
    rate_limit.frontend.max_tokens=2000 rate_limit.frontend.burst_size=100
```

This configures the given bucket to:
- Limit client requests to 20,000 per minute (notice that 2000 * 10 = 20,000).
- Allow short bursts of up to 100 additional requests.
- Return status `429` ("Too Many Requests") if clients exceed these limits.

### 6.3 Combined Frontend/Backend Limiting for Cross-Cloud Transfer

**Scenario**: You are migrating or copying data from GCS to S3 and need to respect both providers' limits.

**Configuration**:
```console
# Configure source S3 bucket
$ ais bucket props set s3://src rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=50000

# Configure destination Google Cloud bucket
$ ais bucket props set gs://dst rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=33000
```

When running a copy or transform job between these buckets, AIStore automatically respects both rate limits without requiring any additional configuration.

---

## 7. Monitoring and Troubleshooting

There are statistics (and [Prometheus metrics](/docs/metrics.md)) to monitor all performance-related aspects, including (but not limited to) rate limiting.

Below are two tables — one for `GET`, another for `PUT` — that illustrate how performance monitoring might look for an AIStore cluster under the rate-limited scenarios described above.

---

### GET Performance Table

```console
$ ais performance latency --refresh 10 --regex get
```

| TARGET | AWS-GET(n) | AWS-GET(t) | GET(n) | GET(t) | GET(total/avg size) | RATELIM-RETRY-GET(n) | RATELIM-RETRY-GET(t) |
|:------:|:----------:|:----------:|:------:|:------:|:-------------------:|:--------------------:|:--------------------:|
| T1 | 800 | 180ms | 3200 | 25ms | 12GB / 3.75MB | 50 | 240ms |
| T2 | 1000 | 150ms | 4000 | 28ms | 15GB / 3.75MB | 70 | 230ms |
| T3 | 700 | 200ms | 2800 | 32ms | 10GB / 3.57MB | 40 | 215ms |

- **AWS-GET(n)** / **AWS-GET(t)**: Number and average latency of GET requests that actually hit the AWS backend.
- **GET(n)** / **GET(t)**: Number and average latency of *all* GET requests (including those served from local cache or in-cluster data).
- **GET(total/avg size)**: Approximate total data read and the corresponding average object size.
- **RATELIM-RETRY-GET(n)** / **RATELIM-RETRY-GET(t)**: Number and average latency of GET requests retried due to hitting the rate limit.

---

### PUT Performance Table

```console
$ ais performance latency --refresh 10 --regex put
```

| TARGET | GCP-PUT(n) | GCP-PUT(t) | PUT(n) | PUT(t) | PUT(total/avg size) | RATELIM-RETRY-PUT(n) | RATELIM-RETRY-PUT(t) |
|:------:|:----------:|:----------:|:------:|:------:|:-------------------:|:--------------------:|:--------------------:|
| T1 | 3200 | 75ms | 4000 | 50ms | 12GB / 3MB | 40 | 210ms |
| T2 | 4200 | 85ms | 5200 | 60ms | 15GB / 2.88MB | 50 | 200ms |
| T3 | 2500 | 90ms | 3300 | 58ms | 10GB / 3.03MB | 35 | 205ms |

- **GCP-PUT(n)** / **GCP-PUT(t)**: Number and average latency of PUT requests that actually went to Google Cloud Storage.
- **PUT(n)** / **PUT(t)**: Number and average latency of *all* PUT requests processed.
- **PUT(total/avg size)**: Approximate total data written and the corresponding average object size.
- **RATELIM-RETRY-PUT(n)** / **RATELIM-RETRY-PUT(t)**: Number and average latency of PUT requests retried due to rate limiting on the destination bucket.

---

These tables can vary widely, primarily depending on the percentage of source data that is in-cluster, but also on:

- **Rate-limit settings** for both the source (AWS) and the destination (GCP).
- **Total number of disks** in the cluster.
- **Object sizes**, **current workload from other running jobs**, **available network bandwidth**, etc.

In practice, you'd adjust the rate-limit `interval` and `max_tokens` (and potentially other AIStore config parameters) to match your workload and performance requirements.

### Quick Troubleshooting Summary

| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| Excessive `429` errors from cloud storage | Rate limit set too high | Lower the `max_tokens` value for the bucket |
| Performance degradation with rate limiting enabled | Unnecessarily low limits | Increase `max_tokens` or disable if not needed |
| Clients occasionally receive "Too Many Requests" | Burst size too small | Increase `burst_size` |
| Clients keep receiving `429` or `503` from a cloud bucket | Backend rate limit not configured | Enable and tune the backend rate limiter on the bucket |

---

## 8. Recap

The objective for v3.28 was to maintain linear scalability and high performance while safeguarding against external throttling or internal overload.

The solution features:

- **Bursty** (frontend) and **adaptive** (backend) rate limiters, configurable on a per-bucket basis.
- **Proactive** controls, to keep runtime errors and retries to a minimum.
- **Reactive** logic, to gracefully handle 429s and 503s from remote storage.
