---
layout: post
title: "Opportunities Galore: Using Rate Limits to Enhance Reliability and Performance"
date: 2025-03-19
author: Alex Aizman
categories: reliability performance aistore
---

AIStore v3.28 introduces a unified **rate-limiting** capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limits and reactive handling when limits are encountered — all configurable at both the cluster and bucket levels, with zero performance overhead when disabled.

This post explains how it all fits together.

## 1. Background

The original motivation was to **gracefully handle** rate-limited cloud storage such as Amazon S3, Google Cloud Storage (GCS), and other remote [backends](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#at-a-glance).

One common misconception is that integrating with systems that impose their own rate constraints boils down to simply retrying failed requests with exponential backoff.

Not true! In reality, such integrations are always a balancing act: the goal is to **minimize retries** while running at the **maximum allowed speed** — which requires configuration, runtime state, and a few other elements explained below.

---

## 2. Key Elements

### 2.1 Proactive vs. Reactive

**Proactive Rate Limiting** aims to keep requests within permitted throughput before hitting any system-imposed limits — cloud or otherwise. By governing the flow of requests in real time, we reduce the need for retries.

**Reactive Rate Limiting** comes into play when the external service or remote storage actually enforces a limit (returning [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429) or [503](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/503)).

When that happens, we respond with adaptive backoff and retry logic — **but only** if the corresponding bucket has its rate-limiting policy enabled. This gives us a self-adjusting mechanism that converges on the maximum permissible speed with minimal overhead.
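
To make the reactive part concrete, below is a minimal, self-contained sketch of exponential backoff with jitter. It is an illustration only, not AIStore code: `callRemote`, the retry budget, and the initial 100ms delay are all hypothetical.

```go
package main

import (
    "errors"
    "fmt"
    "math/rand"
    "net/http"
    "time"
)

// errTooManyRequests stands in for a 429/503 returned by remote storage.
var errTooManyRequests = errors.New("too many requests")

// callRemote is a hypothetical remote call that occasionally gets throttled.
func callRemote() (int, error) {
    if rand.Intn(3) == 0 {
        return http.StatusTooManyRequests, errTooManyRequests
    }
    return http.StatusOK, nil
}

// withBackoff retries a throttled call with exponential backoff and jitter.
func withBackoff(maxRetries int) (int, error) {
    delay := 100 * time.Millisecond
    for i := 0; ; i++ {
        code, err := callRemote()
        if !errors.Is(err, errTooManyRequests) || i >= maxRetries {
            return code, err
        }
        // sleep, then double the delay (with jitter) for the next attempt
        time.Sleep(delay + time.Duration(rand.Int63n(int64(delay/2))))
        delay *= 2
    }
}

func main() {
    code, err := withBackoff(3)
    fmt.Println(code, err)
}
```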

---

### 2.2 Frontend vs. Backend

AIStore has a unique dual identity: it acts as **reliable distributed storage** (managing local and remote buckets) while also serving as a **fast tiering layer** for other systems.

1. **Bursty (Frontend)**
   - Frontend rate limiting is provided by each AIS proxy (or gateway).
   - The system allows short-term spikes but enforces an overall maximum request rate.
   - When clients exceed the allowed rate, they receive HTTP `429` immediately.

2. **Adaptive or Shaping (Backend)**
   - Backend rate limiting is handled by each AIS target node.
   - There's a stateful rate-limiter instance on a per `(bucket, verb)` basis, where the verbs are: `GET`, `PUT`, and `DELETE` (see the sketch below).
   - This logic dynamically adjusts request rates based on current usage and responses from the remote service.
   - If a remote backend returns `429` or `503`, the AIS target may engage exponential backoff to stay under the cloud provider's limits.
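
For illustration, a per-`(bucket, verb)` registry of stateful limiters could be sketched as follows. All names here (`rlKey`, `rateLimiter`, `registry`) are hypothetical, not the actual AIStore types:

```go
package main

import (
    "fmt"
    "sync"
)

// rlKey identifies a rate limiter by bucket and HTTP verb (GET, PUT, DELETE).
type rlKey struct {
    bucket string
    verb   string
}

// rateLimiter holds the per-(bucket, verb) runtime state; fields are illustrative.
type rateLimiter struct {
    tokens  int
    retries int
}

// registry lazily creates and caches one limiter per (bucket, verb).
type registry struct {
    mu sync.Mutex
    m  map[rlKey]*rateLimiter
}

func (r *registry) acquire(bucket, verb string) *rateLimiter {
    r.mu.Lock()
    defer r.mu.Unlock()
    k := rlKey{bucket, verb}
    rl, ok := r.m[k]
    if !ok {
        rl = &rateLimiter{tokens: 1000} // initial budget; illustrative default
        r.m[k] = rl
    }
    return rl
}

func main() {
    reg := &registry{m: make(map[rlKey]*rateLimiter)}
    rl := reg.acquire("s3://abc", "GET")
    fmt.Printf("%+v\n", rl)
}
```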

---

## 3. Configuration

Version 3.28 introduces **per-bucket configuration**, with **inheritable defaults** set at the **cluster level**. Buckets automatically inherit the global settings, but each bucket can override them at creation time or any point thereafter. For reference, see the updated configuration definitions in [cmn/config.go (lines 676–733)](https://github.com/NVIDIA/aistore/blob/main/cmn/config.go#L676-L733).

```console
$ ais config cluster rate_limit --json
```
```json
    "rate_limit": {
        "backend": {
            "num_retries": 3,
            "interval": "1m",
            "max_tokens": 1000,
            "enabled": false
        },
        "frontend": {
            "burst_size": 375,
            "interval": "1m",
            "max_tokens": 1000,
            "enabled": false
        }
    }
```

This includes:

- **Rate-Limit Policies**: Separate policies for frontend (bursty) and backend (adaptive or shaping) rate limiting.
- **Performance Requirement**: If rate limiting is **not** enabled for a particular bucket or cluster, there is **no** performance penalty.
- **Proportional Distribution** (see the worked arithmetic below):
  - On the frontend, each AIS proxy assumes it handles a `1 / nap` share of incoming requests, where `nap` is the current number of active proxies in the cluster.
  - On the backend, each target node assumes it is one of `nat` targets accessing the same remote bucket in parallel, and computes its share accordingly.

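To make the arithmetic concrete: in Section 6.2 below, ten gateways each granting 2,000 frontend tokens per minute yield a cluster-wide ceiling of 20,000 requests per minute; in Section 6.1, a backend budget of 35,000 tokens per 10 seconds is split proportionally among the targets accessing the bucket, so the cluster as a whole stays at 3,500 requests per second.
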
For a given bucket, configuration may look as follows:

```console
$ ais bucket props set s3://abc rate_limit

PROPERTY                              VALUE
rate_limit.backend.enabled            true   <<<< (values that differ from cluster defaults will be highlighted)
rate_limit.backend.interval           10s    <<<<
rate_limit.backend.max_tokens         35000  <<<<
rate_limit.backend.num_retries        5
rate_limit.backend.per_op_max_tokens

rate_limit.frontend.burst_size        375
rate_limit.frontend.enabled           false
rate_limit.frontend.interval          1m
rate_limit.frontend.max_tokens        1000
rate_limit.frontend.per_op_max_tokens
```

### 3.1 Configuration Parameters Explained

| Parameter | Description |
|-----------|-------------|
| `enabled` | Enables/disables rate limiting for frontend or backend |
| `interval` | Time window for token replenishment (e.g., "1m", "10s") |
| `max_tokens` | Maximum number of operations allowed in the interval |
| `burst_size` | (Frontend only) Maximum burst allowed above steady rate |
| `num_retries` | (Backend only) Maximum number of retry attempts when handling `429` or `503` |
| `per_op_max_tokens` | Optional per-operation (GET/PUT/DELETE) token configuration |
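
As a quick sanity check, each limiter's steady-state rate equals `max_tokens / interval`: for example, 1,000 tokens per minute is roughly 16.7 operations per second, while 35,000 tokens per 10 seconds is 3,500 operations per second.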

---

## 4. Unified Rate-Limiting Logic

Although frontend and backend differ in their specific mechanisms (bursty vs. shaping), the underlying logic is unified:

- **Frontend**: Each proxy enforces a configurable rate limit on a per-bucket and per-operation (verb) basis.

- **Backend**: Each target enforces the configured limit for outbound calls. This is wrapped by a dedicated `ais/rlbackend` layer that shapes traffic to remote AIS clusters or external clouds (e.g., S3, GCS).

Here's a simplified snippet of the logic, with four inline comments:

```go
func (bp *rlbackend) GetObj() (int, error) {
    // 1. find or create rate limiter instance; proactively apply to optimize out error handling
    arl := bp.acquire(bucket, http.MethodGet)

    // 2. fast path
    ecode, err := bp.Backend.GetObj()
    if !IsErrTooManyRequests(err) {
        return ecode, err
    }

    // 3. generic retry with the acquired rate limiter and a method-specific callback
    cb := func() (int, error) {
        return bp.Backend.GetObj()
    }
    total, ecode, err := bp.retry(arl, cb)

    // 4. increment retry count, add `total` waiting time to retry latency
    bp.stats(total)

    return ecode, err
}
```
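
For completeness, here is one plausible shape for the generic `retry` helper referenced in step 3. This is a hedged sketch under the same simplifications as the snippet above: `numRetries` mirrors the `num_retries` configuration, the one-second initial delay is illustrative, and `rateLimiter` is a stand-in for the actual limiter type.

```go
// retry repeatedly invokes cb with exponential backoff until the call
// succeeds, fails with a non-throttling error, or exhausts the retry
// budget; it reports the total time spent waiting between attempts.
func (bp *rlbackend) retry(arl *rateLimiter, cb func() (int, error)) (total time.Duration, ecode int, err error) {
    delay := time.Second // illustrative initial backoff
    for i := 0; i < bp.numRetries; i++ {
        time.Sleep(delay)
        total += delay
        arl.retries++ // update the limiter's runtime state (for stats)
        ecode, err = cb()
        if !IsErrTooManyRequests(err) {
            break
        }
        delay *= 2 // exponential backoff between throttled attempts
    }
    return total, ecode, err
}
```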

- **Lazy Pruning**: Over time, not all buckets remain active, so the system utilizes the common [housekeeping](https://github.com/NVIDIA/aistore/blob/main/hk/common_durations.go) mechanism to lazily prune stale rate-limiter instances.

---

## 5. Handling Batch Jobs

AIStore supports numerous batch jobs that read and transform data across buckets. For example, a job might read data from one bucket, apply a user-defined transformation, and then write the results to another bucket. Multiple rate-limiting scenarios can arise:

- **Source Bucket**:
  - May or may not be rate-limited.
  - Could be limited on the frontend, the backend, or both.
- **Destination Bucket**:
  - Could also have its own rate-limiting policies.
  - Might be a remote system with enforced limits (e.g., writing to an S3 bucket).

At first, the permutations may seem too numerous, but in reality the behavior reduces to a simple rule:

- Frontend rate limiting is enforced by the running job itself and is _conjunctive_ (it applies to the source and/or the destination). When both limits are defined, one of them will most likely "absorb" the other.
- Backend rate limiting is controlled by the corresponding `(bucket, verb)` rate limiter, which keeps adjusting its runtime state based on responses from remote storage.

---

## 6. Use Cases

### 6.1 Handling Backend-Imposed Limits

**Scenario**: You have an S3 bucket with a known rate limit of 3500 requests per second.

**Configuration**:
```console
$ ais bucket props set s3://abc rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=35000
```
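
The arithmetic: 35,000 tokens per 10-second interval equals 3,500 requests per second, matching the bucket's known limit, with `num_retries` left to absorb any occasional overshoot.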

### 6.2 Limiting User Traffic

**Scenario**: You want to limit the maximum number of client requests to a specific bucket to 20,000 per minute. Further, the cluster in question happens to have 10 AIS gateways (and a load balancer on the front).

**Configuration**:
```console
$ ais bucket props set ais://nnn rate_limit.frontend.enabled=true rate_limit.frontend.interval=1m \
    rate_limit.frontend.max_tokens=2000 rate_limit.frontend.burst_size=100
```

This configures a given bucket to:
- Limit client requests to 20,000 per minute (note that 2,000 × 10 gateways = 20,000).
- Allow short bursts up to 100 additional requests.
- Return status `429` ("Too Many Requests") if clients exceed these limits.
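
Conceptually, each gateway's share of this frontend limit behaves like a standard token bucket. Here is a minimal sketch using Go's `golang.org/x/time/rate` package; it illustrates the semantics (steady rate plus burst, immediate `429` on overflow) and is not AIStore's actual limiter:

```go
package main

import (
    "fmt"
    "net/http"

    "golang.org/x/time/rate"
)

func main() {
    // per-gateway share: 2000 requests/minute steady rate, burst of 100
    limiter := rate.NewLimiter(rate.Limit(2000.0/60), 100)

    handler := func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            // over the limit: reply 429 immediately, as the frontend does
            http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
            return
        }
        fmt.Fprintln(w, "OK")
    }

    http.HandleFunc("/", handler)
    _ = http.ListenAndServe(":8080", nil) // illustrative port
}
```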

### 6.3 Combined Frontend/Backend Limiting for Cross-Cloud Transfer

**Scenario**: You are migrating or copying data from S3 to GCS and need to respect both providers' limits.

**Configuration**:
```console
# Configure source S3 bucket
$ ais bucket props set s3://src rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=50000

# Configure destination Google Cloud bucket
$ ais bucket props set gs://dst rate_limit.backend.enabled=true rate_limit.backend.interval=10s \
    rate_limit.backend.max_tokens=33000
```

When running a copy or transform job between these buckets, AIStore automatically respects both rate limits without requiring any additional configuration.

---

## 7. Monitoring and Troubleshooting

There are statistics (and [Prometheus metrics](/docs/metrics.md)) to monitor all performance-related aspects, including (but not limited to) rate limiting.

Below are two tables — one for `GET`, another for `PUT` — that illustrate how performance monitoring might look for an AIStore cluster running the cross-cloud transfer scenario described above (S3 source, GCS destination).

---

### GET Performance Table

```console
$ ais performance latency --refresh 10 --regex get
```

| TARGET | AWS-GET(n) | AWS-GET(t) | GET(n) | GET(t) | GET(total/avg size) | RATELIM-RETRY-GET(n) | RATELIM-RETRY-GET(t) |
|:------:|:----------:|:----------:|:------:|:------:|:--------------------:|:---------------------:|:---------------------:|
| T1 | 800 | 180ms | 3200 | 25ms | 12GB / 3.75MB | 50 | 240ms |
| T2 | 1000 | 150ms | 4000 | 28ms | 15GB / 3.75MB | 70 | 230ms |
| T3 | 700 | 200ms | 2800 | 32ms | 10GB / 3.57MB | 40 | 215ms |

- **AWS-GET(n)** / **AWS-GET(t)**: Number and average latency of GET requests that actually hit the AWS backend.
- **GET(n)** / **GET(t)**: Number and average latency of *all* GET requests (including those served from local cache or in-cluster data).
- **GET(total/avg size)**: Approximate total data read and corresponding average object size.
- **RATELIM-RETRY-GET(n)** / **RATELIM-RETRY-GET(t)**: Number and average latency of GET requests retried due to hitting the rate limit.
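
In this illustrative table, T1 served 3,200 GETs, of which only 800 reached AWS (that is, 75% were satisfied in-cluster), and only 50 of those remote calls (about 6%) required a rate-limit retry.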

---

### PUT Performance Table

```console
$ ais performance latency --refresh 10 --regex put
```

| TARGET | GCP-PUT(n) | GCP-PUT(t) | PUT(n) | PUT(t) | PUT(total/avg size) | RATELIM-RETRY-PUT(n) | RATELIM-RETRY-PUT(t) |
|:------:|:----------:|:----------:|:------:|:------:|:--------------------:|:---------------------:|:---------------------:|
| T1 | 3200 | 75ms | 4000 | 50ms | 12GB / 3MB | 40 | 210ms |
| T2 | 4200 | 85ms | 5200 | 60ms | 15GB / 2.88MB | 50 | 200ms |
| T3 | 2500 | 90ms | 3300 | 58ms | 10GB / 3.03MB | 35 | 205ms |

- **GCP-PUT(n)** / **GCP-PUT(t)**: Number and average latency of PUT requests that actually went to Google Cloud Storage.
- **PUT(n)** / **PUT(t)**: Number and average latency of *all* PUT requests processed.
- **PUT(total/avg size)**: Approximate total data written and the corresponding average object size.
- **RATELIM-RETRY-PUT(n)** / **RATELIM-RETRY-PUT(t)**: Number and average latency of PUT requests retried due to rate limiting on the destination bucket.

---

The numbers in these tables can vary widely, primarily depending on the percentage of source data that is in-cluster, but also on:

- **Rate-limit settings** for both the source (AWS) and the destination (GCP).
- **Total number of disks** in the cluster.
- **Object sizes**, **current workload from other running jobs**, **available network bandwidth**, etc.

In practice, you'd adjust the rate-limit `interval` and `max_tokens` (and potentially other AIStore config parameters) to match your workload and performance requirements.

### Quick Troubleshooting Summary

| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| Excessive `429` errors from cloud storage | Rate limit set too high | Lower `max_tokens` value for the bucket |
| Performance degradation with rate limiting enabled | Unnecessarily low limits | Increase `max_tokens` or disable if not needed |
| Client occasionally receives "Too Many Requests" | Burst size too small | Increase `burst_size` |
| Client keeps receiving `429` or `503` from its Cloud bucket | Rate limit not configured | Enable and tune the backend rate limiter on the bucket |

---

## 8. Recap

The objective for v3.28 was to maintain linear scalability and high performance while safeguarding against external throttling or internal overload.

The solution features:

- **Bursty** (frontend) and **adaptive** (backend) rate limiters, configurable on a per-bucket basis.
- **Proactive** controls, to keep runtime errors and retries to a minimum.
- **Reactive** logic, to gracefully handle 429s and 503s from remote storage.