Rate-Limiting

Architect limits the rate of incoming gRPC requests per user (including usage of the SDK) to ensure services are reliable and responsive. Rate limiting is enforced on a per-user basis, meaning each authenticated user has their own independent rate limit quota.

Overview

Rate limiting applies to all gRPC requests made through the Architect API, including:

  • Direct gRPC calls

  • SDK usage (Python, TypeScript, Rust)

  • Any application or script that communicates with Architect services

How Rate Limiting Works

Per-User Enforcement

Rate limits are enforced based on the authenticated user ID. When you make a request:

  1. Architect extracts your user ID

  2. The rate limiter checks your personal quota

  3. If you have available tokens, the request proceeds

  4. If your quota is exhausted, the request is rejected with a rate limit error

Note that multiple accounts under a single user share a single token bucket.

Token Bucket Algorithm

The rate limiter uses a token bucket algorithm with the following characteristics:

  • Burst Capacity: You can make a burst of requests up to your configured limit

  • Refill Rate: Tokens are replenished over time according to your quota

For example, if your rate limit is 10 requests per second with a 100 burst capacity:

  • You can make up to 100 requests immediately (burst)

  • After consuming tokens, they refill at a rate of 10 per second

  • If you exceed the burst capacity, you must wait for tokens to refill

Default Rate Limits

By default the server will enforce rate limits with a burst capacity of 100 requests and a refill rate of 10 requests per second. This applies per user (not per account). However, Architect reserves the right to change these limits at any time depending on server capacity and use cases.

Rate Limit Responses

When you exceed your rate limit, Architect returns a gRPC error with the following characteristics:

Error Details

  1. gRPC Status Code: RESOURCE_EXHAUSTED (code 8)

  2. Error Message: "rate limit exceeded"

  3. Retry-After Header: Provides the recommended wait time in milliseconds before retrying

Handling the Error in Python Example

When using the Python SDK, you'll receive an AioRpcError:

Example Error Output:

Rust Example

When using the Rust SDK:

Best Practices

Use Streaming Channels Instead of Polling

The most effective way to stay within rate limits is to use Architect's streaming channels instead of repeatedly calling unary endpoints.

Rate limit tokens are consumed per gRPC call, not per message on a stream. Establishing a streaming connection like orderflow consumes only one token—all subsequent messages sent over that stream (PlaceOrder, CancelOrder, GetOrder, etc.) do not consume additional tokens.

Approach
Rate Limit Impact

Polling get_order in a loop

1 token per call ❌

Using orderflow channel

1 token to connect, unlimited messages ✅

Subscribing to subscribe_orderflow

1 token to connect, unlimited updates ✅

Example: Prefer orderflow over polling

Batch and Cache Where Possible

  • Batch requests: Use batch endpoints (e.g., place_batch_order) instead of making multiple individual calls

  • Cache static data: Cache infrequently changing data like account lists or product definitions to reduce redundant API calls

Request Backoff

Clients should treat RESOURCE_EXHAUSTED as a signal to alleviate pressure. Retrying after a delay is recommended, and doubling the delay upon each consecutive RESOURCE_EXHAUSTED message is often best practice. You can apply some jitter to avoid the thundering herd problem.

FAQs

What happens when I exceed the rate limit?

When you exceed your rate limit, your request is immediately rejected with a RESOURCE_EXHAUSTED error. Note that this differs from throttling (where requests are slowed down). Crucially, this means that rejected calls need to be resent (if still needed).

You must wait for tokens to refill before making additional requests. The error response includes a Retry-After header indicating how long you should wait.

How do I know what my rate limit is?

Rate limit quotas are configured per deployment. Default rates will be posted, but contact the Architect team to learn the specific rate limits for your environment.

Can I request a higher rate limit?

Rate limit quotas are configured based on system capacity and fair usage policies. Contact the Architect team to discuss your specific needs.

Do rate limits apply to all gRPC methods?

Yes, rate limiting applies to all gRPC requests made to Architect services, regardless of the specific method or service being called.

How are rate limits enforced across multiple connections?

Rate limits are enforced per user ID, not per connection. If you have multiple connections or clients using the same user credentials, they all share the same rate limit quota.

Last updated