Why Rate Limit?
- API quotas: Most providers enforce rate limits (e.g., 60 RPM for GPT-4); actual limits vary by provider, model, and account tier
- Cost control: Slow down expensive operations
- Fairness: Prevent one agent from hogging resources
Basic Usage
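As a minimal, library-agnostic sketch of the idea, the limiter below makes callers wait until a call is allowed. The class name `RateLimiter`, its `acquire()` method, and its parameters are assumptions for illustration and are not taken from this library's API.

```python
import time
from collections import deque


class RateLimiter:
    """Hypothetical blocking limiter; names are illustrative only."""

    def __init__(self, max_calls, period_s, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self):
        """Wait until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that are older than the window.
            while self._calls and now - self._calls[0] >= self.period_s:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest call leaves the window, then re-check.
            self._sleep(self.period_s - (now - self._calls[0]))


# Usage: at most 2 calls per 0.2 seconds, so the third call has to wait.
limiter = RateLimiter(max_calls=2, period_s=0.2)
start = time.monotonic()
for _ in range(3):
    limiter.acquire()
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")
```

Blocking keeps calling code simple; a non-blocking variant would return `False` or raise instead of sleeping.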
Per-Agent Limits
Different agents can have different limits.
Per-Tool Limits
Limit specific tool usage.
Global Limits
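Global, per-agent, and per-tool limits can be layered: a call goes through only if every scope that applies to it still has capacity. The sketch below is one way to model that. The scope names (`global`, `agent:researcher`, `tool:web_search`), the numbers, and the `try_call` helper are all hypothetical.

```python
import time
from collections import deque


class Window:
    """Hypothetical sliding-window counter for one scope."""

    def __init__(self, max_calls, period_s):
        self.max_calls, self.period_s = max_calls, period_s
        self.calls = deque()  # timestamps of accepted calls

    def has_room(self, now):
        # Drop calls that have aged out, then compare against the limit.
        while self.calls and now - self.calls[0] >= self.period_s:
            self.calls.popleft()
        return len(self.calls) < self.max_calls


# Assumed scopes and limits, for illustration only.
LIMITS = {
    "global": Window(100, 60),
    "agent:researcher": Window(30, 60),
    "agent:writer": Window(10, 60),
    "tool:web_search": Window(5, 60),
}


def try_call(agent, tool=None, now=None):
    """Allow a call only if every applicable scope has capacity."""
    now = time.monotonic() if now is None else now
    keys = ["global", f"agent:{agent}"] + ([f"tool:{tool}"] if tool else [])
    scopes = [LIMITS[k] for k in keys if k in LIMITS]
    if not all(s.has_room(now) for s in scopes):
        return False
    for s in scopes:
        s.calls.append(now)  # record only after all scopes agree
    return True


searches = [try_call("researcher", tool="web_search", now=0.0) for _ in range(6)]
print(searches)  # [True, True, True, True, True, False]
print(try_call("researcher", now=0.0))  # True: the agent limit still has room
```

Recording the call only after all scopes agree avoids charging one scope for a call that another scope rejected.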
Set limits that apply to all agents.
Handling Rate Limit Errors
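When a call is rejected, a common pattern is to wait and retry a bounded number of times. The sketch below assumes a hypothetical `RateLimitError` exception that carries a `retry_after` value in seconds; the error type and fields this library actually exposes may differ.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when a limit is exceeded."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited; retry in {retry_after:.2f}s")
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts=3, sleep=time.sleep):
    """Call fn, waiting retry_after seconds after each rate limit error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            sleep(err.retry_after)


# Demo: a fake call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"


waits = []  # capture the waits instead of actually sleeping
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)  # ok [0.01, 0.01]
```

Capping the number of attempts keeps a persistently limited call from retrying forever.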
Sliding Window
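To make the sliding window concrete, the sketch below replays a timeline against a 60-second window. The limit of 3 calls and the call times (15s, 30s, 45s, 60s, 75s) are assumptions chosen for illustration.

```python
from collections import deque

WINDOW_S = 60   # window length in seconds
MAX_CALLS = 3   # assumed limit, for illustration

recent = deque()  # timestamps of accepted calls still inside the window


def allow(t):
    """Accept a call at time t if fewer than MAX_CALLS fall in the last WINDOW_S seconds."""
    while recent and t - recent[0] >= WINDOW_S:
        recent.popleft()  # this call has slid out of the window
    if len(recent) < MAX_CALLS:
        recent.append(t)
        return True
    return False


# Three calls fill the window, a fourth is rejected, and at 75s the
# call from 15s has slid out, so a new call is accepted.
trace = [(t, allow(t)) for t in (15, 30, 45, 60, 75)]
print(trace)
# [(15, True), (30, True), (45, True), (60, False), (75, True)]
```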
Rate limits use a sliding window algorithm: only calls made within the last 60 seconds count toward the limit. At 75s, the first call has aged past 60 seconds and “slides out” of the window, allowing a new call.
Integration with Workflow
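One way to combine a limiter with a multi-step workflow is to acquire it before each step runs. The sketch below assumes a limiter object that exposes an `acquire()` method and a workflow modeled as a plain list of functions; both are hypothetical stand-ins, not this library's workflow API.

```python
class CountingLimiter:
    """Stand-in limiter that only counts acquisitions (no real waiting)."""

    def __init__(self):
        self.acquired = 0

    def acquire(self):
        self.acquired += 1


def run_workflow(steps, limiter, payload):
    """Run steps in order, acquiring the limiter before each one."""
    for step in steps:
        limiter.acquire()  # a real limiter would block or raise here
        payload = step(payload)
    return payload


def research(topic):
    return f"notes on {topic}"


def write(notes):
    return f"draft from {notes}"


limiter = CountingLimiter()
output = run_workflow([research, write], limiter, "rate limits")
print(output)            # draft from notes on rate limits
print(limiter.acquired)  # 2
```

Gating every step in one place means individual steps do not need to know about rate limits.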
Best Practices
Match provider limits
Set your limits slightly below the provider’s limits to avoid errors.
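As a small worked example, a local limit can be derived from the provider's limit with a fixed margin. The 90% margin and the helper name `with_headroom` are assumptions, not recommendations from the provider.

```python
def with_headroom(provider_rpm, percent=90):
    """Return a local limit set below the provider's, never less than 1."""
    return max(1, provider_rpm * percent // 100)


print(with_headroom(60))  # 54
```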
Use tool limits for expensive operations
Web searches and external API calls should be rate limited separately.
Monitor rate limit hits
If you’re hitting rate limits often, you might need to optimize your agents.
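A simple way to judge how often is "often" is to count rejected calls per scope and compare them with total attempts. The `LimitStats` class and the scope name below are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


class LimitStats:
    """Hypothetical tracker for how often each limit rejects a call."""

    def __init__(self):
        self.attempts = Counter()
        self.hits = Counter()

    def record(self, scope, allowed):
        """Count one attempt, and one hit if the call was rejected."""
        self.attempts[scope] += 1
        if not allowed:
            self.hits[scope] += 1

    def hit_rate(self, scope):
        """Fraction of attempts rejected for this scope (0.0 if none seen)."""
        total = self.attempts[scope]
        return self.hits[scope] / total if total else 0.0


stats = LimitStats()
for allowed in (True, True, False, False):
    stats.record("tool:web_search", allowed)
print(stats.hit_rate("tool:web_search"))  # 0.5
```

A persistently high hit rate for one scope points at the agent or tool worth optimizing first.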