Rate limiting prevents agents from overwhelming APIs or making too many calls in a short period.

Why Rate Limit?

  • API quotas — Most providers have rate limits (e.g., 60 RPM for GPT-4)
  • Cost control — Slow down expensive operations
  • Fairness — Prevent one agent from hogging resources

Basic Usage

from splinter.control import RateLimiter

limiter = RateLimiter()

# 20 calls per minute for this agent
limiter.set_agent_limit("researcher", calls=20, window_seconds=60)

# Check before calling
if limiter.check_agent("researcher"):
    # Make the call
    limiter.record_agent_call("researcher")

Per-Agent Limits

Different agents can have different limits:
limiter = RateLimiter()

# Researcher does lots of small calls
limiter.set_agent_limit("researcher", calls=30, window_seconds=60)

# Writer does fewer, larger calls
limiter.set_agent_limit("writer", calls=10, window_seconds=60)

# Reviewer is conservative
limiter.set_agent_limit("reviewer", calls=5, window_seconds=60)

Per-Tool Limits

Limit specific tool usage:
limiter = RateLimiter()

# Web search is expensive
limiter.set_tool_limit("web_search", calls=10, window_seconds=60)

# File operations are cheap
limiter.set_tool_limit("read_file", calls=100, window_seconds=60)

# Check tool access
limiter.check_tool("web_search")  # Raises if over limit
limiter.record_tool_call("web_search")

Global Limits

Set limits that apply to all agents:
limiter = RateLimiter()

# Overall API limit
limiter.set_global_limit(calls=100, window_seconds=60)

# Per-agent limits still apply on top
limiter.set_agent_limit("researcher", calls=30, window_seconds=60)
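One way to read "on top": a call goes through only if it fits within both the global budget and the agent's own budget. The sketch below illustrates that rule with plain counters for a single window. It is not splinter's implementation; `GLOBAL_LIMIT`, `AGENT_LIMITS`, and `is_allowed` are hypothetical names, and it assumes that an agent without its own limit is bound only by the global one.

```python
# Illustrative only: mirrors set_global_limit(calls=100, ...) and
# set_agent_limit("researcher", calls=30, ...) with plain counters.
GLOBAL_LIMIT = 100
AGENT_LIMITS = {"researcher": 30}


def is_allowed(agent: str, agent_calls: int, global_calls: int) -> bool:
    """Return True if one more call fits under both limits.

    agent_calls:  calls this agent has made in the current window
    global_calls: calls all agents have made in the current window
    """
    agent_limit = AGENT_LIMITS.get(agent)
    # Assumption: agents with no explicit limit are only globally limited.
    under_agent = agent_limit is None or agent_calls < agent_limit
    under_global = global_calls < GLOBAL_LIMIT
    return under_agent and under_global


checks = {
    "room_in_both": is_allowed("researcher", agent_calls=10, global_calls=50),
    "agent_exhausted": is_allowed("researcher", agent_calls=30, global_calls=50),
    "global_exhausted": is_allowed("researcher", agent_calls=10, global_calls=100),
    "no_agent_limit": is_allowed("writer", agent_calls=80, global_calls=90),
}
print(checks)
```

In this reading, the stricter of the two limits always wins, so a generous per-agent limit never lets an agent exceed the overall API budget.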

Handling Rate Limit Errors

import asyncio

from splinter.exceptions import RateLimitError


async def check_with_backoff(limiter):
    try:
        limiter.check_agent("researcher")
    except RateLimitError as e:
        print(f"Rate limited: {e.calls_made} calls in {e.window}s")
        print(f"Retry after: {e.retry_after:.1f}s")
        # Wait until the window has room again before retrying
        await asyncio.sleep(e.retry_after)
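The same pattern can be extended into a bounded retry loop. The sketch below is self-contained so it runs without splinter installed: its `RateLimitError` is a stand-in that carries only `retry_after`, and `call_with_retry` and `flaky_check` are hypothetical helpers, not part of the library.

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for splinter.exceptions.RateLimitError (illustration only)."""

    def __init__(self, retry_after: float):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


async def call_with_retry(check, max_attempts: int = 3):
    """Run `check`, sleeping for `retry_after` after each rate-limited attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return check()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            await asyncio.sleep(e.retry_after)


attempts = []


def flaky_check():
    # Hypothetical check: rate limited on the first two attempts, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"


result = asyncio.run(call_with_retry(flaky_check))
print(result, len(attempts))
```

Capping the number of attempts keeps a persistently rate-limited agent from waiting forever; the final error is re-raised so the caller can decide what to do.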

Sliding Window

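Rate limits use a sliding window algorithm: only calls made within the last window_seconds count toward the limit. For example, with a 60-second window, a first call made at 15s "slides out" of the window at 75s, allowing a new call.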
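The sliding-window idea can be sketched in a few lines. This is an illustration of the general technique, not splinter's actual implementation; `SlidingWindowCounter` and its `allow` method are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class SlidingWindowCounter:
    """Illustrative sliding-window limiter (not splinter's implementation)."""

    def __init__(self, calls: int, window_seconds: float):
        self.calls = calls
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recorded calls, oldest first

    def _evict(self, now: float) -> None:
        # Drop calls that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def allow(self, now: float) -> bool:
        """Record the call and return True if the window has room, else False."""
        self._evict(now)
        if len(self._timestamps) < self.calls:
            self._timestamps.append(now)
            return True
        return False


counter = SlidingWindowCounter(calls=2, window_seconds=60)
results = [
    counter.allow(15.0),  # first call
    counter.allow(30.0),  # second call, window is now full
    counter.allow(74.0),  # rejected: both calls are still inside the window
    counter.allow(75.0),  # allowed: the call from 15s has slid out
]
print(results)  # [True, True, False, True]
```

Unlike a fixed window that resets on a schedule, capacity here frees up one call at a time as each old call ages out.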

Integration with Workflow

from splinter.workflow import Workflow
from splinter.control import RateLimiter

workflow = Workflow(workflow_id="pipeline")

# Attach rate limiter
limiter = RateLimiter()
limiter.set_agent_limit("*", calls=60, window_seconds=60)  # All agents
workflow.set_rate_limiter(limiter)

# Now all agents respect rate limits

Best Practices

  • Set your limits slightly below the provider’s limits to avoid errors.
  • Web searches and external API calls should be rate limited separately.
  • If you’re hitting rate limits often, you might need to optimize your agents.
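As a small illustration of the first practice, a limit can be derived from the provider's quota with some headroom. The 10% margin and the `limit_with_headroom` helper below are assumptions chosen for the example, not a splinter feature or a recommended value.

```python
def limit_with_headroom(provider_limit: int, headroom_percent: int = 10) -> int:
    """Return a call limit that sits `headroom_percent` below the provider's limit.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in the range 0..99")
    # Never return less than 1 so the agent can still make progress.
    return max(1, provider_limit * (100 - headroom_percent) // 100)


# A provider quota of 60 calls per minute becomes 54 with 10% headroom.
safe_calls = limit_with_headroom(60)
print(safe_calls)  # 54
```

The result could then be passed as the `calls` argument when setting a limit, for example `limiter.set_global_limit(calls=safe_calls, window_seconds=60)`.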