Batching & Rate Limits
ContextMD is designed to handle large-scale documentation sites efficiently while respecting the rate limits of the OpenAI API. To ensure stability and performance, the tool employs a batch-processing strategy and a graceful fallback mechanism.
Concurrent Batch Processing
When processing a documentation site, ContextMD does not send every page to the LLM simultaneously, nor does it process them strictly one-by-one. Instead, it uses a chunked batching strategy:
- Batch Size: The CLI processes pages in batches of 5 concurrent requests.
- Execution: Each batch must complete before the next set of 5 pages begins.
- Benefit: This provides a significant speedup over sequential processing while preventing your local machine or the API connection from being overwhelmed by hundreds of simultaneous network requests.
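The chunked strategy above can be sketched roughly as follows. This is a minimal illustration, not ContextMD's actual source: `refine_page` is a hypothetical stand-in for the real LLM call, and the batch size of 5 mirrors the behavior described above.

```python
import asyncio

BATCH_SIZE = 5  # pages refined concurrently per batch

async def refine_page(url: str) -> str:
    """Hypothetical stand-in for the real LLM refinement call."""
    await asyncio.sleep(0)  # simulate network I/O
    return f"refined:{url}"

async def process_in_batches(urls: list[str]) -> list[str]:
    results: list[str] = []
    # Each batch of 5 must finish before the next batch starts.
    for i in range(0, len(urls), BATCH_SIZE):
        batch = urls[i : i + BATCH_SIZE]
        results.extend(await asyncio.gather(*(refine_page(u) for u in batch)))
    return results

pages = [f"https://docs.example.com/page{n}" for n in range(12)]
refined = asyncio.run(process_in_batches(pages))
print(len(refined))  # 12 pages processed, in 3 batches of at most 5
```

Because `asyncio.gather` preserves input order, refined pages come back in the same order they were crawled, regardless of which request in a batch finishes first.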
Managing OpenAI Rate Limits
ContextMD uses the gpt-4o-mini model by default. This model is chosen specifically for its high throughput and cost-effectiveness. However, even with smaller models, large documentation sites (e.g., 500+ pages) may encounter rate limits depending on your OpenAI account tier.
To manage this, the tool provides the following controls:
1. Page Limits
Use the -l or --limit flag to restrict the number of pages crawled and processed. This is the most effective way to stay within your API quota during initial testing.
```shell
# Limit processing to the first 20 pages
contextmd https://docs.example.com --limit 20
```
2. Graceful Markdown Fallback
If ContextMD encounters a rate limit error (HTTP 429) or any other API failure during the refinement stage, the process will not crash.
The Processor is built with a fail-safe mechanism:
- Primary Path: Refines content using AI to remove fluff and optimize for agents.
- Fallback Path: If the AI call fails, the tool reverts to the raw Markdown conversion of the cleaned HTML.
- Outcome: Your context.md file will still be generated, ensuring you always receive the documentation content even if your API tier limits are reached.
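The fail-safe behavior can be illustrated with a short sketch. The names here are hypothetical (`refine_with_ai` stands in for the real OpenAI call, and `RateLimitError` for an HTTP 429 response); the point is the shape of the fallback, not the exact implementation.

```python
class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the API."""

def refine_with_ai(markdown: str) -> str:
    # Hypothetical: the real call would send `markdown` to gpt-4o-mini.
    # Here we simulate an account that has hit its rate limit.
    raise RateLimitError("429: rate limit exceeded")

def process_page(raw_markdown: str) -> str:
    try:
        # Primary path: AI refinement.
        return refine_with_ai(raw_markdown)
    except Exception:
        # Fallback path: emit the raw Markdown conversion unchanged,
        # so context.md is still generated.
        return raw_markdown

page = "# API Reference\nSome converted docs content."
output = process_page(page)
print(output == page)  # True: the fallback returned the raw Markdown
```

Catching the failure per page (rather than per run) is what keeps a single 429 from aborting the whole crawl.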
Optimization Tips
To minimize the risk of hitting rate limits on extremely large repositories:
- Targeted Crawling: Provide a specific sub-directory URL (e.g., https://docs.example.com/api-reference) rather than the root domain to reduce the total page count.
- API Tier: Ensure your OpenAI account is at least "Tier 1" to take advantage of higher Tokens Per Minute (TPM) and Requests Per Minute (RPM) limits.
- Output Monitoring: Watch the CLI dashboard's real-time progress bar; if you notice many pages skipping the "refinement" step, you may be hitting rate limits, and ContextMD is using its fallback Markdown logic.