# Token Optimization & Density
ContextMD is designed to maximize the information-to-token ratio. Large documentation sites often contain redundant navigation, boilerplate code, and conversational filler that consume valuable context-window space. By default, ContextMD applies several layers of optimization to ensure your `context.md` file is agent-ready.
## Automated AI Refinement
The core of ContextMD's optimization strategy is its Refinement Layer. Once a page is crawled and converted to Markdown, it is processed by `gpt-4o-mini` with a system prompt designed for technical density.
- Noise Stripping: Removes conversational phrases ("In this section, we will explore...") and focuses on logic and syntax.
- Logical Compression: Collapses verbose explanations into high-density summaries while preserving all API signatures and code blocks.
- Format Normalization: Ensures headers and lists are consistently structured to help LLMs leverage structural attention.
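ContextMD's exact refinement prompt is not published, so the sketch below is only an illustration of what such a request might look like. The model name comes from the text above; the system-prompt wording and the `build_refinement_request` helper are assumptions, and the returned payload would be sent via OpenAI's Chat Completions API:

```python
def build_refinement_request(page_markdown: str) -> dict:
    """Build a hypothetical Chat Completions payload for the Refinement Layer.

    The system prompt here is an illustrative guess, not ContextMD's
    actual prompt; it encodes the three goals listed above.
    """
    system = (
        "You are a technical-density editor. Remove conversational filler, "
        "compress verbose explanations into high-density summaries, and "
        "preserve every API signature, code block, and header verbatim. "
        "Normalize heading and list structure."
    )
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": page_markdown},
        ],
        "temperature": 0,  # deterministic output; no creative rewriting
    }


request = build_refinement_request("## API\n`get(url)` fetches a page.")
```

A temperature of 0 is the natural choice for this kind of pipeline, since the goal is faithful compression rather than paraphrase.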
## Controlling Context Volume
To manage your token budget and prevent context overflow in models like Claude 3.5 Sonnet or GPT-4o, use the following controls:
### 1. Page Limits
Use the `--limit` (or `-l`) flag to cap the number of pages crawled. This is the most effective way to keep the final output within a specific token range.
```shell
# Limit output to approximately the top 20 most relevant pages
npx contextmd https://docs.example.com --limit 20
```
### 2. Strategic Scoping
Instead of crawling a root domain, target specific sub-directories to generate "modular" context files. This allows you to feed your agent only the relevant module documentation rather than the entire library.
```shell
# High-density context for just the API reference
npx contextmd https://docs.example.com/api-reference/v1
```
## Content Filtering (Noise Reduction)
ContextMD automatically performs "surgical" HTML cleaning before the AI even sees the content. This reduces the initial token count and prevents the model from being distracted by UI elements. The following elements are stripped by default:
| Element Type | Description |
| :--- | :--- |
| Navigation | Top bars, sidebars, and breadcrumb lists. |
| Footers | Copyright notices, social links, and site maps. |
| Scripts/Styles | All `<script>`, `<style>`, and `<noscript>` tags. |
| Interactive UI | Iframes and elements with `role="navigation"`. |
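The tag-based filters in the table can be approximated in a few lines of Python. This is an illustrative sketch, not ContextMD's implementation: it strips the element types named above by tag name, and omits attribute-based filters such as `role="navigation"` for brevity.

```python
from html.parser import HTMLParser

# Boilerplate elements to drop, mirroring the table above.
STRIP_TAGS = {"nav", "footer", "script", "style", "noscript", "iframe"}


class NoiseStripper(HTMLParser):
    """Re-emit HTML with boilerplate elements (and their contents) removed."""

    def __init__(self):
        super().__init__()
        self.skipping = []  # stack of currently open stripped tags
        self.out = []       # surviving HTML fragments

    def handle_starttag(self, tag, attrs):
        if tag in STRIP_TAGS:
            self.skipping.append(tag)
        elif not self.skipping:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skipping and tag == self.skipping[-1]:
            self.skipping.pop()
        elif not self.skipping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)


html = '<main><nav><a href="/">Home</a></nav><p>Keep me</p><script>track();</script></main>'
parser = NoiseStripper()
parser.feed(html)
print("".join(parser.out))  # <main><p>Keep me</p></main>
```

Note how the `<nav>` contents and the script body disappear entirely, which is why this pass cuts the token count before the model ever sees the page.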
## Token Budgeting Best Practices
When preparing a context.md file for an LLM, consider the following target sizes:
- Small (10-30 pages): Ideal for RAG (Retrieval-Augmented Generation) or small context windows (8k-32k). Use `-l 20`.
- Medium (30-100 pages): Optimized for "Long Context" models like GPT-4o (128k window). Use `-l 75`.
- Large (100+ pages): Best suited for "Mega Context" models like Gemini 1.5 Pro (1M+ window) or Claude 3.5 (200k window). Use `-l 200`.
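When scripting ContextMD runs for several target models, the three tiers above can be collapsed into a small helper. The thresholds come directly from the list; the function name is hypothetical:

```python
def suggest_page_limit(context_window_tokens: int) -> int:
    """Map a model's context window to a ContextMD --limit value.

    Tiers follow the budgeting guidance above: small windows (<= 32k)
    get -l 20, long-context models (<= 128k) get -l 75, and
    mega-context models get -l 200.
    """
    if context_window_tokens <= 32_000:
        return 20
    if context_window_tokens <= 128_000:
        return 75
    return 200


print(suggest_page_limit(128_000))   # 75, e.g. GPT-4o
print(suggest_page_limit(1_000_000)) # 200, e.g. Gemini 1.5 Pro
```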
> [!TIP]
> **Output Monitoring:** After generation, check the file size of `context.md`. A rough estimate for refined technical Markdown is one token per 4 characters (about 0.75 tokens per word).
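That size check is easy to script. The sketch below assumes the common rule of thumb of roughly 4 characters per token for English technical text; treat the result as an order-of-magnitude estimate, not an exact count.

```python
import os


def estimate_tokens(path: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from file size.

    Assumes ~1 byte per character (mostly-ASCII Markdown) and the
    ~4-characters-per-token heuristic; tune chars_per_token if your
    content is heavy on code or non-ASCII text.
    """
    return int(os.path.getsize(path) / chars_per_token)
```

For example, a 400 KB `context.md` would come out to roughly 100k tokens, which just fits a 128k-window model with room left for the conversation itself.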