Introduction
Overview
ContextMD is a specialized terminal utility designed to bridge the gap between fragmented web documentation and Large Language Models (LLMs). While modern AI agents like Claude 3.5 Sonnet and GPT-4o are incredibly capable, they are often hindered by the "noise" of modern documentation sites: navigation bars, footers, cookie banners, and redundant formatting.
ContextMD solves this by crawling entire documentation domains and "chemically" refining them into a single, high-density context.md file. By using AI to strip away fluff and restructure technical content for machine comprehension, it provides your agents with a clear, focused, and token-efficient knowledge base.
Why ContextMD?
Traditional RAG (Retrieval-Augmented Generation) often struggles with the layout of documentation websites. ContextMD takes a different approach by consolidating information into a single "Source of Truth" that is:
- Token-Optimized: AI-powered refinement removes conversational filler while preserving critical API signatures and logic.
- Structurally Clear: Multi-page sites are flattened into a single document with clear hierarchical headers.
- Agent-Ready: Designed specifically for "drop-in" usage in LLM context windows or custom instructions.
Key Features
- 🕷️ Intelligent Crawling: Automatically traverses documentation sub-paths while staying within the target domain.
- 🧠 AI Refinement: Leverages OpenAI's models to rewrite raw HTML/Markdown into high-density technical summaries.
- 🧹 Automatic Noise Reduction: Strips out sidebars, scripts, and non-essential UI elements before processing.
- ⚡ High Performance: Utilizes concurrent processing to handle large documentation sets quickly with real-time CLI feedback.
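The concurrent processing mentioned above can be sketched as a small worker pool. This is an illustrative pattern, not ContextMD's actual internals; `mapWithConcurrency` is a hypothetical helper name:

```typescript
// Run an async function over a list of items with at most `limit` tasks
// in flight at once — the kind of pattern a crawler uses to fetch and
// refine many pages concurrently without overwhelming the target site.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker repeatedly claims the next unprocessed index.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  // Spawn up to `limit` workers and wait for all of them to drain the queue.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker()),
  );
  return results;
}
```

A crawler could then call something like `mapWithConcurrency(pageUrls, 5, fetchAndRefine)` to process five pages at a time while preserving result order.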
Quick Start
ContextMD is distributed as a CLI tool. You can run it directly via npx or install it globally.
Usage
To generate a context file for a documentation site, provide the base URL and your OpenAI API key:
```shell
# Run via npx
npx contextmd-cli https://docs.example.com --key YOUR_OPENAI_API_KEY

# Or specify a custom output path and page limit
npx contextmd-cli https://docs.example.com -o project-context.md -l 50
```
Options
| Flag | Description | Default |
| :--- | :--- | :--- |
| `<url>` | The base URL of the documentation to crawl. | (Required) |
| `-k, --key` | Your OpenAI API key (can also be set via the `OPENAI_API_KEY` env var). | - |
| `-o, --output` | The path for the generated Markdown file. | `context.md` |
| `-l, --limit` | Maximum number of pages to crawl. | `100` |
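For example, the key can be exported once through the environment instead of being passed with `--key` on every run (the value below is the same placeholder used above, not a real key):

```shell
# Export the key once; the -k/--key flag can then be omitted
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

# Subsequent runs pick the key up from the environment:
# npx contextmd-cli https://docs.example.com
```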
How it Works
- Discovery: The crawler starts at the provided URL, mapping out all internal links within the same domain.
- Extraction: For each page, the tool extracts the main content (typically within `<main>` or `<article>` tags) and discards the surrounding UI boilerplate.
- Refinement: The raw content is passed to an LLM (defaulting to `gpt-4o-mini`) with a specialized system prompt that instructs it to optimize the text for AI consumption, prioritizing technical constraints, code blocks, and logic.
- Consolidation: All refined pages are appended into a single Markdown file, complete with source URL attributions.
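As a rough illustration of the Extraction and Consolidation steps, here is a simplified sketch. ContextMD itself presumably uses a proper HTML parser; the regex approach, the helper names (`extractMainContent`, `consolidate`), and the exact attribution header format are all assumptions for demonstration:

```typescript
// Extraction (simplified): keep the contents of the first <main> or
// <article> tag and drop the surrounding boilerplate. A real crawler would
// use an HTML parser; a regex is enough to show the idea.
function extractMainContent(html: string): string {
  for (const tag of ["main", "article"]) {
    const match = html.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "i"));
    if (match) return match[1].trim();
  }
  return html; // fall back to the whole page if no landmark tag exists
}

// Consolidation: append each refined page under a header recording its
// source URL (the header format here is illustrative, not ContextMD's).
function consolidate(pages: { url: string; content: string }[]): string {
  return pages.map((p) => `## Source: ${p.url}\n\n${p.content}`).join("\n\n---\n\n");
}
```

In this sketch, a page like `<nav>…</nav><main><h1>API</h1></main><footer>…</footer>` reduces to just `<h1>API</h1>` before refinement, and each refined page lands in the output under a `## Source: <url>` heading.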