# Contributing to ContextMD
Thank you for your interest in improving ContextMD. We welcome contributions that help make documentation more accessible for AI agents. As a TypeScript-based CLI utility, we maintain high standards for code quality and performance.
## Development Setup
To get started with the codebase, ensure you have Node.js (v18+) installed.

1. **Clone the repository:**

   ```bash
   git clone https://github.com/UditAkhourii/contextmd.git
   cd contextmd
   ```

2. **Install dependencies:**

   ```bash
   npm install
   ```

3. **Environment variables:** Create a `.env` file in the root directory to test AI-powered processing:

   ```bash
   OPENAI_API_KEY=your_api_key_here
   ```

4. **Run in development mode:** You can run the CLI directly with `ts-node` or by compiling the project:

   ```bash
   # Using npm link for global testing
   npm run build
   npm link
   contextmd https://example.com/docs
   ```
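As a minimal sketch of how the `.env` value might be consumed, the helper below reads `OPENAI_API_KEY` and fails fast with a clear message when it is missing. The function name and error text are hypothetical; the actual lookup in ContextMD may differ.

```typescript
// Hypothetical sketch (not the actual ContextMD code): read the API key
// from an environment map and fail fast if it is missing.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env["OPENAI_API_KEY"];
  if (!key) {
    throw new Error("OPENAI_API_KEY is not set; add it to your .env file");
  }
  return key;
}

// Usage: getApiKey(process.env)
```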
## Code Architecture & Standards
ContextMD is built with a modular architecture. Most contributions will fall into one of these three areas:
- **Crawler** (`src/crawler.ts`): Handles URL normalization, recursive link discovery, and HTML fetching.
- **Processor** (`src/processor.ts`): Handles HTML-to-Markdown conversion via Turndown and LLM-based refinement.
- **CLI** (`src/index.ts`): Manages the user interface, command-line arguments, and batch orchestration.
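The boundaries between these modules can be sketched as a pair of data shapes plus an assembly step. These interfaces and the `assembleContext` helper are illustrative only; the real types in `src/crawler.ts` and `src/processor.ts` may look different.

```typescript
// Illustrative data flow: Crawler produces Pages, Processor turns each
// into a ProcessedPage, and the CLI assembles the final context.md body.
interface Page {
  url: string;  // normalized URL discovered by the crawler
  html: string; // raw HTML fetched for this page
}

interface ProcessedPage {
  url: string;
  markdown: string; // cleaned, LLM-refined Markdown
}

// Hypothetical CLI step: concatenate processed pages into one document,
// tagging each section with its source URL.
function assembleContext(pages: ProcessedPage[]): string {
  return pages
    .map((p) => `<!-- source: ${p.url} -->\n${p.markdown}`)
    .join("\n\n");
}
```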
### Technical Guidelines
- **TypeScript:** All code must be written in TypeScript with strict type checking. Avoid using `any` unless absolutely necessary for external library interop.
- **Async/Await:** Use modern `async`/`await` syntax for all asynchronous operations (crawling, file I/O, API calls).
- **Interfaces:** When modifying data structures, update the relevant interfaces (e.g., `Page` in `crawler.ts`).
- **Error Handling:** Ensure the CLI remains resilient. Errors in individual page processing should not crash the entire crawl.
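The resilience guideline above can be sketched as a per-item `try`/`catch` inside the processing loop: one failing page is logged and skipped rather than aborting the whole crawl. The function and names here are hypothetical, not the project's actual orchestration code.

```typescript
// Sketch of crawl-level resilience: process every item, record failures,
// and never let one bad page throw past the loop.
async function processAll<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
): Promise<{ results: R[]; failures: number }> {
  const results: R[] = [];
  let failures = 0;
  for (const item of items) {
    try {
      results.push(await worker(item));
    } catch (err) {
      failures += 1; // record and continue; do not rethrow
      console.error(`Skipping item after error: ${(err as Error).message}`);
    }
  }
  return { results, failures };
}
```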
## Feature Requests
We are particularly interested in features that improve the "agentic readiness" of the output. This includes:
- Better support for specific documentation frameworks (Docusaurus, GitBook, Mintlify).
- Improved noise reduction algorithms in `Processor.cleanHtml`.
- Support for local LLM providers (Ollama, LocalAI).
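For the local-LLM idea in particular, one plausible approach is a small provider abstraction so the Processor stays backend-agnostic. ContextMD does not ship this interface today; everything below is a discussion sketch with invented names.

```typescript
// Hypothetical provider contract: an Ollama- or LocalAI-backed provider
// would implement the same interface as the OpenAI-backed one.
interface LlmProvider {
  name: string;
  refine(markdown: string): Promise<string>; // return cleaned Markdown
}

// Trivial stand-in implementation used here only to show the shape;
// a real provider would call out to a local model server instead.
class EchoProvider implements LlmProvider {
  name = "echo";
  async refine(markdown: string): Promise<string> {
    return markdown.trim();
  }
}
```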
If you have a major feature idea, please open an Issue first to discuss the implementation approach.
## Pull Request Process
1. **Create a branch:** Use a descriptive name like `feat/add-local-llm-support` or `fix/crawler-depth-logic`.
2. **Lint & format:** Ensure your code follows the project's formatting:

   ```bash
   npm run lint # if applicable
   ```

3. **Update documentation:** If you add a new CLI flag or configuration option, update the `commander` definition in `src/index.ts` and the `README.md`.
4. **Submit the PR:** Provide a clear description of the changes and a sample `context.md` file generated with your branch.
## Testing Your Changes
Before submitting a PR, verify your changes by running a crawl against a known documentation site. Use a small limit to save tokens:
```bash
# Test the crawler and processor together
node dist/index.js https://docs.github.com --limit 5 --output test-output.md
```
Check the `test-output.md` file to ensure:
- Headers are correctly nested.
- Code blocks are preserved and properly fenced.
- The AI refinement has removed navigation fluff without losing technical substance.
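The first two checks in the list above can be partially automated. The deliberately naive script below flags unbalanced code fences and header levels that jump by more than one (e.g. `#` straight to `###`); it is a quick sanity helper you could run over `test-output.md`, not part of the project's test suite.

```typescript
// Naive Markdown sanity checks: returns a list of detected problems
// (empty array means the output passed both checks).
function checkOutput(markdown: string): string[] {
  const problems: string[] = [];
  const lines = markdown.split("\n");

  // Code fences must come in pairs.
  const fenceCount = lines.filter((l) => l.trim().startsWith("```")).length;
  if (fenceCount % 2 !== 0) problems.push("unbalanced code fences");

  // Header levels should not skip (e.g. # directly to ###).
  let prevLevel = 0;
  for (const line of lines) {
    const m = /^(#{1,6})\s/.exec(line);
    if (m) {
      const level = m[1].length;
      if (prevLevel > 0 && level > prevLevel + 1) {
        problems.push(`header jump near: ${line}`);
      }
      prevLevel = level;
    }
  }
  return problems;
}
```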