Symfony DOCS RAG experiments. PART 1

In today's world, an MCP server with documentation search is a must-have. While Laravel provides Laravel Boost with documentation search capabilities, Symfony doesn't. Of course, you could use something like Context7, but why not build a customized RAG for the Symfony documentation ourselves?

Benchmark

To check how good our RAG is, we need a benchmark.

To build one, I generated 603 question/answer pairs for each RST file in the Symfony docs, using Qwen3 Coder Next on a local machine.

A pair looks like this:

{
  "id": "6dae8f2fb2dae26968bae477b94d4b05",
  "question": "When using the `#[IsGranted]` attribute with an Expression as the `subject`, how can you pass multiple expressions with custom aliases, and what is the syntax for referencing them in the `attribute` expression?",
  "difficulty": "hard",
  "kind": "general",
  "required_entity": null,
  "source_file": "security/expressions.rst",
  "answer_line_start": 168,
  "answer_line_end": 176,
  "answer_quote": "The subject may also be an array where the key can be used as an alias for the result of an expression::\n\n    #[IsGranted(\n        attribute: new Expression('user === subject[\"author\"] and subject[\"post\"].isPublished()'),\n        subject: [\n            'author' => new Expression('args[\"post\"].getAuthor()'),\n            'post',\n        ],\n    )]",
  "generator": {
    "mode": "llm",
    "model": "qwen3-next",
    "base_url": "http://localhost:8052/v1",
    "timestamp": "2026-02-15T01:19:01.067963+00:00",
    "shard_index": 0,
    "shard_count": 1
  }
}

This lets us check later whether the retrieved chunks/documents hit the answer lines in a specific file. We will measure hit@1 and hit@5 in two modes. Strict mode checks whether a retrieved chunk hits the answer's specific lines. Relaxed mode checks only whether the chunk comes from the same file as the answer.
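As a rough illustration, the two modes and the hit@k check could be sketched like this. The `Chunk` class and function names are hypothetical, and the strict criterion (any overlap between the chunk's line range and the answer's line range) is an assumption; the actual benchmark script may define a hit differently.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical shape of a retrieved chunk's metadata.
    source_file: str
    line_start: int
    line_end: int


def is_hit(chunk: Chunk, pair: dict, strict: bool) -> bool:
    """Return True if a retrieved chunk counts as a hit for a benchmark pair."""
    if chunk.source_file != pair["source_file"]:
        return False
    if not strict:
        return True  # relaxed: same file is enough
    # strict (assumed criterion): the two line ranges must overlap
    return (chunk.line_start <= pair["answer_line_end"]
            and chunk.line_end >= pair["answer_line_start"])


def hit_at_k(ranked: list[Chunk], pair: dict, k: int, strict: bool) -> bool:
    """True if any of the top-k ranked chunks is a hit."""
    return any(is_hit(chunk, pair, strict) for chunk in ranked[:k])


# Fields taken from the sample pair above.
pair = {
    "source_file": "security/expressions.rst",
    "answer_line_start": 168,
    "answer_line_end": 176,
}
# Made-up retrieval result, best match first.
ranked = [
    Chunk("security/expressions.rst", 10, 30),    # right file, wrong lines
    Chunk("security/expressions.rst", 150, 170),  # overlaps lines 168-176
    Chunk("security.rst", 1, 40),                 # wrong file
]

strict_at_1 = hit_at_k(ranked, pair, k=1, strict=True)
relaxed_at_1 = hit_at_k(ranked, pair, k=1, strict=False)
strict_at_5 = hit_at_k(ranked, pair, k=5, strict=True)
```

In this made-up ranking the top chunk is a relaxed hit but a strict miss, which is the kind of case that separates the two rows of a score table. The reported score is then just the fraction of pairs for which `hit_at_k` is true.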

Simple RAG implementation

Chunking

First we need to chunk the RST files. We will apply a few tricks to make the chunks at least slightly better than raw splits of N tokens.

Here is an LLM-written explanation of the chunking strategy:

1. Preprocessing & Parsing
 - File Discovery: Recursively finds all .rst and .rst.inc files in symfony-docs, ignoring _build, _images, and .github.
 - Include Resolution: Recursively finds and inlines .. include:: directives so that the content appears in the parent file rather than as separate fragments.
 - Sphinx Compatibility: Registers custom "no-op" handlers for Sphinx-specific roles (e.g., :ref:, :doc:) and directives (e.g., .. toctree::, .. configuration-block::) so docutils can parse the file without errors.
 - Doctree Generation: Uses docutils to parse the full text into a hierarchical Document Tree (doctree).
2. Hierarchical Extraction
The script walks the document tree section by section:

 - Section-Based: Each section (header) is treated as a logical unit. Nested sections are processed recursively but independently.
 - Breadcrumbs: Tracks the "path" to the current section (e.g., Security > Authenticators > Form Login) and saves it in metadata.
 - Content Extraction: Converts the nodes within a section (paragraphs, lists, code blocks, tables) back into plain text/markdown.
   - Code Blocks: Preserved with ``` fences.
   - Tables: Flattened into pipe-separated text (cell | cell).
3. Splitting Strategy
If a section's content is larger than 1000 characters, it undergoes a multi-stage splitting pipeline:

 - Code Block Split: Splits the text at code block boundaries to avoid breaking code snippets.
 - Paragraph Split: If segments are still too large, splits at paragraph boundaries (\n\n) with a 200-character overlap.
 - Hard Cap: As a last resort, force-splits at newline boundaries if a segment still exceeds the limit.
4. Post-Processing
 - Tiny Chunk Merging: Chunks smaller than 50 characters are merged into their previous or next neighbor to prevent useless fragments.
 - Line Number Injection: After chunking, the script scans the original source text to find the exact line_start and line_end for each chunk, enabling precise citations.
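
The splitting and merging stages (steps 3 and 4) can be sketched roughly as follows. This is a simplified illustration rather than the repository's actual code: the function names are hypothetical, the thresholds are the ones quoted in the list, and details such as how the overlap is cut are assumptions.

```python
import re

MAX_CHARS = 1000  # section size limit from the description
OVERLAP = 200     # overlap carried across paragraph splits
MIN_CHARS = 50    # chunks below this size get merged

FENCE_RE = re.compile(r"(```.*?```)", re.DOTALL)


def split_on_code_blocks(text: str) -> list[str]:
    """Split so that every fenced code block becomes its own segment."""
    return [part.strip() for part in FENCE_RE.split(text) if part.strip()]


def pack(units: list[str], sep: str, overlap: int) -> list[str]:
    """Greedily pack units into chunks of at most MAX_CHARS.

    A single unit longer than MAX_CHARS is kept whole.
    """
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current}{sep}{unit}" if current else unit
        if len(candidate) <= MAX_CHARS or not current:
            current = candidate
        else:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one.
            tail = current[-overlap:] if overlap else ""
            current = f"{tail}{sep}{unit}" if tail else unit
    if current:
        chunks.append(current)
    return chunks


def merge_tiny(chunks: list[str]) -> list[str]:
    """Fold chunks shorter than MIN_CHARS into a neighbor."""
    merged: list[str] = []
    for chunk in chunks:
        if merged and (len(chunk) < MIN_CHARS or len(merged[-1]) < MIN_CHARS):
            merged[-1] = f"{merged[-1]}\n\n{chunk}"
        else:
            merged.append(chunk)
    return merged


def split_section(text: str) -> list[str]:
    if len(text) <= MAX_CHARS:
        return [text]
    chunks: list[str] = []
    for segment in split_on_code_blocks(text):
        if len(segment) <= MAX_CHARS:
            chunks.append(segment)
            continue
        # Stage 2: paragraph boundaries, with overlap.
        for piece in pack(segment.split("\n\n"), "\n\n", OVERLAP):
            if len(piece) <= MAX_CHARS:
                chunks.append(piece)
            else:
                # Stage 3: hard cap at newline boundaries, no overlap.
                chunks.extend(pack(piece.split("\n"), "\n", 0))
    return merge_tiny(chunks)


# Made-up section: five long paragraphs followed by a short code block.
section = "\n\n".join(f"para {i} " + "x" * 300 for i in range(5))
section += "\n\n```php\necho 1;\n```"
chunks = split_section(section)
```

In this sketch the short code block is smaller than the 50-character threshold, so it ends up merged into the preceding chunk instead of becoming a useless fragment of its own. Note that merging can push a chunk slightly past the size limit, and a code block larger than the limit still gets split by the later stages.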

Embeddings part

For quick experimentation I used https://huggingface.co/nomic-ai/CodeRankEmbed in GGUF format with llama.cpp.

It's a very fast and promising bi-encoder, which should work nicely for the Symfony docs, as they contain quite a lot of code blocks.

The embeddings are stored in a local ChromaDB instance for retrieval.
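
Conceptually, retrieval with a bi-encoder boils down to embedding the question and returning the chunks whose vectors are closest to it. The sketch below shows that idea with a brute-force cosine-similarity search in plain Python. It is a stand-in for what the vector store does, not ChromaDB's API, and the tiny 3-dimensional vectors and chunk ids are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 5):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up index: in practice each vector comes from the embedding model.
index = [
    ("security/expressions.rst#3", [1.0, 0.0, 0.0]),
    ("security.rst#12", [0.9, 0.1, 0.0]),
    ("routing.rst#1", [0.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
result_ids = [chunk_id for chunk_id, _ in results]
```

The ranked ids returned here are what the hit@1 and hit@5 checks are applied to: hit@1 looks only at the first id, hit@5 at the first five.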

Results

You can check the code at https://github.com/ineersa/symfony-docs-rag-test. It's just a test, completely vibe-coded, meant only for trying out new theories and proof-of-concept discovery.

                 Benchmark Scores                    

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Mode    ┃ Hit@1             ┃ Hit@5             ┃ Count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ strict  │ 597/1389 (42.98%) │ 840/1389 (60.48%) │ 1389  │
│ relaxed │ 767/1389 (55.22%) │ 956/1389 (68.83%) │ 1389  │
└─────────┴───────────────────┴───────────────────┴───────┘

The results are honestly not that bad; I was surprised by how well this naive approach worked.

But we will move forward and check more RAG variants!