Symfony DOCS RAG experiments. PART 3

In https://blog.ineersa.com/post/symfony-docs-rag-experiments-part-2 we implemented "Tree-RAG" by splitting our documentation into a TOC tree, building summaries for each leaf, and running a vectorless RAG over it.

While the results were fascinating, speed-wise it wasn't the best. So what if we combine our vector search with our tree-based vectorless RAG? In theory this should give us a "Teleport" feature that delivers relevant nodes straight to the LLM without traversing the tree.

Hybrid retrieval

The retrieval process follows a multi-stage pipeline:

  1. Vector Shortlist:

    • The query is embedded and searched against the local ChromaDB vector index.
    • Top vector_top_n chunks are retrieved.
    • Chunks are mapped to PageIndex Nodes (sections/topics) via line number overlap or breadcrumb matching.
    • Initial scores are assigned based on vector distance and rank.
  2. Neighbor Expansion:

    • Top scoring nodes from the vector step act as strict "seeds".
    • The graph is traversed to include related nodes (parents, children, siblings) up to neighbor_depth.
    • Scores decay as distance from the seed increases (e.g., Parent: 0.75x, Child: 0.7x, Sibling: 0.6x).
    • Top matching files are also injected to ensure coverage of highly relevant documents even if specific section chunks were missed.
  3. Reciprocal Rank Fusion (RRF):

    • A Lexical Score is calculated for all candidates using token overlap (recall-weighted Jaccard) on title + summary + text.
    • It applies light stemming and stopword removal to ensure robust keyword matching.
    • The Vector-Expanded Score and Lexical Score are combined using RRF (k=60).
    • This rewards nodes that are both semantically similar (vector) and keyword-relevant (lexical); ideally a node needs both to rank highly.
  4. LLM Reranking (Optional):

    • The top candidate_cap nodes after RRF are sent to a local LLM.
    • The LLM is prompted to pick the best top_k nodes that "most directly answer the question".
    • The final list is returned to the user.
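The scoring stages above can be sketched in a few lines. This is a hypothetical illustration, not the actual implementation: the decay weights (0.75 / 0.7 / 0.6) and RRF k=60 come from the text, while the tokenization rules, helper names, and data shapes are assumptions.

```python
import re

# Illustrative stopword list and decay table; the real ones may differ.
STOPWORDS = {"the", "a", "an", "to", "of", "in", "is", "how", "do"}
DECAY = {"parent": 0.75, "child": 0.7, "sibling": 0.6}

def expand_score(seed_score, relation):
    """Neighbor expansion: a related node inherits a decayed seed score."""
    return seed_score * DECAY[relation]

def tokenize(text):
    """Lowercase, drop stopwords, apply light plural stemming."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t.rstrip("s") for t in tokens if t not in STOPWORDS}

def lexical_score(query, node_text):
    """Recall-weighted Jaccard: fraction of query tokens covered."""
    q, d = tokenize(query), tokenize(node_text)
    return len(q & d) / len(q) if q else 0.0

def rrf(rank_lists, k=60):
    """Reciprocal Rank Fusion over ranked lists of node ids."""
    scores = {}
    for ranking in rank_lists:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# List A: vector + graph ranking; List B: lexical ranking. A node near
# the top of both rankings wins the fusion.
list_a = ["routing", "controller", "forms"]
list_b = ["controller", "forms", "routing", "validation"]
print(rrf([list_a, list_b]))  # controller and routing rise to the top
```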

Logic Diagram

                                  User Query
                                       │
                                       ▼
                             ┌───────────────────┐
                             │  Vector Search    │ (ChromaDB)
                             │   (Top N Chunks)  │
                             └─────────┬─────────┘
                                       │
                                       ▼
                             ┌───────────────────┐
                             │  Map to PageIndex │ (Chunk -> Node)
                             │      Nodes        │
                             └─────────┬─────────┘
                                       │
                                       ▼
    ┌───────────────────────┐    ┌─────┴────────────────┐
    │  Neighbor Expansion   │◄───│  Initial Node Scores │
    │ (Parents, Children)   │    └──────────────────────┘
    └──────────┬────────────┘
               │
               ▼
    ┌───────────────────────┐    ┌──────────────────────┐
    │  Vector/Graph Score   │    │    Lexical Score     │
    │      (Rank List A)    │    │    (Rank List B)     │
    └──────────┬────────────┘    └──────────┬───────────┘
               │                            │
               └──────────────┬─────────────┘
                              │
                              ▼
                     ┌───────────────────┐
                     │        RRF        │ (Reciprocal Rank Fusion)
                     │    Combination    │
                     └─────────┬─────────┘
                               │
                               ▼
                     ┌───────────────────┐
                     │   Top Candidates  │ (Cap: 30)
                     └─────────┬─────────┘
                               │
                    (If LLM Rerank Enabled)
                               │
                               ▼
                     ┌───────────────────┐
                     │    LLM Rerank     │ (select top K)
                     └─────────┬─────────┘
                               │
                               ▼
                        Final Top K Hits

Results

So the "Teleport" feature did improve our Tree RAG accuracy. Not by much, but it's something.

Example benchmark run:

uv run python benchmark_run.py \
  --mode both \
  --predictor hybrid \
  --questions data/benchmark/questions.jsonl \
  --hybrid-model local-model \
  --hybrid-vector-top-n 40 \
  --hybrid-candidate-cap 30 \
  --hybrid-neighbor-depth 1 \
  --hybrid-siblings-per-node 2 \
  --pageindex-final-summary-chars 500 \
  --pageindex-final-text-chars 2000 \
  --sample-size 100 \
  --sample-seed 42

                  Benchmark Scores                   
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Mode    ┃ Hit@1          ┃ Hit@5          ┃ Count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ strict  │ 62/100 (62.0%) │ 74/100 (74.0%) │ 100   │
│ relaxed │ 68/100 (68.0%) │ 78/100 (78.0%) │ 100   │
└─────────┴────────────────┴────────────────┴───────┘
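For context, Hit@k here means the gold node appears among the top k predictions. A minimal sketch of how such a score could be computed (the field names and data shapes are assumptions, not the benchmark's actual code):

```python
def hit_at_k(predicted_ids, gold_id, k):
    """True if the gold node id is among the top-k predictions."""
    return gold_id in predicted_ids[:k]

def score_runs(runs, k):
    """runs: list of (predicted_ids, gold_id) pairs -> (hits, total, rate)."""
    hits = sum(hit_at_k(pred, gold, k) for pred, gold in runs)
    return hits, len(runs), hits / len(runs)

# Two queries: the first hits at rank 1, the second misses the top 1.
runs = [(["routing", "forms"], "routing"), (["forms", "routing"], "routing")]
print(score_runs(runs, k=1))  # (1, 2, 0.5)
```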

Speed-wise it's not that much faster though, around 2x over Tree RAG: we still run through a lot of candidates and use ~15k tokens per query. But it's an improvement over a naive PageIndex-like Tree RAG implementation.

Reducing top-n, the candidate cap, and the number of characters we pass speeds things up a lot, but hurts accuracy, so we need to find an acceptable balance between quality and speed.

Also, a larger general model and larger embeddings could change the picture drastically, as we only use a 137M-parameter embedding model and a non-reasoning Qwen3 4B model, both running locally.

As next steps we can:

  • tune up our vector RAG part by adding BM25 and HyDE
  • try swapping to summaries with a HyDE step
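The HyDE idea is small enough to sketch: instead of embedding the raw question, we first ask the LLM to write a hypothetical documentation passage that answers it, and search the vector index with that. A hedged sketch, where `generate` and `collection` are assumed interfaces (a prompt-to-text callable and a ChromaDB-style collection), not existing code:

```python
def hyde_query(question, generate, collection, top_n=40):
    """HyDE: search with an LLM-written hypothetical doc, not the query."""
    # 1. Have the LLM draft a plausible documentation passage; its
    #    vocabulary tends to sit closer to real chunks than the question.
    fake_doc = generate(
        "Write a short Symfony documentation passage answering:\n" + question
    )
    # 2. Query the vector index with the hypothetical passage.
    return collection.query(query_texts=[fake_doc], n_results=top_n)
```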