Retrieval & Memory

Context that knows what matters.

Retrieval runs automatically inside the engine on every intent call, and the same pipeline is exposed through a standalone query endpoint. Hybrid RAG pipelines combine semantic chunking, metadata filtering, cross-encoder re-ranking, and multi-tier memory. The right context, every time.

retrieval.sh
curl -X POST https://api.liyaengine.com/v1/retrieval/query \
  -H "x-api-key: $LIYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "FHIR-compliant medication reconciliation requirements",
    "sources": ["clinical-policies-v3", "cms-guidelines-2026"],
    "options": {
      "mode": "hybrid",
      "top_k": 8,
      "rerank": true,
      "filters": {
        "document_type": "policy",
        "effective_after": "2025-01-01"
      }
    }
  }'

# {
#   "passages": [...],   // re-ranked context passages
#   "sources":  [...],   // provenance for each passage
#   "latency":  { "vector_ms": 18, "rerank_ms": 22 }
# }
Retrieval

Beyond naive chunking

Semantic Chunking

Documents are split at natural content boundaries — section headers, paragraphs, list items — not fixed character offsets. Coherent chunks produce far better retrieval than naive splitting.
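Boundary-based splitting can be sketched in a few lines. This is an illustrative heuristic, not Liya Engine's actual chunker: it breaks markdown-style text on headers and paragraph gaps, then packs blocks into chunks under a size budget.

```python
import re

def semantic_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Split text at natural boundaries (headers, blank lines), not fixed offsets."""
    # Break on section headers or paragraph gaps.
    blocks = [b.strip() for b in re.split(r"\n(?=#)|\n{2,}", text) if b.strip()]
    chunks, current = [], ""
    for block in blocks:
        # Start a new chunk when adding this block would exceed the budget.
        if current and len(current) + len(block) + 1 > max_chars:
            chunks.append(current)
            current = block
        else:
            current = f"{current}\n{block}" if current else block
    if current:
        chunks.append(current)
    return chunks
```

Because blocks are never split mid-paragraph, each chunk stays a coherent unit of meaning, which is what the retriever ultimately matches against.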

Hybrid Search

Dense vector search and BM25 keyword search run in parallel. Results are combined via reciprocal rank fusion, giving you semantic similarity and exact-term precision simultaneously.
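Reciprocal rank fusion itself is small enough to show in full. Each document's fused score is the sum of 1 / (k + rank) over every list it appears in; k = 60 is the conventional constant, and the result lists below are made-up IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (vector) and sparse (BM25) result lists, best first.
dense  = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents ranked well by both retrievers rise to the top; documents found by only one retriever still survive fusion rather than being dropped.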

Metadata Filtering

Apply structured filters before vector search to narrow the candidate set — by document version, date range, source, tag, or any custom attribute. Cuts latency and improves precision.

Cross-Encoder Re-Ranking

After initial retrieval, a cross-encoder model re-ranks the top-20 candidates against the query. Consistently improves end-to-end answer quality for knowledge-intensive tasks.
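The re-ranking pass has a simple shape: score each (query, passage) pair jointly, then reorder only the head of the candidate list. In this sketch a cheap word-overlap function stands in for the cross-encoder model, since the real scorer is a neural network:

```python
def rerank(query: str, passages: list[str], top_n: int = 20) -> list[str]:
    """Re-score the top candidates jointly against the query, then reorder."""
    def score(passage: str) -> float:
        # Stand-in for a cross-encoder: fraction of query terms in the passage.
        q_terms = set(query.lower().split())
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / len(q_terms)

    head = passages[:top_n]  # only the top-N get the expensive joint pass
    return sorted(head, key=score, reverse=True) + passages[top_n:]
```

Limiting the joint scoring to the top 20 keeps the expensive model off the long tail while still fixing the ordering where it matters.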

Bring Your Own Vector Store

Connect Pinecone, Weaviate, Qdrant, pgvector, or any vector store via the adapter API. Liya Engine handles chunking, embedding, and retrieval orchestration — you own the data layer.
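Conceptually, an adapter only has to expose a write path and a query path. The interface below is a hypothetical sketch of that contract, not Liya Engine's actual adapter API, with a toy brute-force store showing one possible implementation:

```python
from typing import Protocol

class VectorStoreAdapter(Protocol):
    """Minimal surface the engine needs from any backing vector store."""
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None: ...
    def query(self, vector: list[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy adapter: brute-force dot-product search over stored vectors."""
    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None:
        self._vectors.update(zip(ids, vectors))

    def query(self, vector: list[float], top_k: int) -> list[str]:
        def dot(v: list[float]) -> float:
            return sum(a * b for a, b in zip(vector, v))
        ranked = sorted(self._vectors, key=lambda i: dot(self._vectors[i]), reverse=True)
        return ranked[:top_k]
```

Any backend that satisfies the protocol, whether Pinecone or a local pgvector table, can slot in behind the same orchestration code.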

Streaming Retrieval

Retrieve and stream context passages to agents mid-generation. Agents can issue additional retrieval queries without blocking the response stream.
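The control flow, stripped of transport details, is a generator that interleaves retrieval with generation. Everything here is illustrative (the chunk format, the trigger, and the retriever are stand-ins):

```python
from typing import Callable, Iterable, Iterator

def stream_with_retrieval(
    chunks: Iterable[str],
    needs_context: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
) -> Iterator[str]:
    """Yield generated chunks; when the agent flags a gap, fetch passages inline."""
    for chunk in chunks:
        if needs_context(chunk):
            # Issue an extra retrieval without ending the stream.
            for passage in retrieve(chunk):
                yield f"[context: {passage}]"
        yield chunk
```

Because retrieval happens inside the generator, new passages flow into the same stream the client is already consuming, with no second round trip.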

Memory

Three tiers of agent memory

Different tasks need different memory strategies. Liya Engine provides three tiers — choose one, or combine all three in a single agent.

Short-Term Buffer
Active conversations

A rolling window of recent turns or tokens included in every model call. Near-zero overhead; ideal for active single-session conversations.

< 1ms
avg latency
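A rolling buffer is exactly what `collections.deque` with a `maxlen` provides; this sketch (turn count as the window unit, rather than tokens, for brevity) shows the eviction behavior:

```python
from collections import deque

class ShortTermBuffer:
    """Rolling window of the most recent turns, included in every model call."""
    def __init__(self, max_turns: int = 8) -> None:
        self._turns: deque[str] = deque(maxlen=max_turns)

    def add(self, turn: str) -> None:
        self._turns.append(turn)  # the oldest turn is evicted automatically

    def context(self) -> list[str]:
        return list(self._turns)
```

No lookup or scoring happens at call time; the window is simply prepended to the prompt, which is why this tier adds essentially no latency.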
Episodic Memory
Multi-session agents

Structured summaries of past sessions — entities, decisions, open questions — retrieved and injected at the start of new sessions. Gives agents memory across days and weeks.

~15ms
avg latency
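One way to picture the structured summaries this tier stores. The field names and rendering below are illustrative, not the engine's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SessionSummary:
    """Structured record distilled from one session, injected into the next."""
    entities: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def opening_context(summaries: list[SessionSummary]) -> str:
    """Render past-session summaries as a preamble for a new session."""
    lines = []
    for i, s in enumerate(summaries, 1):
        lines.append(
            f"Session {i}: decided {', '.join(s.decisions) or 'nothing'}; "
            f"open: {', '.join(s.open_questions) or 'none'}"
        )
    return "\n".join(lines)
```

Because the summary is structured rather than a raw transcript, weeks of history compress into a few lines of injected context.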
Semantic Store
Knowledge-intensive tasks

Long-term knowledge in a vector store, retrieved on demand during agent execution. Handles large corpora that can't fit in context. The foundation of RAG-powered agents.

~40ms
avg latency
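At its core, the on-demand lookup this tier performs is nearest-neighbor search over embedding vectors, typically by cosine similarity. A minimal sketch with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec: list[float],
                    store: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """On-demand nearest-neighbor lookup over long-term knowledge vectors."""
    ranked = sorted(store, key=lambda k: cosine(query_vec, store[k]), reverse=True)
    return ranked[:top_k]
```

Production stores replace this brute-force scan with an approximate index, which is how corpora far larger than any context window stay queryable at ~40ms.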