Retrieval & Memory

Context that knows what matters.

Retrieval runs automatically inside the engine on every intent call, and the same pipeline is exposed through a standalone query endpoint. Hybrid RAG pipelines combine semantic chunking, metadata filtering, cross-encoder re-ranking, and multi-tier memory. The right context, every time.

retrieval.sh
curl -X POST https://api.liyaengine.com/v1/retrieval/query \
  -H "x-api-key: $LIYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "FHIR-compliant medication reconciliation requirements",
    "sources": ["clinical-policies-v3", "cms-guidelines-2026"],
    "options": {
      "mode": "hybrid",
      "top_k": 8,
      "rerank": true,
      "filters": {
        "document_type": "policy",
        "effective_after": "2025-01-01"
      }
    }
  }'

# {
#   "passages": [...],   // re-ranked context passages
#   "sources":  [...],   // provenance for each passage
#   "latency":  { "vector_ms": 18, "rerank_ms": 22 }
# }
Retrieval

Beyond naive chunking

Semantic Chunking

Documents are split at natural content boundaries — section headers, paragraphs, list items — not fixed character offsets. Coherent chunks produce far better retrieval than naive splitting.
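Boundary-based splitting can be sketched in a few lines. This is an illustrative heuristic, not Liya Engine's actual chunker: it breaks markdown-style text on headers and paragraph gaps, then packs blocks into chunks under a size budget.

```python
import re

def semantic_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Split text at natural boundaries (headers, blank lines), not fixed offsets."""
    # Break on section headers or paragraph gaps.
    blocks = [b.strip() for b in re.split(r"\n(?=#)|\n{2,}", text) if b.strip()]
    chunks, current = [], ""
    for block in blocks:
        # Start a new chunk when adding this block would exceed the budget.
        if current and len(current) + len(block) + 1 > max_chars:
            chunks.append(current)
            current = block
        else:
            current = f"{current}\n{block}" if current else block
    if current:
        chunks.append(current)
    return chunks
```

Because blocks are never split mid-paragraph, each chunk stays a coherent unit of meaning, which is what the retriever ultimately matches against.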

Hybrid Search

Dense vector search and BM25 keyword search run in parallel. Results are combined via reciprocal rank fusion, giving you semantic similarity and exact-term precision simultaneously.
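Reciprocal rank fusion itself is small enough to show in full. Each document's fused score is the sum of 1 / (k + rank) over every list it appears in; k = 60 is the conventional constant, and the result lists below are made-up IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (vector) and sparse (BM25) result lists, best first.
dense  = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents ranked well by both retrievers rise to the top; documents found by only one retriever still survive fusion rather than being dropped.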

Metadata Filtering

Apply structured filters before vector search to narrow the candidate set — by document version, date range, source, tag, or any custom attribute. Cuts latency and improves precision.

Cross-Encoder Re-Ranking

After initial retrieval, a cross-encoder model re-ranks the top-20 candidates against the query. Consistently improves end-to-end answer quality for knowledge-intensive tasks.
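The re-ranking pass has a simple shape: score each (query, passage) pair jointly, then reorder only the head of the candidate list. In this sketch a cheap word-overlap function stands in for the cross-encoder model, since the real scorer is a neural network:

```python
def rerank(query: str, passages: list[str], top_n: int = 20) -> list[str]:
    """Re-score the top candidates jointly against the query, then reorder."""
    def score(passage: str) -> float:
        # Stand-in for a cross-encoder: fraction of query terms in the passage.
        q_terms = set(query.lower().split())
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / len(q_terms)

    head = passages[:top_n]  # only the top-N get the expensive joint pass
    return sorted(head, key=score, reverse=True) + passages[top_n:]
```

Limiting the joint scoring to the top 20 keeps the expensive model off the long tail while still fixing the ordering where it matters.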

Bring Your Own Vector Store

Connect Pinecone, Weaviate, Qdrant, pgvector, or any vector store via the adapter API. Liya Engine handles chunking, embedding, and retrieval orchestration — you own the data layer.
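Conceptually, an adapter only has to expose a write path and a query path. The interface below is a hypothetical sketch of that contract, not Liya Engine's actual adapter API, with a toy brute-force store showing one possible implementation:

```python
from typing import Protocol

class VectorStoreAdapter(Protocol):
    """Minimal surface the engine needs from any backing vector store."""
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None: ...
    def query(self, vector: list[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy adapter: brute-force dot-product search over stored vectors."""
    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None:
        self._vectors.update(zip(ids, vectors))

    def query(self, vector: list[float], top_k: int) -> list[str]:
        def dot(v: list[float]) -> float:
            return sum(a * b for a, b in zip(vector, v))
        ranked = sorted(self._vectors, key=lambda i: dot(self._vectors[i]), reverse=True)
        return ranked[:top_k]
```

Any backend that satisfies the protocol, whether Pinecone or a local pgvector table, can slot in behind the same orchestration code.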

Streaming Retrieval

Retrieve and stream context passages to agents mid-generation. Agents can issue additional retrieval queries without blocking the response stream.
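The control flow, stripped of transport details, is a generator that interleaves retrieval with generation. Everything here is illustrative (the chunk format, the trigger, and the retriever are stand-ins):

```python
from typing import Callable, Iterable, Iterator

def stream_with_retrieval(
    chunks: Iterable[str],
    needs_context: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
) -> Iterator[str]:
    """Yield generated chunks; when the agent flags a gap, fetch passages inline."""
    for chunk in chunks:
        if needs_context(chunk):
            # Issue an extra retrieval without ending the stream.
            for passage in retrieve(chunk):
                yield f"[context: {passage}]"
        yield chunk
```

Because retrieval happens inside the generator, new passages flow into the same stream the client is already consuming, with no second round trip.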

Memory

Three tiers of agent memory

Different tasks need different memory strategies. Liya Engine provides three tiers — choose one, or combine all three in a single agent.

Short-Term Buffer
Active conversations

A rolling window of recent turns or tokens included in every model call. Near-zero overhead; ideal for active single-session conversations.

< 1ms
avg latency
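A rolling buffer is exactly what `collections.deque` with a `maxlen` provides; this sketch (turn count as the window unit, rather than tokens, for brevity) shows the eviction behavior:

```python
from collections import deque

class ShortTermBuffer:
    """Rolling window of the most recent turns, included in every model call."""
    def __init__(self, max_turns: int = 8) -> None:
        self._turns: deque[str] = deque(maxlen=max_turns)

    def add(self, turn: str) -> None:
        self._turns.append(turn)  # the oldest turn is evicted automatically

    def context(self) -> list[str]:
        return list(self._turns)
```

No lookup or scoring happens at call time; the window is simply prepended to the prompt, which is why this tier adds essentially no latency.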
Episodic Memory
Multi-session agents

Structured summaries of past sessions — entities, decisions, open questions — retrieved and injected at the start of new sessions. Gives agents memory across days and weeks.

~15ms
avg latency
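One way to picture the structured summaries this tier stores. The field names and rendering below are illustrative, not the engine's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SessionSummary:
    """Structured record distilled from one session, injected into the next."""
    entities: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def opening_context(summaries: list[SessionSummary]) -> str:
    """Render past-session summaries as a preamble for a new session."""
    lines = []
    for i, s in enumerate(summaries, 1):
        lines.append(
            f"Session {i}: decided {', '.join(s.decisions) or 'nothing'}; "
            f"open: {', '.join(s.open_questions) or 'none'}"
        )
    return "\n".join(lines)
```

Because the summary is structured rather than a raw transcript, weeks of history compress into a few lines of injected context.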
Semantic Store
Knowledge-intensive tasks

Long-term knowledge in a vector store, retrieved on demand during agent execution. Handles large corpora that can't fit in context. The foundation of RAG-powered agents.

~40ms
avg latency
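At its core, the on-demand lookup this tier performs is nearest-neighbor search over embedding vectors, typically by cosine similarity. A minimal sketch with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec: list[float],
                    store: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """On-demand nearest-neighbor lookup over long-term knowledge vectors."""
    ranked = sorted(store, key=lambda k: cosine(query_vec, store[k]), reverse=True)
    return ranked[:top_k]
```

Production stores replace this brute-force scan with an approximate index, which is how corpora far larger than any context window stay queryable at ~40ms.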