
LOCOMO Benchmark

Benchmark runner for measuring CME retrieval quality using LOCOMO QA pairs.

Use this for CI integration and evaluating search quality across modes.

Endpoints

Service surface


python -m scripts.run_locomo --api-url <url> --tenant <id> --search-mode hybrid

Run the LOCOMO benchmark against the CME API with the specified search mode.


Run hybrid benchmark

Execute LOCOMO with hybrid search.

python -m scripts.run_locomo \
  --api-url http://localhost:8080 \
  --tenant <tenant-id> \
  --search-mode hybrid \
  --graph \
  --output-format json

Request example

CLI invocation

Run LOCOMO in vector-only mode for an ablation study.

python -m scripts.run_locomo \
  --api-url http://localhost:8080 \
  --tenant <tenant-id> \
  --search-mode vector-only \
  --k 10 \
  --output-format json

Base path

scripts/run_locomo.py

Schemas

OpenAPI-style field tables

CLI arguments

Supported command-line arguments for run_locomo.py.

Field           | Type    | Required | Description
--api-url       | string  | required | CME API base URL.
--tenant        | string  | required | Tenant ID for test data.
--search-mode   | string  | optional | Search mode: vector-only, hybrid, or hybrid+graph (default: hybrid).
--graph         | boolean | optional | Enable graph retrieval context.
--k             | number  | optional | Number of results to retrieve (default: 10).
--output-format | string  | optional | Output format: json (default) or text.
--dataset-path  | string  | optional | Path to LOCOMO dataset (downloads if not set).
--harness-dir   | string  | optional | Path to LOCOMO harness directory containing datasets/.
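The argument table can be read as an argparse declaration. The following is a minimal sketch under that reading, not the actual source of run_locomo.py: the function name build_parser, the use of store_true for --graph, the choices lists, and the tenant value demo-tenant are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented argument table."""
    parser = argparse.ArgumentParser(prog="run_locomo")
    parser.add_argument("--api-url", required=True, help="CME API base URL")
    parser.add_argument("--tenant", required=True, help="Tenant ID for test data")
    parser.add_argument(
        "--search-mode",
        default="hybrid",
        choices=["vector-only", "hybrid", "hybrid+graph"],
        help="Search mode",
    )
    # Assumption: the boolean flag is a plain switch that takes no value.
    parser.add_argument("--graph", action="store_true", help="Enable graph retrieval context")
    parser.add_argument("--k", type=int, default=10, help="Number of results to retrieve")
    parser.add_argument("--output-format", default="json", choices=["json", "text"])
    parser.add_argument("--dataset-path", default=None, help="Path to LOCOMO dataset")
    parser.add_argument("--harness-dir", default=None, help="Harness directory containing datasets/")
    return parser


# Only the two required arguments are given, so the documented defaults apply.
args = build_parser().parse_args(
    ["--api-url", "http://localhost:8080", "--tenant", "demo-tenant"]
)
print(args.search_mode, args.k, args.output_format, args.graph)
# prints: hybrid 10 json False
```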

Score metrics

Composite scores computed by the benchmark runner.

Field     | Type   | Required | Description
recall@10 | number | required | Fraction of the relevant results that appear in the top 10.
mrr       | number | required | Mean over queries of the reciprocal rank of the first relevant result.
ndcg@10   | number | required | Normalized discounted cumulative gain at rank 10.
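For reference, the three metrics can be computed per query as in the sketch below. It uses the standard textbook definitions with binary relevance, which may differ from what the benchmark runner does internally. The function names and the memory IDs (m1, m3, ...) are hypothetical. Reported scores would then be the mean of each per-query value across all queries.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# One query: two relevant items, only one of which is retrieved (at rank 2).
ranked = ["m3", "m1", "m7"]
relevant = {"m1", "m9"}
print(recall_at_k(ranked, relevant))      # 0.5
print(reciprocal_rank(ranked, relevant))  # 0.5
print(round(ndcg_at_k(ranked, relevant), 4))
```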

Response examples

What the API returns

Benchmark output

JSON output with composite scores for CI integration.

{
  "mode": "hybrid",
  "graph": true,
  "scores": {
    "recall@10": 0.87,
    "mrr": 0.72,
    "ndcg@10": 0.81
  },
  "total_queries": 150,
  "passed": true
}
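A CI job could gate on this output by comparing each score against a minimum. The sketch below is one way to do that. The threshold values and the check_scores helper are hypothetical, and how the runner itself derives the passed field is not documented here.

```python
import json

# Hypothetical minimums; tune these to your own baseline.
THRESHOLDS = {"recall@10": 0.80, "mrr": 0.65, "ndcg@10": 0.75}


def check_scores(report: dict, thresholds: dict) -> list:
    """Return a list of human-readable failures; empty means the gate passes."""
    failures = []
    scores = report.get("scores", {})
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from report")
        elif value < minimum:
            failures.append(f"{metric}: {value} is below {minimum}")
    return failures


# The sample report from the documentation above.
sample = json.loads("""
{
  "mode": "hybrid",
  "graph": true,
  "scores": {"recall@10": 0.87, "mrr": 0.72, "ndcg@10": 0.81},
  "total_queries": 150,
  "passed": true
}
""")

failures = check_scores(sample, THRESHOLDS)
print("gate passed" if not failures else failures)
```

In a pipeline, a non-empty failures list would typically translate into a non-zero exit code so the job fails.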

Notes

Implementation notes

  • The benchmark runner falls back to synthetic smoke-test data if the LOCOMO dataset is unavailable.
  • Supports ablation studies that compare the vector-only, hybrid, and hybrid+graph modes.
  • Set the CME_API_KEY environment variable for API authentication, or use session auth.
  • The JSON output format is designed for CI integration and for tracking scores over time.
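An ablation run across the three modes could be scripted roughly as follows. This is a sketch under several assumptions: the helper names build_command and run_ablation are hypothetical, the JSON report is assumed to be written to stdout, and each mode is selected through --search-mode using the values from the argument table. The child process inherits the parent environment, so a CME_API_KEY set in the shell is passed through.

```python
import json
import subprocess
import sys

# Mode names taken from the --search-mode description.
MODES = ["vector-only", "hybrid", "hybrid+graph"]


def build_command(api_url: str, tenant: str, mode: str, k: int = 10) -> list:
    """Build the CLI invocation for one search mode as an argument list."""
    return [
        sys.executable, "-m", "scripts.run_locomo",
        "--api-url", api_url,
        "--tenant", tenant,
        "--search-mode", mode,
        "--k", str(k),
        "--output-format", "json",
    ]


def run_ablation(api_url: str, tenant: str) -> dict:
    """Run every mode and collect its scores (assumes JSON is printed to stdout)."""
    results = {}
    for mode in MODES:
        completed = subprocess.run(
            build_command(api_url, tenant, mode),
            capture_output=True,
            text=True,
            check=True,  # raise if the benchmark exits with a non-zero status
        )
        results[mode] = json.loads(completed.stdout)["scores"]
    return results


if __name__ == "__main__":
    # Preview the commands without contacting a CME instance.
    for mode in MODES:
        print(" ".join(build_command("http://localhost:8080", "<tenant-id>", mode)))
```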