Methodology & data sources

NNZen is a comparison explorer. It combines data from multiple sources and preserves source semantics instead of collapsing everything into a single “score”.

What data comes from where

  • OpenRouter model metadata: pricing fields, context length, capabilities/parameters, provider-specific information.
  • Arena leaderboard metrics: human-preference leaderboard signals (pairwise voting style metrics, category-specific scores/ranks).
  • Merged snapshot: NNZen’s normalized dataset after alias matching and validation.
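The merged snapshot is produced by joining sources on a normalized model identifier. A minimal sketch of that alias matching, with all names and the override table hypothetical:

```python
# Hypothetical sketch of alias matching between sources.
# The override table and normalization rules are illustrative, not NNZen's actual logic.
ALIAS_OVERRIDES = {
    # manual overrides for names that normalization alone cannot reconcile
    "gpt-4o-2024-05-13": "gpt-4o",
}

def normalize(name: str) -> str:
    """Lowercase, strip provider prefixes, then apply manual overrides."""
    slug = name.lower().strip().replace(" ", "-")
    slug = slug.split("/")[-1]  # drop "openai/"-style prefixes
    return ALIAS_OVERRIDES.get(slug, slug)

def merge(openrouter: dict, arena: dict) -> dict:
    """Join both sources on the normalized id, keeping each source's fields separate."""
    merged: dict = {}
    for name, meta in openrouter.items():
        merged[normalize(name)] = {"openrouter": meta}
    for name, metrics in arena.items():
        merged.setdefault(normalize(name), {})["arena"] = metrics
    return merged

snapshot = merge(
    {"openai/gpt-4o-2024-05-13": {"context_length": 128000}},
    {"GPT-4o": {"rank": 1}},
)
```

Keeping the per-source fields nested (rather than flattening them) is what preserves source semantics in the merged record.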

Rank semantics (important)

Arena leaderboard rank

A human-preference signal from the Arena leaderboard. Useful for gauging relative model performance within Arena categories, but subject to the source's methodology and data availability.

OpenRouter usage / popularity

A usage-based signal derived from traffic on OpenRouter. It reflects adoption and popularity, not benchmark quality.

Freshness & snapshots

The explorer UI shows per-source timestamps (OpenRouter, Arena, merged snapshot) and a snapshot version when available. If one source fails or lags, NNZen may show a partial or older snapshot rather than a blank page.
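The fallback behavior can be sketched as a per-source choice between fresh data and the last good cache. This is an illustrative sketch, assuming a hypothetical 24-hour staleness threshold; the function and field names are not NNZen's actual API:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness window; the real threshold is an assumption here.
MAX_AGE = timedelta(hours=24)

def pick_snapshot(latest, cached, now):
    """Return (snapshot, is_stale): prefer fresh data, else serve the cached copy."""
    if latest is not None:
        return latest, False
    if cached is not None:
        stale = now - cached["fetched_at"] > MAX_AGE
        return cached, stale
    return None, True  # nothing available; the UI can still render a partial view

now = datetime.now(timezone.utc)
cached = {"fetched_at": now - timedelta(hours=2), "models": []}
snap, stale = pick_snapshot(None, cached, now)  # source fetch failed, cache is recent
```

Making staleness explicit (rather than silently serving old data) is what lets the UI show the per-source timestamps mentioned above.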

Known limitations

  • Model names differ across providers/sources; alias matching may require manual overrides.
  • Some pricing fields are provider-specific and not directly comparable across all models.
  • A missing rank does not always mean a model is weak; it may simply be absent from the source snapshot.
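The last caveat has a practical consequence for display code: a missing rank should be rendered as absent, never coerced to 0 or last place. A tiny hypothetical sketch:

```python
# Hypothetical sketch: distinguish "absent from the source snapshot" from a real rank.
def format_rank(rank):
    # None means the model was missing from the snapshot, not that it ranked last;
    # show a placeholder instead of a misleading number.
    return "—" if rank is None else f"#{rank}"
```

This keeps absence visually distinct from a genuinely low rank.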

Want to verify the raw data? Use the public API endpoint /api/v1/models/all (supports filtering and metric selection) and inspect the status metadata included in the response.
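A sketch of building such a request and reading the status metadata. Only the /api/v1/models/all path comes from this page; the host, query-parameter names, and response shape below are assumptions for illustration:

```python
from urllib.parse import urlencode

# Hypothetical host; only the /api/v1/models/all path is documented above.
BASE = "https://nnzen.example/api/v1/models/all"

def build_url(filters: dict) -> str:
    """Append assumed filtering/metric-selection parameters as a query string."""
    return f"{BASE}?{urlencode(filters)}"

url = build_url({"metrics": "arena_rank,pricing", "provider": "openai"})

def source_status(response: dict) -> dict:
    """Map each source name to its fetch timestamp from assumed status metadata."""
    return {
        s["name"]: s["fetched_at"]
        for s in response.get("status", {}).get("sources", [])
    }

# Assumed response shape, for illustration only.
sample = {"status": {"sources": [{"name": "openrouter", "fetched_at": "2024-06-01T00:00:00Z"}]}}
```

Checking the per-source timestamps in the response is how you confirm whether a value came from a fresh fetch or an older snapshot.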