
Arcee AI: Spotlight

Server-rendered model summary page for indexing/share previews. Use the interactive explorer for full filtering and comparison.

Match confidence: Unmatched
Source type: openrouter_only
Context window: 131.1K
Arena overall rank: —
Input price: $0.180 / 1M
Output price: $0.180 / 1M

Identifiers & provenance

Primary ID
arcee-ai/spotlight
OpenRouter ID
arcee-ai/spotlight
Canonical slug
arcee-ai/spotlight

Source semantics

  • Arena rank is a human-preference leaderboard signal, not a universal truth metric.
  • OpenRouter usage/popularity reflects adoption/traffic, not benchmark quality.
  • Pricing fields may differ by provider and can include extra modes beyond prompt/completion.

Read more on Methodology & data sources.
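The per-token pricing strings in the raw snapshot below map directly to the per-million (PPM) figures shown on this page. A minimal sketch of that conversion, using the field names from the snapshot (the helper name is illustrative, not part of any API):

```python
# Convert OpenRouter's per-token USD pricing strings to price per 1M tokens.
# Decimal avoids float-parsing surprises with the tiny per-token values.
from decimal import Decimal

def to_ppm(per_token: str) -> float:
    """Per-token USD string -> USD per 1 million tokens."""
    return float(Decimal(per_token) * 1_000_000)

pricing = {"prompt": "0.00000018", "completion": "0.00000018"}
ppm = {k: to_ppm(v) for k, v in pricing.items()}
# ppm == {"prompt": 0.18, "completion": 0.18}, matching the snapshot's PPM block
```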

Description

Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32K-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual-question-answering, and diagram-analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests.
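As a sketch of how a text+image request to this model might look, here is a request payload for OpenRouter's OpenAI-compatible chat completions endpoint. The endpoint URL and message shape follow OpenRouter's documented API; the image URL and prompt text are placeholders, and actually sending the request requires an OpenRouter API key:

```python
# Sketch of a multimodal request payload for arcee-ai/spotlight via
# OpenRouter's OpenAI-compatible chat API. Image URL is a placeholder.
import json

payload = {
    "model": "arcee-ai/spotlight",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 512,  # well under the provider's 65,537-token completion cap
}
body = json.dumps(payload)
# POST this body to https://openrouter.ai/api/v1/chat/completions
# with the header "Authorization: Bearer <OPENROUTER_API_KEY>"
```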

Raw fields snapshot

{
  "id": "arcee-ai/spotlight",
  "name": "Arcee AI: Spotlight",
  "description": "Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual‐question‑answering, and diagram‑analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mock‑ups need to be interpreted on the fly. Early benchmarks show it matching or out‑scoring larger VLMs such as LLaVA‑1.6 13 B on popular VQA and POPE alignment tests. ",
  "created": 1746481552,
  "canonical_slug": "arcee-ai/spotlight",
  "hugging_face_id": "",
  "source_type": "openrouter_only",
  "context_length": 131072,
  "max_completion_tokens": 65537,
  "is_moderated": false,
  "architecture": {
    "modality": "text+image->text",
    "input_modalities": [
      "image",
      "text"
    ],
    "output_modalities": [
      "text"
    ],
    "tokenizer": "Other",
    "instruct_type": null
  },
  "input_modalities": [
    "image",
    "text"
  ],
  "output_modalities": [
    "text"
  ],
  "modality": "text+image->text",
  "tokenizer": "Other",
  "instruct_type": null,
  "supported_parameters": [
    "frequency_penalty",
    "logit_bias",
    "max_tokens",
    "min_p",
    "presence_penalty",
    "repetition_penalty",
    "stop",
    "temperature",
    "top_k",
    "top_p"
  ],
  "default_parameters": {},
  "per_request_limits": null,
  "top_provider": {
    "context_length": 131072,
    "max_completion_tokens": 65537,
    "is_moderated": false
  },
  "pricing": {
    "prompt": "0.00000018",
    "completion": "0.00000018"
  },
  "PPM": {
    "prompt": 0.18,
    "completion": 0.18
  },
  "openrouter_raw": {
    "id": "arcee-ai/spotlight",
    "canonical_slug": "arcee-ai/spotlight",
    "hugging_face_id": "",
    "name": "Arcee AI: Spotlight",
    "created": 1746481552,
    "description": "Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual‐question‑answering, and diagram‑analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mock‑ups need to be interpreted on the fly. Early benchmarks show it matching or out‑scoring larger VLMs such as LLaVA‑1.6 13 B on popular VQA and POPE alignment tests. ",
    "context_length": 131072,
    "architecture": {
      "modality": "text+image->text",
      "input_modalities": [
        "image",
        "text"
      ],
      "output_modalities": [
        "text"
      ],
      "tokenizer": "Other",
      "instruct_type": null
    },
    "pricing": {
      "prompt": "0.00000018",
      "completion": "0.00000018"
    },
    "top_provider": {
      "context_length": 131072,
      "max_completion_tokens": 65537,
      "is_moderated": false
    },
    "per_request_limits": null,
    "supported_parameters": [
      "frequency_penalty",
      "logit_bias",
      "max_tokens",
      "min_p",
      "presence_penalty",
      "repetition_penalty",
      "stop",
      "temperature",
      "top_k",
      "top_p"
    ],
    "default_parameters": {},
    "expiration_date": null
  }
}
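Because `supported_parameters` varies by model, a client can filter its sampling options against that list before building a request. A minimal sketch using the parameter list from the snapshot above (the helper name is illustrative):

```python
# Drop request options this model does not advertise in supported_parameters.
# The set below is copied from the snapshot; the helper name is illustrative.
SUPPORTED = {
    "frequency_penalty", "logit_bias", "max_tokens", "min_p",
    "presence_penalty", "repetition_penalty", "stop",
    "temperature", "top_k", "top_p",
}

def filter_options(options: dict) -> dict:
    """Keep only the sampling options this model supports."""
    return {k: v for k, v in options.items() if k in SUPPORTED}

opts = {"temperature": 0.7, "top_p": 0.9, "seed": 42}
safe = filter_options(opts)
# "seed" is not in supported_parameters for this model, so it is dropped
```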
Arcee AI: Spotlight · NNZen