← Back to explorer

ByteDance: UI-TARS 7B

Server-rendered model summary page for indexing/share previews. Use the interactive explorer for full filtering and comparison.

Match confidence: UnmatchedSource type: openrouter_only
Context window
128K
Arena overall rank
Input price
$0.000 / 1M
Output price
$0.000 / 1M

Identifiers & provenance

Primary ID
bytedance/ui-tars-1.5-7b
OpenRouter ID
bytedance/ui-tars-1.5-7b
Canonical slug
bytedance/ui-tars-1.5-7b

Source semantics

  • Arena rank is a human-preference leaderboard signal, not a universal truth metric.
  • OpenRouter usage/popularity reflects adoption/traffic, not benchmark quality.
  • Pricing fields may differ by provider and can include extra modes beyond prompt/completion.

Read more on Methodology & data sources.

Description

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Raw fields snapshot

{
  "id": "bytedance/ui-tars-1.5-7b",
  "name": "ByteDance: UI-TARS 7B ",
  "description": "UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces.\n\nThis model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.",
  "created": 1753205056,
  "canonical_slug": "bytedance/ui-tars-1.5-7b",
  "hugging_face_id": "ByteDance-Seed/UI-TARS-1.5-7B",
  "source_type": "openrouter_only",
  "context_length": 128000,
  "max_completion_tokens": 2048,
  "is_moderated": false,
  "architecture": {
    "modality": "text+image->text",
    "input_modalities": [
      "image",
      "text"
    ],
    "output_modalities": [
      "text"
    ],
    "tokenizer": "Other",
    "instruct_type": null
  },
  "input_modalities": [
    "image",
    "text"
  ],
  "output_modalities": [
    "text"
  ],
  "modality": "text+image->text",
  "tokenizer": "Other",
  "instruct_type": null,
  "supported_parameters": [
    "frequency_penalty",
    "logit_bias",
    "max_tokens",
    "presence_penalty",
    "repetition_penalty",
    "seed",
    "stop",
    "temperature",
    "top_k",
    "top_p"
  ],
  "default_parameters": {},
  "per_request_limits": null,
  "top_provider": {
    "context_length": 128000,
    "max_completion_tokens": 2048,
    "is_moderated": false
  },
  "pricing": {
    "prompt": "0.0000001",
    "completion": "0.0000002"
  },
  "PPM": {
    "prompt": 0.1,
    "completion": 0.2
  },
  "openrouter_raw": {
    "id": "bytedance/ui-tars-1.5-7b",
    "canonical_slug": "bytedance/ui-tars-1.5-7b",
    "hugging_face_id": "ByteDance-Seed/UI-TARS-1.5-7B",
    "name": "ByteDance: UI-TARS 7B ",
    "created": 1753205056,
    "description": "UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces.\n\nThis model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.",
    "context_length": 128000,
    "architecture": {
      "modality": "text+image->text",
      "input_modalities": [
        "image",
        "text"
      ],
      "output_modalities": [
        "text"
      ],
      "tokenizer": "Other",
      "instruct_type": null
    },
    "pricing": {
      "prompt": "0.0000001",
      "completion": "0.0000002"
    },
    "top_provider": {
      "context_length": 128000,
      "max_completion_tokens": 2048,
      "is_moderated": false
    },
    "per_request_limits": null,
    "supported_parameters": [
      "frequency_penalty",
      "logit_bias",
      "max_tokens",
      "presence_penalty",
      "repetition_penalty",
      "seed",
      "stop",
      "temperature",
      "top_k",
      "top_p"
    ],
    "default_parameters": {},
    "expiration_date": null
  }
}
ByteDance: UI-TARS 7B · NNZen