---
type: Glossary Term
title: Tokens Per Second (TPS)
description: Tokens Per Second (TPS) is the primary throughput metric for evaluating AI language model inference performance. It measures how many tokens a model generates p
resource: "https://www.contextstudios.ai/glossary/tokens-per-second"
category: infrastructure
language: en
timestamp: "2026-03-18T09:56:02.987Z"
---

# Tokens Per Second (TPS)

Tokens Per Second (TPS) is the primary throughput metric for evaluating AI language model inference performance. It measures how many tokens a model generates per second after the generation process has begun. TPS and Time-to-First-Token (TTFT) jointly determine the overall user experience quality.

A token roughly corresponds to 0.75 words in English or 0.5–0.6 words in other languages. Typical TPS benchmarks: Groq's LPU achieves 500–800 TPS for 7B parameter models; Anthropic's Claude API delivers 30–100 TPS depending on model tier; self-hosted open-source models on a single H100 GPU achieve 50–200 TPS depending on model size.

TPS influences UX in two distinct ways. For short responses (up to ~500 tokens), TTFT dominates perceived responsiveness. For long outputs — documents, code, analyses — TPS becomes the determining factor. At 30 TPS, generating a 3,000-word document takes ~80 seconds; at 200 TPS, ~12 seconds. For voice AI systems, a minimum TPS of 100 is necessary for speech synthesis without perceptible gaps.

Factors affecting TPS: model size (larger = lower TPS per request), quantization level (FP4 > FP8 > BF16 in throughput), batch size (larger batches increase aggregate TPS but lower individual TPS), hardware, and KV-cache utilization patterns.

## Business Value

TPS bestimmt direkt die maximale Ausgabegröße einer KI-Lösung. Workflows, die lange Dokumente generieren, sind ohne ausreichende TPS nicht praktikabel.

## Context Studios Perspective

Wir wählen Modell-Tiers bei Context Studios basierend auf TPS-Anforderungen: Voice-Pipelines brauchen >100 TPS; Analyse-Agents arbeiten problemlos mit 30–50 TPS.
