GPU / VRAM Calculator

Estimate the VRAM needed to run an LLM. Factors: model weights, KV cache, activations, quantization, batch size, and concurrent users.

Model & Architecture

Quantization & Precision

Concurrent Users

VRAM Breakdown (single request)

Model Weights
KV Cache (1 request)
Activation Memory
Framework Overhead
Modality Overhead
Shared Cost
Per-User Cost
Total — 1 User
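The breakdown above can be approximated with a few lines of arithmetic. The following is a minimal sketch, not the calculator's actual formula: the ModelSpec fields, the function names, and the choice to pass activation, framework, and modality memory in as plain inputs are all assumptions made for illustration. It uses the common estimates that weights take (parameters × bytes per weight) and that the KV cache takes (2 × layers × KV heads × head dimension × sequence length × bytes per element).

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class ModelSpec:
    # Hypothetical input structure; every field is an assumption for illustration.
    n_params: float        # total parameter count
    n_layers: int          # number of transformer layers
    n_kv_heads: int        # key/value heads (fewer than query heads under GQA)
    head_dim: int          # dimension of each attention head
    weight_bytes: float    # bytes per weight, e.g. 2.0 for FP16, 0.5 for 4-bit
    kv_bytes: float = 2.0  # bytes per KV-cache element (FP16 assumed)


def weights_bytes(m: ModelSpec) -> float:
    """Memory for the model weights at the chosen precision."""
    return m.n_params * m.weight_bytes


def kv_cache_bytes(m: ModelSpec, seq_len: int) -> float:
    """KV cache for one request; the factor 2 covers keys plus values."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * seq_len * m.kv_bytes


def single_request_breakdown(m: ModelSpec, seq_len: int,
                             activation_bytes: float,
                             framework_bytes: float,
                             modality_bytes: float = 0.0) -> dict:
    """Return each component in bytes.

    Assumption: shared cost = weights + framework overhead,
    per-user cost = KV cache + activations, and modality overhead
    is added as its own term.
    """
    shared = weights_bytes(m) + framework_bytes
    per_user = kv_cache_bytes(m, seq_len) + activation_bytes
    return {
        "weights": weights_bytes(m),
        "kv_cache": kv_cache_bytes(m, seq_len),
        "activations": activation_bytes,
        "framework": framework_bytes,
        "modality": modality_bytes,
        "shared": shared,
        "per_user": per_user,
        "total_1_user": shared + per_user + modality_bytes,
    }


# Illustrative 7B-parameter model in FP16 with a 4096-token context.
spec = ModelSpec(n_params=7e9, n_layers=32, n_kv_heads=32,
                 head_dim=128, weight_bytes=2.0)
result = single_request_breakdown(spec, seq_len=4096,
                                  activation_bytes=0.5 * GIB,
                                  framework_bytes=1.0 * GIB)
print({name: round(value / GIB, 2) for name, value in result.items()})
```

The activation and framework figures in the example are placeholders; real values depend on the inference engine and should be measured.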

Multi-User / Server VRAM

shared + per user

Shared (weights + overhead)
Per-User × 1 user
Modality × Safety Margin
Total — Server VRAM
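The "shared + per user" split suggests a simple scaling rule: the shared cost is paid once, and the per-user cost is paid once per concurrent request. A minimal sketch under that reading follows. The function name, the default margin of 1.1, and the way the modality overhead and safety margin are combined are assumptions, since the page does not spell out the exact formula.

```python
def server_vram_bytes(shared: float, per_user: float, users: int,
                      modality: float = 0.0,
                      safety_margin: float = 1.1) -> float:
    """Estimate total server VRAM in bytes.

    Assumed formula (not confirmed by the source):
        (shared + per_user * users + modality) * safety_margin
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    if safety_margin < 1.0:
        raise ValueError("safety_margin is expected to be >= 1.0")
    return (shared + per_user * users + modality) * safety_margin


GIB = 1024 ** 3
# Illustrative numbers only: 14 GiB shared, 2.5 GiB per user, 8 users.
total = server_vram_bytes(shared=14 * GIB, per_user=2.5 * GIB, users=8)
print(f"{total / GIB:.1f} GiB")
```

Because only the per-user term scales with the number of users, shrinking the KV cache (shorter context, or a lower-precision cache) usually frees more room for concurrency than shaving framework overhead.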

Throughput & Queuing

Active Batch (parallel)
Users Queued
Gen Speed (single stream)
Total Server Throughput
Per-User Speed
Time to First Token (est.)
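The throughput and queuing figures can be approximated with rules of thumb. The sketch below is an assumption-heavy illustration, not the calculator's method: it treats decoding as memory-bandwidth bound (each generated token reads the weights once, so single-stream speed is roughly bandwidth divided by weight size), caps the active batch at a hypothetical max_batch, and models batching losses with a made-up batch_efficiency factor.

```python
def throughput_estimate(users: int, max_batch: int,
                        bandwidth_gb_s: float, weights_gb: float,
                        prompt_tokens: int, prefill_tps: float,
                        batch_efficiency: float = 1.0) -> dict:
    """Rough throughput and queuing figures; all parameter names are hypothetical."""
    if users < 1 or max_batch < 1:
        raise ValueError("users and max_batch must be at least 1")

    active = min(users, max_batch)   # requests decoded in parallel
    queued = users - active          # requests waiting for a batch slot

    # Memory-bound decode: one full pass over the weights per generated token.
    single_stream_tps = bandwidth_gb_s / weights_gb

    # Batched requests share each weight read; batch_efficiency (0..1)
    # is an assumed fudge factor for scheduling and KV-cache traffic.
    total_tps = single_stream_tps * active * batch_efficiency
    per_user_tps = total_tps / active

    # Time to first token approximated as prompt length over prefill speed.
    ttft_s = prompt_tokens / prefill_tps

    return {
        "active_batch": active,
        "users_queued": queued,
        "gen_speed_single": single_stream_tps,
        "total_throughput": total_tps,
        "per_user_speed": per_user_tps,
        "ttft_s": ttft_s,
    }


# Illustrative inputs: 1000 GB/s bandwidth, 10 GB of weights, 10 users.
est = throughput_estimate(users=10, max_batch=8,
                          bandwidth_gb_s=1000.0, weights_gb=10.0,
                          prompt_tokens=1000, prefill_tps=2000.0,
                          batch_efficiency=0.5)
print(est)
```

The bandwidth-bound estimate is an upper bound; measured speeds are typically lower, and queued users also wait for a slot on top of the estimated time to first token.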

GPU Compatibility

Sized for 1 user.

GPU | VRAM | BW | # GPUs | Max Users | Gen TPS | Per-User TPS | Fit | Cost | Action
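The "# GPUs" and "Max Users" columns can be derived from the VRAM figures with simple rounding. The sketch below is one plausible way to compute them, with hypothetical function names. It assumes memory pools evenly across GPUs and ignores any multi-GPU overhead, which a real deployment would need to account for.

```python
import math


def gpus_needed(required_gib: float, gpu_vram_gib: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers the requirement."""
    return math.ceil(required_gib / gpu_vram_gib)


def max_users(gpu_vram_gib: float, n_gpus: int,
              shared_gib: float, per_user_gib: float,
              safety_margin: float = 1.0) -> int:
    """Concurrent users that fit after the shared cost is paid.

    Assumption: usable memory is the pooled VRAM divided by the safety margin.
    """
    usable = gpu_vram_gib * n_gpus / safety_margin
    free = usable - shared_gib
    if free < 0:
        return 0  # the model itself does not fit
    return int(free // per_user_gib)


# Illustrative: a 100 GiB requirement on hypothetical 80 GiB cards.
print(gpus_needed(100, 80))                                # 2
print(max_users(80, 1, shared_gib=16, per_user_gib=4))     # 16
```

A configuration "fits" in this sketch when max_users is at least the number of users the server is sized for.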