GPU / VRAM Calculator
Estimate the VRAM needed to serve an LLM. Factors: model weights, KV cache, activations, quantization, batch size, and concurrent users.
Model & Architecture
Quantization & Precision
Concurrent Users
VRAM Breakdown (single request)
Model Weights –
KV Cache (1 request) –
Activation Memory –
Framework Overhead –
Modality Overhead –
Shared Cost –
Per-User Cost –
Total (1 User) –
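The two largest terms in the breakdown above can be approximated from the model configuration. The following is a minimal sketch, not the calculator's actual code: the function names and the 7B-parameter example configuration are hypothetical, and it assumes a standard transformer that stores one K and one V tensor per layer.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Memory for model weights at a given quantization width."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """KV cache for one request: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical 7B-parameter model configuration, for illustration only.
w16 = weight_bytes(7_000_000_000, 16)   # 16-bit weights
w4 = weight_bytes(7_000_000_000, 4)     # 4-bit quantized weights
kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                    context_tokens=4096)  # 16-bit cache values

print(f"weights (16-bit): {w16 / GIB:.2f} GiB")
print(f"weights (4-bit):  {w4 / GIB:.2f} GiB")
print(f"KV cache (1 req): {kv / GIB:.2f} GiB")
```

Activation memory and framework overhead depend on the runtime, so this sketch leaves them out rather than guess at them.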
Multi-User / Server VRAM
– shared + – per user
Shared (weights + overhead)
Per-User × 1 user –
Modality × Safety Margin –
Total — Server VRAM –
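The "shared + per user" split above suggests a simple linear model for server VRAM. This is a sketch under stated assumptions: `server_vram` is a hypothetical helper, and treating the safety margin as a multiplier on the whole sum is an assumption, since the page does not say how the margin is applied.

```python
def server_vram(shared_gib: float, per_user_gib: float, users: int,
                safety_margin: float = 1.0) -> float:
    """Shared cost is paid once; per-user cost scales with concurrency.

    safety_margin is assumed to multiply the whole sum (e.g. 1.1 = +10%).
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    return (shared_gib + per_user_gib * users) * safety_margin

# Illustrative numbers only: 14 GiB shared, 2.5 GiB per user.
for n in (1, 4, 8):
    print(n, "users:", server_vram(14.0, 2.5, n), "GiB")
```

Because weights are shared, the per-user term dominates as concurrency grows, which is why KV cache size matters more for servers than for single-user setups.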
Throughput & Queuing
Active Batch (parallel) –
Users Queued –
Gen Speed (single stream) –
Total Server Throughput –
Per-User Speed –
Time to First Token (est.) –
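A common back-of-the-envelope estimate for single-stream generation speed treats decoding as memory-bandwidth bound: each generated token requires reading the weights once. The sketch below uses that approximation and a simple active/queued split. Both helpers are hypothetical, the estimate ignores KV cache reads and compute limits, and it should be read as an upper bound rather than the calculator's own formula.

```python
def single_stream_tps(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper-bound tokens/s for one stream, assuming each token reads all weights once."""
    return bandwidth_gb_s / weight_gb

def split_users(users: int, max_batch: int) -> tuple[int, int]:
    """Return (active, queued): users beyond the batch limit wait in the queue."""
    active = min(users, max_batch)
    return active, users - active

# Illustrative numbers only: 1000 GB/s of bandwidth, 14 GB of weights.
print(f"{single_stream_tps(1000, 14):.1f} tok/s upper bound")
print(split_users(users=10, max_batch=4))
```

Real throughput is usually lower than this bound, and batching changes the picture because one pass over the weights can serve several requests at once.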
GPU Compatibility
Sized for 1 user.
| GPU | VRAM | Bandwidth | # GPUs | Max Users | Gen TPS | Per-User TPS | Fit | Cost | Action |
|---|---|---|---|---|---|---|---|---|---|
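The "# GPUs" and "Max Users" columns can be approximated from the shared and per-user costs. This is a rough sketch with hypothetical helper names. It assumes VRAM pools evenly across GPUs and ignores any multi-GPU communication or replication overhead.

```python
import math

def gpus_needed(required_gib: float, gpu_vram_gib: float) -> int:
    """Smallest GPU count whose combined VRAM covers the requirement."""
    return math.ceil(required_gib / gpu_vram_gib)

def max_users(total_vram_gib: float, shared_gib: float,
              per_user_gib: float) -> int:
    """Users that fit in the VRAM left after the shared cost is paid."""
    free = total_vram_gib - shared_gib
    if free < 0:
        return 0  # the model does not fit at all
    return int(free // per_user_gib)

# Illustrative numbers only: 34 GiB required, 24 GiB cards.
n = gpus_needed(34.0, 24)
print(n, "GPUs,", max_users(n * 24, shared_gib=14.0, per_user_gib=2.5), "users max")
```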