Kimi K2.5 - Model Report

Overview

Attribute	Details
Developer	Moonshot AI (China)
Release Date	January 27, 2026
Model Type	Native Multimodal Agentic Model
License	Modified MIT License

Architecture

Specification	Value
Total Parameters	1.04 Trillion
Active Parameters	32 Billion
Architecture	Mixture of Experts (MoE)
Number of Experts	384 (8 selected per token + 1 shared)
Vision Encoder	400 Million parameters
Context Window	256K tokens
Modalities	Text, Image, Video

Training

Aspect	Details
Training Data	~15 Trillion mixed visual and text tokens
Base Model	Kimi-K2-Base (continual pretraining)
Quantization	Native INT4 (QAT), Group size 32
Optimization	Hopper Architecture optimized

Key Features

Agent Swarm Orchestration: Can spawn and manage up to 100 agents per prompt
Multi-Agent Task Decomposition: Decomposes complex tasks into parallel sub-tasks
Thinking Mode: Includes reasoning traces with reasoning_content (temp=1.0)
Instant Mode: Direct responses without reasoning traces (temp=0.6)
Native Multimodal: Built-in text, image, and video understanding
Native INT4 Inference: ~2x generation speed improvement via QAT

Benchmarks

Benchmark	Score
Hallucinations	100%
General Knowledge	100%
Reasoning	100%
Ethics	100%
Mathematics	96.8% (97th percentile)
Coding	92.0% (76th percentile)

Pricing

Platform	Details
Self-Hosted	Free (open source)
NVIDIA NIM	Available via NIM catalog
Cloud APIs	Various providers

Open Source Availability

Platform	Status
Hugging Face	moonshotai/Kimi-K2.5
GGUF Quants	unsloth/Kimi-K2-Instruct-GGUF
Weights Download	Available
Self-Hosting	Possible (requires significant hardware)

Minimum Hardware for Self-Hosting

Memory Requirements

Quantization	Model Size	Min Memory (RAM+VRAM+Disk)
1.8-bit GGUF	~247GB	250GB
2-bit XL	~300GB	300GB+
Q8 (Full)	~1.09TB	8x H200 GPUs
FP8	~1TB	Enterprise GPU cluster

Apple Hardware Options

Requirement	Minimum	Recommended
Product	Mac Studio (M3 Ultra)	Mac Studio (M3 Ultra)
Unified Memory	256GB	512GB
Storage	500GB+ NVMe	1TB+ NVMe
Quantization	1.8-bit GGUF	2-bit or higher
Expected Speed	1-2 tokens/sec	5+ tokens/sec
Approx. Cost	~$8,000	~$12,000

Why Mac Studio M3 Ultra?

Apple Product	Max Unified Memory	Sufficient?
MacBook Pro M4 Max	128GB	No
Mac Studio M4 Max	128GB	No
Mac Studio M3 Ultra	512GB	Yes
Mac Pro M2 Ultra	192GB	Borderline

Note: The Mac Studio with M3 Ultra (512GB unified memory) is currently the only consumer Apple product capable of running Kimi K2.5 locally. Lower memory configurations will experience severe performance degradation due to disk swapping.

Performance Expectations

Setup	VRAM	RAM	Speed
RTX 4090 + 256GB RAM	24GB	256GB	1-2 tok/s
Mac Studio M3 Ultra 512GB	512GB unified	-	3-5 tok/s
2x A100 80GB	160GB	512GB	15-20 tok/s
8x H200	1.1TB	-	~45 tok/s

Sources