Step 3.5 Flash – Open-source foundation model, supports deep reasoning at speed

Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act.

2026-02-12 updated · StepFun
GitHub · Hugging Face · Tech Report · ModelScope · OpenClaw Guidance

[Figure: average benchmark score vs. total model parameters (B); models with unknown parameter counts plotted as likely >1000B]

Model              Total Params (B)   Avg Score
Step 3.5 Flash     196                81.0
GLM-4.7            355                78.5
DeepSeek V3.2      671                77.3
Kimi K2.5          1000               80.5
Gemini 3.0 Pro     Unknown            80.7
Claude Opus 4.5    Unknown            80.6
GPT-5.2 xhigh      Unknown            82.2

Scores represent the mean of the eight benchmarks listed below, excluding xbench-DeepSearch. The Step 3.5 Flash score is obtained under standard settings (i.e., without Parallel Thinking).

Step 3.5 Flash is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. Built on a sparse Mixture-of-Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models while maintaining the agility required for real-time interaction.

Deep Reasoning at Speed: While chatbots are built for reading, agents must reason fast. Powered by 3-way Multi-Token Prediction (MTP-3), Step 3.5 Flash achieves a generation throughput of 100–300 tok/s in typical usage, peaking at 350 tok/s for single-stream coding tasks. This enables complex, multi-step reasoning chains with immediate responsiveness.

A Robust Engine for Coding & Agents: Step 3.5 Flash is purpose-built for agentic tasks, integrating a scalable RL framework that drives consistent self-improvement. It achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, demonstrating its ability to handle sophisticated, long-horizon tasks with unwavering stability.
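The sparse-MoE idea described above, activating only a small subset of experts per token, can be sketched in a few lines of NumPy. This is a generic top-k routing toy, not StepFun's published implementation: the gate, expert shapes, and k=2 are illustrative assumptions.

```python
# Minimal sketch of sparse MoE top-k routing. Illustrative only: the real
# Step 3.5 Flash layer design is not public beyond the 11B/196B figures.
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Route a token vector x to the top-k of len(experts) experts.

    Only the selected experts run, so the active parameter count per
    token is a small fraction of the total (the "intelligence density"
    idea from the text above).
    """
    logits = x @ gate_w                      # (num_experts,) gating scores
    topk = np.argsort(logits)[-k:]           # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, num_experts))
# Each "expert" is a tiny linear map standing in for a feed-forward block.
expert_mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda v, m=m: v @ m for m in expert_mats]

y = moe_layer(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k=2 of 4 experts selected, only half the expert parameters touch each token; scaling the same idea up is how a 196B-parameter model can run with an 11B active footprint.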
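Multi-token prediction speeds up decoding because one forward pass can commit several tokens instead of one. The toy below treats 3-way MTP heads as a self-drafting mechanism with a verify step, in the spirit of speculative decoding; the function names and the accept/reject scheme are assumptions for illustration, since StepFun's exact MTP-3 decoding algorithm is not described here.

```python
# Toy sketch of 3-way multi-token prediction used as self-speculation.
# Assumed interface: draft_step(seq) returns k candidate next tokens (the
# MTP heads); verify(seq, cands) returns how many leading candidates the
# base model would accept.

def mtp_decode(draft_step, verify, prompt, max_len, k=3):
    """Generate up to max_len tokens, drafting k tokens per forward pass."""
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < max_len:
        cands = draft_step(seq)
        accepted = verify(seq, cands)
        # Always make progress: commit at least one token per pass.
        seq.extend(cands[:max(1, accepted)])
        passes += 1
    return seq[:len(prompt) + max_len], passes

# Toy "model": the true next token is (last + 1) mod 100, and the MTP
# heads happen to be perfect, so all k drafts are accepted every pass.
draft = lambda s: [(s[-1] + i + 1) % 100 for i in range(3)]
verify = lambda s, c: sum(1 for i, t in enumerate(c) if t == (s[-1] + i + 1) % 100)

out, passes = mtp_decode(draft, verify, [0], max_len=9)
print(out, passes)  # [0, 1, 2, ..., 9] in 3 forward passes instead of 9
```

When the drafts are accepted, tokens-per-forward-pass approaches k, which is where throughput gains like the quoted 100–300 tok/s come from; when drafts miss, decoding degrades gracefully toward one token per pass.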
Efficient Long Context: The model supports a cost-efficient 256K context window by employing a 3:1 Sliding Window Attention (SWA) ratio, interleaving three SWA layers for every full-attention layer. This hybrid approach ensures consistent performance across massive datasets and long codebases while significantly reducing the computational overhead typical of standard long-context models.

Accessible Local Deployment: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.

Reasoning: AIME 2025

Model                     Params    Score
Step 3.5 Flash            196B      97.3
Step 3.5 Flash (PaCoRe)   196B      99.9
GLM-4.7                   355B      95.7
DeepSeek V3.2             671B      93.1
Kimi K2.5                 1T        96.1
Gemini 3.0 Pro            Unknown   95.0
Claude Opus 4.5           Unknown   92.8
GPT-5.2 xhigh             Unknown   100.0
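The 3:1 hybrid attention pattern described above can be sketched with attention masks. Only the 3:1 ratio comes from the announcement; the window size, sequence length, and layer ordering below are assumptions for illustration.

```python
# Sketch of a 3:1 SWA-to-full-attention block: three sliding-window
# layers per full causal-attention layer. Counts the attended (query,
# key) pairs to show the compute saving versus all-dense-causal layers.
import numpy as np

def sliding_window_mask(n, window):
    """True where query i may attend key j: causal, last `window` keys only."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def causal_mask(n):
    """Standard full causal mask: query i attends all keys j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

n, window = 1024, 128                      # assumed sizes for the toy
layer_masks = [sliding_window_mask(n, window)] * 3 + [causal_mask(n)]

attended = sum(int(m.sum()) for m in layer_masks)
dense = 4 * int(causal_mask(n).sum())      # same 4 layers, all full-causal
print(f"attended fraction vs. all-causal block: {attended / dense:.2f}")
```

The SWA layers' cost grows linearly with sequence length (each query sees at most `window` keys), so the saving over a fully dense-causal stack widens as the context approaches 256K; the occasional full-attention layer preserves global information flow.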

Source: Hacker News