Skip to content

stepfun-ai/Step3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StepFun: Cost-Effective Multimodal Intelligence

Chat Homepage
Hugging Face ModelScope Twitter Follow
Discord License
📰  Step3 Model Blog     |     📄  Step3 System Tech Report

Introduction

Step3 is our cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Step3 model card:

Config Value
Number of Layers (Dense layer included) 61
Number of Dense Layers 5
Hidden Dimension 7168
Attention Mechanism MFA
Low-rank Query Dimension 2048
Number of Query Heads 64
Head Dimension 256
Number of Experts 48
Selected Experts per Token 3
Number of Shared Experts 1
Max Context Length 65536
Tokenizer Deepseek V3
Total Parameters (LLM) 316B
Activated Params per Token 38B
Total Parameters (VLM) 321B

Evaluation Results

Image

Deployment

You can access Step3's API on https://platform.stepfun.com/ , we provide OpenAI/Anthropic-compatible API for you.

Our model checkpoints are stored in bf16 and block-fp8 format, you can find it on Huggingface.

Currently, it is recommended to run Step3 on the following inference engines:

  • vLLM
  • SGLang

Deployment and Request examples for vLLM and SGLang can be found in the Model Deployment Guide.

Contact Us

If you have any questions, please reach out at [email protected] .

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{step3system,
      title={Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding}, 
      author={StepFun Team},
      year={2025},
      eprint={2507.19427},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.19427}, 
}

@misc{step3blog,
      title={Step3: Cost-Effective Multimodal Intelligence}, 
      author={StepFun Team},
      url={https://stepfun.ai/research/step3}, 
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published