Step1X-Edit v1p2

Free.ai (self-hosted) · image_edit · ~2,000 tokens per edit

Step1X-Edit-v1p2 (ReasonEdit-S, released Nov 2025) — Apache 2.0. StepFun's reasoning image-edit model: it pairs a DiT-based decoder with an MLLM front-end, then adds a thinking + reflection pass around the diffusion forward pass. Trained to match GPT-4o / Gemini-2-Flash edit quality on KRIS-Bench and GEdit-Bench, especially on multi-step, referential, and abstract edits ("fix what looks wrong about her left hand", "match the lighting in this room to a sunset"). Self-hosted on Free.ai's H200 (dedicated venv-step1x-edit). The reflection pass costs ~25-50 s per 1024-side edit but catches the drift artifacts that plain diffusion edits often produce. The free tier covers reasoning-mode operations on /image/edit/.

Use via API
curl -X POST https://api.free.ai/v1/image/edit/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "image=@input.jpg" \
  -F "model=step1x-edit" \
  -F "operation=img2img" \
  -F "prompt=your prompt here"
API Documentation · Get API Key
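
For scripted use, the same call from Python. This is a minimal sketch: the endpoint and field names follow the curl example above, while the response handling (assumed to be JSON containing an output image URL) is illustrative rather than documented here.

import requests

# Minimal sketch of the multipart edit call shown above. Field names
# mirror the curl example; the JSON response shape is an assumption.
resp = requests.post(
    "https://api.free.ai/v1/image/edit/",
    headers={"Authorization": "Bearer sk-free-..."},
    files={"image": open("input.jpg", "rb")},
    data={
        "model": "step1x-edit",
        "operation": "img2img",
        "prompt": "match the lighting in this room to a golden-hour sunset",
    },
)
resp.raise_for_status()
print(resp.json())  # inspect for the output image URL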

Frequently Asked Questions

What is Step1X-Edit-v1p2?
Step1X-Edit-v1p2 (ReasonEdit-S) is StepFun's reasoning image-edit model — released November 2025 under Apache 2.0. Where Qwen-Image-Edit follows direct instructions ("make the sky purple"), Step1X-Edit reasons about multi-step, referential, and abstract edits ("match the lighting in this room to a golden-hour sunset", "fix what looks anatomically wrong about her left hand"). It runs a thinking pass plus reflection rounds around the diffusion forward pass, and was trained to match GPT-4o / Gemini-2-Flash edit quality on KRIS-Bench and GEdit-Bench. Self-hosted on Free.ai's H200 — no upstream provider, no markup.

How is Step1X-Edit different from Qwen-Image-Edit?
Qwen-Image-Edit-2511 is the fastest free instruction-edit model on Free.ai — ~10-15 seconds per edit, with an MMDiT backbone and a Qwen2.5-VL text encoder. Step1X-Edit is the slower, smarter sibling: a DiT decoder plus a separate MLLM front-end that runs a thinking pass (reformatting the prompt for the diffuser) and an optional reflection pass (comparing candidates and picking the best). Use Qwen for direct edits; use Step1X when the prompt needs interpretation, has multiple clauses, or references something outside the frame.

How does Step1X-Edit compare to the premium image-edit models?
The premium image-edit models route through paid providers (fal.ai upstream) and cost 30K-112K tokens per edit. Step1X-Edit is fully self-hosted on Free.ai's H200 under Apache 2.0 and costs ~2,000 tokens. The premium models are still ahead on raw fidelity for product photography and graphic-design preservation; Step1X wins on reasoning prompts and is the strongest open alternative we ship.

What kinds of prompts is Step1X-Edit best at?
Multi-step ("remove the watermark and then warm the color temperature"), referential ("match the framing of a 1985 Polaroid"), interpretive ("make her expression look more confident"), repair-style ("fix the hand", "straighten the horizon"), and abstract ("make this look like a memory"). For plain "change X to Y" edits, Qwen-Image-Edit is faster.

How many tokens does an edit cost?
~2,000 tokens per edit — double the standard image-edit rate (1,000 tokens), because the thinking + reflection passes roughly double wall-clock time vs Qwen-Image-Edit. Anonymous users get 2,500 free tokens/day; signed-in users get 30,000/day — enough for ~15 reasoning edits daily without paying.

How long does an edit take?
~25-50 seconds per 1024-side edit with thinking + reflection enabled (the default). Turning off reflection (enable_reflection=false) drops that to ~15-25 seconds with a small accuracy hit on multi-step prompts. The reflection pass catches the "fixed the hand but accidentally changed the background" failure mode that plain diffusion edits often drift into.

What license is it under, and can I use the outputs commercially?
Apache 2.0 — both the weights (huggingface.co/stepfun-ai/Step1X-Edit-v1p2) and the GitHub repo (github.com/stepfun-ai/Step1X-Edit). No territorial carve-outs, no MAU cap, no non-commercial rider, no research-only clause. The images you generate are yours to use commercially, with no royalties.

How much VRAM does it need to self-host?
~22 GB resident peak with model CPU offload (transformer + VAE on GPU during the forward pass; MLLM + text encoder paged from CPU), and 41.8 GB on disk. We reserve a 24 GB slot on the H200, and the wrapper aborts at startup if free VRAM dips below 18 GB. To self-host you'd need a 24 GB consumer card (RTX 4090) at minimum, ideally 40 GB+ for headroom.
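
If you're sizing your own card, the startup guard described above is easy to reproduce. A minimal sketch in plain PyTorch (the 18 GB floor comes from the wrapper behavior noted here; nothing else is Step1X-specific):

import torch

# Sketch of a preflight guard like the wrapper's: refuse to load if
# free VRAM on device 0 is below the 18 GB floor mentioned above.
MIN_FREE_BYTES = 18 * 1024**3

free, total = torch.cuda.mem_get_info(0)  # (free, total) in bytes
if free < MIN_FREE_BYTES:
    raise RuntimeError(
        f"Only {free / 1024**3:.1f} GB VRAM free; Step1X-Edit needs >= 18 GB"
    )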

Can I turn off the thinking or reflection passes?
Yes — pass enable_thinking=false to skip the MLLM's prompt-reformatting pass, or enable_reflection=false to skip the multi-candidate selection round. With both off, Step1X behaves like a vanilla DiT image editor at ~12-15 s/edit. We keep both on by default because that is what the model was trained to do, and it materially beats plain diffusion on the benchmark suites.

What's the architecture under the hood?
Step1X-Edit-v1p2 (ReasonEdit-S) couples a DiT-based decoder with a Qwen2.5-VL-family MLLM front-end. The MLLM interprets the edit instruction; the DiT decoder paints the edit. The diffusers pipeline class is Step1XEditPipelineV1P2 (it lives on a Peyton-Chen fork of diffusers, branch step1xedit_v1p2). RegionE optionally accelerates inference by skipping diffusion in regions the MLLM marks as unchanged.
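
A minimal local-inference sketch, assuming the fork branch above is installed. The class name, model repo, offload behavior, and defaults come from this page; the exact call signature (image / prompt / num_inference_steps / guidance_scale) is an assumption patterned on standard diffusers edit pipelines, so check the fork's README before relying on it.

import torch
from PIL import Image
from diffusers import Step1XEditPipelineV1P2  # only on the step1xedit_v1p2 fork branch

# Assumed usage, patterned on standard diffusers edit pipelines.
pipe = Step1XEditPipelineV1P2.from_pretrained(
    "stepfun-ai/Step1X-Edit-v1p2", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # ~22 GB resident peak, per the VRAM answer above

result = pipe(
    image=Image.open("input.jpg"),
    prompt="fix what looks anatomically wrong about her left hand",
    num_inference_steps=50,  # API default
    guidance_scale=6.0,      # API default
)
result.images[0].save("edited.jpg")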

What happens to my uploaded images?
Uploaded images are deleted immediately after the edit completes. Output sits on our CDN for 24 hours (7 days for paid users) so you can re-download it from /account/?tab=history. Your images are never used for training. Privacy policy at /privacy/.

Is there an API?
Yes — POST multipart to /v1/image/edit/ with image, model=step1x-edit, operation=img2img (or inpaint / outpaint / style_transfer, etc.), prompt, and the optional enable_thinking, enable_reflection, steps (default 50), and guidance_scale (default 6.0). Bearer auth, 10K tokens/month free. /api/ has curl examples.
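
A fuller Python sketch of that call with the optional knobs spelled out. Field names and defaults are from the answer above; booleans are sent as form strings, and the response handling is illustrative:

import requests

# Sketch of a full-parameter edit call; fields mirror the list above.
resp = requests.post(
    "https://api.free.ai/v1/image/edit/",
    headers={"Authorization": "Bearer sk-free-..."},
    files={"image": open("input.jpg", "rb")},
    data={
        "model": "step1x-edit",
        "operation": "img2img",        # or inpaint / outpaint / style_transfer
        "prompt": "remove the watermark and then warm the color temperature",
        "enable_thinking": "true",     # MLLM prompt-reformatting pass
        "enable_reflection": "false",  # skip reflection for a faster edit
        "steps": 50,                   # default
        "guidance_scale": 6.0,         # default
    },
)
resp.raise_for_status()
print(resp.json())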
