Step1X-Edit v1p2

Free.ai (self-hosted) · image_edit · ~2,000 tokens per edit

Step1X-Edit-v1p2 (ReasonEdit-S, released Nov 2025) — Apache 2.0. StepFun's reasoning image-edit model: it pairs a DiT-based decoder with an MLLM front-end, then adds a thinking + reflection pass around the diffusion forward pass. Trained to match GPT-4o / Gemini-2-Flash edit quality on KRIS-Bench and GEdit-Bench, especially on multi-step, referential, and abstract edits ("fix what looks wrong about her left hand", "match the lighting in this room to a sunset"). Self-hosted on Free.ai's H200 (dedicated venv-step1x-edit). The reflection pass costs ~25-50 s per 1024-side edit but catches the drift artifacts that plain diffusion edits often produce. The free tier covers reasoning-mode operations on /image/edit/.

Use via API
curl -X POST https://api.free.ai/v1/image/edit/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "image=@input.jpg" \
  -F "model=step1x-edit" \
  -F "operation=img2img" \
  -F "prompt=your prompt here"
API Documentation · Get API Key
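
For scripted use, the same call from Python. This is a minimal sketch: the endpoint and field names follow the curl example above, while the response handling (assumed to be JSON containing an output image URL) is illustrative rather than documented here.

import requests

# Minimal sketch of the multipart edit call shown above. Field names
# mirror the curl example; the JSON response shape is an assumption.
resp = requests.post(
    "https://api.free.ai/v1/image/edit/",
    headers={"Authorization": "Bearer sk-free-..."},
    files={"image": open("input.jpg", "rb")},
    data={
        "model": "step1x-edit",
        "operation": "img2img",
        "prompt": "match the lighting in this room to a golden-hour sunset",
    },
)
resp.raise_for_status()
print(resp.json())  # inspect for the output image URL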

Frequently Asked Questions

What is Step1X-Edit-v1p2?
Step1X-Edit-v1p2 (ReasonEdit-S) is StepFun's reasoning image-edit model — released November 2025 under Apache 2.0. Where Qwen-Image-Edit follows direct instructions ("make the sky purple"), Step1X-Edit reasons about multi-step, referential, and abstract edits ("match the lighting in this room to a golden-hour sunset", "fix what looks anatomically wrong about her left hand"). It runs a thinking pass plus reflection rounds around the diffusion forward pass, and was trained to match GPT-4o / Gemini-2-Flash edit quality on KRIS-Bench and GEdit-Bench. Self-hosted on Free.ai's H200 — no upstream provider, no markup.

How is Step1X-Edit different from Qwen-Image-Edit?
Qwen-Image-Edit-2511 is the fastest free instruction-edit model on Free.ai — ~10-15 seconds per edit, with an MMDiT backbone and a Qwen2.5-VL text encoder. Step1X-Edit is the slower, smarter sibling: a DiT decoder plus a separate MLLM front-end that runs a thinking pass (reformatting the prompt for the diffuser) and an optional reflection pass (comparing candidates and picking the best). Use Qwen for direct edits; use Step1X when the prompt needs interpretation, has multiple clauses, or references something outside the frame.

How does Step1X-Edit compare to the premium image-edit models?
The premium image-edit models route through paid providers (fal.ai upstream) and cost 30K-112K tokens per edit. Step1X-Edit is fully self-hosted on Free.ai's H200 under Apache 2.0 and costs ~2,000 tokens. The premium models are still ahead on raw fidelity for product photography and graphic-design preservation; Step1X wins on reasoning prompts and is the strongest open alternative we ship.

What kinds of prompts is Step1X-Edit best at?
Multi-step ("remove the watermark and then warm the color temperature"), referential ("match the framing of a 1985 Polaroid"), interpretive ("make her expression look more confident"), repair-style ("fix the hand", "straighten the horizon"), and abstract ("make this look like a memory"). For plain "change X to Y" edits, Qwen-Image-Edit is faster.

How many tokens does an edit cost?
~2,000 tokens per edit — double the standard image-edit rate (1,000 tokens), because the thinking + reflection passes roughly double wall-clock time vs Qwen-Image-Edit. Anonymous users get 2,500 free tokens/day; signed-in users get 30,000/day — enough for ~15 reasoning edits daily without paying.

How long does an edit take?
~25-50 seconds per 1024-side edit with thinking + reflection enabled (the default). Turning off reflection (enable_reflection=false) drops that to ~15-25 seconds with a small accuracy hit on multi-step prompts. The reflection pass catches the "fixed the hand but accidentally changed the background" failure mode that plain diffusion edits often drift into.

What license is it under, and can I use the outputs commercially?
Apache 2.0 — both the weights (huggingface.co/stepfun-ai/Step1X-Edit-v1p2) and the GitHub repo (github.com/stepfun-ai/Step1X-Edit). No territorial carve-outs, no MAU cap, no non-commercial rider, no research-only clause. The images you generate are yours to use commercially, with no royalties.

How much VRAM does it need to self-host?
~22 GB resident peak with model CPU offload (transformer + VAE on GPU during the forward pass; MLLM + text encoder paged from CPU), and 41.8 GB on disk. We reserve a 24 GB slot on the H200, and the wrapper aborts at startup if free VRAM dips below 18 GB. To self-host you'd need a 24 GB consumer card (RTX 4090) at minimum, ideally 40 GB+ for headroom.
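
If you're sizing your own card, the startup guard described above is easy to reproduce. A minimal sketch in plain PyTorch (the 18 GB floor comes from the wrapper behavior noted here; nothing else is Step1X-specific):

import torch

# Sketch of a preflight guard like the wrapper's: refuse to load if
# free VRAM on device 0 is below the 18 GB floor mentioned above.
MIN_FREE_BYTES = 18 * 1024**3

free, total = torch.cuda.mem_get_info(0)  # (free, total) in bytes
if free < MIN_FREE_BYTES:
    raise RuntimeError(
        f"Only {free / 1024**3:.1f} GB VRAM free; Step1X-Edit needs >= 18 GB"
    )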

Can I turn off the thinking or reflection passes?
Yes — pass enable_thinking=false to skip the MLLM's prompt-reformatting pass, or enable_reflection=false to skip the multi-candidate selection round. With both off, Step1X behaves like a vanilla DiT image editor at ~12-15 s/edit. We keep both on by default because that is what the model was trained to do, and it materially beats plain diffusion on the benchmark suites.

What's the architecture under the hood?
Step1X-Edit-v1p2 (ReasonEdit-S) couples a DiT-based decoder with a Qwen2.5-VL-family MLLM front-end. The MLLM interprets the edit instruction; the DiT decoder paints the edit. The diffusers pipeline class is Step1XEditPipelineV1P2 (it lives on a Peyton-Chen fork of diffusers, branch step1xedit_v1p2). RegionE optionally accelerates inference by skipping diffusion in regions the MLLM marks as unchanged.
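
A minimal local-inference sketch, assuming the fork branch above is installed. The class name, model repo, offload behavior, and defaults come from this page; the exact call signature (image / prompt / num_inference_steps / guidance_scale) is an assumption patterned on standard diffusers edit pipelines, so check the fork's README before relying on it.

import torch
from PIL import Image
from diffusers import Step1XEditPipelineV1P2  # only on the step1xedit_v1p2 fork branch

# Assumed usage, patterned on standard diffusers edit pipelines.
pipe = Step1XEditPipelineV1P2.from_pretrained(
    "stepfun-ai/Step1X-Edit-v1p2", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # ~22 GB resident peak, per the VRAM answer above

result = pipe(
    image=Image.open("input.jpg"),
    prompt="fix what looks anatomically wrong about her left hand",
    num_inference_steps=50,  # API default
    guidance_scale=6.0,      # API default
)
result.images[0].save("edited.jpg")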

What happens to my uploaded images?
Uploaded images are deleted immediately after the edit completes. Output sits on our CDN for 24 hours (7 days for paid users) so you can re-download it from /account/?tab=history. Your images are never used for training. Privacy policy at /privacy/.

Is there an API?
Yes — POST multipart to /v1/image/edit/ with image, model=step1x-edit, operation=img2img (or inpaint / outpaint / style_transfer, etc.), prompt, and the optional enable_thinking, enable_reflection, steps (default 50), and guidance_scale (default 6.0). Bearer auth, 10K tokens/month free. /api/ has curl examples.
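
A fuller Python sketch of that call with the optional knobs spelled out. Field names and defaults are from the answer above; booleans are sent as form strings, and the response handling is illustrative:

import requests

# Sketch of a full-parameter edit call; fields mirror the list above.
resp = requests.post(
    "https://api.free.ai/v1/image/edit/",
    headers={"Authorization": "Bearer sk-free-..."},
    files={"image": open("input.jpg", "rb")},
    data={
        "model": "step1x-edit",
        "operation": "img2img",        # or inpaint / outpaint / style_transfer
        "prompt": "remove the watermark and then warm the color temperature",
        "enable_thinking": "true",     # MLLM prompt-reformatting pass
        "enable_reflection": "false",  # skip reflection for a faster edit
        "steps": 50,                   # default
        "guidance_scale": 6.0,         # default
    },
)
resp.raise_for_status()
print(resp.json())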
