Model details
About
Lance 3B (unified) is an AI model built by ByteDance. It is strongest at cross-task research, prototyping pipelines that need image, video, edit, and VQA from one model, and "one model, four tasks" demos. It is self-hosted on Free.ai GPUs and runs free against your daily token pool (100 tokens per use). Released under Apache 2.0; commercial use is permitted on Free.ai.
Use via API
curl https://api.free.ai/v1/chat/ \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"lance-3b"}'
Compare
Frequently asked questions
Lance is ByteDance's 2025 native unified multimodal model — 3B active parameters under Apache 2.0. One set of weights covers four tasks: text→image (768×768), image-edit (768×768), text→video (480p, up to 121 frames ≈ 5 seconds), and image+video understanding (VQA, captioning). Built on a Qwen2-derived LLM backbone with a Wan-Video VAE and a Qwen2.5-VL ViT. Self-hosted on Free.ai's H200 with no upstream provider, no API markup, and no per-call fees beyond your token balance.
Most open stacks pick the best specialist for each surface: SDXL or FLUX for raw image generation, Qwen-Image-Edit for edits, Wan 2.2 for video, Qwen2.5-VL for vision-language reasoning. Lance trades a bit of per-task quality for cross-task coherence. The same internal representation feeds every output, so an image you generate and then edit retains its style, and the answers the model gives about a video come from the same language model, in the same checkpoint, that drives generation. This is useful for research and demos that benefit from one consistent model rather than a pipeline of four.
Pick Lance when: you want consistent style across image + edit + video from one model, you're prototyping a multi-task pipeline and the "one model" angle matters, or you need permissive licensing on the unified workflow. Pick specialists when: you want highest-quality raw image gen (FLUX.2 Klein > Lance at >768²), longest / highest-quality video (Wan 2.2 TI2V-5B or HunyuanVideo > Lance at >480p), or fastest VQA in chat (Qwen2.5-VL is always warm on the H200, Lance has to cold-load).
Text→image and image-edit: 5,000 tokens (matches FLUX-class image gen). Text→video: 15,000 tokens (matches CogVideoX / Wan 5B class). Image+video VQA: 1,000 tokens. The higher cost vs SDXL (1,000) reflects Lance's heavier cold-load — every call evicts the rest of the warm fleet and re-loads 40 GB of weights, which adds 25-40 s on top of the inference itself. We're billing for total wall-clock GPU time, not just inference.
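The per-task prices above can be turned into a quick budget check. This is only a minimal sketch: `lance_token_cost` and `lance_calls_affordable` are hypothetical helper names, not part of any Free.ai tooling, and the token figures are simply the ones quoted above.

```shell
# Hypothetical helper: per-call token cost for each Lance task,
# using the figures quoted in the text above.
lance_token_cost() {
  case "$1" in
    t2i|image_edit) echo 5000 ;;
    t2v)            echo 15000 ;;
    vqa)            echo 1000 ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# How many whole calls of a task fit into a given token balance.
# Usage: lance_calls_affordable <task> <token_balance>
lance_calls_affordable() {
  cost=$(lance_token_cost "$1") || return 1
  echo $(( $2 / $cost ))
}
```

For example, `lance_calls_affordable t2v 50000` prints `3`, since three video calls cost 45,000 tokens and a fourth would not fit.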
After cold-load (~25-40 s): image gen ~12-20 s, image edit ~15-25 s, text→video ~60-180 s (depending on num_frames), VQA ~3-8 s. Every Lance call cold-loads the model because it can't be co-resident with the rest of the warm fleet on the H200, so the cold-load delay is part of every call, not just the first.
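Because the cold-load is paid on every call, the end-to-end wait is the sum of the two ranges. The sketch below adds them per task; `lance_latency_range` is a hypothetical helper name, and the numbers are the approximate ranges quoted above, not measured guarantees.

```shell
# Hypothetical helper: rough end-to-end latency range (seconds) per task,
# adding the 25-40 s cold-load to the inference ranges quoted above.
# Prints "<min> <max>".
lance_latency_range() {
  cold_min=25; cold_max=40
  case "$1" in
    t2i)        inf_min=12; inf_max=20 ;;
    image_edit) inf_min=15; inf_max=25 ;;
    t2v)        inf_min=60; inf_max=180 ;;
    vqa)        inf_min=3;  inf_max=8 ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
  echo "$(( $cold_min + $inf_min )) $(( $cold_max + $inf_max ))"
}
```

`lance_latency_range t2v` prints `85 220`, so a client-side timeout for video requests (for example curl's `--max-time`) would need to sit comfortably above 220 seconds.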
Image generation and image edit are fixed at 768×768. Video generation is fixed at 480p (typically 480×848 landscape) and capped at 121 frames (~5 seconds at 24 fps). These are the resolutions Lance was trained on; pushing higher requires upscaling via a separate model (try /image/upscaler/ for images or /video/upscaler/ for videos).
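The frame cap and the "about 5 seconds" figure are related by simple arithmetic, sketched below. The helper names are hypothetical, and the sketch only illustrates the math; whether the API accepts an arbitrary frame count below the cap is not stated here.

```shell
LANCE_FPS=24          # frame rate quoted above
LANCE_MAX_FRAMES=121  # frame cap quoted above

# Clip duration in milliseconds for a given frame count (integer math).
lance_duration_ms() {
  echo $(( $1 * 1000 / $LANCE_FPS ))
}

# Frame count for a requested whole number of seconds, clamped to the cap.
lance_frames_for_seconds() {
  frames=$(( $1 * $LANCE_FPS ))
  if [ "$frames" -gt "$LANCE_MAX_FRAMES" ]; then
    frames=$LANCE_MAX_FRAMES
  fi
  echo "$frames"
}
```

`lance_duration_ms 121` prints `5041`, which is the roughly 5 seconds mentioned above, and `lance_frames_for_seconds 10` is clamped to `121`.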
Janus (DeepSeek) and Show-o split understanding and generation into separate heads on a shared backbone; Lance is more tightly unified — one set of generation+understanding heads with explicit task tokens. Emu3 (BAAI) tokenizes everything as discrete tokens including pixels, which gives it cleaner autoregressive generation but lower quality at fixed compute. Lance's pitch is the four-task coverage in 3B active params plus its Wan-derived VAE which handles video natively (Janus and Show-o are image-only).
Apache 2.0 — both the weights (huggingface.co/bytedance-research/Lance) and the GitHub repo (github.com/bytedance/Lance). No territorial restrictions, no MAU cap, no non-commercial rider, no research-only clause. Outputs are yours to use commercially with no royalties or attribution requirements beyond the standard Apache 2.0 license text.
40 GB minimum per ByteDance's README. The 3B active-parameter figure is misleading: the full Qwen2 LLM, Wan VAE, and Qwen2.5-VL ViT all sit in memory together. To self-host you'd need a single A100 80 GB, A6000 48 GB, or an H100/H200 with at least 40 GB free. We run it on our H200 (141 GB total), but each call still evicts the rest of the loaded models because Lance is the heaviest single model on the box.
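If you are sizing your own hardware, a pre-flight check against the 40 GB figure might look like the sketch below. The helper names are hypothetical, and the threshold assumes "40 GB" means 40 GiB (40960 MiB), which is the more conservative reading; `nvidia-smi` reports memory in MiB.

```shell
LANCE_MIN_MIB=40960   # 40 GiB; assumes the README's "40 GB" means GiB

# Succeeds if the given free-memory figure (MiB) meets the minimum.
lance_fits() {
  [ "$1" -ge "$LANCE_MIN_MIB" ]
}

# Free memory (MiB) on the first GPU, as reported by nvidia-smi.
gpu_free_mib() {
  nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n 1
}
```

On a machine with an NVIDIA driver installed, `lance_fits "$(gpu_free_mib)" && echo "enough free VRAM"` combines the two.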
Yes — POST JSON or multipart to /v1/multimodal/lance/ on api.free.ai with {task: "t2i" | "image_edit" | "t2v" | "vqa", prompt: "...", image: <upload> or image_url: "/static/outputs/..."}. Bearer auth via developer API keys. Response includes job_id, output URL, and share_token. /api/ has curl examples per task.
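A prompt-only request to that endpoint could be sketched as follows. The `task` and `prompt` keys and the URL come from the description above; the function names, the `FREE_AI_KEY` environment variable, and the `Content-Type` header are assumptions made for illustration, and the per-task examples under /api/ remain the authoritative reference.

```shell
LANCE_URL="https://api.free.ai/v1/multimodal/lance/"

# Build a JSON body for a prompt-only task (t2i or t2v).
# Escapes backslashes and double quotes; assumes a single-line prompt.
lance_json_body() {
  prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"task":"%s","prompt":"%s"}' "$1" "$prompt"
}

# Send the request. FREE_AI_KEY is an assumed environment variable
# holding a developer API key; the body is piped to curl on stdin.
lance_request() {
  lance_json_body "$1" "$2" | curl -sS "$LANCE_URL" \
    -H "Authorization: Bearer $FREE_AI_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

Calling `lance_request t2i "a red fox in snow"` would POST the body `{"task":"t2i","prompt":"a red fox in snow"}`. Image-edit and VQA calls also need an `image` upload or `image_url`, which this sketch does not cover.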
We mark Lance experimental because cold-load latency means it's not a great fit for high-volume traffic — every call evicts the warm fleet and reloads. We may add a "warm Lance" tier later if usage justifies dedicating a slot, or we may add a second H200 specifically for unified models. For now it's available on the same token economy as the rest of Free.ai's self-hosted models with no surcharge, just the higher per-call token cost reflecting the wall-clock GPU time.
Uploaded images for image-edit and VQA are deleted immediately after the task completes. Generated outputs sit on our CDN for 24 hours (7 days for paid users) so you can re-download from /account/?tab=history. Nothing is shared with ByteDance — the weights run locally on our hardware. Full details at /privacy/.