ControlNet — 12 conditioning types in one tool

Upload a reference image, pick a conditioning type, write a prompt. The AI keeps your reference's structure (lines, pose, depth, etc.) and renders new content in any style. Backed by ControlNet-Union SDXL ProMax — Apache 2.0, fully commercial-use friendly.

Conditioning type

Canny / lineart for clean linework. Pose for body position. Depth for 3D layout. Scribble / soft-edge for rough doodles. MLSD for architecture. Normal / segmentation / tile for advanced workflows.

Reference image

The conditioning signal is extracted from this image: colors are discarded and only the structural signal (per your chosen type) is preserved.

Control strength

Slider from Looser to Stricter, shown at 0.7.

Estimated cost

~1,200 tokens (SDXL × 1.2 ControlNet)

How ControlNet works

ControlNet lets you steer image generation with the structure of a reference image instead of relying on the text prompt alone. A preprocessor reads your reference and extracts a single conditioning signal, such as its edges, its depth map, or the pose skeleton of a person. The diffusion model is then guided by that signal while the prompt decides the style, colors, lighting, and subject. The result keeps the composition you fed in but looks like something completely new.
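To make the "preprocessor extracts a conditioning signal" step concrete, here is a minimal sketch of edge extraction in plain Python. It thresholds a simple gradient magnitude, which is a deliberate simplification: a real Canny detector also smooths the image, thins edges, and applies hysteresis. The function name and the threshold value are illustrative assumptions, not the tool's actual preprocessor.

```python
def edge_map(gray, threshold=0.25):
    """Return a binary edge map from a grayscale image (rows of floats in [0, 1])."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences; at the border the pixel is compared with itself.
            right = gray[y][x + 1] if x + 1 < width else gray[y][x]
            down = gray[y + 1][x] if y + 1 < height else gray[y][x]
            gx = right - gray[y][x]
            gy = down - gray[y][x]
            magnitude = (gx * gx + gy * gy) ** 0.5
            edges[y][x] = 1 if magnitude > threshold else 0
    return edges


# A 4x4 test image: dark left half, bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
for row in edge_map(image):
    print(row)  # each row prints [0, 1, 0, 0]
```

Note that the color and brightness of each region are gone from the output; only the position of the boundary survives, which is the kind of structural signal the generation step then follows.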

This tool is backed by ControlNet-Union SDXL ProMax (Apache 2.0) — a single model that understands all 12 conditioning types below, so you switch between them from one picker without loading a different network each time. It is fully commercial-use friendly: keep, sell, or modify whatever you generate.

The 12 conditioning types

Canny

Crisp edge detection. Best for preserving sharp outlines and clean linework.

Depth

3D depth map. Keeps the spatial layout — what is near and what is far.

Pose

OpenPose body skeleton. Locks the figure's stance and limb positions.

Scribble

Loose hand-drawn doodles turned into finished art.

Segmentation

Color-coded region map. Assigns each area of the scene to a class.

Normal

Surface-normal map. Preserves fine 3D surface orientation and bumps.

Lineart

Fine line extraction — ideal for inking, manga, and illustration.

Soft-edge

Gentle boundary detection that follows shapes more loosely than Canny.

MLSD

Straight-line segments. Made for architecture, interiors, and product shots.

Tile

Detail-preserving conditioning for upscaling and seamless texture work.

Inpaint

Mask-aware conditioning to regenerate only part of an image.

Repaint / outpaint

Extend a canvas or repaint regions while honoring the surrounding structure.

Three steps

1. Upload a reference image: a photo, a sketch, a screenshot, anything with the structure you want to keep.
2. Pick the conditioning type that matches what you care about (pose for a figure, depth for a scene, canny or lineart for clean outlines).
3. Write a prompt describing the look you want and generate. Raise control strength to follow the reference more tightly; lower it for more creative freedom.
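One common way a control strength setting works in open-source ControlNet pipelines is to multiply the control branch's output by the strength before adding it to the base model's features. Whether this tool maps its slider exactly this way is an assumption, and the names below are illustrative. The toy sketch uses plain lists in place of real feature tensors.

```python
def blend_features(base_features, control_residual, control_strength):
    """Add the control signal to the base features, scaled by the strength."""
    return [
        base + control_strength * residual
        for base, residual in zip(base_features, control_residual)
    ]


base = [0.2, -0.1, 0.5]       # stand-in for features driven by the prompt
residual = [1.0, 0.0, -1.0]   # stand-in for the ControlNet output

for strength in (0.0, 0.7, 1.0):
    print(strength, blend_features(base, residual, strength))
```

At strength 0.0 the reference has no effect and the base features pass through unchanged; as the strength rises, the reference's contribution grows in proportion.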

ControlNet — 12 conditioning types in one tool — FAQ

What is the Free.ai ControlNet tool?

ControlNet lets you steer AI image generation with a structural reference image instead of relying on the text prompt alone. Upload a reference, pick a conditioning type (pose, depth, edges, scribble, segmentation, and more), write a prompt, and SDXL generates a new image that follows your reference's structure while taking its content, style, and colors from your words. It is backed by ControlNet-Union SDXL ProMax, one model that handles 12 conditioning types from a single picker.

Which conditioning types are available?

Twelve: Canny (hard edges), Depth (3D distance), OpenPose (human skeleton/pose), Scribble (rough sketch), Segmentation (region map), Normal map (surface direction), Lineart, Lineart-anime, Soft Edge / HED (soft outlines), MLSD (straight lines for architecture/interiors), Tile (detail-preserving upscale guidance), and Inpaint. Pick the one that matches what you want to keep constant from your reference.

How does ControlNet actually guide the image?

It extracts a control signal from your reference (an edge map, a pose skeleton, a depth map, etc.) and feeds that into SDXL alongside your text prompt at every denoising step. The result respects the structure of the control signal (same composition, same pose, same perspective) while the prompt decides the subject, style, materials, and lighting. You get the layout you want plus the creative freedom of a prompt.

When should I use Pose vs Depth vs Canny?

Use OpenPose when the body position or gesture is what matters and everything else can change (great for putting a character in a specific stance). Use Depth when you want to preserve 3D layout and perspective (rooms, scenes, foreground/background separation) while restyling everything. Use Canny when you need to keep precise outlines and hard edges (product shots, logos, architecture) almost exactly. Rule of thumb: Pose = body, Depth = space, Canny = edges.
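The rule of thumb can be written down as a small lookup, which is handy when a batch script has to choose a conditioning type per image. This is only a sketch: the keys, the lowercase type names, and the function name are hypothetical and are not identifiers the tool is known to accept.

```python
# What you want to keep from the reference -> conditioning type to pick.
RULE_OF_THUMB = {
    "body": "pose",    # stance and limb positions
    "space": "depth",  # 3D layout and perspective
    "edges": "canny",  # precise outlines and hard edges
}


def pick_conditioning_type(what_to_keep):
    """Return the suggested conditioning type for the thing you want preserved."""
    try:
        return RULE_OF_THUMB[what_to_keep]
    except KeyError:
        options = ", ".join(sorted(RULE_OF_THUMB))
        raise ValueError(f"unknown goal {what_to_keep!r}; expected one of: {options}")


print(pick_conditioning_type("space"))  # prints depth
```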

What about Scribble, Segmentation, Lineart, and the others?

Scribble turns a rough hand-drawing into a finished image, with loose control and maximum creativity. Lineart / Lineart-anime keep clean line drawings (great for coloring illustrations or anime). Segmentation paints by region ("sky here, building there, grass below"). Normal maps preserve fine surface relief. MLSD locks straight architectural lines. Soft Edge is a gentler Canny for organic subjects. Tile guides detail-faithful upscales. Pick looser types (Scribble, Soft Edge) for freedom, tighter ones (Canny, Lineart, MLSD) for fidelity.

What kind of reference image should I upload?

Any clear photo, drawing, or render: JPG, PNG, or WebP up to 10 MB. For Pose, use a photo with the full body and limbs visible. For Depth/Segmentation, a clean scene works best. For Canny/Lineart, high-contrast images with crisp edges give the strongest guidance. You do not need to pre-process it; the tool extracts the control map for you from the raw image. You can also upload an already-made control map (e.g. a depth map from /image/depth-map/), which is used directly.
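If you upload from a script, a quick local check against these limits can save a failed request. The sketch below assumes that "10 MB" means 10 × 1024 × 1024 bytes and that the format is judged by file extension; both are assumptions, and the function name is hypothetical.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit counted in binary megabytes


def check_reference(filename, size_bytes):
    """Return (ok, message) for a candidate reference image."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {extension or 'no extension'}"
    if size_bytes > MAX_BYTES:
        return False, "file is larger than 10 MB"
    return True, "ok"


print(check_reference("pose_photo.PNG", 2_000_000))
print(check_reference("scan.tiff", 500_000))
```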

Does it run on SDXL, and is it self-hosted?

Yes. Generation runs on Stable Diffusion XL with ControlNet-Union SDXL ProMax, fully self-hosted on our own GPUs. Nothing is sent to a third-party image API. Because it is self-hosted and open-source, it draws from your free daily token pool rather than a per-image charge.

Is it really free? What does it cost in tokens?

Yes. ControlNet generation costs ~1,000 tokens per image, the same as standard SDXL generation. Anonymous users get 2,500 tokens/day; a free account gives 10,000/day. No credit card, no watermark. Higher resolutions or multiple variations cost proportionally more from the pool.
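As a worked example of the figures in this answer, the sketch below counts how many standard images fit in each daily pool. It treats the approximate ~1,000-token cost as exact, which is an assumption; real costs vary with resolution and the number of variations.

```python
TOKENS_PER_IMAGE = 1_000  # the approximate per-image figure quoted in this answer
DAILY_POOL = {"anonymous": 2_500, "free account": 10_000}


def images_per_day(pool_tokens, cost_per_image=TOKENS_PER_IMAGE):
    """Whole images that fit in a daily pool; leftover tokens are not enough for another."""
    return pool_tokens // cost_per_image


for tier, pool in DAILY_POOL.items():
    print(tier, images_per_day(pool))
# anonymous 2
# free account 10
```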

How do I combine the reference with my prompt for best results?

Let the control image own the structure and let the prompt own everything else. Describe subject, style, materials, lighting, and mood in the prompt. For example, with a Depth map of a living room, prompt "cozy Scandinavian living room, warm morning light, oak floors, photorealistic". Add a negative prompt to exclude unwanted elements. If the output follows the reference too rigidly, lower the control strength; if it ignores the reference, raise it.

Can I use the generated images commercially?

Yes. SDXL and ControlNet-Union are openly licensed under terms that permit commercial use, so images you generate here are yours to use for personal and commercial work: ads, products, client projects, merch. Just make sure any reference image you upload is one you have the rights to use.

How does this differ from regular image generation or image editing?

Plain text-to-image at /image/ has no structural anchor: you describe and hope the layout lands. Image editing at /image/edit/ changes an existing image. ControlNet sits in between: it generates a fresh image but makes it follow the composition, pose, or perspective of your reference. Use it when you know the layout you want (a pose, a room, an outline) but want total freedom over the look.

Is there an API?

Yes. POST multipart to /v1/image/generate/ on api.free.ai with the control image, the conditioning type, and your prompt, using Bearer auth with a developer API key. The response is OpenAI-compatible (image_url + share_token). Good for batch pipelines that need consistent composition across many generations. Python / Node / cURL snippets at /api/.
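A minimal sketch of such a request, built with only the Python standard library. The endpoint path, host, multipart encoding, and Bearer auth come from the answer above; the `https` scheme and the form field names (`control_image`, `conditioning_type`, `prompt`) are assumptions made for illustration, so check the snippets at /api/ for the real names before relying on this.

```python
import urllib.request
import uuid

API_URL = "https://api.free.ai/v1/image/generate/"  # https scheme is assumed


def build_request(api_key, image_bytes, conditioning_type, prompt,
                  filename="reference.png"):
    """Build (but do not send) a multipart POST carrying the image and text fields."""
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields. These field names are hypothetical.
    for name, value in (("conditioning_type", conditioning_type), ("prompt", prompt)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # File field. The name "control_image" is also hypothetical.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="control_image"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + image_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_request("YOUR_API_KEY", b"<image bytes>", "depth",
                        "cozy Scandinavian living room, warm morning light")

# Sending is left commented out so the sketch runs without a key or network access:
# with urllib.request.urlopen(request) as response:
#     print(response.read())  # per the answer above, expect image_url and share_token
```

Building the request separately from sending it keeps the example inspectable offline, and it makes it straightforward to loop over many reference images in a batch pipeline.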
