How to Remove Background

From simple green screen algorithms to U²-Net neural networks.
A deep dive into pixels and how AI understands images.

When you click "Remove Background", GPUs perform billions of operations. We'll break down the tech stack, from pixel arithmetic to deep learning, the way an algorithm engineer would.

Phase 1: Chroma Key

Traditional "green screen" logic: a simple mathematical test based on color difference.

Core Algorithm

IF (Green > Red + Tol AND Green > Blue + Tol) THEN Alpha = 0 ELSE Alpha = 1
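The rule above can be sketched in Python with NumPy. The function name and the default tolerance value are illustrative assumptions, not a specific tool's API:

```python
import numpy as np

def chroma_key_alpha(rgb: np.ndarray, tol: int = 30) -> np.ndarray:
    """Binary alpha mask from the green-dominance rule above.

    rgb: H x W x 3 uint8 image. Returns an H x W float alpha (0 or 1).
    `tol` is the tolerance from the pseudocode (value is an assumption).
    """
    # Widen to int16 so the additions cannot overflow uint8
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    is_green = (g > r + tol) & (g > b + tol)
    return np.where(is_green, 0.0, 1.0)

# A 1x2 test image: a pure green pixel, then a reddish pixel
img = np.array([[[0, 255, 0], [200, 30, 30]]], dtype=np.uint8)
print(chroma_key_alpha(img))  # [[0. 1.]]
```

Note that this is a hard cut: every pixel is either fully kept or fully removed, which is exactly the limitation the later phases address.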


Phase 2: Deep Learning (U²-Net)

How modern AI understands complex semantics and details via "Nested U-Structure".

[Figure: U²-Net architecture diagram. The input image flows through an encoder and a decoder, each built from RSU blocks ("Nested U" residual U-blocks); upsampled features from deeper stages are fused with skip connections, and the final output is the alpha mask.]

Semantic Segmentation

Unlike green screen, AI classifies every pixel: "This is a face" vs "This is a leaf". It distinguishes them by shape, texture, and context even if colors are similar.
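Per-pixel classification can be illustrated in miniature: the network emits a score per class for every pixel, and the highest score wins. The class names and scores below are made up for the example:

```python
import numpy as np

# Hypothetical classes and per-pixel class scores (logits) for a
# 1 x 2 pixel image. A real segmentation net produces these from
# shape, texture, and context, not from color alone.
classes = ["background", "person", "leaf"]
logits = np.array([[[0.1, 2.0, 0.3],    # pixel 0: "person" wins
                    [1.5, 0.2, 0.1]]])  # pixel 1: "background" wins

labels = logits.argmax(axis=-1)         # pick the best class per pixel
print([classes[i] for i in labels[0]])  # ['person', 'background']
```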

Training Data

Trained on datasets like COCO, DUTS, ADE20K with tens of thousands of labeled images. It has seen portraits in thousands of lighting conditions.

Why U²-Net?

Standard networks lose detail as they get deeper. U²-Net uses a Nested U-Structure to capture both global semantics and local details (like hair strands) efficiently.
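The "U inside a U" idea can be sketched with plain NumPy arithmetic in place of real convolutions. The recursion stands in for the nesting; every function here is a toy illustration, not the actual RSU block:

```python
import numpy as np

def downsample(x):
    """2x average pooling: each output value summarizes a 2x2 patch."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling back to the finer grid."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_block(x, depth):
    """Toy U: keep a skip at this resolution, recurse on a coarser
    copy for global context, then fuse coarse and fine on the way up.
    Nesting such blocks inside each encoder stage gives the 'U²'."""
    if depth == 0 or min(x.shape) < 2:
        return x
    skip = x                                    # local detail preserved
    coarse = u_block(downsample(x), depth - 1)  # global semantics
    return (upsample(coarse) + skip) / 2        # fusion

x = np.arange(16, dtype=float).reshape(4, 4)
y = u_block(x, depth=2)
print(y.shape)  # (4, 4): output resolution matches the input
```

The skip connection is why detail survives: the downsampled path alone would blur away hair-strand-scale structure, but fusing it with the full-resolution copy restores it.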

Phase 3: Alpha Matting & Unknowns

For semi-transparent edges (like hair), the AI must solve the compositing equation for a fractional opacity at every pixel.

Segmentation masks are usually binary (0 or 1). That is fine for solid objects, but for hair, smoke, and glass we need an Alpha Channel (a 0.0 - 1.0 grayscale).
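The equation in question is the compositing equation, I = alpha * F + (1 - alpha) * B: each observed pixel I is a blend of a foreground color F and a background color B. When F and B are known, alpha can be recovered per pixel by projection; the colors below are made-up values for illustration (real matting must also estimate F and B, which is what makes the problem hard):

```python
import numpy as np

F = np.array([220.0, 180.0, 160.0])  # hypothetical hair color
B = np.array([40.0, 120.0, 40.0])    # hypothetical background color
I = 0.3 * F + 0.7 * B                # a pixel that is 30% foreground

# Least-squares solution of I = alpha*F + (1-alpha)*B for alpha:
# project (I - B) onto (F - B).
diff = F - B
alpha = np.dot(I - B, diff) / np.dot(diff, diff)
print(round(alpha, 3))  # 0.3
```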

Core Concept: Trimap

Foreground: Keep (Alpha=1)
Background: Remove (Alpha=0)
Unknown: Solve Alpha

The model applies expensive matting algorithms only to "Unknown" pixels, inferring foreground opacity from correlations with the surrounding pixels.
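A trimap is typically derived from the binary mask with morphological erosion and dilation: erode for sure-foreground, dilate for sure-background, and the ring in between is the Unknown band. A minimal NumPy sketch, where the 3x3 cross element and the band width are assumptions:

```python
import numpy as np

def dilate(mask, it=1):
    """Toy binary dilation with a 3x3 cross, via shifted copies."""
    for _ in range(it):
        p = np.pad(mask, 1)  # pad with False so edges behave
        mask = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])
    return mask

def make_trimap(binary_mask, band=1):
    """Trimap: 255 = sure foreground, 0 = sure background,
    128 = the Unknown band handed to the matting solver."""
    fg = ~dilate(~binary_mask, band)  # erosion = dilating the inverse
    bg = ~dilate(binary_mask, band)
    trimap = np.full(binary_mask.shape, 128, dtype=np.uint8)
    trimap[fg] = 255   # Alpha = 1
    trimap[bg] = 0     # Alpha = 0
    return trimap

mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True                 # a 5x5 solid object
t = make_trimap(mask, band=1)
print((t == 128).sum())  # pixels sent to the expensive matting step
```

Because only the Unknown band reaches the matting solver, the cost of the expensive step scales with the edge length rather than the image area.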


Core AI Models

U²-Net

State-of-the-art saliency detection; the core model variant behind remove.bg.

High Precision · Portrait

MODNet

Designed for real-time portrait matting, trimap-free.

Real-time · Video Conferencing

DeepLabV3+

Google's general-purpose semantic segmentation model.

General · Stable

Industry Workflow

Step 1: Coarse Segmentation

Generate a low-resolution mask via a CNN to determine the rough outline of the subject.

Step 2: Edge Refinement

Identify "Unknown Regions" (e.g., hair edges) and expand them into a Trimap.

Step 3: Alpha Matting

Solve for per-pixel transparency, then apply color de-spill during composition to remove residual background tint from the edges.
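The three steps can be wired together as a sketch. Every function here is a deliberately simplified stand-in (thresholding instead of a CNN, a fixed 0.5 instead of a matting solver), just to show the data flow:

```python
import numpy as np

def coarse_segmentation(image):
    """Step 1 stand-in: threshold brightness as a fake coarse mask."""
    return image.mean(axis=-1) > 128

def edge_refinement(mask):
    """Step 2: mark pixels whose upper neighbour differs as Unknown (128)."""
    trimap = np.where(mask, 255, 0).astype(np.uint8)
    shifted_down = np.pad(mask, 1)[:-2, 1:-1]  # row above each pixel
    trimap[mask != shifted_down] = 128
    return trimap

def alpha_matting(trimap):
    """Step 3 stand-in: resolve every Unknown pixel to 0.5 opacity."""
    alpha = trimap.astype(float) / 255.0
    alpha[trimap == 128] = 0.5
    return alpha

img = np.zeros((4, 4, 3))
img[1:3] = 200.0  # a bright horizontal band as the "subject"
alpha = alpha_matting(edge_refinement(coarse_segmentation(img)))
print(alpha)  # rows: 0.0, 0.5 (edge), 1.0 (solid), 0.5 (edge)
```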