AI Toolkit

The ultimate training toolkit for finetuning diffusion models by Ostris.


Installation

System requirements: Python 3.10+, an NVIDIA GPU (at least 8GB VRAM recommended), a Python virtual environment, and Git.

Linux Installation Steps

git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python3 -m venv venv
source venv/bin/activate
# install torch first
pip3 install --no-cache-dir torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
pip3 install -r requirements.txt

Windows Installation Steps

git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python -m venv venv
.\venv\Scripts\activate
pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

AI Toolkit Web UI

The Web UI provides an intuitive browser interface for starting, monitoring, and managing training jobs without writing commands by hand.

Requirements

  • Node.js 18+ (latest LTS version recommended)
  • A completed installation of AI Toolkit (see the steps above)

Launch Web Interface

cd ui
npm run build_and_start

Once the server is running, open http://localhost:8675 in your browser to access all features.

Security Settings (Optional)

Set the environment variable AI_TOOLKIT_AUTH to add password protection and prevent unauthorized access.
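
For example, on Linux/macOS you can set a password in the shell before launching the UI (the password value here is just a placeholder):

export AI_TOOLKIT_AUTH=super_secure_password
cd ui
npm run build_and_start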

Gradio Training Interface

AI Toolkit also ships a simpler Gradio interface, well suited for beginners who want to get started quickly with model training, data processing, and LoRA publishing.

Key Features

  • One-click upload and management of training data
  • Automatic annotation and data preprocessing
  • Simplified training parameter settings
  • Direct publishing to Hugging Face

# after installing ai-toolkit
cd ai-toolkit
huggingface-cli login  # log in to Hugging Face
python flux_train_ui.py

FLUX.1 Model Training Guide

Supports training the latest FLUX.1 diffusion models, which deliver industry-leading image generation quality. Hardware requirement: an NVIDIA GPU with at least 24GB of VRAM.

FLUX.1-dev Version

  • Non-commercial license; intended for personal and research use only.
  • Gated on Hugging Face: you must accept the license and provide a valid access token (see the token setup after this list).
  • Delivers the highest-quality generation results, suitable for professional work.
  • Supports more advanced training features and parameter adjustments.
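
To supply the token, you can either log in via the CLI or, following the pattern in the upstream README, put a read token in a .env file at the repository root (the token value below is a placeholder):

# Option 1: interactive login
huggingface-cli login

# Option 2: a .env file in the ai-toolkit root
echo "HF_TOKEN=hf_your_read_token_here" > .env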

FLUX.1-schnell Version

  • Apache 2.0 open source license, can be used for commercial projects.
  • No Hugging Face Token required, ready to use out of the box.
  • Must be paired with a training adapter for fine-tuning, since schnell is a distilled model (see the config sketch after this list).
  • Faster training speed, lower resource requirements.
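
As a sketch of how this pairing looks in a config file (the model and adapter paths follow the upstream examples; verify them against your local config/examples copies):

model:
  name_or_path: "black-forest-labs/FLUX.1-schnell"
  assistant_lora_path: "ostris/FLUX.1-schnell-training-adapter"
  is_flux: true
  quantize: true
sample:
  guidance_scale: 1  # schnell is distilled and does not use classifier-free guidance
  sample_steps: 4    # 1-4 steps is typical for schnell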

Training Process Guide

  1. Start from an example configuration file in config/examples/ and modify the parameters to suit your needs.
  2. Prepare a high-quality training dataset; at least 20 images per concept is recommended.
  3. Execute the training command (a full example run follows this list): python run.py config/your_config_name.yml
  4. View generated samples in real-time during training to evaluate model performance.
  5. Completed model files will be saved in the specified output directory.
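
Concretely, a typical run might look like this (the example config name is taken from the repository's config/examples folder; substitute whichever example matches your model):

cp config/examples/train_lora_flux_24gb.yaml config/my_lora.yml
# edit config/my_lora.yml: dataset folder, trigger_word, steps, learning rate, ...
python run.py config/my_lora.yml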

Advanced tip: learning rate, training steps, and batch size all have a significant impact on results; refer to the official documentation for best practices.

Dataset Preparation Guide

High-quality training data is key to successful model training. AI Toolkit supports various data formats:

  • Supported image formats: JPG, JPEG, PNG (lossless PNG recommended)
  • Each image needs a caption: a .txt file with the same base name (e.g. img_001.jpg paired with img_001.txt)
  • Captions should describe the image in detail; the more precise, the better
  • Captions may include a [trigger] placeholder, which is automatically replaced with the trigger word set in your config
  • Built-in smart cropping and scaling, so no manual image preprocessing is needed
  • Supports data augmentation and random transformations to increase training sample diversity
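
For example, a minimal dataset folder pairs each image with a same-named caption file (all names below are illustrative):

datasets/my_concept/
  img_001.jpg
  img_001.txt   # "a photo of [trigger] wearing a red jacket"
  img_002.png
  img_002.txt   # "[trigger] standing in a forest, soft light"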

Dataset Best Practices

  • Recommend using 20-50 high-quality images for each concept
  • Maintain image style consistency for better training results
  • Using consistent trigger words can improve the model's ability to recognize specific concepts
  • Avoid mixing too many different styles or themes in the training set

Cloud Training Platform Support

AI Toolkit provides multiple cloud training options, suitable for users without high-end GPUs or projects requiring large-scale training.

RunPod Cloud Training

Provides complete RunPod templates and deployment scripts, supporting one-click deployment:

  • Pre-configured Docker containers, no complex setup required
  • Supports high-performance GPUs like A100, H100
  • Automatic data synchronization and model saving
  • Hourly billing, so you only pay for the compute you use

Modal Cloud Service

Provides serverless training solutions for the Modal platform:

  • Zero infrastructure management, fully automatic scaling
  • Pay-as-you-go, only billed during training
  • Supports team collaboration and version control
  • Built-in data caching and model storage

Cloud training is ideal for large models and long-duration training. See official documentation for detailed configuration.

Advanced Training Techniques and Optimization

LoRA Layer-Precise Training

AI Toolkit provides fine-grained layer control for optimized training of specific network layers:

network:
  type: "lora"
  # ... other params
  network_kwargs:
    only_if_contains: ["layer_name_suffix"]
    # or
    ignore_if_contains: ["layer_name_suffix"]

By precisely controlling training layers, you can significantly improve generation quality for specific content types, such as facial details, textures, or specific styles.

LoKr Advanced Training Method

Supports LoKr (Low-Rank Kronecker product) training method, providing more efficient parameter utilization:

network:
  type: "lokr"
  lokr_full_rank: true
  lokr_factor: 8
  # ... other params

LoKr technology can achieve better training results with fewer parameters, especially suitable for complex styles and detail-rich concepts.

Mixed Precision Training

Supports mixed precision training, significantly reducing VRAM requirements while maintaining model quality (see the config sketch after the list below):

  • Automatically selects the best precision configuration (FP16/BF16)
  • Supports gradient accumulation, achieving larger effective batch sizes
  • Optimized memory management, reducing OOM errors during training
  • Supports gradient checkpointing technique, further reducing memory requirements
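
For reference, the relevant keys in a training config look roughly like this (a sketch based on the repository's example configs; exact key names and defaults may differ by version):

train:
  batch_size: 1
  gradient_accumulation_steps: 4  # effective batch size of 4
  gradient_checkpointing: true    # trades extra compute for lower VRAM
  dtype: bf16                     # or fp16 on GPUs without bfloat16 support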

For more advanced training techniques and optimization methods, please refer to the Advanced Training Documentation.

Need more detailed usage guides, or running into issues? Visit the official AI Toolkit GitHub repository or join the Discord community for help.