What is LoRA?
LoRA (Low-Rank Adaptation) addresses a core problem with AI model customization: full fine-tuning requires tens of GB of GPU memory and hours of training, and produces a full-sized copy of the model for every variant. LoRA, introduced by Microsoft researchers in 2021, instead trains only a low-rank decomposition of the weight updates, typically 0.1–1% of the original parameter count.
This makes LoRAs practical to train on a single consumer GPU in under an hour and cheap to share (hundreds of MB instead of many GB). They've become the de facto customization format for Stable Diffusion, with thousands of community-trained LoRAs for specific art styles, characters, and photographic looks.
How it works
Low-rank decomposition
Instead of updating the full d×k weight matrix W (millions of parameters), LoRA trains two much smaller matrices, A (r×k) and B (d×r), where the rank r is small, so that the update ΔW = B·A has the same shape as W but far fewer trainable parameters.
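As a rough sketch of the parameter savings, here is the decomposition in NumPy; the layer size (4096×4096, typical of a large attention projection) and the rank r = 8 are illustrative assumptions, not values from the text:

```python
import numpy as np

d, k, r = 4096, 4096, 8           # hypothetical layer shape and LoRA rank

A = np.random.randn(r, k) * 0.01  # trained "down" projection, r x k
B = np.zeros((d, r))              # trained "up" projection, d x r

delta_W = B @ A                   # same shape as the full weight matrix W
assert delta_W.shape == (d, k)

full_params = d * k               # parameters in W itself
lora_params = r * (d + k)         # trainable parameters in A and B
print(f"trainable fraction: {lora_params / full_params:.2%}")  # 0.39%
```

At rank 8 the trainable fraction lands at about 0.4%, squarely inside the 0.1–1% range quoted above; higher ranks trade more parameters for more expressive updates.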
Mergeable at inference
At inference time, the LoRA's weights can be merged into the base model (W' = W + s·B·A, where s is the LoRA strength) with no runtime overhead.
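A small NumPy sketch of why merging is free at inference: folding s·B·A into W once up front gives the same output as running the extra low-rank path on every forward pass. All shapes and the strength value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, s = 64, 64, 4, 0.8        # s is the LoRA strength

W = rng.standard_normal((d, k))    # base model weights
A = rng.standard_normal((r, k))    # LoRA down projection
B = rng.standard_normal((d, r))    # LoRA up projection
x = rng.standard_normal(k)         # an input activation

# Unmerged: base path plus a separate low-rank path, every forward pass.
y_unmerged = W @ x + s * (B @ (A @ x))

# Merged once: a single matmul per forward pass, no runtime overhead.
W_merged = W + s * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_unmerged, y_merged)
```

The merge is also reversible: subtracting s·B·A recovers the original W, which is why tools can load and unload LoRAs from a base model without keeping a second copy of the weights.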
Stackable
Multiple LoRAs can be composed at different strengths — e.g., 0.8 strength for an art-style LoRA and 0.5 for a character LoRA, producing that character in that style.
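Composition is just a sum of independent low-rank updates, each scaled by its own strength. A minimal sketch, with the variable names (`style_*`, `char_*`) and strengths taken from the example above as assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 64, 64, 4

W = rng.standard_normal((d, k))    # base model weights

# Two independently trained LoRAs (random stand-ins here).
style_B, style_A = rng.standard_normal((d, r)), rng.standard_normal((r, k))
char_B, char_A = rng.standard_normal((d, r)), rng.standard_normal((r, k))

# Each LoRA contributes its own update at its own strength.
W_stacked = W + 0.8 * (style_B @ style_A) + 0.5 * (char_B @ char_A)
assert W_stacked.shape == W.shape
```

Because the updates simply add, strengths act as mixing knobs; in practice the LoRAs were trained separately, so their updates can interfere, and strengths usually need tuning by eye.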
Common use cases
- Training a model on a specific person's likeness from 5–20 photos
- Teaching a model a specific art style (e.g., 1980s anime, watercolor)
- Creating brand-consistent generations across a whole marketing campaign
- Fine-tuning LLMs on domain-specific data without catastrophic forgetting