LoFA: Learning to Predict Personalized Prior for Fast Adaptation of Visual Generative Models

1 SSE, CUHKSZ   2 FNii-Shenzhen
3 Guangdong Provincial Key Laboratory of Future Networks of Intelligence, CUHKSZ
4 SDS, CUHKSZ   5 Cardiff University

*Equal Contribution

Corresponding Author

Personalized Prior Formulation

Common Prior

Although general generative models learn a common prior from massive data, this prior often misaligns with user-specific needs and distributions. To address this, we introduce a framework that rapidly predicts personalized priors tailored to individual users.

Directly Predict Personalized Prior

Personalized Prior

Traditional approaches typically obtain a personalized prior by manually collecting task-specific datasets and running time-consuming LoRA training, which additionally demands substantial expertise in hyperparameter tuning. In contrast, we propose LoFA, a general framework that predicts personalized priors (i.e., LoRA weights) within seconds for fast adaptation of visual generative models, achieving performance comparable to, and even exceeding, conventional LoRA training.

Abstract

Personalizing visual generative models to meet specific user needs has gained increasing attention, yet current methods like Low-Rank Adaptation (LoRA) remain impractical due to their demand for task-specific data and lengthy optimization. While a few hypernetwork-based approaches attempt to predict adaptation weights directly, they struggle to map fine-grained user prompts to complex LoRA distributions, limiting their practical applicability. To bridge this gap, we propose LoFA, a general framework that efficiently predicts personalized priors for fast model adaptation. We first identify a key property of LoRA: structured distribution patterns emerge in the relative changes between LoRA and base model parameters. Building on this, we design a two-stage hypernetwork: first predicting sparse response maps that capture key adaptation regions, then using these to guide final LoRA weight prediction. Extensive experiments demonstrate that our method consistently predicts high-quality personalized priors within seconds, across multiple tasks and user prompts, even outperforming conventional LoRA that requires hours of processing.
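The key property above — structured patterns in the relative change between LoRA and base model parameters — can be illustrated with a minimal sketch. The function names, the epsilon stabilizer, and the top-fraction threshold below are illustrative assumptions, not details from the paper: we compute an element-wise relative-change map for a weight matrix and threshold it into a sparse 0/1 response map marking the most-adapted regions.

```python
# Hedged sketch (assumed helpers, not the paper's implementation):
# measure how much a LoRA update changes each base weight, relative to
# the weight's own magnitude, then keep only the largest entries.

def relative_change(w_base, w_delta, eps=1e-8):
    """Element-wise relative change |delta| / (|w| + eps)."""
    return [[abs(d) / (abs(w) + eps) for w, d in zip(row_w, row_d)]
            for row_w, row_d in zip(w_base, w_delta)]

def response_map(rel, top_frac=0.1):
    """Binarize: keep the top fraction of entries as 1, rest as 0,
    yielding a sparse map of the key adaptation regions."""
    flat = sorted((v for row in rel for v in row), reverse=True)
    k = max(1, int(top_frac * len(flat)))
    thresh = flat[k - 1]
    return [[1 if v >= thresh else 0 for v in row] for row in rel]
```

For example, with `w_base = [[1.0, 2.0], [0.5, 4.0]]` and a LoRA update `w_delta = [[0.1, 0.02], [0.25, 0.04]]`, the entry at position (1, 0) changes by 50% relative to its base value and dominates the map, so `response_map(..., top_frac=0.25)` marks only that entry.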

Method Overview

Method Overview of LoFA

An overview of our LoFA. Conditioned on different user prompts, our network takes the base model weights W as input and, in Stage-I, predicts a LoRA response map. Stage-II then inherits Stage-I's architecture and uses the learned response-map information to guide the final prediction of the full LoRA weights.
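The two-stage flow above can be sketched schematically. Everything here is a toy stand-in under stated assumptions: `stage_one` and `stage_two` are deterministic placeholders for the actual hypernetworks (whose architecture is not specified here), and `prompt_seed` stands in for the prompt conditioning. The point is only the data flow: Stage-I marks where to adapt, Stage-II fills in the update only at those regions, and the predicted delta is merged into the base weights.

```python
# Hedged sketch of the two-stage pipeline; toy stand-ins, not LoFA's
# real hypernetworks.

def stage_one(base_w, prompt_seed):
    """Stage-I stand-in: predict a sparse response map over base_w,
    conditioned (here, trivially) on the prompt."""
    n, m = len(base_w), len(base_w[0])
    return [[1 if (i * m + j + prompt_seed) % 4 == 0 else 0
             for j in range(m)] for i in range(n)]

def stage_two(base_w, resp_map, scale=0.1):
    """Stage-II stand-in: predict the LoRA update, guided by the
    Stage-I map -- only marked regions receive a nonzero delta."""
    return [[scale * base_w[i][j] if resp_map[i][j] else 0.0
             for j in range(len(base_w[0]))] for i in range(len(base_w))]

def adapt(base_w, prompt_seed):
    """Merge the predicted update into the base weights."""
    delta = stage_two(base_w, stage_one(base_w, prompt_seed))
    return [[w + d for w, d in zip(row_w, row_d)]
            for row_w, row_d in zip(base_w, delta)]
```

The design choice this sketch mirrors is that Stage-II does not predict the full weight distribution from scratch: the sparse map from Stage-I narrows the prediction to the regions that matter, which is what makes mapping fine-grained prompts to complex LoRA distributions tractable.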

Comparison on Text Conditioned Human Action Video Generation

Comparison on Pose Conditioned Human Action Video Generation

Comparison on Text-to-Video Stylization

Comparison on Identity-Personalized Image Generation

BibTeX

@article{YourPaperKey2024,
  title={Your Paper Title Here},
  author={First Author and Second Author and Third Author},
  journal={Conference/Journal Name},
  year={2024},
  url={https://your-domain.com/your-project-page}
}