BAdam Optimizer for Large Language Models

Adopted across a wide range of AI applications, large language models demand memory-efficient optimizers for training. Luo, Yu, and Li present BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models and use it to finetune the Llama 2-7B model. BAdam embeds Adam inside a block coordinate optimization framework, updating one block of parameters at a time: this saves memory, because gradients and Adam states are kept only for the active block, and it can also cut training time, since the chain rule lets the backward pass stop at the active block instead of propagating through the whole model.
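To make the block-coordinate idea concrete, here is a minimal PyTorch sketch that cycles Adam over one parameter block at a time while the rest of the model stays frozen. It illustrates the general scheme rather than the authors' implementation; the toy model, block partition, learning rate, and number of inner Adam steps per block are all assumptions for the example.

```python
# Minimal sketch of block coordinate optimization with Adam (BAdam-style loop).
# Not the paper's code: the model, block partition, and hyperparameters are toy
# choices used only to show the structure of the update.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a large model: one block per layer.
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(8)])
blocks = [list(layer.parameters()) for layer in model]

x = torch.randn(32, 64)
y = torch.randn(32, 64)
loss_fn = nn.MSELoss()

inner_steps = 3  # Adam steps on the active block before switching

for epoch in range(2):
    for active_idx, active_params in enumerate(blocks):
        # Freeze every block except the active one; frozen blocks need no
        # gradients or Adam states, which is where the memory saving comes from.
        for i, params in enumerate(blocks):
            for p in params:
                p.requires_grad_(i == active_idx)

        # A fresh Adam instance holds optimizer state only for the active block.
        optimizer = torch.optim.Adam(active_params, lr=1e-3)

        for _ in range(inner_steps):
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # in BAdam, backprop can stop at the active block
            optimizer.step()

    print(f"epoch {epoch}: loss {loss.item():.4f}")
```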

The optimizer packs several advantages:

  • Outperforms LoRA and LOMO in convergence.
  • Exhibits only a small performance gap compared with Adam.
  • Enables full-parameter training on standard hardware.

BAdam is a major stride towards democratizing full-parameter training of large models. Its ability to finetune Llama 2-7B on a single RTX 3090 (24 GB) GPU empowers researchers with limited resources and could spur a wave of grassroots innovation in model training.
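To see why a 24 GB card can be enough, here is a rough back-of-the-envelope estimate. The 16-bytes-per-parameter accounting for mixed-precision Adam and the block count are illustrative assumptions, not figures taken from the paper.

```python
# Rough memory estimate for full-parameter training of a 7B-parameter model.
# Assumed accounting (illustrative): mixed-precision Adam keeps fp16 weights +
# fp16 grads + fp32 master weights + fp32 momentum + fp32 variance,
# i.e. about 2 + 2 + 4 + 4 + 4 = 16 bytes per parameter.
n_params = 7e9       # Llama 2-7B
num_blocks = 32      # assumed block count, e.g. roughly one block per layer

adam_bytes = 16 * n_params
# Block-coordinate updates keep gradients and optimizer states only for the
# active block, while an fp16 copy of the full model is still needed for the
# forward and backward passes.
badam_bytes = 2 * n_params + (2 + 4 + 4 + 4) * n_params / num_blocks

print(f"Adam, full model state:   ~{adam_bytes / 1e9:.0f} GB")
print(f"Block-coordinate variant: ~{badam_bytes / 1e9:.0f} GB")
```

Under these assumptions the full-Adam footprint far exceeds a 24 GB card, while the block-wise variant fits comfortably, which is consistent with the single-GPU claim above.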
