Adopted by a plethora of AI applications, large language models demand efficient optimizers for training. Luo, Yu, and Li present BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models and use it to finetune the Llama 2-7B model. BAdam runs block coordinate optimization with Adam as the inner solver: only one block of parameters is active at a time, which keeps the optimizer's state small and, via the chain rule, lets the backward pass stop at the active block, cutting computation.
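To make the procedure concrete, here is a minimal PyTorch sketch of a block coordinate Adam loop. It is an illustration under simple assumptions (a hypothetical partition of one block per top-level child module, and a HuggingFace-style model whose forward returns a `.loss`), not the authors' released implementation.

```python
# Minimal sketch of block coordinate optimization with Adam, in the spirit of
# BAdam. Illustrative assumptions: one block per top-level child module and a
# HuggingFace-style forward that returns .loss. Not the authors' implementation.
from itertools import cycle

import torch


def block_coordinate_adam(model, data_loader, inner_steps=50, lr=1e-5):
    # Partition parameters into blocks; here, one block per child module.
    blocks = [list(child.parameters()) for child in model.children()]
    blocks = [b for b in blocks if b]  # drop parameter-free modules

    batches = cycle(data_loader)  # keep drawing batches across block switches

    for block in blocks:  # one pass over all blocks (a "block epoch")
        # Freeze everything, then unfreeze only the current block. Because the
        # frozen blocks need no gradients, autograd stops the backward pass at
        # the active block -- the chain-rule saving described in the paper.
        for p in model.parameters():
            p.requires_grad_(False)
        for p in block:
            p.requires_grad_(True)

        # Adam states exist only for the active block: the memory saving.
        optimizer = torch.optim.Adam(block, lr=lr)

        for _ in range(inner_steps):  # K Adam steps on the active block
            batch = next(batches)
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
```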
The optimizer packs several advantages:

- Memory efficiency: Adam states are kept only for the block currently being updated, rather than for all parameters at once.
- Faster iterations: the chain rule lets backpropagation stop at the active block, skipping the frozen blocks closer to the input.
- Full-parameter training: every parameter is updated over the course of training.
BAdam is a major stride toward democratizing full-parameter training of large models. Its ability to finetune Llama 2-7B on a single RTX3090-24GB GPU empowers researchers with limited resources, potentially galvanizing a wave of grassroots experimentation with training large models.