Pre-trained LLMs are fine-tuned to adapt to specialized domains, follow human instructions, and cater to individual preferences. During fine-tuning, the model is adjusted on a smaller, domain-specific dataset.
As LLMs scale up, fine-tuning them becomes computationally demanding, and backpropagation in particular is memory-intensive.
Princeton University researchers have addressed the memory issue with MeZO, a memory-efficient zeroth-order optimizer. MeZO adapts the classical ZO-SGD method, which estimates gradients using only two forward passes and therefore needs no more memory than inference. MeZO can also optimize non-differentiable objectives (such as accuracy or F1 score) while keeping that same inference-level memory footprint. In experiments, it outperforms zero-shot prompting, in-context learning (ICL), and linear probing.
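The following is a minimal sketch of the two-forward-pass zeroth-order gradient estimate (SPSA) that underlies this approach, written for a generic PyTorch model. The function names `mezo_step` and `loss_fn`, and the hyperparameter values, are illustrative assumptions rather than the authors' actual API; the seed trick of regenerating the random direction instead of storing it is what keeps memory at inference level.

```python
import torch

def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=None):
    """One MeZO-style update: estimate the gradient with two forward passes
    (SPSA) and apply an SGD step in place, so memory stays at inference level."""
    if seed is None:
        seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Re-seed so the same random direction z is regenerated each time,
        # instead of being stored alongside the parameters.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1)                       # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2)                       # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1)                       # restore theta

        # Projected gradient estimate: (L+ - L-) / (2 * eps)
        grad_scalar = (loss_plus - loss_minus) / (2 * eps)

        # SGD update along the same direction z, regenerated from the seed.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_scalar * z)

    return loss_plus
```

Because `loss_fn` only needs to return a scalar, it can just as easily compute a non-differentiable metric such as accuracy or F1, which is why the method extends to such objectives.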
In the evaluation, MeZO trained a 30-billion-parameter model on a single NVIDIA A100 80 GB GPU, whereas backpropagation can train only a 2.7-billion-parameter language model under the same memory constraints.