Time and Cost Reduction in Pretraining LLMs

Training an LLM is very costly: the bill can start at around $10 million and run to tens or hundreds of times that amount. At these prices, LLMs are out of reach for smaller organisations and for research or academic groups, so it is worth revisiting the optimization methods currently used to train them. Stanford researchers took up this problem, aiming to cut the training time of these models in half.

An LLM has millions or billions of parameters, and each parameter has its own curvature. Curvature can be thought of as the maximum achievable speed a parameter can reach as it progresses towards the final goal of LLM pretraining, or, loosely, as that parameter's workload. Estimating curvature for so many parameters is difficult and expensive, and it is for this reason that the curvature estimation step is often skipped when optimizing LLM pretraining.

The researchers noticed a possible inefficiency in previous methods that did estimate the curvature of the parameters: the estimates were updated at every optimization step. They asked whether the process could be improved by updating the estimates less often. To test the idea, they designed a new optimizer, Sophia, to estimate the parameters' curvature only about every ten steps. That proved to be a winning proposition.
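The idea of refreshing the curvature estimate only every few steps can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the researchers' implementation: the two-parameter quadratic loss, the exact curvature values, the learning rate and the names `estimate_curvature` and `train` are all hypothetical and chosen only to make the schedule visible.

```python
# Toy sketch: refresh a per-parameter curvature estimate only every k steps.
# Assumptions (not from the article): a 2-parameter quadratic loss, an exact
# curvature estimate, and the hypothetical names below.

CURVATURE = [1.0, 10.0]  # true curvature of each parameter in the toy loss


def gradient(params):
    # Loss is 0.5 * sum(c * p**2), so each parameter's gradient is c * p.
    return [c * p for c, p in zip(CURVATURE, params)]


def estimate_curvature(params):
    # Stand-in for an expensive curvature estimate; here it is simply exact.
    return list(CURVATURE)


def train(params, steps=30, k=10, lr=0.5):
    curvature_calls = 0
    h = None
    for step in range(steps):
        if step % k == 0:  # refresh the estimate only every k steps
            h = estimate_curvature(params)
            curvature_calls += 1
        g = gradient(params)
        # Scale each parameter's step by its (possibly stale) curvature.
        params = [p - lr * gi / hi for p, gi, hi in zip(params, g, h)]
    return params, curvature_calls


final_params, calls = train([1.0, 1.0])
print(calls)  # number of curvature estimates made over 30 steps
```

With `k=10`, the expensive estimate runs on only a tenth of the steps, while every step still uses the most recent estimate to scale its update.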

The second trick was clipping. An inaccurate curvature estimate increases the workload instead of reducing it. Clipping prevents that by setting a threshold, that is, a maximum allowed curvature estimate.
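A threshold of this kind might look like the sketch below. It rests on an assumption that goes beyond the article: the bound is applied to the per-parameter step (gradient divided by curvature estimate), which is one way to limit how far a bad estimate can push a parameter. The function name `clipped_step` and the default values of `threshold` and `eps` are hypothetical.

```python
def clipped_step(grad, curvature, threshold=1.0, eps=1e-12):
    # Raw step: gradient divided by the curvature estimate, floored at eps
    # so a zero or negative estimate cannot cause a division error.
    raw = grad / max(curvature, eps)
    # Clip the step into the range [-threshold, threshold].
    return max(-threshold, min(threshold, raw))


print(clipped_step(0.5, 1.0))    # reasonable estimate: step passes through
print(clipped_step(0.5, 0.001))  # underestimated curvature: step is capped
print(clipped_step(-0.5, 0.0))   # zero curvature: capped on the negative side
```

Without the clip, the second and third calls would produce enormous steps from what is really just estimation error.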

Sophia was used to pretrain a small LLM comparable to GPT-2. It reached the same level of optimization in half the number of steps and half the time. That is a substantial improvement in pretraining and points to a massive cost reduction.

In the future, the researchers would like to use Sophia to train a larger LLM and to apply it to other kinds of models, such as computer vision (CV) and multi-modal models.

Sophia is a new approach to training LLMs developed by Stanford researchers. Optimization algorithms used previously include Stochastic Gradient Descent (SGD), RMSProp and Adam. As Sophia is open source, others can carry the experiments forward.
