Fine tuning is the process of taking a pre-trained LLM and training it further on a specific dataset. This adapts the model to the content of the task at hand; for example, an LLM can be fine tuned on medical datasets to help diagnose a specific disease. Fine tuning thus makes an LLM more specialised in a particular domain: the model is trained on a smaller but targeted dataset relevant to the desired task or subject matter.
A model can be fine tuned through a fine tuning API, which is done online. If the model's weights are open, fine tuning can also be done on one's own premises. Hugging Face offers an easy AutoTrain feature: you can select the hyperparameters yourself (manually) or let AutoTrain's automatic parameter selection pick the best parameters for the task.
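As a minimal sketch of the manual route on open weights, the snippet below fine tunes a model locally with the Hugging Face transformers Trainer and hand-picked hyperparameters. The model name, dataset and hyperparameter values are illustrative assumptions, not recommendations.

```python
# Sketch: manual hyperparameter selection when fine tuning an open-weight model locally.
# The model name, dataset and all hyperparameter values are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"            # assumed open-weight base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                    # stand-in for a domain-specific dataset
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="finetuned-model",
    learning_rate=2e-5,                           # manually chosen hyperparameters
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["test"],
                  tokenizer=tokenizer)             # enables dynamic padding of batches
trainer.train()
```

With AutoTrain, the values set manually in TrainingArguments above would instead be searched automatically.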
A fine tuned model is evaluated on three criteria: perplexity, accuracy and F1 score. Perplexity measures how well the model predicts the next word in a sequence; the lower the perplexity, the better the model is at predicting the next word. Accuracy measures how well the model performs on a given task, and is the number of correct predictions divided by the total number of predictions. The F1 score measures how well the model performs on binary classification tasks, and is the harmonic mean of precision and recall.
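A minimal sketch of how the three metrics are computed; the per-token losses, labels and predictions below are made-up values for illustration.

```python
import math

# Perplexity: exponential of the average cross-entropy (negative log-likelihood) per token.
token_nlls = [2.1, 1.7, 2.4, 1.9]               # assumed per-token losses from the model
perplexity = math.exp(sum(token_nlls) / len(token_nlls))

# Accuracy: correct predictions divided by total predictions.
labels      = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 0, 1]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# F1 score: harmonic mean of precision and recall (binary classification, positive class = 1).
tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"perplexity={perplexity:.2f}, accuracy={accuracy:.2f}, F1={f1:.2f}")
```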
If fine tuning is to be done for a different application, the model has to be repurposed with a small change in its architecture. Here the embeddings produced by the transformer part of the model are used (embeddings are numerical vector representations).
In repurposing, the model's embedding layer is connected to a classifier, e.g. a set of fully connected layers. The LLM's attention layers are frozen, i.e. they are not updated during training, which saves compute costs. The classifier is trained on a supervised learning dataset.
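A minimal PyTorch sketch of this repurposing pattern, assuming a generic encoder model; the model name, head dimensions and example input are illustrative assumptions.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RepurposedClassifier(nn.Module):
    """Frozen pre-trained transformer feeding a small trainable classification head."""
    def __init__(self, base_model_name="bert-base-uncased", num_classes=3):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model_name)
        for param in self.backbone.parameters():      # freeze embedding and attention layers
            param.requires_grad = False
        hidden = self.backbone.config.hidden_size
        self.classifier = nn.Sequential(              # small fully connected head
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        embedding = out.last_hidden_state[:, 0, :]    # embedding of the first token
        return self.classifier(embedding)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = RepurposedClassifier()
batch = tokenizer(["patient reports chest pain"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # only the head receives gradients
```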
In some instances, the weights of the transformer itself are also updated; the attention layers are not frozen and fine tuning covers the entire model. This is computationally expensive.
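Continuing the hypothetical sketch above, full fine tuning simply unfreezes the backbone.

```python
# Full fine tuning: unfreeze the backbone so the transformer weights are updated too.
# Every parameter now receives gradients, so memory and compute costs rise sharply.
for param in model.backbone.parameters():
    param.requires_grad = True
```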
To update a model's knowledge, say in medical literature, an unstructured dataset is used and the model is trained through unsupervised or self-supervised learning. Foundation models are trained this way.
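A hedged sketch of this self-supervised update: the model continues next-token prediction on raw domain text, so the text itself supplies the labels. The base model and the corpus file name are assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"                                   # assumed open-weight base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token             # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Unstructured domain corpus, e.g. plain-text medical articles (hypothetical file).
corpus = load_dataset("text", data_files={"train": "medical_corpus.txt"})
tokenized = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                       batched=True, remove_columns=["text"])

# mlm=False gives causal (next-token) language modelling: no human annotation is needed.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-adapted-model",
                           num_train_epochs=1, per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```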
At times, the goal is to modify an LLM's behaviour rather than to upgrade its knowledge. Here a supervised fine tuning (SFT) dataset is used: a collection of prompts paired with the desired responses. This is also called instruction fine tuning.
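A minimal sketch of what an SFT record looks like and how a prompt-response pair becomes a single training sequence; the pairs and the template below are made up for illustration.

```python
# Each SFT record pairs an instruction-style prompt with the desired response.
sft_dataset = [
    {"prompt": "Summarise the symptoms of iron-deficiency anaemia.",
     "response": "Common symptoms include fatigue, pallor and shortness of breath."},
    {"prompt": "Explain what an ECG measures.",
     "response": "An ECG records the electrical activity of the heart over time."},
]

def to_training_text(example):
    # A simple illustrative template; in practice the chat/template format expected
    # by the particular base model is used.
    return (f"### Instruction:\n{example['prompt']}\n\n"
            f"### Response:\n{example['response']}")

training_texts = [to_training_text(ex) for ex in sft_dataset]
# These texts are then tokenised and trained on with the usual language-modelling
# objective, so the model learns to produce the response when given the prompt.
```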
Some organisations take SFT to the next level with reinforcement learning from human feedback (RLHF), which brings humans into the loop. It is an expensive process: human reviewers and auxiliary models are needed, so only well-equipped AI labs can afford it.
Research is also directed at parameter-efficient fine tuning (PEFT), e.g. low-rank adaptation (LoRA).
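A minimal LoRA sketch using the Hugging Face peft library; the base model and the LoRA hyperparameters (rank, alpha, target modules) are illustrative assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # assumed open-weight base model

# LoRA injects small low-rank matrices into chosen weight matrices; only these
# adapters are trained, so the number of trainable parameters drops drastically.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's combined attention projection (model-specific)
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()   # typically well under 1% of the full model
# peft_model can now be passed to the same Trainer used for full fine tuning.
```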
Some models cannot be fine tuned at all, especially models available only through an API. At times there is not enough data, or the data changes frequently, or the application is dynamic or context-sensitive. In such cases one can use in-context learning or retrieval augmentation instead.
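A minimal sketch of the alternative: instead of changing the model's weights, relevant documents are retrieved and placed in the prompt at inference time. The toy document store and keyword retriever below are illustrative assumptions; real systems use embedding similarity search.

```python
# Toy retrieval-augmented prompting: no weights are updated; the model simply
# sees the retrieved context inside its prompt.
documents = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Iron-deficiency anaemia is treated with oral iron supplements.",
    "An ECG records the electrical activity of the heart.",
]

def retrieve(query, docs, top_k=1):
    # Naive keyword-overlap retriever, used here only to keep the sketch self-contained.
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:top_k]

question = "What is the first-line medication for type 2 diabetes?"
context = "\n".join(retrieve(question, documents))

prompt = (
    "Answer the question using only the context provided.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
# `prompt` is then sent to the unmodified LLM through its normal completion API.
```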