Advanced LLMs: LLM 2.0

We have written several blog posts about large language model (LLM) technology. Essentially, these models are trained to predict the next token, or to guess missing tokens; they are not trained end-to-end to accomplish the tasks users expect them to perform. Training is expensive, involving billions of parameters and large GPU clusters, and the user ultimately pays the bill. Much of the observed performance comes from the heavy hardware and infrastructure built around the models, not from the neural networks themselves.

In response, research is under way to improve how LLMs work; let us call it LLM 2.0. LLM 2.0 focuses on a robust back-end architecture: it retrieves and leverages a knowledge graph built from the corpus (smart crawling), is designed to be hallucination-free, and does not require prompt engineering. It relies on far fewer tokens and uses a customizable pointwise mutual information (PMI) metric for keyword associations. LLM 1.0, by contrast, depends on vector databases with dot-product and cosine-similarity search instead of PMI, and relies heavily on faulty Python libraries for NLP. LLM 2.0 uses contextual tokens built from non-adjacent words, sorted n-grams, variable keyword associations, variable-length embeddings, and in-memory nested hashes.
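To make the keyword-association idea concrete, here is a minimal sketch of a PMI-style score computed over a toy corpus. The `alpha` exponent is a hypothetical customization knob and the documents are made up; this is an illustration of the general technique, not the article's exact metric.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(documents, alpha=1.0):
    """Pointwise mutual information for keyword pairs, with an exponent
    alpha as a (hypothetical) customization knob:
    score(x, y) = log( p(x, y) / (p(x) * p(y))**alpha )."""
    word_counts = Counter()
    pair_counts = Counter()
    n_docs = len(documents)
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(combinations(sorted(words), 2))
    scores = {}
    for (x, y), nxy in pair_counts.items():
        p_xy = nxy / n_docs
        p_x = word_counts[x] / n_docs
        p_y = word_counts[y] / n_docs
        scores[(x, y)] = math.log(p_xy / (p_x * p_y) ** alpha)
    return scores

docs = [
    "large language models predict tokens",
    "language models use token embeddings",
    "knowledge graphs link related concepts",
]
# Print the five strongest keyword associations in the toy corpus.
for pair, score in sorted(pmi_scores(docs).items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 3))
```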

LLM 2.0 focuses on conciseness and accuracy, whereas LLM 1.0 tends to produce lengthy English prose better suited to novices.

LLM 2.0 is built from specialized sub-LLMs and uses multi-index and deep retrieval techniques. LLM 1.0 handles one prompt at a time, offers no real-time fine-tuning, and is not based on explainable AI; it uses a single index and shallow retrieval, and the proprietary and standard libraries it relies on may miss some elements in PDFs.
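As a rough illustration of multi-index retrieval backed by in-memory nested hashes, here is a minimal sketch: the outer key is a document section, the inner key a keyword, and a lookup merges hits from several indexes. The section names and toy corpus are assumptions for the example, not the actual LLM 2.0 back-end.

```python
from collections import defaultdict

def build_multi_index(corpus):
    """Nested hash: section -> keyword -> set of document ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in corpus.items():
        for section, text in doc.items():
            for word in text.lower().split():
                index[section][word].add(doc_id)
    return index

def deep_retrieve(index, keyword, sections=("title", "body")):
    """Look the keyword up in several indexes and merge the hits."""
    hits = set()
    for section in sections:
        hits |= index[section].get(keyword, set())
    return hits

corpus = {
    "doc1": {"title": "llm embeddings", "body": "variable length embeddings"},
    "doc2": {"title": "knowledge graphs", "body": "llm retrieval with graphs"},
}
idx = build_multi_index(corpus)
print(deep_retrieve(idx, "llm"))   # {'doc1', 'doc2'}
```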

ChatGPT o1 works in a way that is closer to how a person thinks, raising hopes that some form of AGI might be imminent, or even already here. LLMs rely on a method called next-token prediction: the model is repeatedly fed samples of text broken into chunks called tokens, with the last token in each sequence hidden or masked, and is asked to predict it. The training algorithm compares the prediction with the masked token and adjusts the model's parameters so that it makes a better prediction next time. The process continues over billions of fragments of language, text, or code, until the model can reliably predict the masked tokens. At that stage, the parameters have captured the statistical structure of the training data and the knowledge contained therein. The parameters are then fixed, and the model uses them to predict new tokens; this is called inference.
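The following is a toy PyTorch sketch of that next-token prediction loop. The model, vocabulary, and data are deliberately tiny and synthetic; real LLMs use transformer architectures and vastly more data, but the predict, compare, and adjust cycle is the same.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 50, 16, 4

class TinyNextTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(context_len * embed_dim, vocab_size)

    def forward(self, context):                 # context: (batch, context_len)
        x = self.embed(context).flatten(1)      # concatenate token embeddings
        return self.head(x)                     # logits over the vocabulary

model = TinyNextTokenModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic training data: random token sequences; the last token is masked
# and becomes the prediction target.
tokens = torch.randint(0, vocab_size, (32, context_len + 1))
context, target = tokens[:, :-1], tokens[:, -1]

for step in range(100):
    logits = model(context)
    loss = loss_fn(logits, target)   # compare prediction with the masked token
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                 # adjust parameters for a better guess next time

# Inference: parameters are now fixed; predict the next token for a new context.
with torch.no_grad():
    new_context = torch.randint(0, vocab_size, (1, context_len))
    next_token = model(new_context).argmax(dim=-1)
    print(next_token.item())
```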

The transformer architecture allows the model to learn that some tokens have a strong influence on others, even if they are widely separated in a sample text.
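Below is a minimal sketch of scaled dot-product attention, the transformer mechanism behind this long-range influence; the shapes and random data are purely illustrative.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # pairwise token affinities
    weights = F.softmax(scores, dim=-1)             # how strongly each token attends to every other
    return weights @ v, weights

# Toy example: 6 tokens with 8-dimensional representations.
tokens = torch.randn(6, 8)
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
# weights is (6, 6): token 0 can put high weight on token 5
# even though they are far apart in the sequence.
print(weights.shape)
```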

Chain-of-thought (CoT) prompting improves LLM answers by breaking a problem down into smaller steps and solving them one at a time. However, this technique does not work well for small LLMs.
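For illustration, here is what a chain-of-thought prompt might look like next to a direct prompt; the question and wording are assumptions for the example, not taken from the article.

```python
# Direct prompt: asks for the answer in one shot.
direct_prompt = "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\nA:"

# Chain-of-thought prompt: shows the intermediate steps, so the model
# breaks the next problem into the same kind of smaller steps.
cot_prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step.\n"
    "1. Convert 45 minutes to hours: 45 / 60 = 0.75 hours.\n"
    "2. Speed = distance / time = 60 / 0.75 = 80 km/h.\n"
    "So the answer is 80 km/h."
)
```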
