The makers of generative models such as ChatGPT and Bard are working to curb ‘hallucinations’, the misinformation that emerges from AI systems. Hallucinations occur when the model fabricates information outright while presenting it as hard fact. Even state-of-the-art models are prone to such falsehoods. The problem is especially acute when multi-step reasoning is required, since a single logical error can derail the entire solution.
OpenAI has formulated a new strategy to fight these fabrications: instead of rewarding only a correct final answer, reward the model for each correct step of reasoning. This is applied during training and is called ‘process supervision’, as opposed to ‘outcome supervision’. It encourages the model to work through a chain-of-thought, much as humans do. Process supervision is not OpenAI’s invention; the company is pushing an existing idea forward. A minimal sketch of the difference between the two forms of supervision is shown below.
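The sketch below is purely illustrative and is not OpenAI’s implementation. The checker functions `is_final_answer_correct` and `is_step_correct` are hypothetical placeholders standing in for whatever human labeler or verifier model judges the answers and steps; the point is only that outcome supervision assigns one reward to the whole solution, while process supervision assigns a reward to every reasoning step.

```python
# Illustrative sketch: outcome supervision vs. process supervision.
# The checker functions are hypothetical placeholders, not OpenAI's code.

def is_final_answer_correct(answer: str) -> bool:
    # Placeholder: in practice a human labeler or verifier judges the final answer.
    return answer == "42"

def is_step_correct(step: str) -> bool:
    # Placeholder: in practice each reasoning step is labeled individually.
    return "error" not in step

def outcome_supervision_reward(steps: list[str], answer: str) -> float:
    """One reward for the whole solution, based only on the final answer."""
    return 1.0 if is_final_answer_correct(answer) else 0.0

def process_supervision_rewards(steps: list[str]) -> list[float]:
    """One reward per reasoning step, so a single faulty step is penalised directly."""
    return [1.0 if is_step_correct(s) else 0.0 for s in steps]

if __name__ == "__main__":
    steps = [
        "Let x be the unknown quantity.",
        "Set up the equation from the problem statement.",
        "error: a sign is dropped in this step",
        "Solve for x.",
    ]
    answer = "42"
    print(outcome_supervision_reward(steps, answer))   # 1.0 -- the faulty step goes unnoticed
    print(process_supervision_rewards(steps))          # [1.0, 1.0, 0.0, 1.0] -- the faulty step is flagged
```

As the example suggests, outcome supervision can reward a solution that reaches the right answer through flawed reasoning, whereas process supervision surfaces the flawed step itself.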
Even so, this may not solve the whole problem. Fabricated citations and references remain especially troublesome: an American lawyer recently filed a brief containing decisions and quotations cited by ChatGPT that could not be verified, because ChatGPT had invented them. The present solution may not work for such cases.
OpenAI has also not released the full dataset used to train the model, which leaves the approach less than transparent.