Human Element Indispensable in AI

Generative AI models such as Bard and ChatGPT depend on human-generated content for training; their success therefore rests on the human element. Researchers from prestigious institutions including Imperial College London, Oxford, and Cambridge examined this dependence in a study titled ‘The Curse of Recursion: Training on Generated Data Makes Models Forget’, which warns that LLMs face a major threat in the future.

If LLMs are trained on AI-generated content rather than human-generated content, significant risks arise. These models rely on existing data, data that was originally created by human beings.

If Bing is asked about drones, its answer draws on material collected from articles written by human beings. That data could take the form of papers, books, or photographs.

If models instead rely on AI-generated content to present information, the effects are adverse, a condition the researchers call ‘model collapse’. The models deviate from reality and become corrupted through a degenerative process in which they gradually forget the true underlying data distribution. Over long-term learning, this process can set in.

Errors in AI-generated content multiply as time passes. The effect is cumulative: each future draft is more distorted than the last.
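The compounding described above can be illustrated with a toy simulation. The sketch below is a hypothetical, simplified stand-in for the study's experiments, not a reproduction of them: it repeatedly fits a one-dimensional Gaussian to samples drawn from the previous generation's fitted model, so each generation trains only on the last generation's output. Estimation error accumulates, and the fitted distribution drifts away from the original one.

```python
import random
import statistics

def collapse_demo(n_samples=20, generations=1000, seed=0):
    """Toy illustration of recursive training on generated data.

    Generation 0 is the 'human' distribution N(0, 1). Each later
    generation fits a Gaussian to a finite sample drawn from the
    previous generation's fit, never seeing the original data.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0            # the original human-data distribution
    spread_history = [sigma]
    for _ in range(generations):
        # "Train" only on data generated by the previous model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
        spread_history.append(sigma)
    return spread_history

hist = collapse_demo()
print(f"spread at generation 0: {hist[0]:.3f}")
print(f"spread after {len(hist) - 1} generations: {hist[-1]:.3g}")
```

In this toy setting the fitted spread tends to shrink over generations, a simple analogue of models forgetting the tails of the true distribution.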

Such a model collapse can also perpetuate model bias on sensitive attributes such as ethnicity, gender, and complexion.

To curtail these risks in the future, it is necessary to preserve original human-generated content and data.

At present, there is no foolproof method to distinguish AI-generated data from human-generated data.

In the future, human-generated data is likely to become a valuable resource for training AI models.

