Meta (Facebook) has introduced MobileLLM, a family of sub-billion-parameter language models designed to run directly on mobile devices. Rather than relying on sheer quantity of data and parameters, it focuses on architecture: the models are deep and thin, share the embedding matrix between the input and output layers, and use grouped-query attention to shrink the attention projections and KV cache.
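To see why these choices matter at sub-billion scale, here is a rough parameter-accounting sketch. All sizes and the 4x FFN multiplier are illustrative assumptions for the example, not MobileLLM's published configuration; the point is that tied embeddings are counted once instead of twice, and grouped-query attention shrinks the K/V projections.

```python
# Illustrative parameter accounting for a small on-device LLM.
# All sizes below are assumptions for this sketch, not MobileLLM's actual config.

def embedding_params(vocab_size, dim, tied):
    # With tied (shared) embeddings, the input embedding matrix is reused
    # as the output projection, so it is counted once instead of twice.
    return vocab_size * dim * (1 if tied else 2)

def block_params(dim, n_heads, n_kv_heads, ffn_mult=4):
    head_dim = dim // n_heads
    # Grouped-query attention: K and V projections serve only n_kv_heads
    # groups, shrinking them relative to full multi-head attention.
    q_proj = dim * dim
    kv_proj = 2 * dim * (n_kv_heads * head_dim)
    out_proj = dim * dim
    ffn = 2 * dim * (ffn_mult * dim)
    return q_proj + kv_proj + out_proj + ffn

def model_params(vocab, dim, layers, n_heads, n_kv_heads, tied):
    return embedding_params(vocab, dim, tied) + layers * block_params(dim, n_heads, n_kv_heads)

# A "deep and thin" configuration: many layers, small width, tied embeddings, GQA.
deep_thin = model_params(vocab=32_000, dim=576, layers=30, n_heads=9, n_kv_heads=3, tied=True)
# A "shallow and wide" configuration: fewer, wider layers, untied embeddings, full MHA.
shallow_wide = model_params(vocab=32_000, dim=1024, layers=12, n_heads=16, n_kv_heads=16, tied=False)
print(f"deep-thin: {deep_thin:,} params vs shallow-wide: {shallow_wide:,} params")
```

Under these toy numbers, embedding tying alone saves `vocab * dim` parameters, which is a large fraction of the budget when the model itself is small.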
Accuracy is further improved by immediate block-wise weight sharing: adjacent transformer blocks reuse the same weights, so effective depth grows without adding parameters. Because the shared weights are applied again while they are still resident in fast memory, the reuse adds little latency, making the model more capable without slowing the phone down. The resulting models are well suited to on-device tasks such as chat and API calling. This approach enables deployment of capable AI models directly on consumer devices.
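The weight-sharing idea above can be sketched in a few lines. This is a toy model, not MobileLLM's implementation: the "blocks" are plain scalar functions standing in for transformer blocks, and each stored block is simply executed twice in a row.

```python
# Sketch of immediate block-wise weight sharing: each stored block is
# executed twice in succession, doubling effective depth without storing
# extra weights. Toy blocks operate on floats; a real model uses tensors.

def make_block(scale, shift):
    # Stand-in for one transformer block and its weights.
    def block(x):
        return scale * x + shift
    return block

# Two stored blocks...
stored_blocks = [make_block(1.1, 0.5), make_block(0.9, -0.2)]

def forward(x, blocks, repeats=2):
    # Each block is applied `repeats` times back to back, so its weights
    # stay in cache between reuses instead of being fetched again later.
    for block in blocks:
        for _ in range(repeats):
            x = block(x)
    return x

# ...give an effective depth of four block applications.
y = forward(1.0, stored_blocks)
```

The design choice to repeat a block *immediately*, rather than cycling back to it later, is what keeps the memory-traffic cost low: the weights are reused before they leave fast memory.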
Organizations are already shipping generative AI features on smartphones, and MobileLLM extends this trend further. It marks a shift toward sustainable, accessible AI, putting meaningful computational capability directly on the user's device.