The exploration of language models in software applications has led to AI agents capable of function calling, a pivotal feature for automating workflow tasks. This study introduces an on-device model with 2 billion parameters that outperforms GPT-4 in both accuracy and latency, and achieves a 35-fold reduction in latency compared to Llama-7B with a RAG-based function calling mechanism. Discover more in the research publication.
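To make the idea of function calling concrete, here is a minimal sketch of the pattern such an agent follows: the model maps a natural-language request to a structured call drawn from a registered function set, which the application then dispatches. All function names and the stand-in model below are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of on-device function calling.
# The model picks a function from a known registry and fills in its arguments;
# the host application validates and dispatches the call.

FUNCTIONS = {
    "set_alarm": {"params": ["time"]},
    "send_message": {"params": ["recipient", "text"]},
}

def mock_model(prompt: str) -> dict:
    """Stand-in for the on-device model: returns a structured function call."""
    if "alarm" in prompt:
        return {"name": "set_alarm", "args": {"time": "07:00"}}
    return {"name": "send_message", "args": {"recipient": "Sam", "text": prompt}}

def dispatch(call: dict) -> str:
    """Validate the call against the registry and render it for execution."""
    assert call["name"] in FUNCTIONS, "model must pick a registered function"
    args = ", ".join(f"{k}={v!r}" for k, v in call["args"].items())
    return f"{call['name']}({args})"

print(dispatch(mock_model("wake me up with an alarm tomorrow")))
```

Constraining the model's output to a fixed function registry like this is part of what makes small on-device models viable for the task: the decoding problem is far narrower than open-ended generation.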
As I see it, the ability of on-device models to achieve high performance despite tight size and compute constraints is essential for deploying AI capabilities directly on user devices. This could foster innovation in privacy-sensitive applications and real-time processing, shaping how we interact with AI in our daily lives.