
The paper proposes a novel concept, LLM as a system service on mobile devices (LLMaaS), which bolsters user privacy by executing powerful language models on the device itself. The key contributions of this work include:
An LLMS architecture that manages app memory and LLM memory contexts separately, minimizing context-switching overhead. Empirical studies show that this approach reduces context-switching latency by up to two orders of magnitude compared with baseline solutions. By integrating LLMs as a system service, mobile devices can leverage the model's power while preserving user privacy, a substantial step in mobile AI that could reshape how users interact with their devices.
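The core idea of managing LLM contexts separately from app memory can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the class names (`LLMContext`, `ContextManager`), the LRU eviction policy, and the capacity parameter are hypothetical and not taken from the paper; the real system operates on model KV-cache memory, which is stood in for here by a plain list.

```python
# Illustrative sketch (not the paper's implementation): each app's LLM
# conversation state is tracked independently of the app's own process
# memory, so switching apps only swaps a small per-app context handle
# rather than rebuilding model state from scratch.
from collections import OrderedDict


class LLMContext:
    """Per-app LLM state, e.g. a handle to a KV cache (stand-in here)."""

    def __init__(self, app_id):
        self.app_id = app_id
        self.kv_cache = []       # placeholder for model key/value tensors
        self.resident = True     # False once swapped out of device memory


class ContextManager:
    """Keeps at most `capacity` contexts resident; evicts the least
    recently used resident context when the budget is exceeded."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.contexts = OrderedDict()   # app_id -> LLMContext, LRU order

    def switch_to(self, app_id):
        ctx = self.contexts.get(app_id)
        if ctx is None:
            ctx = LLMContext(app_id)
        else:
            del self.contexts[app_id]   # re-insert to mark as most recent
            ctx.resident = True         # swap back in if it was evicted
        self.contexts[app_id] = ctx
        # Evict least-recently-used resident contexts over the budget.
        while sum(c.resident for c in self.contexts.values()) > self.capacity:
            victim = next(c for c in self.contexts.values()
                          if c.resident and c is not ctx)
            victim.resident = False
        return ctx


mgr = ContextManager(capacity=2)
mgr.switch_to("chat_app")
mgr.switch_to("keyboard")
mgr.switch_to("assistant")   # exceeds capacity, evicts chat_app's context
print([(a, c.resident) for a, c in mgr.contexts.items()])
# → [('chat_app', False), ('keyboard', True), ('assistant', True)]
```

The point of the sketch is the design choice: because contexts are first-class objects owned by the system service, a switch touches only bookkeeping and (at worst) a cache swap, rather than reloading an entire model state per app.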