JW

Intelligent Assistant Language Understanding On Device

8/7/2023

A systems paper from my time at Apple, describing the natural language understanding stack that powers on-device assistant functionality.

The Problem

Server-based digital assistants send user queries to the cloud for processing. This introduces latency, requires network connectivity, and raises privacy concerns because voice data leaves the device. Once on-device compute became powerful enough to run meaningful ML models, the question was whether a full natural language understanding pipeline could run locally with competitive accuracy. The challenge was not just model size: it was maintaining a system that handles the full complexity of assistant queries (intents, slots, entities, dialog state) within the memory and compute constraints of a phone.

The Approach

The paper describes the end-to-end NLU system architecture for on-device processing: how user utterances are parsed into structured representations, how the system handles the combinatorial complexity of intent classification and slot filling, and how the models are kept small enough to run on device while maintaining accuracy. It also emphasizes practical deployment considerations: some approaches that work well in academic dialog systems research are difficult to maintain at scale over time. The design choices prioritize reliability and maintainability alongside accuracy.
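
To make "parsed into structured representations" concrete, here is a minimal sketch of what an intent-and-slot parse can look like. It is a toy illustration, not the architecture from the paper: the intent names, the slot names, and the regular-expression matcher are all hypothetical stand-ins for the learned models a real system would use.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slot:
    name: str    # slot label, e.g. "duration" (hypothetical)
    value: str   # surface text that fills the slot
    start: int   # character offset into the normalized utterance
    end: int     # exclusive end offset


@dataclass
class Parse:
    intent: str                                  # e.g. "set_timer" (hypothetical)
    slots: List[Slot] = field(default_factory=list)


# Hypothetical hand-written patterns standing in for a trained model.
PATTERNS = [
    ("set_timer",
     re.compile(r"set a timer for (?P<duration>\d+ (?:seconds?|minutes?|hours?))")),
    ("send_message",
     re.compile(r"text (?P<recipient>\w+) (?P<body>.+)")),
]


def parse(utterance: str) -> Parse:
    """Map an utterance to an intent plus the slots found in it."""
    text = utterance.lower().strip()
    for intent, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            slots = [
                Slot(name, value, *match.span(name))
                for name, value in match.groupdict().items()
            ]
            return Parse(intent, slots)
    # No pattern matched: fall back to an explicit "unknown" intent.
    return Parse("unknown")


if __name__ == "__main__":
    print(parse("Set a timer for 10 minutes"))
    print(parse("Text Alice running late"))
```

Even in this toy form, the combinatorial issue is visible: every new intent brings its own slots and phrasings, and a pattern list like this one becomes hard to maintain quickly, which is the kind of long-term upkeep concern the paper highlights.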

Recollections

[To be filled in.]