12 September 2024

The age of Apple Intelligence and the rise of small language models

In October 2024, a billion devices will gain access to functionality that appears incremental but conceals a fundamental shift in how devices, applications and users interact across the Apple ecosystem.

The first set of Apple Intelligence features is set to be released with iOS 18.1. As the rollout reaches devices worldwide, users will start to see intelligent features across writing tools, photo clean-up, natural-language search in images, notification summaries, smart reply in Mail and more, alongside a slew of Siri enhancements, including product knowledge, more resilient request handling and a refreshed look and feel.

So, as of Q4 2024, over a billion devices will gain capabilities that seem incremental on the surface but will ultimately drive a new era of intelligent, real-time experiences.

Let’s find out more about what Apple Intelligence really means for Apple users and what the advent of small language models (SLMs) means for the industry and consumers worldwide.

Apple Intelligence in context

Apple Intelligence is by no means new territory: it simply delivers natively a set of experiences that many of us have been using since the end of 2022, as millions of people now depend on ChatGPT and Gemini in their day-to-day lives, along with the plethora of services built on OpenAI’s SDKs and APIs.

But as is often the case, Apple is leading the way in mass-market adoption of new technologies by making them familiar, and even invisible, to the device user. And it’s all thanks to the development of small language models (SLMs) and a federated approach to applying generative AI in the smallest, most practical delivery method for each use case.

SLMs have emerged as a pivotal technological shift in extending the functionality of mobile devices and applications. This innovation, largely propelled by Apple’s integration of small models into its ecosystem, offers a range of benefits such as enhanced performance, privacy, and adaptability to everyday tasks.

Apple’s emphasis on personalisation, privacy and efficiency has made SLMs central to its AI strategy, most visibly through the Apple Intelligence initiative. This development has positioned SLMs as the future of mobile intelligence, marking a significant departure from the large language models (LLMs) that dominate the generative AI space.

From the limitations of the LLM arises the SLM

At the heart of this shift is Apple’s recognition of the limitations of LLMs on mobile platforms.

LLMs, with their enormous parameter counts, often in the hundreds of billions, consume vast amounts of computational resources and energy, making them impractical to run on mobile devices. They necessitate cloud-based infrastructure, which raises privacy concerns and adds latency. Conversely, SLMs, with significantly fewer parameters, can operate efficiently on-device, circumventing these challenges.

Apple’s SLMs, such as the 3-billion-parameter models embedded within iOS, are designed to execute diverse tasks, from text generation to image creation, directly on mobile devices.
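Some rough arithmetic shows why parameter count matters so much here. The sketch below is a back-of-the-envelope estimate only: it counts weight storage alone, ignoring activations, caches and runtime overhead, and the 175-billion-parameter figure stands in for a GPT-3-class LLM rather than any specific production model.

```swift
import Foundation

/// Approximate memory needed just to hold a model's weights.
func weightFootprintGiB(parameters: Double, bitsPerParameter: Double) -> Double {
    let bytes = parameters * bitsPerParameter / 8.0
    return bytes / 1_073_741_824.0 // bytes per GiB
}

// A ~3-billion-parameter on-device SLM at 16-bit precision:
print(weightFootprintGiB(parameters: 3e9, bitsPerParameter: 16))   // ≈ 5.6 GiB
// A GPT-3-class LLM (~175 billion parameters) at the same precision:
print(weightFootprintGiB(parameters: 175e9, bitsPerParameter: 16)) // ≈ 326 GiB
```

Even before any compression, the SLM's weights sit within reach of a flagship phone's memory, while the LLM demands a multi-GPU server.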

By harnessing the power of SLMs, Apple ensures that its users benefit from enhanced performance without relying on constant cloud connectivity. This marks a breakthrough, particularly in areas like personal-context understanding, which can now be processed locally on devices. Furthermore, the lightweight nature of SLMs means that mobile devices can handle multiple tasks simultaneously while maintaining efficiency in power consumption and performance.

Another key reason why SLMs are ideal for mobile environments lies in the inherent privacy benefits they offer. With LLMs, user data is often transmitted to cloud servers, where it is processed and analysed. While robust security measures exist, this model leaves room for potential data exposure.

Apple’s integration of SLMs into its mobile architecture is underpinned by a focus on user privacy, where the majority of data processing occurs on-device. This ensures that sensitive personal information remains localised, enhancing security while reducing dependence on cloud-based data exchanges. This shift aligns with Apple’s broader commitment to responsible AI practices, particularly in light of increasing scrutiny over data privacy and security.

From generalised AI to personalised intelligence

Apple’s approach to SLMs also underscores the importance of personalised AI experiences. By focusing on smaller, more adaptable models, Apple ensures that its AI can be tailored to individual users’ needs. This is especially critical for mobile devices, which often house deeply personal data such as location, preferences, and usage habits.

SLMs can adapt to these datasets, offering personalised recommendations, suggestions and actions in real time without needing to send data to external servers. This approach allows for faster and more contextualised responses, enhancing the overall user experience and making mobile AI more responsive to individual preferences.

The technical underpinnings of Apple’s SLM approach further highlight why these models are optimal for mobile environments. Apple’s models are designed with quantisation techniques, which shrink the models with minimal loss of accuracy. By leveraging low-bit quantisation strategies, Apple’s on-device models can generate text and other outputs efficiently, with lower memory and computational demands. This is crucial for mobile devices, which operate with far more limited resources than the server farms typically required for LLMs. The optimisation of Apple’s models allows for smoother performance and less battery drain, a critical factor for mobile applications.
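To make "quantisation" concrete, here is a minimal sketch of symmetric linear quantisation, the simplest form of the idea. Apple's production schemes are considerably more sophisticated (per-group scales, mixed precision and related techniques), so treat this purely as an illustration of the trade-off: fewer bits per weight in exchange for a small reconstruction error.

```swift
import Foundation

/// Map 32-bit float weights onto a small integer grid, then reconstruct
/// approximate values from the integer codes and a single scale factor.
func quantise(_ weights: [Float], bits: Int) -> (codes: [Int8], scale: Float) {
    let levels = Float((1 << (bits - 1)) - 1)        // e.g. 7 levels for 4-bit
    let maxAbs = weights.map { abs($0) }.max() ?? 1
    let scale = maxAbs / levels
    let codes = weights.map { Int8(($0 / scale).rounded()) }
    return (codes, scale)
}

func dequantise(_ codes: [Int8], scale: Float) -> [Float] {
    codes.map { Float($0) * scale }
}

let weights: [Float] = [0.31, -0.92, 0.04, 0.57, -0.18]
let (codes, scale) = quantise(weights, bits: 4)
// Each weight now costs 4 bits instead of 32, at the price of a small error:
print(dequantise(codes, scale: scale)) // ≈ [0.26, -0.92, 0.0, 0.53, -0.13]
```

At 4 bits per weight instead of 16, a 3-billion-parameter model's weight storage drops from roughly 5.6 GiB to about 1.4 GiB.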

Moreover, SLMs allow Apple to integrate AI into a wider range of mobile applications. From enhancing Siri’s capabilities to offering intelligent in-app actions across iOS and macOS platforms, the integration of small, highly specialised models provides a more seamless user experience.

These models can prioritise notifications, summarise content, and even create custom images, all while operating within the constraints of mobile hardware. For instance, Apple’s use of adapters, which dynamically load small neural networks for specific tasks, allows the SLMs to switch between different activities without compromising on performance or quality. This versatility is a core advantage of SLMs over larger models that struggle to function efficiently across such a wide array of tasks on constrained devices.
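The adapter pattern is easy to picture in code. The sketch below is a deliberately simplified illustration of the idea of one frozen base model with small, swappable per-task modules; the type names and behaviour are ours, not Apple's implementation.

```swift
import Foundation

/// A simplified sketch of task-specific adapters: one resident base model,
/// with small per-task modules swapped in on demand.
struct Adapter {
    let task: String
    let transform: (String) -> String   // stands in for low-rank weight deltas
}

final class OnDeviceModel {
    private var adapters: [String: Adapter] = [:]

    func register(_ adapter: Adapter) {
        adapters[adapter.task] = adapter
    }

    /// The base model stays loaded; only the lightweight adapter changes per task.
    func run(task: String, prompt: String) -> String {
        adapters[task]?.transform(prompt) ?? "base output for: \(prompt)"
    }
}

let model = OnDeviceModel()
model.register(Adapter(task: "summarise", transform: { "summary of: \($0)" }))
model.register(Adapter(task: "proofread", transform: { "corrected: \($0)" }))
print(model.run(task: "summarise", prompt: "a long notification thread"))
print(model.run(task: "proofread", prompt: "an email draft"))
```

The design point is that the expensive asset, the base model, is loaded once, while each task costs only a small adapter rather than a whole new model.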

While Apple’s integration of SLMs through Apple Intelligence has garnered significant attention, Google has developed a comparable system through its Gemini AI platform.

What Apple has learned from Google

Google’s approach to SLMs is exemplified in Gemini Nano, a foundation model that operates locally on devices like the Google Pixel 8 Pro and the Samsung Galaxy S24 series. Much like Apple’s SLM strategy, Google’s use of on-device processing through Gemini Nano allows sensitive data to remain on the user’s device, enhancing privacy and security.

This localised approach ensures that personal data is not unnecessarily transmitted to the cloud, a feature increasingly sought after by privacy-conscious users. In addition to privacy, Google emphasises offline functionality. Gemini Nano continues to provide AI services even when the device is offline, an essential feature for users with inconsistent connectivity.

However, Google’s Gemini system offers slightly broader ecosystem integration than Apple’s SLMs. For instance, Gemini’s deep integration into the Android platform allows for greater interaction with native apps. Users can ask Gemini to pull data from across various apps, such as searching for a recipe in Gmail and adding ingredients to a shopping list in Google Keep, without needing to manually navigate between them. This level of integration streamlines workflows and adds to the seamless experience Google is aiming to create, a function also seen in Apple’s contextual in-app actions. Nonetheless, Google’s use of a unified assistant to access various services within its ecosystem offers a distinct advantage by consolidating user interactions through one centralised interface.

One of the more innovative aspects of Gemini is its conversational overlay, which elevates the typical assistant capabilities found in mobile AI. Known as Gemini Live, this feature allows users to have continuous, flowing conversations with the AI, adapting to deeper interactions as the conversation evolves. While Apple’s SLMs also offer highly personalised interactions, the conversational experience with Google’s Gemini appears more focused on an ongoing, natural dialogue that fits seamlessly into everyday tasks. Moreover, Google has designed Gemini Live to function in a hands-free mode, allowing users to interact while the phone is locked or during other activities, which brings an extra layer of convenience.

In terms of technical implementation, Google’s use of the Android AICore system, a foundational part of Android 14, parallels Apple’s on-device infrastructure. AICore is designed to pre-install Gemini models onto compatible devices, which not only streamlines the setup process but also enables high-performance, on-device inference without the need for developers to distribute models separately within their applications. This architecture mirrors Apple’s approach of embedding SLMs into iOS, which allows for quick and efficient task execution across various apps. However, Google’s strategy to support LoRA (Low-Rank Adaptation) fine-tuning of the models for specific use cases on the user’s device introduces an added layer of flexibility. This enables developers to customise models for specific tasks, potentially making Google’s implementation more adaptable in niche applications.
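The arithmetic behind LoRA explains why this kind of on-device customisation is feasible at all: rather than updating a full weight matrix, it trains two small low-rank factors. The layer dimensions and rank below are illustrative assumptions, not figures Google has published for Gemini Nano.

```swift
import Foundation

/// For a d×k weight matrix, LoRA trains factors A (d×r) and B (r×k)
/// instead of the full matrix, so trainable size scales with the rank r.
func loraParameterCount(d: Int, k: Int, rank: Int) -> Int {
    d * rank + rank * k
}

let d = 4096, k = 4096                              // one hypothetical layer
let full = d * k                                    // 16,777,216 trainable weights
let lora = loraParameterCount(d: d, k: k, rank: 8)  // 65,536 trainable weights
print("LoRA trains \(lora) weights, about \(Double(lora) / Double(full) * 100)% of the layer")
// ≈ 0.39% of the full layer: small enough to fine-tune, store and swap per use case
```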

Google’s Gemini system also shows promise in overcoming some of the challenges that generative AI faces in mobile environments. By offering different versions of Gemini, ranging from Nano to Pro, Google provides a scalable solution that can accommodate both mobile and more resource-heavy tasks depending on device capability. Users can opt for more robust cloud-based models when necessary, particularly for more complex operations, much like Apple’s hybrid use of SLMs and cloud-based LLMs. This flexible architecture makes Google’s system comparable to Apple’s, but with more options for scaling between different devices and task requirements.

It is also worth noting that Google’s AI development is closely integrated with its broader suite of services, such as Google Maps, YouTube, and Google Messages. This allows Gemini to enhance everyday experiences in unique ways, such as generating restaurant recommendations from YouTube videos or adding travel itineraries directly to Google Maps. Although Apple’s SLMs are deeply embedded in iOS and offer personalised interactions based on user context, Google’s breadth of integrations across its service ecosystem provides a more expansive range of use cases. For example, Apple’s SLM might suggest travel destinations based on a user’s previous photos, while Gemini can provide a more detailed itinerary that spans multiple services.

Both systems, however, face similar limitations in terms of complex reasoning and large-scale data processing. While on-device models such as Gemini Nano and Apple’s SLMs are incredibly efficient for everyday tasks and personalised recommendations, more intricate tasks that require advanced data analysis often still require server-side computation. Google addresses this by offering larger Gemini models through cloud-based APIs for more resource-intensive operations, ensuring that the mobile models remain lightweight without sacrificing the ability to scale up when needed.

What are the industry implications?

In terms of broader industry implications, Apple’s leadership in SLM-driven mobile intelligence may well dictate future trends in AI development. The performance gap between large and small models is quickly narrowing, and with companies like Microsoft and Meta also investing in smaller, highly efficient models, the era of massive LLMs may be drawing to a close. Apple’s integration of SLMs reflects a broader industry trend towards models that are not only more efficient but also more ethical in their data handling. The environmental footprint of SLMs, which require far less energy than their larger counterparts, also positions them as a more sustainable solution in the AI space.

While SLMs offer immense benefits, there are still certain limitations compared to their larger counterparts. SLMs, for instance, may not handle more complex reasoning or large-scale data analysis as effectively as LLMs. For these more intricate tasks, Apple integrates cloud-based models or collaborates with external AI systems, like OpenAI’s ChatGPT, to handle the overflow. This hybrid model, where SLMs manage personalised and context-specific tasks while LLMs handle broader, more computationally intensive queries, creates a balanced ecosystem. It optimises mobile device performance without sacrificing the advanced capabilities offered by larger models when required.
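As a sketch of how such a hybrid split might look in practice, the routing logic below keeps personal, tractable requests on-device and escalates heavier ones to a server-side model. The types, thresholds and complexity score are entirely illustrative assumptions; neither Apple nor Google has published its orchestration rules in this form.

```swift
import Foundation

/// Illustrative hybrid routing: prefer the on-device SLM, escalate to a
/// cloud LLM only when a task exceeds what the small model handles well.
enum Route { case onDeviceSLM, cloudLLM }

struct Request {
    let usesPersonalContext: Bool
    let estimatedComplexity: Int   // 1 (summarise a text) ... 10 (multi-step analysis)
}

func route(_ request: Request) -> Route {
    // Privacy first: personal context stays local whenever the task is tractable.
    if request.usesPersonalContext && request.estimatedComplexity <= 6 {
        return .onDeviceSLM
    }
    return request.estimatedComplexity <= 4 ? .onDeviceSLM : .cloudLLM
}

print(route(Request(usesPersonalContext: true,  estimatedComplexity: 3))) // onDeviceSLM
print(route(Request(usesPersonalContext: false, estimatedComplexity: 9))) // cloudLLM
```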

Our conclusion

Small language models represent the future of AI on mobile platforms, and Apple’s pioneering integration of SLMs into its devices underscores their suitability for this role.

By focusing on privacy, performance, and personalisation, Apple’s SLMs offer a powerful alternative to the cloud-heavy, resource-intensive LLMs that currently dominate the AI landscape. As mobile devices continue to evolve, the efficiency, adaptability, and security provided by SLMs will likely drive further innovation, positioning Apple and other tech giants at the forefront of AI-driven mobile experiences.

Authors

David Low
Director of Client Enablement
