Apple Introduces OpenELM AI Language Models

In the rapidly evolving world of artificial intelligence, a noteworthy trend has emerged: the growing popularity of “small language models” capable of running on local devices rather than relying on powerful cloud-based servers. Recently, Apple made significant strides in this area with the introduction of OpenELM, a collection of compact, source-available AI language models that can operate directly on smartphones. These models are primarily proof-of-concept at present, but they hint at future on-device AI capabilities from Apple.

OpenELM, short for “Open-source Efficient Language Models,” was unveiled by Apple and is available on Hugging Face under an Apple Sample Code License. Despite some licensing restrictions, the source code for these models is accessible. Apple’s move parallels similar efforts by Microsoft, which recently introduced the Phi-3 models, designed to deliver robust language understanding and processing performance in smaller packages suitable for local deployment. Microsoft’s Phi-3-mini, for instance, boasts 3.8 billion parameters, while Apple’s OpenELM models range from 270 million to 3 billion parameters.

In the landscape of AI models, parameter count serves as a rough indicator of capability and complexity. Meta’s Llama 3 family, with its largest model containing 70 billion parameters, and OpenAI’s GPT-3, which features 175 billion parameters, illustrate the scale of large language models. However, contemporary research has increasingly focused on improving the efficiency and performance of smaller models, narrowing the gap with their much larger predecessors.
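A rough back-of-the-envelope calculation shows why those figures matter for on-device use. The sketch below estimates the memory needed just to hold each model’s weights, assuming 16-bit weights and ignoring activations, the KV cache, and runtime overhead, so the numbers are illustrative lower bounds rather than measured footprints.

```python
# Rough memory estimate: bytes = parameters * bytes_per_weight.
# Assumes 16-bit (2-byte) weights; real usage also needs activations,
# the KV cache, and runtime overhead, so these are lower bounds.

BYTES_PER_WEIGHT = 2  # fp16 / bf16

models = {
    "OpenELM-270M": 270e6,
    "OpenELM-3B": 3e9,
    "Phi-3-mini (3.8B)": 3.8e9,
    "Llama 3 70B": 70e9,
    "GPT-3 (175B)": 175e9,
}

for name, params in models.items():
    gib = params * BYTES_PER_WEIGHT / 2**30
    print(f"{name:>20}: ~{gib:,.1f} GiB of weights")
```

By this estimate the largest models need hundreds of gigabytes for their weights alone, far beyond a smartphone’s memory, while the smaller OpenELM variants fit in well under a gigabyte.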

Apple’s eight OpenELM models are divided into two categories: four pretrained models, which function as raw, next-token predictors, and four instruction-tuned models, fine-tuned for following instructions, making them ideal for developing AI assistants and chatbots. The models include OpenELM-270M, OpenELM-450M, OpenELM-1_1B, OpenELM-3B, OpenELM-270M-Instruct, OpenELM-450M-Instruct, OpenELM-1_1B-Instruct, and OpenELM-3B-Instruct, all featuring a 2048-token maximum context window. These models were trained on a diverse set of publicly available datasets, amounting to approximately 1.8 trillion tokens.
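For a sense of what working with these checkpoints might look like in practice, the sketch below loads one of the instruction-tuned variants through the Hugging Face transformers library. The repository id, the trust_remote_code flag, and the pairing with a Llama-family tokenizer are assumptions about how the release is packaged, not confirmed details; the model card on Hugging Face is the authoritative reference.

```python
# Minimal sketch of running an OpenELM instruction-tuned checkpoint locally.
# Assumptions: the checkpoint lives under the "apple/" namespace on
# Hugging Face, needs trust_remote_code, and uses a Llama-family tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"   # assumed repository id
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # assumed companion tokenizer

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain what a small language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# The 2048-token context window bounds prompt plus generated tokens combined.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```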

Apple’s approach with OpenELM emphasises a “layer-wise scaling strategy,” which allocates parameters more efficiently across each layer of the model. This method not only conserves computational resources but also enhances model performance, even when trained on fewer tokens. According to Apple’s white paper, this strategy resulted in a 2.36 percent improvement in accuracy over Allen AI’s OLMo 1B, despite using half as many pre-training tokens.
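The idea behind layer-wise scaling can be pictured as a simple ramp: rather than giving every transformer block the same number of attention heads and the same feed-forward width, earlier layers are kept narrow and later layers are given more capacity. The toy sketch below illustrates that principle only; the layer count and scaling ranges are invented for illustration and are not OpenELM’s actual configuration.

```python
# Toy illustration of layer-wise scaling: attention heads and the
# feed-forward multiplier grow linearly with depth instead of staying
# constant, so the parameter budget is spent unevenly across layers.
# The numbers below are illustrative, not OpenELM's real configuration.

def layerwise_config(num_layers, min_heads, max_heads, min_ffn_mult, max_ffn_mult):
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = round(min_ffn_mult + t * (max_ffn_mult - min_ffn_mult), 2)
        configs.append({"layer": i, "heads": heads, "ffn_mult": ffn_mult})
    return configs

for cfg in layerwise_config(num_layers=8, min_heads=4, max_heads=16,
                            min_ffn_mult=2.0, max_ffn_mult=4.0):
    print(cfg)
```

A uniform architecture would spend the same width everywhere; the ramp keeps shallow layers cheap while giving deeper layers more capacity, which is how the paper reports getting more accuracy out of a fixed parameter budget.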

In a notable move towards transparency and reproducibility, Apple has released the code for CoreNet, the library used to train OpenELM, along with reproducible training recipes. This allows the model weights to be replicated, which is a rare level of openness for a major tech company. Apple highlights the importance of transparency and reproducibility in large language models, stressing their role in advancing open research, ensuring trustworthy results, and investigating biases and potential risks in the models.

By providing source code, model weights, and training materials, Apple aims to support and enrich the open research community. However, the company also warns that since these models were trained on publicly sourced datasets, there is a risk of generating outputs that could be inaccurate, harmful, biased, or objectionable. While these new AI language model capabilities have not yet been integrated into Apple’s consumer devices, rumours suggest that the upcoming iOS 18 update might include new AI features utilising on-device processing to ensure user privacy.

The annual Worldwide Developers Conference (WWDC) is a significant event for Apple, and WWDC24, scheduled to run from June 10 to June 14, 2024, will be no exception. The event, held at Apple’s Cupertino headquarters and streamed online, is expected to showcase the latest advancements in Apple’s platforms, technologies, and tools. The keynote presentation on the first day will likely introduce key features of the next round of software updates for iOS, iPadOS, macOS, watchOS, visionOS, and tvOS.

Speculation is rife that Apple may announce new hardware and its first significant foray into generative AI at WWDC24. Analysts and commentators suggest that Apple might collaborate with a partner like Google to integrate a chatbot into its operating system, design its own AI tools, or offer an AI App Store featuring various chatbots. Despite Apple’s use of AI across its products for several years, it has lagged behind competitors like Microsoft and Google in generative AI and large language models. However, Apple’s leadership is expected to address these developments during the keynote.

Apple’s quiet yet strategic expansion in AI capabilities involves a series of acquisitions, staff hires, and hardware updates aimed at enhancing AI functionality in its devices. According to PitchBook research, Apple has acquired 21 AI startups since 2017, including the recent purchase of WaveOne, an AI-powered video compression startup. This activity underscores Apple’s commitment to integrating AI into its ecosystem.

Morgan Stanley’s research note reveals that nearly half of Apple’s AI job postings now mention “Deep Learning,” indicating a focus on the algorithms driving generative AI. Apple hired Google’s top AI executive, John Giannandrea, in 2018, reflecting its serious intent in this domain. Although Apple has been secretive about its AI plans, insiders report that the company is developing its own large language models to power generative AI products like chatbots.

CEO Tim Cook has emphasised Apple’s responsible approach to AI research and innovation. The company’s goal is to operate generative AI on mobile devices, allowing AI chatbots and apps to function on the phone’s hardware and software rather than relying on cloud services. This ambition presents a technological challenge, requiring smaller, efficient AI models and high-performance processors.

Apple’s recent chip developments support this AI focus. The M3 Max processor, unveiled for the MacBook Pro, enables workflows previously impossible on a laptop, such as AI developers working with models containing billions of parameters. The S9 chip in the latest Apple Watch allows Siri to access and log data offline, while the A17 Pro chip in the iPhone 15 Pro features a neural engine twice as fast as its predecessor.

In October, Apple released an open-source LLM in partnership with Columbia University, called “Ferret,” which enhances AI’s ability to understand and interact with visual content. This technology could potentially be used as a virtual assistant capable of recognising and responding to visual cues, such as identifying clothing brands during a video call and facilitating purchases.

While Microsoft recently surpassed Apple as the world’s most valuable company, driven by its AI advancements, analysts remain optimistic about Apple’s future. Bank of America analysts upgraded their rating on Apple stock, citing the expected demand for new generative AI features in upcoming iPhone models. Laura Martin, a senior analyst at Needham, believes Apple’s AI strategy aims to enhance its ecosystem and protect its user base, rather than competing directly with Google and Amazon in the AI infrastructure market.

Apple’s methodical approach to AI ensures it waits for the right technological confluence to deliver refined, impactful solutions. With the impending launch of iOS 18 and new AI capabilities, Apple is poised to make significant contributions to the AI landscape, continuing its tradition of innovation and user-centric design.

For all my daily news and tips on AI and emerging technologies, sign up for my FREE newsletter at www.robotpigeon.be