A New Challenger in AI: A Deep Dive into Mistral AI

In the ever-evolving landscape of artificial intelligence, the latest breakthroughs are shaping a future where powerful language models become increasingly accessible. In recent months, Mistral AI unveiled Mixtral 8x7B, Meta introduced the Llama 2 family, and details about OpenAI’s GPT-4 architecture leaked out. These developments mark a significant stride toward democratizing AI, and they raise questions about the field’s capabilities, implications, and direction.

Paris-based Mistral AI, founded by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has been making waves in the AI community. With the announcement of Mixtral 8x7B, Mistral is positioning itself as a formidable force, challenging industry giants like OpenAI. The Mixtral model boasts a “mixture of experts” (MoE) architecture with open weights, promising performance comparable to OpenAI’s GPT-3.5.

What sets Mixtral apart is its 32K-token context window and its support for multiple languages, including French, German, Spanish, Italian, and English. Because the weights are open, the model can be downloaded and run locally, a level of freedom not commonly seen with closed AI models. According to Mistral, Mixtral outperforms Meta’s Llama 2 70B on most benchmarks, with particular strengths in compositional tasks, data analysis, software troubleshooting, and programming, a significant leap in the realm of AI capabilities.

The speed at which Mixtral caught up with GPT-3.5 has captivated the AI community. Users are reporting impressive performance, with some claiming to run Mixtral 8x7B locally at 27 tokens per second. A GPT-3.5-level AI assistant running on local devices is not only groundbreaking but also opens new possibilities for AI applications.
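For readers who want to try this themselves, here is a minimal sketch of loading Mixtral through the Hugging Face transformers library. The 4-bit quantization flag is an illustrative choice for squeezing the weights onto consumer hardware, not the only way to run the model:

```python
# A minimal sketch of running Mixtral locally via Hugging Face transformers.
# Assumes the mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint, the
# bitsandbytes package, and enough memory for the 4-bit quantized weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across available GPUs and CPU
    load_in_4bit=True,   # quantize to fit on consumer hardware
)

# The instruct variant expects the [INST] ... [/INST] prompt format.
prompt = "[INST] Explain mixture-of-experts in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation speed will vary widely with hardware and quantization level, so treat figures like 27 tokens per second as anecdotal rather than typical.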

Moreover, Mistral’s strategic approach of championing smaller models with eye-catching performance challenges the status quo dominated by large-scale models from OpenAI, Anthropic, or Google. Mistral’s commitment to open weights provides users with the flexibility to download and use the model with fewer restrictions than closed AI models, fostering a more collaborative and accessible AI ecosystem.

In a parallel development, Meta introduced Llama 2, a family of AI language models with a source-available commercial license. This move positions Llama 2 as a significant player in the market, allowing integration into commercial products. Ranging from 7 to 70 billion parameters, Llama 2 aims to outperform open-source chat models on various benchmarks, according to Meta.

The models come in two variants: pretrained, trained on 2 trillion tokens with a 4,096-token context window, and fine-tuned, tailored for chat applications in the style of ChatGPT. While Llama 2 might not match the performance of OpenAI’s GPT-4, it presents a compelling option for those seeking source-available models with commercial usability. Its availability on Microsoft Azure, with integrations planned for AWS, Hugging Face, and other providers, signals a potential shift in the dynamics of the large language model market.
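As a concrete illustration, here is a rough sketch of prompting one of the chat-tuned variants through the same transformers library; it assumes access to the gated meta-llama repository has been approved, and the 7B model is chosen only to keep the example small:

```python
# A rough sketch of querying a Llama 2 chat model via Hugging Face transformers.
# Assumes access to the gated meta-llama repository has been granted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # smallest of the chat variants
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The chat variants are fine-tuned on a dialogue format; apply_chat_template
# wraps the message in the [INST] ... [/INST] markers the model expects.
messages = [{"role": "user", "content": "What fits in a 4,096-token context window?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```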

However, Meta’s decision to require a special license for products with over 700 million monthly active users has sparked discussions about power dynamics in the AI industry. The openly licensed nature of Llama 2 also raises questions about potential risks, from misuse to the generation of spam or disinformation.

Moreover, the release of Llama 2 under a commercial license reflects Meta’s attempt to balance openness with control: the model is source-available, but its license regulates use in an effort to head off potential misuse.

Simultaneously, insights into OpenAI’s GPT-4 architecture have emerged, shedding light on the model’s scale and intricacies. According to the leaked details, GPT-4 comprises approximately 1.8 trillion parameters across 120 layers, dwarfing GPT-3’s 175 billion. It reportedly uses a Mixture of Experts (MoE) architecture with 16 experts of around 111 billion parameters each, meaning only a subset of the network is active for any given token.

Trained on a massive dataset of around 13 trillion tokens, including both text and code, GPT-4 stands as a testament to the advancements in AI training capabilities. Its reported training cost of roughly $63 million underscores the computational power such models require.
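To put those numbers in rough perspective, a common back-of-envelope rule estimates training compute at about 6 FLOPs per parameter per token. For an MoE model, the relevant count is the parameters active per token, which the same leaked report puts at around 280 billion; treat that figure, and the arithmetic below, as illustrative rather than confirmed:

```python
# Back-of-envelope training compute using the common ~6 * N * D FLOPs rule,
# where N is the parameters active per token and D is training tokens.
# The 280B active-parameter figure comes from the leaked report and is an
# assumption here, not a confirmed number.
active_params = 280e9   # assumed active parameters per token
tokens = 13e12          # reported training tokens
train_flops = 6 * active_params * tokens
print(f"~{train_flops:.1e} training FLOPs")  # ~2.2e+25
```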

The reported inclusion of a vision encoder in GPT-4, allowing autonomous agents to interpret web pages and transcribe images and videos, adds a multi-modal dimension to the model. And while Mixtral may not be a direct match for GPT-4, its performance on reasoning and coding benchmarks positions it competitively in the AI landscape.

The Mixture of Experts architecture represents a different way of scaling large language models, enabling more efficient training and inference. A gate network routes each input token to a small set of specialized neural network components, or “experts,” so the computational load per token is far lower than in a monolithic dense model of the same total size, as the sketch below illustrates.
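Here is a toy mixture-of-experts layer in PyTorch that makes the routing concrete. The layer sizes, the number of experts, and the top-2 routing are arbitrary illustrative choices, not GPT-4’s actual configuration:

```python
# A toy mixture-of-experts layer sketching the gate-and-route idea.
# All dimensions are illustrative, not any production model's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # gate network scores experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize their weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(16, 512)    # 16 tokens of width 512
print(MoELayer()(x).shape)  # torch.Size([16, 512])
```

Because each token passes through only two of the eight expert networks here, the compute per token is a fraction of what a dense layer holding the same total parameter count would require, which is exactly the appeal of the approach at GPT-4 scale.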

As the AI industry wrestles with openness, the debate around open models intensifies. Mistral’s Mixtral, Meta’s Llama 2, and OpenAI’s GPT-4 occupy very different points on the spectrum from open to closed. The open weights of Mixtral and Llama 2 allow for transparency, economic competition, and democratized access to AI, but that openness brings its own challenges, including the potential for misuse and ethical concerns.

Meta’s decision to back a major openly licensed, weights-available foundation model stands in contrast to the closed-source approaches of industry giants like OpenAI, Microsoft, and Google, and it could reshape the future dynamics of the AI market.

However, critics argue that open-source AI models carry potential risks, such as misuse in synthetic biology or in generating spam or disinformation. It’s easy to imagine Llama 2 filling some of these roles, although such uses violate Meta’s terms of service. Currently, if someone performs restricted acts with OpenAI’s ChatGPT API, access can be revoked. But with the open approach, once the weights are released, there is no taking them back.

In the wake of these announcements, the AI community finds itself at a crossroads. The unveiling of Mixtral 8x7B and Llama 2, together with the emerging details about GPT-4, signals a shift toward greater accessibility and usability of powerful language models. The next steps involve navigating responsible AI development, addressing ethical concerns, and finding the right balance between openness and control.

The future of AI appears promising, with local execution, commercial integration, and powerful architectures paving the way for innovative applications. As developers and researchers explore these models, it will be exciting to see how they are put to use. For daily news and tips on AI and emerging technologies, sign up for my FREE newsletter at www.robotpigeon.beehiiv.com