X has had its own AI chatbot, Grok, for a while, but it hasn’t drawn the same attention as OpenAI’s ChatGPT or Google’s Gemini. Even so, work on Grok has continued, drawing on data from X’s large user base.
The latest iteration, Grok-2, has now entered beta testing. In a recent blog post, xAI, the company that develops Grok, described Grok-2 as “a significant step forward from our previous model Grok-1.5, featuring frontier capabilities in chat, coding, and reasoning.” Alongside Grok-2, the company has introduced Grok-2 Mini, a smaller but still capable version of the full model.
An early version of Grok-2 was tested on the LMSYS chatbot leaderboard under the name “sus-column-r,” where it outperformed notable models such as Claude 3.5 Sonnet and GPT-4-Turbo.
On that leaderboard, the Elo score for the early Grok-2 model is higher than that of almost every other comparable chatbot, with the exception of OpenAI’s GPT-4o and Google Gemini. X also claims that Grok-2 and Grok-2 Mini perform competitively on several benchmarks:
- Graduate-level science knowledge (GPQA)
- General knowledge (MMLU, MMLU-Pro)
- Math competition problems (MATH)
Grok-2 also shows improvements on vision-based tasks.
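For readers unfamiliar with the Elo scores mentioned above, the following is a minimal sketch of the textbook Elo model, in which each head-to-head result nudges two ratings up or down. It is not the leaderboard's actual code, and the leaderboard's exact method may differ. The function names, the ratings, and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k (the K-factor) is an assumed value controlling how fast ratings move.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical ratings for two chatbots, not real leaderboard numbers.
    model_a, model_b = 1250.0, 1200.0
    print("Chance A is preferred:", round(expected_score(model_a, model_b), 3))
    print("Ratings after A wins:", update_ratings(model_a, model_b, 1.0))
```

Under this model, a higher score only means a model is expected to win more head-to-head comparisons. Two models with equal ratings are each expected to win half the time.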
Grok-2 is set to receive a new interface on the X platform, including the ability to generate images from prompts. Image generation is provided by the Flux model from Black Forest Labs. X also plans to offer Grok-2 through a new enterprise API later this month, which will run on a “bespoke tech stack” and require multi-factor authentication for added security.