SambaNova, Cerebras, and Groq Clash Over Token Speed in Wake of OpenAI o1 Launch [Video]

The battle for token speed is intensifying as SambaNova, Cerebras, and Groq push the limits of inference performance. SambaNova has set records on Llama 3.1 405B, Cerebras is delivering unmatched speeds with its WSE-3, and Groq's LPU is challenging traditional GPU makers, making the race to dominate inference hardware increasingly competitive.

Recently, OpenAI released its o1 series of models, which have reasoning abilities and the capacity to 'think' before responding.

OpenAI o1 is a prime example showing that reasoning doesn't require large models. Many parameters in current models are dedicated to memorising facts for trivia-like benchmarks. Instead, reasoning can be handled by a smaller 'reasoning core' that interacts with external tools, such as browsers or code verifiers.

This marks a significant shift towards inference-time scaling in production, a concept focused on enhancing reasoning through search rather than purely through learning. "This approach reduces the need for massive pre-training compute and a significant portion of compute is now allocated to inference rather than pre- or post-training," said NVIDIA's Jim …
