
SambaNova’s Llama 3.1 405B Model Hits 114 Tokens Per Second, Setting Speed Record [Video]

  • Last updated August 28, 2024
  • In AI News

The company’s technology is built around the SN40L chip, which features a reconfigurable dataflow architecture.

SambaNova Systems has set a world speed record with Meta’s Llama 3.1 405B model, processing 114 tokens per second. The result, verified by Artificial Analysis, is more than four times faster than other providers, positioning SambaNova as a leader in AI speed and efficiency.

“I’ve been playing with SambaNova Systems’s API serving fast Llama 3.1 405B tokens. Really cool to see the leading model running at speed. Congrats to Samba Nova for hitting a 114 tokens/sec speed record,” said DeepLearning.AI founder Andrew Ng.

The benchmark was set using a single 16-socket node, operating with full 16-bit precision on SambaNova’s custom RDU chips. This advancement addresses the challenge of balancing quality and speed in large models like Llama 3.1 405B, enabling the deployment of the model in more speed-sensitive applications, such as …
