OpenAI recently introduced a new model called sCM (simplified continuous-time consistency model), which achieves sample quality comparable to diffusion models but requires only two sampling steps, significantly speeding up the generative process for tasks like image, audio, and video generation.
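To make the two-step idea concrete, the sketch below shows generic two-step consistency sampling in PyTorch. It is a minimal illustration, not OpenAI's released code: it assumes a trained consistency model `f(x_t, t)` that maps a noisy sample at noise level `t` directly to a clean estimate, and the noise levels `t_max` and `t_mid` are illustrative placeholders rather than sCM's exact schedule.

```python
import torch

def two_step_sample(f, shape, t_max=80.0, t_mid=1.0, device="cpu"):
    # Step 1: map pure Gaussian noise at the maximum noise level
    # directly to a clean sample with a single model evaluation.
    z = torch.randn(shape, device=device) * t_max
    x0 = f(z, t_max)

    # Step 2: re-noise the sample to an intermediate level and apply
    # the model once more, refining detail the first step may miss.
    x_mid = x0 + torch.randn(shape, device=device) * t_mid
    return f(x_mid, t_mid)
```

This re-noise-then-denoise loop is the multistep sampling procedure from earlier consistency-model work; a standard diffusion model would instead need dozens of such evaluations, which is where the speedup comes from.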
In a new paper titled “Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models”, OpenAI researchers Cheng Lu and Yang Song argue that simplified continuous-time consistency models (sCM) offer a faster, more efficient way to generate high-quality samples than traditional diffusion models.
According to the researchers, sCM delivers a roughly 50x speedup in wall-clock generation time, making it far better suited to real-time applications across image, audio, and video domains. Its training stability and its ability to scale to large models and datasets further strengthen its edge over traditional diffusion models.
The researchers report that their largest model, with 1.5 billion parameters, was trained on ImageNet 512×512 and generates a high-quality sample in 0.11 seconds on a single A100 GPU.
The outcome: …