OpenAI o1 Likely Uses RL over Chains of Thought to Build System 2 LLMs [Video]

OpenAI recently released two models, OpenAI o1-preview and OpenAI o1-mini, marking a significant leap in the AI world. These models reason using chains of thought and reasoning tokens.

Jim Fan, in a recent post on X, mentioned that o1 models mark a significant shift towards inference-time scaling in AI, emphasising the importance of search and reasoning over mere knowledge accumulation. This approach suggests that effective reasoning can be achieved with smaller models.

By implementing techniques like Monte Carlo tree search (MCTS) during inference, the model can explore multiple strategies and scenarios to converge on optimal solutions. The key advantage of using MCTS during inference is that it lets the model consider many different approaches to a problem, rather than committing to a single strategy early on.
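To make the idea concrete, here is a minimal sketch of generic MCTS on a toy problem. It is not OpenAI's method, whose details are not public. The task, the constants `TARGET` and `MAX_STEPS`, and every function and class name are assumptions chosen for illustration: starting from 0, the searcher adds 1, 2, or 3 at each of three steps and is rewarded only if the total is exactly 9.

```python
import math
import random

# Toy task (an assumption for illustration, unrelated to any real model):
# add 1, 2, or 3 at each of MAX_STEPS steps; reward 1.0 only if the sum is TARGET.
ACTIONS = (1, 2, 3)
TARGET = 9
MAX_STEPS = 3


class Node:
    def __init__(self, total, steps, parent=None, action=None):
        self.total = total      # running sum so far
        self.steps = steps      # number of actions taken so far
        self.parent = parent
        self.action = action    # action that led from the parent to this node
        self.children = []
        # Terminal nodes have nothing left to try.
        self.untried = [] if steps == MAX_STEPS else list(ACTIONS)
        self.visits = 0
        self.value = 0.0        # sum of rewards backed up through this node


def reward(total):
    return 1.0 if total == TARGET else 0.0


def ucb1(child, parent_visits, c=1.4):
    # Mean reward plus an exploration bonus that shrinks with more visits.
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(total, steps, rng):
    # Finish the episode with uniformly random actions.
    while steps < MAX_STEPS:
        total += rng.choice(ACTIONS)
        steps += 1
    return reward(total)


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(total=0, steps=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child, if any remain.
        if node.untried:
            action = node.untried.pop()
            child = Node(node.total + action, node.steps + 1, node, action)
            node.children.append(child)
            node = child
        # 3. Simulation: estimate the node's value with a random rollout.
        result = rollout(node.total, node.steps, rng)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent
    # Recommend the first action that was explored most often.
    best = max(root.children, key=lambda ch: ch.visits)
    return best.action, root


if __name__ == "__main__":
    best_action, root = mcts()
    print("recommended first action:", best_action)
    for ch in root.children:
        print(f"action {ch.action}: visits={ch.visits}")
```

In this toy setting only a first move of 3 can still reach the target, so the visit counts show the search spreading its budget across all three options at first and then concentrating on the promising branch, rather than committing to one strategy from the start.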

Subbarao Kambhampati, professor at Arizona State University, said that OpenAI’s o1 model uses reinforcement learning over auto-generated chains of thought, similar to AlphaGo’s self-play approach, to optimise problem-solving by building a generalised …
