OpenAI’s new o1 model seems to mark a sea change in the development of AI, with ‘thinking time’ built in: the model takes time to reason before it starts to answer a prompt.
Early in the modern AI era, DeepMind’s AlphaGo represented a fundamental breakthrough of its own: it was one of the first game-playing AIs that took no human instruction and read no rules.
Instead, AlphaGo used a technique called self-play reinforcement learning (RL), building up its own understanding of the game through trial and error across billions of virtual games.
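To make the idea concrete, here is a minimal sketch of self-play learning on tic-tac-toe. This is not AlphaGo’s actual algorithm (which combined deep neural networks with Monte Carlo tree search); it only illustrates the core loop, in which a program improves purely by playing against itself and learning from wins and losses, with no human examples.

```python
# A toy version of self-play reinforcement learning: a tabular learner
# teaches itself tic-tac-toe with no human examples. AlphaGo's real
# method (deep networks plus Monte Carlo tree search) is far more
# sophisticated; only the core self-play loop is the same.
import random
from collections import defaultdict

Q = defaultdict(float)            # (board, move) -> estimated value
ALPHA, EPSILON = 0.3, 0.1         # learning rate, exploration rate
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == "."]

def choose(board):
    # Epsilon-greedy: mostly exploit what's been learned, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(moves(board))
    return max(moves(board), key=lambda m: Q[(board, m)])

def self_play_episode():
    board, player, history = "." * 9, "X", []
    while True:
        move = choose(board)
        history.append((board, move, player))
        board = board[:move] + player + board[move + 1:]
        win = winner(board)
        if win or not moves(board):
            # Credit every move with the final result: +1 for the winner's
            # moves, -1 for the loser's, 0 for a draw.
            for state, m, p in history:
                reward = 0.0 if win is None else (1.0 if p == win else -1.0)
                Q[(state, m)] += ALPHA * (reward - Q[(state, m)])
            return
        player = "O" if player == "X" else "X"

for _ in range(50_000):           # both "players" share one value table
    self_play_episode()
```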
ChatGPT and other Large Language Model (LLM) AIs were built more in the mold of the early chess AIs: they were trained on as much human knowledge as was available, including pretty much the entire written output of our species.
LLMs, however, specialize in language rather than in getting facts right or wrong. As a result, they can deliver wrong information in beautifully phrased, confident-sounding sentences.
Because language is a collection of gray areas, where there’s rarely an answer that’s 100% right or wrong, LLMs are typically refined using reinforcement learning from human feedback (RLHF), nudging them toward answers that sound closer to the kind of answer you were hoping for. A miniature version of that preference-learning step is sketched below.
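The snippet fits a tiny linear ‘reward model’ to human preference pairs using the Bradley-Terry objective that underlies RLHF reward modeling. The feature vectors and preference pairs are invented for illustration; real systems train a large neural network on huge numbers of human comparisons and then fine-tune the LLM against it.

```python
# A minimal sketch of the preference-learning step behind RLHF. A tiny
# linear "reward model" is fit so that answers humans preferred score
# higher than answers they rejected, by maximizing
# log sigmoid(r(chosen) - r(rejected)). All data here is invented.
import math
import random

def reward(w, x):
    # Linear reward model: score an answer's feature vector.
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, dim, lr=0.1, steps=2000):
    w = [0.0] * dim
    for _ in range(steps):
        chosen, rejected = random.choice(pairs)
        margin = reward(w, chosen) - reward(w, rejected)
        # Gradient ascent on log sigmoid(margin) pushes the preferred
        # answer's score above the rejected one's.
        g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
        for i in range(dim):
            w[i] += lr * g * (chosen[i] - rejected[i])
    return w

# Toy data: each answer is reduced to a 3-number feature vector, and a
# human labeler preferred the first answer of each pair.
pairs = [([1.0, 0.2, 0.0], [0.1, 0.9, 0.3]),
         ([0.8, 0.1, 0.4], [0.2, 0.7, 0.1])]
print(train(pairs, dim=3))  # weights now rank "chosen-like" answers higher
```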
The new o1 model is, in most respects, much the same as its predecessors, but it adds that ‘thinking time’ function, allowing it to consider a problem and reason its way through it before answering.
Unlike previous models, o1 is given the freedom to approach problems with a random, trial-and-error approach in its chain-of-thought reasoning. This is also the first time an LLM has been in a position to build a super-effective, AlphaGo-style understanding of a problem.
In the domains where it now surpasses Ph.D.-level capabilities and knowledge, it got its answers essentially by trial and error: by chancing upon correct answers across millions of self-generated attempts, and by building up its own theories along the way.
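OpenAI has not published how o1 was trained, so the following is only a cartoon of the idea. The placeholder function sample_reasoning, the verify checker, and the toy problem are all inventions for illustration: sample many attempts at a problem with a checkable answer, keep the rare successes, and treat them as the signal that makes similar reasoning more likely next time.

```python
# A cartoon of trial-and-error learning in a domain with a clear right
# answer. `sample_reasoning` is a hypothetical placeholder for a model
# sampling a chain of thought; nothing here reflects o1's actual
# (unpublished) training recipe.
import random

def sample_reasoning(problem):
    # Placeholder "model": guesses with some randomness. A real system
    # would sample a full reasoning trace from an LLM.
    return {"steps": ["guess"], "answer": random.randint(0, 20)}

def verify(problem, answer):
    # In domains like math or code, checking an answer is cheap and exact.
    return answer == problem["target"]

def collect_successes(problem, attempts=10_000):
    successes = []
    for _ in range(attempts):
        attempt = sample_reasoning(problem)
        if verify(problem, attempt["answer"]):
            successes.append(attempt)  # rare hits become training signal
    return successes

problem = {"question": "3 + 4 * 2 = ?", "target": 11}
hits = collect_successes(problem)
print(f"{len(hits)} of 10,000 random attempts found the right answer")
```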
In topics where there’s a clear right and wrong answer, AI is now taking its first steps past us on its own. It not only has human-generated reasoning steps to draw from, but is also free to apply them randomly and to draw its own conclusions about what’s a useful reasoning step and what’s not.