DeepSeek develops AI models that are capable of self-improvement
DeepSeek, together with China's Tsinghua University, is working on a new approach to training artificial intelligence models that should reduce training costs, Bloomberg reports.
Under the new approach, the AI reinforces its own learning, a method designed to help models respond better to human preferences. Reinforcement learning has proven effective in speeding up AI tasks in specialized areas.
However, applying this method to more general models has proven challenging, and that is the problem the DeepSeek team is trying to solve. The strategy outperformed existing methods and models in various tests, delivering better performance with fewer computational resources.
DeepSeek calls these new models DeepSeek-GRM (generalist reward modeling) and plans to release them as open source.