Chinese startup DeepSeek has announced the release of an experimental model, DeepSeek-V3.1-Exp, which uses a new technique called DeepSeek Sparse Attention (DSA). The technique is expected to improve efficiency when processing long text sequences and represents an intermediate step toward the next generation of AI architectures, the company said on its Hugging Face page.
DSA is designed to streamline the training and operation of models by computing attention over only a relevant subset of tokens rather than the full sequence, reducing computational costs without significantly sacrificing accuracy. This matters most for large language models, which work with billions of parameters and increasingly long contexts.
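The company has not published DSA's internals in this announcement, so the sketch below is only a generic top-k sparse attention example in PyTorch, not DSA itself: each query attends to its k highest-scoring keys instead of the whole sequence. The function name and the top_k parameter are illustrative choices, not anything from DeepSeek.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Illustrative top-k sparse attention (not DeepSeek's DSA).

    Each query attends only to its top_k highest-scoring keys, so the
    softmax and weighted sum cost O(n * top_k) instead of O(n^2).
    q, k, v: tensors of shape (seq_len, d_model).
    """
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.T) * scale                        # (n, n) raw scores; a real sparse
                                                      # implementation would also avoid
                                                      # forming this full matrix
    top_k = min(top_k, k.shape[0])
    top_scores, top_idx = scores.topk(top_k, dim=-1)  # keep the best keys per query
    weights = F.softmax(top_scores, dim=-1)           # normalize over kept keys only
    return torch.einsum("nk,nkd->nd", weights, v[top_idx])

if __name__ == "__main__":
    n, d = 1024, 64
    q, k, v = (torch.randn(n, d) for _ in range(3))
    out = topk_sparse_attention(q, k, v)
    print(out.shape)  # torch.Size([1024, 64])
```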
DeepSeek also announced that the new version supports the FP8 (Floating Point 8) format, which saves memory and speeds up calculations, making large models easier to run on limited hardware. Support for BF16 (Brain Floating Point 16), which offers greater accuracy during training, is planned for the future.
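As a rough sense of why the number format matters, the short sketch below compares the memory needed just to store the weights of a hypothetical 671-billion-parameter model (roughly the published DeepSeek-V3 parameter count, used here purely for illustration) at different precisions; activations, KV caches, and quantization overhead are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Rough memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30

# Hypothetical 671B-parameter model, weights only.
params = 671e9
for fmt, nbytes in [("FP32", 4), ("BF16", 2), ("FP8", 1)]:
    print(f"{fmt}: ~{weight_memory_gib(params, nbytes):,.0f} GiB")
# FP32: ~2,500 GiB   BF16: ~1,250 GiB   FP8: ~625 GiB
```

Halving the bytes per parameter roughly halves the weight footprint, which is why an 8-bit format makes the same model fit on far more modest hardware than a 16- or 32-bit one.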
In parallel, the company announced a halving of prices for its software tools, joining other Chinese developers trying to expand their user base through aggressive pricing.
The updated model will run on chips from Huawei Technologies, China's largest maker of AI processors. The partnership could be a major step forward for DeepSeek as it looks to maintain its lead after the success of its R1 model, which previously stunned the industry with its sophistication.