Researchers have created a diffusion-model-based neural network that decodes human brain activity (fMRI) and reconstructs what the person is seeing. Diffusion models are currently the most popular approach to AI image generation and power well-known services such as DALL-E and Midjourney.

As the researchers note, decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and lay a solid foundation for connecting human vision and computer vision through a brain-computer interface. However, given the scarcity of data annotations and the complexity of the underlying brain information, decoding images with verifiable details and meaningful semantics remains a challenging task.

The neural network was able to decode what a person sees based on brain activity

Using self-supervised learning, that is, pre-training the neural network on unlabeled brain activity data collected from different people, the authors obtained latent representations of fMRI signals. They then connected a previously trained text-to-image diffusion model to these representations through a transformer cross-attention mechanism. After fine-tuning on about 1,500 fMRI-image pairs, the model was able to decode what a person was seeing.
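The cross-attention step can be illustrated with a minimal sketch. This is not the authors' code: the token counts, dimensions, and weight matrices below are hypothetical, and a real model would run this inside every denoising step of the diffusion U-Net. The idea is that queries come from the image latents being denoised, while keys and values come from the fMRI-derived embedding, so the brain signal steers generation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, fmri_tokens, Wq, Wk, Wv):
    # Queries from the image latents; keys/values from the fMRI embedding.
    Q = image_tokens @ Wq
    K = fmri_tokens @ Wk
    V = fmri_tokens @ Wv
    # Scaled dot-product attention: each image token attends
    # to the fMRI tokens and mixes in their values.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 16                                     # hypothetical feature dimension
image_tokens = rng.normal(size=(64, d))    # latent image patches (hypothetical)
fmri_tokens = rng.normal(size=(8, d))      # fMRI embedding tokens (hypothetical)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = cross_attention(image_tokens, fmri_tokens, Wq, Wk, Wv)
print(out.shape)  # one conditioned feature vector per image token: (64, 16)
```

Each of the 64 image tokens receives a weighted blend of the 8 fMRI tokens; during fine-tuning, the projection weights learn which aspects of the brain signal are relevant at each spatial location.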

The researchers have already made the training data and code publicly available, while the model weights are provided upon request.