NVIDIA Blackwell: betting on "intelligence"
Each new NVIDIA graphics generation, beyond the across-the-board update, has its own emphasis. In one case the focus is on technical parameters: a more modern process node, more functional blocks, higher operating frequencies. In another, the developers put more effort into extracting maximum per-clock performance or improving energy efficiency. NVIDIA Blackwell is a somewhat special case. This generation of graphics cards arrived amid the mass adoption of artificial intelligence (AI) algorithms, so the manufacturer that effectively initiated and drove AI computing on consumer systems had to adapt: the share of "intelligent" work falling to the GPU has grown substantially. The processors built on the NVIDIA Blackwell graphics architecture and used in GeForce RTX 50 series graphics cards have received many deep transformations and interesting features. It is impossible to cover every change in a single article, but we will look at the most significant improvements.
Architectural accents
New solutions based on the NVIDIA Blackwell architecture bring not only more computing power but also fundamental changes in how GPUs process different types of data. With AI algorithms demanding both high-precision calculations and fast integer processing, NVIDIA designed for both scenarios: for the first time, Blackwell unifies the handling of floating-point (FP32) and integer (INT32) operations, letting the GPU adapt far more efficiently to the dynamic needs of modern games and AI applications.
In previous NVIDIA GPU architectures, Streaming Multiprocessors (SM) had separate paths or dedicated blocks for performing floating point (FP32) and integer (INT32) operations. This meant that if, for example, the GPU was performing a lot of FP32 operations and then needed to perform INT32 operations, those INT32 compute blocks could be idle, or vice versa. This created some bottlenecks.
In the Blackwell architecture, NVIDIA has redesigned the SM blocks so that each shader core (CUDA Core) can perform both FP32 and INT32 operations simultaneously and in parallel. This means that there are no separate, hard-coded "FP32 cores" and "INT32 cores" within the SM.
This unification brings several significant practical benefits, especially in the context of modern workloads. In many modern gaming and AI workloads, GPUs often have to switch between FP32 (rendering, physics, ray tracing) and INT32 (texture indexing, code branching, address calculations, certain AI operations). With unified cores, the GPU can perform both types of operations without waiting for specialized blocks to become free. This makes more efficient use of SM computing resources. For example, the peak INT32 performance of the GeForce RTX 5090 is 104.8 TOPS, compared to 41.3 TOPS for the RTX 4090.
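As a sanity check, the quoted figures follow directly from the published shader counts and boost clocks. The core counts and clocks below come from public spec sheets; counting two integer operations per core per clock is the usual convention for such peak numbers:

```python
# Back-of-the-envelope peak INT32 throughput (2 ops per core per clock).
# Shader counts and boost clocks are taken from public spec sheets.

def peak_tops(cores_int32: int, boost_ghz: float, ops_per_clock: int = 2) -> float:
    """Peak integer throughput in TOPS = cores * ops/clock * clock (GHz) / 1000."""
    return cores_int32 * ops_per_clock * boost_ghz / 1000.0

# RTX 5090 (Blackwell): all 21,760 CUDA cores can issue INT32.
print(f"RTX 5090: {peak_tops(21760, 2.407):.1f} TOPS")      # ~104.8

# RTX 4090 (Ada): only half of the 16,384 cores have an INT32 path.
print(f"RTX 4090: {peak_tops(16384 // 2, 2.52):.1f} TOPS")  # ~41.3
```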
The architecture is becoming more flexible and able to adapt to new types of workloads that may emerge in the future. If new technologies require an unexpected FP32 to INT32 ratio, Blackwell will be able to handle it better than previous architectures.
Another major architectural change in Blackwell concerns the significant improvement and deeper integration of Tensor Cores - specialized cores designed by NVIDIA to accelerate matrix calculations. The new GPUs have received the fifth generation of Tensor Cores, which not only offer increased speed for traditional AI tasks, but also introduce new precision formats such as FP4, which doubles the throughput and effectively expands the size of models that the GPU can process.
With the advent of the Blackwell architecture, NVIDIA is taking a significant step towards "neural rendering," where artificial intelligence becomes not just an auxiliary tool but an integral part of the image creation process. This is done through neural shaders: special software blocks that integrate small neural networks directly into the GPU shader code, enabling a new level of realism, greater efficiency, and lower resource consumption.
To more efficiently distribute heterogeneous shaders between compute units, the Blackwell architecture uses the accelerated Shader Execution Reordering 2.0 mechanism. Since the overall graphics pipeline can now contain a combination of typical and neural shaders, SER helps quickly divide tasks between CUDA compute units and tensor cores.
In Blackwell architecture solutions, the 4th generation RT cores have received a significant upgrade. In addition to a notable increase in performance, the ray tracing units are now equipped with special hardware modules: the Triangle Cluster Intersection Engine and the Triangle Cluster Decompression Engine. These engines let the GPU work efficiently with groups of triangles combined into clusters instead of processing each triangle separately.
This cluster-based approach, which NVIDIA calls Mega Geometry, radically changes how scenes with enormous polygon counts are handled. Compared with the traditional, extremely resource-intensive way of tracing against individual triangles, the cluster format significantly simplifies the task thanks to more efficient data organization and faster search for ray-geometry intersections.
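To make the idea concrete, here is a toy sketch of cluster-level culling: the ray is tested against a whole cluster's bounding box first, and per-triangle tests run only on a hit. This illustrates the principle only; the real hardware traversal is far more sophisticated:

```python
# Toy illustration of cluster-level culling: one box test can reject a whole
# group of triangles, instead of testing each triangle individually.

def ray_aabb_hit(origin, direction, box_min, box_max):
    """Slab test: does the ray intersect the axis-aligned bounding box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if not lo <= o <= hi:       # parallel ray outside the slab
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near, t_far = max(t_near, min(t1, t2)), min(t_far, max(t1, t2))
    return t_near <= t_far

clusters = [
    {"bounds": ((0, 0, 5), (1, 1, 6)), "triangles": 128},
    {"bounds": ((10, 10, 5), (11, 11, 6)), "triangles": 128},
]
origin, direction = (0.5, 0.5, 0.0), (0.0, 0.0, 1.0)

for i, c in enumerate(clusters):
    if ray_aabb_hit(origin, direction, *c["bounds"]):
        print(f"cluster {i}: descend to {c['triangles']} per-triangle tests")
    else:
        print(f"cluster {i}: rejected with a single box test")
```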
The 4th generation RT cores also received the Linear Swept Spheres (LSS) block, which is mainly designed to dramatically improve and accelerate ray tracing of complex, thin geometric objects, including hair and fur.
LSS in Blackwell's RT cores models thin objects (like hair) not with triangles but with swept spheres. This dramatically reduces the amount of geometric data, as each hair segment requires only two spheres instead of dozens of triangles. Hardware-accelerated intersection tests against these primitives make ray tracing such objects up to 2 times faster while saving graphics card memory.
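Rough arithmetic shows why the savings are so large. The counts below are illustrative assumptions (a sphere stored as center plus radius, a strand segment otherwise needing a couple dozen triangles), not figures from NVIDIA documentation:

```python
# Rough memory comparison for one hair-strand segment.
FLOAT_BYTES = 4
sphere_pair_bytes = 2 * 4 * FLOAT_BYTES   # two spheres (xyz + radius): 32 B
tri_bytes = 9 * FLOAT_BYTES               # one triangle (3 vertices): 36 B
tube_bytes = 24 * tri_bytes               # ~two dozen triangles: 864 B
print(f"swept spheres: {sphere_pair_bytes} B, triangle tube: {tube_bytes} B, "
      f"ratio ~{tube_bytes / sphere_pair_bytes:.0f}x")
```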
The main upgrade of the 5th generation tensor cores is support for the FP4 format, which performs twice as many calculations as FP8 (and four times as many as FP16) in the same amount of time. This is critical for inference of large language models (LLMs) and generative AI, where huge volumes of data must be processed quickly. In practice, it means faster chatbot responses and near-instant generation of images, video, or text. For many models the accuracy FP4 provides is quite sufficient, so doubling performance here is fundamentally important.
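For intuition, here is a minimal sketch of block-scaled 4-bit quantization in the spirit of the E2M1 format Blackwell accelerates; the block size and nearest-value rounding are simplified assumptions, not the hardware's exact scheme:

```python
import numpy as np

# Minimal sketch of block-scaled FP4 (E2M1) quantization. The E2M1 grid of
# representable magnitudes is {0, 0.5, 1, 1.5, 2, 3, 4, 6}; the block size
# and rounding here are illustrative simplifications.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray, block: int = 16) -> np.ndarray:
    shape = x.shape
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]  # per-block scale
    scale[scale == 0] = 1.0
    idx = np.abs(np.abs(x / scale)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(x) * FP4_GRID[idx] * scale).reshape(shape)   # dequantized

weights = np.random.randn(4, 16).astype(np.float32)
print("max abs error:", np.abs(weights - quantize_fp4(weights)).max())
# 4 bits per value (plus one shared scale per block) vs 16 bits for FP16:
# roughly a 4x smaller memory footprint.
```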
The integration of AI models into games creates new challenges in maintaining smooth and responsive gameplay. To optimize the workload, making the most efficient use of graphics card resources, the AI Management Processor (AMP) is introduced - a specialized coprocessor integrated directly into the Blackwell GPU.
Modern GPU workloads increasingly include both traditional rendering (rasterization, ray tracing) and AI-intensive computations (e.g., DLSS 4, neural shaders, NPC behavior models, content generation). These tasks compete for the same GPU resources (compute cores, memory), and their efficient coordination becomes critical. In environments where graphics rendering and AI computations occur simultaneously, AMP helps avoid resource conflicts.
It can prioritize response-critical tasks (such as DLSS Frame Generation in a game) over less latency-sensitive background AI processes. It also ensures proper synchronization between different processing stages, where the output of one AI calculation can be the input for the next graphics stage, and vice versa.
One of the notable features of the Blackwell architecture is the use of GDDR7 memory. It is twice as fast as GDDR6, while consuming half the power per bit of data transferred.
The biggest change is the signaling technology. GDDR6X uses PAM4 signaling, which encodes data with four voltage levels (two bits per symbol). GDDR7 switches to PAM3, which uses three levels and encodes three bits over every two symbols. Although PAM3 carries less data per symbol, its better signal integrity allows much higher symbol rates, so it transfers more data per second overall.
The use of GDDR7 chips has significantly increased overall memory bandwidth for GeForce RTX 50 series graphics cards. For the flagship GeForce RTX 5090 32 GB, the figure is an impressive 1792 GB/s, versus 1008 GB/s for the GeForce RTX 4090 24 GB. The move to a 512-bit bus, up from 384-bit on the previous-generation flagship, also contributed.
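The quoted bandwidth figures follow from the per-pin data rate and the bus width (28 Gbps GDDR7 on the RTX 5090, 21 Gbps GDDR6X on the RTX 4090):

```python
# Memory bandwidth = per-pin data rate (Gbps) x bus width (bits) / 8.
def bandwidth_gbs(data_rate_gbps: float, bus_bits: int) -> float:
    return data_rate_gbps * bus_bits / 8

print("PAM4: 2.0 bits/symbol;  PAM3:", 3 / 2, "bits/symbol")
print(f"RTX 5090: {bandwidth_gbs(28, 512):.0f} GB/s")  # -> 1792
print(f"RTX 4090: {bandwidth_gbs(21, 384):.0f} GB/s")  # -> 1008
```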
NVIDIA has paid a lot of attention to the issue of energy efficiency of solutions based on the Blackwell architecture. The power regulation system of various blocks on the chip has been improved, which allows for very fine-grained power control.
New clock-gating capabilities allow entire clusters to be switched off very quickly, saving dynamic power even when only part of the chip is idle, or when the idle window is so short that it would normally count as active time.
A new power rail has been added so that the GPU cores and the memory system are fed separately. The separate rails provide independent voltage control over large areas of the chip, which can be optimized per workload to raise performance, and also let Blackwell power down unused parts of the chip during short idle periods. Energy-efficient GDDR7 memory chips reduce consumption further.
The dynamic frequency control system has also been optimized: faster clock switching helps extract the GPU's full performance within a given power consumption budget.
For mobile solutions, the speed of transition between active and deep-sleep states is especially important: shorter phases at each stage save energy and thus extend a laptop's battery life.
Solutions with the Blackwell architecture are equipped with a significantly upgraded display engine. The graphics cards received DisplayPort 2.1 with UHBR20 support and a bandwidth of up to 20 Gbps per lane (80 Gbps over four lanes), enough to drive screens with very high resolutions and refresh rates.
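A rough link-budget check shows what UHBR20 makes possible. Blanking overhead and Display Stream Compression are ignored here, so treat the numbers as approximations:

```python
# DisplayPort 2.1 UHBR20: 20 Gbps per lane, 4 lanes, 128b/132b encoding.
link_payload_gbps = 20 * 4 * 128 / 132          # ~77.6 Gbps usable

def video_rate_gbps(w: int, h: int, hz: int, bits_per_pixel: int) -> float:
    return w * h * hz * bits_per_pixel / 1e9

rate = video_rate_gbps(3840, 2160, 240, 30)     # 4K, 240 Hz, 10-bit RGB
print(f"link ~{link_payload_gbps:.1f} Gbps, stream ~{rate:.1f} Gbps, "
      f"fits: {rate < link_payload_gbps}")
```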
The display engine also received High Speed HW Flip Metering, a hardware module that paces frame presentation for smoother, more consistent output in games and applications using DLSS 4 with Frame Generation, and especially Multi Frame Generation.
The updated video encoders/decoders support AV1 UHQ and MV-HEVC, and add hardware transcoding of video with 4:2:2 chroma subsampling.
Another major upgrade for Blackwell is the use of the PCI Express 5.0 connection interface, which has twice the bandwidth of the previous standard. Current desktop platforms have offered PCI-E 5.0 for some time, but the GeForce RTX 50 series are the first graphics cards to benefit from the increased bandwidth between the CPU and GPU.
Neural rendering
The use of neural shaders gives developers fundamentally new opportunities for visual experimentation, deep system optimization, and the practical implementation of ideas previously considered impossible due to resource requirements. NVIDIA offers developers the RTX Kit, a toolset for adopting this new rendering paradigm with AI developments actively involved.
RTX Mega Geometry is an NVIDIA technology that dramatically improves ray tracing performance in 3D scenes with extremely high levels of geometric detail. It is designed to solve one of the most difficult challenges in ray tracing: processing a huge number of triangles.
In modern games and professional applications (such as design, architecture, simulation), the number of triangles in a scene grows to millions, and sometimes billions. This is especially true with the advent of technologies such as Nanite in Unreal Engine 5, which allows developers to use cinematic-quality assets with extremely high geometric detail. Blackwell provides the hardware foundation to work seamlessly with this level of complexity.
Traditional ray tracing methods that rely on Bounding Volume Hierarchy (BVH) face challenges when dealing with such a huge number of triangles.
Thanks to the ability to efficiently handle "mega geometry," developers can create worlds of unprecedented complexity—dense forests, detailed cities, intricate character models where every leaf, pebble, or hair can be rendered with ray tracing.
Mega Geometry can be processed by all generations of RTX graphics cards, but GPUs with the Blackwell architecture and its dedicated cluster engines have an advantage.
NVIDIA also offers RTX Neural Texture Compression (NTC), a special method of compressing and decompressing textures using neural networks. Instead of traditional block-based compression algorithms (like BCn), which compress data in fixed blocks and can lose detail, NTC uses a small neural network (decoder) for each material. The original texture data is converted into a set of network weights and latent features, which are then passed through the decoder to reconstruct colors.
High-resolution textures consume a huge amount of video memory (VRAM) and disk space, whereas NTC demonstrates up to 8x compression compared to traditional methods, allowing developers to use higher-quality textures or more assets in a scene without running out of memory.
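Conceptually, an NTC material replaces stored texel blocks with a compact latent grid plus a tiny decoder network. The sketch below uses invented sizes and random weights purely to illustrate the decode path, not NVIDIA's actual network:

```python
import numpy as np

# Conceptual sketch of the NTC idea: texels are reconstructed by a tiny
# per-material MLP from a compact latent grid instead of stored BCn blocks.
rng = np.random.default_rng(0)

latent_grid = rng.standard_normal((256, 256, 8)).astype(np.float32)  # "learned" features
w1 = rng.standard_normal((8, 32)).astype(np.float32)                 # "learned" weights
w2 = rng.standard_normal((32, 3)).astype(np.float32)

def decode_texel(u: float, v: float) -> np.ndarray:
    """Fetch the latent vector at (u, v) and run it through the decoder MLP."""
    x = latent_grid[int(v * 255), int(u * 255)]
    h = np.maximum(x @ w1, 0.0)                  # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2)))      # sigmoid -> RGB in [0, 1]

print("decoded RGB at (0.5, 0.5):", decode_texel(0.5, 0.5))
```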
RTX Neural Materials is used to render extremely complex materials. Instead of describing the properties of a material using complex mathematical models or multiple textures, Neural Materials uses a trained neural network to synthesize the visual appearance of the material in real time.
Some materials, such as leather with its subsurface scattering, velvet with its unique shine, iridescent silk, or complex metals, require very high computational effort and complex shader models. Neural networks can approximate complex physical models with less effort, speeding up material processing by up to 5 times and reducing their memory footprint. Similar to Neural Textures, they can compress material information, making it more compact.
In addition to RTX Global Illumination, NVIDIA has developed a Neural Radiance Cache (NRC) technique to efficiently compute indirect global illumination in real-time, especially in path-tracing scenes. Traditional path tracing is very resource-intensive because it requires tracing many rays that bounce off surfaces multiple times to accurately calculate illumination.
Neural Radiance Cache uses a small neural network that learns during real-time rendering to cache and approximate indirect lighting. Instead of tracing every path to termination, NRC stops tracing after a few bounces and uses the trained network to infer the remaining indirect lighting at that point.
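The toy sketch below captures just the control flow, with a dictionary of running averages standing in for the actual neural cache; all values are illustrative:

```python
import random

# Toy sketch of the radiance-cache idea: trace a few bounces, then query a
# cache for the remaining indirect light instead of tracing to termination.
cache: dict[tuple, float] = {}

def cache_key(pos, direction, cell=0.25):
    return tuple(round(p / cell) for p in pos) + tuple(round(d, 1) for d in direction)

def lookup(pos, direction, fallback=0.1):
    return cache.get(cache_key(pos, direction), fallback)

def update(pos, direction, radiance, lr=0.2):
    k = cache_key(pos, direction)
    cache[k] = cache[k] + lr * (radiance - cache[k]) if k in cache else radiance

# One "path": bounce twice, then terminate into the cache. In a real tracer,
# pos/direction would be the last hit point and outgoing direction.
pos, direction, throughput, radiance = (0.0, 1.0, 0.0), (0.0, -1.0, 0.0), 1.0, 0.0
for bounce in range(2):
    throughput *= 0.7                                  # fake surface albedo
    radiance += throughput * random.random() * 0.05    # fake direct-light sample
radiance += throughput * lookup(pos, direction)        # cached tail of the path
update(pos, direction, radiance)
print(f"estimated radiance: {radiance:.3f}, cache entries: {len(cache)}")
```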
Skin is a tricky object to render in games. Unlike opaque materials like wood or metal, where light only reflects off the surface, translucent materials like skin behave differently. Light penetrates them, scatters inside, and then exits elsewhere.
To realistically recreate skin, NVIDIA has adapted cinematic Subsurface Scattering technology for real-time rendering using path tracing. RTX Skin is the first ray-traced implementation of subsurface scattering in gaming, giving artists the flexibility to control the intensity of the effect, achieving a new level of photorealism.
RTX Neural Faces offers a new approach to improving facial rendering quality using generative AI. Neural Faces takes a simple rasterized face and 3D position data as input and uses a real-time generative AI model to produce a more natural-looking face. The generative model is trained offline on thousands of images of that face across angles, lighting conditions, emotions, and occlusions.
This is far from an exhaustive list of new capabilities. In general, neural shaders allow you to rethink how images are created, moving from pure coding to a combination of traditional methods and the power of AI. This is the path to a new level of photorealism, efficiency, and reduced memory requirements, which is critical for the next generation of games and professional applications. In other words, this is not just "another feature", but a fundamental change in the approach to graphics, where AI becomes an active participant in the visual process.
Accelerating AI with NIM Microservices and AI Blueprints
One of the key features of the RTX 50 series launch is NVIDIA's commitment to bringing generative AI workflows directly to creators' PCs through NIM (NVIDIA Inference Microservices), out-of-the-box services that make it easier to run AI models on local RTX GPUs.
These microservices contain everything needed (the model itself, optimizations, APIs) so that developers and enthusiasts can integrate AI features into their applications with a simple API call. This significantly lowers the barrier to using AI models on the PC – without having to struggle with model tuning or optimization, creators can use capabilities such as image generation, text-to-speech, or advanced language models in their everyday creative applications.
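For example, a locally running NIM container exposes an OpenAI-compatible REST endpoint, so an application can call it with a few lines of code. The port and the model identifier below are placeholders; check the specific microservice's documentation:

```python
import json
import urllib.request

# Minimal sketch of calling a locally running NIM container through its
# OpenAI-compatible endpoint. Port and model name are assumptions.
payload = {
    "model": "meta/llama-3.1-8b-instruct",   # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize Blackwell in one line."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```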
To simplify AI-powered creativity, NVIDIA also introduced AI Blueprints, ready-to-use workflows built on NIM microservices. AI Blueprints are essentially reference applications that combine multiple AI models and tools to complete complex tasks.
For example, at CES 2025, NVIDIA unveiled its first two Blueprints: one that converts a PDF document into an audio podcast, and another that generates images based on 3D models. In the PDF-to-podcast Blueprint, the pipeline uses AI to extract text and images from a PDF, generate a script, and then synthesize audio, allowing users to efficiently create audio podcasts from written material.
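Schematically, that pipeline chains three stages. Every function below is a hypothetical placeholder standing in for a microservice call, not the Blueprint's real API:

```python
# Schematic flow of the PDF-to-podcast Blueprint; all functions are stubs.
def extract_text(pdf_path: str) -> str:
    return f"(text and images pulled from {pdf_path})"

def write_script(source_text: str) -> str:
    return f"(LLM-generated dialogue based on: {source_text[:40]}...)"

def synthesize_audio(script: str, out_path: str) -> str:
    return out_path  # a TTS model would render the script to audio here

audio = synthesize_audio(write_script(extract_text("report.pdf")), "podcast.wav")
print("pipeline output:", audio)
```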
The second Blueprint gives visual artists an intuitive way to control generated images: the creator blocks out a simple scene with 3D objects in Blender (placing shapes to lock in composition and camera angle), and the AI (based on the FLUX text-to-image model) generates a detailed image that matches that 3D scene.
This project effectively combines 3D mockup with image generation, giving artists greater control over AI art through spatial guidance instead of textual cues. These AI projects demonstrate how RTX 50 Series GPUs allow creators to use multiple AI models together for new workflows – all running locally on their own hardware.
FP4 Precision – Faster AI for Creative Work
The GeForce RTX 50 series of graphics cards based on the Blackwell architecture GPUs support FP4 (4-bit floating point) precision, which significantly improves local AI performance. Traditional AI models often use FP16 (16-bit) or FP8 precision; FP4 is an even smaller numerical format that serves as a form of aggressive model compression. As previously noted, the GPUs for the RTX 50 series of graphics adapters include new 5th generation tensor cores capable of performing FP4 calculations, effectively allowing more operations to be performed in parallel with less memory usage.
In practice, FP4 reduces the memory footprint of neural networks to half that of FP8 (and a quarter of FP16) while doubling inference throughput on the GPU. NVIDIA notes that this allows many large generative models to run on a single RTX 50 GPU, which previously would have required an entire data center server.
A concrete example is the FLUX.1 text-to-image model from Black Forest Labs: at full FP16 precision, FLUX.1 requires over 23 GB of VRAM to run, meaning it can only comfortably run on GeForce RTX 4090 or PRO GPUs. When quantized to FP4, the same model uses less than 10 GB – enough for more modest GPUs and runs much faster.
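The raw weight sizes behind those numbers are easy to estimate: FLUX.1's transformer has roughly 12 billion parameters, and weight memory scales linearly with bits per parameter. Text encoders, the VAE, and activations add overhead on top, which is why the measured totals sit above these raw figures:

```python
# Rough weight-memory estimate for a ~12B-parameter model at each precision.
PARAMS = 12e9

def weight_gb(bits_per_param: float) -> float:
    return PARAMS * bits_per_param / 8 / 1e9

for fmt, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{fmt}: ~{weight_gb(bits):.0f} GB of weights")
# FP16: ~24 GB, FP8: ~12 GB, FP4: ~6 GB
```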
In fact, on a GeForce RTX 5090 using FP4, FLUX.1 can generate an image in ~4 seconds, compared to ~17 seconds at FP16 on the RTX 4090 (or ~10 seconds at FP8). And thanks to NVIDIA's advanced quantization techniques (such as those in TensorRT Model Optimizer), this speedup comes with virtually no noticeable loss in output quality.
For creatives, this means that AI-powered tools (image generators, upscalers, AI effects in apps, etc.) run faster and handle larger projects without hitting memory limits. Complex AI-powered features—from image synthesis in design apps to machine learning-based effects in video editing—can now be used more seamlessly in your desktop workflow.
With FP4 support, the RTX 50 series effectively gives artists and content creators data center-class AI capabilities, allowing them to locally use AI models that were previously unavailable on consumer hardware.
New NVIDIA Broadcast Features for Streamers
Content creators who live stream or record videos have more options with the NVIDIA Broadcast 2.x update for the RTX 50 generation.
NVIDIA Broadcast software (part of the Studio suite) uses artificial intelligence to improve voice and video quality in real time, and the latest version introduces two important beta features: Studio Voice and Virtual Key Light.
Studio Voice: This effect uses AI to dramatically improve the sound quality of your microphone. It not only eliminates background noise (based on NVIDIA’s existing noise cancellation technology), but also improves the clarity and richness of your audio, making a regular home microphone sound like a high-end studio microphone. For podcasters, vloggers, or streamers, Studio Voice means your audience hears clear, professional-quality audio without expensive equipment or complex post-processing. Even in noisy environments, AI filters out distractions (keyboard clicks, room echoes, etc.) while preserving the speaker’s voice.
Virtual Key Light: Good lighting is crucial to video quality, and this feature uses AI to simulate professional studio lighting that illuminates your subject’s face. Virtual Key Light automatically analyzes the webcam image and illuminates your face with balanced light, as if you were standing in front of a physical softbox or ring light. The result is a more attractive, evenly lit look on camera, helping streamers and video call participants look their best without complicated lighting setups. This is especially useful for creators working in improvised environments – AI can instantly compensate for dim or uneven room lighting.
Studio Voice and Virtual Key Light are currently in beta and, due to their AI-intensive processing, require a powerful RTX graphics card (at least an RTX 4080 or the new RTX 5080). They are intended for creative broadcasts such as talk shows, webinars, or live art demonstrations, not game streaming (where GPU resources are needed for the game itself). Along with that, the Broadcast app update improves existing features such as Background Noise Removal (now providing even clearer voice isolation) and Eye Contact (which preserves the natural gaze of the speaker). The interface has been updated for ease of use, allowing you to run multiple AI effects at once.
It’s worth noting that NVIDIA has made it easy for third-party developers to integrate Broadcast effects into their own software. The underlying technology (such as Studio Voice or Virtual Key Light) is available through the NVIDIA Maxine SDK or even as NIM microservices for those who want to add these AI effects to their own applications. This means that in the future, creative applications or streaming platforms will be able to integrate NVIDIA’s AI features for broadcast.
Now, anyone with a compatible RTX GPU can download the free NVIDIA Broadcast app and instantly improve the audio and video quality of their content creation.
3D animation and visualization
For 3D artists, animators, and architects, the GeForce RTX 50 series offers a significant leap in real-time rendering power. With fourth-generation RT cores and increased memory capacity, the RTX 50 series is designed to handle complex scenes and ray-tracing workloads at high speeds. NVIDIA claims a 40% performance boost for 3D applications thanks to architectural improvements in the Blackwell GPU.
In practice, tasks like ray-traced rendering, physics simulation, and viewport manipulation are smoother. A large architectural visualization model or a dense animated scene that might have rendered at, say, 20 frames per second on the previous generation can now approach 28–30 frames per second on an equivalent RTX 50 GPU—a significant gain for interactive work.
Importantly, the GeForce RTX 5090 is equipped with 32 GB of ultra-fast GDDR7 memory (the RTX 5080 with 16 GB). This means creators can load extremely large 3D assets, high-resolution textures, or multi-app workflows into memory without restriction, which is especially useful for architectural visualization and VFX, where project files are large. The GeForce RTX 5080 can also accommodate large CAD models or detailed environments, ensuring smooth panning and editing, where less capable GPUs would slow down or simply run out of memory.
Another important feature is DLSS 4 (Deep Learning Super Sampling 4), which comes with the RTX 50 generation and benefits both gaming and creative workflows. In supported 3D visualization and rendering applications, DLSS 4 introduces Multi Frame Generation: the GPU's AI generates additional intermediate frames to increase the frame rate for interactive rendering.
For example, in tools like D5 Render or Chaos Vantage (popular in architecture and animation), DLSS 4 can generate up to three additional frames for every rendered frame, effectively quadrupling the frame rate without fully re-rendering the scene each time.
An animator working on a heavy scene or an architect doing a virtual tour will see much smoother results – similar to a four-fold increase in GPU rendering power – while maintaining rendering accuracy, as AI frames are formed from motion vectors and rendered frames. This enables near-real-time rendering (60+ FPS) even at high quality settings, making the creative process more WYSIWYG (“what you see is what you get”) and reducing the time it takes to wait for previews. When it comes time to render the final frames, the performance of ray tracing and AI noise reduction on the RTX 50 also reduces production time, meaning faster processing of high-quality images or animation sequences.
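A simplified throughput model shows why the gain is close to, but not exactly, fourfold; the per-frame generation cost below is an assumed figure for illustration:

```python
# Simplified Multi Frame Generation model: for each rendered frame, up to
# three AI frames are inserted; generating them is cheap but not free.
def effective_fps(render_ms: float, generated: int = 3, gen_ms: float = 1.0) -> float:
    total_ms = render_ms + generated * gen_ms
    return (1 + generated) * 1000 / total_ms

base = 1000 / 30                        # scene renders natively at 30 FPS
print(f"native: 30.0 FPS -> with MFG: {effective_fps(base):.1f} FPS")
```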
NVIDIA Studio Drivers: Optimized for Creators
At the heart of all of these hardware innovations is NVIDIA’s ongoing commitment to software stability and optimization through Studio Drivers. NVIDIA Studio Drivers are specialized versions of GPU drivers that are targeted at creative applications, not gaming, to give artists the most reliable and highest performance in their work.
To do this, each Studio Driver is rigorously tested using leading creative software—not just with a single application, but also in multitasking workflows that replicate real-world creative usage (e.g., editing in Premiere, then compositing in After Effects, then color correcting in DaVinci Resolve in the same project). This rigorous testing means that when you install a Studio Driver, you can be confident that your Adobe, Autodesk, Blackmagic, Unreal Engine, and other creative applications have been tested for compatibility with that driver version.
Studio Drivers are released less frequently than Game Ready Drivers, prioritizing stability and compatibility over day-one game optimization. Each Studio Driver update contains dozens of bug fixes and performance tweaks specifically aimed at content creation tools.
For example, Studio Driver releases are often timed to major software updates, such as an Adobe Creative Cloud release or the annual refresh of Autodesk products, ensuring that new features are accelerated on RTX GPUs and newly found bugs are fixed. NVIDIA regularly reports performance gains in creative applications with new Studio Drivers; previous releases, for instance, delivered an 8-12% increase in rendering or export performance in applications such as Blender Cycles and Photoshop.
9th generation NVENC and 4:2:2 video encoding
The RTX 50 cards feature NVIDIA's 9th generation encoder (NVENC) and 6th generation decoder, adding support for 4:2:2 color subsampling and improving HEVC/AV1 encoding quality. The flagship RTX 5090 features three hardware encoders (and two decoders), delivering 40% faster video export compared to the RTX 4090 and significantly faster than previous generations.
This multi-encoder configuration is also beneficial for streamers, as it provides higher quality broadcasts (approximately 5% better image quality at the same bit rate) in AV1/HEVC format for platforms such as Twitch and YouTube.
4:2:2 is a chroma subsampling standard that describes how color information (chroma) is stored relative to brightness information (luma) in a video file. The human eye is more sensitive to changes in brightness than color, so video formats often "compress" color data to reduce file size without significantly losing visual quality.
The designation 4:2:2 means that for every 4 pixels horizontally, 4 brightness values (luma) and 2 color values (chroma) are stored. That is, the color information is sampled at half the horizontal resolution, but at full vertical resolution.
4:2:2 provides a good balance between image quality and file size. It preserves significantly more color information than the more common 4:2:0 (used in most consumer video and streaming), while only increasing the file size by about 30% compared to 4:2:0.
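The size difference follows from the sample counts (J:a:b notation, evaluated over a 4x2 block of pixels):

```python
# Samples per pixel for common chroma-subsampling schemes: luma is always
# 1 per pixel; each of the two chroma planes gets (a + b) samples per 4x2 block.
def samples_per_pixel(a: int, b: int) -> float:
    return 1 + 2 * (a + b) / 8

for name, a, b in [("4:4:4", 4, 4), ("4:2:2", 2, 2), ("4:2:0", 2, 0)]:
    print(f"{name}: {samples_per_pixel(a, b):.2f} samples/pixel")
# 4:2:2 -> 2.0 vs 4:2:0 -> 1.5 samples/pixel: about a third more raw data.
```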
The additional color information gives video editors much more flexibility for precise and subtle adjustments during color correction. The format allows for cleaner cutting of objects from the background, providing sharper and more detailed edges, especially on small elements such as hair.
In video encoded in 4:2:0, color information can "blur" around text, making it harder to read. 4:2:2 keeps text much sharper.
Hardware support for 4:2:2 encoding and decoding in NVIDIA GeForce RTX 50 series graphics cards (based on the Blackwell architecture) gives content creators and professionals a real advantage: processing 4:2:2 video is otherwise a very resource-intensive task for the central processing unit (CPU) or requires expensive professional graphics cards. Blackwell provides hardware video encoders/decoders (NVENC/NVDEC) with full 4:2:2 support for H.264, HEVC, and AV1 (including MV-HEVC).
This makes converting video files to 4:2:2 much faster, for example up to 10x faster than CPU alone. It also enables smooth playback and editing of multiple 4K 4:2:2 streams simultaneously. For example, the RTX 5090 can decode up to 10 simultaneous 4K 30fps streams of 4:2:2 video per decoder.
Thus, 4:2:2 support on the Blackwell architecture significantly expands the capabilities of the RTX 50 Series graphics cards for professionals and video enthusiasts, making them even more powerful tools for creating high-quality content.
Hands-on experiments with GeForce RTX 5080
Graphics cards have long been the main computing power even in ordinary consumer systems. They handle resource-intensive tasks such as video transcoding, rendering, modeling, and media file processing, along with other workloads that require serious computation. Relatively recently, local work with large language models (LLMs) for applied tasks has joined this list of "heavy" workloads.
We conducted several quick tests to evaluate the performance of a system equipped with a GeForce RTX 5080 16GB graphics card. The platform used was an ASUS TUF Gaming GeForce RTX 5080 16GB OC graphics adapter and a Ryzen 7 9800X3D CPU (8/16; 4.7/5.2 GHz; 96 MB L3).
The factory-overclocked card uses a modified GPU clock profile: 2295/2700 MHz instead of the reference 2295/2617 MHz. The declared maximum AI compute performance is 1858 TOPS.
The GDDR7 memory chips operate at 1875 MHz (30 Gbps effective per pin). With a 256-bit bus, total memory bandwidth is 960 GB/s, nearly matching NVIDIA's previous-generation flagship, the GeForce RTX 4090 24GB (1008 GB/s).
The ASUS TUF Gaming GeForce RTX 5080 16GB OC has a 3.6-slot design with three fans and a large heatsink, an effective cooler that maintains proper temperatures even under maximum load.
During the experiments, the NVIDIA Studio Driver (576.80) was installed, chosen for its stability and optimization for content-creation workloads.
MLPerf Client 0.5
The MLPerf Client 0.5 test is designed to evaluate the performance of systems in working with artificial intelligence tasks, in particular large language models (LLM). This version uses the Meta Llama 2 7B model, optimized for reduced memory and computational requirements using 4-bit quantization. This brings the tests closer to real-world scenarios of using LLM on local hardware.
The benchmark includes four tests simulating common LLM usage scenarios: Content Generation, Creative Writing, and Summarization in Light and Moderate variants. In each, the AI generates text in response to different inputs.
Testing reports two key metrics. Time to First Token (TTFT) measures the delay between sending a request and receiving the first element of the response; for chatbots and other interactive AI applications, it defines the perceived responsiveness. Tokens Per Second (TPS) estimates the average rate of token generation after the first token arrives and indicates the system's sustained throughput.
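Both metrics are easy to derive from a streaming response. In the sketch below, a stub generator stands in for an inference engine's streaming API:

```python
import time

# Deriving TTFT and TPS from a streaming LLM response.
def stream_tokens():
    """Stand-in for a real streaming API: yields 50 tokens, ~5 ms each."""
    for _ in range(50):
        time.sleep(0.005)
        yield "tok"

start = time.perf_counter()
first_token_at, count = None, 0
for _ in stream_tokens():
    count += 1
    if first_token_at is None:
        first_token_at = time.perf_counter()
end = time.perf_counter()

ttft = first_token_at - start
tps = (count - 1) / (end - first_token_at)   # generation rate after first token
print(f"TTFT: {ttft:.3f} s, TPS: {tps:.1f} tokens/s")
```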
For the GeForce RTX 5080 16GB, we have the following geometric mean indicators: Time to First Token (TTFT) – 0.113 s, Token Rate (TPS) – 157.3 tokens/s.
For comparison, a system with a GeForce RTX 4070 Ti SUPER 16GB offers a TTFT of 0.166 s, and an average rate of 120.44 tokens/s.
If you are curious about the capabilities of the top GeForce RTX 5090 model in this discipline, we present the corresponding MLPerf Client 0.5 results obtained from NVIDIA. Having verified the RTX 5080 figures in practice ourselves, we can say the flagship's numbers are frankly impressive.
Procyon AI Text Generation
Developing the "text" theme, we also tested the system's capabilities in the Procyon AI Text Generation test, which focuses on evaluating device performance when performing text generation tasks using large language models (LLMs).
It uses several models of different sizes and architectures at once, which allows for a more comprehensive and representative assessment of AI performance on a wide range of devices and LLM usage scenarios.
Microsoft Phi-3.5-mini: A lightweight, efficient model, often used for testing on integrated GPUs (iGPUs) or less powerful devices.
Mistral 7B: A very popular 7-billion-parameter model, known for its efficiency and high performance. Tested on both iGPUs and discrete graphics cards.
Meta Llama 3.1-8B: One of the latest versions of Llama from Meta, offering improved performance. Also used on iGPUs and discrete graphics cards.
Meta Llama 2-13B: A larger model that typically requires more powerful hardware (video cards with increased memory capacity).
As expected, the GeForce RTX 5080 delivers very good results. Note that running Meta Llama 2-13B with the ONNX Runtime inference engine required more than 12 GB of video card memory, so experimenting with large models calls for powerful graphics adapters with at least 16 GB of memory.
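A quick estimate of weight memory at different precisions illustrates the point; runtime overhead (KV cache, activations, buffers) comes on top, and the precision a given engine picks determines the final footprint:

```python
# Weight memory alone, before runtime overhead, at FP16 and 4-bit precision.
def weights_gib(params_billions: float, bits: float) -> float:
    return params_billions * 1e9 * bits / 8 / 2**30

for model, params in [("Phi-3.5-mini", 3.8), ("Mistral 7B", 7.0),
                      ("Llama 3.1-8B", 8.0), ("Llama 2-13B", 13.0)]:
    print(f"{model}: {weights_gib(params, 16):.1f} GiB @FP16, "
          f"{weights_gib(params, 4):.1f} GiB @4-bit")
```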
Procyon AI Image Generation
Generating images from text descriptions has recently become a very popular way to exercise a system's AI capabilities. The Procyon AI Image Generation test evaluates exactly these platform capabilities.
The benchmark uses a standardized set of text prompts to generate images, ensuring consistent workloads. The testing includes several stages built on different versions of the Stable Diffusion model (1.5 and XL), allowing for performance measurements on both powerful discrete GPUs and less demanding NPUs.
The results of testing the platform with the GeForce RTX 5080 are presented in the corresponding screenshots. Among the advantages of GPUs with the Blackwell architecture is FP4 support: running models at this precision roughly doubles generation speed compared to FP8.
Blender Benchmark 4.4.0
The benchmark focuses primarily on rendering using Cycles, Blender's physically accurate renderer. Cycles uses ray tracing to create realistic lighting, shadows, and reflections, making it very demanding on hardware.
Based on the results of the Monster, Junkshop, and Classroom component stages, the system with the GeForce RTX 5080 scores just over 9,000 points.
Blender Benchmark also allows the same workloads to be run on the central processor. The corresponding screenshots show the results of the Ryzen 7 9800X3D: a final overall score of 327 points, a clear illustration that for certain resource-intensive tasks there is simply no alternative to a powerful graphics card.
V-Ray 6 Benchmark
To test the capabilities of the video card, we also used V-Ray 6 Benchmark, a free tool from Chaos Group (developers of the V-Ray renderer), which allows you to objectively assess the capabilities of the GPU in conditions close to real professional activity.
Testing runs the V-Ray renderer itself, which is widely used for photorealistic visualization in architecture, design, film, and animation.
According to the test results, the system with the GeForce RTX 5080 scored 9200 points in GPU rendering. For comparison, the previous-generation RTX 4080 typically scores around 7700 points in this test, and the RTX 4070 Ti SUPER around 6600.
So what does the GeForce RTX 5080 offer for those working with demanding tasks? Thanks to the Blackwell architecture, it brings a tangible boost in AI performance, which is especially important for working with large language models (LLM) and image generation. When it comes to 3D rendering and visualization or video processing, the RTX 5080 also proves to be a powerful tool, significantly accelerating calculations thanks to improved tensor and RT cores.
With active support and the continuous development of the GPU computing ecosystem, the higher-end models of the GeForce RTX 50 line make it possible to execute truly complex projects effectively, providing the speed and stability needed for comfortable work with modern applications.
A wide selection of NVIDIA GeForce RTX 50 graphics cards is available in the Telemart.ua online store