Saturday, 2 September 2023

MUSINGS: nVidia's GeForce RTX 4090. Power Limited Overclock settings. On technological progress into cognitive domains.

For this post, I thought I'd explore technological progress in the computing space. While I speak mainly about audio topics in these pages, I hope this post gives you an idea of the broader technological progress happening around us.

As per the image above, I've updated my workstation graphics card to a Gigabyte GeForce RTX 4090 Gaming OC. This card is truly a beast, taking up 3.5 slots in the computer thanks to the large vapor-chamber cooler. [Related to this card is the slimmer Gigabyte GeForce RTX 4090 WindForce V2 that "just" takes up 3 slots.]

Back in 2017, I spoke about the GTX 1080 used in my game machine of that time. Since then, technology has obviously moved forward very substantially! Every couple of years, nVidia churns out a new generation of graphics cards, typically with AMD following suit, and these days Intel is making headway with its more budget-friendly ARC line.

With each iteration, we're seeing objective improvements in computational ability and physical characteristics like the shrinkage of the transistor "process" from 16nm in the GTX 1080 down to 5nm (near the limits of silicon) in the RTX 4090 today. With the ability to fit more transistors into a smaller area (higher density), the number of parallel computational cores has increased from 2560 shading cores (also known as CUDA cores) in the GTX 1080 to 16384 in the RTX 4090, a 6.4x increase in this one metric alone - not to mention the addition of new features like the Tensor cores (high-speed, mixed-precision matrix multiplication) introduced in the RTX 20*0 generation, important for deep-learning tasks.

Let's look at the theoretical computational speed by considering 32-bit floating point processing. Here's what it looks like over the last few generations, focusing on just the higher-end GTX/RTX *080(Ti) and *090 GPU models (we'll ignore the lower-end GTX 16 series here):

With architectural improvements beyond just the core count, notice that TFLOPS (trillions of floating point operations per second) increased about 10x over the last 7 years. Furthermore, this performance evolution isn't just in the GPU itself but also in essential parts like the GDDR RAM subsystem and the widening memory bus - 256-bit for the 1080/2080/4080, 320-bit for the RTX 3080, 384-bit for the RTX 3090/4090. At this point, the RTX 4090 memory bus is capable of >1TB/s.
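By the way, these headline numbers aren't magic - the usual back-of-envelope formula is 2 FLOPs (one fused multiply-add) per CUDA core per clock, and peak memory bandwidth is just the effective data rate times the bus width. Here's a quick Python sketch; the boost clock and data-rate figures are the published reference specs, so treat the outputs as approximations:

# Theoretical FP32 throughput: 2 FLOPs (one fused multiply-add) per CUDA core per clock
def fp32_tflops(cuda_cores, boost_clock_ghz):
    return 2 * cuda_cores * boost_clock_ghz / 1000.0

# Peak memory bandwidth: effective data rate (Gbps per pin) x bus width (bits) / 8
def mem_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    return data_rate_gbps * bus_width_bits / 8.0

print(f"GTX 1080: {fp32_tflops(2560, 1.733):.1f} TFLOPS")          # ~8.9 TFLOPS
print(f"RTX 4090: {fp32_tflops(16384, 2.52):.1f} TFLOPS")          # ~82.6 TFLOPS, roughly 10x
print(f"RTX 4090 GDDR6X: {mem_bandwidth_gbs(21, 384):.0f} GB/s")   # ~1008 GB/s, just over 1TB/s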

The top picture is a hasty photo I took when I first unboxed this monster! Weighing in at >2kg, the RTX 4090 card includes an "anti-sag" retaining bracket that one should attach to the motherboard, especially if the card is oriented at 90°. There's also a 16-pin 12VHPWR cable which is the standard for new power supplies (like this GAMEMAX 1300W).

I didn't have my GTX 1080 card available to show, but here's a look at the 4090 compared to a 3GB GTX 1060:

For FP32 computation, the little GTX 1060 can deliver a "mere" 3.9 TFLOPS.

With the large 110mm fans, the card runs quietly for the most part but speeds up and is audible under a heavy load. Nothing too obtrusive or distracting though, and it's a low-pitched hum. The card has dual BIOS, with a little switch at the top for changing between "OC" and "SILENT" modes. I've actually never tried the "SILENT" mode since it's quite quiet already.

Note the little OC/SILENT BIOS switch.

I've heard that some people have complained about high-pitched "coil whine" from these high-power cards. I have had no issues at all with this Gigabyte even under heavy load (here's a comparison with the ASUS TUF 4090). No melting of the 16-pin power connector here either - those reports were likely due to user error, with poorly seated connections heating up given the high current demands these days. Make sure to plug things in firmly, especially the 12VHPWR-to-PCIE header adaptors. :-|

Typical arrangement of outputs: single HDMI 2.1a (48Gbps) + 3 DisplayPort 1.4a (32.4Gbps).

A card this size takes up a huge chunk of real estate even in a full tower case. In fact, I had to remove a multi-drive cage to accommodate the length of this card:
Anti-sag retainer in place under the card, not visible in the image. Notice the 4x 8-pin PCIE to 12VHPWR adaptor cable sticking up top - make sure there's some case clearance! 180°/90° adaptors might be helpful.

I'm putting this in my Ryzen 9 3900X, 32GB workstation discussed a few years back. For 4K games at >60fps, the 12-core Ryzen 9 should not be a limiting factor with this GPU. The AM4 X570 chipset motherboard supports PCIe 4.0 for full transfer speed as well. I may upgrade the CPU and motherboard (tempted by Intel's lower-power i9-13900), but this current set-up is fine for my uses for now. The EVGA G5 750W power supply is still serving me very well; these days nVidia recommends starting at 850W for the RTX 4090 though, with 1000W being ideal.

The reference RTX 4090 has a base clock of 2235MHz and a "boost" clock of 2520MHz. Third-party cards like Gigabyte typically add an overclock in the BIOS which is often small - for example, my Gigabyte RTX 4090 Gaming OC has upped the "boost" speed to 2535MHz, an insignificant 15MHz over the 2520MHz "boost" of the reference "Founders Edition" card.

Here's a look at the GPU-Z page for this card with some tech details:

Notice in the "GPU Clock" section, I'm actually running this card overclocked with base 2360MHz and boost 2660MHz, with memory overclock from 1313MHz to 1438MHz when I took this snapshot. We will talk about this stuff later. Notice that I've enabled "Resizable BAR" - see this for more details. It's typically a setting in the BIOS.

Let's make sure it "works" as expected using benchmarks. These days the 3DMark "Speed Way" is becoming popular for the latest DirectX 12 Ultimate API with all kinds of funky shader effects and ray-tracing. Here's what I get with this machine using the RTX 4090 at stock speed:

3DMark Speed Way scene.

Not bad - the card is basically running at the "average" submitted speed. Nothing to be afraid of if the score is a little lower than average. A bunch of overclockers submit their results, hence that "Best" score up above 11000! A score close to "average" is already very good and as expected.

Some games are still based on DirectX 11, so let's have a look at the 4K 3DMark Fire Strike Ultra:

Interesting that my score is a little lower than "average". Again, I wouldn't get too concerned given the kinds of systems people are testing their gear with. So long as my results are in the "zone" of that spike in the graph of hardware results, it's as expected.

Something interesting about the high-speed GDDR RAM is that we can turn on error-correction (ECC):

By doing this, RAM access will slow down slightly. Here's what happened to Speed Way and Fire Strike Ultra with ECC "ON":

We're seeing between a 1.5-6% drop in performance with ECC on. It looks like there's more of a performance hit in Speed Way, perhaps because the advanced shaders and ray-tracing effects are more dependent on RAM speed.

Normally for gaming, there's no need to turn on ECC. In the past, this ECC feature was offered on the professional nVidia cards like the Quadro and Tesla models. I could turn this on for more "critical" computational tasks.
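For those who'd rather script this than click through the driver control panel, nVidia's management library can query and set the ECC mode. Here's a minimal sketch using the Python nvidia-ml-py (pynvml) bindings - I'm assuming the 4090 is GPU index 0; setting the mode needs administrator rights plus a reboot, and whether the consumer driver exposes it this way may vary:

# Minimal sketch: query/toggle GDDR ECC through NVML (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)           # assuming the 4090 is GPU 0

current, pending = pynvml.nvmlDeviceGetEccMode(gpu)  # 0 = disabled, 1 = enabled
print(f"ECC current: {current}, pending after reboot: {pending}")

# Request ECC ON for more "critical" compute work (takes effect on next reboot):
pynvml.nvmlDeviceSetEccMode(gpu, pynvml.NVML_FEATURE_ENABLED)

pynvml.nvmlShutdown()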

For fun, let's compare these scores with my old nVidia GTX 1080 card, currently running at stock speed in an overclocked 4.4GHz Intel i7-3770K gaming rig. Since the GTX 1080 can't run the DirectX 12 Ultimate 3DMark Speed Way, here's the DirectX 11 Fire Strike Ultra:

Notice how the graphics performance is not being limited by the old i7 CPU. You can see on the "CPU Clock Frequency" graph how, through most of the Graphics Test, the clock speed is bouncing below 4GHz, indicating the CPU is not even being strained. In this test, the GTX 1080 scores less than 25% of the RTX 4090's result. Obviously at 4K with this test, the system is GPU limited.

For those of you gamers familiar with benchmarks, you might be wondering why I'm not showing 3DMark Time Spy. Unfortunately, it looks like there's a bug somewhere between Windows 11 22H2 and 3DMark, at least on some systems like mine. When I try Time Spy, it's not running with full CPU load, so the results have been inaccurate on my RTX 4090 - down by up to 50%. I've seen this issue mentioned in a few message threads already.

A quick look at overclocking...

Well, with a beast like this and that huge cooler system, let's try some overclocking. This is about the fastest I could stably run it on air using MSI Afterburner:

You can see the MSI Afterburner window in the bottom right with Core Clock set +200MHz, and Memory Clock +1500MHz. With a score of 10,355 points, we're about +6% over the stock settings while still keeping Power Limit at 100%.

Of course, a single reading isn't all that meaningful if it's not stable. I have no plans to keep the machine overclocked like this, so let's just do a modest "stress test" and loop Speed Way 10x to make sure it doesn't crash:

Nice - 99% frame rate stability across the 10 runs, which tells us that there's no issue with thermal throttling. We can look at the Monitoring details to make sure everything looks alright:

Notice that GPU temperature stays below 72°C (temperature limit typically >80°C). The CPU is getting a bit toasty at around 95°C though!
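If you'd rather log these numbers yourself than squint at the monitoring graphs, a little Python polling loop with the nvidia-ml-py (pynvml) bindings can record temperature, power and clocks while a stress loop runs in another window. Just a rough sketch, nothing fancy:

# Rough sketch: poll GPU temperature, power draw and clocks every 5 seconds
# while the benchmark loops (pip install nvidia-ml-py).
import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    while True:
        temp  = pynvml.nvmlDeviceGetTemperature(gpu, pynvml.NVML_TEMPERATURE_GPU)  # °C
        watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0                       # mW -> W
        core  = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_GRAPHICS)     # MHz
        mem   = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_MEM)          # MHz
        print(f"{temp}°C  {watts:.0f}W  core {core}MHz  mem {mem}MHz")
        time.sleep(5)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()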

Given the punishing load of Speed Way, stability with 10 loops is probably good enough for most games. To be even safer, I'd recommend stability with 20 loops which is the usual 3DMark default. Yes, I can push this even faster by increasing the "Power Limit" to say 110%, but I don't see the need...

My daily settings: Power Limit Overclock.

Philosophically, I always like to find a reasonable balance between the technology and what I'm using it for. For this kind of computing performance, it's no surprise that at 100% power limit, the RTX 4090 can suck up to 432W. You can easily feel that heat dissipation in the room under persistent load. It's good to save power if I can for all kinds of reasons, from being a good steward of resources all the way to just keeping heat production low during the warm days of summer. Lower heat should also improve service life, and the reduced draw helps since I'm currently still running a lower-power 750W PSU.

Here's the setting I run my RTX 4090 at most of the time:

Notice that in MSI Afterburner, it's set to an 80% power limit which reduces maximum demand down to 350W, while pushing the GPU clock speed +130MHz and Memory +1300MHz. This gets my 3DMark Speed Way score to 9800, which is around the reported systems average (including those overclocked submissions of course). If I didn't overclock the GPU & Memory, I'd get a score of around 9420 at the 80% power limit, which is about 4% lower.
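For what it's worth, the power limit itself can also be set programmatically through NVML rather than the Afterburner slider (the core/memory clock offsets are still easiest in Afterburner). A hedged Python sketch with nvidia-ml-py - values are in milliwatts, the call needs administrator privileges, and the allowed range depends on the card's BIOS:

# Sketch: query the allowed power-limit range and cap the board at ~350W.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)
default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(gpu)
print(f"Allowed: {min_mw/1000:.0f}-{max_mw/1000:.0f}W, default {default_mw/1000:.0f}W")

# Roughly the "80% power limit" daily setting discussed above:
pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 350_000)  # milliwatts

pynvml.nvmlShutdown()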

I can then run the stress test to check stability - how about looping Speed Way 40 times (basically running the GPU close to 100% load for 40 minutes):

Nice - basically no drops in performance, with the highest and lowest runs within ~1% variance. And let's have a look at the Details to see how hot this gets.


Not bad. Notice that the GPU temperature doesn't get over 66°C now at 80% power limit across 40 minutes. I'll happily take the lower temperature and decreased maximum power consumption while still running around the 3DMark "average" performance!

Finally, let's throw in an oldie-but-goodie DirectX 11 benchmark that still looks pretty good despite the years. Let's run Unigine Heaven (2009) at 4K screen resolution with all settings maxed out (ultra quality, extreme tessellation, 8x anti-aliasing), audio turned on, full screen:



An average of 110.8fps with a score of 2791 is fine at 4K resolution on the AMD Ryzen 9 3900X + RTX 4090 with my 80% power limited overclock settings. Compare this to the Intel i7-3770K + stock nVidia GTX 1080:


Again, we see the massive gain in performance. The GTX 1080 only scores 641, less than 1/4 of the RTX 4090 system's result.

In summary...

Well, the nVidia RTX 4090 GPU is an impressive device to say the least. This kind of computing power does not come cheap however; we're currently looking at around US$1600 for the Gigabyte RTX 4090 Gaming OC card I have here, with similarly priced models from Galax, Zotac, MSI, etc. Add another $100 and there's the ASUS TUF 4090 with an extra HDMI output. Pay even more and you can get liquid-cooled models like the ASUS ROG Strix LC. I don't mind paying the price given the technology I'm getting.

[Maybe think about this, audiophiles, before paying $5k-10k for "the best" audio cables. Just as importantly, as educated consumers, we probably should ask: what kind of companies and people have the temerity to even pretend that this amount of money represents fair value for RCA wires?!]

When talking about gaming, it's always tempting to compare with game consoles. However, no comparison would be apples-to-apples due to differences in software, optimizations, and typically different-spec CPUs. From an nVidia-equivalent GPU hardware angle, the PlayStation 5 GPU would be about the speed of an RTX 2070 (~7.5 TFLOPS FP32 if you want to compare with some of the numbers above), and the XBox Series X a little faster, like the RTX 2070 Super (~9 TFLOPS FP32). The AMD CPUs in these boxes would be about an AMD Ryzen 7 5700G or slightly below the Intel i5-12600K - not bad at all. In any event, it's going to take a while before mainstream budget cards and consoles at the current price point of ~$500 get to the RTX 4090's level, so there will be many years of fun to be had. For context, by percentage share, the July 2023 Steam hardware numbers show that the top 5 graphics cards, covering >20% of gamers, are these nVidia devices: GTX 1650, RTX 3060 (12.7 TFLOPS FP32), GTX 1060, RTX 2060, and RTX 3060 Mobile (10.9 TFLOPS FP32 - notice slower for laptops). The RTX 3060 is the top card there and I think a "sweet spot" these days, priced at around US$300.

Beyond just raw horsepower, we're seeing ways not just of "working harder" but also of "working smarter" to enhance the performance of the hardware. Just last week, nVidia showed off DLSS 3.5 with enhanced ray-tracing upscaling which looks impressive. I suspect these techniques will become increasingly substantial pieces of how graphics is produced as the technology matures and is implemented in A-list games. We can find an analogy between upsampling and how the human brain reconstructs the world we experience from the limited data it receives through the visual and auditory sensory systems.

At some point, all consumer technologies hit a "good enough" point. I've discussed over the years that audio technology was the first to reach it. Once the 2-channel Audio CD came on the scene as a hi-fi source in the early 1980s, the resolution needs of reasonable "music lovers" were satisfied - a long time ago - even though evolutionary advances continue (especially in delivery and access at home).

While computer graphics remains one of the frontiers where generational progress, like what we're seeing between the RTX 3090 and RTX 4090 in TFLOPS, can be very substantial, whether this kind of ongoing progress will find many consumers remains to be seen (obviously, on the bleeding-edge research fronts, there's no limit to the demand for processing power!). Already we hear from various corners of the Internet the idea that there might be a limit to what gamers "need" when it comes to photorealistic graphics. Screen resolution has already plateaued based on visual acuity (as I argued years ago, 4K is basically the equivalent of "CD resolution" for video). At what point gamers are adequately satiated remains to be seen, but it might not be far away. I know, media articles like to play up the idea that some games already drop <60fps even with the RTX 4090 (like this one recently). That should be no surprise if we push geometry settings up, increase antialiasing, apply advanced effects like illumination and shading models, etc. At this point in history, I would think that games should at least be very playable on the RTX 4090 in 4K based on the concept of "fun", not just some tech numbers! The onus is on game developers to deliver a reasonable product with good frame rates if they expect to sell copies.

[There is a point of objective-overkill that goes beyond what is needed to be "fun". This is a bit like worshipping numbers such as SINAD in the objectivist-audiophile world without controlled listening tests as recently discussed.] 

Once that point of "good enough" gaming visual quality is hit, what then should we use all that computing processing power for? I think the obvious answer is in modeling the mind and the cognitive processing at the heart of AI (especially now that crypto-mining has diminished, although I suspect it'll make a comeback in a year or two). This is inevitably the future "killer app" in all its incarnations from "imagining" images to "generative thought". It's clearly what investors are excited about with companies like nVidia (stock doubling in the last year). Nothing goes in a straight line when we're dealing with human emotions, and there will obviously be hype around AI that needs to be dispelled in the markets. But the path forward over the next many years is inevitable for ever-improving "cognitive" technology.

Panzer Dragoon: Remake (top left), Nex Machina (bottom left), Killer Instinct (top right), Dead Or Alive 6 (bottom right). The occasional gaming fun here at the Arch household...

As for my RTX 4090...

No, I'm not going to be using this for gaming. I don't have much time these days for gaming other than the occasional lazy weekend afternoon, so the price-benefit would make it a waste for such little use! Games like Ikaruga, Nex Machina, Furi, Tetris Effect, DOA6, Panzer Dragoon: Remake, Killer Instinct, the various Street Fighters (including the new Street Fighter 6), or the new Armored Core 6: Fires of Rubicon already run beautifully on my i7+GTX 1080 in 4K with effects turned up!

Armored Core VI (2023) and Forza Horizon 5 (Nov 2021 release). Still playing these with the GTX 1080. Love the Hot Wheels section of Horizon 5; even with an 85% power limit on the GTX 1080, it typically achieves >50fps in 4K with high settings, which is absolutely "good enough" for most gaming I think.

These days, if I wanted to build a fresh 4K/2160P gaming rig with a capable GPU for ~US$1000, the recent AMD Radeon RX 7900 XTX cards (RDNA 3.0, 24GB GDDR6, 61.5 TFLOPS) would do the job well at a decent price, competing at the nVidia RTX 4080 (16GB, 48.7 TFLOPS) level for game fps while being somewhat less expensive. At the more mainstream price point (sub-US$400, aiming at 1440P performance), keep an eye on Intel's discrete graphics cards. The current Intel ARC A770 (16GB GDDR6, 19.7 TFLOPS) looks good on paper with more potential optimizations ahead. It performs at about the level of nVidia's RTX 4060 (8GB GDDR6, 15 TFLOPS) thus far but, depending on the game, could continue to improve with new drivers.

Instead of gaming, my card is going to be used for running the large language models we're training / developing at work; you know, the ChatGPT-like natural language processing of these days, but of course on smaller-scale projects. While AI-type work can be done on AMD and Apple hardware, currently the tools are very much nVidia-centric.

For those looking to build "AI workstations", VRAM amount and speed are important due to the large models and data sets, especially for LLMs, so a dual-GPU memory-pooled system like a couple of RTX 3090s with 24GB each is a great way to do it at lower cost when you want to try stuff like fine-tuning language models. For the extra cost, there are of course speed benefits with the RTX 40 generation, plus significant new features such as the Tensor cores working with the 8-bit float datatype (FP8) - see nVidia's Ada architecture paper and this article for more.
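As a rough rule of thumb for sizing VRAM, just holding the weights takes (number of parameters) x (bytes per parameter), before counting the KV cache, activations, or optimizer states for fine-tuning. A quick Python sketch with illustrative model sizes (not any particular model we're running at work):

# Back-of-envelope VRAM just for LLM weights at various precisions.
# Actual usage is higher: KV cache, activations, optimizer states for fine-tuning.
def weights_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params in (7, 13, 70):  # illustrative model sizes, in billions of parameters
    fp16 = weights_gb(params, 2)    # FP16/BF16
    fp8  = weights_gb(params, 1)    # FP8 (Ada Tensor cores) or INT8
    q4   = weights_gb(params, 0.5)  # 4-bit quantized
    print(f"{params:>2}B params: 16-bit {fp16:5.1f} GB | 8-bit {fp8:5.1f} GB | 4-bit {q4:5.1f} GB")

You can see why a single 24GB card handles a quantized 13B-class model comfortably, while the bigger models push you toward pooling a couple of 24GB cards or more aggressive quantization.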

While I still have some things I want to do around audio, I can see that the need to discuss high quality audio reproduction will be limited given the kind of quality and prices available in hi-fi today. Compared to the computing advances in this article, hi-fi audio is very "mature" technology. As tech hobbyists (aka "geeks"), there are all kinds of other interesting things to explore beyond audio! :-)

Hope you're all enjoying the music, and maybe having fun with the phenomenal gaming technologies these days!

PS: For "hard core" GPU overclockers only, recently released BIOS hacking tool available! Be careful and don't go crazy and kill your expensive GPU.

We've seen previously that LLMs can come up with interesting "subjective" reviews and show some creative abilities. Let's end off with a few "visual hallucinations" from my RTX 4090 the other day:

"Day of the Furry Monsters"

"The Reproduction of AI"

"Vision of Armageddon"

"Alien World"

"Angelic Messengers"

"I am become death..."

"Future Cityscape - Global Boiling Edition"

"Garden of Earthly Delights - with apologies to H. Bosch" - you should see what the GPU came up with for "Garden of Earthly Carnal Delights", but we'll keep this PG. :-)

7 comments:

  1. Hello fellow geek!
    The advances in computer graphics have been nothing but astounding. From the humble GeForce 256 released in 1999 to the awesome RTX 4090 in a little more than twenty years.
    For the past year or so I have been lusting for the 4090 but I could not justify paying so much for a GPU. But I am sorely tempted! For the most part you get what you pay for with pc gear as opposed to certain audio brands/products. In fairness Nvidia have had their share of questionable releases with not entirely honest specs. They had to settle with GTX 970 buyers because the claimed 4GB VRAM was only 3.5GB. They also cancelled production of the 4080 12GB after the backlash from consumers who were only getting the performance of 4070. Nevertheless, computer products generally stay true to a linear price/performance graph.
    On the subject of “when is it good enough” I recommend watching this: https://youtu.be/AjP7B2QFB9E?si=A8h9n0gi62qTT-rK

    Thanks Arch for an enjoyable in-depth look at the RTX 4090. Maybe, one day….
    Take Care. Cheers Mike

    1. Hey there Mike,
      Yeah man, computing progress for all those very complex tasks in real-time like graphics processing, real-world modeling, artificial cognitive processing / "intelligence", etc. is where it's at... Audio, at least what most audiophiles talk about (especially the typical 2-channel audiophile), not so much :-).

      For sure, there are stumbles here and there like that recent weak RTX 4080 12GB debacle when there's also the 16GB around. But for the most part, it's been good...

      Oh I'm sure one day the prices will go down and you'll get your 4090/equivalent!

      LOL, the "Audio Masterclass" video. He's hilarious and correct. :-) Some "differences don't make a difference". A way wiser and more realistic perspective than the ridiculous unqualified "everything matters" phrase we see in "High End Audio", or the refusal to acknowledge obvious "diminishing returns" on price:performance; shades of "Flat Earth" mentality when all kinds of evidence is around to indicate/measure otherwise. As usual, the guys who spout this nonsense have something to sell you and typically these reviewers themselves do not appear to be particularly wealthy individuals.

  2. That GPU is a nice fit for HQPlayer processing too!

    1. Indeed Miska!

      You've really put the mathematical processing to work in some of those DSD filters!

  3. I wonder if RTX 5090 will use GDDR7, if we assume the same memory arrangement as previous flagships that could mean 36GB VRAM at 1.5TB/s... that would make me upgrade my gaming rig from 3080 for sure. Games will "keep up" with hardware for some time still in my opinion, even my skyrim modlist can choke my 3080 if I increase from 1440p to 4k

    1. Hey Tacet,
      Yeah that would be an amazing push forward with the faster memory infrastructure and simply just more. I agree, games will keep sucking up the processing power until the day we have truly photorealistic imagery indistinguishable from "real" photographs.

      I'm certainly all for it so long as game design remains "fun". ;-)

  4. Fantastic article! The in-depth analysis of the NVIDIA GeForce RTX 4090's power efficiency is impressive. I also recommend checking out the PNY VCNRTXA5000-PB for a great balance of performance and power.
