For this post, I thought I'd explore technological progress in the computing space. While I speak mainly about audio topics in these pages, I hope this gives you an idea of the broader advances happening around us.
As per the image above, I've updated my workstation graphics card to a Gigabyte GeForce RTX 4090 Gaming OC. This card is truly a beast, taking up 3.5 slots in the computer thanks to the extra-large vapor chamber cooler. [Related to this card is the slimmer Gigabyte GeForce RTX 4090 WindForce V2 that "just" takes up 3 slots.]
Years ago in 2017, I spoke about my GTX 1080 used in the game machine of that time. Since then, it's obvious that technology has moved forward very substantially! Every couple of years, a new generation of graphics cards is churned out by nVidia, typically with AMD following suit and, these days, Intel making headway with its more budget-friendly Arc line.
With each iteration, we're seeing objective improvements in computational ability and physical characteristics like the shrinkage of the transistor "process" from 16nm for the GTX 1080 down to 5nm (near the limits of silicon) in the RTX 4090 today. With the ability to fit more transistors into a smaller area (higher density), the number of parallel computational cores has increased from 2560 shading cores (also known as CUDA cores) in the GTX 1080 to 16384 units in the RTX 4090, a 6.4x increase in this one metric alone. That's not to mention the addition of new features like the Tensor cores (high-speed, mixed-precision, high-dimensional matrix multiplications) released in the RTX 20*0 generation, important in deep-learning tasks.
Let's look at the theoretical computational speed by considering 32-bit floating point processing. Here's what it looks like over the last few generations, focusing on just the higher-end GTX/RTX *080(Ti) and *090 GPU models (we'll ignore the lower-end GTX 16 series here):
With architectural improvements beyond just the core count, notice that the TFLOPS (that would be trillions of floating point operations per second) increased about 10x over the last 7 years. Furthermore, this performance evolution isn't just in the GPU itself but also in essential parts like the GDDR RAM subsystem and the widening of the memory bus - 256-bit on the 1080/2080/4080, 320-bit on the RTX 3080, 384-bit on the RTX 3090/4090. At this point, the RTX 4090's memory bus is capable of >1TB/s.
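As a quick sanity check on those numbers, here's a back-of-envelope Python sketch. The boost clocks and per-pin memory data rates are the published reference figures; each CUDA core retires one fused multiply-add (2 floating point operations) per clock:

    # Theoretical FP32 throughput: 2 ops (one fused multiply-add) per core per clock
    def fp32_tflops(cores, boost_mhz):
        return 2 * cores * boost_mhz * 1e6 / 1e12

    # Peak memory bandwidth: (bus width in bits / 8) * per-pin data rate in Gbps
    def mem_bandwidth_gbs(bus_bits, gbps_per_pin):
        return bus_bits / 8 * gbps_per_pin

    print(fp32_tflops(2560, 1733))      # GTX 1080: ~8.9 TFLOPS
    print(fp32_tflops(16384, 2520))     # RTX 4090: ~82.6 TFLOPS
    print(mem_bandwidth_gbs(256, 10))   # GTX 1080 GDDR5X: 320 GB/s
    print(mem_bandwidth_gbs(384, 21))   # RTX 4090 GDDR6X: 1008 GB/s, hence >1TB/s

That roughly 9x compute jump from simple core count and clock scaling lines up nicely with the ~10x TFLOPS figure above.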
The top picture is a hasty photo I took when I first unboxed this monster! Weighing in at >2kg, the RTX 4090 card includes an "anti-sag" retaining bracket that one should attach to the motherboard, especially if the card is oriented at 90°. There's also a 16-pin 12VHPWR cable, a standard connector on new power supplies (like this GAMEMAX 1300W).
I didn't have my GTX 1080 card available to show, but here's a look at the 4090 compared to a 3GB GTX 1060:
For FP32 computation, the little GTX 1060 can deliver a "mere" 3.9 TFLOPS.
With the large 110mm fans, the card runs quietly for the most part but speeds up and is audible under a heavy load. Nothing too obtrusive or distracting though, just a low-pitched hum. The card has dual BIOS with a little switch at the top for changing between "OC" and "SILENT" modes. I've actually never tried the "SILENT" mode since the card is quite quiet already.
Note the little OC/SILENT BIOS switch.
I've heard that some people have complained about high-pitched "coil whine" from these high-power cards. I have had no issues at all with this Gigabyte even under heavy load (here's a comparison with the ASUS TUF 4090). No melting of the 16-pin power connector here either; those reported failures were likely user error, with poor connections heating up given the high current demands these days. Make sure to plug things in firmly, especially the 12VHPWR-to-PCIe header adaptors. :-|
Typical arrangement of outputs: single HDMI 2.1a (48Gbps) + 3 DisplayPort 1.4a (32.4Gbps).
With a card this size, it takes up a huge chunk of real estate even in a full tower case. In fact, I had to remove a multi-drive cage to accommodate the length of this card:
Anti-sag retainer in place under the card (not visible in this image). Notice the 4x 8-pin PCIe to 12VHPWR adaptor cable sticking up top - make sure you have some case clearance! 180°/90° adaptors might be helpful.
I'm putting this in my Ryzen 9 3900X, 32GB workstation discussed a few years back. For 4K games at >60fps, the 12-core Ryzen 9 should not be a limiting factor with this GPU. The AM4 X570 chipset motherboard supports PCIe 4.0 for full transfer speed as well. I may upgrade the CPU and motherboard (tempted by Intel's lower-power i9-13900), but this current set-up is fine for my uses for now. The EVGA G5 750W power supply is still serving me very well; these days though, nVidia recommends starting at 850W for the RTX 4090, with 1000W being ideal.
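For a rough sense of why 750W is getting marginal, here's a hypothetical worst-case budget for this build. The GPU's 450W default power limit and the 3900X's 142W PPT ceiling are published figures; the "everything else" line is just my estimate:

    # Hypothetical worst-case sustained power budget for this build
    budget_watts = {
        "RTX 4090 at 100% power limit (default TGP)": 450,
        "Ryzen 9 3900X at its 142W PPT ceiling":      142,
        "Motherboard, RAM, drives, fans (estimate)":   75,
    }
    total = sum(budget_watts.values())
    print(total)              # ~667W sustained
    print(total / 750 * 100)  # ~89% of the 750W PSU - before transient spikes

Millisecond transient spikes from these big GPUs can go well beyond the sustained numbers, which is presumably why nVidia pads the recommendation up to 850W+.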
The reference RTX 4090 has a base clock of 2235MHz and a "boost" clock of 2520MHz. Third-party cards like Gigabyte's typically add a factory overclock in the BIOS which is often small - for example, my Gigabyte RTX 4090 Gaming OC has upped the "boost" speed to 2535MHz, an insignificant 15MHz over the 2520MHz "boost" of the reference "Founders Edition" card.
Here's a look at the GPU-Z page for this card with some tech details:
Notice in the "GPU Clock" section, I'm actually running this card overclocked with base 2360MHz and boost 2660MHz, with memory overclock from 1313MHz to 1438MHz when I took this snapshot. We will talk about this stuff later. Notice that I've enabled "Resizable BAR" - see this for more details. It's typically a setting in the BIOS.
Let's make sure it "works" as expected using benchmarks. These days the 3DMark "Speed Way" is becoming popular for the latest DirectX 12 Ultimate API with all kinds of funky shader effects and ray-tracing. Here's what I get with this machine using the RTX 4090 at stock speed:
3DMark Speed Way scene.
Not bad, the card is basically running at the "average" submitted speed. Nothing to be afraid of if the score is a little lower than average; a bunch of overclockers submit their results, hence that "Best" score up above 11,000! A score close to "average" is already very good and as expected.
Some games are still based on DirectX 11, let's have a look with the 4K 3DMark Fire Strike Ultra:
Interesting that my score is a little lower than "average". Again, I wouldn't get too concerned given the variety of systems people are testing their gear with. So long as my results are in the "zone" of that spike in the graph of hardware results, it's as expected.
Something interesting about the high-speed GDDR RAM is that we can turn on error-correction (ECC):
By doing this, RAM access will slow down slightly. Here's what happened to Speed Way and Fire Strike Ultra with ECC "ON":
We're seeing between a 1.5-6% drop in performance with ECC on. It looks like there's more of a performance hit in Speed Way, perhaps due to the advanced shaders and ray-tracing effects being more dependent on RAM speed.
Normally for gaming, there's no need to turn on ECC. In the past, this ECC feature was offered on the professional nVidia cards like the Quadro and Tesla models. I could turn this on for more "critical" computational tasks.
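For what it's worth, on GPUs that expose the setting, ECC can also be toggled from the command line rather than through the driver control panel; a sketch (needs admin rights, and the change only takes effect after a reboot):

    import subprocess

    # 1 = ECC on, 0 = ECC off; only works on GPUs that expose the ECC toggle
    subprocess.run(["nvidia-smi", "-e", "1"])
    # Confirm the current and pending ECC state
    subprocess.run(["nvidia-smi", "-q", "-d", "ECC"])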
For fun, let's compare these scores with my old nVidia GTX 1080 card, currently running at stock speed in an overclocked 4.4GHz Intel i7-3770K gaming rig. Since the GTX 1080 can't run the DirectX 12 Ultimate 3DMark Speed Way, here's the DirectX 11 Fire Strike Ultra:
Notice how the graphics performance is not being limited by the old i7 CPU. You can see on the "CPU Clock Frequency" graph how, through most of the Graphics Test, the clock speed is bouncing below 4GHz, indicating the CPU is not even being strained. In this test, the GTX 1080 scores less than 25% of what the RTX 4090 achieves. Obviously at 4K with this test, the system is GPU limited.
For those of you gamers familiar with benchmarks, you might be wondering why I'm not showing 3DMark Time Spy. Unfortunately, it looks like there's a bug somewhere between Windows 11 22H2 and 3DMark, at least on some systems like mine. When I try Time Spy, it's not running with full CPU load, so the results have been inaccurate on my RTX 4090 - down by up to 50%. I've seen this issue mentioned in a few message threads already.
A quick look at overclocking...
Well, with a beast like this and that huge cooler system, let's try some overclocking. This is about the fastest I could stably run it on air using MSI Afterburner:
You can see the MSI Afterburner window in the bottom right with Core Clock set to +200MHz and Memory Clock to +1500MHz. With a score of 10,355 points, we're about 6% over the stock settings while still keeping the Power Limit at 100%.
Of course, a single reading isn't all that meaningful if the card isn't stable. I have no plans to keep the machine overclocked like this, so let's just do a modest "stress test" and loop Speed Way 10x to make sure it doesn't crash:
Nice, 99% frame rate stability across the 10 runs, which tells us that there's no issue with thermal throttling. We can look at the Monitoring details to make sure everything looks alright:
Notice that GPU temperature stays below 72°C (temperature limit typically >80°C). The CPU is getting a bit toasty at around 95°C though!
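If you want a log of your own rather than relying on 3DMark's Monitoring page, something like this run alongside the stress loop will print temperature, clocks, and power draw every couple of seconds:

    import subprocess

    # Poll the GPU every 2 seconds while the benchmark loops;
    # falling clocks at high temperature would suggest thermal throttling
    subprocess.run(["nvidia-smi",
                    "--query-gpu=timestamp,temperature.gpu,clocks.sm,power.draw",
                    "--format=csv", "-l", "2"])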
Given the punishing load of Speed Way, stability with 10 loops is probably good enough for most games. To be even safer, I'd recommend stability with 20 loops which is the usual 3DMark default. Yes, I can push this even faster by increasing the "Power Limit" to say 110%, but I don't see the need...
My daily settings: power-limited overclock.
Philosophically, I always like to find a reasonable balance between the technology and what I'm using it for. For this kind of computing performance, it's no surprise that at 100% power limit, the RTX 4090 can suck up to 432W. You can easily feel that heat dissipation in the room when under persistent load. It's good to save power if I can for all kinds of reasons, from being a good steward of resources to just keeping heat production low during the warm days of summer. Lower heat should also improve service life, and reducing demand helps since I'm currently still running a lower-power 750W PSU.
Here's the setting I run my RTX 4090 at most of the time:
Notice that in MSI Afterburner, it's set to an 80% power limit, which reduces maximum demand to ~350W, while pushing the GPU clock +130MHz and Memory +1300MHz. This gets my 3DMark Speed Way score to 9800, which is around the reported systems average (including those overclocked submissions of course). If I didn't overclock the GPU & memory, I'd get a score of around 9420, which is about 4% lower, at the 80% power limit.
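Incidentally, the same cap can be set outside of Afterburner: nvidia-smi takes an absolute wattage instead of a percentage (again needing admin rights; 350W here to roughly match what the 80% slider works out to on this card):

    import subprocess

    # Cap board power at 350W - roughly Afterburner's 80% power limit on this card
    subprocess.run(["nvidia-smi", "-pl", "350"])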
I can then run the stress test to check stability - how about looping Speed Way 40 times (basically running the GPU close to 100% load for 40 minutes):
Nice, basically no drop in performance, with the highest and lowest runs within ~1% of each other. And let's have a look at the Details to see how hot this gets.
In summary...
Panzer Dragoon: Remake (top left), Nex Machina (bottom left), Killer Instinct (top right), Dead Or Alive 6 (bottom right). The occasional gaming fun here at the Arch household...
Armored Core VI (2023) and Forza Horizon 5 (Nov 2021 release). Still playing these with the GTX 1080. Love the Hot Wheels section of Horizon 5; even with an 85% power limit on the GTX 1080, it typically achieves >50fps in 4K with high settings, which is absolutely "good enough" for most gaming I think.
"The Reproduction of AI" |
"Vision of Armageddon" |
"Alien World" |
"Angelic Messengers" |
"I am become death..." |
"Future Cityscape - Global Boiling Edition" |
"Garden of Earthly Delights - with apologies to H. Bosch" - you should see what the GPU came up with for "Garden of Earthly Carnal Delights", but we'll keep this PG. :-) |
Hello fellow geek!
The advances in computer graphics have been nothing short of astounding. From the humble GeForce 256 released in 1999 to the awesome RTX 4090 in a little more than twenty years.
For the past year or so I have been lusting after the 4090 but I could not justify paying so much for a GPU. But I am sorely tempted! For the most part you get what you pay for with PC gear, as opposed to certain audio brands/products. In fairness, Nvidia has had its share of questionable releases with not entirely honest specs. They had to settle with GTX 970 buyers because the claimed 4GB of VRAM was effectively only 3.5GB. They also cancelled the 4080 12GB after the backlash from consumers who would only have been getting the performance of a 4070. Nevertheless, computer products generally stay true to a linear price/performance graph.
On the subject of “when is it good enough” I recommend watching this: https://youtu.be/AjP7B2QFB9E?si=A8h9n0gi62qTT-rK
Thanks Arch for an enjoyable in-depth look at the RTX 4090. Maybe, one day….
Take Care. Cheers Mike
Hey there Mike,
Yeah man, computing progress for all those very complex tasks in real-time like graphics processing, real-world modeling, artificial cognitive processing / "intelligence", etc. is where it's at... Audio, at least what most audiophiles talk about (especially the typical 2-channel audiophile), not so much :-).
For sure, there are stumbles here and there, like that recent weak RTX 4080 12GB debacle when there's also the 16GB version around. But for the most part, it's been good...
Oh I'm sure one day the prices will go down and you'll get your 4090/equivalent!
LOL, the "Audio Masterclass" video. He's hilarious and correct. :-) Some "differences don't make a difference". A way wiser and more realistic perspective than the ridiculous unqualified "everything matters" phrase we see in "High End Audio", or the refusal to acknowledge obvious "diminishing returns" on price:performance; shades of "Flat Earth" mentality when all kinds of evidence is around to indicate/measure otherwise. As usual, the guys who spout this nonsense have something to sell you and typically these reviewers themselves do not appear to be particularly wealthy individuals.
That GPU is a nice fit for HQPlayer processing too!
Indeed Miska!
You've really put the mathematical processing to work in some of those DSD filters!
I wonder if the RTX 5090 will use GDDR7; if we assume the same memory arrangement as previous flagships, that could mean 36GB VRAM at 1.5TB/s... that would make me upgrade my gaming rig from the 3080 for sure. Games will "keep up" with hardware for some time still in my opinion; even my Skyrim modlist can choke my 3080 if I increase from 1440p to 4K.
ReplyDeleteHey Tacet,
Yeah, that would be an amazing push forward with the faster memory infrastructure and simply more of it. I agree, games will keep sucking up the processing power until the day we have truly photorealistic imagery indistinguishable from "real" photographs.
I'm certainly all for it so long as game design remains "fun". ;-)
Fantastic article! The in-depth analysis of the NVIDIA GeForce RTX 4090's power efficiency is impressive. I also recommend checking out the PNY VCNRTXA5000-PB for a great balance of performance and power.