Tuesday, April 9, 2019

Thermal mods

This is a response to a question on the Gamers Nexus Patreon.

Delidding
Removes the IHS (integrated heat spreader) of a CPU or GPU to enable some of the following modifications.

    Glue removal (die to IHS transfer improved)
        The black glue holding the IHS on the CPU adds distance between the CPU die and the IHS. Removing the sealant eliminates this gap, which improves temperatures because the heat has to travel through less thermal interface material to reach the IHS.

    TIM replacements (die to IHS transfer improved)
        The stock thermal paste under a CPU's IHS is designed for extra long lifespans, not for max thermal performance. Replacing the stock TIM with a more conductive material gives you better temperatures.

    IHS replacements (heat transfer through IHS improved)
        The stock IHS isn't manufactured to any super strict tolerances. Custom IHSs can be made flatter, with more contact area to the cooler than the stock IHS, and can even use more thermally conductive materials (like silver). This improves thermal transfer through the IHS.

    Direct die (die to heatsink transfer)
        With the right configuration it's possible to completely eliminate the IHS. This means that heat now has a much more direct path to the actual cooler.

    Die sanding/lapping (heat transfer through die improved)
        The circuitry actually making up a CPU/GPU is located at the bottom of the die and doesn't penetrate all that deep into it. So reducing the amount of silicon that heat has to go through to leave the die leads to better temperatures. This is very risky and tedious to do.

    Liquid Metal (thermal transfer material)
        Liquid metal alloys tend to come in at a thermal conductivity of around 65W/mK, which makes them far superior to conventional thermal pastes that top out around 10-15W/mK. The downsides are that it's expensive, it destroys aluminum, it diffuses into copper and it stains nickel. However in high thermal density applications it significantly outperforms normal thermal pastes. High thermal density applications are primarily Intel CPU dies. Using liquid metal on a 450-700mm^2 300-500W GPU doesn't see anywhere near as much benefit as on a 100-500mm^2 Intel Kaby Lake die putting out 150-600W. For ambient cooling liquid metal is the end all be all of thermal transfer materials. For sub ambient liquid metal doesn't work as it hardens and loses contact at around -20 to 10C depending on the alloy.*

    Indium Solder (thermal transfer material)
        CPUs that don't use a conventional thermal paste under the IHS use a relatively thick layer of indium solder. Indium solder is inferior to liquid metal as it has roughly the same thermal conductivity but is much thicker. It also seems to require the use of thicker silicon dies for reliability. However solder is vastly superior to conventional thermal pastes. Delidding soldered CPUs to run sub zero cooling is ineffective as solder is superior to even direct die thermal paste (for complex reasons). At ambient, replacing indium solder with direct die or liquid metal improves thermal transfer to the heatsink/IHS.

    IHS sanding/lapping (IHS to cooler transfer improved)
        If you have a soldered CPU, or don't want to buy a custom IHS for your delidded one, sanding down your stock IHS is a good way to flatten it out, which leads to more direct thermal transfer from the IHS to the cooler.

General Note: If your cooler is incapable of dissipating enough heat, cramming heat into it faster is not really gonna achieve anything. (Using liquid metal on an air cooler that already has its heatpipes completely saturated isn't gonna achieve much.)

* Liquid metal on top of a CPU die is as far as I'm concerned a requirement for good cooling even in direct die applications, as there's just not enough contact area on CPU dies for thermal paste to do a good job. Also if you don't mind the negatives of liquid metal, using it on top of the IHS of a high heat output CPU can help shave off a few °C.
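
To put rough numbers on why thermal density matters so much for liquid metal, here is a minimal sketch of the temperature drop across a flat TIM layer using dT = P * t / (k * A). The 50 micron layer thickness and the specific power/area pairs are assumptions picked for illustration from the ranges above, not measurements, and real dies don't spread heat evenly.

```python
def tim_temperature_drop(power_w, area_mm2, conductivity_w_mk, thickness_um=50.0):
    """Temperature drop in K across a uniform TIM layer: dT = P * t / (k * A).

    thickness_um defaults to 50 microns, which is an assumed value.
    """
    area_m2 = area_mm2 * 1e-6
    thickness_m = thickness_um * 1e-6
    return power_w * thickness_m / (conductivity_w_mk * area_m2)


PASTE_K = 12.0         # W/mK, middle of the 10-15W/mK range quoted above
LIQUID_METAL_K = 65.0  # W/mK, figure quoted above

# Illustrative picks from the ranges in the post (hypothetical chips).
chips = {
    "small hot CPU die": {"power_w": 150.0, "area_mm2": 125.0},
    "big GPU die": {"power_w": 300.0, "area_mm2": 600.0},
}

for name, chip in chips.items():
    paste = tim_temperature_drop(conductivity_w_mk=PASTE_K, **chip)
    metal = tim_temperature_drop(conductivity_w_mk=LIQUID_METAL_K, **chip)
    print(f"{name}: paste {paste:.2f}K, liquid metal {metal:.2f}K, saved {paste - metal:.2f}K")
```

With these assumed numbers the small die gains noticeably more from the swap than the big one, because the same layer has to pass more watts per mm^2.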

Saturday, February 23, 2019

AM4 mobos I consider worth while

Because I keep getting questions from people having a hard time choosing AM4 mobos I've decided to make this post.

#1
ASUS Crosshair VII Hero
Amazing for LN2 overclocking. However for daily use I'd say it's very excessive and you'd be better off with a cheaper board and spending the extra budget on something else. Like say better RAM or CPU cooling.

Unfortunately I haven't tested the other top end X470 boards.

#2
MSI X470 Gaming Pro Carbon
When it comes to static overclocking this board can absolutely max out a 2700 or 2700X. For PBO the board doesn't have the best BCLK overclocking implementation but I honestly think messing with BCLK is a waste of time since at best you're gonna pick up like 2-4% more performance across the board. I honestly consider this a go to board for a daily build. Strong VRM with good heatsinks + solid memory overclocking at a reasonable price.

#3
MSI B450M Mortar/Tomahawk/Gaming Pro Carbon
Overclocking wise it's like an X470 Gaming Pro Carbon with a VRM downgrade. It can still handle a 2700X just fine. You do lose some non overclocking features due to these being cheaper B450 boards but I'll leave it up to you to decide if you need those or not.

#4
Gigabyte B450M Gaming
I think this is a really good fit for an APU build. The VRM is rather weak but for an APU that doesn't really matter much, and with 2 DIMMs the board actually does a really good job of memory overclocking, which is where 90% of an APU's GPU performance comes from. The only major issue with this board is that the voltage monitoring is kinda messed up.


Boards that didn't make the cut and why.

ASUS X470 Prime Pro
It costs almost the same as the Gaming Pro Carbon from MSI while having a worse VRM and doesn't really make up for that in any way that I consider significant.

Gigabyte X470 Ultra Gaming / Gaming 5
The VRM on both of these sucks (it's the same one after all). There are no LLC settings. Voltage only works in offset mode, which leads to problems with overclocking CPUs that have a low stock voltage like the R7 2700. Memory overclocking was awful when I last tried it. Really at this price point you can just go for one of the listed X470 or B450 MSI boards and get a better board.

Asrock X470 Master SLI / Gaming K4
This is basically a Gigabyte X470 Ultra Gaming with worse VRM cooling and a worse BIOS (so it doesn't have LLC settings either). The memory overclocking is pretty good but again at this price you can get an X470 or B450 MSI board that's just better overall.


Sunday, October 7, 2018

Cap modding an i7 7740X // wasting time and resources (I was bored)

First of all I did not expect this to actually do anything. I'm fully aware that the bypass capacitors on the back of a CPU are selected for filtering the very high frequency switching noise that a CPU running at several GHz produces. The capacitors I went with are definitely not capable of doing that. However it's not like adding around 1000uF to the back of a CPU is going to make it clock worse and so motivated by boredom and the fact that a CPU with some massive caps is gonna look ridiculous I proceeded with the operation.

First up I wanted to get an idea of how hard soldering to a CPU would be.
CPU: i3 7350K
Status: dead
Cause of death: PCB got crunched by prototype cooling system

So that extra cap right there still allows the CPU to fit into a socket just fine. One of the main issues with cap modding CPUs is that it's really hard to tell which caps are Vcore and which ones are say VCCGT/VCCSA/VCCIO. The best way to do that is to install the CPU in a motherboard (it has to have the IHS on and the socket locked down, otherwise the contact sucks) and then measure the resistance from Vcore to GND on the mobo. This i3 7350K has around 2.9ohms of resistance from Vcore to ground. Once you take the CPU out you just have to check which caps have the same resistance across them.
Anyway let's move on to the real victim.

CPU: i7 7740X
Status: ~5.465GHz 4C/8T in Cine R15 at 1.45V

And here is the real victim. Stock Vcore to GND resistance on it is about 0.6ohms. Before I started with all this modding this chip would do 5.454GHz in Cinebench R15 at 1.45V. It's also very temperature sensitive as even with 1.53V it still doesn't do 5.5GHz in CB R15. The caps added were 2 47uF 4V X6S ceramics. They seem to have maybe helped the CPU clock about 10MHz. It wasn't really that hard to get them on. Originally I tried to have them sit on the CPU substrate but that didn't clear the CPU socket cut out. Surprisingly enough the socket is deep enough for me to stack them like this.

Once I realized just how deep the center of the LGA 2066 socket is I figured I could probably fit some 2917 polymers over the field of tiny MLCCs on the back of the CPU.


Yeah looks like I definitely could fit them. The only issue is that polymer caps are polarized and the MLCCs on the back of the CPU aren't. So I needed to figure out which side of one of the Vcore MLCCs was GND and which was Vcore. The only way to do that would be to measure the voltage / resistance from the cap pad to GND/Vcore on a motherboard. In order to do that the CPU would need to be installed in a mobo......





The X299 OC Formula has a pretty large hole in the center of the CPU socket meant for temperature probes when benching on LN2. So I ran a wire from one of the caps on the back of the CPU and through the board to get my measurements.

And the pad I had the wire attached to was Vcore.

And here we have the end result of my boredom. It still fits in the socket on the X299 OC Formula. It still runs. It still does like 5.465GHz. I know I said earlier that the CPU did only 5.45GHz. But honestly I'd say the variance in temperature of the room I was testing in probably made that 15MHz difference rather than the 2 470uF Panasonic polymers and the 2 47uF MLCCs. I went on to mess around with the chip some more and it kinda tends to take longer to crash at higher clocks than before but again we're talking like 5.47GHz at most and what I'd consider margin of error differences. I'm probably biased towards saying this did something rather than admitting that it did absolutely nothing just due to the effort involved. In more practical testing this chip does boot and validate 5.65GHz at 1.6V on an AIO and will run Cinebench R15 with HT turned off at 5.55GHz with 1.55V. So it's pretty much perfect for some Unigine Heaven benching I plan to do soon. Might even do 5.6GHz for that.

AHOC Patreon: https://www.patreon.com/buildzoid
AHOC T-shirts US: https://teespring.com/stores/ahoc-store-us
AHOC T-shirts EU: https://teespring.com/stores/ahoc-store-eu


Monday, November 6, 2017

R9 Fury X LN2 session.

So I mentioned on the last livestream I might do a Fury X in 3Dmark Vantage for the ROG OC Showdown Team Edition. That plan got scrapped when Elkim, the Alza OC team captain, managed over 82000 points using a 7820X and a 1070. Since the best CPU I could use for Vantage under the ROG OC Showdown rules was a 7700K I decided not to bother.

I still had a Fury X prepped for LN2 so I decided to wrap it up in towels and take it for a spin through some Firestrike and Unigine Heaven.

Everything went totally great and I hit 1400MHz core with 666MHz HBM at -100C with 1.4V core and 1.42V HBM....

Except that didn't happen at all. First of all the Fury X has cold slow. So the card gets locked at 300MHz core clock the moment it goes below 0 core temp. Luckily I had a cold slow removal BIOS. So I copied some of the settings from its power play table into my 1.3V core 1.42V HBM BIOS and fixed the cold slow situation, at the cost of temperature readings as well as idle clocks and software voltage control (I need to look into this some more).

Now that the card was stuck at max core clock instead of 300MHz I started doing some testing. 600 and 666MHz HBM clocks worked great for 3Dmark Firestrike. Though 600HBM works on all my cards with 1.36V. 666HBM on the other hand did need the cold, so that worked well enough. Core clock on the other hand was a mess. First of all something on the card breaks when going below -50C. Not sure what it is but either way idling at -60 to -50C involved plenty of artifacting. Running benchmarks at those temperatures was met with an instant crash regardless of core clock. Tweaking some of the secondary voltages on the card should fix this but I didn't have the mods for that on the card. Vcore was stuck at 1.3V because the cold slow fix broke software voltage control. At the time I didn't feel like adding more Vcore through the GPU's BIOS so I got stuck at 1205 core at around -45C and 1.3V. With clocks barely better than what the card did on its stock AIO I didn't bother to save scores from Firestrike and instead decided to try Unigine Heaven. Which went worse....

666HBM causes a crash in the first scene. 1200 core does the same. 1190 core crashes in the second scene. 1180 core ran once and I could not get it to work again. 1170 core and 600MHz HBM got me a score of 5400, which was almost entirely thanks to the 5.4GHz 7700K with 3866 16-18-18-28-2T memory. I could have probably managed the same score on the stock AIO. http://hwbot.org/submission/3701018_
I spent some more time trying to get higher core or HBM clock to work in Heaven or 3Dmark but the card simply wasn't having any of that. 1.35V core was less stable than 1.3V at best. 1.45V HBM was equally useless. 1.4V HBM wasn't any better. So 1.3V core with 1.42V HBM was evidently the best for this card without tweaking the minor rails that require hard mods. Unfortunately stability deteriorated further throughout the session. Eventually I was just getting completely random crashes and errors that all seemed to be driver or memory related and I decided that I probably had a water problem.

I did. Pulling the still cold card revealed a soaked PCI-e slot and gold fingers. My attempt to dry them was a fail and I called it a day since evidently getting into Windows was going to be borderline impossible without encountering a slew of random crashes. I just wish I had tried the higher Vcore BIOS during the session because it is entirely possible all my issues were due to all the water.

On the bright side the Fury X pulls basically no power at all when on LN2 so I didn't really use all that much LN2 to keep it at -45C. The whole system power draw peaked at only 470W and the card's VRM was perfectly happy relying on cold traveling to it through the PCB.

Anyway here's some pretty setup pics:






I used a truckload of Kryonaut during mounting just in case the card worked below -90C. Hopefully next time I take the card cold this will be necessary as I intend to have ALL THE MODs next time.

Saturday, August 5, 2017

First impressions of VEGA on LN2


This Friday I ran some very quick tests with VEGA FE on LN2. I don't have many screenshots but I do have some interesting information.

Does it work under LN2 at all?

Yes it does. All the way to -185C. No issues whatsoever. No cold slow, no cold bug or boot bug and no black screens under load either. I did only test at stock voltage and the black screen bug tends to show up at high core voltages but for now it looks like smooth sailing even at full pot.

Does it scale with temperature?

Kinda. My card on stock volts on water does 1680/1100 at best for 3Dmark Timespy while bouncing off the power limit. Under LN2 at stock Vcore I was perfectly stable at 1800/1100. I was in a rush due to the card not being properly insulated so I didn't try 1850/1100 but 1900/1100 crashed. 2000/1100 would pass Timespy but the score was lower than at stock clock so it looks like VEGA can pull the same stunt as Pascal where it seems to run at very high clocks while under performing horrifically. I'm sure extra Vcore will solve this as evidently I didn't have enough Vcore to run 1900 properly. Another interesting side effect of the LN2 is that VEGA's power draw fell off a cliff. Like 100W less than air cooled on +50% power limit. So the GPU core evidently is very happy with LN2 and could probably go well in excess of 2GHz with more voltage (I did all my testing at stock which is 1.2V).

The HBM2 on the other hand seems to have some major issues. I could get GPU-z's render test to run just fine at 1230MHz HBM2 clock however if I tried to run any real workloads like 3Dmark Timespy it would crash even 1MHz above 1100MHz. I can think of a number of things that could be causing this and all need more testing. First of all the HBM might just need more voltage on either VDD or VPP to sustain load. There might be some kind of issue with the memory timings just being too tight for any clock above 1100MHz. It could also have been a thermal problem. Pulling the card down I heard something that sounded very much like the thermal paste failing however GPU core temperatures were still below 0 and the LN2 pot side temperature probe was responding under load as it should. However the HBM2 stacks don't have any accessible temperature readings and aren't exactly one with the GPU core silicon so the HBM stacks could have lost contact while the GPU core was doing just fine. So at low loads the HBM would stay cool but under full load it would warm up and crash. Either way this needs more testing.

The display outputs freeze over pretty quick as the only things between them and the GPU core are the VPP, VDDCI and display drive VRMs. None of which put out any significant amount of heat so the cold from the GPU core gets the display outputs pretty quick.
The back of the card. I used the mounting bracket from the Raijintek Morpheus II cooler to get a more secure mount for the Der8auer Raptor 4 LN2 pot.
The red wire is hooked up to the Vcore plane so I could check voltages with a DMM. I also added some 2.5V SMD polymer caps to both Vcore and the VDD rail. At ambient those did nothing for overclocking capabilities and I haven't done a before and after test on LN2 either. So I have no idea how much or if they are even helping.
This pic of the GPU frozen just looks cool. You can kinda see the infill around the HBM2 dies. I'm really glad that it's been added as it should make it much much more difficult to damage the HBM interposer when replacing cooling systems on the GPU core.




And here we can see something I found pretty interesting. For some reason a small piece of the thermal paste stayed on the GPU core while all the rest of the paste stayed on the LN2 pot.
With the card on LN2 the VRMs would all freeze over at idle as they don't produce any appreciable heat. Under load the ice on the Vcore VRM would very quickly melt. This might cause some major water problems for extended sessions as keeping the VRM from cycling between sub 0 and positive is pretty much impossible.
Overall I must say I'm excited to try running VEGA on LN2 seriously once we get some proper drivers for the cards and I get more LN2.


The only score I saved from the session: http://www.3dmark.com/3dm/21393617

Sunday, April 23, 2017

Notes on modding HD7990's BIOS

Sharing is caring so here's some info of questionable quality on HD 7990 BIOS modding. I haven't tested any of the modifications yet and even if I did the usual applies to BIOS modding. If you kill your stuff due to a mod gone wrong it's not my fault so mod responsibly.

The HD 7990 Buildzoid edition. Since this photo I've replaced all the thermal pads, changed the thermal paste and changed the mounting hardware. I still haven't done the cap mods. In its current state it did this score:

So it's already pretty fast. However pretty fast is just not enough. Ideally I would like to push the card beyond 1200MHz core, maybe all the way to 1300MHz. Now I haven't really tested the limits of this card. Really that 5.9K Heaven score was me taking it "easy" during some late night testing. I expect to hit the card's power limit pretty hard some time soon so I figured I would look into modding the BIOS for MOAR POWER. Since I started digging in the BIOS I figured that I might as well also look into how memory timings on the card work. Here are the notes on that:

HD 7990 reference
Power limit is set with 2 values
Min power
Max power
These sit X% above and below the stock TDP, where X% is the range of the Overdrive power limit slider.

Example:
300W BIOS 20% = 360W max and 240W min
300W BIOS 50% = 450W max and 150W min
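
The two examples above can be reproduced with a couple of lines. This is just a sketch of the arithmetic as I understand it from these notes; the function name is made up for illustration and nothing here reads or writes a BIOS.

```python
def power_limit_window(stock_tdp_w, slider_percent):
    """Return (max_w, min_w) for a stock TDP and an Overdrive slider range in percent."""
    delta = stock_tdp_w * slider_percent / 100
    return stock_tdp_w + delta, stock_tdp_w - delta


for slider in (20, 50):
    max_w, min_w = power_limit_window(300, slider)
    print(f"300W BIOS {slider}% = {max_w:.0f}W max and {min_w:.0f}W min")
```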

Power limits are found in this block:


D9 00 05 00 E8 03 58 00 00 80 07 00 10 00 00 02 0A 2C 00 00 69 00 DB 00 05 23 01 0A 00 32 01
42 01 3D 03 00 00 C6 16 00 00 58 01 6D 01 73 01 00 00 A1 01 00 00 B3 00 00 00 77 00 00 00 60
EA 00 00 88 01 20 03 00 00 14 00 40

In this stock 7990 BIOS B3 (179W) is the Hi limit and 77 (119W) the Lo limit. I assume the current limit is also somewhere in that block. A value greater than 255W no longer fits in one byte, so it spills into the next byte with the low byte written first (300 = 0x012C = 2C 01). It would look like this:

D9 00 05 00 E8 03 58 00 00 80 07 00 10 00 00 02 0A 2C 00 00 69 00 DB 00 05 23 01 0A 00 32 01
42 01 3D 03 00 00 C6 16 00 00 58 01 6D 01 73 01 00 00 A1 01 00 00 2C 01 00 00 04 01 00 00 60
EA 00 00 88 01 20 03 00 00 14 00 40

This gives you a 300W high limit and a 260W low limit. Which I think will translate into 280W stock.
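
To avoid doing the hex conversion by hand every time, here is a small sketch that encodes and decodes the limits. It assumes each limit is a 4 byte little-endian value, which fits the B3 00 00 00 / 2C 01 00 00 patterns above but is not confirmed by any documentation, and the helper names are made up. It only manipulates hex strings, flashing and checksum fixing are not covered.

```python
def encode_limit(watts):
    """Encode a power limit as 4 bytes, least significant byte first (assumed layout)."""
    return watts.to_bytes(4, "little").hex(" ").upper()


def decode_limit(hex_bytes):
    """Decode a space separated little-endian hex string back into watts."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def patch_limits(block_hex, old_hi, old_lo, new_hi, new_lo):
    """Swap the Hi/Lo limit pair inside a space separated hex dump."""
    old = f"{encode_limit(old_hi)} {encode_limit(old_lo)}"
    new = f"{encode_limit(new_hi)} {encode_limit(new_lo)}"
    if block_hex.count(old) != 1:
        raise ValueError("expected exactly one match for the old limit pair")
    return block_hex.replace(old, new)


# Short excerpt of the stock block around the two limits.
excerpt = "A1 01 00 00 B3 00 00 00 77 00 00 00 60 EA 00 00"
print(patch_limits(excerpt, 179, 119, 300, 260))
print("midpoint of new limits:", (300 + 260) / 2, "W")
```

The exactly-one-match check is there so a value that happens to appear twice in a dump doesn't get patched in the wrong place.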
I find the original 179W maximum power limit of the reference HD 7990 kinda concerning, especially once you consider that the Vcore VRM for each of the 2 GPUs is only 4 phases. On stock settings the maximum current through them works out to only ~149A at 1.2V. I think 200A is probably safe, however without a datasheet for the Volterra power stages I would recommend proceeding with extreme caution while carefully monitoring VRM temperatures (GPU-z supports VRM temps for the ref HD 7990).
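
The current figures come from plain I = P / V. A quick sketch of that arithmetic follows, treating the whole power limit as Vcore power the same way the estimate above does (which overstates core current if the limit also covers other rails). The function name is made up for illustration.

```python
def vrm_current(power_w, vcore_v, phases):
    """Return (total_amps, amps_per_phase) assuming all power goes through Vcore."""
    total_a = power_w / vcore_v
    return total_a, total_a / phases


total, per_phase = vrm_current(179, 1.2, 4)
print(f"179W at 1.2V: {total:.0f}A total, {per_phase:.1f}A per phase")

# Power that a 200A ceiling would correspond to at the same voltage.
print(f"200A at 1.2V: {200 * 1.2:.0f}W")
```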

MEMORY STUFF

Timing Straps
A timing strap is composed of 48 bytes
98 AB 02 = 02AB98 = 1750MHz
77 71 33 20 00 00 00 00 31 62 7C 47 80 55 11 11 30 A7 1A 07 00 4C 06 01 22 22 9D 00 6C 0F 14 20 6A 89 00 A0 00 00 01 20 19 12 2F 36 48 28 31 15

C4 7A 02 = 027AC4 = 1625MHz
77 71 33 20 00 00 00 00 10 5A 7B 41 80 55 11 11 2E A5 99 06 00 4C 06 01 22 11 9D 00 6C 0F 14 20 6A 89 00 A0 00 00 01 20 17 11 2B 31 42 26 2F 15

F0 49 02 = 0249F0 = 1500MHz
77 71 33 20 00 00 00 00 CE 51 6A 3B 70 55 10 10 2B A2 18 06 00 4A E6 00 22 00 9D 00 64 0E 14 20 6A 89 00 A0 00 00 01 20 15 0F 27 2D 3C 23 2C 14

1C 19 02 = 02191C = 1375MHz
77 71 33 20 00 00 00 00 AD CD 69 37 70 55 0F 10 29 21 98 05 00 4A E5 00 22 EE 1C 00 64 0D 14 20 5A 89 00 A0 00 00 01 20 14 0E 24 2A 38 22 2A 14

48 E8 01 = 01E848 = 1250MHz
77 71 33 20 00 00 00 00 8C C5 58 31 60 55 0F 0F 25 1E 17 05 00 48 C4 00 22 CC 1C 00 5C 0B 14 20 4A 89 00 A0 00 00 01 20 12 0D 20 25 32 1F 26 13

74 B7 01 = 01B774 = 1125MHz
55 51 33 20 00 00 00 00 6B BD 57 2D 60 55 0D 0E 22 9C 96 04 00 28 C3 00 22 BB 1C 00 53 0A 14 20 BA 88 00 A0 00 00 01 20 10 0C 1E 22 2E 1D 23 12

A0 86 01 = 0186A0 = 1000MHz
55 51 33 20 00 00 00 00 29 B5 46 27 50 55 0C 0D 1E 99 05 04 00 26 A2 00 22 AA 1C 00 4B 08 14 20 AA 88 00 A0 00 00 01 20 0E 0A 1A 1E 28 1A 1F 11

90 5F 01 = 015F90 = 900MHz
55 51 33 20 00 00 00 00 29 31 46 24 50 55 0C 0D 1C 18 A5 03 00 26 A1 00 22 AA 1C 00 4B 07 14 20 9A 88 00 A0 00 00 01 20 0D 0A 18 1B 25 19 1D 11

80 38 01 = 013880 = 800MHz
55 51 33 20 00 00 00 00 E7 AC 35 20 50 55 0B 0D 1A 97 34 03 00 24 81 00 22 AA 1C 00 4B 06 14 20 9A 88 00 A0 00 00 01 20 0C 08 15 19 21 18 1B 11

40 9C 00 = 009C40 = 400MHz
33 31 33 20 00 00 00 00 84 94 22 10 F0 54 09 06 0F 0B A2 01 00 23 80 00 22 AA 1C 00 12 01 14 20 8A 88 00 A0 00 00 01 20 06 05 0B 0C 11 0C 10 0D
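
The 3 byte value in front of each strap decodes cleanly if you read it low byte first and treat it as a clock in units of 10kHz (0x02AB98 = 175000). That unit is inferred from the numbers in these notes rather than taken from documentation, and the helper names below are made up. A minimal sketch:

```python
def strap_clock_mhz(header_hex):
    """Decode the 3 byte little-endian clock in front of a strap (assumed 10kHz units)."""
    raw = int.from_bytes(bytes.fromhex(header_hex), "little")
    return raw / 100


def strap_header(clock_mhz):
    """Encode a clock in MHz back into the 3 byte header format."""
    return round(clock_mhz * 100).to_bytes(3, "little").hex(" ").upper()


def is_valid_strap(strap_hex):
    """A timing strap should be exactly 48 bytes long."""
    return len(bytes.fromhex(strap_hex)) == 48


STRAP_1750 = (
    "77 71 33 20 00 00 00 00 31 62 7C 47 80 55 11 11 30 A7 1A 07 00 4C 06 01 "
    "22 22 9D 00 6C 0F 14 20 6A 89 00 A0 00 00 01 20 19 12 2F 36 48 28 31 15"
)

for header in ("98 AB 02", "C4 7A 02", "40 9C 00"):
    print(header, "=", strap_clock_mhz(header), "MHz")
print("1750MHz strap length ok:", is_valid_strap(STRAP_1750))
```

The length check is handy after hand editing a strap, since dropping or doubling a byte shifts everything after it.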

Well that's all for this. Hopefully you find this useful for your own BIOS modding. Though to be completely honest I mostly wanted to post this so I would have an easily accessible online backup of my BIOS modding notes because I keep forgetting how I did stuff the last time I modded a certain BIOS. I will probably be posting more notes like this for other cards.

Thursday, April 13, 2017

Some Ryzen power draw data

Ok so I finally have an operational Ryzen system and while the BCLK controls in the BIOS are being stupid I can still do other testing so here is that other testing.

The goal of this data is to figure out how the power draw of Ryzen is split between the cores and everything else like the SOC. For my testing I'm using Asrock's X370 Taichi and a Ryzen 7 1700. I'm not 100% sure how the 12V of the single 8pin CPU power connector is distributed but for the most part it doesn't matter because the focus here is figuring out core power draw in order to be able to gauge Vcore VRM requirements for overclocking.

So here is the data! Do note it is rather rough as far as error margin goes because CPU temperature was not maintained across test runs and higher temperatures do lead to elevated power draw. I didn't bother with more accurate measurement methods because chip to chip variance will cause larger power draw discrepancies than my measurement methods for this data.

Clock     Voltage   Core Config   Power Draw
3.95GHz   1.45V     4+4           170W
3.95GHz   1.45V     3+3           135W
3.95GHz   1.45V     2+2           100W
3.95GHz   1.45V     1+1           60W
3.95GHz   1.45V     4+0           100W
3.95GHz   1.45V     3+0           75W
3.95GHz   1.45V     2+0           65W
All tests were done with SMT turned on.

From this we can see that 1 core with SMT at 3.95GHz 1.45V consumes roughly 17W. The other things hooked to the 8pin connector pull a constant 30W. I suspect that most of this 30W would be the SOC portion of a Ryzen CPU.
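
One way to sanity check the per core and fixed power figures is a straight line fit through the table, with the slope as watts per core and the intercept as the constant draw. The sketch below is a plain least-squares fit and not necessarily how the numbers above were eyeballed. It ignores the CCX layout (4+0 and 2+2 both count as 4 cores), which is a simplifying assumption.

```python
# (active cores, measured watts) taken from the table above
samples = [(8, 170), (6, 135), (4, 100), (2, 60), (4, 100), (3, 75), (2, 65)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


watts_per_core, fixed_watts = fit_line(samples)
print(f"per core: {watts_per_core:.1f}W, everything else: {fixed_watts:.1f}W")
```

The fit lands in the same ballpark as the rough figures above, slightly more per core and slightly less fixed draw, which is well inside the error margin of this data.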

This means that for an 8 core Ryzen chip Vcore current draw at 4GHz/1.45V is only about 96A. 6 core CPUs would be about 72A and 4 cores only about 48A. That means the Vcore VRM current throughput required for your motherboard to not go up in flames is absolutely minuscule. Basically any AM4 motherboard should be capable of doing 4GHz or more on 6 core CPUs. Motherboards with good 4 phase VRM designs should also have no issue pushing 4GHz on 8 core CPUs.
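
The current figures are just core count times per core power divided by Vcore. The 96A, 72A and 48A numbers work out to 12A per core, which at 1.45V is about 17.4W per core. That 17.4W value is back-calculated here from the rounded currents, it is not a separate measurement. A small sketch:

```python
def vcore_current_a(cores, watts_per_core, vcore_v):
    """Estimate Vcore current in amps from core count, per core power and voltage."""
    return cores * watts_per_core / vcore_v


WATTS_PER_CORE = 17.4  # back-calculated assumption, roughly the 17W figure above
VCORE = 1.45

for cores in (8, 6, 4):
    amps = vcore_current_a(cores, WATTS_PER_CORE, VCORE)
    print(f"{cores} cores: about {amps:.0f}A")
```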

Now of course this is only looking at current capability. Better VRMs also have better voltage regulation as well as current throughput, which means that they may clock a little better for any given voltage just because it won't fluctuate as much as it does on weaker VRMs. However it does mean that for daily OCs you really don't need the insane VRMs that come on boards like the Gigabyte Gaming K7, Asrock X370 Taichi/Professional Gaming or Asus Crosshair 6 Hero.

Also thanks to all the Patreons and shirt buyers for making this article possible!