As I mentioned recently, I'm now running an Intel Core i7-980X processor (6x physical cores, each capable of executing 2x threads via Intel HT, abstracted by the OS as 12x CPUs) in my workstation. It's running great, but I've been lacking the ability to view CPU temperature readings from the operating system (Debian GNU/Linux x86_64 with a slightly modified kernel 2.6.32).
None of the modules for lm-sensors picked up anything on my DX58SO board that would report temperature. The ACPI thermal zone module didn't pick up a thermal zone, and the coretemp module in the Linux kernel whined that it didn't support the CPU model:
[462200.646777] coretemp: Unknown CPU model 2c
[462200.646780] coretemp: Unknown CPU model 2c
[462200.646782] coretemp: Unknown CPU model 2c
[462200.646783] coretemp: Unknown CPU model 2c
[462200.646785] coretemp: Unknown CPU model 2c
[462200.646786] coretemp: Unknown CPU model 2c
[462200.646787] coretemp: Unknown CPU model 2c
[462200.646789] coretemp: Unknown CPU model 2c
[462200.646790] coretemp: Unknown CPU model 2c
[462200.646792] coretemp: Unknown CPU model 2c
[462200.646793] coretemp: Unknown CPU model 2c
[462200.646795] coretemp: Unknown CPU model 2c
Yes, it apparently went through every virtual CPU (or should we call it a thread, I don't even know anymore!).
I took a guess that the interface didn't change too much in the Gulftown vs. the original Bloomfield line of processors, so I took a peek at coretemp.c, and added in the new model code:
--- coretemp-orig.c	2010-05-09 19:43:06.000000000 -0400
+++ coretemp.c	2010-05-09 19:43:12.000000000 -0400
@@ -455,12 +455,12 @@
 	/* check if family 6, models 0xe (Pentium M DC),
 	  0xf (Core 2 DC 65nm), 0x16 (Core 2 SC 65nm),
 	  0x17 (Penryn 45nm), 0x1a (Nehalem), 0x1c (Atom),
-	  0x1e (Lynnfield) */
+	  0x1e (Lynnfield), 0x2c (Gulftown) */
 	if ((c->cpuid_level < 0) || (c->x86 != 0x6) ||
 	    !((c->x86_model == 0xe) || (c->x86_model == 0xf) ||
 	      (c->x86_model == 0x16) || (c->x86_model == 0x17) ||
 	      (c->x86_model == 0x1a) || (c->x86_model == 0x1c) ||
-	      (c->x86_model == 0x1e))) {
+	      (c->x86_model == 0x1e) || (c->x86_model == 0x2c))) {
 
 	/* supported CPU not found, but report the unknown
 	   family 6 CPU */
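For anyone who wants to play with the check outside the kernel, here is a minimal user-space sketch of the same whitelist logic. The model numbers come straight from the patched comment in the diff; the array `supported_models` and the function `coretemp_model_supported` are names I made up for illustration and do not exist in coretemp.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Family 6 model numbers listed in the patched check, including
 * the newly added 0x2c (Gulftown). */
const unsigned int supported_models[] = {
    0x0e, 0x0f, 0x16, 0x17, 0x1a, 0x1c, 0x1e, 0x2c
};

/* Hypothetical helper, not part of coretemp.c: returns true when the
 * family/model pair would pass the whitelist from the diff. */
bool coretemp_model_supported(unsigned int family, unsigned int model)
{
    size_t count = sizeof(supported_models) / sizeof(supported_models[0]);
    size_t i;

    if (family != 0x6)
        return false;

    for (i = 0; i < count; i++) {
        if (supported_models[i] == model)
            return true;
    }
    return false;
}
```

Keeping the models in a table rather than one long chain of `||` comparisons means the next new model is a one-entry change, though the real driver may well have its own reasons for the open-coded form.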
I did a make-kpkg kernel-image to rebuild the kernel image the Debian way, and installed the new kernel (didn't bother rebooting, since only the module changed). It seemed to load fine. I got some warnings in the kernel log buffer after the module was loaded:
[463606.992361] coretemp coretemp.0: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992417] coretemp coretemp.1: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992449] coretemp coretemp.2: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992483] coretemp coretemp.3: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992516] coretemp coretemp.4: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992554] coretemp coretemp.5: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992592] coretemp coretemp.6: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992629] coretemp coretemp.7: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992665] coretemp coretemp.8: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992698] coretemp coretemp.9: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992734] coretemp coretemp.10: Unable to access MSR 0xEE, for Tjmax, left at default
[463606.992768] coretemp coretemp.11: Unable to access MSR 0xEE, for Tjmax, left at default
After refining my Google search terms a few times to avoid clothing stores, I found that Tjmax is apparently an abbreviation for Thermal Junction Maximum: essentially the maximum temperature the processor will tolerate before it throttles or shuts down to protect itself. Since the module falls back to a default value, I figured the current temperature readings would still be usable, and looked at some of the values:
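As far as I can tell, Tjmax matters for more than the shutdown threshold: the on-die sensor reports how many degrees the core is *below* Tjmax, and the driver subtracts that from Tjmax to get an absolute temperature. The sketch below shows that arithmetic under two assumptions of mine: that the readout sits in bits 22:16 of the thermal status register value, and that the default Tjmax is 100°C (which matches the crit value `sensors` prints). The names `DEFAULT_TJMAX_MC` and `dts_to_millicelsius` are hypothetical, not taken from coretemp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fallback Tjmax in millidegrees Celsius (100 C). */
#define DEFAULT_TJMAX_MC 100000L

/* Hypothetical helper: convert a raw thermal status value into
 * millidegrees Celsius. Bits 22:16 are assumed to hold the digital
 * readout, i.e. the number of degrees below Tjmax. */
long dts_to_millicelsius(uint32_t therm_status, long tjmax_mc)
{
    uint32_t degrees_below_tjmax = (therm_status >> 16) & 0x7f;

    assert(tjmax_mc > 0); /* a non-positive Tjmax makes no sense here */
    return tjmax_mc - (long)degrees_below_tjmax * 1000L;
}
```

With a readout of 22 and the assumed default, this gives 78000 millidegrees, i.e. 78°C. If the real Tjmax for this chip differs from the default, every reading would be off by the same constant amount, so the trend in a graph would still be right even if the absolute numbers are not.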
(destiny:19:49)% sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +78.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +73.0°C (high = +80.0°C, crit = +100.0°C)

[...]

coretemp-isa-000a
Adapter: ISA adapter
Core 10: +70.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-000b
Adapter: ISA adapter
Core 11: +70.0°C (high = +80.0°C, crit = +100.0°C)
Yikes! 12 values? Apparently coretemp iterates through every CPU (as we saw in the original module loading error messages), regardless of whether it's a virtual CPU or not, so we end up with 12 temperature values. They all seem to vary by a degree or so, too. And yes, I was doing some H.264 encoding at the time I took the snapshot above; normally the temperature is around 32°C.
There's apparently a small discussion about this oddity, and how to correctly report CPU core temperatures on multicore processors with HT. Right now, I'm still scratching my head about the fact that the virtual CPUs are all reporting slightly different temperature values. I would have expected odd numbered CPUs to report NULL, or something. Or maybe even just the same value as the previous CPU.
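One way to get a single value per physical core would be to group the logical CPUs by core and keep the hottest reading in each group. This is only a sketch of that idea, not what coretemp does. The struct `cpu_reading` and the function `collapse_by_core` are hypothetical, and I'm assuming the core ID for each logical CPU is looked up elsewhere (on Linux, `/sys/devices/system/cpu/cpuN/topology/core_id` should provide it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one temperature reading per logical CPU. */
struct cpu_reading {
    int core_id;   /* physical core this logical CPU belongs to */
    long temp_mc;  /* temperature in millidegrees Celsius */
};

/* Collapse per-logical-CPU readings into one entry per physical core,
 * keeping the highest temperature seen for each core. `out` must have
 * room for `n` entries. Returns the number of distinct cores found. */
size_t collapse_by_core(const struct cpu_reading *in, size_t n,
                        struct cpu_reading *out)
{
    size_t n_out = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an existing entry for this physical core. */
        for (j = 0; j < n_out; j++) {
            if (out[j].core_id == in[i].core_id)
                break;
        }

        if (j == n_out)
            out[n_out++] = in[i];           /* first reading for this core */
        else if (in[i].temp_mc > out[j].temp_mc)
            out[j].temp_mc = in[i].temp_mc; /* keep the hotter sibling */
    }
    return n_out;
}
```

Taking the maximum is the conservative choice for monitoring; averaging the siblings would be just as easy if a smoother graph matters more than catching peaks.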
Regardless, I'm now graphing all the threads / virtual CPUs!
Hi Aimon, I recompiled from the Debian kernel source (kernel-source-2.6.32-2.6.32-9). I think there's a newer version of it, now, though:
http://packages.debian.org/squeeze/linux-source-2.6.32
Hope that helps!
- Mark
Hi, I have the same issue. I would like to recompile with your patch, but the coretemp.c from the rpm you reference does not have that code in it. I grepped it and the lm_sensors code: no 0x1e or Nehalem anywhere. Where did you get that coretemp.c?
Regards.