Feature/14gen #56

Merged: 7 commits, Nov 17, 2024

Conversation

scline

@scline scline commented Mar 22, 2023

This adds a few fixes for Dell 14th-gen servers (R640/R740 series) running iDRAC 3.30.30.30.

  • Remove the 3rd-party PCIe settings, since this platform does not support them
  • Update CPU temperature parsing, as the IPMI output is a bit different: cpu_1 temp cpu_2 temp --> 01 25 02 28
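
To illustrate the second point, here is a minimal shell sketch of reading both CPU temperatures from a string shaped like the example, "01 25 02 28". It assumes the fields alternate as index, temperature, index, temperature, which is inferred only from that example; `parse_cpu_temperatures` is a hypothetical name, not the function used in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract both CPU temperatures from a string such as
# "01 25 02 28", read as <index> <temp> <index> <temp>.
# ASSUMPTION: this field layout is inferred from the single example above.
parse_cpu_temperatures() {
  local fields
  # Split the argument on whitespace into an array.
  read -r -a fields <<< "$1"
  # Refuse input that does not carry at least two index/temperature pairs.
  [[ ${#fields[@]} -ge 4 ]] || return 1
  # Fields 2 and 4 hold the temperatures; 10# forces base 10 so a value
  # such as "08" is not rejected as invalid octal.
  echo "$((10#${fields[1]})) $((10#${fields[3]}))"
}

# Usage: read -r cpu1 cpu2 <<< "$(parse_cpu_temperatures "01 25 02 28")"
```

Picking the first field instead of the second would report the sensor index as the temperature, so a parser along these lines has to skip the index fields explicitly.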

Example Output on a Dell r740

docker logs Dell_iDRAC_fan_controller
iDRAC/IPMI host: local
Fan speed objective: 25%
CPU temperature threshold: 30°C
Check interval: 10s

                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-03-2023 00:11:22   22°C   28°C   29°C     24°C     User static fan control profile (25%)                                              Enabled  CPU temperature decreased and is now OK (<= 30°C), user's fan control profile applied.
22-03-2023 00:11:32   22°C   29°C   29°C     24°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:11:42   22°C   28°C   29°C     24°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:11:52   22°C   28°C   29°C     24°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:03   22°C   28°C   29°C     24°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:12   22°C   28°C   29°C     25°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:22   22°C   28°C   29°C     25°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:32   22°C   28°C   29°C     25°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:42   22°C   28°C   29°C     25°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:12:52   22°C   48°C   42°C     25°C  Dell default dynamic fan control profile                                              Enabled  CPU 1 and CPU 2 temperatures are too high, Dell default dynamic fan control profile applied for safety
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-03-2023 00:13:02   22°C   50°C   43°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:13:12   22°C   51°C   44°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:13:22   22°C   52°C   45°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:13:32   22°C   53°C   46°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:13:42   22°C   53°C   46°C     26°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:13:52   22°C   54°C   46°C     26°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:14:02   22°C   54°C   47°C     27°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:14:12   22°C   55°C   47°C     27°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:14:22   22°C   55°C   47°C     28°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:14:32   22°C   35°C   33°C     28°C  Dell default dynamic fan control profile                                              Enabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-03-2023 00:14:42   22°C   34°C   32°C     28°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:14:52   22°C   32°C   32°C     27°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:15:02   22°C   32°C   31°C     27°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:15:12   22°C   31°C   31°C     27°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:15:22   22°C   30°C   30°C     26°C     User static fan control profile (25%)                                              Enabled  CPU temperature decreased and is now OK (<= 30°C), user's fan control profile applied.
22-03-2023 00:15:32   22°C   30°C   30°C     26°C     User static fan control profile (25%)                                              Enabled   -
22-03-2023 00:15:42   22°C   49°C   46°C     25°C  Dell default dynamic fan control profile                                              Enabled  CPU 1 and CPU 2 temperatures are too high, Dell default dynamic fan control profile applied for safety
22-03-2023 00:15:52   22°C   53°C   54°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:16:02   22°C   56°C   57°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:16:12   22°C   58°C   59°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-03-2023 00:16:22   22°C   59°C   61°C     25°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:16:33   22°C   60°C   62°C     26°C  Dell default dynamic fan control profile                                              Enabled   -
22-03-2023 00:16:42   22°C   61°C   63°C     27°C  Dell default dynamic fan control profile                                              Enabled   -

Docker Run Command

docker run -d \
  --name Dell_iDRAC_fan_controller \
  --restart=unless-stopped \
  -e IDRAC_HOST=local \
  -e FAN_SPEED=25 \
  -e CPU_TEMPERATURE_THRESHOLD=30 \
  -e CHECK_INTERVAL=10 \
  -e DELL_14_GEN=true \
  --device=/dev/ipmi0:/dev/ipmi0:rw \
  test

@scline
Author

scline commented Mar 22, 2023

For #55

@tigerblue77
Owner

tigerblue77 commented Jan 20, 2024

Hello,
thanks for your PR !
I just reviewed it and would like to merge it but first I'd like it to be improved by removing the "DELL_14_GEN" environment variable and replacing it with an automated check. For example, the check could use dmidecode output (source).
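
A rough sketch of what such an automated check could look like in shell. It assumes the product name reported by `dmidecode -s system-product-name` looks like "PowerEdge R740xd" and that the second digit of the model number encodes the generation (R720 is Gen 12, R740 is Gen 14); `poweredge_generation` is a hypothetical helper name, not code from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PowerEdge generation for a product name.
# ASSUMPTION: the second digit of the model number is the last digit of the
# generation, e.g. R720 -> 12, R640 -> 14, R750 -> 15.
poweredge_generation() {
  local model="$1"
  # Match a model letter followed by three digits and capture the second digit.
  if [[ "$model" =~ [RTCM][0-9]([0-9])[0-9] ]]; then
    echo "1${BASH_REMATCH[1]}"
  else
    # Unknown naming scheme: let the caller decide what to do.
    return 1
  fi
}

# Usage on the host (dmidecode needs root):
#   model="$(dmidecode -s system-product-name)"
#   generation="$(poweredge_generation "$model")" || echo "Unknown model: $model"
```

With a helper like this, the script could branch on the detected generation instead of requiring the user to set `DELL_14_GEN=true`.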

@tigerblue77 tigerblue77 marked this pull request as draft January 20, 2024 20:51
@tigerblue77 tigerblue77 self-assigned this Jan 20, 2024
@tigerblue77 tigerblue77 added the enhancement (New feature or request), good first issue (Good for newcomers) and Needs reviews/tests labels Jan 20, 2024
@tigerblue77 tigerblue77 linked an issue Jan 20, 2024 that may be closed by this pull request
@mfoti

mfoti commented Jan 28, 2024

I'm on iDrac 6.10.80.00, I guess I should downgrade to 3.30.30.30. Is it safe to do?

@tigerblue77
Owner

tigerblue77 commented Jan 28, 2024

I'm on iDrac 6.10.80.00, I guess I should downgrade to 3.30.30.30. Is it safe to do?

I don't know if downgrade is safe but, according to PR's initial comment :

running idrac 3.30.30.30 or older. Dell 640/740 series

If your server is a Dell PowerEdge from x40 series, I expect it to work. Let us know.

@mfoti

mfoti commented Jan 28, 2024

I will start the tour. God bless the queen

@tigerblue77
Owner

@scline , could you tell us :

  • which server have you tested your PR on ?
  • what firmware version were/are they running ?

@mfoti

mfoti commented Jan 28, 2024

I will start the tour. God bless the queen

I will stop my test for these reasons:

  • I'm on an UnRaid R740xd server
  • with the tweaks plugin I schedule "Power Save" mode between 01:30 and 07:30
  • downgrading from firmware 6.10.80.00 to firmware 5.10.50.00 allows me to go from ~9k rpm (30% min) to ~7k rpm (24% min)
  • in "Power Save" with ~7k rpm I get a temperature of 55°C

I don't want to have a hotter system

@tigerblue77
Owner

tigerblue77 commented Jan 28, 2024

@mfoti, thanks for your feedback but can you detail a bit more ?
Are you confirming this PR worked on your R740XD with iDRAC in v5.10.50.00 ? Or the current "main" branch did the trick ?

@mfoti

mfoti commented Jan 28, 2024

Neither of them worked; I got this error:

Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x30 rsp=0xd4): Insufficient privilege level
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x30 rsp=0xd4): Insufficient privilege level

I had 47°C in Power Save with the 30% minimum of the 6.x firmware; my goal was to reach 55°C with this tool. Downgrading iDRAC to 5.5 lets me set 24% as the minimum, which gives me 55°C, so nothing more is needed (in my case, of course; with an external card that pushes the fans to 70% this would be a great solution, but I can't test that)

@tigerblue77
Owner

tigerblue77 commented Jan 28, 2024

OK, I get it, thanks. I'd prefer to stay up to date and use this Docker container, but it's up to you. Also, your help improving and fixing it would be welcome, as I "only" have an R720XD to test with.

About your error, you could try adding --cap-add SYS_RAWIO to your docker run command. Hope it helps

@mfoti

mfoti commented Jan 28, 2024

Screenshot 2024-01-28 at 22 01 49

with "main" branch
Screenshot 2024-01-28 at 22 05 44

with "this" Dell_iDRAC_fan_controller.sh
Screenshot 2024-01-28 at 22 22 05

seems same

@tigerblue77
Owner

@mfoti, thanks for your feedback. Can you just specify the firmware you were using during these tests ?

I can't figure out if setting fan speed is really no longer supported by iDRAC > 3.30.30.30 or if @scline has actually found a working solution/workaround. Without an answer from him and without access to a machine of this generation for testing, it seems very complicated to me to help you guys... 😕

By any chance, would you agree to give me access to the machine or to carry out tests by calling and sharing screens on Discord for example? I don't want to bother you, but it's the only solution I can think of...

@scline
Author

scline commented Jan 29, 2024

Hey, good afternoon. Sorry, I missed a few of these messages.

When I originally opened this PR, I did verify that the iDRAC had to be on 3.30.30.30, otherwise this does not work: the commands to alter fan settings via IPMI are blocked or do not exist.

I am currently using this on a Dell R640 and have tested it on a Dell R740 (no longer online).

docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED        STATUS       PORTS     NAMES
63fb4f2a0445   fan_controll   "/Dell_iDRAC_fan_con…"   3 months ago   Up 2 weeks             Dell_iDRAC_fan_controller

Docker Run command:

docker run -d --name Dell_iDRAC_fan_controller --restart=unless-stopped -e IDRAC_HOST=local -e FAN_SPEED=20 -e CPU_TEMPERATURE_THRESHOLD=70 -e CHECK_INTERVAL=5 -e DELL_14_GEN=true --device=/dev/ipmi0:/dev/ipmi0:rw fan_controll

Some log examples

                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-11-2023 03:24:29   24°C   43°C   45°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:34   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:39   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:44   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:49   24°C   45°C   49°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:54   24°C   45°C   49°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:24:59   24°C   42°C   43°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:04   24°C   44°C   43°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:09   24°C   44°C   43°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:14   24°C   43°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
22-11-2023 03:25:19   24°C   43°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:24   24°C   43°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:29   24°C   43°C   48°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:34   24°C   43°C   48°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:39   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:44   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:49   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:54   24°C   42°C   44°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:25:59   24°C   42°C   43°C     38°C     User static fan control profile (20%)                                              Enabled   -
22-11-2023 03:26:04   24°C   42°C   43°C     38°C     User static fan control profile (20%)                                              Enabled   -

@scline
Author

scline commented Jan 29, 2024

Screenshot 2024-01-29 at 12 04 42 PM
Screenshot 2024-01-29 at 12 06 00 PM

@scline
Author

scline commented Jan 29, 2024

This is what things look like when the container is stopped, 33% vs 20% jump
Screenshot 2024-01-29 at 12 08 46 PM

@tigerblue77
Owner

tigerblue77 commented Jan 30, 2024

@scline Thanks for your details.

when I originally set this PR the iDrac did have to be on 3.30.30.30 otherwise this ends up not working

Can you then edit your original post to remove "or older", please?

I'll make a few modifications to your PR before merging it. What do you think about :

  • adding an automatic PowerEdge generation check via IPMI commands, filtering on *40 to detect 14 gen (and *50 for 15 gen ?)
  • adding a firmware version check, if we're on a 14 gen, as a precondition to container startup
  • editing/updating the documentation accordingly
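
The firmware precondition proposed above could be sketched roughly as follows. This is only an illustration: it assumes GNU `sort` with `-V` (version sort) is available in the container, treats 3.30.30.30 as the newest firmware that still accepts the fan commands (as reported earlier in this thread), and leaves out how the version string is obtained. `version_le` and `firmware_allows_fan_control` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is lower than or equal to $2.
# ASSUMPTION: GNU sort with -V (version sort) is available.
version_le() {
  # The lower version sorts first; if that is $1, then $1 <= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# ASSUMPTION: 3.30.30.30 is the newest firmware that still accepts the fan
# control commands, as reported by the PR author in this thread.
MAX_SUPPORTED_FIRMWARE="3.30.30.30"

# Hypothetical precondition: succeed when the given iDRAC firmware version
# is expected to allow manual fan control on a Gen 14 server.
firmware_allows_fan_control() {
  version_le "$1" "$MAX_SUPPORTED_FIRMWARE"
}

# Usage at container startup (the version lookup itself is not shown):
#   firmware_allows_fan_control "$firmware_version" || { echo "Unsupported firmware" >&2; exit 1; }
```

Whether the check should accept only 3.30.30.30 or also older releases is left open in the discussion, so the `<=` comparison here is a design choice, not a confirmed requirement.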

@tigerblue77 tigerblue77 linked an issue Jan 31, 2024 that may be closed by this pull request
@scline
Author

scline commented Jul 23, 2024

Just a note, you are not able to roll back to 3.30.30.30 if you update idrac to the latest versions now. 7.0.0.121 I believe breaks such a rollback.

@bitspill

Just a note, you are not able to roll back to 3.30.30.30 if you update idrac to the latest versions now. 7.0.0.121 I believe breaks such a rollback.

Did you try iterating in smaller steps? I don't remember which version I started at but in May I went down the line 7.0.0.0 then the highest 6.x then 6.0.0.0 then the highest 5.x then 5.0.0.0 etc until I got back to 3.30.30.30

@scline
Author

scline commented Jul 24, 2024 via email

@cfelicio

Just a note, you are not able to roll back to 3.30.30.30 if you update idrac to the latest versions now. 7.0.0.121 I believe breaks such a rollback.

I just tried, I was on 7.0.0.171, was able to go back to 3.30.30.30. Have to do it in steps, per a reddit post I found:

7.00.00.00 -> 6.10.80.00 -> 6.00.02.00 -> 5.10.50.00 -> 5.00.00.00 -> 4.40.40.00 -> 4.40.10.00 -> 4.00.00.00 -> 3.30.30.30
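
For anyone following that chain, here is a small shell sketch that looks up the next version to flash from the one currently installed. The version list is copied from the line above; `next_downgrade_step` is a hypothetical helper, and the sketch does not flash anything itself.

```shell
#!/usr/bin/env bash
# Downgrade chain as quoted in the comment above (newest to oldest).
DOWNGRADE_CHAIN=(7.00.00.00 6.10.80.00 6.00.02.00 5.10.50.00 5.00.00.00
                 4.40.40.00 4.40.10.00 4.00.00.00 3.30.30.30)

# Hypothetical helper: print the version to flash next, given the version
# currently installed. Fails when the version is not in the chain or is
# already the last entry.
next_downgrade_step() {
  local current="$1" i
  for ((i = 0; i < ${#DOWNGRADE_CHAIN[@]} - 1; i++)); do
    if [[ "${DOWNGRADE_CHAIN[i]}" == "$current" ]]; then
      echo "${DOWNGRADE_CHAIN[i + 1]}"
      return 0
    fi
  done
  return 1
}

# Usage: next_downgrade_step 5.10.50.00
```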

In other news, I am getting this error when running the script on the R640; I suppose this PR would fix it?

22-08-2024 21:33:49 22°C 1°C 30°C 27°C User static fan control profile (40%) Enabled -
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xce rsp=0xc1): Invalid command

@cfelicio

Just for the sake of completeness, I built a container with @scline's edits, and everything is working well on the R640:

24-08-2024 04:09:18 21°C 51°C 61°C 40°C User static fan control profile (10%) Enabled -
24-08-2024 04:09:28 21°C 46°C 48°C 39°C User static fan control profile (10%) Enabled -

If possible I think committing those would be beneficial to people who want to use the script on 14th gen hardware.

@tigerblue77 tigerblue77 force-pushed the feature/14gen branch 3 times, most recently from fe76dd0 to bdf7a71 November 10, 2024 17:00
@tigerblue77
Owner

Hello @Steven-Emers, thanks ! Are you able to test the latest version of the code in this PR/branch after my edits please ? 😉 (or any happy boy with a Gen 14 +)

@bitspill

I'm away from my lab until approximately December 3rd, I can get some outputs or test from a 14th gen after that if it's still necessary

@tigerblue77
Owner

OK @bitspill, thanks !

OK guys (@mfoti, @cfelicio, @scline, @Qlii256, @blaze756), I've fixed 2 out of the 3 prerequisites I set in this comment to get this PR merged. Could some of you help me with the last one, please:

  • Test those modifications/improvements on at least 2 hardware platforms (pre-Gen 14 and Gen 14 or later) and in both modes (local & LAN)

@tigerblue77 tigerblue77 marked this pull request as ready for review November 10, 2024 17:59
@Steven-Emers

I can test the code, but I can't quite figure out how to get it onto my server just yet. I'm new to Docker and was using the Docker Hub link to download it.

@tigerblue77
Owner

tigerblue77 commented Nov 10, 2024

@Steven-Emers

cd /tmp
git clone https://github.com/scline/Dell_iDRAC_fan_controller_Docker.git
cd Dell_iDRAC_fan_controller_Docker/
git checkout feature/14gen
docker build -t tigerblue77/dell_idrac_fan_controller:latest .
docker run...

(use the settings you wish to, based on the related readme file on @scline's Github fork repository)

@Steven-Emers

Steven-Emers commented Nov 11, 2024

@tigerblue77

I think I finally figured it out, though I'm not sure whether this is my doing or something is missing: I'm getting this error with the new code. I see that the .sh file's directory was changed in the Dockerfile, but I'm unsure if this has anything to do with it.

exec ./Dell_iDRAC_fan_controller.sh: no such file or directory

exec ./Dell_iDRAC_fan_controller.sh: no such file or directory

exec ./Dell_iDRAC_fan_controller.sh: no such file or directory

Edit: This is my doing, I'm just not sure where I'm going wrong just yet

@Steven-Emers

Steven-Emers commented Nov 12, 2024

@tigerblue77

I got it working. Windows was messing with the .sh files, so I converted them to Unix line endings with Notepad++.
This is the output I see in the log file.

    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
12-11-2024 00:45:21   25°C    1°C   41°C     36°C     User static fan control profile (10%)                                             Disabled   -
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xce rsp=0xc1): Invalid command
12-11-2024 00:45:36   25°C    1°C   41°C     36°C     User static fan control profile (10%)                                             Disabled   -
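
The line-ending conversion mentioned at the start of this comment can also be done from a shell instead of Notepad++. A minimal sketch, assuming the "no such file or directory" error came from carriage returns at the end of the script's lines (as the fix suggests); `crlf_to_lf` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a file in place with Windows (CRLF) line
# endings converted to Unix (LF) line endings.
crlf_to_lf() {
  local file="$1" tmp status
  tmp="$(mktemp)" || return 1
  # Delete every carriage return, then write the result back through the
  # original path so the file keeps its permissions (e.g. the executable bit).
  tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage, before running docker build:
#   crlf_to_lf Dell_iDRAC_fan_controller.sh
```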

@tigerblue77
Owner

Nice, so we have 1 test in R740XD network mode (I guess ?)
Has anyone been able to test this PR ? 😀

@Steven-Emers

Nice, so we have 1 test in R740XD network mode (I guess ?) Has anyone been able to test this PR ? 😀

Yes this was with network mode. It looks like the CPU1 temp still isn't working.

Server model: DELL PowerEdge R740xd
iDRAC/IPMI host:
Fan speed objective: 10%
CPU temperature threshold: 50°C
Check interval: 15s

Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xce rsp=0xc1): Invalid command
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
12-11-2024 20:15:27   24°C    1°C   38°C     32°C     User static fan control profile (10%)                                             Disabled  CPU temperature decreased and is now OK (<= 50°C), user's fan control profile applied.
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xce rsp=0xc1): Invalid command
12-11-2024 20:15:42   24°C    1°C   38°C     33°C     User static fan control profile (10%)                                             Disabled   -

@tigerblue77
Owner

My bad, didn't see the error message and "1°C" in your previous message. Would you be able to help me diagnose this tonight (in 2 to 3 hours ?)

@Steven-Emers

@tigerblue77
No worries, I can't test tonight however I should be able to free up some time either Friday or over the weekend. Whichever works best

@tigerblue77
Owner

@Steven-Emers it should be fixed, can you confirm ? :)

@Steven-Emers

Steven-Emers commented Nov 17, 2024

@tigerblue77
Looks like that did it: it's now reading CPU1 and it's no longer sending error messages like before. Great work!
I can also confirm it's responding to each CPU temp with the fan speed.

Server model: DELL PowerEdge R740xd
iDRAC/IPMI host:
Fan speed objective: 20%
CPU temperature threshold: 60°C
Check interval: 15s

                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
17-11-2024 20:44:54   24°C   44°C   38°C     35°C     User static fan control profile (20%)                                                       CPU temperature decreased and is now OK (<= 60°C), user's fan control profile applied.
17-11-2024 20:45:10   24°C   43°C   38°C     34°C     User static fan control profile (20%)                                                        -
17-11-2024 20:45:25   24°C   42°C   37°C     34°C     User static fan control profile (20%)                                                        -
17-11-2024 20:45:40   24°C   42°C   37°C     33°C     User static fan control profile (20%)                                                        -
17-11-2024 20:45:58   24°C   42°C   37°C     32°C     User static fan control profile (20%)                                                        -

@tigerblue77 tigerblue77 merged commit e22efc3 into tigerblue77:master Nov 17, 2024
@tigerblue77
Owner

@Steven-Emers thanks for your tests, it's merged !

Labels
enhancement (New feature or request), good first issue (Good for newcomers), Needs reviews/tests
Development

Successfully merging this pull request may close these issues.

  • Changed for Dell rx40 series
  • Does not work on iDRAC 9
6 participants