Underutilized CPU
Author | Message |
---|---|
Joined: 1 Jul 12 Posts: 13 Credit: 2,099,843 RAC: 0 |
So, having got those errors on Get Decics, I switched to Bounded tasks. These run to completion, but the CPU % is sometimes in single figures, often around the 50% mark, and almost never above 90% (according to me sitting there watching Activity Monitor on and off all day, mostly off). It's all a little odd. |
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
So you are only seeing this on your Mac? Could it be related to the preferences that say "use at most x% of cpu time" or "suspend work when CPU usage is above x%"? There may also be other configurables that I am forgetting about. |
Joined: 1 Jul 12 Posts: 13 Credit: 2,099,843 RAC: 0 |
I have 2 MacBooks. I am seeing it on one but not the other. CPU time and usage are both set to 100%. I have rebooted and it still only runs at around 50%. |
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
That's a mystery. I also noticed that your runtimes were several times higher than what I would expect (most SF77 bounded app WUs should be taking ~1 hour). So your CPU does seem to be throttled. The last time that happened to me, my CPU was overheating and the built-in safeguards were kicking in. |
Joined: 19 Aug 11 Posts: 8 Credit: 253,606,519 RAC: 0 |
I've been having problems with my Linux hosts running Bounded recently. After digging into it a bit this evening, it seems to be related to multi-tasking: something the WUs share is causing the individual tasks to block, each waiting on a lock held by one of the other WUs. On my 8-32 core boxes, I've had a pile of WUs that sit there idling with a run state of "D" (uninterruptible wait, probably I/O), and the whole group of WUs together only adds up to a single full core's worth of CPU utilization. If I abort half of them, the remaining ones double their actual CPU utilization, but that still leaves tons of idle cores. For now, I'm going to limit the systems to running a single WU at a time, but I wonder what could be causing this behavior so I can get back to running full out. :) I've ended up cancelling lots of WUs that have run for *days*, and the stats report significantly less CPU time consumed than wall time. Here's one example: http://numberfields.asu.edu/NumberFields/workunit.php?wuid=9092322 I'm also going to try switching to Unbounded to see if this affects both Linux apps. |
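For anyone who wants to check their own Linux host for the same symptom, here is a minimal sketch of one way to list processes sitting in run state "D" along with the CPU time they have accumulated. It reads the standard `/proc/<pid>/stat` files; the function names `parse_stat` and `blocked_processes` are made up for this example, and it makes no attempt to pick out BOINC tasks specifically.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state, cpu_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    left, _, right = line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    fields = right.split()
    state = fields[0]                                # field 3: run state
    cpu_ticks = int(fields[11]) + int(fields[12])    # fields 14+15: utime+stime
    return int(pid_text), comm, state, cpu_ticks


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, cpu_seconds) for every process in state 'D'."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state, ticks = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat line was unreadable
        if state == "D":
            blocked.append((pid, comm, ticks / ticks_per_second))
    return blocked


if __name__ == "__main__":
    for pid, comm, cpu_seconds in blocked_processes():
        print(f"{pid}\t{comm}\tD\t{cpu_seconds:.1f}s CPU")
```

Running it a few times in a row gives a rough picture: tasks that keep showing up in "D" with barely growing CPU seconds are the ones waiting rather than crunching.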
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
Thanks for your analysis! The recent upgrade of the bounded app included an upgrade of the pari library. In this upgrade, the default factoring routine writes intermediate results to a temporary file. I have always disliked this factoring method because of its dependence on file I/O. I'm wondering if this is causing some kind of contention between WUs. This file-dependent factoring method is bypassed in the unbounded app (which still uses the older pari), so it will be interesting to see if you get better CPU usage with the unbounded app. In the meantime, I will accelerate my plans to reconfigure the current factoring algorithm for the bounded app. |
Joined: 19 Aug 11 Posts: 8 Credit: 253,606,519 RAC: 0 |
Sounds like you've got it. Looking at some of the systems still running Bounded, the sar output shows 99-100% disk activity with 6+ Bounded tasks running, and 92% on an 8-core system running 4 Bounded. Systems running Unbounded or other projects (even VM-based ones) report negligible disk activity. So, I think I'll simply limit to one, maybe two Bounded at a time with an app_config.xml and let Unbounded consume the rest. :) |
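As a rough illustration of the app_config.xml workaround mentioned above, the sketch below generates a file that caps how many tasks of one app run at once, using BOINC's `<app>`, `<name>` and `<max_concurrent>` elements. The app name `bounded_app_placeholder` is hypothetical; the real short name has to be looked up for the project (it is not given in this thread), and the helper name `build_app_config` is also just for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Build app_config.xml text from a {app_short_name: max_concurrent} dict."""
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "bounded_app_placeholder" is NOT the real app name; replace it with
    # the project's actual short name for the Bounded app.
    print(build_app_config({"bounded_app_placeholder": 1}))
```

The resulting file would go in the project's folder under the BOINC data directory, and the client needs to re-read its config files (or be restarted) before the limit takes effect.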
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
Thanks for taking the time to investigate. I should have some time on Friday to make the modifications to the bounded app. Stay tuned... |
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
I made the necessary changes to turn off the file I/O inside pari. It will be interesting to see if this fixes all the underutilized CPU issues that people have been having. I only needed to do this for the Linux and Mac apps. It turns out that the Windows app is still using the older version of pari and does not suffer from the disk access problem. However, I still plan to upgrade the Windows app in the near future (after I return from Xmas vacation). |
Joined: 19 Aug 11 Posts: 8 Credit: 253,606,519 RAC: 0 |
OK, I've finished up all my older Bounded tasks. I'll re-enable Bounded and watch a few systems. Thanks, Eric! |
Joined: 19 Aug 11 Posts: 8 Credit: 253,606,519 RAC: 0 |
Yup. Bounded definitely looks a lot better. Thanks again for the tweak, Eric! |
Joined: 8 Jul 11 Posts: 1362 Credit: 598,742,792 RAC: 771,135 |
I should be the one thanking you! You helped to find and test this problem. |