Underutilized CPU

Message boards : Number crunching : Underutilized CPU

Profile Steve Hawker*

Joined: 1 Jul 12
Posts: 13
Credit: 2,099,843
RAC: 0
Message 1162 - Posted: 17 Nov 2014, 19:06:34 UTC

So, having got those errors on Get Decics, I switched to Bounded tasks.

These run to completion but the CPU % is sometimes in single figures, often around the 50% mark and almost never above 90% (according to me sitting there watching Activity Monitor on and off all day, mostly off).

It's all a little odd.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1165 - Posted: 17 Nov 2014, 21:10:50 UTC - in response to Message 1162.  

So you are only seeing this on your Mac?

Could it be related to the preferences that say "use at most x% of CPU time" or "suspend work when CPU usage is above x%"? There may also be other configurables that I am forgetting about.
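As a rough illustration of checking those preferences outside the BOINC Manager, the following Python sketch reads throttle-related settings from preference XML. It assumes the tag names `max_ncpus_pct`, `cpu_usage_limit`, and `suspend_cpu_usage` as they commonly appear in a client's `global_prefs_override.xml`; the sample values are invented, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample text in the style of BOINC's global_prefs_override.xml.
# The values are made up for illustration only.
SAMPLE_PREFS = """
<global_preferences>
   <max_ncpus_pct>100.0</max_ncpus_pct>
   <cpu_usage_limit>50.0</cpu_usage_limit>
   <suspend_cpu_usage>25.0</suspend_cpu_usage>
</global_preferences>
"""

# Preference tags (assumed names) that can hold a task below 100% CPU.
THROTTLE_TAGS = ("max_ncpus_pct", "cpu_usage_limit", "suspend_cpu_usage")


def throttling_prefs(xml_text):
    """Return {tag: value} for every throttle-related preference present."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in THROTTLE_TAGS:
        node = root.find(tag)
        if node is not None and node.text:
            found[tag] = float(node.text)
    return found


def likely_throttled(prefs):
    """List the settings that could explain sub-100% CPU usage."""
    reasons = []
    if prefs.get("max_ncpus_pct", 100.0) < 100.0:
        reasons.append("max_ncpus_pct")
    if prefs.get("cpu_usage_limit", 100.0) < 100.0:
        reasons.append("cpu_usage_limit")
    # Assumption: a value of 0 disables the "suspend when busy" check.
    if 0.0 < prefs.get("suspend_cpu_usage", 0.0) < 100.0:
        reasons.append("suspend_cpu_usage")
    return reasons


prefs = throttling_prefs(SAMPLE_PREFS)
print(likely_throttled(prefs))
```

With every value at 100% (and the suspend threshold unset), the list comes back empty, which would point away from preferences as the cause.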
Profile Steve Hawker*

Joined: 1 Jul 12
Posts: 13
Credit: 2,099,843
RAC: 0
Message 1167 - Posted: 17 Nov 2014, 23:57:31 UTC - in response to Message 1165.  

I have 2 MacBooks - I am seeing it on one but not the other.

CPU time and CPU usage are both set to 100%.

I have rebooted and it still only runs around 50%.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1171 - Posted: 18 Nov 2014, 2:34:16 UTC - in response to Message 1167.  

That's a mystery. I also noticed that your runtimes were several times higher than what I would expect (most SF77 bounded app WUs should be taking ~1 hour). So your CPU does seem to be throttled. The last time that happened to me, my CPU was overheating and the built-in safeguards were kicking in.
Mumps [MM]

Joined: 19 Aug 11
Posts: 8
Credit: 252,901,489
RAC: 7,801
Message 1194 - Posted: 17 Dec 2014, 1:31:58 UTC

I've been having problems with my Linux hosts running Bounded recently. After digging into it a bit this evening, it seems to be related to multitasking: the WUs appear to share something, and individual tasks block while waiting for a resource that one of the other WUs has locked. On my 8-32 core boxes, I've had a pile of WUs sitting idle with a run state of "D" (uninterruptible wait, probably I/O), and the whole group together only adds up to a single full core's worth of CPU utilization. If I abort half of them, the remaining ones double their actual CPU utilization, but that still leaves tons of idle cores.

For now, I'm going to limit the systems to running a single WU at a time, but I wonder what could be causing this behavior so I can get back to running full out. :) I've ended up cancelling lots of WUs that have run for *days*, and the stats report significantly less CPU time consumed than wall time. Here's one example:

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=9092322

I'm also going to try switching to the Unbounded to see if this affects both Linux apps.
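For anyone wanting to repeat this kind of check, here is a minimal Python sketch of the two symptoms described above: tasks stuck in the "D" run state, and CPU time far below wall time. It parses text in the shape of `ps -eo pid,stat,comm` output. The process name `bounded_app` and the sample rows are hypothetical, not taken from a real host.

```python
def parse_ps(ps_output):
    """Parse `ps -eo pid,stat,comm` text into (pid, state, command) tuples."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, stat, comm = parts
            rows.append((int(pid), stat, comm.strip()))
    return rows


def blocked_tasks(rows, name_fragment):
    """Return PIDs of matching processes in uninterruptible sleep (state D)."""
    return [pid for pid, stat, comm in rows
            if name_fragment in comm and stat.startswith("D")]


def cpu_efficiency(cpu_seconds, wall_seconds):
    """Fraction of wall-clock time the task actually spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Invented sample: two tasks blocked on I/O, one running, one unrelated.
SAMPLE_PS = """\
  PID STAT COMMAND
 4101 DNl  bounded_app
 4102 DNl  bounded_app
 4103 RNl  bounded_app
 4200 Ss   sshd
"""

rows = parse_ps(SAMPLE_PS)
print(blocked_tasks(rows, "bounded"))
print(cpu_efficiency(3600, 36000))
```

An efficiency well below 1.0 combined with several tasks in state D is consistent with the tasks waiting on I/O rather than computing.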
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1195 - Posted: 17 Dec 2014, 6:02:53 UTC - in response to Message 1194.  

Thanks for your analysis!

The recent upgrade of the bounded app included an upgrade of the pari library. In this upgrade, the default factoring routine writes intermediate results to a temporary file. I have always disliked this factoring method, because of this dependence on file I/O. I'm wondering if this is causing some kind of contention between WUs.

This file-dependent factoring method is bypassed in the unbounded app (still using the older pari). So it will be interesting to see if you get better CPU usage with the unbounded app.

In the meantime, I will accelerate my plans to reconfigure the current factoring algorithm for the bounded app.
Mumps [MM]

Joined: 19 Aug 11
Posts: 8
Credit: 252,901,489
RAC: 7,801
Message 1198 - Posted: 18 Dec 2014, 3:59:59 UTC

Sounds like you've got it. Looking at some of the systems still running Bounded, the sar activity shows 99-100% disk activity with 6+ running, and 92% on an 8-core system running 4 Bounded. Those running Unbounded or other projects (even VM-based ones) report negligible disk activity. So, I think I'll simply limit to one, maybe two Bounded at a time with an app_config.xml and let Unbounded consume the rest. :)
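A minimal sketch of generating such an app_config.xml in Python follows. It uses the `<app>`, `<name>`, and `<max_concurrent>` elements of BOINC's app_config.xml format. The app name `bounded_app_name` is a placeholder, since the thread does not give the project's actual short name for the Bounded app, and `build_app_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Build app_config.xml text from a {app_name: max_concurrent} mapping."""
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Placeholder app name: substitute the project's real short name for Bounded.
xml_text = build_app_config({"bounded_app_name": 2})
print(xml_text)
```

The resulting file would normally go in the project's folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit takes effect.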
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1199 - Posted: 18 Dec 2014, 8:25:23 UTC - in response to Message 1198.  
Last modified: 18 Dec 2014, 8:25:52 UTC

Thanks for taking the time to investigate.

I should have some time on Friday to make the modifications to the bounded app. Stay tuned...
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1210 - Posted: 22 Dec 2014, 7:44:27 UTC - in response to Message 1199.  

I made the necessary changes to turn off the file I/O inside pari. It will be interesting to see if this fixes all the underutilized CPU issues that people have been having.

I only needed to do this for the Linux and Mac apps. It turns out that the Windows app is still using the older version of pari and does not suffer from the disk access problem. However, I still plan to upgrade the Windows app in the near future (after I return from Xmas vacation).
Mumps [MM]

Joined: 19 Aug 11
Posts: 8
Credit: 252,901,489
RAC: 7,801
Message 1211 - Posted: 27 Dec 2014, 19:24:53 UTC

OK, I've finished up all my older Bounded. I'll re-enable Bounded and watch a few systems. Thanks, Eric!
Mumps [MM]

Joined: 19 Aug 11
Posts: 8
Credit: 252,901,489
RAC: 7,801
Message 1218 - Posted: 7 Jan 2015, 4:15:51 UTC

Yup. Bounded definitely looks a lot better. Thanks again for the tweak, Eric!
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,722,178
RAC: 288,032
Message 1219 - Posted: 7 Jan 2015, 5:59:20 UTC - in response to Message 1218.  

I should be the one thanking you! You helped find the problem and test the fix.


Copyright © 2024 Arizona State University