Tahiti gpus and memory issues

Message boards : Number crunching : Tahiti gpus and memory issues

[H]auntjemima
Joined: 30 Apr 18
Posts: 11
Credit: 2,344,276
RAC: 0
Message 2720 - Posted: 5 May 2020, 5:39:17 UTC
Last modified: 5 May 2020, 5:44:35 UTC

Just to start off, I am running all Tahiti GPUs: a 280X, 7950s and 7970s. These cards support OpenCL 1.2.

When running a task, it quickly uses all of my RAM (up to 32 GB on my largest system). When it runs out of memory it starts using the Windows page file, which drives disk usage to 100%, and it keeps going until it crashes the system. I have tested this on three separate systems with the same types of cards.

I changed my BOINC settings to use no more than 50% of my memory, but it still does the same thing.

Is my card not supported?

[H]auntjemima
Message 2721 - Posted: 5 May 2020, 5:52:23 UTC

You can use this as a reference. I left it running for a bit and was "lucky" enough to get some failures before it killed my system.

https://numberfields.asu.edu/NumberFields/result.php?resultid=81380141

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,830,278
RAC: 288,937
Message 2725 - Posted: 5 May 2020, 17:30:22 UTC - in response to Message 2720.  

If I recall, there were others with similar issues on the older AMD GPUs (any card before ~2014).

Running out of memory usually happens during the OpenCL compile phase, which normally takes about 10 seconds at the start of every OpenCL task. This tells me that the OpenCL compiler is failing. Since the compiler is part of the OpenCL driver, one solution is to try a different driver version; however, given the age of the card it will most likely still give you problems, so it is probably best not to waste time on it. Sorry for the inconvenience!

[H]auntjemima
Message 2726 - Posted: 5 May 2020, 18:49:38 UTC - in response to Message 2725.  

Thanks Eric, I never thought about the driver version causing me headaches. I will try some older versions.

Coleslaw
Joined: 21 Sep 11
Posts: 9
Credit: 12,715,541
RAC: 151
Message 2730 - Posted: 6 May 2020, 17:38:50 UTC

I had this issue on a recent system build that had an old 5450 and a 6770 in the same system. I tried two different older v15 drivers, and both showed the same behavior. I noticed the tasks would get all the way to 100% but not finish, eventually freezing the entire system by eating up every bit of resource they could squeeze out. I find this to be a huge issue, as not everyone will be paying attention. System stability should be a concern. If you are aware of this bug, you should limit the work units to only certain capable cards. This was not a happy surprise for me.

Eric Driver
Message 2731 - Posted: 6 May 2020, 21:18:52 UTC - in response to Message 2730.  

The Windows version is flagged as a beta app, so I would assume people are paying attention, at least when trying it for the first time.

As far as limiting work to certain cards goes, there aren't many mechanisms within the plan class: just OpenCL version and driver version. For NVIDIA there is the compute capability parameter, which can be used to exclude the older cards, but there is nothing like this for AMD. In short, there aren't any knobs I can tweak to fix this. This is why it is a beta app.
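
To make the limitation concrete, here is a minimal sketch of a filter that can only see the two values mentioned above. It is not BOINC's actual plan-class code: the `Host` record, the `accepts` function and the version tuples are all hypothetical names chosen for illustration. The point is that two AMD hosts reporting the same OpenCL and driver versions are indistinguishable to such a filter, whatever card is installed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Host:
    # Hypothetical record; gpu_model is shown only to make the point that
    # the filter below never looks at it.
    gpu_model: str
    opencl_version: Tuple[int, int]
    driver_version: Tuple[int, ...]


def accepts(host: Host,
            min_opencl: Tuple[int, int] = (1, 2),
            max_driver: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if the host passes a version-only filter (assumed logic)."""
    if host.opencl_version < min_opencl:
        return False
    if max_driver is not None and host.driver_version > max_driver:
        return False
    return True


# An older card and a hypothetical newer one with identical reported versions.
old_card = Host("R9 280X", (1, 2), (19, 9, 2))
new_card = Host("hypothetical newer card", (1, 2), (19, 9, 2))

# Both get the same answer, so the filter cannot keep work away from old_card
# without also keeping it away from new_card.
same_decision = accepts(old_card) == accepts(new_card)
```

A driver ceiling such as `accepts(host, max_driver=(19, 10, 1))` would screen out newer drivers, but it still says nothing about the age of the card.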

Chooka
Joined: 3 May 18
Posts: 16
Credit: 42,346,902
RAC: 98
Message 2735 - Posted: 8 May 2020, 5:09:09 UTC

Is there any work for AMD cards?
I tried preferences with GPU only but haven't received any work. Just wondering if there was any?

Thx

Eric Driver
Message 2739 - Posted: 8 May 2020, 16:01:12 UTC - in response to Message 2735.  

There should be. Make sure you are allowing beta apps and that your card is OpenCL 1.2 capable.

Chooka
Message 2743 - Posted: 8 May 2020, 23:07:39 UTC

Ahhh I needed to check the box to allow Beta apps.
Thank you for the reply Eric.

Chooka
Message 2746 - Posted: 9 May 2020, 12:40:23 UTC
Last modified: 9 May 2020, 13:17:58 UTC

I'm finding that some of these work units get stuck, unless running for 5 hrs to reach 40% completion is right?
I've noticed this on 2 separate PCs, each running a different AMD card (R9 280X and Radeon VII), at various times (but not long after starting, maybe 5 min). It just gets stuck around 40%. I didn't realise that the 5 hr WU was still going; I just happened to check it.

This has also happened when running 2 instances on the Radeon VII card. I'm now only running 1 instance and will monitor it.

Edit - I'm giving up on the 280X. A new WU has been running for 44 min and it's only 21% complete.
The Radeon VII seems OK now running 1 WU at a time.

Eric Driver
Message 2747 - Posted: 9 May 2020, 16:31:15 UTC - in response to Message 2746.  

The average run time on my GTX 1660 is 7 minutes (15x271 batch), which should give you an idea of how long a task should take. 44 minutes sounds excessive to me.

As discussed in another thread:
1. Older cards do not work well. Cards produced within the last 3 years should be good.
2. The newer AMD drivers are buggy. Try driver version 19.10.1 or earlier.

Chooka
Message 2748 - Posted: 9 May 2020, 22:42:09 UTC

Thanks for the reply Eric.
I'm using 19.9.2, as the newer drivers are indeed buggy. I left the Radeon VII running 1 WU at a time overnight. Times vary from 8 min to 48+ min, but the average looks to be about 16 minutes.

Eric Driver
Message 2749 - Posted: 10 May 2020, 0:42:45 UTC - in response to Message 2748.  

I just reviewed the specs on the Radeon VII. That's a pretty nice card, so I would expect it to perform much better. Looking at a few of your tasks, I see a huge discrepancy between CPU time and run time; normally the run time is just a little more than the CPU time. So something is horribly wrong here. Not sure what it is, but I will think about it.

Chooka - If you look at Task Manager, what percentage of the CPU is being used by the GPU executable? Also, do you know what the GPU utilization is?

Chooka
Message 2750 - Posted: 10 May 2020, 0:52:25 UTC
Last modified: 10 May 2020, 1:00:00 UTC

Hi Eric,

I was messing around with concurrent work units initially, so the first few numbers might look odd. I tested 1, 2 and 3 concurrent WUs. For the past 12 hours I've just been running 1 WU at a time.
I have my CPU usage set to 95%, so I'm reserving about 2 cores. My GPU load is about 91% running 1 WU according to GPU-Z and, as stated, I'm running 19.9.2 drivers.

I often find the info in Windows Task Manager questionable with regard to the GPU.
GPU-Z says the card is under near full load, but Windows Task Manager says GPU utilization hits 6% max, using 1 GB of my 16 GB HBM2.

I could be starving the GPU WUs of CPU time, but I doubt it with 2 threads left free... unless it's like PrimeGrid, where you need an actual core rather than a thread. I could test running at 50% CPU usage or GPU only.

tazzduke
Joined: 22 Aug 11
Posts: 2
Credit: 10,846,492
RAC: 0
Message 2751 - Posted: 10 May 2020, 5:21:04 UTC - in response to Message 2750.  

Hey Chooka03,

In Task Manager on Win10, under the GPU section, you have 4 graphs. On the top-left one, click the button that says "3D" and a drop-down box will appear. Click the Compute_0 item.

Now it will show you the GPU usage.

Cheers

Eric Driver
Message 2752 - Posted: 10 May 2020, 5:57:16 UTC - in response to Message 2750.  

Hi Chooka,

I mentioned Task Manager because on Linux I have seen CPU usage as low as 20% and GPU utilization down around 50%. I increased my throughput by running 2 tasks simultaneously. But this was only a workaround, and I think I have a similar problem to what you are seeing.

The NVIDIA GPUs use almost a full core and run times are much better. It appears the AMD GPUs are taking longer to process, which would explain the low CPU usage, since the CPU idles while it waits for the GPU to finish.
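
The effect of running two at once can be put into rough numbers. This is only a sketch: `tasks_per_hour` is a hypothetical helper, the 16-minute figure is the single-task average reported earlier in this thread for the Radeon VII, and the 24-minute figure for two concurrent tasks is invented purely for illustration, not measured.

```python
def tasks_per_hour(concurrent_tasks: int, minutes_per_task: float) -> float:
    """Completed tasks per hour when `concurrent_tasks` run side by side,
    each taking `minutes_per_task` of wall-clock time."""
    if concurrent_tasks < 1 or minutes_per_task <= 0:
        raise ValueError("need at least one task and a positive duration")
    return concurrent_tasks * 60.0 / minutes_per_task


# One task at a time, 16 minutes each (average reported in this thread).
single = tasks_per_hour(1, 16.0)

# Two at a time, each slowed to an assumed 24 minutes (illustrative only).
double = tasks_per_hour(2, 24.0)

print(f"1 at a time: {single:.2f} tasks/hour, 2 at a time: {double:.2f} tasks/hour")
```

Running two only pays off if each task slows down by less than a factor of two; if the per-task time doubles or worse, throughput stays flat or drops.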

In the database I see plenty of AMD results with good times which tells me the app is at least working for some people. Here is a partial list of the good cards:
[BOINC|7.14.2][CAL|AMD Radeon (TM) Pro WX 7100 Graphics|1|8192MB||200]
[BOINC|7.14.2][CAL|Radeon RX 580 Series|1|4096MB||200][vbox|5.2.8|0|0]
[BOINC|7.16.5][CAL|Radeon (TM) RX 480 Graphics|1|4096MB||200][vbox|5.2.30|1|1]
Any idea why these cards should work any better than your Radeon VII?

Chooka
Message 2758 - Posted: 10 May 2020, 20:02:07 UTC - in response to Message 2751.  

Ha! I'd never noticed that drop-down box before. Well, set to Compute 0 it shows... nothing: 0%. GPU-Z shows 91% load. Same with Compute 1 or any of the other settings.

I looked at my other PC running a Radeon VII. The results are all over the place: 265 sec CPU time vs 1,050 sec run time on one task, or 11,897.00 sec CPU time vs 11,982.58 sec run time for 82 credits on another? :(
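
For a quick way to compare such pairs, the CPU-time to run-time ratio can be computed directly. This is a rough sketch: the function names and the 0.5 cutoff are assumptions picked for illustration, not anything the project uses.

```python
def cpu_to_run_ratio(cpu_seconds: float, run_seconds: float) -> float:
    """Fraction of the wall-clock run time during which the CPU was busy."""
    if run_seconds <= 0:
        raise ValueError("run time must be positive")
    return cpu_seconds / run_seconds


def cpu_mostly_idle(cpu_seconds: float, run_seconds: float,
                    threshold: float = 0.5) -> bool:
    """Flag tasks whose CPU time is far below their run time.

    The 0.5 threshold is an arbitrary assumption for this sketch.
    """
    return cpu_to_run_ratio(cpu_seconds, run_seconds) < threshold


# The two CPU-time / run-time pairs quoted in the post above.
short_task = cpu_to_run_ratio(265.0, 1050.0)       # roughly a quarter
long_task = cpu_to_run_ratio(11897.00, 11982.58)   # close to one

print(f"short task ratio: {short_task:.2f}, long task ratio: {long_task:.2f}")
```

A ratio near 1 matches the "run time just a little more than CPU time" pattern described earlier in the thread; a ratio near 0.25 means the CPU sat idle for most of the run.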

Oh well, don't stress about it. I can't see the value in using my GPU here.

Chooka
Message 2761 - Posted: 10 May 2020, 23:35:59 UTC

I was just looking over a teammate's results with his 2080 (NudgeyNR).
I can see what you mean about the run time and CPU time being close to the same. At least I can see that his run times also vary from short to long, although they are much more consistent than mine.
As mentioned before, I see a CPU time of 300 seconds but a run time of 1200 seconds for the same WU. Strange.

Chooka
Message 2763 - Posted: 11 May 2020, 7:37:33 UTC

Sorry Tazzduke... I worked it out. Mine is on Compute 2. With 1 WU running it's showing 83% usage and using 1 GB of GPU memory.

Copyright © 2024 Arizona State University