Modification to bounded app
Author | Message |
---|---|
Joined: 2 Dec 14 Posts: 3 Credit: 388,326 RAC: 0 |
Ding! We have a winner here with almost 103 hours CPU time. http://numberfields.asu.edu/NumberFields/result.php?resultid=9788343 The second one took "only" 60 hours. By the way, I don't know whether the problem is related to the underutilized-CPU issue you referred to. As far as I could see, this unit used one full core the whole time, and all my other cores were busy too. |
Joined: 25 Feb 13 Posts: 216 Credit: 9,899,302 RAC: 0 |
Nice. Eric, can you give me the information in the .out file? It seems the app restarted the unit several times: Signature = [2,0] a11 = 1 a12 = 2 sig1a1 = -8.774964387392122060406388307 sig2a1 = 8.774964387392122060406388307 Ca1_pre = 30.800000 Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF77-0_Idx6_Grp44261of111118_0_0 Now starting the Martinet search: Doing case a5 = 11 + -3w... 2nd part of Martinet bound = 18.852711. Martinet bound = 49.652711. a22_L = 2. a22_U = 2. a22 = 2. a21_L = 31. a21_U = 31. a21 = 31. a32_L = -6. a32_U = 24. Reading checkpoint file. Checkpoint Flag = 1. |
Joined: 8 Jul 11 Posts: 1346 Credit: 546,800,544 RAC: 634,058 |
"Ding!" Ok. I manually granted you credit for the WU that went over the 100-hour cap. |
Joined: 8 Jul 11 Posts: 1346 Credit: 546,800,544 RAC: 634,058 |
"Nice. Eric, can you give me the information in the .out file?" Just to clarify: the app doesn't restart the WU. It may have been restarted for a number of reasons, including the user restarting their computer. What's important is that the checkpoint mechanism is functioning as expected. |
Joined: 2 Dec 14 Posts: 3 Credit: 388,326 RAC: 0 |
Thank you Eric! What I also saw with this WU was that it sat at 53% for a long time (40 hours or so), then jumped back to 28% and crawled to the finish line from there. I don't know if this helps find the reason for these long runners. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
All of my bounded discriminant work units report memory leaks. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
A memory leak is a bug where a program loses track of memory that it has obtained and therefore cannot release it. This wastes the leaked memory while the program runs. When the leaky program closes, a modern operating system that employs protected memory will free the leaked memory. See https://en.wikipedia.org/wiki/Memory_leak for more information on memory leaks. |
Joined: 8 Jul 11 Posts: 1346 Credit: 546,800,544 RAC: 634,058 |
"Thank you Eric!" That's very strange. The progress meter should never go backward. Looking at the stderr output, it appears that the app tried to read the checkpoint file at one point but failed for some reason and started the search from the beginning; maybe this explains the progress going backward. |
Joined: 8 Jul 11 Posts: 1346 Credit: 546,800,544 RAC: 634,058 |
"A memory leak is a bug where a program loses track of memory that it has obtained and therefore cannot release it. This wastes the leaked memory while the program runs. When the leaky program closes, a modern operating system that employs protected memory will free the leaked memory. See https://en.wikipedia.org/wiki/Memory_leak for more information on memory leaks." I really doubt there is a true memory leak. I only allocate a small number of variables, and these are all freed before the program ends. However, it is possible that one of the libraries I am using has a memory leak. But as you say, any such memory should be freed when the program ends. Also note that these memory leak messages only appear with the Windows app. |
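
For readers wondering how a failed checkpoint read can make the progress meter jump backward, as described earlier in the thread, here is a minimal sketch in Python. It is not the NumberFields app's code, and the app's real checkpoint format is not shown in this thread. The JSON layout, the `next_case` field, and the function names `write_checkpoint`, `read_checkpoint`, and `run_search` are all hypothetical, chosen only for illustration. The sketch writes each checkpoint to a temporary file and renames it into place, and it falls back to the start of the search whenever the checkpoint is missing or unreadable.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename means a reader sees either the old file or the new one,
    never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path):
    """Return the saved state, or None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            state = json.load(f)
    except (OSError, ValueError):
        return None
    # Hypothetical format check: a dict with a "next_case" entry.
    if not isinstance(state, dict) or "next_case" not in state:
        return None
    return state


def run_search(path, total_cases, stop_after=None):
    """Resume from the checkpoint if possible, otherwise start at case 0.

    Returns the case index the run started from. stop_after simulates the
    task being interrupted after that many cases.
    """
    state = read_checkpoint(path) or {"next_case": 0}
    start = state["next_case"]
    done = 0
    for case in range(start, total_cases):
        # ... the real work for one case would go here ...
        write_checkpoint(path, {"next_case": case + 1})
        done += 1
        if stop_after is not None and done >= stop_after:
            break
    return start
```

In this sketch, a run that finds a valid checkpoint resumes at `next_case`, while a run that finds a damaged file silently starts again at case 0. Under that assumption, a progress meter derived from the current case would drop back exactly as reported above.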