Long running wu_Qsqrt421_DS1x5 units - how long to let them run?
Author | Message |
---|---|
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"After 44 days my last long running wu (task #13680649) ended up in a computation error. :-(" Sorry to hear that. It looks like the problem was with acquiring the lock file, not with the app. Either way, I granted you credit for the lost CPU cycles. Hopefully one of your wingmen can finish it. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
I think that I will need a time extension on result 15014223 which is part of work unit 12346684. My stderr is below: Checkpoint Flag = 0. Cvec Starting Index = 0. N1 Start = 0. N2 Start = 0. k1 Start = 0. k2 Start = 0. PolyCount starting value = 0. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 0 (sec). Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat K = x^2 - 421 S = [2] Disc Bound = 16 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -54 N2_MAX = -54 N1_MIN = -502 N1_MAX = 447 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0 Now starting the targeted Martinet search: N2_L = -54. N2_U = -54. N2 = -54. N1_L = -502. N1_U = 447. N1 = -502. N1 = -501. N1 = -500. N1 = -499. N1 = -498. N1 = -497. N1 = -496. N1 = -495. N1 = -494. N1 = -493. N1 = -492. N1 = -491. N1 = -490. N1 = -489. N1 = -488. N1 = -487. N1 = -486. N1 = -485. N1 = -484. N1 = -483. N1 = -482. N1 = -481. N1 = -480. N1 = -479. N1 = -478. N1 = -477. N1 = -476. N1 = -475. N1 = -474. N1 = -473. N1 = -472. N1 = -471. N1 = -470. N1 = -469. N1 = -468. N1 = -467. N1 = -466. N1 = -465. N1 = -464. N1 = -463. N1 = -462. N1 = -461. N1 = -460. N1 = -459. N1 = -458. N1 = -457. N1 = -456. N1 = -455. N1 = -454. N1 = -453. N1 = -452. N1 = -451. N1 = -450. N1 = -449. N1 = -448. N1 = -447. N1 = -446. N1 = -445. N1 = -444. N1 = -443. N1 = -442. N1 = -441. N1 = -440. N1 = -439. N1 = -438. N1 = -437. N1 = -436. N1 = -435. N1 = -434. N1 = -433. N1 = -432. N1 = -431. N1 = -430. N1 = -429. N1 = -428. N1 = -427. N1 = -426. N1 = -425. N1 = -424. N1 = -423. N1 = -422. N1 = -421. N1 = -420. N1 = -419. N1 = -418. N1 = -417. N1 = -416. N1 = -415. N1 = -414. N1 = -413. N1 = -412. N1 = -411. N1 = -410. N1 = -409. N1 = -408. N1 = -407. N1 = -406. N1 = -405. N1 = -404. N1 = -403. N1 = -402. N1 = -401. N1 = -400. N1 = -399. N1 = -398. N1 = -397. 
N1 = -396. N1 = -395. N1 = -394. N1 = -393. N1 = -392. N1 = -391. N1 = -390. N1 = -389. N1 = -388. N1 = -387. N1 = -386. N1 = -385. N1 = -384. N1 = -383. N1 = -382. N1 = -381. N1 = -380. N1 = -379. N1 = -378. N1 = -377. N1 = -376. N1 = -375. N1 = -374. N1 = -373. N1 = -372. N1 = -371. N1 = -370. N1 = -369. N1 = -368. N1 = -367. N1 = -366. N1 = -365. N1 = -364. N1 = -363. N1 = -362. N1 = -361. N1 = -360. N1 = -359. N1 = -358. N1 = -357. N1 = -356. N1 = -355. N1 = -354. N1 = -353. N1 = -352. N1 = -351. N1 = -350. N1 = -349. N1 = -348. N1 = -347. N1 = -346. N1 = -345. N1 = -344. N1 = -343. N1 = -342. N1 = -341. N1 = -340. N1 = -339. N1 = -338. N1 = -337. N1 = -336. N1 = -335. N1 = -334. N1 = -333. N1 = -332. N1 = -331. N1 = -330. N1 = -329. N1 = -328. N1 = -327. N1 = -326. N1 = -325. N1 = -324. N1 = -323. N1 = -322. N1 = -321. N1 = -320. N1 = -319. N1 = -318. N1 = -317. N1 = -316. N1 = -315. N1 = -314. N1 = -313. N1 = -312. N1 = -311. N1 = -310. N1 = -309. N1 = -308. N1 = -307. N1 = -306. N1 = -305. N1 = -304. N1 = -303. N1 = -302. N1 = -301. N1 = -300. N1 = -299. N1 = -298. N1 = -297. N1 = -296. N1 = -295. N1 = -294. N1 = -293. N1 = -292. N1 = -291. N1 = -290. N1 = -289. N1 = -288. N1 = -287. N1 = -286. N1 = -285. N1 = -284. N1 = -283. N1 = -282. N1 = -281. N1 = -280. N1 = -279. N1 = -278. N1 = -277. N1 = -276. N1 = -275. N1 = -274. N1 = -273. N1 = -272. N1 = -271. N1 = -270. N1 = -269. N1 = -268. N1 = -267. N1 = -266. N1 = -265. N1 = -264. N1 = -263. N1 = -262. N1 = -261. N1 = -260. N1 = -259. N1 = -258. N1 = -257. N1 = -256. N1 = -255. N1 = -254. N1 = -253. N1 = -252. N1 = -251. N1 = -250. N1 = -249. N1 = -248. N1 = -247. N1 = -246. N1 = -245. N1 = -244. N1 = -243. N1 = -242. N1 = -241. N1 = -240. N1 = -239. N1 = -238. N1 = -237. N1 = -236. N1 = -235. N1 = -234. N1 = -233. N1 = -232. N1 = -231. N1 = -230. N1 = -229. N1 = -228. N1 = -227. N1 = -226. N1 = -225. N1 = -224. N1 = -223. N1 = -222. N1 = -221. N1 = -220. N1 = -219. N1 = -218. N1 = -217. N1 = -216. 
N1 = -215. N1 = -214. N1 = -213. N1 = -212. N1 = -211. N1 = -210. N1 = -209. N1 = -208. N1 = -207. N1 = -206. N1 = -205. N1 = -204. N1 = -203. N1 = -202. N1 = -201. N1 = -200. N1 = -199. N1 = -198. N1 = -197. N1 = -196. N1 = -195. N1 = -194. N1 = -193. N1 = -192. N1 = -191. N1 = -190. N1 = -189. N1 = -188. N1 = -187. N1 = -186. N1 = -185. N1 = -184. N1 = -183. N1 = -182. N1 = -181. N1 = -180. N1 = -179. N1 = -178. N1 = -177. N1 = -176. N1 = -175. N1 = -174. N1 = -173. N1 = -172. N1 = -171. N1 = -170. N1 = -169. N1 = -168. N1 = -167. N1 = -166. N1 = -165. N1 = -164. N1 = -163. N1 = -162. N1 = -161. N1 = -160. N1 = -159. N1 = -158. N1 = -157. N1 = -156. N1 = -155. N1 = -154. N1 = -153. N1 = -152. N1 = -151. N1 = -150. N1 = -149. N1 = -148. N1 = -147. N1 = -146. N1 = -145. N1 = -144. N1 = -143. N1 = -142. N1 = -141. N1 = -140. N1 = -139. N1 = -138. N1 = -137. N1 = -136. Reading checkpoint file. Checkpoint Flag = 1. Cvec Starting Index = 0. N1 Start = -137. N2 Start = -54. k1 Start = 90. k2 Start = 6. PolyCount starting value = 1626159753. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 1194533 (sec). Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat K = x^2 - 421 S = [2] Disc Bound = 16 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -54 N2_MAX = -54 N1_MIN = -502 N1_MAX = 447 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0 Now starting the targeted Martinet search: N2_L = -54. N2_U = -54. N2 = -54. N1_L = -137. N1_U = 447. N1 = -137. N1 = -136. Reading checkpoint file. Checkpoint Flag = 1. Cvec Starting Index = 0. N1 Start = -137. N2 Start = -54. k1 Start = 90. k2 Start = 6. PolyCount starting value = 1626159753. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 1194533 (sec). 
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat K = x^2 - 421 S = [2] Disc Bound = 16 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -54 N2_MAX = -54 N1_MIN = -502 N1_MAX = 447 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0 Now starting the targeted Martinet search: N2_L = -54. N2_U = -54. N2 = -54. N1_L = -137. N1_U = 447. N1 = -137. N1 = -136. N1 = -135. N1 = -134. Reading checkpoint file. Checkpoint Flag = 1. Cvec Starting Index = 0. N1 Start = -134. N2 Start = -54. k1 Start = 89. k2 Start = 4. PolyCount starting value = 1788972354. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 1263973 (sec). Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat K = x^2 - 421 S = [2] Disc Bound = 16 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -54 N2_MAX = -54 N1_MIN = -502 N1_MAX = 447 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0 Now starting the targeted Martinet search: N2_L = -54. N2_U = -54. N2 = -54. N1_L = -134. N1_U = 447. N1 = -134. Reading checkpoint file. Checkpoint Flag = 1. Cvec Starting Index = 0. N1 Start = -134. N2 Start = -54. k1 Start = 89. k2 Start = 4. PolyCount starting value = 1788972354. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 1263973 (sec). 
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat K = x^2 - 421 S = [2] Disc Bound = 16 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -54 N2_MAX = -54 N1_MIN = -502 N1_MAX = 447 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0 Now starting the targeted Martinet search: N2_L = -54. N2_U = -54. N2 = -54. N1_L = -134. N1_U = 447. N1 = -134. The few interruptions are due to an OS update, Microsoft Edge crash (this is rare, fortunately!), a video card driver update (I will shut BOINC down due to GPU projects before updating a video card driver because changing a video card driver while a program is using it to compute will cause that program to crash and result in a computation error.), and shutting down BOINC to play some occasional games that I just bought on Steam on February 29, 2016 (Papers, Please and Ys I & II Chronicles+). I have been letting this work unit get crunched for almost all of the time since February 15 to now with a few exceptions. If this work unit fails, could you reissue this work unit after chopping it up to much smaller work units? Each value of N1 generally seems to take multiple hours. My estimate is around 3 to 4 hours per value of N1. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
There are now about 20 of these long WUs left (down from over 100). At some point soon, I will cancel the rest and just run them offline. As for extending the deadline, that won't help with a task already in progress. But I can grant you manual credit if someone beats you to the finish line. |
Joined: 28 Oct 11 Posts: 180 Credit: 257,294,379 RAC: 194,567 |
I've had two of these running for a while now (my longest one succumbed to an unfortunate BOINC coding error). Looks like I've recently picked up a third: That one would be WU 12731691. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
What I am afraid of is that another user gets the work unit after my result times out, and then that user aborts the replacement result. If timeouts are counted as errors, then that abortion pushes the error limit to the max of 8 errors, and the work unit will be automatically cancelled before anyone can complete and report it. There are currently three timeouts, two abortions, and one computation error. If timeouts are not considered errors when deciding whether to automatically cancel a work unit, then I do not have much to worry about. If they are considered errors, then my work and the work of the other wingmen who have decided to continue computing could get prematurely wasted by an automatic cancellation. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"What I am afraid of is that another user gets the work unit after my result times out, and then that user aborts the replacement result. If timeouts are counted as errors, then that abortion pushes the error limit to the max of 8 errors, and the work unit will be automatically cancelled before anyone can complete and report it. There are currently three timeouts, two abortions, and one computation error. If timeouts are not considered errors when deciding whether to automatically cancel a work unit, then I do not have much to worry about. If they are considered errors, then my work and the work of the other wingmen who have decided to continue computing could get prematurely wasted by an automatic cancellation." The timeouts are not considered errors, so I don't believe they count towards the cancellation limit. Somebody please correct me if I am wrong. |
Joined: 12 Aug 12 Posts: 7 Credit: 20,464,039 RAC: 0 |
"It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs." The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday). http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866 No happy returns from anyone yet. |
Joined: 15 Mar 15 Posts: 11 Credit: 113,280,935 RAC: 0 |
>>>The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday). >>>http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866 >>>No happy returns from anyone yet. I've been running that one too. On an i7-2600. For 1354 hours now. It's currently at 50.565%. Sigh. |
Joined: 12 Aug 12 Posts: 7 Credit: 20,464,039 RAC: 0 |
>>>The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday). Wow! Mine's at 50.079%, 793 hours at the moment. AMD Phenom II X6 1055T CPU. |
Joined: 15 Mar 15 Posts: 11 Credit: 113,280,935 RAC: 0 |
>>>Wow! Mine's at 50.079%, 793 hours at the moment. >>>AMD Phenom II X6 1055T CPU. Your BOINC benchmark numbers are quite a bit higher than I'd expect a 1055T and Windows to produce. Are you overclocking? |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
The first computer to get work unit 12346684 has finished it. I have aborted my result and suggest that everyone else still computing this work unit abort it too, so that our energy is either put to better use, such as solving some other work unit, or saved. (I regrettably have to do the latter because the weather is too hot in my area for the heat from solving loads of work units to be acceptable anymore. I am living with my parents, and they do not want the heat anymore, since it means higher air conditioner bills. I will be back in the fall, when they will accept the heat.) |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"The first computer to get work unit 12346684 has finished it. I have aborted my result and suggest that everyone else still computing this work unit abort it too, so that our energy is either put to better use, such as solving some other work unit, or saved. (I regrettably have to do the latter because the weather is too hot in my area for the heat from solving loads of work units to be acceptable anymore. I am living with my parents, and they do not want the heat anymore, since it means higher air conditioner bills. I will be back in the fall, when they will accept the heat.)" I granted you credit for the lost CPU cycles. I hope that eases the pain somewhat. |
Joined: 25 Feb 13 Posts: 216 Credit: 9,899,302 RAC: 0 |
CPU time: 70 days 19 hours 21 min. 15 sec. This might be the longest one. Amazing. |
Joined: 12 Aug 12 Posts: 7 Credit: 20,464,039 RAC: 0 |
"It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs." Just got home from vacation and checked this WU again. It's at 1342 hours and 52%. Looks like it's been completed by Capital Avionics on 3/24. I assume that I should abort it? |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs." Yes, you can abort that one. I'll give you credit for it when I get home later. Thanks! |
Joined: 12 Aug 12 Posts: 7 Credit: 20,464,039 RAC: 0 |
"It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs." Thanks Eric. Aborted at 1349 hours. Longest one I've seen so far. |
Joined: 9 Apr 15 Posts: 2 Credit: 9,058,122 RAC: 0 |
I've noticed the same situation for two work units, due on 10 April. For the first, the "Time Remaining" column has stayed at either 1 second remaining or 0 seconds remaining for about three days, and the Elapsed time column is now over 4 days. For the second, the "Time Remaining" column has stayed at around 20 seconds all day today, with over two days elapsed. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"I've noticed the same situation for two work units, due on 10 April." I don't see the WUs you speak of. The earliest ones I see are due April 21; all other WUs on your task list appear to have completed successfully. Do you have a WU name or id? |
Joined: 28 Oct 11 Posts: 180 Credit: 257,294,379 RAC: 194,567 |
"I don't see the WUs you speak of. The earliest ones I see are due April 21; all other WUs on your task list appear to have completed successfully. Do you have a WU name or id?" Vik is possibly reading the 'due date' off BOINC Manager. The slow one we were discussing in the other thread is showing a due date of 16 Apr 2016, 23:56:47 locally, but 26 Apr 2016, 22:56:47 UTC on the website. The difference is made up of 1 hour for time zone offset, and 10 days grace period allowed by the project. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,365,180 RAC: 628,750 |
"I don't see the WUs you speak of. The earliest ones I see are due April 21; all other WUs on your task list appear to have completed successfully. Do you have a WU name or id?" Oh that's right, I forgot about the difference between what the manager shows and the true deadline on the server. |
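
For anyone who wants to check the deadline arithmetic from the posts above, here is a minimal Python sketch. It assumes the poster's machine was on UTC+1 (the "1 hour for time zone offset") and that the 10-day grace period is exactly as described in the thread. The names `LOCAL_TZ`, `GRACE_PERIOD` and `server_deadline` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local clock is UTC+1, per the "1 hour for time zone offset" remark.
LOCAL_TZ = timezone(timedelta(hours=1))
# Assumption: this project allows a 10-day grace period, as stated in the thread.
GRACE_PERIOD = timedelta(days=10)


def server_deadline(manager_due_local: datetime) -> datetime:
    """Convert the due date shown locally into the server-side deadline in UTC.

    Hypothetical helper: converts the local time to UTC, then adds the grace period.
    """
    return manager_due_local.astimezone(timezone.utc) + GRACE_PERIOD


# Due date shown locally for the slow task: 16 Apr 2016, 23:56:47.
manager_due = datetime(2016, 4, 16, 23, 56, 47, tzinfo=LOCAL_TZ)
website_due = server_deadline(manager_due)
print(website_due.isoformat())  # 2016-04-26T22:56:47+00:00
```

The result matches the 26 Apr 2016, 22:56:47 UTC deadline quoted from the website, so the two dates are consistent with a 1-hour offset plus 10 days of grace.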