I cant send finished wus

Message boards : Number crunching : I cant send finished wus

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2014 - Posted: 11 Apr 2018, 1:19:12 UTC - in response to Message 2013.  


Ehm yes...
#95070 now pending
#95038 still pending
#95079 successfully uploaded (?)
#95069 suddenly successfully uploaded (!)

I don't get it. There is some magic going on here. :)


Are you mixing up your host ids? I show nothing uploaded by #95069, but #95038 returning a successful Septic result.
ID: 2014 · Rating: 0
HerrJeh
Joined: 2 Apr 18
Posts: 10
Credit: 243,524
RAC: 0
Message 2015 - Posted: 11 Apr 2018, 9:47:18 UTC - in response to Message 2014.  

Hi Eric, it looks different to me: "Office" #95038 is still trying to upload the same septic WUs in BOINC. There is one decic WU on this host that is ready to report. But I will have a closer look.
ID: 2015 · Rating: 0
Richard Haselgrove
Joined: 28 Oct 11
Posts: 134
Credit: 110,263,614
RAC: 29,192
Message 2016 - Posted: 11 Apr 2018, 10:35:16 UTC

Here is an absolutely off-the-wall observation, which may be completely unrelated - but it has some similarities with this situation. I present it as food for thought, nothing more.

Like HerrJeh, I have multiple machines - all are at my home, and share an internet connection. They share the same public IP address, but have different private IP addresses behind the router. I'm connected to a different BOINC project, which like this one runs every BOINC function from a single server.

That other project distributes long-running (up to 18 hour) workunits, but regards them as time-critical - it likes to have them returned within 24 hours. So I don't want to get a cache of tasks in advance, and in fact the project often has no tasks immediately available on demand. All of which is a long way of explaining why I find myself repeatedly clicking the 'Update' button in an attempt to get new work before the current task completes.

What I am observing is that I can issue repeated requests every 30 seconds from one machine, and connect to the server every time. But if I try to connect from a different machine (same LAN, same IP) in between, the second machine can't connect. If I stop updating the first machine for a minute or two, the second machine can connect, and goes on connecting for as long as is needed and allowed by their 30-second backoff interval. The nature of the server contact required doesn't affect the connection failures: I've just had a machine which couldn't connect to upload results, while I was requesting new work on a different machine.

It isn't simply congestion at the server port: that would be more random. This observation is strictly about multiple connection attempts, closely spaced in time, from different computers sharing the same public IP address. It feels more like a server OS-level problem than a BOINC problem, and it's been consistent for weeks, if not months.

Their server is running

11/04/2018 11:03:50 | | [http] [ID#1] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
My machines are all Windows 7/64 and run recent versions of BOINC (mostly v7.9 test builds). I haven't yet explored the http logs for further clues, but I'll give it a try when I have time.
ID: 2016 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2017 - Posted: 11 Apr 2018, 15:49:01 UTC - in response to Message 2015.  

Hi Eric, it looks different to me: "Office" #95038 is still trying to upload the same septic WUs in BOINC. There is one decic WU on this host that is ready to report. But I will have a closer look.


That is odd. I see something different. I'm doing database queries through the web admin interface since your computers are hidden from the normal view. I show you uploaded a septic result on April 10th for WU: septics_Bnd200E6_Grp153126of3001592. I also verified the uploaded file resides in the final assimilated directory, so all looks well on my end.
Is that the same WU your client is still trying to upload?
ID: 2017 · Rating: 0
Richard Haselgrove
Joined: 28 Oct 11
Posts: 134
Credit: 110,263,614
RAC: 29,192
Message 2018 - Posted: 11 Apr 2018, 16:01:32 UTC - in response to Message 2017.  

And we can see that WU (23257062 - task 25321670) was reported and validated by host 95038 at 2:10:51 UTC on 10 Apr 2018. We can't see the user name ('anonymous'), to match the 'hidden' setting for your computers. But you might be able to see some host log activity around that time.
ID: 2018 · Rating: 0
HerrJeh
Joined: 2 Apr 18
Posts: 10
Credit: 243,524
RAC: 0
Message 2019 - Posted: 12 Apr 2018, 11:02:13 UTC - in response to Message 2018.  

Hi, I have unhidden my hosts for you and switched the WUs to decic only. @Richard: As you said, there is a local network with up to 7 hosts running BOINC. Your idea seems plausible to me; I will give it a try and shut down individual managers for a while to see if the behaviour changes.

Thank you for your support!
ID: 2019 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2020 - Posted: 12 Apr 2018, 15:10:53 UTC - in response to Message 2019.  

@HerrJeh: It looks like all your hosts with the exception of #95062 have returned at least one septic result. Let me know if any of the WUs showing "in progress" are still unable to upload. I can run them offline to see if there is anything unusual about them (like file size).

@UrsD: Sorry to have neglected you. Are you still having upload problems? Several tasks show "timed out"; I imagine some of those are the ones with upload problems?
ID: 2020 · Rating: 0
HerrJeh
Joined: 2 Apr 18
Posts: 10
Credit: 243,524
RAC: 0
Message 2021 - Posted: 12 Apr 2018, 16:06:09 UTC - in response to Message 2020.  

@UrsD: Sorry, I didn't want to hijack your thread.
@Eric: Thank you.
ID: 2021 · Rating: 0
UrsD
Joined: 25 Feb 18
Posts: 5
Credit: 1,047,953
RAC: 0
Message 2027 - Posted: 24 Apr 2018, 2:47:12 UTC - in response to Message 2020.  

I just run Get Decic Fields on my win10 pc. No problem.
My good old i7 965 runs septics under linux. No prob!

If you can't handle the problem, avoid it! :-) ;-)
ID: 2027 · Rating: 0
UrsD
Joined: 25 Feb 18
Posts: 5
Credit: 1,047,953
RAC: 0
Message 2028 - Posted: 24 Apr 2018, 17:22:05 UTC - in response to Message 2021.  

@HerrJeh: No problem ;)
ID: 2028 · Rating: 0
pschoefer
Joined: 10 Oct 15
Posts: 3
Credit: 2,750,975
RAC: 0
Message 2040 - Posted: 7 May 2018, 22:32:55 UTC - in response to Message 2003.  

Kaspersky18@work - could this be a problem?

Looks like it is. Several users complained about upload problems in the last few days and they all used Kaspersky. Apparently, the upload itself works fine (which is why everything looks fine on the server side), but the server's confirmation that the file was received is blocked by Kaspersky (so the client thinks the upload failed and tries again). This affects not only NumberFields@home but also at least one other project (and probably all projects) that use HTTPS for file uploads. HTTP appears to be fine, which is why the problem doesn't occur with Get Decic Fields tasks.

Workarounds:
- disable network traffic scan of BOINC Client in Kaspersky
- set <http_1_0>1</http_1_0> in cc_config.xml to force BOINC to use HTTP 1.0 for file transfers
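
For anyone who prefers to script the second workaround, here is a minimal Python sketch. It assumes that cc_config.xml uses the usual `<cc_config>` root with an `<options>` section and that the file lives in the BOINC data directory; the helper name `enable_http_1_0` is made up for this example. The function only transforms the XML text, so reading and writing the actual file is left to the reader.

```python
import xml.etree.ElementTree as ET


def enable_http_1_0(config_xml: str) -> str:
    """Return cc_config.xml text with <http_1_0>1</http_1_0> set under <options>."""
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected <cc_config> as the root element")
    # Create the <options> section if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    # Add the flag, or overwrite an existing value.
    flag = options.find("http_1_0")
    if flag is None:
        flag = ET.SubElement(options, "http_1_0")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


# A minimal file; other settings already present in a real file are kept.
minimal = "<cc_config><options></options></cc_config>"
print(enable_http_1_0(minimal))
```

After changing the file, the client has to re-read its config files or be restarted before the setting takes effect.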
ID: 2040 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2041 - Posted: 8 May 2018, 1:05:51 UTC - in response to Message 2040.  

Thanks for the update! It looks like the mystery is finally solved...
ID: 2041 · Rating: 0
AussieGeoff
Joined: 6 May 18
Posts: 4
Credit: 267,684
RAC: 0
Message 2046 - Posted: 11 May 2018, 23:03:46 UTC

Hi

I have read the thread, but I also can't upload some WUs. All Decic WUs upload without problem, but most Septics WUs fail with

12/05/2018 8:51:24 AM | NumberFields@home | Temporarily failed upload of wu_septics_Bnd200E6_Grp1093864of3001592_0_r1114902147_0: transient HTTP error

However some Septics units do upload.

Windows 7 Pro x64, BOINC 7.8.3, but I don't run Kaspersky.
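
To list which result files keep failing, event log lines of the form quoted above can be filtered with a short script. This is only a sketch: the pattern is modelled on that single quoted line, and the helper name `failed_uploads` is made up for this example. Other BOINC versions or locales may format their log lines differently.

```python
import re

# Pattern modelled on the one log line quoted above (assumption: other
# clients use the same "time | project | message" layout).
FAILED_UPLOAD = re.compile(
    r"^(?P<when>.+?) \| (?P<project>.+?) \| "
    r"Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$"
)


def failed_uploads(log_lines):
    """Return (file, reason) pairs for every failed-upload line."""
    hits = []
    for line in log_lines:
        m = FAILED_UPLOAD.match(line.strip())
        if m:
            hits.append((m.group("file"), m.group("reason")))
    return hits


sample = [
    "12/05/2018 8:51:24 AM | NumberFields@home | Temporarily failed upload of "
    "wu_septics_Bnd200E6_Grp1093864of3001592_0_r1114902147_0: transient HTTP error",
]
for name, reason in failed_uploads(sample):
    print(name, "->", reason)
```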
ID: 2046 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2047 - Posted: 12 May 2018, 7:00:44 UTC - in response to Message 2046.  

Hi

I have read the thread, but I also can't upload some WUs. All Decic WUs upload without problem, but most Septics WUs fail with

12/05/2018 8:51:24 AM | NumberFields@home | Temporarily failed upload of wu_septics_Bnd200E6_Grp1093864of3001592_0_r1114902147_0: transient HTTP error

However some Septics units do upload.

Windows 7 Pro x64, BOINC 7.8.3, but I don't run Kaspersky.


I'm wondering if the file size might be playing a part. The number of fields in the septic tasks that you uploaded successfully was close to average or below.

If it's not too much to ask, could you look at some of the stuck tasks in the slot directories to see how many fields are in each file? The number of lines is sufficient.
I am also running the one you mentioned above to see if I notice anything out of the ordinary.
ID: 2047 · Rating: 0
AussieGeoff
Joined: 6 May 18
Posts: 4
Credit: 267,684
RAC: 0
Message 2048 - Posted: 12 May 2018, 10:02:12 UTC - in response to Message 2047.  
Last modified: 12 May 2018, 10:07:07 UTC

Hi

The output files vary in size between 585 B and 1.18 kB and are between 13 and 26 lines long, with the last 5 lines of the form

# The search is complete. Stats:
# Inspected 11965108 polynomials.
# Num Polys post discriminant = 11965005.
# Num Polys passing field disc test = 18.
# Elapsed Time = 48121 (sec)

There are six (6) completed WUs in total that don't upload. The completed output files are all in E:\BOINC Data\projects\numberfields.asu.edu_NumberFields; they are not stuck in any slot directories as incomplete tasks.
ID: 2048 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2050 - Posted: 12 May 2018, 15:10:47 UTC - in response to Message 2048.  

Hi AussieGeoff,

So it doesn't look like a file size problem. 13 lines (8 fields) is below average and you've returned other results with more fields. Also, I ran Grp1093864, which was the one you mentioned above that couldn't upload, and this one only had 9 fields.
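
The line-to-field arithmetic used here (13 lines minus the 5-line stats trailer gives 8 fields) can be checked with a small Python sketch. It assumes that every field occupies exactly one line and that only the stats lines start with `#`. The function name `count_fields` and the placeholder field lines are made up for this example.

```python
def count_fields(result_text: str) -> int:
    """Count lines that are neither blank nor '#' comment lines."""
    return sum(
        1
        for line in result_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


# Stats trailer copied from the post above; the eight lines in front of it
# are placeholders, NOT real field data.
trailer = """# The search is complete. Stats:
# Inspected 11965108 polynomials.
# Num Polys post discriminant = 11965005.
# Num Polys passing field disc test = 18.
# Elapsed Time = 48121 (sec)"""
fake_result = "\n".join(f"placeholder field {i}" for i in range(8)) + "\n" + trailer

# Prints the total line count followed by the field count.
print(len(fake_result.splitlines()), count_fields(fake_result))
```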

I will think about this problem some more.
ID: 2050 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2051 - Posted: 13 May 2018, 6:59:19 UTC - in response to Message 2050.  

So that everyone is in the loop, Geoff emailed me one of the non-uploadable result files and I determined there is nothing wrong with it. Therefore, I believe this problem is client or host related.

Geoff: Do you have a firewall and/or virus scan software that could be temporarily disabled before trying to upload?
Or based on pschoefer's post above could you try this:
set <http_1_0>1</http_1_0> in cc_config.xml to force BOINC to use HTTP 1.0 for file transfers
ID: 2051 · Rating: 0
AussieGeoff
Joined: 6 May 18
Posts: 4
Credit: 267,684
RAC: 0
Message 2052 - Posted: 14 May 2018, 4:38:38 UTC - in response to Message 2051.  

So that everyone is in the loop, Geoff emailed me one of the non-uploadable result files and I determined there is nothing wrong with it. Therefore, I believe this problem is client or host related.

Geoff: Do you have a firewall and/or virus scan software that could be temporarily disabled before trying to upload?
Or based on pschoefer's post above could you try this:
set <http_1_0>1</http_1_0> in cc_config.xml to force BOINC to use HTTP 1.0 for file transfers


Hi

I disabled all my malware protection and it seemed to work - all 6 files uploaded after 2-3 attempts at each.
ID: 2052 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 945
Credit: 104,027,016
RAC: 68,293
Message 2053 - Posted: 14 May 2018, 5:50:23 UTC - in response to Message 2052.  

That's great news, Geoff! So Kaspersky is not the only anti-malware product that conflicts with BOINC.

I imagine there is a way to configure your malware protection so that it ignores the BOINC client. If not, you could try the HTTP config workaround mentioned above.
ID: 2053 · Rating: 0
AussieGeoff
Joined: 6 May 18
Posts: 4
Credit: 267,684
RAC: 0
Message 2054 - Posted: 14 May 2018, 23:47:16 UTC - in response to Message 2053.  

That's great news, Geoff! So Kaspersky is not the only anti-malware product that conflicts with BOINC.

I imagine there is a way to configure your malware protection so that it ignores the BOINC client. If not, you could try the HTTP config workaround mentioned above.


I have been checking, and the problem seems to be WinPatrol Firewall and/or WinPatrol WAR (anti-malware). They run a shared service that keeps running after you shut them both down. That service launches another service that seems to be the culprit: it is a pain to stop, and as soon as you do, BOINC reports that your login is invalid and you have to start BOINC again. The original WinPatrol is fine.

What really annoyed me is that there is no indication of any action (blocking) in the logs of either of the two programs.
ID: 2054 · Rating: 0

Copyright © 2019 Arizona State University