Should Job Replication be Enabled for this Project?
Author | Message |
---|---|
Joined: 14 Mar 19 Posts: 20 Credit: 140,475,236 RAC: 6,691 |
I believe I mentioned this before, but let me start a new thread for discussion. I noticed that NumberFields@home distributes a work unit (WU) to only one client and accepts the results computed by that client. However, the hardware and software stack used by volunteers may not be as reliable as an HPC or server grid. Many other projects, like Einstein@home, send one WU as two tasks to different clients (replication) and compare the results afterward to ensure reliability. I wonder if it would be worthwhile for NumberFields@home to implement this, or if there is a way for the NumberFields@home server app to mathematically validate the client's results without replication. |
Joined: 8 Jul 11 Posts: 1373 Credit: 656,730,067 RAC: 836,960 |
"I believe I mentioned this before, but let me start a new thread for discussion." I believe this was discussed years ago and is probably lost deep within some other thread, so I guess it doesn't hurt to revisit the issue. The bottom line is that replication would cause the searches to take twice as long to complete, and this is time we do not have. We could always run a second pass at a later time if it was deemed necessary. Also, when collating the data, my backend scripts detect anything suspicious, and these cases are rerun offline. To this day I have never detected anything intentional, just the occasional bit error during upload (I assume). In addition, the algorithm has an inherent redundancy in that every field has many polynomial representatives and the algorithm gives them all (within the bounds). Note that every field is usually found 10 times over within a given search. There are also tests that can be performed on the final data sets that are analogous to parity tests. To summarize, if a few bad actors were modifying the results and managed to go undetected, then chances are it would still not make a difference in the final results. I hope that explains the reasoning behind our decision to not use job replication. |
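
The redundancy argument in the reply above can be illustrated with a toy sketch. This is not the project's backend code: the `(work unit, field)` pair format, the names `work_units_per_field` and `fields_lost_without`, and the sample data are all hypothetical. It only shows the idea that when each field turns up in several work units, discarding one unit's output leaves the final list of fields unchanged. It does not cover a result that reports a field that does not exist; that would need the separate checks the reply mentions.

```python
from collections import defaultdict


def work_units_per_field(hits):
    """Map each field label to the set of work units that reported it.

    `hits` is an iterable of (work_unit_id, field_id) pairs. Both labels
    are hypothetical stand-ins for whatever the real data uses.
    """
    sources = defaultdict(set)
    for wu_id, field_id in hits:
        sources[field_id].add(wu_id)
    return dict(sources)


def fields_lost_without(hits, bad_wu):
    """Return the fields that were reported only by `bad_wu`.

    These are the fields that would vanish from the final data set
    if that one work unit's output were discarded.
    """
    sources = work_units_per_field(hits)
    return sorted(f for f, wus in sources.items() if wus == {bad_wu})


# Toy data: fieldA and fieldB are found by several work units,
# fieldC by only one.
hits = [
    ("wu1", "fieldA"), ("wu2", "fieldA"), ("wu3", "fieldA"),
    ("wu1", "fieldB"), ("wu4", "fieldB"),
    ("wu5", "fieldC"),
]

print(fields_lost_without(hits, "wu1"))  # []
print(fields_lost_without(hits, "wu5"))  # ['fieldC']
```

In this toy data, losing `wu1` costs nothing because other work units also found its fields, while `wu5` is the only source for `fieldC`. With each field found about 10 times over, as stated in the reply, the second situation should be rare.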