Should Job Replication be Enabled for this Project?

Zhiwei (Joined: 14 Mar 19 · Posts: 20 · Credit: 140,568,686 · RAC: 2,314)
Message 3842 - Posted: 12 May 2025, 8:11:25 UTC

I believe I mentioned this before, but let me start a new thread for discussion. I noticed that NumberFields@home distributes each work unit (WU) to only one client and accepts the results computed by that client. However, the hardware and software stack used by volunteers may not be as reliable as an HPC or server grid. Many other projects, like Einstein@home, send one WU as two tasks to different clients (replication) and compare the results afterward to ensure reliability. I wonder if it would be worthwhile for NumberFields@home to implement this, or if there is a way for the NumberFields@home server app to mathematically validate the client's results without replication.
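For readers unfamiliar with replication, the compare step described above can be sketched roughly as follows. This is only an illustration, not BOINC's validator API: the function name `validate_replicas`, the `quorum` parameter, and the idea that each result is a single hashable payload are all assumptions made for the example.

```python
from collections import Counter

def validate_replicas(results, quorum=2):
    """Return the agreed payload if at least `quorum` replicas match, else None.

    `results` holds one payload per client that returned the same work unit.
    The payload format is hypothetical; any hashable value works here.
    """
    if not results:
        return None
    # Find the payload reported most often and how many replicas reported it.
    payload, count = Counter(results).most_common(1)[0]
    return payload if count >= quorum else None
```

With a quorum of 2, two matching results are accepted, while two differing results return `None`, which a server would typically handle by sending the WU to a third client.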

Eric Driver (Project administrator, Project developer, Project tester, Project scientist · Joined: 8 Jul 11 · Posts: 1388 · Credit: 692,928,292 · RAC: 828,852)
Message 3843 - Posted: 12 May 2025, 17:48:01 UTC - in response to Message 3842.

I believe this was discussed years ago and is probably lost deep within some other thread, so I guess it doesn't hurt to revisit the issue.

The bottom line is that replication would cause the searches to take twice as long to complete, and this is time we do not have. We could always run a second pass at a later time if it was deemed necessary.

Also, when collating the data, my backend scripts detect anything suspicious, and these cases are rerun offline. To this day I have never detected anything intentional, just the occasional bit error during upload (I assume).

In addition, the algorithm has an inherent redundancy in that every field has many polynomial representatives and the algorithm gives them all (within the bounds). Note that every field is usually found 10 times over within a given search. There are also tests that can be performed on the final data sets that are analogous to parity tests.

To summarize, if a few bad actors were modifying the results and managed to go undetected, then chances are it would still not make a difference in the final results.

I hope that explains the reasoning behind our decision to not use job replication.
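The redundancy argument can be illustrated with a small sketch. It assumes every returned polynomial has already been reduced to a canonical identifier for its field (the hypothetical `field_ids` list below) and flags fields that turn up far less often than is typical for the search. The function name, the `min_fraction` threshold, and the data layout are assumptions for illustration; this is not the project's actual backend script.

```python
from collections import Counter
from statistics import median

def flag_underrepresented(field_ids, min_fraction=0.3):
    """Return field ids seen much less often than the typical field.

    `field_ids` has one entry per polynomial found, already mapped to a
    canonical field identifier (hypothetical preprocessing step).
    """
    counts = Counter(field_ids)
    if not counts:
        return []
    # Median hit count across fields serves as the "typical" redundancy level.
    typical = median(counts.values())
    return sorted(f for f, c in counts.items() if c < min_fraction * typical)
```

If most fields are found about 10 times, a field found only once would be flagged for a closer look, for example by rerunning the relevant work units offline.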