FP64

Message boards : Number crunching : FP64

Xterelle

Joined: 12 Oct 25
Posts: 2
Credit: 766,704
RAC: 8,472
Message 4125 - Posted: 2 Jan 2026, 10:25:16 UTC

If I buy a high-performance FP64 graphics card, such as an Nvidia Tesla V100 or P100, will there be a big difference in performance? Or will I be able to run multiple tasks simultaneously? Or do I need different computing performance?
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1442
Credit: 884,259,809
RAC: 1,565,101
Message 4126 - Posted: 2 Jan 2026, 15:33:47 UTC - in response to Message 4125.  

If I buy a high-performance FP64 graphics card, such as an Nvidia Tesla V100 or P100, will there be a big difference in performance? Or will I be able to run multiple tasks simultaneously? Or do I need different computing performance?

The computations on this project are almost 100% int64. I am not familiar with the V100 or P100, but according to Google Gemini:

The NVIDIA V100 GPU is not a good choice for applications that are strictly "int64" intensive because it lacks a dedicated, high-performance 64-bit integer arithmetic logic unit (ALU).  Most GPUs, including the V100, emulate 64-bit integer operations using 32-bit units, making them significantly slower than CPUs for these specific tasks. 

Performance Breakdown:
     Floating-Point Performance: The V100 excels at floating-point (FP) operations, especially single-precision (FP32), half-precision (FP16), and double-precision (FP64), which are common in AI and traditional High-Performance Computing (HPC) simulations. It offers up to 7.8 TFLOPS of FP64 performance, which is high for GPU standards but still relies on floating-point arithmetic, not integer.
     Integer Performance: The V100's primary integer performance optimization is for INT8 operations, leveraged by its Tensor Cores for deep learning inference. It does not have specialized hardware for high-throughput 64-bit integer arithmetic.
     CPU Comparison: For workloads where 64-bit integers are the dominant data type (common in specific types of data analytics or computational finance that do not rely on floating-point numbers), a high-end CPU will generally outperform a V100. 

Conclusion:
If your application's core computational bottlenecks are fundamentally tied to 64-bit integer arithmetic, the V100 will likely underperform compared to a modern CPU designed for such operations. The V100's strengths lie in its massive parallelism and specialized hardware for floating-point matrix operations and deep learning tasks.
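
To make the "emulate 64-bit integer operations using 32-bit units" point concrete, here is a minimal Python sketch of the general idea. It is only an illustration: the function names are made up, and it does not claim to show how the V100 or any particular GPU compiler actually lowers int64 operations. It just shows why one 64-bit operation turns into several 32-bit ones.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add64_from_32(a, b):
    """Unsigned 64-bit add built from 32-bit halves (wraps modulo 2**64)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo = a_lo + b_lo                      # may overflow 32 bits
    carry = lo >> 32                      # carry into the high half
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | (lo & MASK32)

def mul64_from_32(a, b):
    """Low 64 bits of an unsigned 64-bit multiply, built from 32-bit halves."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = a_lo * b_lo                     # 32x32 -> 64-bit partial product
    # Only the low 32 bits of the cross terms survive modulo 2**64.
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32
    return (low + (cross << 32)) & MASK64

x, y = 0x123456789ABCDEF0, 0x0FEDCBA987654321
print(hex(add64_from_32(x, y)), hex(mul64_from_32(x, y)))
```

In this sketch a single 64-bit multiply needs three 32-bit multiplies plus some adds and shifts, which is the kind of overhead the quoted text is describing.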
Grant (SSSF)

Joined: 4 Jan 25
Posts: 44
Credit: 225,112,179
RAC: 835,079
Message 4127 - Posted: 2 Jan 2026, 20:17:19 UTC - in response to Message 4125.  

If I buy a high-performance FP64 graphics card, such as an Nvidia Tesla V100 or P100, will there be a big difference in performance?
It was high performance 10 years ago, roughly equivalent to the RTX 3070, and pulls 250W max. It is barely faster than my RTX 2060 Super, which is flat out processing 1 Task at a time.

My RTX 2060 Super is averaging 11min 30sec per Task, while running 1 Task.
My RTX 4080 Super is averaging 10min 15sec per Task, while running 3 at a time.
Grant
Darwin NT, Australia.
Xterelle

Joined: 12 Oct 25
Posts: 2
Credit: 766,704
RAC: 8,472
Message 4128 - Posted: 3 Jan 2026, 7:35:30 UTC - in response to Message 4127.  

So my 4070 Ti, which takes 3 to 4 minutes per task, runs about the same number of tasks per hour?
Grant (SSSF)

Joined: 4 Jan 25
Posts: 44
Credit: 225,112,179
RAC: 835,079
Message 4129 - Posted: 3 Jan 2026, 22:32:00 UTC - in response to Message 4128.  
Last modified: 3 Jan 2026, 22:32:47 UTC

So my 4070 Ti, which takes 3 to 4 minutes per task, runs about the same number of tasks per hour?
Roughly.
I'm using the Remaining (estimated) time for unstarted Tasks for comparisons. You're running BOINC 8.2.8, where, from memory, that estimate got broken and no longer updates as Tasks are returned.


So based on my current numbers:
RTX 4080 Super     10min 17sec, 3 at a time = 420 per day.
RTX 4070 Ti Super   7min 56sec, 2 at a time = 363 per day.
RTX 2060 Super     11min 26sec, 1 at a time = 126 per day.
I worked out what gave the best output per 24hrs ages back, before these much longer running Tasks. So running one less Task may give better throughput, but it probably wouldn't be much of an increase.
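
For anyone who wants to redo the per-day arithmetic with their own numbers, a rough sketch follows. The helper name `tasks_per_day` is just for illustration, and it assumes the quoted time is how long each Task takes while that many run side by side.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tasks_per_day(minutes, seconds, concurrent):
    """Tasks finished per day, given the per-Task time while `concurrent` Tasks run together."""
    per_task_seconds = minutes * 60 + seconds
    return round(SECONDS_PER_DAY / per_task_seconds * concurrent)

# Times and Task counts taken from the figures in this post.
cards = {
    "RTX 4080 Super": (10, 17, 3),
    "RTX 4070 Ti Super": (7, 56, 2),
    "RTX 2060 Super": (11, 26, 1),
}
for name, (m, s, n) in cards.items():
    print(f"{name}: {tasks_per_day(m, s, n)} per day")
```

This reproduces the 420, 363 and 126 per day figures above.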

If I get bored and have some time I might have a play and see if it is worth it or not.

Basically, download a day or more's worth of work, then turn off the BOINC network connection.
For 1 Task at a time, do at least 40 Tasks and manually work out their average processing time. Then change the number of Tasks to two, do at least 80 Tasks, manually work out their average processing time, and so on. Use GPU-Z to see what the GPU load is while doing this: if it's sustained over 90%, there's no point running more Tasks; if it's a bit less than 90%, you might as well try it and see. Less than 70% is almost always worth running another Task, and less than 60% is always worth it.
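
The comparison described above could be sketched roughly like this. Everything here is illustrative: the function names are made up, the runtimes in `runs` are hypothetical placeholders rather than real measurements, and treating the whole 70% to 90% range as "try it and see" is an assumption, since only "a bit less than 90%" is spelled out.

```python
from statistics import mean

SECONDS_PER_DAY = 24 * 60 * 60

def throughput_per_day(runtimes_s, concurrent):
    """Estimated Tasks per day from per-Task runtimes (seconds) measured at one setting."""
    return SECONDS_PER_DAY / mean(runtimes_s) * concurrent

def advice_from_gpu_load(load_percent):
    """Rule of thumb from the post, keyed on sustained GPU load."""
    if load_percent > 90:
        return "no point running more Tasks"
    if load_percent < 60:
        return "always worth running another Task"
    if load_percent < 70:
        return "almost always worth running another Task"
    return "might as well try another Task and see"  # assumed for 70% to 90%

# Hypothetical per-Task runtimes in seconds, keyed by Tasks run at a time.
# A real test would use at least 40 Tasks per concurrent Task, as described above.
runs = {
    1: [230.0, 236.0, 233.0],
    2: [400.0, 410.0, 405.0],
}
best = max(runs, key=lambda n: throughput_per_day(runs[n], n))
print("Best setting in this made-up data:", best, "at a time")
print(advice_from_gpu_load(65))
```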
Grant
Darwin NT, Australia.

Copyright © 2026 Arizona State University