Posts by Julien

1) Message boards : Science : Availability of source code (Message 3131) Posted 3 Oct 2021 by Julien Post: Still along the same lines:

diff --git a/GetDecicsSrc/polDiscTest_cpu.cpp b/GetDecicsSrc/polDiscTest_cpu.cpp
index 6fa7280..f152cd2 100755
--- a/GetDecicsSrc/polDiscTest_cpu.cpp
+++ b/GetDecicsSrc/polDiscTest_cpu.cpp
@@ -95,7 +95,7 @@ polDiscTest_cpu(long long *polBuf, int numPolys, char *polGoodFlag, char *polyMa
       // (i.e. A[0]*x^10 + A[1]*x^9 + ... + A[9]*x + A[10])
       // BUT FOR IMPLEMENTATION REASONS, WE REVERSE THE ORDER.
       int64_t A[11];
-      for(int col = 0; col < 11; col++) A[10-col] = polBuf[pIdx*11+col];
+      for(int col = 0; col < 11; col++) A[col] = polBuf[pIdx*11 + 10 - col];
 
       // Compute the derivative of A. Call it B.
       // NOTE: B = 10*x^9 + 9*A[9]*x^8 + 8*A[8]*x^7 + ... + 2*A[2]*x + A[1]

(Remark: in my tests I increased the number of array elements and repeated the same test several times. The modified version is sometimes a bit slower, so I hope I'm not on the wrong track :-( )
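A minimal sketch to check that the two loop forms in the patch above fill A with the same values. It assumes a flat buffer holding 11 coefficients per polynomial; the helper names and the sample data are illustrative and do not come from the project.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers: both copy polynomial number pIdx out of a flat
// buffer (11 coefficients per polynomial) in reversed order.
std::array<int64_t, 11> reverse_old(const std::vector<long long>& polBuf, int pIdx) {
    std::array<int64_t, 11> A{};
    for (int col = 0; col < 11; col++) A[10 - col] = polBuf[pIdx * 11 + col];
    return A;
}

std::array<int64_t, 11> reverse_new(const std::vector<long long>& polBuf, int pIdx) {
    std::array<int64_t, 11> A{};
    for (int col = 0; col < 11; col++) A[col] = polBuf[pIdx * 11 + 10 - col];
    return A;
}

// Builds a small made-up buffer of two polynomials and compares both forms.
bool reversal_forms_agree() {
    std::vector<long long> polBuf(22);
    for (int i = 0; i < 22; i++) polBuf[i] = 100 + i;  // arbitrary test data
    return reverse_old(polBuf, 0) == reverse_new(polBuf, 0) &&
           reverse_old(polBuf, 1) == reverse_new(polBuf, 1);
}
```

Since both forms produce the same array, any timing difference between them comes only from how the compiler handles the indexing.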
2) Message boards : Science : Availability of source code (Message 3130) Posted 3 Oct 2021 by Julien Post: Another small patch:

diff --git a/GetDecicsSrc/polDiscTest_cpu.cpp b/GetDecicsSrc/polDiscTest_cpu.cpp
index 6fa7280..27b66f3 100755
--- a/GetDecicsSrc/polDiscTest_cpu.cpp
+++ b/GetDecicsSrc/polDiscTest_cpu.cpp
@@ -100,7 +100,7 @@ polDiscTest_cpu(long long *polBuf, int numPolys, char *polGoodFlag, char *polyMa
       // Compute the derivative of A. Call it B.
       // NOTE: B = 10*x^9 + 9*A[9]*x^8 + 8*A[8]*x^7 + ... + 2*A[2]*x + A[1]
       int64_t B[10];
-      for(int k = 1; k <= 10; k++) B[k-1] = k*A[k];
+      for(int k = 0; k < 10; k++) B[k] = (k+1)*A[k+1];
 
       // The discriminant of a monic decic is -1*Resultant(A,B).
       // We use Algorithm 3.3.7 in Cohen's Book (p.122).

Here's a small snippet of code to test this:

//C++11 Style:
#include <chrono>
#include <cstdint>
#include <iostream>

// Times the original loop (k runs from 1 to 10).
void test1()
{
    int64_t A[11]={0};
    int64_t B[10]={0};
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
    for(int k = 1; k <= 10; k++) B[k-1] = k*A[k];
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds> (end - begin).count() << "[ns]" << std::endl;
}

// Times the patched loop (k runs from 0 to 9).
void test2()
{
    int64_t A[11]={0};
    int64_t B[10]={0};
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
    for(int k = 0; k < 10; k++) B[k] = (k+1)*A[k+1];
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds> (end - begin).count() << "[ns]" << std::endl;
}

int main()
{
    test1();
    test2();
}

Testing this, I get:

julien@debianamd:~/test/cpp$ ./a.out
Time difference = 130[ns]
Time difference = 70[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 160[ns]
Time difference = 140[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 140[ns]
Time difference = 60[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 220[ns]
Time difference = 80[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 130[ns]
Time difference = 70[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 150[ns]
Time difference = 120[ns]
julien@debianamd:~/test/cpp$ ./a.out
Time difference = 160[ns]
Time difference = 80[ns]

I know the difference between the two methods varies from run to run because of caching, but the second one seems faster.
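Before comparing timings, it may be worth confirming that both index forms compute the same derivative. The sketch below assumes the convention from the code comments (after the reversal, A[k] is the coefficient of x^k); the type aliases and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Poly = std::array<int64_t, 11>;   // A[k] is the coefficient of x^k
using Deriv = std::array<int64_t, 10>;  // B[k] is the coefficient of x^k

// Original indexing: k runs from 1 to 10.
Deriv derivative_v1(const Poly& A) {
    Deriv B{};
    for (int k = 1; k <= 10; k++) B[k - 1] = k * A[k];
    return B;
}

// Patched indexing: k runs from 0 to 9.
Deriv derivative_v2(const Poly& A) {
    Deriv B{};
    for (int k = 0; k < 10; k++) B[k] = (k + 1) * A[k + 1];
    return B;
}
```

For a polynomial whose coefficients are all 1, both functions return B[k] = k + 1, so the two loops differ only in how the index arithmetic is written.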
3) Message boards : Science : Availability of source code (Message 3129) Posted 3 Oct 2021 by Julien Post: Hello, It certainly won't change performance, but here's a slight simplification of the initialization that can't hurt:

diff --git a/GetDecicsSrc/polDiscTest_cpu.cpp b/GetDecicsSrc/polDiscTest_cpu.cpp
index 6fa7280..7e0ee3c 100755
--- a/GetDecicsSrc/polDiscTest_cpu.cpp
+++ b/GetDecicsSrc/polDiscTest_cpu.cpp
@@ -72,12 +72,9 @@ polDiscTest_cpu(long long *polBuf, int numPolys, char *polGoodFlag, char *polyMa
 #ifdef COEFF_SIZE_TEST
   // maxBits[i][j] = max bits required when starting an iteration with degA=i and degB=j.
-  int maxBits[11][10];
-  int maxBitsGH[11][10]; // A separate measurement for g and h.
-  long freq[11][10];
-  for(int i=0; i<=10; ++i) {
-    for(int j=0; j<=9; ++j) { maxBits[i][j]=0; maxBitsGH[i][j]=0; freq[i][j]=0; }
-  }
+  int maxBits[11][10] = {0};
+  int maxBitsGH[11][10]= {0}; // A separate measurement for g and h.
+  long freq[11][10] = {0};
   mpz_t absB, absR, absGH;
   mpz_inits(absB, absR, absGH, NULL);
 #endif

(from here: https://slaystudy.com/initialize-2d-array-c-to-zero/)
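The simplification relies on aggregate initialization: elements without an explicit initializer are set to zero. A small sketch to check this on tables of the same shape (the function name is illustrative):

```cpp
#include <cassert>

// Returns how many entries of two aggregate-initialized 11x10 tables are nonzero.
int count_nonzero_after_init() {
    int maxBits[11][10] = {0};  // first element set explicitly, the rest zero-initialized
    long freq[11][10] = {};     // C++ also accepts empty braces, with the same effect
    int nonzero = 0;
    for (int i = 0; i < 11; ++i)
        for (int j = 0; j < 10; ++j)
            if (maxBits[i][j] != 0 || freq[i][j] != 0) ++nonzero;
    return nonzero;
}
```

Some compilers warn about missing inner braces for `{0}` on a 2D array; the empty-brace form avoids that warning.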
4) Message boards : Number crunching : Proposed New Badge System (Message 3112) Posted 2 Sep 2021 by Julien Post: Argh, Eric, if you have time to spend on a badge system, does it mean you can't optimize any of the apps any further? Or perhaps you have in mind that a brand-new badge system may attract more people/machines and so help crunch more, and sadly you may be right... I mean, the important thing is to finish all the computations as quickly as possible, not to accumulate medals/badges that are earned not through intellectual or physical work but only through the machine time (and, of course, the performance of the machine) we devote.
5) Message boards : Science : Availability of source code (Message 3088) Posted 16 May 2021 by Julien Post: I finally succeeded in building it. The "boinc" subdirectory was empty, so I removed it and created a soft link instead:

ln -s /home/julien/projects/boinc/

Then, in Makefile.cpu_linux64 (the one I use), I replaced:

-CXXFLAGS = -O3 -Wall -fmax-errors=4
+CXXFLAGS = -O3 -Wall -fmax-errors=4 -fopt-info-vec-missed

and it built! I got messages like these:

polDiscTest_cpu.cpp:87:23: missed: not vectorized: loop nest containing two or more consecutive inner loops cannot be vectorized
polDiscTest_cpu.cpp:571:19: missed: couldn't vectorize loop
polDiscTest_cpu.cpp:604:26: missed: couldn't vectorize loop
polDiscTest_cpu.cpp:604:26: missed: not vectorized: control flow in loop.
polDiscTest_cpu.cpp:613:11: missed: couldn't vectorize loop
polDiscTest_cpu.cpp:613:11: missed: not vectorized: latch block not empty.
polDiscTest_cpu.cpp:504:47: missed: statement clobbers memory: __gmpz_mul (&hPow, &h, &hPow);
polDiscTest_cpu.cpp:500:23: missed: couldn't vectorize loop
...

For the moment I see nothing that could be fixed, but I'm no auto-vectorization expert. I then thought about setting this flag for the PARI library as well, and noticed that the version used is PARI 2.8, whereas https://pari.math.u-bordeaux.fr/download.html offers version 2.13. Perhaps there have been some improvements? (Of course, I suppose it takes time and work to retrieve it, build it, check that there are no regressions, and see whether it really improves performance.)
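To illustrate the kind of distinction such a report draws, here is a generic sketch that is not taken from the project: one loop with independent iterations and one with a loop-carried dependency. The function names are made up, and what a given GCC version actually reports for them is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each iteration is independent of the others, so a compiler is free to
// process several elements per instruction.
void add_constant(std::vector<int64_t>& v, int64_t delta) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += delta;
}

// Horner evaluation: every step needs the previous result, a loop-carried
// dependency that stands in the way of straightforward vectorization.
// coeffs[k] is the coefficient of x^k.
int64_t horner(const std::vector<int64_t>& coeffs, int64_t x) {
    int64_t acc = 0;
    for (std::size_t k = coeffs.size(); k-- > 0;) acc = acc * x + coeffs[k];
    return acc;
}
```

Compiling a file like this with -O3 -fopt-info-vec-missed is one way to see which of the two loops the compiler declines to vectorize on a given machine.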
6) Message boards : Science : Availability of source code (Message 3086) Posted 13 May 2021 by Julien Post: I don't know any guru on this. I wanted to try building because I noticed the option -fopt-info-vec-missed ("Detailed info about loops not being vectorized, and a lot of other detailed information"). So, after building pari-2.8-1711-ge5c317c (make all failed on the doc part, but at least the libs were OK) and running make install, then make install for boinc, I failed to build get-decics-numberfields right at the beginning. I tried:

make -f ./GetDecicsSrc/Makefile.cpu_linux64
=> make: *** No rule to make target 'GetDecics.cpp', needed by 'Olinux-x86_64/GetDecics.o'. Stop.

Then I also tried:

cd GetDecicsSrc
make Makefile.cpu_linux64
=> g++ -O3 -Wall -fmax-errors=4 -I/home/eric/BOINC_Project/boinc -I/home/eric/BOINC_Project/boinc/lib -I/home/eric/BOINC_Project/boinc/api -I/home/eric/BOINC_Project/boinc/sched -I/home/eric/BOINC_Project/boinc/db -I/home/eric/BOINC_Project/Pari/pari-2.8-1711-ge5c317c/src/headers -I/home/eric/BOINC_Project/Pari/pari-2.8-1711-ge5c317c/Olinux-x86_64 -DAPP_VERSION_CPU_STD -c -o Olinux-x86_64/GetDecics.o GetDecics.cpp
GetDecics.cpp:8:12: fatal error: config.h: No such file or directory
    8 | # include "config.h"
      |           ^~~~~~~~~~
compilation terminated.
make: *** [Makefile.cpu_linux64:50: Olinux-x86_64/GetDecics.o] Error 1

Only afterwards did I read this: "The makefiles will need to be modified so that the compiler can find the source files (both pari and boinc) and the libraries. There is also a dependency on the gmp library, so that will need to be installed. And obviously, openCL headers and libOpenCL will need to be present (and makefile changed accordingly)."

I'm more used to configure/autogen (which complain about missing libraries until you install them), followed by make. Given my limited knowledge of build configuration, I can't go any further. Anyway, hopefully other people who know build configuration will be able to build this and try some options.
7) Message boards : Science : Availability of source code (Message 3084) Posted 12 May 2021 by Julien Post: Hello, Perhaps this has already been tried, but I thought auto-vectorization might help. Indeed, this is typically the kind of computation where it can be applied. See:
- https://en.wikipedia.org/wiki/Automatic_vectorization
- https://www.codingame.com/playgrounds/283/sse-avx-vectorization/autovectorization
- https://locklessinc.com/articles/vectorize/ (it works on my tablet, but on my PC it's a blank page, though I use Firefox on both)
If you already knew this and tried to implement it by tuning the code a bit and/or using compiler flags, sorry for the noise. Regards, Julien
8) Message boards : Science : Availability of source code (Message 3067) Posted 8 Apr 2021 by Julien Post: Hello, OK, I don't know what happened to my message. I wonder whether extracting invariants from some loops might help a bit, at the cost of a little redundancy to avoid repeated tests. Here's a patch which, I must admit, I haven't even built, since I haven't looked at the build mechanism yet.

diff --git a/GetDecicsSrc/TgtMartinet.cpp b/GetDecicsSrc/TgtMartinet.cpp
index e62c06a..014feaa 100755
--- a/GetDecicsSrc/TgtMartinet.cpp
+++ b/GetDecicsSrc/TgtMartinet.cpp
@@ -228,22 +228,45 @@ int TgtMartinet(char *FilenameIn, char *FilenameOut, int ChkPntFlag, pari_long C
   // sig1w[1] = sig1(w1) and sig1w[2] = sig1(w2)
   // Since K is quadratic, w1=1, so sig1w[1]=1 (the code below is left over from a more general version)
   // The same remarks also apply to sig2w[:].
-  for(a11=0;a11<=2;++a11)
+  bool b4DividesdK((dK%4)==0);
+  if (b4DividesdK)
     {
-    for(a12=0;a12<=2;++a12)
+    for(a11=0;a11<=2;++a11)
       {
-      if(a11==2 && a12==2 && (dK%4)!=0) // a1=2+2w and 4 does not divide dK
+      auto tmp_gmulsg_a11_sig1w = gmulsg(a11,(GEN)sig1w[1]);
+      auto tmp_gmulsg_a11_sig2w = gmulsg(a11,(GEN)sig2w[1]);
+      for(a12=0;a12<=2;++a12)
         {
-        sig1a1 = gadd( gmulsg(-1,(GEN)sig1w[1]),gmulsg(a12,(GEN)sig1w[2]) );
-        sig2a1 = gadd( gmulsg(-1,(GEN)sig2w[1]),gmulsg(a12,(GEN)sig2w[2]) );
+        sig1a1 = gadd( tmp_gmulsg_a11_sig1w, gmulsg(a12,(GEN)sig1w[2]) );
+        sig2a1 = gadd( tmp_gmulsg_a11_sig2w,gmulsg(a12,(GEN)sig2w[2]) );
+        Ca1_pre[3*a11+a12] = ( gtodouble(gsqr(gabs(sig1a1,DEFAULTPREC))) +
+                               gtodouble(gsqr(gabs(sig2a1,DEFAULTPREC))) )/5.0;
         }
-      else
+      }
+    }
+  else
+    {
+    auto tmp_gmulsg_sig1w = gmulsg(-1,(GEN)sig1w[1]);
+    auto tmp_gmulsg_sig2w = gmulsg(-1,(GEN)sig2w[1]);
+    for(a11=0;a11<=2;++a11)
+      {
+      auto tmp_gmulsg_a11_sig1w = gmulsg(a11,(GEN)sig1w[1]);
+      auto tmp_gmulsg_a11_sig2w = gmulsg(a11,(GEN)sig2w[1]);
+      for(a12=0;a12<=2;++a12)
         {
-        sig1a1 = gadd( gmulsg(a11,(GEN)sig1w[1]),gmulsg(a12,(GEN)sig1w[2]) );
-        sig2a1 = gadd( gmulsg(a11,(GEN)sig2w[1]),gmulsg(a12,(GEN)sig2w[2]) );
+        if(a11==2 && a12==2) // a1=2+2w and 4 does not divide dK
+          {
+          sig1a1 = gadd( tmp_gmulsg_sig1w, gmulsg(a12,(GEN)sig1w[2]) );
+          sig2a1 = gadd( tmp_gmulsg_sig2w, gmulsg(a12,(GEN)sig2w[2]) );
+          }
+        else
+          {
+          sig1a1 = gadd( tmp_gmulsg_a11_sig1w, gmulsg(a12,(GEN)sig1w[2]) );
+          sig2a1 = gadd( tmp_gmulsg_a11_sig2w, gmulsg(a12,(GEN)sig2w[2]) );
+          }
+        Ca1_pre[3*a11+a12] = ( gtodouble(gsqr(gabs(sig1a1,DEFAULTPREC))) +
+                               gtodouble(gsqr(gabs(sig2a1,DEFAULTPREC))) )/5.0;
         }
-      Ca1_pre[3*a11+a12] = ( gtodouble(gsqr(gabs(sig1a1,DEFAULTPREC))) +
-                             gtodouble(gsqr(gabs(sig2a1,DEFAULTPREC))) )/5.0;
       }
     }
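The idea behind the patch (evaluate the dK test once and hoist the products that depend only on a11) can be sketched without PARI. The version below is a simplified stand-in, not the project's code: it uses plain doubles instead of GEN values, 0-based pairs instead of sig1w[1] and sig1w[2], and made-up function names. Unlike the patch, it does not duplicate the loop nest; it only hoists the invariants.

```cpp
#include <array>
#include <cassert>

using Pair = std::array<double, 2>;   // stands in for sig1w[1], sig1w[2]
using Table = std::array<double, 9>;  // stands in for Ca1_pre

// Original shape: the invariant test (dK % 4) != 0 sits in the innermost loop.
Table fill_original(long dK, const Pair& s1, const Pair& s2) {
    Table t{};
    for (int a11 = 0; a11 <= 2; ++a11)
        for (int a12 = 0; a12 <= 2; ++a12) {
            const int c = (a11 == 2 && a12 == 2 && (dK % 4) != 0) ? -1 : a11;
            const double x = c * s1[0] + a12 * s1[1];
            const double y = c * s2[0] + a12 * s2[1];
            t[3 * a11 + a12] = (x * x + y * y) / 5.0;
        }
    return t;
}

// Hoisted shape: the dK test is evaluated once, and the products that
// depend only on a11 are computed outside the a12 loop.
Table fill_hoisted(long dK, const Pair& s1, const Pair& s2) {
    Table t{};
    const bool fourDividesdK = (dK % 4) == 0;  // loop-invariant
    for (int a11 = 0; a11 <= 2; ++a11) {
        const double x11 = a11 * s1[0];
        const double y11 = a11 * s2[0];
        for (int a12 = 0; a12 <= 2; ++a12) {
            const bool special = !fourDividesdK && a11 == 2 && a12 == 2;
            const double x = (special ? -s1[0] : x11) + a12 * s1[1];
            const double y = (special ? -s2[0] : y11) + a12 * s2[1];
            t[3 * a11 + a12] = (x * x + y * y) / 5.0;
        }
    }
    return t;
}
```

Comparing the two tables for one dK divisible by 4 and one that is not is a cheap regression check before measuring whether the change is faster at all.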
9) Message boards : Science : Availability of source code (Message 3064) Posted 8 Apr 2021 by Julien Post: Hello, Thank you for the quick feedback! No problem accessing it. I hadn't expected a 2.4 GB download! :-) I don't know if you received my message where I proposed a patch. (I haven't forked your repo yet to use the merge request feature.)
10) Message boards : Science : Availability of source code (Message 3061) Posted 5 Apr 2021 by Julien Post: Hello, Would it be possible to make the source code available (CPU- and GPU-related programs) so that people can try to propose optimizations? Julien
11) Message boards : News : GPU app status update (Message 2377) Posted 7 Apr 2019 by Julien Post: Hello again, Thank you for your feedback. Also, it might be interesting to send your GPU polynomial discriminant code to the PARI authors. It could help them; they might find some flaws, but they might also have ideas to improve it even further!
12) Message boards : News : GPU app status update (Message 2365) Posted 6 Apr 2019 by Julien Post: Hello, I didn't find the answer in the FAQ, so I'm asking here: do you plan to put the code (for CPU and for Nvidia or AMD GPUs) on GitHub/GitLab or similar so that people can contribute? E.g., I use cppcheck (a C/C++ static analyzer) to find bugs.