Tuesday, August 31, 2010

Integer GPU computing apps..

Integer programs are now being ported to GPUs en masse.

In one year, RSA, mulmod, elliptic curve operations, parts of factoring in ECM and the Mersenne GIMPS programs, and a discrete logarithm problem solver have been ported:

First, see Bernstein's GPU work:
(E)ECM on GPU, January 09: Edwards curves, 48g mulmod/s on a 280-bit modulus on a GTX295,
called gpu-ecm.
Software available (with source) (stage 1) on Chen-Mou Cheng's research page.
cuda-eecm, September 09: best optimized Edwards curves on Cell, CPU and GPU,
now 500g mulmod/s on a 192-bit modulus (mulmod cost scales as pow(280/192,2)), so 6-7 times faster than the previous record.
The CPU implementation is now ported from GMP to MPFQ, with better EECM usage:
GMP-ECM -> EECM-MPFQ, software with source available at:
I think the GPU version will soon be available on the CPU page.
nearsha GPU and CPU client
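
To make the "Edwards curves" part concrete, here is a minimal Python sketch of the affine Edwards addition law modulo n, which is the operation these mulmod counts are spent on. It is not code from gpu-ecm or cuda-eecm (those use much more optimized coordinates); the function names and the toy values p, d, P are my own assumptions for illustration. Needs Python 3.8+ for pow(x, -1, n).

```python
from math import gcd

def edwards_add(P, Q, d, n):
    """Add two affine points on x^2 + y^2 = 1 + d*x^2*y^2 (mod n).

    Returns ("point", (x3, y3)), or ("factor", g) when a denominator is
    not invertible mod n. In ECM that failure is the useful event,
    since g may be a nontrivial factor of n (g == n is the trivial case).
    """
    x1, y1 = P
    x2, y2 = Q
    t = d * x1 * x2 * y1 * y2 % n
    den_x = (1 + t) % n
    den_y = (1 - t) % n
    for den in (den_x, den_y):
        g = gcd(den, n)
        if g != 1:
            return ("factor", g)
    x3 = (x1 * y2 + y1 * x2) * pow(den_x, -1, n) % n
    y3 = (y1 * y2 - x1 * x2) * pow(den_y, -1, n) % n
    return ("point", (x3, y3))

def on_curve(P, d, n):
    """Check the curve equation for an affine point."""
    x, y = P
    return (x * x + y * y - 1 - d * x * x * y * y) % n == 0

# Toy values (hypothetical): d is chosen so that P = (2, 3) lies on the curve mod 1009.
p, d, P = 1009, 673, (2, 3)
kind, R = edwards_add(P, P, d, p)
print(kind, R, on_curve(R, d, p))
```

Each addition costs a handful of modular multiplications, which is why mulmod/s is the natural speed unit for these implementations.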

RSA: see the Dublin research group (also the best AES implementation and good mulmod on Zp or ZN).

Factor code:
Msieve 1.44 GPU, download win32 binary:
with a c160 the GPU load is 99%.
On SVN the source has VC2008 projects by Brian Gladman.
Reportedly 27x for a 9800GT vs an Intel Core Duo.
Examples:
9370548739750343689742077059611741296688413458087068027338328923603585147935698143105876573510157864118212297131774808193943011745511363829026508600700379919701

3414023265048252827894893895448283501597256998523545196425280040055849104721167589947328246556695586532677342768160211760950557294071424000

Mersenne programs:
Maclucasfftw_cuda (now using cuFFT instead of FFTW) seems to give validated computations, although it is a direct port. It uses doubles, so it needs GT2xx I think, and speed is low; the developers are waiting for Fermi and expect at least a 5x improvement.
For now a GTX275 with 2048k and 4096k FFTs seems to get 2x the performance of a highly optimized single thread on a 3 GHz Core 2, so Fermi with at least 5x should perform better than Nehalem or K10 (?)
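
For reference, the test these Mersenne programs run is Lucas-Lehmer. A minimal Python sketch follows; this is the textbook recurrence, not the Maclucasfftw_cuda code. The real programs do the big squaring with a floating-point FFT (hence the need for doubles), while here Python's built-in big integers stand in for it.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number 2^p - 1
    s = 4
    for _ in range(p - 2):    # p - 2 squarings modulo 2^p - 1
        s = (s * s - 2) % m   # this squaring is what the FFT accelerates
    return s == 0

# Exponents below 24 that give Mersenne primes
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23) if lucas_lehmer(p)])
```

Almost all the time goes into the squaring step, so FFT size and FFT speed decide the throughput.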

Discrete logarithm solver 0.3: 0.1 in 2007 was 16x better than the previous state-of-the-art code, all on x86; 0.2 in 2008 added 64-bit support and better scalability; and now, in Sept 2009, there is CUDA code with a Python interface.
The Python interface is promising, as it has a DLL for the CUDA version, so you know how to call it, and it has cubins for testing with decuda.
No sources.
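
Since there are no sources, I can't show what the solver does internally. As a reminder of the problem itself, here is a generic textbook baby-step giant-step sketch in Python. It is an assumption-free stand-in for illustration only, not the solver's algorithm; the function name and the small test values are mine. Needs Python 3.8+.

```python
from math import isqrt

def bsgs(g, h, p):
    """Find x with g**x == h (mod p) for prime p, or None if there is none."""
    m = isqrt(p - 1) + 1
    # Baby steps: table of g^j -> j for 0 <= j < m (keep the smallest j)
    baby = {}
    e = 1
    for j in range(m):
        baby.setdefault(e, j)
        e = e * g % p
    # Giant steps: multiply h by g^(-m) until it lands in the table
    step = pow(g, -m, p)
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * step % p
    return None

# Toy instance: recover the exponent of 2 modulo the prime 101
h = pow(2, 37, 101)
x = bsgs(2, h, 101)
print(x, pow(2, x, 101) == h)
```

This takes about sqrt(p) time and memory, so it only works for small groups; serious solvers use other methods.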

CPU implementations are getting faster:

GMP 5 released with better asymptotically fast mult, div, etc. It also has mingw64
support, so it is probably the best now: better than the earlier 4.3 with Gladman's VS2008 port using yasm, and probably better than MPIR, which is Gladman's Windows stuff with yasm.

MPIR 1.3(4) in SVN has Nehalem assembler and a tuned mp_param; it also seems some code was already very good, with FFT mul etc., so I have to test gmpbench 0.2 with MPIR trunk and GMP 5, x86 and 64-bit, on Windows and Linux at least.

MPFQ 1.0rc2 released in October (Windows support? or fixes?)

There is also MPFR, and a lib using transcendentals on Google Code.
Also two pieces of breakthrough news:
pi world record on Nehalem
RSA-768 factored (Zimmermann's work)

The latest AMD GPU has integer SAD and new integer instructions, see
the SA2009 course.
The Parboil benchmark has a SAD (H.264) test; it would be good to port it to OpenCL to get SAD optimized with the ATI OpenCL SAD instructions. What speedup vs Fermi?
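
For anyone who doesn't know SAD: it is the sum of absolute differences between two pixel blocks, used in H.264 motion estimation to score candidate matches. A minimal Python sketch of the plain definition follows; it is not the Parboil kernel, and the function name and tiny 2x2 blocks are just my illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

# Two hypothetical 2x2 blocks of pixel values
current = [[1, 2], [3, 4]]
candidate = [[4, 2], [0, 5]]
print(sad(current, candidate))
```

A hardware SAD instruction does several of these abs-and-add steps on packed bytes at once, which is where the speedup would come from.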

logarithm solver
