Date: Mon, 20 Feb 2012 23:16:08 +0100
From: Samuele Giovanni Tonon <samu@...uxasylum.net>
To: john-dev@...ts.openwall.com
Subject: Re: Recent github patches

On 02/20/12 20:38, magnum wrote:
> On 02/07/2012 09:33 AM, Samuele Giovanni Tonon wrote:
>> - i added mysql-sha1, which is basically sha(sha(p)) ,
>> on my test i saw it going faster than on cpu. any test
>> is really appreciated
>
> A quad-core OMP build outperforms it, there's probably a lot of
> optimisations that can be made. Other than that it seems to work as it
> should (passes test suite).
>

Ouch, that is sad. I can take a look to see if I can optimize it a bit,
but I think it's related to the input/output bottleneck in the
GPU<->host transfer.

>> find_best_workgroup is on by default you can make it run for your
>> preferred format and then suppress the test by doing
>> export LWS=num
>> be aware, best LWS changes from format to format .
>>
>> find_best_kpc is off by default; you activate it by doing
>> export KPC=0 and, after the test you can suppress it
>> by doing export KPC=num .
>> if you don't export a KPC it will use max_keys_per_crypt.
>>
>> be aware that kpc is highly dependant on the specification of
>> your hardware and the format you are using.
>
> I tried this a little. First thing I noticed is if I set LWS=0 it will
> crash. Re-reading what you write above, you actually never said that
> should work :) but maybe you could treat LWS=0 as "find best" instead
> of crashing.

Yes, there's no validation of the integer you put in LWS or KPC. I will
make LWS=0 run the find-best work size search, and I should add some
suggestions for good KPC and LWS values (always use powers of two).

> Anyway, I tried find_best_kpc and it picks very small numbers (like
> 69632) and end up a lot slower than just going with the default 2M. I
> also tried manually setting 4M and that worked fine and was faster than
> 2M. Maybe the find_best could be enhanced somehow.
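The LWS/KPC behaviour described in this thread (LWS unset: find_best_workgroup runs; LWS=0: the proposed "find best" alias; KPC unset: max_keys_per_crypt; KPC=0: find_best_kpc; any other value: used as-is) could be validated up front instead of crashing. This is a minimal Python sketch that assumes nothing about how john actually reads these variables; the function names and the FIND_BEST marker are hypothetical, and the power-of-two rule is only a warning because the thread offers it as a suggestion, not a requirement.

```python
import os
import warnings

FIND_BEST = "find_best"  # hypothetical marker meaning "run the auto-tuning search"

def read_tuning_var(name, env=None):
    """Parse an LWS/KPC-style variable: None if unset, else a validated int."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name}={raw!r} is not an integer") from None
    if value < 0:
        raise ValueError(f"{name}={value} must not be negative")
    # Suggestion from the thread: prefer powers of two (0 means "find best").
    if value > 0 and value & (value - 1):
        warnings.warn(f"{name}={value} is not a power of two")
    return value

def choose_lws(env=None):
    value = read_tuning_var("LWS", env)
    # find_best_workgroup runs by default; LWS=0 is treated as "find best" too.
    return FIND_BEST if value in (None, 0) else value

def choose_kpc(max_keys_per_crypt, env=None):
    value = read_tuning_var("KPC", env)
    if value is None:
        return max_keys_per_crypt  # no KPC exported: use the format's maximum
    return FIND_BEST if value == 0 else value  # KPC=0 turns on find_best_kpc
```

For example, `choose_kpc(2097152, {"KPC": "0"})` returns "find_best", while an unset KPC falls back to the 2M default.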
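To make "sha(sha(p))" concrete: assuming the mysql-sha1 format targets MySQL 4.1+ PASSWORD() hashes, the inner SHA-1 is taken as the raw 20-byte digest (not its hex string), and the result is written as "*" followed by 40 upper-case hex digits. A minimal Python sketch using only hashlib; the function name is hypothetical and UTF-8 encoding of the password is an assumption.

```python
import hashlib

def mysql_sha1(password):
    """'*' + upper-case hex of SHA1(SHA1(password)), MySQL 4.1+ style."""
    # Assumption: the password is hashed as UTF-8 bytes.
    inner = hashlib.sha1(password.encode("utf-8")).digest()  # raw bytes, not hex
    return "*" + hashlib.sha1(inner).hexdigest().upper()
```

For passwords of up to 55 bytes each candidate costs only two SHA-1 compressions, so there is little GPU work per key to hide the cost of moving keys and results across the bus. That fits the transfer-bottleneck explanation given in the reply, though it is not a measurement.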
>
> Another thing I noticed is that sometimes auto-LWS picks 32 and
> sometimes 64, for a particular format. Maybe this is not really a
> problem, I just report what I see.

Yes, it depends on the format you are using. Basically, LWS tells how
many "threads" (work-items) of the same .cl kernel are grouped to run in
parallel, and enough of these groups are launched to "fill" the global
work size (which is equal to KPC if the code is not "vectorized"). So
LWS must be an integer factor of global_work_size. If the .cl code is
fast enough, LWS will be low; if the code is vectorized (the same thread
hashes more than one password at a time), LWS will likely increase up
to MAX_WORK_SIZE (which depends on the GPU).

Samuele
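The rule that LWS must divide global_work_size can be sketched as follows. This is a minimal Python sketch, not the actual find_best_workgroup code: it assumes a vectorized kernel hashes `vector_width` keys per work-item (so the global work size is KPC / vector_width), takes `max_work_size` to be the device limit (for example what OpenCL reports as CL_DEVICE_MAX_WORK_GROUP_SIZE), and uses a hypothetical `time_kernel(gws, lws)` callback in place of a real timed kernel launch.

```python
def global_work_size(kpc, vector_width=1):
    """Non-vectorized: one work-item per key, so GWS == KPC."""
    # Assumption: a vectorized kernel hashes vector_width keys per work-item.
    if kpc % vector_width:
        raise ValueError("KPC must be a multiple of the vector width")
    return kpc // vector_width

def candidate_lws(gws, max_work_size):
    """Powers of two that divide gws and do not exceed max_work_size."""
    sizes = []
    lws = 1
    # Once a power of two fails to divide gws, every larger one fails too.
    while lws <= max_work_size and gws % lws == 0:
        sizes.append(lws)
        lws *= 2
    return sizes

def find_best_lws(gws, max_work_size, time_kernel):
    """Pick the candidate with the lowest time_kernel(gws, lws) (hypothetical)."""
    return min(candidate_lws(gws, max_work_size),
               key=lambda lws: time_kernel(gws, lws))
```

With real timings, two candidates that run almost equally fast can swap places between runs because of measurement noise, which is one plausible reason auto-LWS sometimes lands on 32 and sometimes on 64.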