Date: Thu, 20 Dec 2012 02:03:03 +0100
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: Various experimental OpenCL commits

Claudio, Sayantan, all

I have committed a couple patches that are somewhat experimental:


1. A patch that adds an "opencl_process_event()" function in common-opencl.c, that can be called from within an iterated format's split-kernel loop. This makes for swift response to key presses as well as proper session-saving in time, as discussed in another thread. The "pseudo-code patch" now is this simple for any format:

  void crypt_all(int count)
  {
  	enqueue(Transfer);
  	enqueue(RarInitKernel);
  	for (i=0; i<HASH_LOOPS; i++)
  	{
  		enqueue(RarLoopKernel);
+  		clFinish();
+  		opencl_process_event();
  	}
  	enqueue(RarFinalKernel);
  }


After some glitches that were fixed, it seems to work just fine and introduces only a very minor performance drop. All my iterated formats now use this, but I did not touch Claudio's or Sayantan's formats - I leave it up to you whether to use it.


2. A patch that makes use of our beloved "spinning wheel" progress indicator during format self-test as well. This gives you a way to see when a session goes from self-test to actual cracking. Hopefully, once we make the self-tests faster, we can drop this.


3. Modifications to all my OpenCL formats so they actually use the 'count' argument passed to crypt_all() to decrease global worksize when possible. This has several good consequences: it makes Single mode work less badly (min_keys_per_crypt can be set to local worksize) and it speeds up self-test - often a lot! For example, the Office 2007 benchmark took 1:45 before this patch, and takes just 25 seconds now.

This is quite simple: You just need to take local_work_size into account so you end up with a multiple. For scalar formats, I just did this:

  static void crypt_all(int count)
  {
+ 	size_t crypt_gws = ((count + (local_work_size - 1)) /
+ 		local_work_size) * local_work_size;


...then replace all uses of global_work_size within the function with crypt_gws. Simple as that! Don't forget to set self->params.min_keys_per_crypt to local_work_size in init(). Actually, I set it to MAX(local_work_size, 8) because some CPU drivers will use a local_work_size of 1 and we don't want it that low.
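
To make the rounding concrete, here is a minimal standalone sketch of the same arithmetic. The helper names round_up_gws() and min_keys_for() are made up for illustration and are not functions in the tree; the second one just spells out MAX(local_work_size, 8). It assumes local_work_size is never zero.

```c
#include <stddef.h>

/*
 * Hypothetical helper: round 'count' up to the next multiple of
 * 'local_work_size', using the same expression as crypt_gws above.
 * Assumes local_work_size > 0.
 */
static size_t round_up_gws(size_t count, size_t local_work_size)
{
	return ((count + (local_work_size - 1)) / local_work_size) *
		local_work_size;
}

/*
 * Hypothetical helper: the value suggested for min_keys_per_crypt,
 * i.e. MAX(local_work_size, 8), so that a driver reporting a
 * local_work_size of 1 does not drag the minimum down to 1.
 */
static size_t min_keys_for(size_t local_work_size)
{
	return local_work_size > 8 ? local_work_size : 8;
}
```

With a local_work_size of 64, a count of 1 gives a global worksize of 64 and a count of 65 gives 128, while a count that is already a multiple of 64 is left unchanged.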


All and any comments are welcome.

Enjoy,
magnum
