Date: Mon, 30 Mar 2015 02:29:46 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Generic parsing functions -- prototype

Hi!

I've tried to create a prototype of generic parsing functions. Not 
much is implemented, but it's enough for the 7z format (more or less). It 
looks like this:

----------------------------------------------------------------------

#define HASH_FORMAT             "$7z$ %0-0d $ %1-24d $ %0-16d $ %16h $ %0-16d $ %16h $ %d $ %l $ %d $ %*h"

...

static int valid(char *ciphertext, struct fmt_main *self)
{
         return proc_valid(ciphertext, HASH_FORMAT, BIG_ENOUGH);
}

static void *get_salt(char *ciphertext)
{
         static union {
                 struct custom_salt _cs;
                 ARCH_WORD_32 dummy;
         } un;
         struct custom_salt *cs = &(un._cs);

         size_t SaltSize, ivSize, length;
         proc_extract(ciphertext, HASH_FORMAT,
                      &cs->type, &cs->NumCyclesPower,
                      IGNORE_NUM, &SaltSize, cs->salt,
                      IGNORE_NUM, &ivSize, cs->iv,
                      &cs->crc, &cs->unpacksize,
                      &length, cs->data);
         cs->SaltSize = SaltSize;
         cs->ivSize = ivSize;
         cs->length = length;

         return (void *)cs;
}

----------------------------------------------------------------------

After some tuning it should become even shorter. IMHO it's much better 
than the current approach of manual parsing.

The attached patch contains new files parsing_plug.c/parsing.h and 
changes to 7z_fmt_plug.c. I've only checked that the self-tests pass.
I don't think it's worth committing yet. But it should be enough to 
start a discussion and to take into account while making the GSoC plans 
more precise.

Some notes.

I hope, for each john format, to have one format string describing the 
hash structure so that it's enough to validate a hash and to extract 
info from it a-la scanf (and to create a hash a-la printf if the need 
arises). Probably not for every john format, but for most of them.

It's also possible to expose intermediate functions (to parse a number 
etc.) but I'm not yet sure how useful that is. IMHO the fewer functions we 
expose the better.

Which elements of format string are implemented:

- spaces are ignored;

- everything special starts with %, everything else is treated as literals;

- %d for unsigned decimal numbers (uint32_t), can have a range for 
accepted values like %1-24d. Returns the result via uint32_t *;

- %h for binary data of variable length, encoded in hex. The maximum 
length has to be indicated. Returns two(!) things -- actual length via 
size_t * and data via unsigned char *;

- %l for a length of the next field of variable length. Returns nothing.
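
The rules for %d can be sketched on their own. The following is only an illustration, not code from the attached patch: the name parse_ranged_u32 is made up, and it accumulates in uint64_t to detect overflow, which is a different technique from the digit-count check used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: parse an unsigned decimal
 * field at s, accepting only canonical spellings ("0", "7", "24", but not
 * "", "007" or "+7") whose value lies in [min, max].
 * Returns a pointer just past the digits, or NULL on rejection.
 */
static const char *parse_ranged_u32(const char *s, uint32_t min, uint32_t max,
                                    uint32_t *out)
{
    assert(s != NULL);

    const char *start = s;
    uint64_t n = 0;

    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (uint64_t)(*s - '0');
        if (n > UINT32_MAX)
            return NULL;        /* value does not fit in uint32_t */
        s++;
    }
    if (s == start)
        return NULL;            /* empty field */
    if (*start == '0' && s - start > 1)
        return NULL;            /* leading zero */
    if (n < min || n > max)
        return NULL;            /* outside the accepted range */

    if (out)
        *out = (uint32_t)n;     /* a NULL out means "validate only" */
    return s;
}
```

With this sketch, a field declared as %1-24d would correspond to a call like parse_ranged_u32(s, 1, 24, &value).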

All lengths are for decoded data. '*' can be used in place of any number; 
the number is then taken from the arguments a-la printf.
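
To illustrate what "lengths are for decoded data" means for %h, here is a minimal sketch of a hex field decoder whose limit counts bytes, not hex digits. The names hex_digit_value and decode_hex_field are hypothetical and do not appear in the patch; unlike the patch, this sketch validates and extracts in one pass.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit (either case), or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode the run of hex digits at s into buf, which
 * must have room for max_len bytes. max_len is the limit on DECODED bytes,
 * so "%16h" would allow up to 32 hex digits.
 * Stores the decoded length in *len and returns a pointer just past the
 * field, or NULL if the field has an odd number of digits or is too long.
 */
static const char *decode_hex_field(const char *s, size_t max_len,
                                    unsigned char *buf, size_t *len)
{
    assert(s != NULL && buf != NULL && len != NULL);

    size_t n = 0;
    int hi, lo;

    while ((hi = hex_digit_value(s[0])) >= 0) {
        lo = hex_digit_value(s[1]);
        if (lo < 0)
            return NULL;        /* odd number of hex digits */
        if (n == max_len)
            return NULL;        /* more data than the field allows */
        buf[n++] = (unsigned char)(hi * 16 + lo);
        s += 2;
    }
    *len = n;                   /* an empty field yields length 0 */
    return s;
}
```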

Future elements:

- %% -- literal %;

- %m -- base64/mime-encoded string without padding;

- %M -- base64/mime-encoded string with padding;

- %b -- base64/crypt-encoded string without padding;

- %B -- base64/crypt-encoded string with padding;

- %s -- arbitrary string (like usernames).
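
None of these is implemented in the patch. As a rough idea of what validation for something like %m might involve, the sketch below checks an unpadded field in the standard base64 (MIME) alphabet and computes its decoded length. The function name and its interface are assumptions, and it deliberately does not check that the unused trailing bits are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char mime_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Hypothetical validator for an unpadded base64 field (MIME alphabet).
 * Gobbles all alphabet characters at s, rejects impossible lengths and
 * fields that would decode to more than max_len bytes.
 * Stores the decoded length in *len (if non-NULL) and returns a pointer
 * just past the field, or NULL on rejection.
 */
static const char *b64_mime_valid(const char *s, size_t max_len, size_t *len)
{
    assert(s != NULL);

    const char *start = s;
    /* strchr() also matches the terminating NUL, so test for it first */
    while (*s != '\0' && strchr(mime_alphabet, *s) != NULL)
        s++;

    size_t chars = (size_t)(s - start);
    if (chars % 4 == 1)
        return NULL;            /* one leftover char cannot encode a byte */

    /* every 4 chars give 3 bytes; 2 or 3 leftover chars give 1 or 2 bytes */
    size_t decoded = chars / 4 * 3 + (chars % 4 ? chars % 4 - 1 : 0);
    if (decoded > max_len)
        return NULL;

    if (len)
        *len = decoded;
    return s;
}
```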

There are naturally many questions:

- spaces. Do we have hashes with spaces in them?

- numbers. Should we require that the range always be indicated? Do we 
need negative numbers (they are used only in pdf hashes)?

- types. Types are probably not very convenient. The idea was that a 
fixed-size type should be used for numbers extracted from a hash, and 
size_t for numbers like sizes. But in the example above this leads to 3 
intermediate variables, which is not very nice;

- hex. Do we need variants for lower- and upper-case?

- fixed-length data. Do we have cases of fixed-length data without a 
separator after it? Or cases where there is no separator and the length 
is extracted from the hash, like this: 
$<length-of-data1>$<length-of-data2>$<data1-in-hex><data2-in-hex>? LDAP 
formats have salt+binary base64-encoded together; they should probably be 
split by hand;

- variable-length data. Do we need ranges for lengths?

- is a scanf-like approach good at all? It seems to be quite compact, 
but the types of arguments are not checked, and mistakes there are fatal 
and hard to debug. The mismatch between the number of specifiers and the 
number of arguments (2 for %h and 0 for %l) doesn't help either;

- are the letters chosen for the specifiers good (e.g. %b vs. %m)?

- which other types of field do we need?
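
On the argument-count mismatch: one possible mitigation, sketched here purely as an assumption and not present in the patch, is a helper that computes how many pointer arguments an extraction call should receive for a given format string. The name count_extract_args is hypothetical. It follows the rules stated above (%d takes one argument, %h two, %l none) and does not count '*', since the posted proc_extract skips it without reading an argument.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of pointer arguments an extraction call
 * would consume for this format string.
 * Returns -1 for an unknown or truncated specifier.
 */
static int count_extract_args(const char *format)
{
    assert(format != NULL);

    int count = 0;
    const char *p = format;

    while (*p) {
        if (*p++ != '%')
            continue;           /* literal character or ignored space */

        /* skip an optional "N", "N-M" or '*' prefix */
        while (*p == '*' || *p == '-' || isdigit((unsigned char)*p))
            p++;

        switch (*p) {
        case 'd': count += 1; break;    /* uint32_t * */
        case 'h': count += 2; break;    /* size_t * and unsigned char * */
        case 'l': break;                /* consumes no argument */
        default:  return -1;            /* includes end of string */
        }
        p++;
    }
    return count;
}
```

For the 7z format string above this gives 12, which matches the number of arguments passed after HASH_FORMAT in get_salt. Such a check could run in a self-test, but it still says nothing about the types of the arguments.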

Comments?

-- 
Alexander Cherepanov

From 7750fc980764980771eef8167d6d1bef83cace47 Mon Sep 17 00:00:00 2001
From: Alexander Cherepanov <ch3root@...nwall.com>
Date: Mon, 30 Mar 2015 01:40:57 +0300
Subject: [PATCH] Test generic parsing function on 7z format.

---
 src/7z_fmt_plug.c  |  109 +++-----------------
 src/parsing.h      |    4 +
 src/parsing_plug.c |  290 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 309 insertions(+), 94 deletions(-)
 create mode 100644 src/parsing.h
 create mode 100644 src/parsing_plug.c

diff --git a/src/7z_fmt_plug.c b/src/7z_fmt_plug.c
index 2d9cb60..59f8246 100644
--- a/src/7z_fmt_plug.c
+++ b/src/7z_fmt_plug.c
@@ -33,6 +33,7 @@ john_register_one(&fmt_sevenzip);
 #include "crc32.h"
 #include "unicode.h"
 #include "memdbg.h"
+#include "parsing.h"
 
 #define FORMAT_LABEL		"7z"
 #define FORMAT_NAME		"7-Zip"
@@ -50,6 +51,8 @@ john_register_one(&fmt_sevenzip);
 #define MAX_KEYS_PER_CRYPT	1
 #define OMP_SCALE               1 // tuned on core i7
 
+#define HASH_FORMAT             "$7z$ %0-0d $ %1-24d $ %0-16d $ %16h $ %0-16d $ %16h $ %d $ %l $ %d $ %*h"
+
 #define BIG_ENOUGH 		(8192 * 32)
 
 static struct fmt_tests sevenzip_tests[] = {
@@ -105,110 +108,28 @@ static void done(void)
 
 static int valid(char *ciphertext, struct fmt_main *self)
 {
-	char *ctcopy, *keeptr, *p;
-	int len, NumCyclesPower;
-
-	if (strncmp(ciphertext, FORMAT_TAG, TAG_LENGTH) != 0)
-		return 0;
-
-	ctcopy = strdup(ciphertext);
-	keeptr = ctcopy;
-	ctcopy += TAG_LENGTH;
-	if ((p = strtokm(ctcopy, "$")) == NULL)
-		goto err;
-	if (strlen(p) != 1 || '0' != *p)     /* p must be "0" */
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* NumCyclesPower */
-		goto err;
-	if (strlen(p) > 2)
-		goto err;
-	NumCyclesPower = atoi(p);
-	if (NumCyclesPower > 24 || NumCyclesPower < 1)
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* salt length */
-		goto err;
-	len = atoi(p);
-	if(len > 16 || len < 0) /* salt length */
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* salt */
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* iv length */
-		goto err;
-	if (strlen(p) > 2)
-		goto err;
-	len = atoi(p);
-	if(len < 0 || len > 16) /* iv length */
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* iv */
-		goto err;
-	if (!ishex(p))
-		goto err;
-	if (strlen(p) > len*2 && strcmp(p+len*2, "0000000000000000"))
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* crc */
-		goto err;
-	if (!isdecu(p))
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* data length */
-		goto err;
-	len = atoi(p);
-	if ((p = strtokm(NULL, "$")) == NULL) /* unpacksize */
-		goto err;
-	if (!isdec(p))	/* no way to validate, other than atoi() works for it */
-		goto err;
-	if ((p = strtokm(NULL, "$")) == NULL) /* data */
-		goto err;
-	if (strlen(p) != len * 2)	/* validates data_len atoi() */
-		goto err;
-
-	MEM_FREE(keeptr);
-	return 1;
-
-err:
-	MEM_FREE(keeptr);
-	return 0;
+        return proc_valid(ciphertext, HASH_FORMAT, BIG_ENOUGH);
 }
 
 static void *get_salt(char *ciphertext)
 {
-	char *ctcopy = strdup(ciphertext);
-	char *keeptr = ctcopy;
-	int i;
-	char *p;
-
 	static union {
 		struct custom_salt _cs;
 		ARCH_WORD_32 dummy;
 	} un;
 	struct custom_salt *cs = &(un._cs);
 
-	memset(cs, 0, SALT_SIZE);
-
-	ctcopy += 4;
-	p = strtokm(ctcopy, "$");
-	cs->type = atoi(p);
-	p = strtokm(NULL, "$");
-	cs->NumCyclesPower = atoi(p);
-	p = strtokm(NULL, "$");
-	cs->SaltSize = atoi(p);
-	p = strtokm(NULL, "$"); /* salt */
-	p = strtokm(NULL, "$");
-	cs->ivSize = atoi(p);
-	p = strtokm(NULL, "$"); /* iv */
-	for (i = 0; i < cs->ivSize; i++)
-		cs->iv[i] = atoi16[ARCH_INDEX(p[i * 2])] * 16
-			+ atoi16[ARCH_INDEX(p[i * 2 + 1])];
-	p = strtokm(NULL, "$"); /* crc */
-	cs->crc = atou(p); /* unsigned function */
-	p = strtokm(NULL, "$");
-	cs->length = atoi(p);
-	p = strtokm(NULL, "$");
-	cs->unpacksize = atoi(p);
-	p = strtokm(NULL, "$"); /* crc */
-	for (i = 0; i < cs->length; i++)
-		cs->data[i] = atoi16[ARCH_INDEX(p[i * 2])] * 16
-			+ atoi16[ARCH_INDEX(p[i * 2 + 1])];
-	MEM_FREE(keeptr);
+        size_t SaltSize, ivSize, length;
+        proc_extract(ciphertext, HASH_FORMAT,
+                     &cs->type, &cs->NumCyclesPower,
+                     IGNORE_NUM, &SaltSize, cs->salt,
+                     IGNORE_NUM, &ivSize, cs->iv,
+                     &cs->crc, &cs->unpacksize,
+                     &length, cs->data);
+        cs->SaltSize = SaltSize;
+        cs->ivSize = ivSize;
+        cs->length = length;
+
 	return (void *)cs;
 }
 
diff --git a/src/parsing.h b/src/parsing.h
new file mode 100644
index 0000000..c326a56
--- /dev/null
+++ b/src/parsing.h
@@ -0,0 +1,4 @@
+int proc_valid(const char *s, const char *format, ...);
+void proc_extract(const char *s, const char *format, ...);
+
+#define IGNORE_NUM ((uint32_t *)0)
diff --git a/src/parsing_plug.c b/src/parsing_plug.c
new file mode 100644
index 0000000..b41f79e
--- /dev/null
+++ b/src/parsing_plug.c
@@ -0,0 +1,290 @@
+/* for size_t */
+#include <stddef.h>
+/* for uint32_t */
+#include <stdint.h>
+/* for isdigit etc. */
+#include <ctype.h>
+/* for va_list etc. */
+#include <stdarg.h>
+/* for strlen */
+#include <string.h>
+
+#include "parsing.h"
+
+#pragma GCC diagnostic ignored "-Wdeclaration-after-statement"
+
+static int initialized = 0;
+static unsigned char dec[256], hex[256];
+
+static void init()
+{
+  int i;
+
+  for (i = 0; i < 10; i++) {
+    dec['0' + i] = i;
+    hex['0' + i] = i;
+  }
+  for (i = 10; i < 16; i++) {
+    hex['A' + (i - 10)] = i;
+    hex['a' + (i - 10)] = i;
+  }  
+}
+
+/**********************************************************************/
+
+/*
+  decimal number, positive, without leading zeroes (except for zero itself) 
+  (uint32_t only)
+  empty string is not allowed
+  gobbles all digits
+*/
+static const unsigned char *num_valid(const unsigned char *s, uint32_t min, uint32_t max, uint32_t *number)
+{
+  if (!s)
+    return 0;
+
+  /* convert */
+  /* leading zeroes */
+  const unsigned char *s0 = s;  /* start of the string */
+  while (*s == '0')
+    s++;
+  /* further digits */
+  const unsigned char *s1 = s;  /* after leading zeroes */
+  uint32_t n = 0;
+  while (isdigit(*s))
+    n = n * 10 + dec[*s++];
+
+  /* check */
+  /* only two good cases: one 0 and no other digits or no 0 and there are other digits*/
+  if (!((s1 == s0 + 1 && s == s1) || (s1 == s0 && s > s1)))
+    return 0;
+  if (s - s1 > 10)              /* too many digits for uint32_t */
+    return 0;
+  if (s - s1 == 10 && (*s1 > '4' || n < 1000000000u)) /* overflow in exactly 10 digits */
+    return 0;
+  if (n < min || n > max)
+    return 0;
+
+  /* results */
+  if (number)
+    *number = n;
+  return s;
+}
+
+static const unsigned char *num_extract(const unsigned char *s, uint32_t *number)
+{
+  /* convert */
+  uint32_t n = 0;
+  while (isdigit(*s))
+    n = n * 10 + dec[*s++];
+
+  /* results */
+  if (number)
+    *number = n;
+  return s;
+}
+
+/**********************************************************************/
+
+/*
+  binary data of variable length in hex
+  empty string is allowed
+  gobbles all hex digits
+  stores length and data
+
+  size should be <= SIZE_MAX / 2
+*/
+static const unsigned char *hex_var_valid(const unsigned char *s, size_t size, size_t *length)
+{
+  if (!s)
+    return 0;
+
+  /* convert */
+  const unsigned char *s0 = s;
+  while (isxdigit(*s))
+    s++;
+
+  /* check */
+  size_t len = s - s0;           /* casted from ptrdiff_t to size_t */
+  if (len % 2 != 0 || len > size * 2)
+    return 0;
+
+  /* results */
+  if (length)
+    *length = len / 2;
+  return s;
+}
+
+static const unsigned char *hex_var_extract(const unsigned char *s, size_t *length, unsigned char *buffer)
+{
+  /* convert */
+  unsigned char *p = buffer;
+  while (isxdigit(s[0])) {
+    *p++ = hex[s[0]] * 16 + hex[s[1]];
+    s += 2;
+  }
+
+  /* results */
+  *length = p - buffer;         /* casted from ptrdiff_t to size_t */
+  return s;
+}
+
+/**********************************************************************/
+
+/*
+  generic parsing function -- validation
+*/
+int proc_valid(const char *s0, const char *format0, ...)
+{
+  if (!initialized)
+    init();
+
+  va_list ap;
+  const unsigned char *s = (const unsigned char *)s0;
+  const unsigned char *format = (const unsigned char *)format0;
+
+  va_start(ap, format0);
+
+  /* for %l */
+  int have_length = 0;
+  uint32_t length = 0;          /* silence -Wmaybe-uninitialized */
+
+  const unsigned char *p = format;
+  while (*p && s) {
+    if (*p == ' ') {
+      /* ignore spaces */
+      p++;
+    } else if (*p != '%') {
+      /* fixed */
+      /* printf(" %c", *p); */
+      if (*s == *p)
+        s++;
+      else
+        s = 0;
+      p++;
+    } else {
+      p++;                      /* skip % */
+
+      /* numbers like in %40h or %1-64d */
+      size_t size1, size2 = UINT32_MAX;
+      if (*p == '*') {
+        size1 = va_arg(ap, int);
+        p++;
+      } else {
+        size1 = 0;
+        while (isdigit(*p))
+          size1 = size1 * 10 + dec[*p++];
+      }
+      if (*p == '-') {          /* range */
+        p++;
+        if (*p == '*') {
+          size2 = va_arg(ap, int);
+          p++;
+        } else {
+          size2 = 0;
+          while (isdigit(*p))
+            size2 = size2 * 10 + dec[*p++];
+        }
+      }
+
+      /* specifiers */
+      switch (*p) {
+      case 'd':                 /* number */
+        /* printf(" %%(%zu-%zu)d", size1, size2); */
+        s = num_valid(s, size1, size2, 0);
+        break;
+      case 'l':                 /* length of the next data field of variable length*/
+        /* printf(" %%l"); */
+        have_length = 1;
+        s = num_valid(s, 0, (uint32_t)-1, &length);
+        break;
+      case 'h':                 /* hex data */
+        /* printf(" %%(%zu)h", size1); */
+        {
+          size_t len = 0;
+          s = hex_var_valid(s, size1, &len);
+          if (have_length &&  len != length)
+            s = 0;
+          else
+            have_length = 0;
+        }
+        break;
+      }
+      p++;
+    }
+  }
+  /* puts(""); */
+
+  va_end(ap);
+  return *p == 0 && s && *s == 0;
+}
+
+/*
+  generic parsing function -- extraction
+*/
+void proc_extract(const char *s0, const char *format0, ...)
+{
+  if (!initialized)
+    init();
+
+  va_list ap;
+  const unsigned char *s = (const unsigned char *)s0;
+  const unsigned char *format = (const unsigned char *)format0;
+
+  va_start(ap, format0);
+
+  const unsigned char *p = format;
+  while (*p) {
+    if (*p == ' ') {
+      /* ignore spaces */
+      p++;
+    } else if (*p != '%') {
+      /* fixed */
+      /* printf(" %c", *p); */
+      s++;
+      p++;
+    } else {
+      p++;                      /* skip % */
+
+      if (*p == '*') {
+        p++;
+      } else {
+        while (isdigit(*p))
+          p++;
+      }
+      if (*p == '-') {          /* range */
+        p++;
+        if (*p == '*') {
+          p++;
+        } else {
+          while (isdigit(*p))
+            p++;
+        }
+      }
+
+      /* specifiers */
+      switch (*p) {
+      case 'd':                 /* number */
+        /* printf(" %%(*-*)d"); */
+        s = num_extract(s, va_arg(ap, uint32_t *));
+        break;
+      case 'l':                 /* length of the next data field of variable length*/
+        /* printf(" %%l"); */
+        s = num_extract(s, 0);
+        break;
+      case 'h':                 /* hex data */
+        /* printf(" %%(*)h"); */
+        {
+          size_t *t1 = va_arg(ap, size_t *);
+          unsigned char *t2 = va_arg(ap, unsigned char *);
+          s = hex_var_extract(s, t1, t2);
+        }
+        break;
+      }
+      p++;
+    }
+  }
+  /* puts(""); */
+
+  va_end(ap);
+}
-- 
1.7.10.4

