Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 25 Jul 2015 21:05:51 +0800
From: Kai Zhao <loverszhao@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Coding Style

 Hi all,

The attachment is the coding style for john. Most of them are borrowed
from Kernel CodingStyle.

https://www.kernel.org/doc/Documentation/CodingStyle

There are 20 chapters in Kernel CodingStyle, but some of them are
kernel only. I chose the most concerned problems of coding style, and
add some examples of john.

There are some rules that some people like while others don't. I am very
glad to get your advice for each rules whether agree or disagree.Thus, I
can finish this coding style documentation and there are less coding style
problems.


Thanks,

Kai

[ CONTENT OF TYPE text/html SKIPPED ]

This coding style is borrowed from Kernel CodingStyle
(https://www.kernel.org/doc/Documentation/CodingStyle).

	Chapter 1: Indentation

1.1 Tabs are 4 characters for jumbo (8 for core), and thus indentations are
also 4 characters.

1.2 Indent with tabs, align with spaces. E.g.

'->' is tab, '.' is space.

void *memmem(const void *haystack, size_t haystack_len,
.............const void *needle, size_t needle_len)
{
->	haystack_ = (char*)haystack;
->	needle_ = (char*)needle;
->	last = haystack_+(haystack_len - needle_len + 1);
->	for (; haystack_ < last; ++haystack_)
->	{
->	->	if (hash == hay_hash &&
->	->	....*haystack_ == *needle_ &&
->	->	....!memcmp (haystack_, needle_, needle_len))
->	->	->	return haystack_;

->	->	hay_hash += *(haystack_+needle_len);
->	}

->	return NULL;
}

1.3 Ease multiple indentation levels in switch(), for(), while()...

	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}

	for (size = 0; size < PASSWORD_HASH_SIZES; size++)
	if (format->methods.binary_hash[size] &&
	    format->methods.get_hash[size](i) !=
	    format->methods.binary_hash[size](binary)) {
		do_something();
	}

1.4 Don't put multiple statements on a single line. A good example is:

	if (condition)
		do_something();

NOTE: This is the kernel coding style but some of the john codes do't follow
this rule. You can find those codes by:
find -name "*.c" | xargs grep -n "if (.\{1,\}) [^{/]."

1.5 Don't put multiple assignments on a single line either.

Do't do this:

	i = j = pos = -1;

	or

	*pos++ = 0; *ptr = pos;

NOTE: Is my understand right ? You can find those codes by:
find -name "*.c" | xargs grep -n ".\{1,\}=[ _*;0-9a-zA-Z]\{1,\}=.\{1,\}"


	Chapter 2: Breaking long lines and strings

The limit on the length of lines is 80 columns.

However, there are some cases that the lines can exceed 80 columns. Never break
user-visible strings such as printk messages, because that breaks the ability
to grep for them.

E.g.:

	fprintf(stderr, "Error, a c%c found in expression, but the data for this const was not provided\n", pInput[2]);

You can find those codes by:
find -name "*.c" | xargs grep -n ".\{80,\}"


	Chapter 3: Placing Braces and Spaces

3.1 Braces

3.1.1 Function

Put the opening brace at the beginning of the next line, thus:

int function(int x)
{
	body of function
}

You can find some of those codes by:
find -name "*.c" | xargs grep -n "static.\{1,\}{"

3.1.2 Others

Put the opening brace last on the next line, thus:

	if (x is true) {
		we do y
	}

This applies to all non-function statement blocks (if, switch, for,
while, do).  E.g.:

	switch (action) {
	case KOBJ_ADD:
		return "add";
	case KOBJ_REMOVE:
		return "remove";
	case KOBJ_CHANGE:
		return "change";
	default:
		return NULL;
	}

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:

	do {
		body of do-loop
	} while (condition);

and

	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}

NOTE: Many codes of john do't follow this rule, there are some codes:

	if (...)
	{
	}
	else
	{
	}

and

	if (...)
	{
	}
	else
	if ()
		do_something();

3.1.3 Do not unnecessarily use braces where a single statement will do.

	if (condition)
		action();

and

	if (condition)
		do_this();
	else
		do_that();

This does not apply if only one branch of a conditional statement is a single
statement; in the latter case use braces in both branches:

	if (condition) {
		do_this();
		do_that();
	} else {
		otherwise();
	}

3.2 Spaces

3.2.1 Use a space after (most) keywords.

Use a space after these keywords:

	if, switch, case, for, do, while

but not with sizeof, typeof, alignof, or __attribute__.  E.g.,

	s = sizeof(struct file);

NOTE: many codes of john do't follow this rule, you can find those codes by:
find -name "*.c" | xargs grep -n "if("
find -name "*.c" | xargs grep -n "switch("
find -name "*.c" | xargs grep -n "for("
find -name "*.c" | xargs grep -n "while("

find -name "*.c" | xargs grep -n "sizeof "

3.2.2 Do not add spaces around (inside) parenthesized expressions.

This example is *bad*:

	s = sizeof( struct file );

You can find those codes by:
find -name "*.c" | xargs grep -n "( "

3.2.3 When declaring pointer, the preferred use of '*' is adjacent to the data
name or function name. E.g.:

	char *linux_banner;
	unsigned long long memparse(char *ptr, char **retptr);
	char *match_strdup(substring_t *s);

You can find those codes by:
find -name "*.c" | xargs grep -n "[0-9a-zA-Z]\*"

3.2.4 When type conversin, add a space before '*'.

E.g:

	hostSalt = (cl_uint *) mem_alloc(SALT_BUFFER_SIZE);

3.2.5 Use one space around (on each side of) most binary and ternary operators,
such as any of these:

	=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

but no space after unary operators:

	&  *  +  -  ~  !  sizeof  typeof  alignof  __attribute__  defined

no space before the postfix increment & decrement unary operators:

	++  --

no space after the prefix increment & decrement unary operators:

	++  --

and no space around the '.' and "->" structure member operators.

3.2.6 Don't leave whilespace at the end of lines.

Don't do this:

	do_something();. // '.' is a space

You can find those codes by:
find -name "*.c" | xargs grep -n " $"


	Chapter 4: Naming

4.1 Names for global variables and gloabl functions

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions.  If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".

4.2 Names for local variables

LOCAL variable names should be short, and to the point.  If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood.  Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.


	Chapter 5: Typedefs

NOTE: This chapter is totally from Kernel CodingStyle. I think many codes
of john do't follow this rule. So should we follow this rule ?

Please don't use things like "vps_t".
It's a _mistake_ to use typedef for structures and pointers. When you see a

	vps_t a;

in the source, what does it mean?
In contrast, if it says

	struct virtual_container *a;

you can actually tell what "a" is.

Lots of people think that typedefs "help readability". Not so. They are
useful only for:

 (a) totally opaque objects (where the typedef is actively used to _hide_
     what the object is).

     Example: "pte_t" etc. opaque objects that you can only access using
     the proper accessor functions.

     NOTE! Opaqueness and "accessor functions" are not good in themselves.
     The reason we have them for things like pte_t etc. is that there
     really is absolutely _zero_ portably accessible information there.

 (b) Clear integer types, where the abstraction _helps_ avoid confusion
     whether it is "int" or "long".

     u8/u16/u32 are perfectly fine typedefs, although they fit into
     category (d) better than here.

     NOTE! Again - there needs to be a _reason_ for this. If something is
     "unsigned long", then there's no reason to do

	typedef unsigned long myflags_t;

     but if there is a clear reason for why it under certain circumstances
     might be an "unsigned int" and under other configurations might be
     "unsigned long", then by all means go ahead and use a typedef.

 (c) when you use sparse to literally create a _new_ type for
     type-checking.

 (d) New types which are identical to standard C99 types, in certain
     exceptional circumstances.

     Although it would only take a short amount of time for the eyes and
     brain to become accustomed to the standard types like 'uint32_t',
     some people object to their use anyway.

     Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
     signed equivalents which are identical to standard types are
     permitted -- although they are not mandatory in new code of your
     own.

     When editing existing code which already uses one or the other set
     of types, you should conform to the existing choices in that code.

 (e) Types safe for use in userspace.

     In certain structures which are visible to userspace, we cannot
     require C99 types and cannot use the 'u32' form above. Thus, we
     use __u32 and similar types in all structures which are shared
     with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.


	Chapter 6: Functions

6.1 Do't be too long

Functions should be short and sweet, and do just one thing.

There are long functions in john, such as formats.c::fmt_self_test_body()
which has 504 lines.

There are some cases that functions can be long, e.g.:

Conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

6.2 Complex functions

If you have a complex function, and you suspect that a less-than-gifted
first-year high-school student might not even understand what the function
is all about, you should adhere to the maximum limits all the more closely.
 Use helper functions with descriptive names (you can ask the compiler to
in-line them if you think it's performance-critical, and it will probably
do a better job of it than you would have done).

6.3 Number of local variables

They shouldn't exceed 5-10, or you're doing something wrong.  Re-think the
function, and split it into smaller pieces.  A human brain can generally
easily keep track of about 7 different things, anything more and it gets
confused.  You know you're brilliant, but maybe you'd like to understand
what you did 2 weeks from now.

6.4 Declartion

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.

You can find some of those codes by:
find -name "*.h" | xargs grep -n "int);"
find -name "*.h" | xargs grep -n "char \*);"


	Chapter 7: Centralized exiting of functions

NOTE: This chapter is totally from Kernel CodingStyle. I don't have much
idea on this topic. I have noticed that many of the *_fmt_plug.c::valid()
functions follow this rule.

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.  If there is no
cleanup needed then just return directly.

Choose label names which say what the goto does or why the goto exists.  An
example of a good name could be "out_buffer:" if the goto frees "buffer".  Avoid
using GW-BASIC names like "err1:" and "err2:".  Also don't name them after the
goto location like "err_kmalloc_failed:"

The rationale for using gotos is:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
    modifications are prevented
- saves the compiler work to optimize redundant code away ;)

	int fun(int a)
	{
		int result = 0;
		char *buffer;

		buffer = kmalloc(SIZE, GFP_KERNEL);
		if (!buffer)
			return -ENOMEM;

		if (condition1) {
			while (loop1) {
				...
			}
			result = 1;
			goto out_buffer;
		}
		...
	out_buffer:
		kfree(buffer);
		return result;
	}

A common type of bug to be aware of it "one err bugs" which look like this:

	err:
		kfree(foo->bar);
		kfree(foo);
		return ret;

The bug in this code is that on some exit paths "foo" is NULL.  Normally the
fix for this is to split it up into two error labels "err_bar:" and "err_foo:".


	Chapter 8: Commenting

8.1 C89 style and C99 style

C89:

/* ... */

C99:

// ...

John core and Linux style for comments is the C89.
Many codes of John jumbo for comments is both C89 and C99.

8.2 Comment content

Comments are good, but there is also a danger of over-commenting.  NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while.  You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess.  Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.


	Chapter 9: Macros, Enums

9.1 Names of macros defining constants and labels in enums are capitalized.

	#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

Generally, inline functions are preferable to macros resembling functions.

There are some macros defining constants using lower case. Such as,

#define gost_block_size 32
#define gost_hash_length 32

You can find those codes by:
find -name "*.[ch]" | xargs grep -n "#define [_0-9a-z]\{2,\} [0-9]\{1,\}"

9.2 Macros with multiple statements should be enclosed in a do - while block:

	#define macrofun(a, b, c) 			\
		do {					\
			if (a == 5)			\
				do_this(b, c);		\
		} while (0)

9.3 Things to avoid when using macros:

1) macros that affect control flow:

	#define FOO(x)					\
		do {					\
			if (blah(x) < 0)		\
				return -EBUGGERED;	\
		} while(0)

is a _very_ bad idea.  It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.

There are some macros in john have return. Such as:

scrypt_fmt.c:274

#define H0(s) \
	char *cp = strrchr(s,'$')+40; \
	int i = cp-s; \
	return i > 0 ? H((s), i) & 0xF : 0

2) macros that depend on having a local variable with a magic name:

	#define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.

	#define CONSTANT 0x4000
	#define CONSTEXP (CONSTANT | 3)

5) namespace collisions when defining local variables in macros resembling
functions:

#define FOO(x)				\
({					\
	typeof(x) ret;			\
	ret = calc_ret(x);		\
	(ret);				\
})

ret is a common name for a local variable - __foo_ret is less likely
to collide with an existing variable.


	Chapter 10: The inline disease

A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. An exception to this rule are the cases where
a parameter is known to be a compiletime constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time.

Many inline functions in john are exceed 3 lines. E.g:

unicode.c:184
inline int utf8_to_utf16(UTF16 *target, unsigned int len, const UTF8 *source,
                         unsigned int sourceLen)
{
	...
}

This function has 97 lines. Should we add inline ?


	Chapter 11: Function return values and names

11.1 The function is an action or an imperative command whose return shows
whether succeeds

The function should return an error-code integer. 0 is success and others is
failure.

For example, "add work" is a command, and the add_work() function returns 0
for success or -EBUSY for failure.

11.2 The function is a predicate whose return shows whether succeeds

The function should return a "succeeded" boolean. 0 is failure and others is
success.

"PCI device present" is a predicate, and the pci_dev_present() function returns
1 if it succeeds in finding a matching device or 0 if it doesn't.

But john doesn't follow this rule, such as valid(), cmp_all(), cmp_one(),
cmp_exact() functions, 0 is failure and 1 is success.

I think we should make it clear no matter which we choose.

11.3 Others

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule.  Generally they indicate failure by returning some out-of-range
result.  Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ