This coding style is borrowed from Kernel CodingStyle (https://www.kernel.org/doc/Documentation/CodingStyle). Chapter 1: Indentation 1.1 Tabs are 4 characters for jumbo (8 for core), and thus indentations are also 4 characters. 1.2 Indent with tabs, align with spaces. E.g. '->' is tab, '.' is space. void *memmem(const void *haystack, size_t haystack_len, .............const void *needle, size_t needle_len) { -> haystack_ = (char*)haystack; -> needle_ = (char*)needle; -> last = haystack_+(haystack_len - needle_len + 1); -> for (; haystack_ < last; ++haystack_) -> { -> -> if (hash == hay_hash && -> -> ....*haystack_ == *needle_ && -> -> ....!memcmp (haystack_, needle_, needle_len)) -> -> -> return haystack_; -> -> hay_hash += *(haystack_+needle_len); -> } -> return NULL; } 1.3 Ease multiple indentation levels in switch(), for(), while()... switch (suffix) { case 'G': case 'g': mem <<= 30; break; case 'M': case 'm': mem <<= 20; break; case 'K': case 'k': mem <<= 10; /* fall through */ default: break; } for (size = 0; size < PASSWORD_HASH_SIZES; size++) if (format->methods.binary_hash[size] && format->methods.get_hash[size](i) != format->methods.binary_hash[size](binary)) { do_something(); } 1.4 Don't put multiple statements on a single line. A good example is: if (condition) do_something(); NOTE: This is the kernel coding style but some of the john codes do't follow this rule. You can find those codes by: find -name "*.c" | xargs grep -n "if (.\{1,\}) [^{/]." 1.5 Don't put multiple assignments on a single line either. Do't do this: i = j = pos = -1; or *pos++ = 0; *ptr = pos; NOTE: Is my understand right ? You can find those codes by: find -name "*.c" | xargs grep -n ".\{1,\}=[ _*;0-9a-zA-Z]\{1,\}=.\{1,\}" Chapter 2: Breaking long lines and strings The limit on the length of lines is 80 columns. However, there are some cases that the lines can exceed 80 columns. Never break user-visible strings such as printk messages, because that breaks the ability to grep for them. E.g.: fprintf(stderr, "Error, a c%c found in expression, but the data for this const was not provided\n", pInput[2]); You can find those codes by: find -name "*.c" | xargs grep -n ".\{80,\}" Chapter 3: Placing Braces and Spaces 3.1 Braces 3.1.1 Function Put the opening brace at the beginning of the next line, thus: int function(int x) { body of function } You can find some of those codes by: find -name "*.c" | xargs grep -n "static.\{1,\}{" 3.1.2 Others Put the opening brace last on the next line, thus: if (x is true) { we do y } This applies to all non-function statement blocks (if, switch, for, while, do). E.g.: switch (action) { case KOBJ_ADD: return "add"; case KOBJ_REMOVE: return "remove"; case KOBJ_CHANGE: return "change"; default: return NULL; } Note that the closing brace is empty on a line of its own, _except_ in the cases where it is followed by a continuation of the same statement, ie a "while" in a do-statement or an "else" in an if-statement, like this: do { body of do-loop } while (condition); and if (x == y) { .. } else if (x > y) { ... } else { .... } NOTE: Many codes of john do't follow this rule, there are some codes: if (...) { } else { } and if (...) { } else if () do_something(); 3.1.3 Do not unnecessarily use braces where a single statement will do. if (condition) action(); and if (condition) do_this(); else do_that(); This does not apply if only one branch of a conditional statement is a single statement; in the latter case use braces in both branches: if (condition) { do_this(); do_that(); } else { otherwise(); } 3.2 Spaces 3.2.1 Use a space after (most) keywords. Use a space after these keywords: if, switch, case, for, do, while but not with sizeof, typeof, alignof, or __attribute__. E.g., s = sizeof(struct file); NOTE: many codes of john do't follow this rule, you can find those codes by: find -name "*.c" | xargs grep -n "if(" find -name "*.c" | xargs grep -n "switch(" find -name "*.c" | xargs grep -n "for(" find -name "*.c" | xargs grep -n "while(" find -name "*.c" | xargs grep -n "sizeof " 3.2.2 Do not add spaces around (inside) parenthesized expressions. This example is *bad*: s = sizeof( struct file ); You can find those codes by: find -name "*.c" | xargs grep -n "( " 3.2.3 When declaring pointer, the preferred use of '*' is adjacent to the data name or function name. E.g.: char *linux_banner; unsigned long long memparse(char *ptr, char **retptr); char *match_strdup(substring_t *s); You can find those codes by: find -name "*.c" | xargs grep -n "[0-9a-zA-Z]\*" 3.2.4 When type conversin, add a space before '*'. E.g: hostSalt = (cl_uint *) mem_alloc(SALT_BUFFER_SIZE); 3.2.5 Use one space around (on each side of) most binary and ternary operators, such as any of these: = + - < > * / % | & ^ <= >= == != ? : but no space after unary operators: & * + - ~ ! sizeof typeof alignof __attribute__ defined no space before the postfix increment & decrement unary operators: ++ -- no space after the prefix increment & decrement unary operators: ++ -- and no space around the '.' and "->" structure member operators. 3.2.6 Don't leave whilespace at the end of lines. Don't do this: do_something();. // '.' is a space You can find those codes by: find -name "*.c" | xargs grep -n " $" Chapter 4: Naming 4.1 Names for global variables and gloabl functions GLOBAL variables (to be used only if you _really_ need them) need to have descriptive names, as do global functions. If you have a function that counts the number of active users, you should call that "count_active_users()" or similar, you should _not_ call it "cntusr()". 4.2 Names for local variables LOCAL variable names should be short, and to the point. If you have some random integer loop counter, it should probably be called "i". Calling it "loop_counter" is non-productive, if there is no chance of it being mis-understood. Similarly, "tmp" can be just about any type of variable that is used to hold a temporary value. Chapter 5: Typedefs NOTE: This chapter is totally from Kernel CodingStyle. I think many codes of john do't follow this rule. So should we follow this rule ? Please don't use things like "vps_t". It's a _mistake_ to use typedef for structures and pointers. When you see a vps_t a; in the source, what does it mean? In contrast, if it says struct virtual_container *a; you can actually tell what "a" is. Lots of people think that typedefs "help readability". Not so. They are useful only for: (a) totally opaque objects (where the typedef is actively used to _hide_ what the object is). Example: "pte_t" etc. opaque objects that you can only access using the proper accessor functions. NOTE! Opaqueness and "accessor functions" are not good in themselves. The reason we have them for things like pte_t etc. is that there really is absolutely _zero_ portably accessible information there. (b) Clear integer types, where the abstraction _helps_ avoid confusion whether it is "int" or "long". u8/u16/u32 are perfectly fine typedefs, although they fit into category (d) better than here. NOTE! Again - there needs to be a _reason_ for this. If something is "unsigned long", then there's no reason to do typedef unsigned long myflags_t; but if there is a clear reason for why it under certain circumstances might be an "unsigned int" and under other configurations might be "unsigned long", then by all means go ahead and use a typedef. (c) when you use sparse to literally create a _new_ type for type-checking. (d) New types which are identical to standard C99 types, in certain exceptional circumstances. Although it would only take a short amount of time for the eyes and brain to become accustomed to the standard types like 'uint32_t', some people object to their use anyway. Therefore, the Linux-specific 'u8/u16/u32/u64' types and their signed equivalents which are identical to standard types are permitted -- although they are not mandatory in new code of your own. When editing existing code which already uses one or the other set of types, you should conform to the existing choices in that code. (e) Types safe for use in userspace. In certain structures which are visible to userspace, we cannot require C99 types and cannot use the 'u32' form above. Thus, we use __u32 and similar types in all structures which are shared with userspace. Maybe there are other cases too, but the rule should basically be to NEVER EVER use a typedef unless you can clearly match one of those rules. In general, a pointer, or a struct that has elements that can reasonably be directly accessed should _never_ be a typedef. Chapter 6: Functions 6.1 Do't be too long Functions should be short and sweet, and do just one thing. There are long functions in john, such as formats.c::fmt_self_test_body() which has 504 lines. There are some cases that functions can be long, e.g.: Conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it's OK to have a longer function. 6.2 Complex functions If you have a complex function, and you suspect that a less-than-gifted first-year high-school student might not even understand what the function is all about, you should adhere to the maximum limits all the more closely. Use helper functions with descriptive names (you can ask the compiler to in-line them if you think it's performance-critical, and it will probably do a better job of it than you would have done). 6.3 Number of local variables They shouldn't exceed 5-10, or you're doing something wrong. Re-think the function, and split it into smaller pieces. A human brain can generally easily keep track of about 7 different things, anything more and it gets confused. You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now. 6.4 Declartion In function prototypes, include parameter names with their data types. Although this is not required by the C language, it is preferred in Linux because it is a simple way to add valuable information for the reader. You can find some of those codes by: find -name "*.h" | xargs grep -n "int);" find -name "*.h" | xargs grep -n "char \*);" Chapter 7: Centralized exiting of functions NOTE: This chapter is totally from Kernel CodingStyle. I don't have much idea on this topic. I have noticed that many of the *_fmt_plug.c::valid() functions follow this rule. Albeit deprecated by some people, the equivalent of the goto statement is used frequently by compilers in form of the unconditional jump instruction. The goto statement comes in handy when a function exits from multiple locations and some common work such as cleanup has to be done. If there is no cleanup needed then just return directly. Choose label names which say what the goto does or why the goto exists. An example of a good name could be "out_buffer:" if the goto frees "buffer". Avoid using GW-BASIC names like "err1:" and "err2:". Also don't name them after the goto location like "err_kmalloc_failed:" The rationale for using gotos is: - unconditional statements are easier to understand and follow - nesting is reduced - errors by not updating individual exit points when making modifications are prevented - saves the compiler work to optimize redundant code away ;) int fun(int a) { int result = 0; char *buffer; buffer = kmalloc(SIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; if (condition1) { while (loop1) { ... } result = 1; goto out_buffer; } ... out_buffer: kfree(buffer); return result; } A common type of bug to be aware of it "one err bugs" which look like this: err: kfree(foo->bar); kfree(foo); return ret; The bug in this code is that on some exit paths "foo" is NULL. Normally the fix for this is to split it up into two error labels "err_bar:" and "err_foo:". Chapter 8: Commenting 8.1 C89 style and C99 style C89: /* ... */ C99: // ... John core and Linux style for comments is the C89. Many codes of John jumbo for comments is both C89 and C99. 8.2 Comment content Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to write the code so that the _working_ is obvious, and it's a waste of time to explain badly written code. Generally, you want your comments to tell WHAT your code does, not HOW. Also, try to avoid putting comments inside a function body: if the function is so complex that you need to separately comment parts of it, you should probably go back to chapter 6 for a while. You can make small comments to note or warn about something particularly clever (or ugly), but try to avoid excess. Instead, put the comments at the head of the function, telling people what it does, and possibly WHY it does it. Chapter 9: Macros, Enums 9.1 Names of macros defining constants and labels in enums are capitalized. #define CONSTANT 0x12345 Enums are preferred when defining several related constants. CAPITALIZED macro names are appreciated but macros resembling functions may be named in lower case. Generally, inline functions are preferable to macros resembling functions. There are some macros defining constants using lower case. Such as, #define gost_block_size 32 #define gost_hash_length 32 You can find those codes by: find -name "*.[ch]" | xargs grep -n "#define [_0-9a-z]\{2,\} [0-9]\{1,\}" 9.2 Macros with multiple statements should be enclosed in a do - while block: #define macrofun(a, b, c) \ do { \ if (a == 5) \ do_this(b, c); \ } while (0) 9.3 Things to avoid when using macros: 1) macros that affect control flow: #define FOO(x) \ do { \ if (blah(x) < 0) \ return -EBUGGERED; \ } while(0) is a _very_ bad idea. It looks like a function call but exits the "calling" function; don't break the internal parsers of those who will read the code. There are some macros in john have return. Such as: scrypt_fmt.c:274 #define H0(s) \ char *cp = strrchr(s,'$')+40; \ int i = cp-s; \ return i > 0 ? H((s), i) & 0xF : 0 2) macros that depend on having a local variable with a magic name: #define FOO(val) bar(index, val) might look like a good thing, but it's confusing as hell when one reads the code and it's prone to breakage from seemingly innocent changes. 3) macros with arguments that are used as l-values: FOO(x) = y; will bite you if somebody e.g. turns FOO into an inline function. 4) forgetting about precedence: macros defining constants using expressions must enclose the expression in parentheses. Beware of similar issues with macros using parameters. #define CONSTANT 0x4000 #define CONSTEXP (CONSTANT | 3) 5) namespace collisions when defining local variables in macros resembling functions: #define FOO(x) \ ({ \ typeof(x) ret; \ ret = calc_ret(x); \ (ret); \ }) ret is a common name for a local variable - __foo_ret is less likely to collide with an existing variable. Chapter 10: The inline disease A reasonable rule of thumb is to not put inline at functions that have more than 3 lines of code in them. An exception to this rule are the cases where a parameter is known to be a compiletime constant, and as a result of this constantness you *know* the compiler will be able to optimize most of your function away at compile time. Many inline functions in john are exceed 3 lines. E.g: unicode.c:184 inline int utf8_to_utf16(UTF16 *target, unsigned int len, const UTF8 *source, unsigned int sourceLen) { ... } This function has 97 lines. Should we add inline ? Chapter 11: Function return values and names 11.1 The function is an action or an imperative command whose return shows whether succeeds The function should return an error-code integer. 0 is success and others is failure. For example, "add work" is a command, and the add_work() function returns 0 for success or -EBUSY for failure. 11.2 The function is a predicate whose return shows whether succeeds The function should return a "succeeded" boolean. 0 is failure and others is success. "PCI device present" is a predicate, and the pci_dev_present() function returns 1 if it succeeds in finding a matching device or 0 if it doesn't. But john doesn't follow this rule, such as valid(), cmp_all(), cmp_one(), cmp_exact() functions, 0 is failure and 1 is success. I think we should make it clear no matter which we choose. 11.3 Others Functions whose return value is the actual result of a computation, rather than an indication of whether the computation succeeded, are not subject to this rule. Generally they indicate failure by returning some out-of-range result. Typical examples would be functions that return pointers; they use NULL or the ERR_PTR mechanism to report failure.