
Date: Fri, 9 Jul 2021 13:26:28 +0200 (CEST) From: jason <jason@...omnia247.nl> To: musl@...ts.openwall.com Cc: jason@...omnia247.nl Subject: Improvements to qsort I've been trying to understand ngmalloc, and it's a struggle, so for relaxation I had a look at qsort, and have a few suggestions. To avoid burying the lead, the following reduces comparisons by about 4% with no downside:  a/qsort.c +++ b/qsort.c @@ 100,18 +100,16 @@ static void sift(unsigned char *head, size_t width, cmpfun cmp, int pshift, size rt = head  width; lf = head  width  lp[pshift  2];  if((*cmp)(ar[0], lf) >= 0 && (*cmp)(ar[0], rt) >= 0) {  break;  } if((*cmp)(lf, rt) >= 0) {  ar[i++] = lf; head = lf; pshift = 1; } else {  ar[i++] = rt; head = rt; pshift = 2; } + if((*cmp)(ar[0], head) >= 0) + break; + ar[i++] = head; } cycle(width, ar, i); } Basically, instead of doing two comparisons to see if ar[0] is greater than both the left and right children and then comparing the children with each other to find the greater one, find the greater child and compare ar[0] with that one. I have several other lesssignificant changes that I wonder if musl is interested in: * I notice than pntz() is always used in the following context: trail = pntz(p); shr(p, trail); pshift += trail; both pntz() and shr() have branches conditional on whether the shift is more or less than 8*sizeof(size_t) bits. I have replaced this by a single function, which allows the conditional branch to be shared: /* * Renormalize p[]. Ignore the loworder bit (which must be set!), * find the position of the nextleastsignificant bit, and shift p[] * down by that much. Return the number of bits shifted. * * There must be a penultimate trailing bit; i.e. p[] must not be {1, 0}. */ static inline int normalize(size_t p[2]) { int s; size_t t = p[0]  1; if (t == 0) { t = p[1]; s = ntz(t); p[0] = t >> s; p[1] = 0; s += 8*sizeof(size_t); } else { s = ntz(t); p[0] = t >> s  p[1] << (8*sizeof(size_t)  s); p[1] >>= s; } return s; } which is used simply as: pshift += normalize(p); * As part of my learning about the code, I have added a lot of comments, particularly large block comments about the smoothsort algorithm in general, the meaning of p[] and pindex, what a "stepson" is, when it's okay to use sift() instead of trinkle(), what the "trusty" flag means, how cycle() is used, etc. Would these be worth cleaning up and submitting? * In my private copy, I have changed the type of "trusty" to bool, a data type I'm quite fond of as it concisely documents the limited range of values. I notice, however, that musl is completely devoid of any use of <stdbool.h>, although it uses other C99 features. Is this a conscious decision? * Likewise, I notice that most musl code has a space in "keyword (condition)" like "if (foo != 0)", "while (bar < N)" and "for (i=0; i < N; i++)". But the qsort() code, with one single exception, omits the space. Likewise, most msul code is quite happy to "if (condition) break;" on a single line, with no braces, but the original sift() function above puts the break on a second line and a closing brace on a third. The musl code base is inconsistent enough that it's hard to infer code formatting conventions. Are these discrepancies in qsort a problem worth fixing? * The way the heapbuilding loop maintains its loop invariants is a trifle peculiar. It begins with the element at *head included in p[] and pindex, but excluded from the heap ordering invariants. Then it establishes the ordering invariants (with sift() or trinkle()) and immediately adds another (unordered) element to p[] and pindex. After the main loop exits, a standalone call to trinkle() places the final element in heap order. I have a more conventional (add, then sift) ordering working which seems to simplify the code: while(head < high) { + bool notrinkle = false; + + /* Append an element to the heap */ + head += width; if((p[0] & 3) == 3) {  sift(head, width, cmp, pshift, lp); + /* Two trees of adjacent sizes: merge with head */ shr(p, 2); pshift += 2;  } else {  if(lp[pshift  1] >= high  head) {  trinkle(head, width, cmp, p, pshift, 0, lp);  } else { + if((size_t)(high  head) > lp[pshift  1]) { sift(head, width, cmp, pshift, lp); + notrinkle = true; }   if(pshift == 1) {  shl(p, 1);  pshift = 0;  } else {  shl(p, pshift  1);  pshift = 1;  } + } else if(pshift == 1) { + /* order1 exists; new tree is order0 */ + shl(p, 1); + pshift = 0; + notrinkle = (high != head); + } else { + /* new tree is order1 */ + shl(p, pshift  1); + pshift = 1; + notrinkle = ((size_t)(high  head) > lp[0]); }  p[0] = 1;  head += width; + + /* Reestablish the heap using trinkle */ + if(!notrinkle) + trinkle(head, width, cmp, p, pshift, false, lp); }  trinkle(head, width, cmp, p, pshift, 0, lp); If the original ordering is wanted, then the loop initialization can be changed to start with two elements (p[0] = 3, pindex = 0, head = base + width), since two elements automatically fulfil the required loop invariants. Does anyone have a preference? * Since the first test in qsort() excludes width == 0, the loop over width in cycle() can be changed to "do { ... } while (width);". * The ar[] array in sift() only has to be as large as the lp[] array in qsort(), not 1/6 larger. * The ar[] array in trinkle() only has to be half as large, plus one. * The pindex > 1 case in the heapconsuming loop contains some redundant leftthenright shifting: shl(p, 2); pshift = 2; p[0] ^= 7; shr(p, 1); This can be optimized. * There are several cases where variables can be moved into smaller local scopes, particularly the "stepson", "rt" and "lf" variables in trinkle(). This is another change I've made privately as I think it makes the code more reasonable. Worth submitting? * GCC emits a couple of signed/unsigned comparison warnings with W Wall. Worth fixing?
Powered by blists  more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.