Date: Mon, 12 Oct 2015 20:30:33 +0200
From: Denys Vlasenko <>
To: Rich Felker <>
Cc: Denys Vlasenko <>,
Subject: [PATCH 2/3] i386/memset: do not fetch fill char from memory again

 shl $16,%edx
 mov 8(%esp),%dl
 mov 8(%esp),%dh

The above code has two register merge stalls, and it goes to the load
unit twice to fetch the fill byte again. I don't know which is worse;
neither is pleasant.

Replace it with IMUL. It has ~3 cycle latency, but no stalls.
Move it up a bit to hide the latency.
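
For reference, here is a minimal C model of the two sequences, to show why one multiply by 0x10001 gives the same 32-bit fill pattern. The function names are made up for illustration and are not part of musl. The model assumes the upper 16 bits of %edx are zero when the IMUL executes, which depends on code outside this hunk.

```c
#include <stdint.h>

/* Hypothetical model of the old sequence: shift the 16-bit pattern
   into the upper half, then reload the fill byte into %dl and %dh. */
static uint32_t fill_shift_reload(uint32_t dx, uint8_t c)
{
    uint32_t edx = dx << 16;      /* shl $16,%edx    */
    edx |= c;                     /* mov 8(%esp),%dl */
    edx |= (uint32_t)c << 8;      /* mov 8(%esp),%dh */
    return edx;
}

/* Hypothetical model of the new sequence: x * 0x10001 == x + (x << 16).
   ASSUMPTION: the upper 16 bits of edx are zero on entry; otherwise
   the upper half of the result is not the replicated pattern. */
static uint32_t fill_imul(uint32_t edx)
{
    return edx * UINT32_C(0x10001);   /* imul $0x10001,%edx */
}
```

With the low 16 bits already holding the fill byte twice (for example 0xABAB), both functions return the byte replicated four times.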

   text	   data	    bss	    dec	    hex	filename
    182	      0	      0	    182	     b6	memset1.o
    177	      0	      0	    177	     b1	memset2.o

Signed-off-by: Denys Vlasenko <>
CC: Rich Felker <>
 src/string/i386/memset.s | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/string/i386/memset.s b/src/string/i386/memset.s
index d6118c7..cd13f41 100644
--- a/src/string/i386/memset.s
+++ b/src/string/i386/memset.s
@@ -19,13 +19,10 @@ memset:
 	mov %dx,1(%eax)
 	mov %dx,(-1-2)(%eax,%ecx)
+	imul $0x10001,%edx
 	cmp $6,%ecx
 	jbe 1f
-	shl $16,%edx
-	mov 8(%esp),%dl
-	mov 8(%esp),%dh
 	mov %edx,(1+2)(%eax)
 	mov %edx,(-1-2-4)(%eax,%ecx)
 	cmp $14,%ecx
