Date: Thu, 20 Apr 2023 16:17:10 +0800 (GMT+08:00)
From: 张飞 <zhangfei@...iscas.ac.cn>
To: "Szabolcs Nagy" <nsz@...t70.net>
Cc: musl@...ts.openwall.com
Subject: Re: Re: Re: memset_riscv64

Hi!
I followed your suggestions and reworked my test cases based on string.c in musl's test
suite (libc-bench). Since BUFLEN is a fixed value in strlen.c, I turned it into a variable
in my own test case and passed it to memset as the length parameter. I also raised
LOOP_TIMES to 500, sorted the measured running times, and recorded only the middle 300.
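
The "sort, then keep the middle 300 of 500" step could be sketched roughly as below. This
is only an illustration under assumptions, not the code in the attached test_memset2.c: the
names middle_mean and timed_memset are hypothetical, the kept samples are averaged (the
attachment may sum or report them differently), each sample times a single memset call,
and CLOCK_MONOTONIC is used where the earlier test used CLOCK_REALTIME.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort comparator: ascending order of doubles. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort the samples in place and return the mean of
 * the middle `keep` values. With n = 500 and keep = 300 this drops the
 * 100 fastest and the 100 slowest measurements. */
double middle_mean(double *samples, size_t n, size_t keep)
{
    assert(keep > 0 && keep <= n);
    qsort(samples, n, sizeof *samples, cmp_double);
    size_t skip = (n - keep) / 2;
    double sum = 0.0;
    for (size_t i = 0; i < keep; i++)
        sum += samples[skip + i];
    return sum / (double)keep;
}

/* Hypothetical helper: time one memset of `len` bytes, in seconds. */
double timed_memset(char *buf, size_t len)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memset(buf, 'a', len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A caller would fill an array of 500 values from timed_memset(buf, len) and then pass it to
middle_mean(samples, 500, 300).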

I ran the two programs alternately on the SiFive chip, three times each. The results
(running times in seconds) are shown below.
                             First run result
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208102                     0.002304056
200                 0.005053208                     0.004629598
400                 0.008666684                     0.007739176
800                 0.014065196                     0.012372702
1600                0.023377685                     0.020090966
3200                0.040221849                     0.034059631
6400                0.072095377                     0.060028906
12800               0.134040475                     0.110039387
25600               0.257426806                     0.210710952
51200               1.173755160                     1.121833227
102400              3.693170402                     3.637194098
204800              8.919975455                     8.865504460
409600             19.410922418                    19.360956493
--------------------------------------------------------------------------------

                             Second run result 
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208109                     0.002293857
200                 0.005057374                     0.004640669
400                 0.008674218                     0.007760795
800                 0.014068582                     0.012417084
1600                0.023381095                     0.020124496
3200                0.040225138                     0.034093181
6400                0.072098744                     0.060069574
12800               0.134043954                     0.110088141
25600               0.256453187                     0.208578633
51200               1.166602505                     1.118972796
102400              3.684957231                     3.635116808
204800              8.916302592                     8.861590734
409600             19.411057216                    19.358777670
--------------------------------------------------------------------------------

                             Third run result 
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208111                     0.002293227
200                 0.005056101                     0.004628539
400                 0.008677756                     0.007748687
800                 0.014085242                     0.012404443
1600                0.023397782                     0.020115710
3200                0.040242985                     0.034084435
6400                0.072116665                     0.060063767
12800               0.134060262                     0.110082427
25600               0.257865186                     0.209101754
51200               1.174257177                     1.117753408
102400              3.696518162                     3.635417503
204800              8.929357747                     8.858765915
409600             19.426520562                    19.356515671
--------------------------------------------------------------------------------

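The test results show that memset implemented in assembly with the basic instruction set
runs faster than the C implementation for every length except 100 bytes, where the C
version is about 4% faster. The advantage is largest, roughly 18%, at 12800 and 25600
bytes, and drops below 1% for 204800 bytes and above.
Are these test results convincing?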
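
To make the comparison between the two columns concrete, the relative difference can be
computed per row. The helper below is a small sketch; the name percent_faster is
hypothetical and not part of the test program.

```c
#include <assert.h>

/* Percentage by which the assembly time undercuts the C time.
 * The result is negative when the assembly version is the slower one. */
double percent_faster(double c_seconds, double asm_seconds)
{
    assert(c_seconds > 0.0);
    return (c_seconds - asm_seconds) / c_seconds * 100.0;
}
```

For the 12800-byte row of the first run, percent_faster(0.134040475, 0.110039387) gives
about 17.9; for the 100-byte row the result is negative.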


> -----Original Message-----
> From: "Szabolcs Nagy" <nsz@...t70.net>
> Sent: 2023-04-19 17:02:10 (Wednesday)
> To: "张飞" <zhangfei@...iscas.ac.cn>
> Cc: musl@...ts.openwall.com
> Subject: Re: Re: [musl] memset_riscv64
>
> * 张飞 <zhangfei@...iscas.ac.cn> [2023-04-19 13:33:08 +0800]:
> > --------------------------------------------------------------------------------
> > length(byte)  C language implementation(s)   Basic instruction implementation(s)
> > --------------------------------------------------------------------------------
> > 4                   0.00000352                      0.000004001
> > 8                   0.000004001                     0.000005441
> > 16                  0.000006241                     0.00000464
> > 32                  0.00000752                      0.00000448
> > 64                  0.000008481                     0.000005281
> > 128                 0.000009281                     0.000005921
> > 256                 0.000011201                     0.000007041
>
> i don't think these numbers can be trusted.
>
> > #include <stdio.h>
> > #include <sys/mman.h>
> > #include <string.h>
> > #include <stdlib.h>
> > #include <time.h>
> >
> > #define DATA_SIZE 5*1024*1024
> > #define MAX_LEN 1*1024*1024
> > #define OFFSET 0
> > #define LOOP_TIMES 100
> > int main(){
> >    char *str1,*src1;
> >    str1 = (char *)mmap(NULL, DATA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> >
> >    printf("function test start\n");
> >
> >    src1 = str1+OFFSET;
> >    struct timespec tv0,tv;
> >    for(int len=2; len<=MAX_LEN; len*=2){
> >       clock_gettime(CLOCK_REALTIME, &tv0);
> >       for(int k=0; k<LOOP_TIMES; k++){
> >           memset(src1, 'a', len);
> >       }
> >       clock_gettime(CLOCK_REALTIME, &tv);
> >       tv.tv_sec -= tv0.tv_sec;
> >       if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
> >           tv.tv_nsec += 1000000000;
> >           tv.tv_sec--;
> >       }
> >       printf("len: %d  time: %ld.%.9ld\n",len, (long)tv.tv_sec, (long)tv.tv_nsec);
>
>
> this repeatedly calls memset with exact same len, alignment and value.
> so it favours branch heavy code since those are correctly predicted.
>
> but even if you care about a branch-predicted microbenchmark, you
> made a single measurement per size so you cannot tell how much the
> time varies, you should do several measurements and take the min
> so noise from system effects and cpu internal state are reduced
> (also that state needs to be warmed up). and likely the LOOP_TIMES
> should be bigger too for small sizes for reliable timing.
>
> benchmarking string functions is tricky especially for a target arch
> with many implementations.
>
> >    }
> >
> >    printf("function test end\n");
> >    munmap(str1,DATA_SIZE);
> >    return 0;
> > }
> >
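
The advice quoted above (several measurements per size, a warm-up first, then take the
minimum) might look roughly like the sketch below. It is an illustration under
assumptions, not code from this thread: elapsed_seconds and min_memset_time are
hypothetical names, CLOCK_MONOTONIC replaces CLOCK_REALTIME, and the empty asm statement
is a GCC/Clang extension used as a compiler barrier. It still calls memset with the same
length, alignment and value each time, so it does not address the branch-prediction
concern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <float.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed from t0 to t1. */
double elapsed_seconds(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical helper: each measurement times `loops` memset calls of
 * `len` bytes. The first `warmup` measurements are discarded, and the
 * minimum of the following `runs` measurements is returned. */
double min_memset_time(char *buf, size_t len, int loops, int warmup, int runs)
{
    assert(loops > 0 && warmup >= 0 && runs > 0);
    double best = DBL_MAX;
    for (int r = 0; r < warmup + runs; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int k = 0; k < loops; k++) {
            memset(buf, 'a', len);
            /* Compiler barrier (GCC/Clang extension): keeps the
             * stores from being optimized away. */
            __asm__ volatile("" : : "r"(buf) : "memory");
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double t = elapsed_seconds(t0, t1);
        if (r >= warmup && t < best)
            best = t;
    }
    return best;
}
```

Taking the minimum reports the least-disturbed run, whereas keeping the middle of a sorted
sample set reports a typical run; the two summaries need not agree.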
View attachment "test_memset2.c" of type "text/plain" (1364 bytes)
