Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 22 Mar 2023 08:15:30 -0400
From: Rich Felker <>
To: 张飞 <>,
	"A. Wilcox" <>
Subject: Re: [PATCH]Implementation of strlen function in riscv64

On Wed, Mar 22, 2023 at 01:27:33AM -0500, A. Wilcox wrote:
> The content of the message was sent as an image.
> For those who cannot view images, I've reproduced the text below:
> On Mar 22, 2023, at 1:21 AM, 张飞 <> wrote:
> > 
> > Hi:
> > 
> > I implemented vectorization of the strlen function in the riscv64
> > architecture, which is controlled by __riscv_vector definition. Due
> > to lack of support for risc-v V expansion in hardware, I conducted
> > performance tests on a simulator, which was more than 10 times the
> > performance achieved in C language. In terms of functionality, I
> > tested the string length from 1 byte to 64 Mb, and the alignment of
> > different addresses at the beginning of the string.
> > 
> > 
> > Please review it.I'm Looking forward to your reply,thanks.

The riscv64 target does not assume presence of vector extensions, and
as it's generally not a bottleneck, strlen isn't one of the functions
for which we generally have existing per-arch asm.

If we were going to introduce this kind of thing for strlen, the
preferable approach would probably be something like what I've
suggested we change memcpy/memset to: having the arch definition
provide only the minimal inline fragment needed to do the actual work
(something like: loading a vector, optionally xor'ing it with a mask
for the byte to search for, and reporting if it's found or the offset
at which it's found) with the actual control logic all in C.

Regarding the code submitted for review, I'm pretty sure it's buggy
because it doesn't seem to do anything with alignment. If you pass it
a pointer to the last byte of a page whose contents are zero, it will
attempt to load the rest of the vector from the next page, and fault.
Since strlen has no a priori way to know how long the object it's
inspecting is, I don't believe there's any way to do a vectorized
approach without pre-alignment to the size of the read you will be
performing, processing everything up to the aligned start separately.
Having to check for this kind of bug on a per-arch basis is one of the
motivations for not wanting whole functions written in asm, but
instead just minimal fragments, with this sort of common logic in C
where you know, once it's been reviewed once, it's correct for all


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.