Date: Wed, 22 Mar 2023 08:15:30 -0400 From: Rich Felker <dalias@...c.org> To: 张飞 <zhangfei@...iscas.ac.cn>, "A. Wilcox" <awilfox@...lielinux.org> Cc: musl@...ts.openwall.com Subject: Re: [PATCH]Implementation of strlen function in riscv64 architecture On Wed, Mar 22, 2023 at 01:27:33AM -0500, A. Wilcox wrote: > The content of the message was sent as an image. > > For those who cannot view images, I've reproduced the text below: > > On Mar 22, 2023, at 1:21 AM, 张飞 <zhangfei@...iscas.ac.cn> wrote: > > > > Hi: > > > > I implemented vectorization of the strlen function in the riscv64 > > architecture, which is controlled by __riscv_vector definition. Due > > to lack of support for risc-v V expansion in hardware, I conducted > > performance tests on a simulator, which was more than 10 times the > > performance achieved in C language. In terms of functionality, I > > tested the string length from 1 byte to 64 Mb, and the alignment of > > different addresses at the beginning of the string. > > > > > > Please review it.I'm Looking forward to your reply,thanks. The riscv64 target does not assume presence of vector extensions, and as it's generally not a bottleneck, strlen isn't one of the functions for which we generally have existing per-arch asm. If we were going to introduce this kind of thing for strlen, the preferable approach would probably be something like what I've suggested we change memcpy/memset to: having the arch definition provide only the minimal inline fragment needed to do the actual work (something like: loading a vector, optionally xor'ing it with a mask for the byte to search for, and reporting if it's found or the offset at which it's found) with the actual control logic all in C. Regarding the code submitted for review, I'm pretty sure it's buggy because it doesn't seem to do anything with alignment. If you pass it a pointer to the last byte of a page whose contents are zero, it will attempt to load the rest of the vector from the next page, and fault. Since strlen has no a priori way to know how long the object it's inspecting is, I don't believe there's any way to do a vectorized approach without pre-alignment to the size of the read you will be performing, processing everything up to the aligned start separately. Having to check for this kind of bug on a per-arch basis is one of the motivations for not wanting whole functions written in asm, but instead just minimal fragments, with this sort of common logic in C where you know, once it's been reviewed once, it's correct for all archs. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.