musl - [PATCH 0/1] riscv64: Add RVV optimized memset implementation

Message-Id: <20250925131557.8907-1-pincheng.plct@isrc.iscas.ac.cn>
Date: Thu, 25 Sep 2025 21:15:56 +0800
From: Pincheng Wang <pincheng.plct@...c.iscas.ac.cn>
To: musl@...ts.openwall.com
Cc: pincheng.plct@...c.iscas.ac.cn
Subject: [PATCH 0/1] riscv64: Add RVV optimized memset implementation

Hi all,

This patch introduces a RISC-V Vector (RVV) optimized implementation of
memset.

Key points:
- Use RVV instructions to fill memory in bulk, with a small-size
  head-tail fast path to reduce vsetvli overhead.
- Fall back to a scalar head-tail implementation (like the generic C
  implementation) when RVV is not available.
- Reduce both instruction count and code size: memset.o shrinks by about
  16.5% compared to the generic C build.
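
As a rough illustration of the first two points (this is not the code
from the patch), the structure can be sketched in portable C. The name
sketch_memset, the constant SKETCH_VLMAX_BYTES and the 8-byte threshold
are assumptions made for the example. The head-tail stores follow the
same idea as the generic C memset, and the inner byte loop only models
what a single vsetvli + vse8.v pair would do per iteration on real
hardware.

```c
#include <stddef.h>

/* Hypothetical stand-in for the number of bytes one vector store can
 * cover. Real RVV code gets the per-iteration length from vsetvli. */
#define SKETCH_VLMAX_BYTES 16

void *sketch_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	unsigned char b = (unsigned char)c;

	/* Head-tail fast path: store from both ends. For short lengths
	 * the stores overlap in the middle, so no loop and no vector
	 * configuration is needed. The 8-byte cutoff is an assumption. */
	if (n == 0) return dest;
	s[0] = b;
	s[n-1] = b;
	if (n <= 2) return dest;
	s[1] = b;
	s[2] = b;
	s[n-2] = b;
	s[n-3] = b;
	if (n <= 6) return dest;
	s[3] = b;
	s[n-4] = b;
	if (n <= 8) return dest;

	/* Bulk path: strip-mined loop. Each iteration handles at most
	 * SKETCH_VLMAX_BYTES bytes, mirroring how vsetvli caps the
	 * vector length for the remaining element count. */
	while (n > 0) {
		size_t vl = n < SKETCH_VLMAX_BYTES ? n : SKETCH_VLMAX_BYTES;
		for (size_t i = 0; i < vl; i++)
			s[i] = b;	/* models one vector store */
		s += vl;
		n -= vl;
	}
	return dest;
}
```

The point of the early returns is that small calls never pay for the
vector setup. The price is a few extra branches before the bulk loop.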

Performance results on RVV-capable hardware show clear improvements:
- On Spacemit X60: up to ~3.1x faster (256B), with consistent gains
  across medium and large sizes.
- On XuanTie C908: up to ~2.1x faster (128B), with modest gains for
  larger sizes.

For very small sizes (<8 bytes), there can be regressions compared to
the generic C version. A more aggressive fast path could remove these
regressions, but at the cost of added code complexity. Feedback on this
trade-off is welcome.

The implementation was tested under QEMU with RVV enabled and on real
hardware. Functional behavior matches the generic memset, with no
changes to the public interface.
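
For reference, a minimal sketch of how such an equivalence check could
look (this is not the harness used for the results above). It runs a
candidate against the libc memset over a range of offsets and lengths,
with guard bytes on both sides to catch out-of-bounds writes. The names
memset_fn, candidate_memset and check_memset are hypothetical, and
candidate_memset is only a placeholder for the routine under test.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Placeholder candidate: a plain byte loop standing in for the
 * implementation being checked. */
void *candidate_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	for (size_t i = 0; i < n; i++)
		s[i] = (unsigned char)c;
	return dest;
}

/* Returns the number of (offset, length) cases where fn differs from
 * the reference memset, either in the buffer contents (including the
 * guard regions) or in the return value. 0 means all cases match. */
size_t check_memset(memset_fn fn)
{
	enum { GUARD = 16, MAX_OFF = 8, MAX_LEN = 300 };
	unsigned char got[GUARD + MAX_OFF + MAX_LEN + GUARD];
	unsigned char want[sizeof got];
	size_t failures = 0;

	for (size_t off = 0; off < MAX_OFF; off++) {
		for (size_t len = 0; len <= MAX_LEN; len++) {
			/* Same background pattern in both buffers. */
			memset(got, 0xA5, sizeof got);
			memset(want, 0xA5, sizeof want);

			void *ret = fn(got + GUARD + off, 0x3C, len);
			memset(want + GUARD + off, 0x3C, len);

			if (ret != got + GUARD + off ||
			    memcmp(got, want, sizeof got) != 0)
				failures++;
		}
	}
	return failures;
}
```

Calling check_memset(candidate_memset) and expecting 0 covers unaligned
starts and lengths on both sides of the small-size fast path.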

Thanks,
Pincheng Wang

Pincheng Wang (1):
  riscv64: optimize memset implementation with vector extension

 src/string/riscv64/memset.S | 101 ++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 src/string/riscv64/memset.S

-- 
2.39.5
