Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251023160611.197027-1-pincheng.plct@isrc.iscas.ac.cn>
Date: Fri, 24 Oct 2025 00:06:10 +0800
From: Pincheng Wang <pincheng.plct@...c.iscas.ac.cn>
To: musl@...ts.openwall.com
Cc: pincheng.plct@...c.iscas.ac.cn
Subject: [PATCH v2 resend 0/1] riscv64: Add RVV optimized memset implementation

Hi all,

This is v2 of the RISC-V Vector (RVV) optimized memset patch. Resending
because forgot to commit the latest changes to Git. Sincerely sorry for the
inconvenience of messing up the mailing list and please take this
version as the correct one.

Changes from v1:
- Replaced compile-time detetion (__riscv_vector macro) with runtime
  detection using AT_HWCAP, addressing the main concern from v1
  feedback.
- Introduced a dispatch mechanism (memset_dispatch.c) that selects
  appropriate implementation at process startup.
- Added arch.mak configuration to prevent GCC auto-vectorization on
  other string functions, ensuring only our runtime-detected code uses
  vector insturctions.
- Single binary now works correctly on both RVV and non-RVV hardware
  when built with CFLAGS+="-march=rv64gcv".

Implementation details:
- memset.S provides two symbols: memset_vect (RVV) and memset_scalar.
- memset_dispatch.c exports memset() which dispatches via function
  pointer.
- __init_riscv_string_optimizations() is called in __libc_start_main to
  initialize the function pointer based on AT_HWCAP.
- The vector implementation uses vsetvli for bulk fills and a head-tail
  strategy for small sizes.

Performance (unchanged from v1):
- On Spacemit X60: up to ~3.1x faster (256B), with consistent gains
  across medium and large sizes.
- On XuanTie C908: up to ~2.1x faster (128B), with modest gains for
  larger sizes.

For very small sizes (<8 bytes), there can be minor regressions compared
to the generic C version. This is a trade off for the significant gains
on larger sizes.

Additional Notes:
It was mentioned during internal discussions that, according to the XuanTie C908 user
manuals [1], vector memory operations must not target addresses with
Strong Order (SO) attributes, otherwise the system may crash. In the
current implementation, this scenario is handled by OpenSBI fallback
mechanisms, but this leads to degraded performance compared to scalar
implementations.
I reviewed other existing vectorized mem* patches and did not find
explicit handling for this case. Introducing explicit attribute checks
would likely add extra overhead. Therefore, I am currently uncertain
whether special handling for XuanTie CPUs should be included in this
patch or addressed separately.

Testing:
- QEMU with QEMU_CPU="rv64,v=true" and "rv64,v=false".
- Spacemit X60 with V extension support.
- XuanTie C908 with V extension support.
- CFLAGS += "-march=rv64gcv" and "-march=rv64gc".
Functional behavior matches generic memset.

Thanks,
Pincheng Wang

[1] https://occ-oss-prod.oss-cn-hangzhou.aliyuncs.com/resource//1752575352271/%E7%8E%84%E9%93%81C908R1S0%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%28xrvm%29_Rev.21_20250715.pdf Only in Chinese, relavant contents are in Chapter 8.

Pincheng Wang (1):
  riscv64: add runtime-detected vector optimized memset

 arch/riscv64/arch.mak                |  12 +++
 src/env/__libc_start_main.c          |   3 +
 src/internal/libc.h                  |   1 +
 src/string/riscv64/memset.S          | 134 +++++++++++++++++++++++++++
 src/string/riscv64/memset_dispatch.c |  38 ++++++++
 5 files changed, 188 insertions(+)
 create mode 100644 arch/riscv64/arch.mak
 create mode 100644 src/string/riscv64/memset.S
 create mode 100644 src/string/riscv64/memset_dispatch.c

-- 
2.39.5

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.