Message-Id: <20251023155353.175632-1-pincheng.plct@isrc.iscas.ac.cn>
Date: Thu, 23 Oct 2025 23:53:52 +0800
From: Pincheng Wang <pincheng.plct@...c.iscas.ac.cn>
To: musl@...ts.openwall.com
Cc: pincheng.plct@...c.iscas.ac.cn
Subject: [PATCH v2 resend 0/1] riscv64: Add RVV optimized memset implementation

Hi all,

This is v2 of the RISC-V Vector (RVV) optimized memset patch. I'm
resending it because I forgot to commit the latest changes to Git.
Sorry for the inconvenience!

Changes from v1:
- Replaced compile-time detection (__riscv_vector macro) with runtime
  detection using AT_HWCAP, addressing the main concern from v1
  feedback.
- Introduced a dispatch mechanism (memset_dispatch.c) that selects the
  appropriate implementation at process startup.
- Added arch.mak configuration to prevent GCC auto-vectorization of
  other string functions, ensuring that only our runtime-detected code
  uses vector instructions.
- A single binary now works correctly on both RVV and non-RVV hardware
  when built with CFLAGS+="-march=rv64gcv".

Implementation details:
- memset.S provides two symbols: memset_vect (RVV) and memset_scalar.
- memset_dispatch.c exports memset(), which dispatches via a function
  pointer.
- __init_riscv_string_optimizations() is called in __libc_start_main
  to initialize the function pointer based on AT_HWCAP.
- The vector implementation uses vsetvli for bulk fills and a
  head-tail strategy for small sizes.

Performance (unchanged from v1):
- On Spacemit X60: up to ~3.1x faster (256B), with consistent gains
  across medium and large sizes.
- On XuanTie C908: up to ~2.1x faster (128B), with modest gains for
  larger sizes.

For very small sizes (<8 bytes), there can be minor regressions
compared to the generic C version. This is a trade-off for the
significant gains on larger sizes.

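For readers who have not opened the patch itself, the dispatch scheme
listed under "Implementation details" can be sketched roughly as
follows. This is only an illustration, not the code from the patch:
every demo_* name is hypothetical, both fill routines are plain C
stand-ins for the assembly symbols, and getauxval() is used here only
so the sketch builds outside libc (how the patch obtains the hwcap
value internally is not shown in this cover letter). The one
platform fact relied on is that Linux/riscv64 reports single-letter
ISA extensions in AT_HWCAP as bit (letter - 'A').

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux) */

/* Linux/riscv64: one AT_HWCAP bit per single-letter ISA extension. */
#define DEMO_HWCAP_V (1UL << ('V' - 'A'))

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical stand-in for the memset_scalar symbol from memset.S. */
static void *demo_memset_scalar(void *dest, int c, size_t n)
{
    unsigned char *p = dest;
    while (n--)
        *p++ = (unsigned char)c;
    return dest;
}

/* Hypothetical stand-in for memset_vect; the real routine would use
 * RVV instructions, this placeholder just reuses the byte loop. */
static void *demo_memset_vect(void *dest, int c, size_t n)
{
    return demo_memset_scalar(dest, c, n);
}

/* Pure selection step: pick an implementation from a hwcap word. */
static memset_fn demo_select_memset(unsigned long hwcap)
{
    return (hwcap & DEMO_HWCAP_V) ? demo_memset_vect : demo_memset_scalar;
}

/* Defaults to the scalar path so calls made before init stay safe. */
static memset_fn demo_memset_impl = demo_memset_scalar;

/* Assumed analogue of __init_riscv_string_optimizations(). */
void demo_init_string_optimizations(void)
{
    demo_memset_impl = demo_select_memset(getauxval(AT_HWCAP));
}

/* Exported entry point: one indirect call through the pointer. */
void *demo_memset(void *dest, int c, size_t n)
{
    return demo_memset_impl(dest, c, n);
}

/* Returns 1 if fn fills the first n bytes of a scratch buffer with c
 * and returns the destination pointer, 0 otherwise. */
int demo_check_fill(memset_fn fn, int c, size_t n)
{
    unsigned char buf[64] = {0};
    assert(n <= sizeof buf);
    if (fn(buf, c, n) != buf)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != (unsigned char)c)
            return 0;
    return 1;
}
```

Keeping the selection in a function that takes the hwcap word as an
argument makes both branches testable on any machine, and defaulting
the pointer to the scalar routine means an early call before startup
initialization never reaches vector code.
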
Additional Notes:

It was mentioned during internal discussions that, according to the
XuanTie C908 user manuals [1], vector memory operations must not
target addresses with Strong Order (SO) attributes, otherwise the
system may crash. In the current implementation, this scenario is
handled by OpenSBI fallback mechanisms, but this leads to degraded
performance compared to scalar implementations. I reviewed other
existing vectorized mem* patches and did not find explicit handling
for this case. Introducing explicit attribute checks would likely add
extra overhead. Therefore, I am currently uncertain whether special
handling for XuanTie CPUs should be included in this patch or
addressed separately.

Testing:
- QEMU with QEMU_CPU="rv64,v=true" and "rv64,v=false".
- Spacemit X60 with V extension support.
- XuanTie C908 with V extension support.
- CFLAGS += "-march=rv64gcv" and "-march=rv64gc".

Functional behavior matches generic memset.

Thanks,
Pincheng Wang

Pincheng Wang (1):
  riscv64: add runtime-detected vector optimized memset

 0000-cover-letter.patch                       |  79 ++++++
 0000-cover-letter.patch.bak                   |  79 ++++++
 ...ime-detected-vector-optimized-memset.patch | 259 ++++++++++++++++++
 arch/riscv64/arch.mak                         |  12 +
 src/env/__libc_start_main.c                   |   3 +
 src/internal/libc.h                           |   1 +
 src/string/riscv64/memset.S                   | 134 +++++++++
 src/string/riscv64/memset_dispatch.c          |  38 +++
 test_memset                                   | Bin 0 -> 8824 bytes
 9 files changed, 605 insertions(+)
 create mode 100644 0000-cover-letter.patch
 create mode 100644 0000-cover-letter.patch.bak
 create mode 100644 0001-riscv64-add-runtime-detected-vector-optimized-memset.patch
 create mode 100644 arch/riscv64/arch.mak
 create mode 100644 src/string/riscv64/memset.S
 create mode 100644 src/string/riscv64/memset_dispatch.c
 create mode 100755 test_memset

-- 
2.39.5