risc-v memcpy, memset, memmove functions

Optimized scalar memcpy, memset, memmove functions written in RISC-V assembler. Code is optimized for in-order multiple issue (superscalar) CPUs. In order to comply with the compressed ISA specification x0-x15 GP registers are used.

Minimal supported risc-v ISA: rv32e for memset, rv32i for memcpy/memmove

code size

function	standard ISA (rv64/rv32)	compressed ISA (rv64/rv32)
memset	280/268	178/172
memcpy	408	258
memmove	416	266

numbers are in bytes for rv64

performance

arch	mode	memset	memcpy	memmove
rv32	unaligned	3.76	0.61	0.61
rv32	aligned	3.76	1.82	1.82
rv64	unaligned	7.52	1.23	1.23
rv64	aligned	7.52	3.65	3.65

numbers are bytes per cycle (spike)
for memcpy/memmove aligned/unaligned modes mean mutual alignments of XLEN size for src & dst buffers (memcpy, memmove)

internal details

memcpy

memcpy() for mutual unaligned buffers does load/stores of XLEN with logical shift and combine data. Head and tail misalignments within XLEN size are handled by byte copies (lb/sb). Parallelism of instructions factor > 2. Endianess supported: big, little.

hot spot for memcpy unaligned (rv64gc, little-endian):

    /-> 0107d4b3                srl     s1,a5,a6
    |   6598                    c.ld    a4,8(a1)
    |   01171433                sll     s0,a4,a7
    |   699c                    c.ld    a5,16(a1)
    |   8c45                    c.or    s0,s1
    |   010754b3                srl     s1,a4,a6
    |   e100                    c.sd    s0,0(a0)
    |   01179433                sll     s0,a5,a7
    |   8c45                    c.or    s0,s1
    |   05c1                    c.addi  a1,16
    |   e500                    c.sd    s0,8(a0)
    |   0541                    c.addi  a0,16
    \-- fed560e3                bltu    a0,a3,<_memcpy+0xda>

memcpy() for mutual aligned buffers does unrolled loop over load/stores of XLEN register size. Parallelism of instructions factor is 4.

hot spot for memcpy aligned (rv64gc):

    /-> 6198                    c.ld    a4,0(a1)
    |   659c                    c.ld    a5,8(a1)
    |   6980                    c.ld    s0,16(a1)
    |   6d84                    c.ld    s1,24(a1)
    |   e118                    c.sd    a4,0(a0)
    |   e51c                    c.sd    a5,8(a0)
    |   e900                    c.sd    s0,16(a0)
    |   ed04                    c.sd    s1,24(a0)
    |   ....                    ....    .........
    |   71b8                    c.ld    a4,96(a1)
    |   75bc                    c.ld    a5,104(a1)
    |   79a0                    c.ld    s0,112(a1)
    |   7da4                    c.ld    s1,120(a1)
    |   f138                    c.sd    a4,96(a0)
    |   f53c                    c.sd    a5,104(a0)
    |   f920                    c.sd    s0,112(a0)
    |   fd24                    c.sd    s1,120(a0)
    |   95c6                    c.add   a1,a7
    |   9546                    c.add   a0,a7
    \-- fad56ee3                bltu    a0,a3,<_memcpy+0x42>

memmove

memmove() does the same as memcpy() does with respect to fact that buffers may overlap. Therefore there implemented direction of copying: straight and reverse

memset

Uses unrolled loop with 32 register stores of XLEN.
Maximum parallelism of instructions factor is 32.
Head and tail misalignments within XLEN size are handled by lb/sb (byte load/store)

hot spot for memset (rv64gc):

    /-> e38c                  c.sd    a1,0(a5)
    |   e78c                  c.sd    a1,8(a5)
    |   eb8c                  c.sd    a1,16(a5)
    |   ....                  ....    .........
    |   fbec                  c.sd    a1,240(a5)
    |   ffec                  c.sd    a1,248(a5)
    |   10078793              addi    a5,a5,256
    \-- fae7eee3              bltu    a5,a4,<_memset+0x5e>

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
tests		tests
LICENSE		LICENSE
README.md		README.md
memops.S		memops.S

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

risc-v memcpy, memset, memmove functions

code size

performance

internal details

memcpy

memmove

memset

About

Uh oh!

Releases

Packages

Languages

License

avx/riscv_memops

Folders and files

Latest commit

History

Repository files navigation

risc-v memcpy, memset, memmove functions

code size

performance

internal details

memcpy

memmove

memset

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages