• by sylware on 7/9/2023, 6:37:36 PM

    What seems to be missing is hardware-optimized, accelerated memcpy/memset for both short and large sizes.

    On x86_64, modern micro-archs accelerate "rep stos[bwdq]" and "rep movs[bwdq]". I bet that, in modern binaries, memcpy/memset call sites are actually placeholders for such instructions, patched in before the memory segment goes back to Read/Executable. The call arguments are already in rdi, rsi, rdx, while the rep instructions take their count in rcx (so rcx would be pushed on the stack, or the code generated to account for rcx availability at the call site).
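A minimal sketch of what the comment above describes, assuming GCC or Clang inline assembly on x86_64. The helper names `copy_rep_movsb` and `fill_rep_stosb` are hypothetical, not taken from any library, and the code falls back to plain memcpy/memset on other targets. It says nothing about how real binaries patch their call sites; it only shows the register roles the rep instructions expect.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with `rep movsb` on x86_64. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep movsb: rdi = destination, rsi = source, rcx = byte count.
       The instruction advances all three, hence the "+" constraints. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* portable fallback on other targets */
#endif
    return ret;
}

/* Hypothetical helper: fill n bytes with `rep stosb` on x86_64. */
static void *fill_rep_stosb(void *dst, int value, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep stosb: rdi = destination, al = fill byte, rcx = byte count. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
#else
    memset(dst, value, n); /* portable fallback on other targets */
#endif
    return ret;
}
```

Note that memcpy(dst, src, n) receives its arguments in rdi, rsi, rdx under the System V calling convention, so the compiler has to move the count into rcx to satisfy the "+c" constraint.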

    Also, expect x86_64 -> RISC-V porting bugs because the size names shift: byte -> byte, word -> halfword, doubleword -> word, quadword -> doubleword.
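To make the naming shift above concrete, here is a small illustrative table in C. The struct and the `lookup_width` helper are hypothetical names made up for this sketch. The x86 suffixes are the ones in stos[bwdq], and the RISC-V mnemonics are the store instructions of the same width (`sd` assumes a 64-bit target such as RV64).

```c
#include <stddef.h>

/* One row per operand size: the x86 suffix, the width in bytes,
   and the RISC-V store mnemonic that writes the same number of bytes. */
struct width_map {
    char x86_suffix;          /* b, w, d, q as in stos[bwdq] */
    size_t bytes;             /* operand width in bytes */
    const char *riscv_store;  /* RISC-V store of the same width */
};

static const struct width_map WIDTHS[] = {
    { 'b', 1, "sb" },  /* x86 byte       = RISC-V byte       */
    { 'w', 2, "sh" },  /* x86 word       = RISC-V halfword   */
    { 'd', 4, "sw" },  /* x86 doubleword = RISC-V word       */
    { 'q', 8, "sd" },  /* x86 quadword   = RISC-V doubleword */
};

/* Hypothetical helper: return the row for an x86 suffix, or NULL if unknown. */
static const struct width_map *lookup_width(char x86_suffix)
{
    for (size_t i = 0; i < sizeof WIDTHS / sizeof WIDTHS[0]; i++) {
        if (WIDTHS[i].x86_suffix == x86_suffix)
            return &WIDTHS[i];
    }
    return NULL;
}
```

The trap is that the letters collide: an x86 `w` is 2 bytes but a RISC-V `sw` stores 4, and an x86 `d` is 4 bytes but a RISC-V `sd` stores 8.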

  • by gary_0 on 7/10/2023, 1:23:42 AM

    Does anyone have something like this for amd64 or aarch64?

    Might be useful when I'm tinkering with my toy compiler.