Testing the single bit makes the generated code consistent between optimization levels. With -Os, the original variant builds its comparison constant with movi.n + srli; at the other levels it loads the constant with l32r. The bit-test variant encodes the bit directly in the branch instructions, regardless of the level used.
> xtensa-lx106-elf-gcc -c -Os memmove.c
> xtensa-lx106-elf-nm --radix=d -S memmove.o | grep memmove
                  U memmove
00000020 00000023 T memmove_P1
00000000 00000018 T memmove_P2
> xtensa-lx106-elf-gcc -S -Os memmove.c
        .file   "memmove.c"
        .text
        .literal_position
        .align  4
        .global memmove_P2
        .type   memmove_P2, @function
memmove_P2:
        bbci    a3, 30, .L2     ; branch on bit set / unset
        bbsi    a2, 30, .L2
        j.l     memcpy_P, a9
.L2:
        j.l     memmove, a9
        .size   memmove_P2, .-memmove_P2
        .literal_position
        .align  4
        .global memmove_P1
        .type   memmove_P1, @function
memmove_P1:
        movi.n  a5, -1          ; btw this only happens on Os, O2 and O3 use l32r const of 0x40000000
        srli    a5, a5, 2
        bgeu    a5, a3, .L7
        bltu    a5, a2, .L7
        j.l     memcpy_P, a9
.L7:
        j.l     memmove, a9
        .size   memmove_P1, .-memmove_P1
        .ident  "GCC: (GNU) 10.3.0"
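For readers who prefer C to Xtensa assembly, the two checks in the listing above can be modeled roughly as follows. This is only a sketch read back from the assembly, not the actual newlib source: the names `PGM_BASE`, `use_memcpy_p_range`, `use_memcpy_p_bit` and `check_variants` are made up for illustration, addresses are modeled as plain `uint32_t` values so it also builds on a host machine, and the sample addresses are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 30, the boundary used by both variants in the listing (hypothetical name). */
#define PGM_BASE 0x40000000u

/* memmove_P1-style check: compare both addresses against the boundary.
 * Returns true when memcpy_P would be chosen, false for plain memmove. */
bool use_memcpy_p_range(uint32_t dest, uint32_t src)
{
    return src >= PGM_BASE && dest < PGM_BASE;
}

/* memmove_P2-style check: test bit 30 only (what bbci/bbsi do). */
bool use_memcpy_p_bit(uint32_t dest, uint32_t src)
{
    return (src & PGM_BASE) != 0 && (dest & PGM_BASE) == 0;
}

/* A few sample addresses showing where the two checks agree and differ. */
void check_variants(void)
{
    /* dest below the boundary, src above it: both choose memcpy_P */
    assert(use_memcpy_p_range(0x3FFE8000u, 0x40201000u));
    assert(use_memcpy_p_bit(0x3FFE8000u, 0x40201000u));

    /* both below the boundary: both choose memmove */
    assert(!use_memcpy_p_range(0x3FFE8000u, 0x3FFE9000u));
    assert(!use_memcpy_p_bit(0x3FFE8000u, 0x3FFE9000u));

    /* src at 0x80000000: bit 30 is clear, so only the range check picks memcpy_P */
    assert(use_memcpy_p_range(0x3FFE8000u, 0x80000000u));
    assert(!use_memcpy_p_bit(0x3FFE8000u, 0x80000000u));
}
```

The two predicates agree as long as both addresses stay below 0x80000000 and start to differ above that, which is the 'higher' address case the bit-test variant deliberately ignores.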
Looking at the current memmove_P implementation
https://github.com/earlephilhower/newlib-xtensa/blob/ebc967552ce827f21fc579fd8c437037c1b472ab/newlib/libc/sys/xtensa/string_pgmspace.c#L184-L190
It branches between two different functions, depending on the pointer address. Since it is checking against a number with only one bit set... I wondered whether testing just that bit changes the generated code, since we could simply discard the idea that it is going to be used on any 'higher' addresses.
Turns out it does
Not sure how to benchmark it, though, or whether these different instructions do anything to the overall performance
(or, anything at all, since this is just mildly improving memmove operations that are probably not that frequently performed)