Hi,
In one of the past tech forums, we claimed that we don't encourage contribution in common logic in assembly, and one of the patches was abandoned because of this. That patch is designed to improve the memset/memcpy performance.
We override these APIs in general because we want the code can be auditable. Then we provided implementations in C, but it shows these implementation does not provide good performance. We want to apply the toolchain provided versions, and looks GNU tool provides the straightest 'byte-copy' version. And armclang involves unnecessary variants which increase the code size a little (not big).
Hence we provide a version with 'tiny' optimization in assembly and mark this patch an exception to the review guidelines, as these under-layer functions won't get changed frequently. We are also seeking an ideal way to apply toolchain versions.
The patch is here for your review: https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/13735
Thanks.
/Ken