Hi Quentin,
Sorry for the delay, I’ve spent the last few hours looking into this.
So, the semantics of the inline function specifier in C99 are as follows:
A function declared with an inline function specifier is an inline function. The function specifier may appear more than once; the behavior is the same as if it appeared only once. Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined.
To truly *enforce* inlining you need to use a GCC/Clang compiler extension like `__attribute__((always_inline))`.
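As a rough illustration of that extension (a minimal sketch, not code from the TF-A tree): `FORCE_INLINE`, `set_bit0` and `enable_flag` below are hypothetical names, and the attribute spelling assumes GCC or Clang.

```c
/*
 * Assumption: GCC or Clang. FORCE_INLINE is a hypothetical macro name
 * used only for this sketch.
 */
#define FORCE_INLINE inline __attribute__((always_inline))

/* Plain `inline` would only be a hint; always_inline asks the compiler
 * to inline this at every direct call site. */
static FORCE_INLINE unsigned int set_bit0(unsigned int value)
{
	return value | 0x1u;
}

/* Direct caller: the body of set_bit0() is expanded here. */
unsigned int enable_flag(unsigned int reg)
{
	return set_bit0(reg);
}
```

With this, `enable_flag(0x10u)` returns `0x11u`, and the disassembly of `enable_flag` should contain no call to `set_bit0`.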
In this specific case, I’m not quite sure I understand why the function is marked as `inline` at all - it is quite large. If the function is only ever used once then the compiler can (and almost always will) automatically inline it regardless of whether you use the `inline` hint, although because its address is taken elsewhere in the file the compiler must always generate a non-inlined version as well.
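To make the address-taken point concrete, here is a small sketch with made-up names (`add_one`, `direct_use`, `get_handler` and `handler_fn` are all hypothetical, not taken from pmu.c):

```c
typedef int (*handler_fn)(int);

/* `inline` here is only a hint to the compiler. */
static inline int add_one(int x)
{
	return x + 1;
}

/* Direct call: the compiler is free to expand add_one() in place. */
int direct_use(int x)
{
	return add_one(x);
}

/*
 * Address taken: the pointer has to refer to real code, so an
 * out-of-line copy of add_one() generally has to be emitted too.
 */
handler_fn get_handler(void)
{
	return add_one;
}
```

Both `direct_use(1)` and `get_handler()(1)` return 2. Whether the direct call is actually inlined depends on the compiler and optimisation level, which is the same effect Quentin saw in the disassembly.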
Additionally, the loop in the inline assembly does make it a bit harder for the compiler here – the compiler usually cannot inspect inline assembly very well, so it doesn’t automatically understand the internal infinite loop. I’ve posted a patch to resolve these issues in one swoop here: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/33175
You're right, there probably isn't. I was thinking more along the lines of using the make_helpers/build_macros.mk helpers but that probably won't work either. Would propagating the TF_FLAGS be possible? Chris should probably weigh in on this.
I presume we’re talking about the RK3399 Cortex-M0 toolchain? The build macros are not set up to handle this sort of situation, I’m afraid. We can’t propagate `TF_CFLAGS` either because many of the flags are not compatible across the AArch64 and AArch32 toolchains, so it’s a manual process.
Hope that helps, Chris
From: Quentin Schulz via TF-A tf-a@lists.trustedfirmware.org
Date: Thursday, 31 October 2024 at 13:53
To: Boyan Karatotev Boyan.Karatotev@arm.com, tf-a@lists.trustedfirmware.org tf-a@lists.trustedfirmware.org
Cc: nb@arm.com nb@arm.com
Subject: [TF-A] Re: TF-A: Compilation issues for Rockchip platforms

Hi Boyan,
On 10/31/24 11:02 AM, Boyan Karatotev wrote:
Hi Quentin,
I dug around a bit and I think I have answers for most.
Thanks for looking into this :)
No worries!
- RK3588 symbol redefinition in assembly
This only happens for clang and I'm a bit confused as to why. Applying the above patch fixes that issue, but inspecting the assembly code reveals something really odd.
**With** that patch, with gcc:
   25c8c: d510149f  msr dbgprcr_el1, xzr
   25c90: d538f2e0  mrs x0, s3_0_c15_c2_7
   25c94: b2400000  orr x0, x0, #0x1
   25c98: d518f2e0  msr s3_0_c15_c2_7, x0
   25c9c: 58000020  ldr x0, 0x25ca0
   25ca0: d5033fdf  isb
   25ca4: d503207f  wfi
   25ca8: d61f0000  br x0
**with** that patch, with clang:
   26d2c: d510149f  msr dbgprcr_el1, xzr
   26d30: d538f2e0  mrs x0, s3_0_c15_c2_7
   26d34: b2400000  orr x0, x0, #0x1
   26d38: d518f2e0  msr s3_0_c15_c2_7, x0
   26d3c: 58000020  ldr x0, 0x26d40    <--- "my" ldr
   26d40: d5033fdf  isb
   26d44: d503207f  wfi
   26d48: d61f0000  br x0              <--- "my" br
   26d4c: 94000e76  bl 0x2a724
   26d50: d510149f  msr dbgprcr_el1, xzr
   26d54: d538f2e0  mrs x0, s3_0_c15_c2_7
   26d58: b2400000  orr x0, x0, #0x1
   26d5c: d518f2e0  msr s3_0_c15_c2_7, x0
   26d60: 58000020  ldr x0, 0x26d64    <--- "my" ldr
   26d64: d5033fdf  isb
   26d68: d503207f  wfi
   26d6c: d61f0000  br x0              <--- "my" br
So I think the above patch is just a band-aid on what the actual issue is, but I have no idea what could be going on.
If I use this patch instead, to match the other assembly instructions in this file
=========================================================================
diff --git a/plat/rockchip/rk3588/drivers/pmu/pmu.c b/plat/rockchip/rk3588/drivers/pmu/pmu.c
index f693dbdf0..a4128b214 100644
--- a/plat/rockchip/rk3588/drivers/pmu/pmu.c
+++ b/plat/rockchip/rk3588/drivers/pmu/pmu.c
@@ -760,10 +760,10 @@ static inline void cpus_pd_req_enter_wfi(void)
 		"mrs	x0, S3_0_C15_C2_7\n"
 		"orr	x0, x0, #0x1\n"
 		"msr	S3_0_C15_C2_7, x0\n"
-		"wfi_loop:\n"
+		"1:\n"
 		"isb\n"
 		"wfi\n"
-		"b wfi_loop\n");
+		"b 1b\n");
 }

 static void nonboot_cpus_off(void)
=========================================================================
It happily compiles, though the binary still seems to contain this loop twice.
Using __attribute__((noinline)) the code is not duplicated, but I assume this is expected as it would now simply be a function.
So, the function only runs once, but it has its address taken elsewhere. For the actual call, it does get inlined, as expected. But for the address-taking bit, it is kept as a standalone function, which happens to be placed right after. Disassembling with symbols shows this a bit better. My guess would be that clang either isn't smart enough to deduplicate this, or it's doing it on purpose. Gcc must decide to not inline.
But the function currently has the inline marker, doesn't it force the compiler to inline it?
Apparently, it doesn't have to: https://www.kernel.org/doc/local/inline.html
Are you suggesting anything we could do here?
Would any of the three above suggestions be reasonably fit for upstreaming?
Well the numbered labels should let it compile and we already merged another patch for that. We do have __always_inline, ALWAYS_INLINE, or just __attribute__((__always_inline__)) which should force the inlining. But if there are two uses of the function, I'd imagine the named label would produce similar errors.
Bingo!
If I force the inlining on GCC, I get the same error as in clang.
Using a local label works for GCC and clang, so will submit something like that instead. Note that for some reason I can only use numbered local labels even though named labels are supposedly supported by the Aarch64 ASM, but not GAS? Anyway, good enough for me, too far from my comfort zone already :)
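For reference, a minimal sketch of why the numbered label survives forced inlining (not the actual pmu.c code: `BRANCH`, `skip_nop` and `call_twice` are hypothetical names, and it assumes GCC or Clang on AArch64, Arm or x86):

```c
/* Assumption: pick the unconditional branch mnemonic for the target. */
#if defined(__aarch64__) || defined(__arm__)
#define BRANCH "b"
#elif defined(__x86_64__) || defined(__i386__)
#define BRANCH "jmp"
#else
#error "sketch only covers Arm and x86 targets"
#endif

static inline __attribute__((always_inline)) int skip_nop(int x)
{
	/*
	 * "1:" is a numbered local label; "1f" means "the next 1: going
	 * forward". It may be defined many times in one assembly file,
	 * so this asm can be expanded at every call site. A named label
	 * such as "skip:" would be emitted twice below and the assembler
	 * would reject the duplicate symbol.
	 */
	__asm__ volatile(
		BRANCH " 1f\n"
		"nop\n"
		"1:\n");
	return x + 1;
}

/* Two call sites, so the asm body is expanded twice in this function. */
int call_twice(int x)
{
	return skip_nop(x) + skip_nop(x + 10);
}
```

`call_twice(1)` returns 14. Replacing `1f`/`1:` with a named label should reproduce the symbol redefinition error on both compilers once inlining is forced.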
Thanks for the hint!
I'm wondering now if it really makes sense to have inline functions if GCC just doesn't care about those? Should we remove the inline part or introduce the same macro as the kernel so that we force inlining even on GCC? I am not planning on submitting anything for that, just being curious where this puts us.
- RK3399 warning array subscript 0 is outside array bounds of 'volatile unsigned int[0]' [-Warray-bounds=]
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
In file included from src/dram.c:12:
src/dram.c: In function 'm0_main':
include/rk3399_mcu.h:15:34: warning: array subscript 0 is outside array bounds of 'volatile unsigned int[0]' [-Warray-bounds=]
   15 |                 (*(volatile unsigned int *)(c)); __v; })
      |                 ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/rk3399_mcu.h:16:69: note: in definition of macro 'mmio_write_32'
   16 | #define mmio_write_32(c, v) ((*(volatile unsigned int *)(c)) = (v))
      |                                                                     ^
src/dram.c:67:23: note: in expansion of macro 'mmio_read_32'
   67 |                       mmio_read_32(PARAM_ADDR + PARAM_FREQ_SELECT));
      |                       ^~~~~~~~~~~~
This function is used in many places, but only in very specific places is it triggering a warning, when PARAM_ADDR is used, but only when accessed from m0 C files.
PARAM_ADDR is set to 0xc0. From !m0 files, access to PARAM_ADDR seems to be done through the M0_PARAM_ADDR which is M0_BINCODE_BASE + PARAM_ADDR. For m0 files, it's directly accessing PARAM_ADDR. I honestly don't really know why it's complaining for m0 files but not the !m0 ones.
This seems identical to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523. The top-level makefile has a workaround for this and it sets --param=min-pagesize=0. However, this driver has a very peculiar way of building and it does not inherit most of tf-a's flags. Adding that flag to its own internal makefile fixes the issue, but I feel like it's bound to repeat. Can't it be built by the standard tfa build?
The only thing I know about TF-A on RK3399 is that it is indeed peculiar :)
As far as I know, some part of the binary runs on the Cortex-A core(s) and another part on one of the Cortex-M0, hence why we need an additional toolchain and different drivers and all that.
Ah, I hadn't thought of that.
Not sure if there exists any platform today in TF-A that has the same behavior and is "built by the standard tfa build"? How would one go forward with that?
You're right, there probably isn't. I was thinking more along the lines of using the make_helpers/build_macros.mk helpers but that probably won't work either. Would propagating the TF_FLAGS be possible? Chris should probably weigh in on this.
Except if Chris's mail address is nb at arm.com, they weren't explicitly Cc'ed in your mail (if that was your intention :) ).
Cheers, Quentin