An update to the update: I spoke too soon :/
There appears to be an issue with the armv8_cortex_a72/br_pred/
event: the count depends on which branch instructions the loop uses.
This loop undercounts armv8_cortex_a72/br_pred/
(960M events instead of the expected 1000M):
4007d0: d343590a ubfx x10, x8, #3, #20
4007d4: 386a6a6a ldrb w10, [x19,x10]
4007d8: 1200090b and w11, w8, #0x7
4007dc: 1acb22ab lsl w11, w21, w11
4007e0: 0a0b014a and w10, w10, w11
4007e4: 3400002a cbz w10, 4007e8 <main+0x78>
4007e8: 91000508 add x8, x8, #0x1
4007ec: eb09011f cmp x8, x9
4007f0: 54ffff01 b.ne 4007d0 <main+0x60>
But this loop does not undercount (it registers the full 1000M events):
400628: d3435802 ubfx x2, x0, #3, #20
40062c: 12000801 and w1, w0, #0x7
400630: 38626a62 ldrb w2, [x19,x2]
400634: 1ac12841 asr w1, w2, w1
400638: 36000021 tbz w1, #0, 40063c <main+0x6c>
40063c: 91000400 add x0, x0, #0x1
400640: eb03001f cmp x0, x3
400644: 54ffff21 b.ne 400628 <main+0x58>
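
For readers who would rather not decode the listings: going by the disassembly alone, both loops seem to test bit i of a 2^20-byte bitmap and differ only in how the bit is tested. The C below is a rough sketch of that reading, not the original source. The function names, the hit counter, and the assumption that w21 holds the constant 1 are all mine. The conditional body in the listings is empty (both cbz and tbz branch to the very next instruction); the counter is only there so the result can be checked, and a compiler given this code may well emit different instructions (for example a conditional increment instead of a branch).

```c
#include <stdint.h>

/* ubfx ..., #3, #20 extracts a 20-bit byte index, i.e. a 1 MiB bitmap. */
#define BITMAP_BYTES (1u << 20)

/* Variant matching the first listing: build a mask (lsl), AND it with
 * the byte, then branch on the whole register being zero (cbz). */
uint64_t count_bits_mask(const uint8_t *bitmap, uint64_t start, uint64_t end)
{
    uint64_t hits = 0;  /* hypothetical: not present in the listings */
    for (uint64_t i = start; i != end; i++) {
        uint8_t byte = bitmap[(i >> 3) & (BITMAP_BYTES - 1)];
        if (byte & (1u << (i & 7)))
            hits++;
    }
    return hits;
}

/* Variant matching the second listing: shift the byte right (asr),
 * then branch on bit 0 (tbz). */
uint64_t count_bits_shift(const uint8_t *bitmap, uint64_t start, uint64_t end)
{
    uint64_t hits = 0;  /* hypothetical: not present in the listings */
    for (uint64_t i = start; i != end; i++) {
        uint8_t byte = bitmap[(i >> 3) & (BITMAP_BYTES - 1)];
        if ((byte >> (i & 7)) & 1)
            hits++;
    }
    return hits;
}
```

Both functions compute the same thing, so any difference in the br_pred count between the two loops would have to come from the branch instruction used (cbz vs. tbz) rather than from the work being done.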