An update to the update: I spoke too soon : /

It appears there’s an issue with the armv8_cortex_a72/br_pred/ event: the count depends on which branch instructions the loop actually uses.

This loop, which tests the bit with cbz, undercounts armv8_cortex_a72/br_pred/ (960M events):

  4007d0:       d343590a        ubfx    x10, x8, #3, #20
  4007d4:       386a6a6a        ldrb    w10, [x19,x10]
  4007d8:       1200090b        and     w11, w8, #0x7
  4007dc:       1acb22ab        lsl     w11, w21, w11
  4007e0:       0a0b014a        and     w10, w10, w11
  4007e4:       3400002a        cbz     w10, 4007e8 <main+0x78>
  4007e8:       91000508        add     x8, x8, #0x1
  4007ec:       eb09011f        cmp     x8, x9
  4007f0:       54ffff01        b.ne    4007d0 <main+0x60>

But this loop, which tests the bit with tbz, does not (it registers 1000M events):

  400628:       d3435802        ubfx    x2, x0, #3, #20
  40062c:       12000801        and     w1, w0, #0x7
  400630:       38626a62        ldrb    w2, [x19,x2]
  400634:       1ac12841        asr     w1, w2, w1
  400638:       36000021        tbz     w1, #0, 40063c <main+0x6c>
  40063c:       91000400        add     x0, x0, #0x1
  400640:       eb03001f        cmp     x0, x3
  400644:       54ffff21        b.ne    400628 <main+0x58>
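
For readers who don't want to decode the disassembly: both loops test one bit per iteration in a byte array, and in both the conditional branch (cbz or tbz) targets the very next instruction, so the taken and not-taken paths execute the same code. The only difference is how the bit is tested. Below is a rough C sketch of the two forms. It is a reconstruction from the disassembly only, not the actual source, and the names (bit_is_set_mask, bit_is_set_shift, INDEX_MASK) are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical reconstruction from the disassembly; not the original source. */

/* 20-bit byte index, matching "ubfx ..., #3, #20"; map must hold 1 << 20 bytes. */
#define INDEX_MASK 0xFFFFFu

/* Mask form: byte & (1 << bit). Corresponds to the lsl / and / cbz sequence. */
static int bit_is_set_mask(const uint8_t *map, uint64_t i)
{
    return (map[(i >> 3) & INDEX_MASK] & (1u << (i & 7))) != 0;
}

/* Shift form: (byte >> bit) & 1. Corresponds to the asr / tbz sequence. */
static int bit_is_set_shift(const uint8_t *map, uint64_t i)
{
    return (map[(i >> 3) & INDEX_MASK] >> (i & 7)) & 1;
}
```

The two functions return the same value for every input, so any difference in the br_pred count comes from the instruction sequence and not from the data being tested.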

