I think I might have found a bug in the atan2f_neon function. It gives me NaN values where it shouldn't.
For reference, please try the following code (excuse the random values; those are just the values I discovered the issue with):

rt_printf("atan2f_neon: %f\n", atan2f_neon(1.27f * (1.0f / 1.51f), -0.41f * (1.0f / 1.51f)));
rt_printf("atan2f: %f\n", atan2f(1.27f * (1.0f / 1.51f), -0.41f * (1.0f / 1.51f)));

Yeah, that doesn't sound too surprising.
It may be a domain error. Can you try moving the angle to the first quadrant (both x and y positive) and see if it works there for angles between 0 and Pi/2 (i.e. x from 0 to 1, y from 0 to 1)?
In that case we may have to do some trigonometric transformations.

I made the following test:

|  x  |  y  |   atan2f  | atan2f_neon
|  .5 |  .5 |  0.785398 |  0.785232
| -.5 |  .5 | -0.785398 | -6745088000.0
|  .5 | -.5 |  2.356194 | -6745088000.0
| -.5 | -.5 | -2.356194 |  0.785232

I might do some more tests with the other neon trigonometry functions soon; I am a little scared something else is going wrong. On the other hand I am also a bit dependent on them - using the standard math library's atan2f instead of atan2f_neon seems to kick my CPU usage up by almost 10%.

Thanks for your help!

    Right, so it seems that what is needed is a wrapper that moves x and y into the first quadrant and then applies the appropriate corrections (sign changes and/or supplementary angles). That is, if the accuracy you get in the first quadrant is close enough for your purposes:

    noah | .5 | .5 | 0.785398 | 0.785232