So I created another shader that used 512 sines to generate a saw wave. The render timings stayed the same, but the read timings increased.
Looking into it further, the glFlush() call I was making to write the data to the surface returns before the data has actually been written, so glReadPixels() was blocking until the data was available.
There is another call, glFinish(), that only returns once the GL pipeline has completely finished and the data is fully available on the surface. Using the original single-sine shader, we have:
glFlush():
4, 43.210602, 112.224297, 0.000000, 0.000000,
16, 42.898399, 108.288696, 0.009600, 0.000000,
64, 42.111000, 110.291199, 0.000000, 0.000000,
256, 42.907398, 112.053398, 0.017600, 0.000000,
1024, 43.125099, 117.270897, 2.005300, 0.000000,
4096, 44.467400, 134.808502, 8.854100, 0.000000,
16384, 43.597500, 220.387894, 48.062199, 0.000000,
65536, 42.473400, 585.380371, 131.339798, 0.000000,
262144, 53.450401, 1865.462036, 531.566772, 0.000000,
glFinish():
4, 143.550903, 96.133499, 0.001700, 0.000000,
16, 147.735703, 97.076302, 0.019000, 0.000000,
64, 143.198898, 95.499001, 0.006400, 0.000000,
256, 147.069107, 97.538101, 0.190900, 0.000000,
1024, 149.832199, 102.943298, 2.387900, 0.000000,
4096, 157.092896, 105.746300, 9.270600, 0.000000,
16384, 222.302795, 144.760895, 45.846298, 0.000000,
65536, 456.732513, 273.717896, 125.640099, 0.000000,
262144, 1301.814941, 739.296570, 522.817322, 0.000000,
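One way to see how much of the wait moved from the read into the render is to add the two timings for each row and compare the totals between the two runs. The sketch below does that arithmetic on the figures above. It assumes that the first column is the size, the second the render time, and the third the read time; that reading is inferred from the description and is not confirmed by the output itself. The struct and function names are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption: column 1 is the size, column 2 the render time and
   column 3 the read time, in whatever unit the benchmark printed. */
struct timing_row {
    int size;
    double render;
    double read;
};

/* First three columns of the glFlush() run. */
static const struct timing_row flush_rows[] = {
    {4, 43.210602, 112.224297},
    {16, 42.898399, 108.288696},
    {64, 42.111000, 110.291199},
    {256, 42.907398, 112.053398},
    {1024, 43.125099, 117.270897},
    {4096, 44.467400, 134.808502},
    {16384, 43.597500, 220.387894},
    {65536, 42.473400, 585.380371},
    {262144, 53.450401, 1865.462036},
};

/* First three columns of the glFinish() run. */
static const struct timing_row finish_rows[] = {
    {4, 143.550903, 96.133499},
    {16, 147.735703, 97.076302},
    {64, 143.198898, 95.499001},
    {256, 147.069107, 97.538101},
    {1024, 149.832199, 102.943298},
    {4096, 157.092896, 105.746300},
    {16384, 222.302795, 144.760895},
    {65536, 456.732513, 273.717896},
    {262144, 1301.814941, 739.296570},
};

#define ROW_COUNT (sizeof flush_rows / sizeof flush_rows[0])

/* Render plus read time for one row. */
static double total_time(const struct timing_row *row) {
    return row->render + row->read;
}

/* Print both totals side by side for every size. */
static void print_totals(void) {
    printf("%8s %14s %14s\n", "size", "glFlush total", "glFinish total");
    for (size_t i = 0; i < ROW_COUNT; i++) {
        printf("%8d %14.3f %14.3f\n", flush_rows[i].size,
               total_time(&flush_rows[i]), total_time(&finish_rows[i]));
    }
}
```

Calling print_totals() from main prints one line per size. If the column reading is right, the totals let you compare the two runs without caring which call ended up doing the waiting.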