The best place to start is by asking yourself whether you actually need to brute-force sum that many elements. For example, is there some other signal processing you can do first to reduce the number of elements?
Do you need to compute this sum in every block processing interval?
What function are you computing? If it's some kind of convolution, there are various tricks you can use. Even if it's something else, there may be other tricks that apply.
One thing is certain: you would be hard pressed to get even a busy loop of 30k iterations to complete in a single block processing interval, even if it does nothing more than spin. You have to look for a way to reduce the number of operations your function performs.
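As one illustration of cutting the operation count, here is a rough sketch that assumes (and this is only an assumption, since I don't know what your function actually is) that the sum runs over a sliding window of the most recent samples. In that case you don't need to re-add every element each time: add the new sample, subtract the one that fell out of the window, and the cost per sample is constant regardless of window length. The `RunningSum` class and its `push` method are hypothetical names made up for this example, and Python is used only to show the idea; on a DSP you would write the same thing with a circular buffer.

```python
from collections import deque


class RunningSum:
    """Sliding-window sum updated in O(1) per new sample (hypothetical helper)."""

    def __init__(self, window_len):
        # Pre-fill with zeros so the window always holds window_len samples.
        self.window = deque([0.0] * window_len, maxlen=window_len)
        self.total = 0.0

    def push(self, sample):
        # The oldest sample is about to leave the window.
        oldest = self.window[0]
        # Appending to a full deque with maxlen drops the oldest entry.
        self.window.append(sample)
        # One add and one subtract replace a full re-summation.
        self.total += sample - oldest
        return self.total


if __name__ == "__main__":
    rs = RunningSum(4)
    for x in [1, 2, 3, 4, 5, 6]:
        print(rs.push(x))
```

With floating-point data the running total can slowly accumulate rounding error, so you may want to recompute the full sum occasionally; with integer or fixed-point data the update is exact. If your sum is not a sliding window, this particular trick won't apply, but the same question is worth asking: what changed since the last block, and can you update the result instead of recomputing it?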