A classic approach for separating steady-state and transient components is to use the phase vocoder: for each bin you predict the phase from the past frames (if the bin's frequency is stable, the phase advance between consecutive frames should stay roughly constant). If the measured phase is close enough to the prediction, you classify the bin as steady-state; otherwise it is transient. See the DAFx book (Zölzer et al.): it's in chapter 8 of the first edition and chapter 7 of the second edition. A PDF of the first edition used to be available online about a decade ago, so you can probably still find it.
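Here is a minimal sketch of that per-bin test, assuming you already have the unwrapped-per-frame phases (e.g. from `std::arg` of your FFT output) for the current frame and the two before it. It predicts each bin's phase as `2 * phi[n-1] - phi[n-2]`, which is the "constant phase advance" assumption, wraps the deviation into [-pi, pi) and compares it to a threshold. The function names and the threshold value are hypothetical, and a real implementation (like the one described in DAFx) may also take magnitudes into account.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Wrap an angle into [-pi, pi) (the "principal argument").
double princarg(double phase) {
    const double twoPi = 2.0 * kPi;
    return phase - twoPi * std::floor((phase + kPi) / twoPi);
}

// Hypothetical helper: returns true for bins classified as steady-state,
// false for transient bins. The three inputs hold the phase of each bin
// in frames n-2, n-1 and n (same FFT size and hop for all frames).
std::vector<bool> classifyBins(const std::vector<double>& phiPrev2,
                               const std::vector<double>& phiPrev1,
                               const std::vector<double>& phiCurr,
                               double threshold) {
    assert(phiPrev2.size() == phiCurr.size());
    assert(phiPrev1.size() == phiCurr.size());

    std::vector<bool> steady(phiCurr.size());
    for (std::size_t k = 0; k < phiCurr.size(); ++k) {
        // Assume the phase advance from n-2 to n-1 repeats from n-1 to n.
        const double predicted = 2.0 * phiPrev1[k] - phiPrev2[k];
        // Deviation between measured and predicted phase, wrapped to [-pi, pi).
        const double deviation = princarg(phiCurr[k] - predicted);
        steady[k] = std::abs(deviation) < threshold;
    }
    return steady;
}
```

Wrapping the deviation matters: measured phases are only known modulo 2*pi, so without `princarg` a perfectly stable bin could look like a huge deviation. The threshold trades off how much of the signal ends up in the transient part; it is something you would tune for your material.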
If you don't need to separate the audio stream into its components, but only need to detect which 'state' you are in, there may be simpler alternatives (e.g. spectral flux or high-frequency content for detecting transients; the spectral centroid can serve as a proxy for 'brilliance' and, in turn, for the amount of harmonic content). It really depends on your application, but the DAFx book is probably a good starting point anyway. You may also want to look at audio feature extraction libraries (e.g. aubio or essentia, which can be used directly from C++).
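To give an idea of how cheap these features are, here is a rough sketch computing them from magnitude spectra (`|X(k)|` for bins 0..N/2). It uses one common variant of each: half-wave rectified spectral flux (only magnitude increases count), high-frequency content as bin index times bin energy, and the centroid as the magnitude-weighted mean bin frequency in Hz. Other definitions exist (squared flux, log magnitudes, etc.), the function names are hypothetical, and the thresholding/peak-picking you would need on top for onset detection is not shown. aubio and essentia ship tested implementations of these features.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-wave rectified spectral flux: sum of magnitude increases between
// the previous and the current frame. Peaks suggest transients/onsets.
double spectralFlux(const std::vector<double>& magPrev,
                    const std::vector<double>& magCurr) {
    assert(magPrev.size() == magCurr.size());
    double flux = 0.0;
    for (std::size_t k = 0; k < magCurr.size(); ++k)
        flux += std::max(0.0, magCurr[k] - magPrev[k]);
    return flux;
}

// High-frequency content: each bin's energy weighted by its index, so
// broadband bursts (typical of transients) produce large values.
double highFrequencyContent(const std::vector<double>& mag) {
    double hfc = 0.0;
    for (std::size_t k = 0; k < mag.size(); ++k)
        hfc += static_cast<double>(k) * mag[k] * mag[k];
    return hfc;
}

// Spectral centroid in Hz: magnitude-weighted mean of the bin frequencies
// k * sampleRate / fftSize. Returns 0 for a silent frame.
double spectralCentroidHz(const std::vector<double>& mag,
                          double sampleRate, std::size_t fftSize) {
    double weighted = 0.0;
    double total = 0.0;
    for (std::size_t k = 0; k < mag.size(); ++k) {
        const double freq = static_cast<double>(k) * sampleRate /
                            static_cast<double>(fftSize);
        weighted += freq * mag[k];
        total += mag[k];
    }
    return total > 0.0 ? weighted / total : 0.0;
}
```

In practice you would compute these once per STFT frame and look at how they evolve over time (e.g. flux or HFC peaks relative to a running average) rather than at absolute values.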