For an interesting look at how far you can go down the rabbit hole of
optimising a small piece of code, see this sequence of
posts on improving the performance of
sorting small arrays by vectorisation (among other things).
Denis Bakhvalov writes a blog on performance
optimisation, with a focus on the “top-down” methodology. He also
has a free ebook on performance analysis and tuning, and a series of
exercises exploring
individual performance analysis techniques and optimisations.
Andi Kleen develops the
pmu-tools project, whose
toplev command is useful for top-down bottleneck analysis on Intel CPUs. The
manual
has some useful tips.