Yeah, I was thinking that might be what's different on your end. Or it could be something else.
For instance, as I understand it, in 4-bit mode you split most (all but the first) 8-bit messages into two halves: you first send the upper 4 bits (the most-significant nibble, MSN) and then the lower 4 bits (the least-significant nibble, LSN). If, for some reason, you end up with one extra message in there (or one fewer), every nibble after that point is off by one. So if, for instance, you are writing "H" followed by "E", you may think that you are sending out {MSN(H), LSN(H), MSN(E), LSN(E)}, but the display may interpret those as {LSN, MSN, LSN, MSN}, resulting in a scrambled display.
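To make the off-by-one idea concrete, here is a small sketch you can run on a PC. It does not talk to any hardware; it only simulates the splitting and re-pairing of nibbles. The function names `to_nibbles` and `pair_up` are made up for this illustration, and the stray nibble value 0x0 is just an assumption about what an extra message might look like.

```python
def to_nibbles(data: bytes) -> list:
    """Split each byte into its most-significant, then least-significant nibble."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # MSN goes out first
        nibbles.append(byte & 0x0F)  # LSN goes out second
    return nibbles


def pair_up(nibbles: list) -> bytes:
    """Rebuild bytes the way a receiver would: every two nibbles form one byte."""
    usable = len(nibbles) - len(nibbles) % 2  # a trailing odd nibble is left pending
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, usable, 2))


sent = to_nibbles(b"HE")             # [0x4, 0x8, 0x4, 0x5]
in_sync = pair_up(sent)              # nibbles line up as intended
off_by_one = pair_up([0x0] + sent)   # one stray nibble (assumed 0x0) ahead of the data

print(in_sync, off_by_one.hex())     # prints: b'HE' 0484
```

With the stray nibble in front, the receiver pairs it with MSN(H) and then pairs LSN(H) with MSN(E), so it sees 0x04 and 0x84 instead of "H" (0x48) and "E" (0x45), and LSN(E) is left waiting for a partner.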
Look at the section "Interfacing via the 4 bit mode": try executing that specific set of instructions and see whether you get a Hello World printed.