* before rendering a background;
* before rendering sprites;
* while rendering more than 128 samples of audio at once ("Prefer fluid video");
* after every 16 scanlines of CPU execution instead of every 1;
* while waiting for an audio buffer to become available;
* while killing time between frames with fast-forward disabled.
Controller presses and releases are now combined in a DS button bitfield using a shorter 32-bit algorithm. See entry.cpp:NDSSFCAccumulateJoypad and #define ACCUMULATE_JOYPAD in the source.
This is still not suitable for playing platformers frame-perfectly, but it's much better than half a second of latency to press or release a button, and one still needs to press buttons a bit more than just light taps. I'd say 50 milliseconds is the latency now. Platformers requiring more precision can be played with frameskip 0.
DMA does not require double-buffered displaying, so synchronise the controller more often by disabling double-buffered displaying again.
tile.cpp: Optimise the common case of drawing an unclipped but possibly flipped 8x8 tile. Instead of calling WRITE_4PIXELS16 16 times, each performing setup and teardown, move the loop into DrawTile16.
tile.h, tile.cpp, gfx.h, gfx.cpp: End the use of global variable GFX.ScreenColors to pass around the current frame's palette. This saves on memory stores/loads.