* before rendering a background;
* before rendering sprites;
* while rendering more than 128 samples of audio at once ("Prefer fluid video");
* after every 16 scanlines of CPU execution instead of every 1;
* while waiting for an audio buffer to become available;
* while killing time between frames with fast-forward disabled.
Controller presses and releases are now combined in a DS button bitfield using a shorter 32-bit algorithm. See entry.cpp:NDSSFCAccumulateJoypad and #define ACCUMULATE_JOYPAD in the source.
This is still not suitable for playing platformers frame-perfectly, but it's much better than half a second of latency to press or release a button, and one still needs to press buttons a bit more than just light taps. I'd say 50 milliseconds is the latency now. Platformers requiring more precision can be played with frameskip 0.
DMA does not require double-buffered displaying, so synchronise the controller more often by disabling double-buffered displaying again.
User-selected frameskip causes slowdowns if the game runs slower than the resulting frame rate, but synchronises correctly if the game runs faster.
Automatic frame skipping is still the default. It now only skips up to 8 frames, but in some games still skips that entire 8 frames. What's needed is an algorithm that averages frame latencies over a few seconds and skips while the latency is LOWER than the average.
This was the entire reason I created the fork, and I finally did it! It syncs the controls every scanline of a frame, which costs about 60,000 MIPS instructions per frame to deal with. Luckily, the processor runs at 396 MHz, which means the cost of checking the controls is 1% of the CPU's power.