This commit converges the CPU and GPU retiling
implementations on a common interface. It also
aligns the implementations themselves so that
both are the same, which allows the retiler to
be debugged and optimized on the CPU before the
changes are ported to the GPU. The retiler tests
are also made more consistent, and now check
that both tiling AND untiling behave correctly.
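
For illustration only, the kind of check the tests aim for can be
sketched as a round trip: tile a linear buffer, compare it against a
known tiled layout, then untile and compare with the original. The
names `tile` and `untile`, their parameters, and the row-major
tile layout are assumptions made for this sketch, not the retiler's
actual interface.

```python
def _check_dims(length, width, height, tile_w, tile_h):
    # Hypothetical constraint for this sketch: whole tiles only.
    if width % tile_w or height % tile_h:
        raise ValueError("image size must be a multiple of the tile size")
    if length != width * height:
        raise ValueError("buffer length does not match width * height")


def tile(linear, width, height, tile_w, tile_h):
    """Rearrange a row-major buffer into tile order (assumed layout:
    tiles stored row-major, elements row-major inside each tile)."""
    _check_dims(len(linear), width, height, tile_w, tile_h)
    tiled = []
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            for y in range(ty, ty + tile_h):
                row_start = y * width + tx
                tiled.extend(linear[row_start:row_start + tile_w])
    return tiled


def untile(tiled, width, height, tile_w, tile_h):
    """Inverse of tile(): walk the tiles in the same order and copy
    each tile row back to its position in the row-major buffer."""
    _check_dims(len(tiled), width, height, tile_w, tile_h)
    linear = [None] * (width * height)
    src = 0
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            for y in range(ty, ty + tile_h):
                row_start = y * width + tx
                linear[row_start:row_start + tile_w] = tiled[src:src + tile_w]
                src += tile_w
    return linear


# Check each direction separately, then the round trip.
original = list(range(16))  # 4x4 image, row-major
tiled = tile(original, 4, 4, 2, 2)
restored = untile(tiled, 4, 4, 2, 2)
```

Checking the tiled result against a known layout, and not only the
round trip, matters because a tile/untile pair that are both wrong in
the same way would still round-trip cleanly.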