Also slightly tighten the emulated memory delays. With
this commit, WDC boots (but crashes shortly after). Seems
like memory timings are coming into play, among other
things.
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
When a CACHE instruction uses a mapped virtual address,
and a TLB miss results... just ignore it! Clearly, this
isn't the right thing to do, but all documentation is
ambiguous and this seems to float the boat for now.
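A minimal sketch of that "ignore the miss" policy, for illustration only. The `tlb_entry` layout, the fixed 4 KiB page size, and the `cache_op_translate` name are assumptions and not CEN64's actual code; the kseg0/kseg1 check reflects the standard MIPS 32-bit kernel address map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry with fixed 4 KiB pages. */
struct tlb_entry {
  bool valid;
  uint32_t vpn;  /* virtual page number (vaddr >> 12) */
  uint32_t pfn;  /* physical frame number (paddr >> 12) */
};

/* kseg0/kseg1 (0x80000000-0xBFFFFFFF) bypass the TLB in 32-bit kernel mode. */
static bool is_unmapped(uint32_t vaddr) {
  return (vaddr >> 30) == 2;
}

/* Resolves the address for a CACHE op. Returns false when the op should be
 * silently dropped because a mapped address missed in the TLB. */
static bool cache_op_translate(const struct tlb_entry *tlb, size_t entries,
                               uint32_t vaddr, uint32_t *paddr) {
  size_t i;

  if (is_unmapped(vaddr)) {
    *paddr = vaddr & 0x1FFFFFFF;
    return true;
  }

  for (i = 0; i < entries; i++) {
    if (tlb[i].valid && tlb[i].vpn == (vaddr >> 12)) {
      *paddr = (tlb[i].pfn << 12) | (vaddr & 0xFFF);
      return true;
    }
  }

  /* TLB miss: no exception is raised; the CACHE op becomes a no-op. */
  return false;
}
```

A caller would skip the cache operation entirely when this returns false, rather than raising a TLB refill exception.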
MIPS compilers of the time aggressively scheduled around load
interlocks, since they waste cycles and there are generally other
instructions you can toss in the load delay slot, so flag the
interlock as unlikely.
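One way to express such a hint, sketched with the GCC/Clang `__builtin_expect` builtin. The `ex_latch` structure and `load_interlock` function are hypothetical names chosen for illustration and do not come from the CEN64 source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch-prediction hint; a no-op on compilers without the builtin. */
#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Hypothetical pipeline latch; field names are illustrative only. */
struct ex_latch {
  unsigned load_dest;  /* register written by the load ahead of us (0 = none) */
};

/* Returns true when the instruction in EX must stall because it reads
 * the register that a preceding load has not yet produced. */
static bool load_interlock(const struct ex_latch *latch,
                           unsigned rs, unsigned rt) {
  unsigned dest = latch->load_dest;

  /* Compilers usually fill the load delay slot, so this rarely fires. */
  if (unlikely(dest != 0 && (dest == rs || dest == rt)))
    return true;

  return false;
}
```

The hint only affects code layout and static prediction; the interlock is still detected correctly when it does occur.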
If we're doing a cache operation in the DC stage, don't
change the state of the lines; the cache operations will
do it if needed. Also implement get/set taglo for DC.
This is how the actual processor does it. In addition to
design correctness, we have the added benefit of being able
to support cache instructions whose virtual address lies
in a mapped part of the address space.
Since the CEN64 core now runs in its own thread (and doesn't use
the FPU), we can steal the host's FPU state register and not have
to worry about preserving it.
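A rough sketch of what this could look like on an x86 host, assuming the "FPU state register" is MXCSR and that the emulation thread never needs the host's original value back. The function names are hypothetical; the intrinsics and `_MM_ROUND_*` macros are the standard ones from `<xmmintrin.h>`.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* _MM_SET_ROUNDING_MODE, _mm_set_ss, _mm_cvtss_si32 */

/* MXCSR rounding-control values, indexed by the MIPS FCR31 RM field:
 * 0 = nearest, 1 = toward zero, 2 = toward +inf, 3 = toward -inf. */
static const unsigned mxcsr_rc[4] = {
  _MM_ROUND_NEAREST, _MM_ROUND_TOWARD_ZERO, _MM_ROUND_UP, _MM_ROUND_DOWN
};

/* Writes the guest rounding mode straight into the host MXCSR. Nothing is
 * saved or restored: this assumes the calling thread owns the register. */
static void guest_set_rounding(unsigned mips_rm) {
  _MM_SET_ROUNDING_MODE(mxcsr_rc[mips_rm & 3]);
}

/* Float-to-int32 conversion that rounds according to the current MXCSR,
 * so the guest's rounding mode applies without any extra rounding step. */
static int32_t guest_cvt_w_s(float value) {
  return _mm_cvtss_si32(_mm_set_ss(value));
}
```

For example, after `guest_set_rounding(2)`, `guest_cvt_w_s(4.25f)` rounds toward positive infinity. This only compiles on x86 hosts with SSE.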
Along with that major overhaul, don't force "extra" features like
simulation statistics and debugging if the user doesn't want them.
Including that code, even when it is not run, mucks with register
allocation or something ever so slightly.
Since we have to convert to an integer, as well as round in some
direction, these intrinsics (_mm_ceil_*, _mm_floor_*, _mm_round_*)
aren't of much use to us.
We will likely only hit a couple of the slow_cycle functions in
the VR4300 code when we interrupt. Because of this, push everything
just before what will be hit after a data cache fault into the cold
section.
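As an illustration of the general technique, GCC and Clang allow a rarely executed function to be marked `cold`, which places it in a separate text section away from the hot loop. The `pipeline` structure and function names below are hypothetical stand-ins, not the actual VR4300 code.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define cold_path __attribute__((cold, noinline))
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define cold_path
#define unlikely(x) (x)
#endif

/* Hypothetical pipeline state; field names are illustrative only. */
struct pipeline {
  uint64_t cycles;
  unsigned pending_interrupts;
  unsigned slow_cycles;
};

/* Rarely taken path: kept out of line and out of the hot section. */
static cold_path void pipeline_slow_cycle(struct pipeline *p) {
  p->pending_interrupts = 0;
  p->slow_cycles++;
}

/* Hot path: runs every cycle and only branches to the cold code
 * when an interrupt is pending. */
static void pipeline_cycle(struct pipeline *p) {
  p->cycles++;

  if (unlikely(p->pending_interrupts != 0))
    pipeline_slow_cycle(p);
}
```

Keeping the cold functions out of the hot section leaves more of the instruction cache for the code that runs on every cycle.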
Perf reported a window where the backend was busy, and the frontend
was idle. Take advantage of the situation by inserting a branch that
has the potential to filter out (a lot of) instructions from the
backend when it's clogged. This works to our advantage, because more
often than not we aren't executing FPU instructions, or we execute
the FPU instructions in small batches.
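The message does not say exactly which work the branch skips, so the following is only a guess at the shape of such a filter: test a decode flag once, and skip the FPU bookkeeping entirely for the common non-FPU case. `OP_FLAG_FPU`, `fpu_ctx`, and `fpu_sync` are invented for illustration.

```c
#include <stdint.h>

#define OP_FLAG_FPU 0x1u  /* hypothetical decode flag */

/* Hypothetical decoded instruction and FPU context. */
struct decoded_op { uint32_t flags; };
struct fpu_ctx { uint32_t guest_csr; uint32_t host_csr; unsigned syncs; };

/* Stand-in for the FPU-only work that the branch filters out. */
static void fpu_sync(struct fpu_ctx *fpu) {
  fpu->host_csr = fpu->guest_csr;
  fpu->syncs++;
}

/* A single, usually not-taken branch keeps the FPU work away from the
 * backend whenever the instruction is not an FPU instruction. */
static void execute_op(const struct decoded_op *op, struct fpu_ctx *fpu) {
  if (op->flags & OP_FLAG_FPU)
    fpu_sync(fpu);

  /* Integer execution would continue here. */
}
```

Because FPU instructions tend to arrive in small batches, the branch should also predict well in both directions.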