Commit graph

736 commits

Author SHA1 Message Date
Simon Eriksson
135a6cab5e Change default window size and aspect to 640*474.
Fixes vertical stretching issues when the N64 framebuffer
has 240 or 480 lines.
2016-03-06 13:56:13 -05:00
Tyler J. Stachecki
3565a05f30 rsp: Use host byte ordering for ICACHE.
Until now, the RSP stored instruction words in big-endian
format, so each fetch on an x86 host required a byteswap.
This is wasteful, so use host byte ordering for the ICACHE
(as the VR4300 does now).
2016-02-27 19:13:50 -05:00
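A minimal sketch of the idea behind 3565a05f30, not the actual CEN64 code: convert each instruction word to host order once when the cache is filled, so a fetch becomes a plain load. The names `be32_to_host`, `icache_fill`, and `icache_fetch` are hypothetical; the only hardware fact assumed is that RSP IMEM is 4 KiB.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a big-endian 32-bit word in host order. */
static uint32_t be32_to_host(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fill path: pay for the byte-order conversion once per written word. */
static void icache_fill(uint32_t *icache, const uint8_t *imem_be,
                        size_t words) {
  for (size_t i = 0; i < words; i++)
    icache[i] = be32_to_host(imem_be + 4 * i);
}

/* Fetch path: no byteswap, just an indexed load (4 KiB IMEM assumed). */
static uint32_t icache_fetch(const uint32_t *icache, uint32_t pc) {
  return icache[(pc & 0xFFC) >> 2];
}
```

Fetches far outnumber IMEM writes, so moving the conversion to the fill path is where the saving would come from.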
Tyler J. Stachecki
d163ff83a9 More audio optimizations from izy. 2016-02-27 18:24:19 -05:00
Tyler J. Stachecki
6a701096c1 Implement izy's SSE audio processing idea.
izy noticed that the audio buffers are usually >= 64 bytes
in size and aligned to 16 bytes. This makes them a very good
candidate for SSE (instead of swapping a word at a time).
2016-02-27 18:06:18 -05:00
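A rough sketch of what swapping 16 bytes at a time could look like for 6a701096c1, assuming the operation is a byteswap of each 32-bit word. The function name `swap_words` is hypothetical and this is not the committed code. It uses only SSE2 intrinsics and falls back to scalar code when SSE2 is unavailable or for a short tail.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Byteswap every 32-bit word of src into dst; len is a multiple of 4. */
static void swap_words(uint8_t *dst, const uint8_t *src, size_t len) {
  size_t i = 0;
#ifdef __SSE2__
  for (; i + 16 <= len; i += 16) {
    /* Unaligned load for safety; 16-byte aligned buffers could use
     * _mm_load_si128/_mm_store_si128 instead. */
    __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
    /* Swap the two bytes inside each 16-bit lane. */
    v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
    /* Swap the two 16-bit lanes inside each 32-bit word. */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    _mm_storeu_si128((__m128i *)(dst + i), v);
  }
#endif
  /* Scalar path: the tail, or the whole buffer without SSE2. */
  for (; i + 4 <= len; i += 4) {
    dst[i + 0] = src[i + 3];
    dst[i + 1] = src[i + 2];
    dst[i + 2] = src[i + 1];
    dst[i + 3] = src[i + 0];
  }
}
```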
Tyler J. Stachecki
1855797178 Update contributors and README.md. 2016-02-27 16:04:24 -05:00
Tyler J. Stachecki
d2d9dd6371 simer's cart db patch.
simer suggested (and implemented) the use of ROM IDs instead
of titles: "I've also found that the header name in some cases
is too imprecise, for example "TOP GEAR RALLY" has EEPROM 4K
for the Japanese and European versions, but not for the American."
2016-02-27 15:39:26 -05:00
Tyler J. Stachecki
e70455761b More optimizations from izy.
This optimization removes the LUT in LWL/LWR:

At the moment, when the LUT is used inlined, this code is generated:
OR LUTAddr(offset), dqm

That is something like:
OR 0x400760(,%rdi,8),dqm

The code equivalent to "mov %edi,%edi" from the function above can get removed.
I want to assume anyway that accessing the LUT and updating the "dqm" variable
generates a single instruction with memory access.

With the patch the generated code is:
add $0xfffffffd,%edi
sbb %rax,%rax
OR %rax, dqm
Thus my patch increases the number of instructions by two.

The LUT has 3 advantages on its side:
- The function VR4300_LWL_LWR() will use the value read from the LUT only once
  and only for a logic-OR.
- On x86 a logic-OR is an operation that can work with the source operand read
  from memory
- The "offset" variable is pre-calculated and can be used "as is" by the LUT.

The code with my patch (without the LUT) has only one advantage on its side:
- The LUT (memory access) is removed
2016-02-22 00:02:26 -05:00
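For readers who prefer C to assembly, here is a hedged reading of the add/sbb pair quoted in e70455761b. The add of 0xfffffffd sets the carry flag exactly when the 32-bit offset is at least 3, and the sbb then turns the carry into an all-ones or all-zeros 64-bit value. The function name is hypothetical, and the table below is an assumed stand-in chosen only to agree with that listing; the real LUT contents are not shown in the log.

```c
#include <stdint.h>

/* Branchless form mirroring "add $0xfffffffd,%edi; sbb %rax,%rax":
 * all ones when offset >= 3, otherwise zero. */
static inline uint64_t lwl_lwr_mask_branchless(uint32_t offset) {
  return (uint64_t)0 - (uint64_t)(offset >= 3);
}

/* Assumed LUT equivalent (hypothetical contents), indexed by offset 0..7. */
static const uint64_t lwl_lwr_mask_lut[8] = {
  0, 0, 0, ~(uint64_t)0, ~(uint64_t)0, ~(uint64_t)0, ~(uint64_t)0,
  ~(uint64_t)0,
};
```

The trade-off described in the commit message is visible here: the table costs a memory access, the arithmetic form costs extra instructions.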
Tyler J. Stachecki
08f1667432 Expand the cart list.
Thanks: Snowstorm64 and krom.
2016-02-21 21:02:36 -05:00
Tyler J. Stachecki
9d9655cf62 vr4300: Sign extend results from MFC0.
This bug prevented Conker's Bad Fur Day from booting.
2016-02-17 02:00:13 -05:00
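The fix in 9d9655cf62 amounts to widening a 32-bit coprocessor 0 value to 64 bits with sign extension rather than zero extension. A minimal sketch, with a hypothetical function name and not the actual CEN64 code:

```c
#include <stdint.h>

/* Sign-extend a 32-bit CP0 register value into a 64-bit GPR value.
 * The uint32_t -> int32_t conversion assumes two's complement, which
 * holds for every compiler CEN64 realistically targets. */
static inline uint64_t mfc0_result(uint32_t cp0_value) {
  return (uint64_t)(int64_t)(int32_t)cp0_value;
}
```

With zero extension, a value such as 0x80000000 would land in the register as 0x0000000080000000 instead of 0xFFFFFFFF80000000, which is the kind of difference that can break address arithmetic in guest code.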
Tyler Stachecki
8dcb4b6f8f Merge pull request #37 from jkbenaim/controls-doc
Add a note about the default keyboard controls.
2016-02-08 01:32:12 -05:00
Jason
2caa344678 Add a note about the default keyboard controls. 2016-02-07 19:33:13 -05:00
Tyler J. Stachecki
88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the number of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair;
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined."
2016-02-07 14:01:00 -05:00
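A C-level reading of the two rsp_addsub_mask listings in 88c65ae630, offered as a sketch rather than the committed source. In the new version, the shift by two leaves bit 1 of the selector in the carry flag and the sbb expands it to a full mask. The old version masks the selector with 2 and indexes a table. The function names and the table contents below are assumptions chosen so that both forms agree.

```c
#include <stdint.h>

/* New form: all ones when bit 1 of the selector is set, else zero.
 * Mirrors "shr $0x2,%edi; sbb %eax,%eax". */
static inline uint32_t addsub_mask_branchless(uint32_t selector) {
  return 0u - ((selector >> 1) & 1u);
}

/* Old form: index a table with (selector & 2). Only entries 0 and 2
 * are ever read; their values here are assumed, not from the source. */
static const uint32_t addsub_mask_table[3] = {0u, 0u, ~0u};

static inline uint32_t addsub_mask_lut(uint32_t selector) {
  return addsub_mask_table[selector & 2u];
}
```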
Tyler J. Stachecki
3003d774cb Improved SSE2 vector shuffle patch from izy. 2016-02-06 14:26:47 -05:00
Tyler J. Stachecki
e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
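The lea/sar/not sequence quoted in e12a459b18 can be read as: move bit 30 of the word into the sign position, smear it across the register, then invert. A portable C sketch of the same mask follows; the function name is hypothetical, and what bit 30 means to the caller is not stated in the log.

```c
#include <stdint.h>

/* All ones when bit 30 of iw is clear, zero when it is set.
 * Equivalent to ~((int32_t)(iw << 1) >> 31) on compilers with
 * arithmetic right shift, but without relying on that behavior. */
static inline uint32_t branch_mask(uint32_t iw) {
  return ((iw >> 30) & 1u) - 1u;
}
```

Because the result is a pure function of one bit, the compiler is free to fold a later `~mask` into it, which matches the quoted remark about both NOTs disappearing.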
Mike Ryan
3235ee74eb options: document controller and save options 2016-02-05 21:44:53 -05:00
Mike Ryan
b2721e7d37 rtc: implement RTC
Untested in-game since Animal Forest does not yet run.
2016-02-05 21:44:48 -05:00
Mike Ryan
cfd2336443 time: move get_local_time out to platform-specific dir
Patched DD code to make use of this code. Untested on Windows.
2016-02-05 21:44:42 -05:00
Tyler J. Stachecki
4e0620c637 rsp.c patch from izy. 2016-02-03 22:30:54 -05:00
Tyler J. Stachecki
e0c3fdd47f Fix PI/DD bugs that broke some carts.
Readjust PI DMA delay timing slightly (until we get some
more accurate timing information or simulation to replace
it altogether).
2016-01-30 19:39:10 -05:00
Tyler J. Stachecki
bdee3731ae Don't abort emulation on SHA mismatch.
SHA checksum computation is broken on Windows builds,
so don't abort if the checksum doesn't match; just
warn.
2016-01-30 18:42:46 -05:00
Tyler J. Stachecki
4461ad9cf1 Fix 64DD RTC for Windows builds. 2016-01-30 18:10:43 -05:00
Mike Ryan
4fc80b7e8b dd: functional N64 DD implementation
Does not handle disk writes. Expects disks in MAME format.
2016-01-30 16:28:00 -05:00
Mike Ryan
71f405ea40 pi: adjust DMA delay to work with 64dd 2016-01-30 16:27:45 -05:00
Mike Ryan
a0840f1e04 pi: return 0 for mapped read of non-present cart 2016-01-30 16:27:41 -05:00
Tyler J. Stachecki
3c7765b136 (More) fixes for SHA1 compilation errors on Windows. 2016-01-30 15:42:42 -05:00
Tyler J. Stachecki
b7bf8be66d Forgot a keyword in an older commit. 2016-01-30 15:42:38 -05:00
Tyler J. Stachecki
36b2aabc2b Fixes for SHA1 compilation errors on Windows. 2016-01-30 15:42:35 -05:00
Tyler J. Stachecki
9e33765f2e Standardized type names! Who would use such things?! 2016-01-30 15:04:14 -05:00
Tyler J. Stachecki
2b5eaa579d Try to reduce VR4300 cycle overhead as well. 2016-01-30 14:58:31 -05:00
Tyler J. Stachecki
e2e72821e2 Try to reduce component cycle overheads.
Oftentimes, many of our controllers are just doing a
simple countdown and don't perform any real work for the
cycle. Pull those parts out into headers so that the
compiler can 'see' that and optimize accordingly.
2016-01-30 14:58:31 -05:00
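One way to picture the header split described in e2e72821e2, as a sketch under assumptions: the cheap countdown check lives in a `static inline` function the compiler can fold into the main loop, and the real work stays out of line. The struct, field names, and reload value are all hypothetical and do not come from CEN64.

```c
#include <stdint.h>

/* Hypothetical controller state. */
struct controller {
  unsigned cycles_left; /* cycles until the next real event */
  unsigned events;      /* number of times real work was done */
};

#define CONTROLLER_RELOAD 100u /* assumed interval between events */

/* Slow path: would normally live in the .c file. */
void controller_do_work(struct controller *c) {
  c->events++;
  c->cycles_left = CONTROLLER_RELOAD;
}

/* Fast path: would live in the header so callers can inline it.
 * Most cycles only decrement the counter and return. */
static inline void controller_cycle(struct controller *c) {
  if (c->cycles_left > 1) {
    c->cycles_left--;
    return;
  }
  controller_do_work(c);
}
```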
Tyler J. Stachecki
63b2709dc0 Bad implementation of PI delay. 2016-01-30 14:58:31 -05:00
Tyler J. Stachecki
d753c37512 save_file patch from izy. 2016-01-30 14:52:25 -05:00
Mike Ryan
683fcc39a0 flashram: do not segfault on writes if FlashRAM is not present 2016-01-28 00:42:35 -05:00
Mike Ryan
4650fb491f quiet warnings on OS X 2016-01-28 00:42:35 -05:00
Mike Ryan
a4c7397848 cmake: do not use -fsanitize=undefined on Apple debug builds
Feature is unavailable in Xcode's clang. lame
2016-01-28 00:42:35 -05:00
Jason Benaim
20e251ba75 One less annoying print. 2016-01-28 00:42:35 -05:00
Jason Benaim
c16561e388 Remove annoying print, add some comments 2016-01-28 00:42:35 -05:00
Jason Benaim
ee9e777701 Don't segfault if no tpak save was supplied. 2016-01-28 00:42:35 -05:00
Jason Benaim
b6c2e0bd7a Header fixes. 2016-01-28 00:42:35 -05:00
Jason Benaim
0c4a20abe1 Add support for MBC3. This is enough to get Pokemon games working with the transfer pak. 2016-01-28 00:42:35 -05:00
Tyler Stachecki
80cd82e571 Merge pull request #29 from mikeryan/sha1-validate
validate SHA1 of important files
2016-01-28 00:36:30 -05:00
Mike Ryan
1146c33755 sha1: validate NTSC-J PIFROM
Thanks @vgturtle127 !
2016-01-27 19:40:43 -08:00
Mike Ryan
eaf3cee002 sha1: validate DD IPL 2016-01-27 13:28:32 -08:00
Mike Ryan
bc9f14c7f9 sha1: detect and validate PAL PIFROM 2016-01-27 09:38:42 -08:00
Tyler J. Stachecki
24350e2440 device: Tighter sync on AI/VR4300 with -multithread. 2016-01-27 03:14:05 -05:00
Tyler J. Stachecki
a4f0d7245a bus: Reduce number of MMIO address mappings. 2016-01-27 03:13:57 -05:00
Tyler J. Stachecki
b4a68338e1 ai: Fix some bugs and optimize just a little. 2016-01-27 01:44:35 -05:00
Mike Ryan
2803a304bc validate SHA1 of important files
Currently only validating pifdata.bin. Probably also worth validating DD
IPL and other known binaries.
2016-01-26 22:15:38 -08:00
Tyler J. Stachecki
15b4998b85 ai: Fire interrupts at proper time with -noaudio. 2016-01-27 01:00:12 -05:00
Tyler J. Stachecki
0a06f8850f Restore audio in Windows builds with -multithread.
Fixed the bug that was causing this a little bit ago.
2016-01-27 00:44:38 -05:00