izy managed to remove another LUT used in add/sub-related
instructions. The devil is in the details (see commit).
<new>:
00000000004006b0 <rsp_addsub_mask>:
4006b0: c1 ef 02 shr $0x2,%edi
4006b3: 19 c0 sbb %eax,%eax
4006b5: c3 retq
<old>:
00000000004006d0 <rsp_addsub_mask>:
4006d0: 83 e7 02 and $0x2,%edi
4006d3: 8b 04 bd 80 07 40 00 mov 0x400780(,%rdi,4),%eax
4006da: c3 retq
"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as a the EAX:EDX pair,
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by the gcc
and isn't mandatory).
The System V AMD64 calling convention puts the input
parameter in rdi, but nothing changes no matter where the
selector is placed. The return value lands in rax, but
MOV/SBB can work with any register when inlined.
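In C, the table-free mask can be sketched along these lines (a
minimal sketch inferred from the listings above, not the verbatim
source; treating instruction bit 1 as the add/sub selector is an
assumption):

#include <stdint.h>

/* Derive the all-ones/all-zeros mask arithmetically instead of
 * loading it from a table: subtracting the selector bit from zero
 * yields 0xFFFFFFFF when it is set and 0 otherwise. GCC may lower
 * this to the shr/sbb pair shown in <new>, though the exact
 * codegen is compiler-dependent. */
static inline uint32_t rsp_addsub_mask(uint32_t iw) {
  return (uint32_t) 0 - ((iw >> 1) & 0x1);
}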
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:
4005ac: 8d 1c 00 lea (%rax,%rax,1),%ebx
4005af: c1 fb 1f sar $0x1f,%ebx
4005b2: f7 d3 not %ebx
(no memory access)
4005b9: c1 e8 1e shr $0x1e,%eax
4005bc: 83 e0 01 and $0x1,%eax
4005bf: 44 8b 24 85 90 07 40 mov 0x400790(,%rax,4),%r12d
(original has memory access)
This ends up optimizing branch instructions quite nicely:
"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
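In C, the lea/sar/not sequence corresponds to something like this
(a hedged sketch based on the listing, not the actual source;
rsp_branch_mask is an illustrative name):

#include <stdint.h>

/* Shift instruction bit 30 up to the sign bit (the lea doubles the
 * word, i.e. iw << 1), smear it across the register with an
 * arithmetic shift, then invert. No memory access; the caller's
 * own ~mask cancels the NOT, exactly as izy describes above. The
 * signed right shift relies on gcc's sign-extending behavior. */
static inline uint32_t rsp_branch_mask(uint32_t iw) {
  return ~(uint32_t) ((int32_t) (iw << 1) >> 31);
}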
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
Up until now, the simulator assumed that DMEM accesses had to be
aligned (as they must be on the VR4300). This is not actually the
case, so allow scalar memory accesses to arbitrary DMEM addresses.
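For illustration, a byte-wise scalar load needs no alignment at
all (a minimal sketch, not the simulator's actual code;
dmem_read_word is an illustrative name):

#include <stdint.h>

/* Assemble a big-endian 32-bit word one byte at a time so any DMEM
 * address works; the & 0xFFF keeps each access inside the RSP's
 * 4 KB data memory. */
static uint32_t dmem_read_word(const uint8_t *dmem, uint32_t addr) {
  uint32_t word = 0;
  unsigned i;

  for (i = 0; i < 4; i++)
    word = (word << 8) | dmem[(addr + i) & 0xFFF];

  return word;
}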
Tell GCC to optimize cold functions for size and stash them away in
a separate part of the binary. Put the simulator core, meanwhile, on
the hot path. Also, bump optimization to -O3 as we can now "afford"
to do so.
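Concretely, GCC exposes function attributes for this; the function
names below are illustrative, not the simulator's real symbols:

/* cold: optimize for size and place in .text.unlikely;
 * hot: favor speed and keep on the fast path. */
__attribute__((cold)) void rsp_report_fault(const char *message);
__attribute__((hot))  void rsp_execute_cycle(void);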
If we think about how the assembler forms 32-bit immediates, it
usually generates a lui and addiu pair. Well, if we can craft the
simulation such that lui and addiu share the same indirect target
when branching to execution functions, we reduce the chance that
we'll mispredict and suffer a resulting pipeline flush on the
host.
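As a rough sketch of the idea (the struct and names are
illustrative, not the project's actual code), a single handler can
serve both opcodes, so the indirect branch through the dispatch
table always hits the same, well-predicted target:

#include <stdint.h>

struct rsp { uint32_t regs[32]; };  /* stand-in for the real state */

/* LUI and ADDIU share one execution function; the dispatch table
 * maps both opcodes here. The short direct branch inside is far
 * cheaper on the host than an indirect-branch mispredict. */
static void rsp_ex_lui_addiu(struct rsp *rsp, uint32_t iw) {
  uint32_t rt = (iw >> 16) & 0x1F;

  if ((iw >> 26) == 0x0F)  /* LUI: immediate fills the upper half */
    rsp->regs[rt] = (iw & 0xFFFF) << 16;
  else                     /* ADDIU: rs plus sign-extended imm */
    rsp->regs[rt] = rsp->regs[(iw >> 21) & 0x1F]
                  + (uint32_t) (int16_t) iw;
}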
Every cycle counts!