Commit graph

340 commits

Author SHA1 Message Date
Henrik Rydgard
a0bf934796 ARM64: Some work on static allocation. Close to working, cube.elf runs 700 blocks but then hangs (?!) 2015-07-11 16:59:09 +02:00
Henrik Rydgard
698ef82452 ARM64: Fix vrot 2015-07-11 16:56:26 +02:00
Henrik Rydgard
9937b41461 ARM64: Fix vi2uc and vi2us and enable them. 2015-07-11 16:46:11 +02:00
Henrik Rydgard
dc2f6a30fb ARM64: Fix joining of lwl/lwr and swl/swr. "implement" the cache instruction. 2015-07-11 16:25:22 +02:00
Henrik Rydgard
1575025b3d ARM64: Store back fp registers in pairs where possible 2015-07-11 13:52:46 +02:00
Henrik Rydgard
35c65973c1 ARM64 jit: implement vuc2i, vc2i, vus2i, vs2i instructions 2015-07-11 13:25:58 +02:00
Henrik Rydgard
4a7ee6d6cd ARM64 jit: Implement vi2uc, vi2c, vi2us, vi2s instructions 2015-07-11 12:37:23 +02:00
Henrik Rydgard
a3b728dd1b ARM64 jit: Minor optimization of lv.q and sv.q 2015-07-08 11:59:48 +02:00
Henrik Rydgård
f3c5af570c Merge pull request #7849 from unknownbrackets/arm64-micro
Fix discarding imms and flushing zero in arm64
2015-07-05 21:11:28 +02:00
Unknown W. Brackets
db3dffb44d arm64: Oops, fix flushing zero from an armreg. 2015-07-05 11:57:18 -07:00
Unknown W. Brackets
204c1dc8dd arm64: Optimize 3ops against zero. 2015-07-05 09:52:53 -07:00
Henrik Rydgard
7011758e83 Move misplaced FlushIcache() in Arm64Asm.cpp 2015-07-05 10:03:52 +02:00
Unknown W. Brackets
003668fe66 armjit: Fix discarding imms. 2015-07-04 07:30:32 -07:00
Unknown W. Brackets
8ea7f99072 arm64: Fix imm wasting when STP doesn't work out. 2015-07-04 07:09:47 -07:00
Unknown W. Brackets
e6a7ba3fae arm64: Bring imms along for the STP ride. 2015-07-03 16:51:33 -07:00
Unknown W. Brackets
ca1e482a56 arm64: Avoid setting a reg to zero to store it. 2015-07-03 16:05:25 -07:00
Henrik Rydgård
82c66bc463 Merge pull request #7840 from unknownbrackets/arm64-micro
Flush using STP where possible in ARM64
2015-07-03 23:20:43 +02:00
Unknown W. Brackets
8fdceba7ca Add timing for all the basics.
This way we can see overall stats for a frame.
2015-07-03 12:05:08 -07:00
Unknown W. Brackets
90b7d135cb arm64: Flush in pairs if possible.
On an A57, this is around twice as fast (for just the STR/STR vs STP.)
2015-07-03 11:07:09 -07:00
Unknown W. Brackets
ddb955a527 arm64: Try to optimize imm stores.
If we already have a reg, we can use it.  This can happen when immediate
addresses are loaded and used as bases, although it's not super common.
2015-07-03 10:48:11 -07:00
Unknown W. Brackets
2331df8c70 arm64: Try to be more consistent in ZERO handling.
Let's keep it IMM where possible, even though we've added checks for
MIPS_REG_ZERO.
2015-07-03 10:21:24 -07:00
Unknown W. Brackets
66d85233b9 arm64: Flush only caller-saved regs before calls. 2015-07-03 10:09:43 -07:00
Unknown W. Brackets
66adc4e695 jit: Normalize CONDITIONAL_DISABLE formatting. 2015-07-02 20:31:37 -07:00
Unknown W. Brackets
fed687fb59 arm64: Meld LO and HI together for multiplies. 2015-07-02 20:31:37 -07:00
Unknown W. Brackets
1d1c80d9cf arm64: Use BFI for cfc1. 2015-07-02 20:31:35 -07:00
Unknown W. Brackets
757a1a414a arm64: Workaround an apparent gcc bug.
Only seems to happen with unsigned.  This took a while to track down...
2015-07-02 19:59:38 -07:00
Unknown W. Brackets
e94fd3d4bd arm64: Fix div/divu remainders.
Erp, I transposed the args when I pasted them.
2015-06-28 16:52:49 -07:00
Unknown W. Brackets
81b923f1dc arm64: Correct movz/movn. Weren't right after all. 2015-06-28 16:49:28 -07:00
Unknown W. Brackets
4d7a948717 arm64: Fix a dumb mistake with rounding modes. 2015-06-28 16:35:46 -07:00
Unknown W. Brackets
b6612edf67 arm64: Use a cached rounding func for cvt.w.s.
This is much faster for this particular instruction, although not all
games even use it.
2015-06-28 12:40:29 -07:00
Unknown W. Brackets
1c163e4817 arm64: Avoid an ORR for c.ueq.
This is about 15% faster for this single, uncommon instruction on A57.
2015-06-28 10:52:17 -07:00
Unknown W. Brackets
febe435946 arm64: Use FP load/stores for non-reg pointers. 2015-06-28 10:45:44 -07:00
Unknown W. Brackets
213ad4bcc9 arm64: Cleanup branch code a tiny bit.
Want to make it clear that we can't kill W0 at this point (delay slots.)
2015-06-28 09:28:54 -07:00
Unknown W. Brackets
0978aa4d5e arm64: Use msub for div/divu remainder.
Not really much faster, but fewer instructions at least.
2015-06-28 09:05:39 -07:00
Unknown W. Brackets
0a5b1c030b arm64: Implement ext and ins. 2015-06-28 08:45:17 -07:00
Unknown W. Brackets
daddb73f22 arm64: Implement nor. 2015-06-28 00:41:04 -07:00
Unknown W. Brackets
11a851a139 arm64: Enable movz/movn. 2015-06-28 00:41:04 -07:00
Unknown W. Brackets
223e55a453 arm64: Undisable clz/clo, they work.
Also, avoid a temp in clo.  It's the tiniest bit faster on A57, though
we'll see how it works out elsewhere.  A bit clearer without the temp
imho.
2015-06-28 00:41:03 -07:00
Unknown W. Brackets
81bc8107cf arm64: Use UBFX, not LSR, for slti sign check.
This is about 22% faster on the A57 (for just this one instruction, so not
a huge impact overall.)  Makes sense that it would be since not arith.
2015-06-28 00:41:03 -07:00
Unknown W. Brackets
fedbe645e0 arm64: Use all immediate compares in safemem.
Ah, this is better.
2015-06-27 00:22:09 -07:00
Unknown W. Brackets
3c29ec2051 arm64: Optimize codesize in safemem path a bit.
Will only be used for scratchpad, I think.
2015-06-27 00:22:04 -07:00
Unknown W. Brackets
fbd4db0fc4 arm64: Add a safemem path.
This is probably not optimal but at least it works.
2015-06-27 00:22:04 -07:00
Unknown W. Brackets
b3aa6d89e9 Fix UBFX encoding (thanks SonicAdvance1.) 2015-06-26 21:27:03 -07:00
Henrik Rydgard
e848247f88 ARM64: Also save FP registers around the JIT dispatcher loop 2015-06-14 13:03:46 +02:00
Henrik Rydgard
2c05334d47 ARM64: Fix bug where we didn't save the FP registers correctly in the vertex decoder.
Also port a few ops from dolphin's ARM64 emitter.
2015-06-14 12:56:44 +02:00
Henrik Rydgård
70fa830ba5 Split out the ReplaceJalTo test logic.
This makes it so the IR, in the future, can work correctly for
replacements.
2015-04-12 13:35:10 -07:00
Henrik Rydgård
d014d420db Unify JitOptions across the backends.
This is required to make ExtractIR not a member of the various backends.
2015-04-12 11:41:26 -07:00
Henrik Rydgård
81dec36da8 Use an accessor to read the compilerPC.
In the IR it will be read from the block.
2015-04-11 01:14:37 -07:00
Henrik Rydgård
a897723e6a Separate out jit reading nearby instructions.
This makes it easier to use an IR for these things, or remove them.
2015-04-11 00:53:24 -07:00
Unknown W. Brackets
b0d291032d armjit: Avoid cfc1/mfc1 to $0. 2015-04-07 18:30:36 -07:00