bsnes

mirror of https://github.com/bsnes-emu/bsnes.git synced 2025-04-02 10:42:14 -04:00

Author	SHA1	Message	Date
Lior Halphon	f9236d12bf	Improvements to the help command and general debugger usability.	2016-08-13 22:52:41 +03:00
Tim Allen	ac2d0ba1cf	Update to v101r05 release. byuu says: Changelog: - 68K: fixed bug that affected BSR return address - VDP: added very preliminary emulation of planes A, B, W (W is entirely broken though) - VDP: added command/address stuff so you can write to VRAM, CRAM, VSRAM - VDP: added VRAM fill DMA I would be really surprised if any commercial games showed anything at all, so I'd probably recommend against wasting your time trying, unless you're really bored :P Also, I wanted to add: I am accepting patches\! So if anyone wants to look over the 68K core for bugs, that would save me untold amounts of time in the near future :D	2016-08-13 09:47:30 +10:00
Lior Halphon	e79ddee705	Basic memory hex viewer/editor, using a (heavily stripped down) HexFiend framework	2016-08-13 00:58:52 +03:00
Tim Allen	1df2549d18	Update to v101r04 release. byuu says: Changelog: - pulled the (u)intN type aliases into higan instead of leaving them in nall - added 68K LINEA, LINEF hooks for illegal instructions - filled the rest of the 68K lambda table with generic instance of ILLEGAL - completed the 68K disassembler effective addressing modes - still unsure whether I should use An to decode absolute addresses or not - pro: way easier to read where accesses are taking place - con: requires An to be valid; so as a disassembler it does a poor job - making it optional: too much work; ick - added I/O decoding for the VDP command-port registers - added skeleton timing to all five processor cores - output at 1280x480 (needed for mixed 256/320 widths; and to handle interlace modes) The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and syncing; which is the pathological worst case for libco. But they also have no logic inside of them. With all the above, I'm averaging around 250fps with just the 68K core actually functional, and the VDP doing a dumb "draw white pixels" loop. Still way too early to tell how this emulator is going to perform. Also, the 320x240 mode of the Genesis means that we don't need an aspect correction ratio. But we do need to ensure the output window is a multiple 320x240 so that the scale values work correctly. I was hard-coding aspect correction to stretch the window an additional \*8/7. But that won't work anymore so ... the main higan window is now 640x480, 960x720, or 1280x960. Toggling aspect correction only changes the video width inside the window. It's a bit jarring ... the window is a lot wider, more black space now for most modes. But for now, it is what it is.	2016-08-12 11:07:04 +10:00
Tim Allen	9b8c3ff8c0	Update to v101r03 release. byuu says: The 68K core now implements all 88 instructions. It ended up being 111 instructions in my core due to splitting up opcodes with the same name but different addressing modes or directions (removes conditions at the expense of more code.) Technically, I don't have exceptions actually implemented yet, and RESET/STOP don't do anything but set flags. So there's still more to go. But ... close enough for statistics time! The M68K core source code is 124,712 bytes in size. The next largest core is the ARM7 core at 70,203 bytes in size. The M68K object size is 942KiB; with the next largest being the V30MZ core at 173KiB. There are a total of 19,656 invalid opcodes in the 68000 revision (unless of course I've made mistakes in my mappings, which is very probably.) Now the fun part ... figuring out how to fix bugs in this core without VDP emulation :/	2016-08-11 08:02:02 +10:00
Lior Halphon	806d0775a4	Added backtrace command to debugger	2016-08-09 22:48:53 +03:00
Tim Allen	0a57cac70c	Update to v101r02 release. byuu says: Changelog: - Emulator: use `(uintmax)-1 >> 1` for the units of time - MD: implemented 13 new 68K instructions (basically all of the remaining easy ones); 21 remain - nall: replaced `(u)intmax_t` (64-bit) with actual `(u)intmax` type (128-bit where available) - this extends to everything: atoi, string, etc. You can even print 128-bit variables if you like 22,552 opcodes still don't exist in the 68K map. Looking like quite a few entries will be blank once I finish.	2016-08-09 21:07:18 +10:00
Tim Allen	8bdf8f2a55	Update to v101r01 release. byuu says: Changelog: - added eight more 68K instructions - split ADD(direction) into two separate ADD functions I now have 54 out of 88 instructions implemented (thus, 34 remaining.) The map is missing 25,182 entries out of 65,536. Down from 32,680 for v101.00 Aside: this version number feels really silly. r10 and r11 surely will as well ...	2016-08-08 20:12:03 +10:00
Tim Allen	e39987a3e3	Update to v101 release. byuu says (in the public announcement): Not a large changelog this time, sorry. This release is mostly to fix the SA-1 issue, and to get some real-world testing of the new scheduler model. Most of the work in the past month has gone into writing a 68000 CPU core; yet it's still only about half-way finished. Changelog (since the previous release): - fixed SNES SA-1 IRQ regression (fixes Super Mario RPG level-up screen) - new scheduler for all emulator cores (precision of 2^-127) - icarus database adds nine new SNES games - added Input/Frequency to settings file (allows simulation of latency) byuu says (in the WIP forum): Changelog: - in 32-bit mode, Thread uses uint64\_t with 2^-63 time units (10^-7 precision in the worst case) - nearly ten times the precision of an attosecond - in 64-bit mode, Thread uses uint128\_t with 2^-127 time units (10^-26 precision in the worst case) - far more accurate than yoctoseconds; almost closing in on planck time Note: a quartz crystal is accurate to 10^-4 or 10^-5. A cesium fountain atomic clock is accurate to 10^-15. So ... yeah. 2^-63 was perfectly fine; but there was no speed penalty whatsoever for using uint128\_t in 64-bit mode, so why not?	2016-08-08 20:04:15 +10:00
Lior Halphon	a5670b6643	Fixed boot ROM trimming	2016-08-07 00:39:32 +03:00
Lior Halphon	109af49933	Updated DMG boot ROM to finish with the same register values as the original boot ROM	2016-08-06 19:11:54 +03:00
Lior Halphon	bebb5c7a41	Correctly emulating the unused OAM memory in DMG mode	2016-08-06 18:58:44 +03:00
Lior Halphon	cc8664b0a8	Correctly emulating a disconnected serial cable	2016-08-06 18:57:33 +03:00
Lior Halphon	af10e07ed7	Initing OBP0/1 correctly	2016-08-06 18:57:13 +03:00
Lior Halphon	5816b6a688	Updated change log and incremented version to 0.6	2016-08-06 17:16:39 +03:00
Lior Halphon	e95d2c4abe	Fixed DI instruction on CGB	2016-08-06 17:16:38 +03:00
Lior Halphon	68740c70e4	Stripping executables on release to reduce file size	2016-08-06 16:19:04 +03:00
Lior Halphon	722550c5bc	Enabled link time optimization when building in release, improving speed by about 6%	2016-08-06 16:18:23 +03:00
Lior Halphon	553f700b79	Fixed needless deep generation, which caused errors when compiling the Cocoa GUI when SDL is not installed	2016-08-06 15:57:32 +03:00
Lior Halphon	d03a1fbd16	Fixed TMA writing while reloading.	2016-08-06 14:36:33 +03:00
Lior Halphon	85a33ed8ef	Emulating DMA delay correctly	2016-08-06 14:24:43 +03:00
Lior Halphon	4a50000e83	Corrected timing for many instructions	2016-08-06 14:00:35 +03:00
Lior Halphon	8dd5462525	Correct DMA timing	2016-08-06 13:57:38 +03:00
Lior Halphon	0f98ac5ff9	Emulate TIMA reloading	2016-08-06 13:56:29 +03:00
Lior Halphon	55cbe5d4d0	Accuracy improvements to timers	2016-08-06 00:24:12 +03:00
Lior Halphon	d098458ee4	Major improvements to accuracy: Fixed instruction timing, DMA timing, and IO reg masking. Passes most of mooneye-gb acceptance tests.	2016-08-05 16:36:38 +03:00
Lior Halphon	47e3300b66	Improved DMA accuracy, mooneyegb test ROMs no longer crash miserably. (but still fail)	2016-08-03 23:31:10 +03:00
Lior Halphon	fad1007427	Merge branch 'master' of https://github.com/LIJI32/SameBoy	2016-08-03 22:28:28 +03:00
Tim Allen	f5e5bf1772	Update to v100r16 release. byuu says: (Windows users may need to include <sys/time.h> at the top of nall/chrono.hpp, not sure.) Unchangelog: - forgot to add the Scheduler clock=0 fix because I have the memory of a goldfish Changelog: - new icarus database with nine additional games - hiro(GTK,Qt) won't constantly write its settings.bml file to disk anymore - added latency simulator for fun (settings.bml => Input/Latency in milliseconds) So the last one ... I wanted to test out nall::chrono, and I was also thinking that by polling every emulated frame, it's pretty wasteful when you are using Fast Forward and hitting 200+fps. As I've said before, calls to ruby::input::poll are not cheap. So to get around this, I added a limiter so that if you called the hardware poll function within N milliseconds, it'll return without doing any actual work. And indeed, that increases my framerate of Zelda 3 uncapped from 133fps to 142fps. Yay. But it's not a "real" speedup, as it only helps you when you exceed 100% speed (theoretically, you'd need to crack 300% speed since the game itself will poll at 16ms at 100% speed, but yet it sped up Zelda 3, so who am I to complain?) I threw the latency value into the settings file. It should be 16, but I set it to 5 since that was the lowest before it started negatively impacting uncapped speeds. You're wasting your time and CPU cycles setting it lower than 5, but if people like placebo effects it might work. Maybe I should let it be a signed integer so people can set it to -16 and think it's actually faster :P (I'm only joking. I took out the 96000hz audio placebo effect as well. Not really into psychological tricks anymore.) But yeah seriously, I didn't do this to start this discussion again for the billionth time. Please don't go there. And please don't tell me this WIP has higher/lower latency than before. I don't want to hear it. The only reason I bring it up is for the fun part that is worth discussing: put up or shut up time on how sensitive you are to latency! You can set the value above 5 to see how games feel. I personally can't really tell a difference until about 50. And I can't be 100% confident it's worse until about 75. But ... when I set it to 150, games become "extra difficult" ... the higher it goes, the worse it gets :D For this WIP, I've left no upper limit cap. I'll probably set a cap of something like 500ms or 1000ms for the official release. Need to balance user error/trolling with enjoyability. I'll think about it. [...] Now, what I worry about is stupid people seeing it and thinking it's an "added latency" setting, as if anyone would intentionally make things worse by default. This is a limiter. So if 5ms have passed since the game last polled, and that will be the case 99.9% of the time in games, the next poll will happen just in time, immediately when the game polls the inputs. Thus, a value below 1/<framerate>ms is not only pointless, if you go too low it will ruin your fast forward max speeds. I did say I didn't want to resort to placebo tricks, but I also don't want to spark up public discussion on this again either. So it might be best to default Input/Latency to 0ms, and internally have a max(5, latency) wrapper around the value.	2016-08-03 22:32:40 +10:00
Tim Allen	c50723ef61	Update to v100r15 release. byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed. [Editor's note: The "aformentioned scheduler changes" were described in a previous forum post: Unfortunately, 64-bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using the booth algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32-bits of precision allows a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster. The "longer explanation" in the linked hastebin is: Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES: - CPU/PPU clock = 21477272.727272hz - SMP/DSP clock = 24576000hz - Cartridge DSP1 clock = 8000000hz - Cartridge MSU1 clock = 44100hz - Controller Port 1 modem controller clock = 57600hz - Controller Port 2 barcode battler clock = 115200hz - Expansion Port exercise bike clock = 192000hz Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component. The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so: - if thread A steps X clocks; then clock += X * threadB.frequency - if clock >= 0; switch to threadB - if thread B steps X clocks; then clock -= X * threadA.frequency - if clock < 0; switch to threadA But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis: - the 68K has to sync with the Z80 and PSG and YM2612 and VDP - the Z80 has to sync with the 68K and PSG and YM2612 - the PSG has to sync with the 68K and Z80 and YM2612 Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one. As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so: - setup: scalar = timeBase / frequency - step: clock += scalar * clocks Once per second, we look at every thread, find the smallest clock value. Then subtract that value from all threads. This prevents the clock counters from overflowing. Unfortunately, these oscillator values are psychotic, unpredictable, and often times repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond); we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am. Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal inverse, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core. Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values; we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock; and run them both for a full second; before an overflow event would occur. Another hastebin includes demonstration code: #include <libco/libco.h> #include <nall/nall.hpp> using namespace nall; // cothread_t mainThread = nullptr; const uint iterations = 100'000'000; const uint cpuFreq = 21477272.727272 + 0.5; const uint smpFreq = 24576000.000000 + 0.5; const uint cpuStep = 4; const uint smpStep = 5; // struct ThreadA { cothread_t handle = nullptr; uint64 frequency = 0; int64 clock = 0; auto create(auto (entrypoint)() -> void, uint frequency) { this->handle = co_create(65536, entrypoint); this->frequency = frequency; this->clock = 0; } }; struct CPUA : ThreadA { static auto Enter() -> void; auto main() -> void; CPUA() { create(&CPUA::Enter, cpuFreq); } } cpuA; struct SMPA : ThreadA { static auto Enter() -> void; auto main() -> void; SMPA() { create(&SMPA::Enter, smpFreq); } } smpA; uint8 queueA[iterations]; uint offsetA; cothread_t resumeA = cpuA.handle; auto EnterA() -> void { offsetA = 0; co_switch(resumeA); } auto QueueA(uint value) -> void { queueA[offsetA++] = value; if(offsetA >= iterations) { resumeA = co_active(); co_switch(mainThread); } } auto CPUA::Enter() -> void { while(true) cpuA.main(); } auto CPUA::main() -> void { QueueA(1); smpA.clock -= cpuStep smpA.frequency; if(smpA.clock < 0) co_switch(smpA.handle); } auto SMPA::Enter() -> void { while(true) smpA.main(); } auto SMPA::main() -> void { QueueA(2); smpA.clock += smpStep * cpuA.frequency; if(smpA.clock >= 0) co_switch(cpuA.handle); } // struct ThreadB { cothread_t handle = nullptr; uint128_t scalar = 0; uint128_t clock = 0; auto print128(uint128_t value) { string s; while(value) { s.append((char)('0' + value % 10)); value /= 10; } s.reverse(); print(s, "\n"); } //femtosecond (10^15) = 16306 //attosecond (10^18) = 688838 //zeptosecond (10^21) = 13712691 //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble) //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond) auto create(auto (entrypoint)() -> void, uint128_t frequency) { this->handle = co_create(65536, entrypoint); uint128_t unitOfTime = 1; //for(uint n : range(29)) unitOfTime = 10; unitOfTime <<= 96; //2^96 time units ... this->scalar = unitOfTime / frequency; print128(this->scalar); this->clock = 0; } auto step(uint128_t clocks) -> void { clock += clocks * scalar; } auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); } }; struct CPUB : ThreadB { static auto Enter() -> void; auto main() -> void; CPUB() { create(&CPUB::Enter, cpuFreq); } } cpuB; struct SMPB : ThreadB { static auto Enter() -> void; auto main() -> void; SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; } } smpB; auto correct() -> void { auto minimum = min(cpuB.clock, smpB.clock); cpuB.clock -= minimum; smpB.clock -= minimum; } uint8 queueB[iterations]; uint offsetB; cothread_t resumeB = cpuB.handle; auto EnterB() -> void { correct(); offsetB = 0; co_switch(resumeB); } auto QueueB(uint value) -> void { queueB[offsetB++] = value; if(offsetB >= iterations) { resumeB = co_active(); co_switch(mainThread); } } auto CPUB::Enter() -> void { while(true) cpuB.main(); } auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); } auto SMPB::Enter() -> void { while(true) smpB.main(); } auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); } // #include <nall/main.hpp> auto nall::main(string_vector) -> void { mainThread = co_active(); uint masterCounter = 0; while(true) { print(masterCounter++, " ...\n"); auto A = clock(); EnterA(); auto B = clock(); print((double)(B - A) / CLOCKS_PER_SEC, "s\n"); auto C = clock(); EnterB(); auto D = clock(); print((double)(D - C) / CLOCKS_PER_SEC, "s\n"); for(uint n : range(iterations)) { if(queueA[n] != queueB[n]) return print("fail at ", n, "\n"); } } } ...and that's everything.]	2016-07-31 12:11:20 +10:00
Tim Allen	ca277cd5e8	Update to v100r14 release. byuu says: (Windows: compile with -fpermissive to silence an annoying error. I'll fix it in the next WIP.) I completely replaced the time management system in higan and overhauled the scheduler. Before, processor threads would have "int64 clock"; and there would be a 1:1 relationship between two threads. When thread A ran for X cycles, it'd subtract X * B.Frequency from clock; and when thread B ran for Y cycles, it'd add Y * A.Frequency from clock. This worked well and allowed perfect precision; but it doesn't work when you have more complicated relationships: eg the 68K can sync to the Z80 and PSG; the Z80 to the 68K and PSG; so the PSG needs two counters. The new system instead uses a "uint64 clock" variable that represents time in attoseconds. Every time the scheduler exits, it subtracts the smallest clock count from all threads, to prevent an overflow scenario. The only real downside is that rounding errors mean that roughly every 20 minutes, we have a rounding error of one clock cycle (one 20,000,000th of a second.) However, this only applies to systems with multiple oscillators, like the SNES. And when you're in that situation ... there's no such thing as a perfect oscillator anyway. A real SNES will be thousands of times less out of spec than 1hz per 20 minutes. The advantages are pretty immense. First, we obviously can now support more complex relationships between threads. Second, we can build a much more abstracted scheduler. All of libco is now abstracted away completely, which may permit a state-machine / coroutine version of Thread in the future. We've basically gone from this: auto SMP::step(uint clocks) -> void { clock += clocks * (uint64)cpu.frequency; dsp.clock -= clocks; if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread); if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread); } To this: auto SMP::step(uint clocks) -> void { Thread::step(clocks); synchronize(dsp); synchronize(cpu); } As you can see, we don't have to do multiple clock adjustments anymore. This is a huge win for the SNES CPU that had to update the SMP, DSP, all peripherals and all coprocessors. Likewise, we don't have to synchronize all coprocessors when one runs, now we can just synchronize the active one to the CPU. Third, when changing the frequencies of threads (think SGB speed setting modes, GBC double-speed mode, etc), it no longer causes the "int64 clock" value to be erroneous. Fourth, this results in a fairly decent speedup, mostly across the board. Aside from the GBA being mostly a wash (for unknown reasons), it's about an 8% - 12% speedup in every other emulation core. Now, all of this said ... this was an unbelievably massive change, so ... you know what that means >_> If anyone can help test all types of SNES coprocessors, and some other system games, it'd be appreciated. ---- Lastly, we have a bitchin' new about screen. It unfortunately adds ~200KiB onto the binary size, because the PNG->C++ header file transformation doesn't compress very well, and I want to keep the original resource files in with the higan archive. I might try some things to work around this file size increase in the future, but for now ... yeah, slightly larger archive sizes, sorry. The logo's a bit busted on Windows (the Label control's background transparency and alignment settings aren't working), but works well on GTK. I'll have to fix Windows before the next official release. For now, look on my Twitter feed if you want to see what it's supposed to look like. ---- EDIT: forgot about ICD2::Enter. It's doing some weird inverse run-to-save thing that I need to implement support for somehow. So, save states on the SGB core probably won't work with this WIP.	2016-07-30 13:56:12 +10:00
Tim Allen	306cac2b54	Update to v100r13 release. byuu says: Changelog: M68K improvements, new instructions added.	2016-07-26 20:46:43 +10:00
Tim Allen	f230d144b5	Update to v100r12 release. byuu says: All of the above fixes, plus I added all 24 variations on the shift opcodes, plus SUBQ, plus fixes to the BCC instruction. I can now run 851,767 instructions into Sonic the Hedgehog before hitting an unimplemented instruction (SUB). The 68K core is probably only ~35% complete, and yet it's already within 4KiB of being the largest CPU core, code size wise, in all of higan. Fuck this chip.	2016-07-25 23:15:54 +10:00
Tim Allen	7ccfbe0206	Update to v100r11 release. byuu says: I split the Register class and read/write handlers into DataRegister and AddressRegister, given that they have different behaviors on byte/word accesses (data tends to preserve the upper bits; address tends to sign-extend things.) I expanded EA to EffectiveAddress. No sense in abbreviating things to death. I've now implemented 26 instructions. But the new ones are just all the stupid from/to ccr/sr instructions. Ryphecha confirmed that you can't set the undefined bits, so I don't think the BitField concept is appropriate for the CCR/SR. Instead, I'm just storing direct flags and have (read,write)(CCR,SR) instead. This isn't like the 65816 where you have subroutines that push and pop the flag register. It's much more common to access individual flags. Doesn't match the consistency angle of the other CPU cores, but ... I think this is the right thing to for the 68K specifically.	2016-07-23 12:32:35 +10:00
Tim Allen	4b897ba791	Update to v100r10 release. byuu says: Redesigned the handling of reading/writing registers to be about eight times faster than the old system. More work may be needed ... it seems data registers tend to preserve their upper bits upon assignment; whereas address registers tend to sign-extend values into them. It may make sense to have DataRegister and AddressRegister classes with separate read/write handlers. I'd have to hold two Register objects inside the EffectiveAddress (EA) class if we do that. Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets the first ~530,000 instructions in Sonic the Hedgehog running (though probably wrong. But we can run a lot thanks to large initialization loops.) If I force the core to loop back to the reset vector on an invalid opcode, I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if that's good or not. I guess we'll find out. I had to stop tonight because the final opcode I execute is an RTS (return from subroutine) that's branching back to address 0; which is invalid ... meaning something went terribly wrong and the system crashed.	2016-07-22 22:03:25 +10:00
Lior Halphon	e6d4cac00e	Fix logical bug when changing watchpoint flags	2016-07-21 15:20:25 +03:00
Lior Halphon	185e71fe12	Improvements to IR API, since timing is VERY important	2016-07-21 01:03:13 +03:00
Lior Halphon	b740b7f3ba	Fixed Cocoa memory leak	2016-07-20 23:52:29 +03:00
Tim Allen	be3f6ac0d5	Update to v100r09 release. byuu says: Another six hours in ... I have all of the opcodes, memory access functions, disassembler mnemonics and table building converted over to the new template<uint Size> format. Certainly, it would be quite easy for this nightmare chip to throw me another curveball, but so far I can handle: - MOVE (EA to, EA from) case - read(from) has to update register index for +/-(aN) mode - MOVEM (EA from) case - when using +/-(aN), RA can't actually be updated until the transfer is completed - LEA (EA from) case - doesn't actually perform the final read; just returns the address to be read from - ANDI (EA from-and-to) case - same EA has to be read from and written to - for -(aN), the read has to come from aN-2, but can't update aN yet; so that the write also goes to aN-2 - no opcode can ever fetch the extension words more than once - manually control the order of extension word fetching order for proper opcode decoding To do all of that without a whole lot of duplicated code (or really bloating out every single instruction with red tape), I had to bring back the "bool valid / uint32 address" variables inside the EA struct =( If weird exceptions creep in like timing constraints only on certain opcodes, I can use template flags to the EA read/write functions to handle that.	2016-07-19 19:12:05 +10:00
Lior Halphon	1d35c04ab1	Infrared API	2016-07-18 22:11:18 +03:00
Lior Halphon	0fbc72f197	SDL save states	2016-07-18 14:37:06 +03:00
Lior Halphon	da0911d69b	Fixed SDL crash	2016-07-18 14:30:21 +03:00
Lior Halphon	b30822fd0b	Async commands in SDL port, better handling of ^C and ^D	2016-07-18 13:10:19 +03:00
Tim Allen	92fe5b0813	Update to v100r08 release. byuu says: Six and a half hours this time ... one new opcode, and all old opcodes now in a deprecated format. Hooray, progress! For building the table, I've decided to move from: for(uint opcode : range(65536)) { if(match(...)) bind(opNAME, ...); } To instead having separate for loops for each supported opcode. This lets me specialize parts I want with templates. And to this aim, I'm moving to replace all of the (read,write)(size, ...) functions with (read,write)<Size>(...) functions. This will amount to the ~70ish instructions being triplicated ot ~210ish instructions; but I think this is really important. When I was getting into flag calculations, a ton of conditionals were needed to mask sizes to byte/word/long. There was also lots of conditionals in all the memory access handlers. The template code is ugly, but we eliminate a huge amount of branch conditions this way.	2016-07-18 08:11:29 +10:00
Lior Halphon	aa6438fa06	Async debugger commands	2016-07-18 00:46:45 +03:00
Lior Halphon	67f3a3a9d8	Symbol support in SDL port	2016-07-17 23:08:07 +03:00
Lior Halphon	9d53760016	Fixing Linux build	2016-07-17 22:43:23 +03:00
Tim Allen	059347e575	Update to v100r07 release. byuu says: Four and a half hours of work and ... zero new opcodes implemented. This was the best job I could do refining the effective address computations. Should have all twelve 68000 modes implemented now. Still have a billion questions about when and how I'm supposed to perform certain edge case operations, though.	2016-07-17 13:24:28 +10:00
Tim Allen	0d6a09f9f8	Update to v100r06 release. byuu says: Up to ten 68K instructions out of somewhere between 61 and 88, depending upon which PDF you look at. Of course, some of them aren't 100% completed yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant that needs stack push/pop functions. This WIP actually took over eight hours to make, going through every possible permutation on how to design the core itself. The updated design now builds both the instruction decoder+dispatcher and the disassembler decoder into the same main loop during M68K's constructor. The special cases are also really psychotic on this processor, and I'm afraid of missing something via the fallthrough cases. So instead, I'm ordering the instructions alphabetically, and including exclusion cases to ignore binding invalid cases. If I end up remapping an existing register, then it'll throw a run-time assertion at program startup. I wanted very much to get rid of struct EA (EffectiveAddress), but it's too difficult to keep track of the internal effective address without it. So I split out the size to a separate parameter, since every opcode only has one size parameter, and otherwise it was getting duplicated in opcodes that take two EAs, and was also awkward with the flag testing. It's a bit more typing, but I feel it's more clean this way. Overall, I'm really worried this is going to be too slow. I don't want to turn the EA stuff into templates, because that will massively bloat out compilation times and object sizes, and will also need a special DSL preprocessor since C++ doesn't have a static for loop. I can definitely optimize a lot of EA's address/read/write functions away once the core is completed, but it's never going to hold a candle to a templatized 68K core. ---- Forgot to include the SA-1 regression fix. I always remember immediately after I upload and archive the WIP. Will try to get that in next time, I guess.	2016-07-16 18:39:44 +10:00
Lior Halphon	a68b06226a	Fixed crash on free	2016-07-15 23:20:14 +03:00

... 46 47 48 49 50 ...

3018 commits