# Refactor guide
This is a step-by-step overview of how to refactor an architecture.
It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it.
Please always contact us in the [Auto-Sync tracking issue](https://github.com/capstone-engine/capstone/issues/2015)
before working on a module.
We can provide support and save you a lot of time.
Don't hesitate to ask any questions in our [Telegram Community channel](https://t.me/CapstoneEngine),
especially if you feel stuck or struggle to understand where an issue is coming from.
Although already simplified, the update process is still relatively complex.
## Refactoring
Note:
- Whenever the steps below mention C++ files, they refer to the files in the LLVM repo.
- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp`.
- Always attempt to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and ensures that Capstone behaves almost exactly like the original LLVM.
### Prepare
- Read `CONTRIBUTING.md`
- Read `docs/ARCHITECTURE.md`
- Read `suite/auto-sync/README.md`
- Read `suite/auto-sync/ARCHITECTURE.md`
- Read `suite/auto-sync/intro.md`
- Delete all files in `arch/<ARCH>/`, except `<ARCH>Module.*` and `<ARCH>Mapping.*`.
- `cd suite/auto-sync/`
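
Before deleting anything in `arch/<ARCH>/`, it can help to list what would be removed. The following is a minimal sketch, not part of Auto-Sync: the helper name `files_to_delete` is hypothetical, and it assumes the files to keep are named exactly `<ARCH>Module.*` and `<ARCH>Mapping.*`. It only reports files and deletes nothing.

```python
from pathlib import Path


def files_to_delete(arch_dir: Path, arch: str) -> list[Path]:
    """Return the files in arch_dir that are neither <arch>Module.* nor <arch>Mapping.*.

    Hypothetical helper for a dry run: it never deletes anything itself.
    """
    keep_prefixes = (f"{arch}Module.", f"{arch}Mapping.")
    return sorted(
        p for p in arch_dir.iterdir()
        if p.is_file() and not p.name.startswith(keep_prefixes)
    )
```

Calling `files_to_delete(Path("arch/LoongArch"), "LoongArch")` from the repository root would list the candidates for that architecture. Review the list before removing the files by hand.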
### Generate `inc` files
- `pip install -e .`
- Clone and build `llvm-tblgen` (see docs)
- Quickly check the options of the updater: `ASUpdater -h`
- Add the architecture name in `Target.py`
- In [llvm-capstone](https://github.com/capstone-engine/llvm-capstone) handle arch in `PrinterCapstone.cpp::decoderEmitterEmitFieldFromInstruction()` (add decoder function)
- Generate: `ASUpdater -s IncGen -a ARCH`
- Errors? Check if the error message tells you what to do. If no hint exists, ask us.
- Check if the `inc` files in `build` look good.
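
What "look good" means depends on the architecture, but one cheap signal is a generated file that ended up empty. The sketch below is an illustration only: the helper name `empty_inc_files` is hypothetical, and the exact layout below `build` is not assumed, so the directory is passed in and searched recursively.

```python
from pathlib import Path


def empty_inc_files(build_dir: Path) -> list[Path]:
    """Return all *.inc files below build_dir that contain only whitespace."""
    return sorted(
        p for p in build_dir.rglob("*.inc")
        if not p.read_text(errors="replace").strip()
    )
```

An empty result does not prove the files are correct. It only rules out one obvious failure mode before you read the files yourself.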
### Translation and Patching
- Check for template functions in `<ARCH>InstPrinter.cpp` and `<ARCH>Disassembler.cpp`
- Copy a new config in `arch_config.json` (see LoongArch for a minimal example).
- Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests!
- Add as a minimum the `<ARCH>InstPrinter.cpp`, `<ARCH>InstPrinter.h` and `<ARCH>Disassembler.cpp` to the translation list.
- Tip: The variables used in there are defined in `path_vars.json`
- Add architecture-specific includes in `Patches/Includes.py`. Copy the code from another architecture as a starting point.
- Prepare the API header (`<arch>.h`) for patching:
  - Check the generated `inc` files. File names like `<ARCH>GenCS<something>Enum.inc` contain enumerations for the header. Those get patched into the main header file of the architecture.
  - Remove old values and add `// generated content <...> begin` comments for patching. Check out `loongarch.h` as an example.
- Commit all changes so far.
- The next step writes to `arch/` and to the `include/capstone/<arch>.h` header!
- Run generation, translation and copy/patch the files: `ASUpdater -a <ARCH> -w --copy-translated -s IncGen Translate PatchArchHeader`
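
To make the role of the `// generated content <...> begin` comments more concrete, here is a minimal sketch of marker-based patching. It is not the Auto-Sync implementation. It assumes that every `begin` comment has a matching `end` comment with the same name, and the function name `patch_generated_block` as well as the file name `Example.inc` in the usage note are hypothetical.

```python
import re


def patch_generated_block(header: str, name: str, new_body: str) -> str:
    """Replace the text between the begin/end marker comments for `name`.

    Assumes markers of the form
        // generated content <NAME> begin
        // generated content <NAME> end
    where the `end` form is an assumption made for this sketch.
    """
    begin = f"// generated content <{name}> begin"
    end = f"// generated content <{name}> end"
    pattern = re.compile(
        rf"({re.escape(begin)}\n).*?({re.escape(end)})", re.DOTALL
    )
    if not pattern.search(header):
        raise ValueError(f"markers for {name!r} not found")
    body = new_body.rstrip("\n") + "\n"
    # Keep both marker lines and swap only what lies between them.
    return pattern.sub(lambda m: m.group(1) + body + m.group(2), header, count=1)
```

With a header that contains the two marker comments for `Example.inc`, `patch_generated_block(header, "Example.inc", "NEW_A,\nNEW_B,")` returns the header with only the lines between the markers replaced. This is why old values have to be removed and the markers added before running the patch step.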
### Clean up
#### Check: All necessary files
- Arch header:
  - Invalid characters in enum identifiers? Replace the character in `PrinterCapstone::normalizedMnemonic`
- In `arch/<ARCH>`:
  - Missing identifiers/symbols? Check whether they are somewhere in the generated files. If yes, include them and update `Includes.py`. If not, find the LLVM source file where they are defined and add it to `arch_config.json` so it gets translated.
  - Or they need the `SystemOperands.inc` file. It can also be generated by adding the arch to the list in `inc_gen.json`.
- Note: Once you start the next step, you likely don't want to generate, translate and copy files again, because your hand-made fixes would get overwritten. So make sure you no longer use the `-w` flag for `ASUpdater`, and that you have checked thoroughly that all necessary files got translated!
- Commit to save changes so far.
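
For the arch header check above, invalid enum identifiers can be spotted mechanically before deciding which character to replace in `PrinterCapstone::normalizedMnemonic`. The following sketch is a hypothetical helper, not part of Auto-Sync. It assumes one enumerator per line, optionally followed by `= value`, a comma, and a `//` comment.

```python
import re

# A C identifier: a letter or underscore, then letters, digits or underscores.
_VALID_C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_enum_identifiers(enum_body: str) -> list[str]:
    """Return enumerator names in enum_body that are not valid C identifiers."""
    bad = []
    for line in enum_body.splitlines():
        line = line.split("//", 1)[0].strip()  # drop line comments
        if not line:
            continue
        name = line.rstrip(",").split("=", 1)[0].strip()
        if name and not _VALID_C_IDENTIFIER.fullmatch(name):
            bad.append(name)
    return bad
```

The sketch only reports offending names. The actual fix still belongs in the C++ generator, so that the next generation run produces valid identifiers.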
#### Remove and fix C++ syntax
- Remove all **obviously irrelevant** C++ code from the translated files (e.g. class initializers).
- Double-check non-obvious cases to see whether they are important. Remember: removing something might lead to bugs later!
- If in doubt, ask us.
- If you fix the same syntax over and over again, consider adding a Patch for the `CppTranslator`.
- Common problems:
  - Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See `namespace begin/end` comments in the code.
  - TODO: Add more.
- If in doubt, check the original C++ file in the LLVM repo.
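
The missing namespace prefix is a mechanical fix, so it is a good candidate for scripting while you decide whether a proper `CppTranslator` patch is worth it. The sketch below is a standalone illustration, not a `CppTranslator` patch. The function name `add_namespace_prefix` is hypothetical, and the list of names to prefix has to be supplied by hand.

```python
import re


def add_namespace_prefix(source: str, arch: str, names: list[str]) -> str:
    """Prefix each whole-word occurrence of the given names with `<arch>_`.

    Names that already carry the prefix are left alone, because `_` counts
    as a word character and so no word boundary precedes the bare name.
    """
    if not names:
        return source
    # Try longer names first so a shorter name never shadows a longer one.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(n) for n in ordered)
    pattern = re.compile(rf"\b({alternatives})\b")
    return pattern.sub(lambda m: f"{arch}_{m.group(1)}", source)
```

For example, `add_namespace_prefix("unsigned GR32Regs[] = {0};", "ARCH", ["GR32Regs"])` yields `unsigned ARCH_GR32Regs[] = {0};`. A plain text substitution like this also touches comments and strings, so review the diff afterwards.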
### Make it build
- Add `ARCHLinkage.h` and the corresponding functions in `ARCHInstPrinter.c` and `ARCHDisassembler.c`.
- Add essential code in `ARCHMapping.c`. Essential is everything **not** related to details.
- If you are unsure how to connect Capstone and LLVM code, always check LoongArch first. If LoongArch doesn't handle the case, check Mips or SystemZ.
### Run tests & fix bugs
- Update regression MC tests: Map LLVM `mattr` and `mcpu` names to the CS identifiers if necessary by editing the `mcupdater.json` config file.
- Update tests: `ASUpdater -s MCUpdate -a Arch -w`
- Run MC tests: `cstest tests/MC/Arch`
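
The `ASUpdater` invocations used in this guide differ only in their steps and in whether they write files. The sketch below collects them in one place. It is a hypothetical wrapper that only builds and prints the command lines, using just the flags that appear in this guide (`-a`, `-s`, `-w`, `--copy-translated`). It does not run the tool, and `ARCH` is a placeholder for the real architecture name.

```python
import shlex


def asupdater_command(arch: str, steps: list[str], write: bool = False,
                      copy_translated: bool = False) -> list[str]:
    """Assemble an ASUpdater argument list from the flags used in this guide."""
    cmd = ["ASUpdater", "-a", arch]
    if write:
        cmd.append("-w")
    if copy_translated:
        cmd.append("--copy-translated")
    cmd += ["-s", *steps]
    return cmd


# Generate the inc files only.
print(shlex.join(asupdater_command("ARCH", ["IncGen"])))
# Generate, translate, and copy/patch the files (writes to the source tree).
print(shlex.join(asupdater_command(
    "ARCH", ["IncGen", "Translate", "PatchArchHeader"],
    write=True, copy_translated=True)))
# Update the MC regression tests.
print(shlex.join(asupdater_command("ARCH", ["MCUpdate"], write=True)))
```

Keeping `write` off by default mirrors the warning in the clean-up section: once you have fixed translated files by hand, a run with `-w` would overwrite them.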
### Add details
- Effectively copy behavior from `LoongArchMapping.c` or `SystemZMapping.c` but change values.
- Changes to the API (structs in `<arch>.h`) are only allowed if the existing definition was wrong. Otherwise, only extensions are allowed.
- Don't forget to update the Python bindings.
- Run detail tests to check results.
- Run detail tests with coverage. Coverage of `ArchMapping.c` should be close to 100%.