# Refactor guide

This is a step-by-step overview of how to refactor an architecture. It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it.

Please always contact us in the [Auto-Sync tracking issue](https://github.com/capstone-engine/capstone/issues/2015) before working on a module. We can provide support and save you a lot of time.

Don't hesitate to ask questions in our [Telegram Community channel](https://t.me/CapstoneEngine), especially if you feel stuck or struggle to understand where an issue is coming from.

Although already simplified, the update process is still relatively complex.

## Refactoring

Note:

- If we talk about C++ files in the steps below, we always refer to the files in the LLVM repo.
- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp`.
- Always attempt to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and ensures that Capstone behaves almost exactly the same as the original LLVM.

### Prepare

- Read `CONTRIBUTING.md`
- Read `docs/ARCHITECTURE.md`
- Read `suite/auto-sync/README.md`
- Read `suite/auto-sync/ARCHITECTURE.md`
- Read `suite/auto-sync/intro.md`
- Delete all files in `arch//`, except the `ARCHModule.*` and `ARCHMapping.*`.
- `cd suite/auto-sync/`

### Generate `inc` files

- `pip install -e .`
- Clone and build `llvm-tblgen` (see docs)
- Quickly check the options of the updater: `ASUpdater -h`
- Add the arch name in `Target.py`
- In [llvm-capstone](https://github.com/capstone-engine/llvm-capstone), handle the arch in `PrinterCapstone.cpp::decoderEmitterEmitFieldFromInstruction()` (add decoder function)
- Generate: `ASUpdater -s IncGen -a ARCH`
  - Errors? Check if the error message tells you what to do. If no hint exists, ask us.
- Check if the `inc` files in `build` look good.

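
The last check in "Generate `inc` files" is manual. As a first pass before reading the files by hand, a small script can flag output that is obviously wrong. The following is only a sketch: the helper name `check_inc_files` is hypothetical, and it assumes the generated files end in `.inc` and start with the architecture name. Adjust the pattern if your output is named differently.

```python
from pathlib import Path


def check_inc_files(build_dir, arch):
    """Return a list of human-readable problems found in generated .inc files.

    Assumption (not taken from the auto-sync docs): generated files are
    named "<arch>*.inc" and live somewhere below `build_dir`.
    """
    root = Path(build_dir)
    problems = []

    # Search recursively, so the exact sub-directory layout does not matter.
    inc_files = sorted(root.rglob(f"{arch}*.inc"))
    if not inc_files:
        problems.append(f"no {arch}*.inc files under {root}")

    for path in inc_files:
        # A file that contains only whitespace was most likely not generated.
        if not path.read_text(errors="replace").strip():
            problems.append(f"{path.name} is empty")

    return problems
```

Calling `check_inc_files("build", "ARCH")` returns an empty list when every matching file has content. It does not judge whether the content is correct, so the files still need to be looked at.
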
### Translation and Patching

- Check for template functions in `InstPrinter.cpp` and `Disassembler.cpp`
- Copy a new config entry into `arch_config.json` (see LoongArch for a minimal example).
  - Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests!
  - Add at least `InstPrinter.cpp`, `InstPrinter.h` and `Disassembler.cpp` to the translation list.
  - Tip: The variables used in there are defined in `path_vars.json`
- Add architecture specific includes in `Patches/Includes.py`. Copy the code from another architecture as a starting point.
- Prepare the API header (`.h`) for patching:
  - Check the generated `inc` files. File names like `GenCSEnum.inc` contain enumerations for the header. Those get patched into the main header file of the architecture.
  - Remove old values and add `// generated content <...> begin` comments for patching. Check out `loongarch.h` as an example.
- Commit all changes so far.
  - The next step will write to the `arch/` and `include/capstone/.h` header!
- Run generation, translation and copy/patch the files: `ASUpdater -a -w --copy-translated -s IncGen Translate PatchArchHeader`

### Clean up

#### Check: All necessary files

- Arch header:
  - Invalid characters in enum identifiers? Replace the characters in `PrinterCapstone::normalizedMnemonic`
- In `arch/`
  - Missing identifiers/symbols? -> Check if they are somewhere in the generated files. If yes, include them and update `Includes.py`. If not, you have to find the LLVM source file where they are defined and add it to `arch_config.json` to translate it.
  - Or it needs the `SystemOperands.inc` file, which can be generated by adding the arch to the list in `inc_gen.json`.
- Note: When you start the next step, you likely don't want to generate, translate and copy the files again, because your hand-made fixes would get overwritten. So make sure you no longer use the `-w` flag for the `ASUpdater` and that you have checked thoroughly that all necessary files got translated!
- Commit to save changes so far.

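
The "Invalid characters in enum identifiers?" check above can be partly automated before committing. The sketch below takes the body of an enum as text and returns every enumerator name that is not a valid C identifier. The function name `invalid_enum_identifiers` and the sample names in the usage note are hypothetical. It assumes one enumerator per line and only `//` comments, which may not hold for every header.

```python
import re

# A valid C identifier: a letter or underscore, then letters, digits, underscores.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_enum_identifiers(enum_body):
    """Return enumerator names in `enum_body` that are not valid C identifiers.

    Assumptions: one enumerator per line, optional "= value", optional
    trailing comma, and only "//" line comments.
    """
    bad = []
    for raw_line in enum_body.splitlines():
        # Drop a trailing line comment and surrounding whitespace.
        line = raw_line.split("//", 1)[0].strip()
        if not line:
            continue
        # Keep only the name: strip the trailing comma and any "= value".
        name = line.rstrip(",").split("=", 1)[0].strip()
        if name and not C_IDENTIFIER.fullmatch(name):
            bad.append(name)
    return bad
```

For example, a body containing `ARCH_INS_ADD,` and `ARCH_INS_ADD.W,` reports only `ARCH_INS_ADD.W`. The fix itself still belongs in `PrinterCapstone::normalizedMnemonic`, as described above, not in the generated header.
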
#### Remove and fix C++ syntax

- Remove all **obviously irrelevant** C++ code from the translated files (e.g. class initializers)
  - Double check non-obvious cases to see whether they are important. Remember: removing something might lead to bugs later!
  - If in doubt, ask us.
- If you fix the same syntax over and over again, consider adding a Patch for the `CppTranslator`.
- Common problems:
  - Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See the `namespace begin/end` comments in the code.
  - TODO: Add more.
- If in doubt, check the original C++ file in the LLVM repo.

### Make it build

- Add `ARCHLinkage.h` and the functions in `InstPrinter.c` and `ArchDisassembler.c`.
- Add essential code in `ARCHMapping.c`. Essential is everything **not** related to details.
- If you are unsure how to connect Capstone and LLVM code, always check LoongArch. If LoongArch doesn't handle the case, check Mips or SystemZ.

### Run tests & Fixing bugs

- Update regression MC tests: Map LLVM `mattr` and `mcpu` names to the CS identifiers if necessary. -> Edit the `mcupdater.json` config file.
- Update tests: `ASUpdater -s MCUpdate -a Arch -w`
- Run MC tests: `cstest tests/MC/Arch`

### Add details

- Effectively copy the behavior from `LoongArchMapping.c` or `SystemZMapping.c`, but change the values.
- Changes to the API (structs in `arch.h`) are only allowed if it was wrong before. Otherwise only extensions are allowed.
- Don't forget to update the Python bindings.
- Run detail tests to check results.
- Run detail tests with coverage. Coverage of `ArchMapping.c` should be near 100%.
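
For the "Remove and fix C++ syntax" step, a rough scanner can point at lines in a translated file that still look like C++. This is a heuristic sketch, not part of the auto-sync tooling: the pattern list and the name `find_cpp_remnants` are assumptions chosen for illustration, and the scanner will miss cases and may flag valid C (for example `::` inside a string literal). It ignores `//` comments, so the `namespace begin/end` comments in the translated code are not reported.

```python
import re

# Heuristic patterns for C++ constructs that are not valid C.
# This list is illustrative, not exhaustive.
CPP_REMNANTS = {
    "scope operator": re.compile(r"\w::\w"),
    "template": re.compile(r"\btemplate\s*<"),
    "C++ cast": re.compile(r"\b(?:static|dynamic|reinterpret|const)_cast\s*<"),
    "namespace": re.compile(r"^\s*(?:using\s+)?namespace\b"),
    "class keyword": re.compile(r"^\s*class\s+\w+"),
}


def find_cpp_remnants(source):
    """Return (line_number, label) pairs for lines that look like leftover C++."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Ignore everything after a line comment.
        code = line.split("//", 1)[0]
        for label, pattern in CPP_REMNANTS.items():
            if pattern.search(code):
                hits.append((lineno, label))
    return hits
```

If the same kind of hit shows up in many files, that is a sign that a Patch for the `CppTranslator` is the better fix, as noted in that step.
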