# Getting Started

Anybody is welcome to contribute to the decompilation effort! There are two main roles a contributor can fulfill:

- *Delinking*, which entails analyzing the JSRF executable in-situ to figure out how to break it up into small chunks of code and data, and
- *Decompiling*, which is writing C++ code that compiles down to the same code and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a large group of volunteers, so we'll begin there. Those who want to participate in the delinking effort can follow the decompilation guide and then continue on to the delinking guide afterwards.

## Setting Up Decompilation

You'll need a few things to get a decompilation workflow ready:

- The JSRF executable (`default.xbe` in the root directory of the game disc) to provide the target compiled code to match
- The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile your C++ code
  - You'll also want to add its `Bin/` directory to your `PATH` so that objdiff can find it
- The [Git](https://git-scm.com/) version control tool to clone and work on this repository
- The [Ghidra](https://github.com/NationalSecurityAgency/ghidra) reverse engineering tool to analyze and browse the executable
- The [XBE extension](https://github.com/XboxDev/ghidra-xbe) for Ghidra to import and analyze the JSRF executable
- The [delinker extension](https://github.com/boricj/ghidra-delinker-extension) for Ghidra to export object files from the executable
- The [objdiff](https://github.com/encounter/objdiff) code diffing tool to compare your C++ code's compiled output to the delinked object files

Keep in mind that Ghidra and its extensions need to have their versions coordinated. The safest thing to do is to get the same version of each, e.g. 11.4.
The general flow for installing extensions is to download a release `.zip` for the extension from the linked repository's releases page, open Ghidra, open the `File > Install Extensions` menu, click the green plus at the top right of the extensions window, and then select the `.zip` you just downloaded. Make sure the box to the left of the extension's name is checked to enable it before clicking "OK" to close the extensions window.

With all these tools acquired, the last thing to get is this repository. Clone it with `git` in the usual fashion:

```
git clone https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
```

The following sections detail how to use all these tools to start writing decompiled code.

### Creating a JSRF Ghidra Project

Even if you have no intention of analyzing the executable in Ghidra otherwise, Ghidra is needed to produce the object files that objdiff will compare your recompiled code against. This section will only cover the steps needed to get to that point.

Open Ghidra and create a new project (`File > New Project...`). Select the "Non-Shared Project" option, and set whatever location and name you'd like.

With the project created, open the file import dialogue (`File > Import File...`) and select the `default.xbe` from JSRF. Ensure that the format in the next window is set to "Xbox Executable Format (XBE)" (if this isn't an option, you need to install/enable the XBE extension), and that the name is "default.xbe" (our tooling depends on it having this specific name). Click "OK," and you should see a window with a successful import results summary after a moment (you'll probably see the message `[xboxkrnl.exe] -> not found in project`, but this is fine and expected).

`default.xbe` should now be visible in the file listing for the project. Double-click it to open it in the CodeBrowser. The window that opens is where you'll do all your in-situ analysis, should you choose to do so. You'll be asked whether you want to run analyzers; say yes.
Afterwards, simply clicking "Analyze" in the analysis options window without changing anything is fine, and the analysis will probably take a couple of minutes. You can tell that the analysis is still running if there's a progress bar in the bottom right saying what it's currently analyzing.

There's a small oddity that needs fixing: certain parts of memory are marked as executable where objdiff doesn't expect them to be, which will mess up our diffs. To correct this, open the memory map (`Window > Memory Map`) and uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

Now we'll import data types from the decompilation. Open a Unix-style shell (e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the repository and run `make_header.sh`, which will produce a `jsrf.h` in the same directory with the combined contents of every header in a format suitable for Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window for importing C headers. Remove everything from the "Source files to parse" and "Parse options" boxes, and add `jsrf.h` to the former (click the green + symbol on the right and select the `jsrf.h` file). Click the "..." on the "Program Architecture:" box and select the row with the values "x86," "default," "32," "little," and "Visual Studio." Finally, click the "Parse to Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the header is completely self-contained and doesn't need any information from any other data type archives). You should then see a window reporting a successful import, and you'll be able to find `jsrf.h` with all of its definitions under `default.xbe` in the Data Type Manager window in the bottom left.

Much of our work with Ghidra will make use of some custom scripts we've written, so we'll have to tell it where to find them.
Open up the Script Manager (`Window > Script Manager`) and then open the Bundle Manager by clicking the "manage script directories" button (it looks sort of like a bulleted list). Click the green + in the top right to add a new directory and select the `ghidra/ghidra_scripts` directory in this repository.

The first script we'll want to run is the symbol importer, which brings known data and functions into your Ghidra project. In the Script Manager window, select the "Import" category in the left pane and double-click the `EnhancedImport.java` script in the right pane to run it. You'll then be asked for an input file; select `ghidra/symboltable.tsv` from this repository. Afterwards, you'll see a bunch of "Importing ..." messages in the console of the main CodeBrowser window, and a bunch of new functions and labels will be defined. Some of these messages may have "can't find data type X" appended if something's marked with a type that hasn't made its way into our decompiled code yet.

While we imported a bunch of data types earlier, Ghidra's C parser leaves out some important information that we'll have to fill in with another script. In the Script Manager, run `ClassFixup.java` from the "Data Types" category, and you should see some "Converting X to class" and "Fixing calling convention of X" messages in the console.

Now you've got a Ghidra project containing everything we know about JSRF's code! Make sure you save your Ghidra project now that everything's set up.

### Producing Object Files

Close all of your Ghidra windows and open a Unix-style shell in the decompilation repository's `ghidra/` directory. The `delink.sh` script is our automated tool for extracting all the object files that have been identified so far.
The easiest way to run it is to invoke it with three arguments:

- The path to your Ghidra installation (the directory with files like `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and `Extensions/`)
- The path to your JSRF Ghidra project (the directory with a `.gpr` file and a directory with a name ending in `.rep`)
- The name of your JSRF Ghidra project

If you're on Windows, the paths you provide should be Windows filepaths, not Unix-style paths. Make sure the paths are surrounded by quotes, too (e.g. `'C:\path\to\whatever'`), or else the shell will treat the backslashes as escape characters. If you find typing out these arguments to be too much of a pain, you can also define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and `$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.

There are a couple of errors you might get here:

- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found` and `Invalid script`: This means that you haven't added the repository's `ghidra_scripts` directory to the script search path as described in the previous section (particularly if it mentions `MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed (particularly if it mentions `DelinkProgram.java`), or you've somehow invoked the script in a way that can't see the scripts (e.g. installing Ghidra on Windows and then invoking the script from WSL).
- `java.lang.RuntimeException: Failed to export ...`: This means that the delinker extension doesn't like something about what it was told to delink. One known cause is duplicate symbol names. If you haven't modified `objects.csv` or `symboltable.tsv`, let other people on the project know so that they can look into fixing it.

If all goes well, the extracted object files will be in the `decompile/target/` directory of the repository.
Now we're ready to start recompiling and diffing code with objdiff.

### Setting Up objdiff

Open the objdiff GUI program (by default named something like `objdiff-os-arch`, e.g. `objdiff-windows-x86_64.exe`). Click "Settings" in the left sidebar and then "Select" next to "Project directory" in the popup window. In the file picker, select the `decompile/` directory in the JSRF decompilation repository. The sidebar will now have a listing of all the extracted object files. Click on one, and you should see two panes: one on the left labelled "Target object" that lists the contents of the extracted object file, and one on the right listing the contents of the recompiled object file. If the right pane displays an error like "program not found," the Visual C++ 7.0 compiler's `Bin/` directory probably isn't on your `PATH`.

One important setting, to make sure you get the correct match percentages: set `Diff Options > Function relocation diffs` to "None." Otherwise, some references to non-local variables will be marked as nonmatching. This is because it's sometimes not possible to make certain things named variables in Ghidra (particularly thread-local storage), and other times it's not possible to assign a fixed name to certain implicitly generated output in the recompiled code.

### Using objdiff

The basic idea of objdiff is to match up the contents of an object file compiled from our own decompiled code with the contents of an object file extracted from the game. To that end, functions have to be matched up between them. In the best case, corresponding functions in each file will have the same name and be in the same section, at which point objdiff can link them automatically. Otherwise, you have to click on the function in one pane and then on its counterpart in the other pane to tell objdiff to link them.
The most common cases of this are implicitly generated functions and data, such as exception handling code placed in `.text$x` in the recompiled object file. Be aware that objdiff's matching does not appear fully reliable in some cases, particularly when diffing data containing external pointers (shown as `?? ?? ?? ??`), which aren't explicitly marked as nonmatching but still somehow reduce the match percentage. You'll have to use a tiny amount of judgement to determine when you actually have a match.

Clicking on a function that's been linked across both object files shows a diff of the disassembly of both versions of the function, with any differences highlighted. The task at hand is to modify the function in the corresponding source file (in the `decompile/src/` directory) such that the match percentage reaches 100%. Depending on how you configure objdiff, it can rebuild automatically whenever you save a change to a source file; otherwise, you can rebuild manually with the "Build" button at the top of the right pane.

When viewing and editing decompiled source files, be mindful of the `// Status:` annotation above each function, which has the following meanings:

- `unimplemented`: The decompiled function does not yet reproduce the behaviour of the original
- `nonmatching`: The decompiled function is believed to behave the same as the original, but it does not fully match in objdiff
- `matching`: The decompiled function perfectly matches the original in objdiff

Be sure to update the annotation as your decompilation progresses. Some functions may also have other annotations describing nontrivial effects of link-time code generation (LTCG), such as a nonstandard calling convention or multiple functions being merged into one.

Beyond these annotations, there are no concrete instructions to give for writing decompiled code.
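To make the annotation convention concrete, here is a purely hypothetical example: the `Counter` struct, its members, and the Ghidra output quoted in the comment are all invented for this sketch and do not come from JSRF. It shows the kind of cleanup typically involved in turning Ghidra's decompiler output into ordinary C++, with the `// Status:` line sitting directly above the function.

```c++
// Hypothetical struct for illustration only; it is not part of JSRF.
struct Counter {
	int count;
	int limit;

	void add(int amount);
};

// Ghidra's decompiler might show the original function roughly as:
//
//   void __thiscall Counter::add(Counter *this, int amount)
//   {
//     int iVar1;
//     iVar1 = this->count + amount;
//     if (this->limit < iVar1) {
//       iVar1 = this->limit;
//     }
//     this->count = iVar1;
//     return;
//   }
//
// The version below expresses the same logic as ordinary C++. It stays
// "nonmatching" until objdiff reports a 100% match.

// Status: nonmatching
void Counter::add(int amount) {
	int total = count + amount;
	if (total > limit) {
		total = limit;
	}
	count = total;
}
```

Once objdiff shows a 100% match for a function like this, its annotation would change from `nonmatching` to `matching`.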
Try importing headers from `decompile/src/` into Ghidra (`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's decompilation of the function in the CodeBrowser as a starting point for writing your matching function, exercising whatever C++ and x86 assembly knowledge you have. If you have basic decompilation experience but are new to decompiling C++ specifically, you might want to take a look at the [Decompiling C++](decompilingcpp.md) article.

Whenever you have some decompiled code that you'd like to contribute to the repository, commit it to your local copy of the repository and create a merge request to merge it back into the online copy.

## Contributing to Delinking

Getting the JSRF binary delinked is just as important as decompiling the resulting object files, but takes a bit more investment. The concrete task of a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in the `ghidra/` directory, which together enable consistent delinking of object files. The former lists symbols at different addresses throughout the whole executable, while the latter lists the address ranges that have been identified as separable objects. Both of these things are figured out by combing over the whole executable in Ghidra.

### Updating `symboltable.tsv`

If you've got a bunch of symbols you'd like to add to `symboltable.tsv`, you can generate a new copy from your Ghidra project by running the `EnhancedExport.java` script from the "Export" category. If you want to merge the new table into the repository, make sure to take a look at the diff first to ensure you're not inadvertently deleting anything.

### Updating `make_header.sh`

If you've added any header files, you'll want to add them to the `HEADERS` variable in `ghidra/make_header.sh`. Make sure that any other header files they depend on are earlier in the list, as this script combines everything into one file without any `#include` directives.
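As a hypothetical illustration of the ordering rule (the `vec3.hpp` and `player.hpp` headers and their types are invented for this sketch), consider two headers as they would appear after being concatenated in `HEADERS` order. Because `Player` contains a `Vec3` by value, the full `Vec3` definition must come earlier in the combined file, so `vec3.hpp` has to be listed before `player.hpp`.

```c++
// Hypothetical headers, shown in the order they would be listed in HEADERS.
// make_header.sh concatenates them without any #include directives, so a
// type has to be defined before anything later in the list that depends on it.

// Contents of vec3.hpp
struct Vec3 {
	float x;
	float y;
	float z;
};

// Contents of player.hpp (depends on vec3.hpp)
// Player holds a Vec3 by value, which requires the complete Vec3 definition
// to appear earlier; listing player.hpp first would break the combined header.
struct Player {
	Vec3 position;
	unsigned health;
};
```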
Make sure the script runs successfully and Ghidra is able to import the resulting `jsrf.h`.

Keep in mind that `make_header.sh` uses a fairly rudimentary `awk` script to convert C++ headers to C, which places some gentle constraints on how declarations need to be written. In general, it's enough to just keep things simple and not do anything unusual (keep data type and variable declarations separate, don't use macros for declarations, etc.), but the one big catch is that the body of a data type definition must not be on the same line as the opening or closing braces. That is, do not write

```c++
struct X { unsigned x; };
```

but rather

```c++
struct X {
	unsigned x;
};
```

### Updating `objects.csv`

`objects.csv` is a listing of addresses for each object file or group of object files that we've identified. Each column after the first two corresponds to a section of the executable, with filled cells indicating an address range occupied by that object file, empty cells indicating that the object occupies none of that section, and a `?` indicating an unknown address range or boundary. The `Object` column gives the path under `decompile/target/` to extract the object file to if the `Delink?` column is `true`; otherwise, it's just a human-readable label for that row. `delink.sh` parses this file and uses any rows marked for delinking to produce object files.

A couple of criteria should be fulfilled before marking a row in `objects.csv` for extraction. First, of course, the whole row should be filled with an object path and with address ranges that we're certain of. Make sure that not just the `.text` section, but also `.text$x` (exception-handling code), `.data`, `.rdata`, and `.rdata$x` (data pointing to exception-handling code) are included in the object file if applicable! Address ranges also should not include any padding before or after data or code.
Second, all of the symbols within those address ranges need to be present in `symboltable.tsv`; otherwise, delinking after importing only those symbols won't arrange the object file's internals correctly (exception-handling code might be appended onto another function, for example). Because `symboltable.tsv` should only be populated with symbols that have been manually defined as per the previous section, this means that you need to define variable names and labels in Ghidra for everything therein (and ideally everything referenced externally, as well).

Strive to maintain basic consistency with the rest of the codebase: functions and methods begin with lowercase letters, for instance, while class/struct/enum names begin with capital letters, and special methods should have the names they would have in real C++ code (i.e. `Class::Class` for constructors and `Class::~Class` for destructors). These special members, along with vtables, must follow their established naming conventions for our tooling to work properly. Also note that you can (mostly) disable C++ name mangling for a symbol by placing it in the `extern_"C"` namespace, which applies the C-style name mangling that some symbols use instead.

Once an object is ready for extraction, set its `Delink?` column to `true` and update the `objdiff.json` file in the `decompile/` directory to include it (give it an entry in the `units` list, modelled after other existing entries minus the `complete` and `symbol_mappings` fields). Then add a `.cpp` file (and an `.hpp` file if suitable) for it in the `decompile/src/` directory. Make sure that any relevant data structures you've figured out are included in the new source files, then give extraction via `delink.sh` a test.
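For a rough idea of what a freshly added source file might look like before anything has been decompiled, here is a hypothetical skeleton. The `Spinner` class, its members, and `spinnerChecksum` are invented names, and the use of empty stub bodies is an assumption; check existing files in `decompile/src/` for the project's actual layout. It follows the naming conventions above: a capitalized class name, lowercase method names, a constructor and destructor named as in real C++, and an `extern "C"` function that gets a C-style symbol name rather than a mangled C++ one.

```c++
// Hypothetical skeleton of a newly added source file. Spinner and
// spinnerChecksum are invented names used only to illustrate the conventions.

// Class names start with a capital letter.
class Spinner {
public:
	Spinner();      // constructor, named Spinner::Spinner
	~Spinner();     // destructor, named Spinner::~Spinner
	void update();  // methods start with a lowercase letter

	unsigned angle;
};

// Status: unimplemented
Spinner::Spinner() {
}

// Status: unimplemented
Spinner::~Spinner() {
}

// Status: unimplemented
void Spinner::update() {
}

// extern "C" gives this function a C-style symbol name instead of a
// mangled C++ one.
// Status: unimplemented
extern "C" int spinnerChecksum(const unsigned char *, unsigned) {
	return 0;  // placeholder until the function is decompiled
}
```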
Add a new prerequisite to `all:` at the top of the `Makefile` in the `decompile/` directory, and add an entry at the bottom to record which header files need to be up to date to build the new object file (including anything included transitively!). Finally, make sure that the new object file builds in objdiff, even if its functions haven't actually been implemented yet.

When you have it all sorted out, make a merge request to share your work with us!