Getting Started

Anybody is welcome to contribute to the decompilation effort! There are two main roles a contributor can fulfill:

  • Delinking, which entails analyzing the JSRF executable in-situ to figure out how to break it up into small chunks of code and data, and
  • Decompiling, which is writing C++ code that compiles down to the same code and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a large group of volunteers, so we'll begin there. Those who want to participate in the delinking effort can follow the decompilation guide and then continue on to the delinking guide afterwards.

Setting Up Decompilation

You'll need a few things to get a decompilation workflow ready:

  • The JSRF executable (default.xbe in the root directory of the game disc) to provide the target compiled code to match
  • The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile your C++ code
    • You'll also want to add its Bin/ directory to your PATH so that objdiff can find it
  • The Git version control tool to clone and work on this repository
  • The Ghidra reverse engineering tool to analyze and browse the executable
  • The XBE extension for Ghidra to import and analyze the JSRF executable
  • The delinker extension for Ghidra to export object files from the executable
  • The objdiff code diffing tool to compare your C++ code's compiled output to the delinked object files

Keep in mind that Ghidra and its extensions need to have their versions coordinated. The safest thing to do is to get the same version of each, e.g. 11.4. The general flow for installing extensions is to download a release .zip for the extension from the linked repository's releases page, open Ghidra, open the File > Install Extensions menu, click the green plus at the top right of the extensions window, and then select the .zip you just downloaded. Make sure the box to the left of the extension's name is checked to enable it before clicking "OK" to close the extensions window.

With all these tools acquired, the last thing to get is this repository. Clone it with git in the usual fashion:

git clone https://codeberg.org/KeybadeBlox/JSRF-Decomp-Notes.git

The following sections detail how to use all these tools to start writing decompiled code.

Creating a JSRF Ghidra Project

Even if you have no intention of analyzing the executable in Ghidra otherwise, Ghidra is needed to produce the object files that objdiff will compare your recompiled code against. This section will only cover the steps needed to get to that point.

Open Ghidra and create a new project (File > New Project...). Select the "Non-Shared Project" option, and set whatever location and name you'd like. With the project created, open the file import dialogue (File > Import File...) and select the default.xbe from JSRF. Ensure that the format in the next window is set to "Xbox Executable Format (XBE)" (if this isn't an option, you need to install/enable the XBE extension), and that the name is "default.xbe" (our tooling depends on it having this specific name). Click "OK," and you should see a window with a successful import results summary after a moment (you'll probably see the message [xboxkrnl.exe] -> not found in project, but this is fine and expected).

default.xbe should now be visible in the file listing for the project. Double click it to open it in the CodeBrowser. The window that opens is where you'll do all your in-situ analysis, should you choose to do so. You'll be asked whether you want to run analyzers; say yes. Afterwards, simply clicking "Analyze" in the analysis options window without changing anything is fine, and the analysis will probably take a couple minutes.

Now we'll import symbols from the JSRF decompilation repository. After running the analysis, open the script manager (Window > Script Manager) and select the "Data" folder in the left pane. Double click the script titled ImportSymbolsScript.py, and a file picker will open after a moment. Select symboltable.tsv from the delink/ directory of your cloned JSRF decompilation repository, and you should see a series of Created function... and Created label... messages in the scripting console window. Save your changes (save icon in the top left of the CodeBrowser window), and your Ghidra project will be ready for creating object files for objdiff.

Producing Object Files

Close all of your Ghidra windows and open a shell in the decompilation repository's delink/ directory. The delink.sh script is our automated tool for extracting all the object files that have been identified so far. Invoke it with three arguments:

  • The path to your Ghidra installation (the directory with files like ghidraRun and ghidraRun.bat, and directories like docs/ and Extensions/)
  • The path to your JSRF Ghidra project (the directory with a .gpr file and a directory with a name ending in .rep)
  • The name of your JSRF Ghidra project

There are two common errors you might get here:

  • Unable to lock project!: This means that Ghidra isn't fully closed. Make sure you've completely closed every Ghidra window before running delink.sh.
  • Script not found: DelinkProgram.java and Invalid script: DelinkProgram.java: This means that the Ghidra delinker extension isn't properly installed. Ensure it's installed and enabled first.

If all goes well, you'll see the message Delinking complete! at the end of the script's output, and the extracted object files will be in the decompile/target/ directory of the repository. Now we're ready to start recompiling and diffing code with objdiff.

Setting Up objdiff

Open the objdiff GUI program (by default named something like objdiff-os-arch, e.g. objdiff-windows-x86_64.exe). Click "Settings" in the left sidebar and then "Select" next to "Project directory" in the popup window. In the file picker, select the decompile/ directory in the JSRF decompilation repository.

The sidebar will now have a listing of all the extracted object files. Click on one, and you should see two panes: one on the left labelled "Target object" that lists the contents of the extracted object file, and one on the right listing the contents of the recompiled object file. If the right pane displays an error like "program not found," the Visual C++ 7.0 compiler probably wasn't correctly set up on your PATH.

To get correct match percentages, set Diff Options > Function relocation diffs to "None." Otherwise, nearly all references to functions and non-local variables will be marked as nonmatching (the delinking process doesn't apply name mangling, and this isn't expected to be fixed).

Using objdiff

The basic idea of objdiff is to match up the contents of an object file compiled from our own decompiled code to the contents of an object file extracted from the game. To that end, functions have to be matched up between them. In the best case, corresponding functions in each file will have the same name and be in the same section, at which point objdiff can link them automatically. Otherwise, one has to click on one of the corresponding functions in one pane and the other function in the other pane to tell objdiff to link them. Common cases of this are class methods (the names won't match) and implicitly generated functions, such as exception handling code placed in .text$x in the recompiled object file.
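A likely reason class method names fail to line up is name decoration: the delinking process doesn't apply name mangling, while the compiler always decorates C++ member functions. The minimal sketch below uses a hypothetical class (Player and update are invented for illustration, not taken from JSRF). The comment shows the decorated symbol that 32-bit Visual C++ is expected to emit under its usual decoration scheme; treat it as illustrative and check the actual name shown in objdiff's right pane.

```cpp
// Hypothetical class for illustration only; not taken from JSRF.
struct Player {
    float speed;

    // Expected decorated symbol in the recompiled object file,
    // assuming the usual 32-bit Visual C++ scheme:
    //   ?update@Player@@QAEXM@Z
    // (public member, __thiscall, returns void, takes one float)
    void update(float dt);
};

// The source-level name is Player::update, but the object file
// only ever contains the decorated form shown above.
void Player::update(float dt) {
    speed += dt;
}
```

If the target pane lists the same method under a different name, click the two entries to link them by hand as described above.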

Clicking on a function that's been linked across both object files shows a diff of the disassembly of both versions of the function, with any differences highlighted. The task at hand is to modify the function in the corresponding source file such that the match percentage reaches 100%. Depending on how you configure objdiff, it will rebuild automatically whenever you save a change to a source file, or you can manually rebuild with the "Build" button at the top of the right pane.

There are no hard-and-fast rules for writing decompiled code. Use Ghidra's decompilation of the function in the CodeBrowser as a starting point, and apply whatever C++ and x86 assembly knowledge you have. Exception handling code in particular can appear in unexpected places (around new expressions, in constructors) and has unambiguous but nonobvious signs in the disassembly, so it might be worth reading up on how it's implemented to learn to recognize it in disassembly and recreate it in C++ code.
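As an illustration of where exception handling code can hide, here is a minimal sketch of source that contains no try or catch yet may still require compiler-generated cleanup. Texture, Sprite, and createSprite are hypothetical names invented for this example, not taken from JSRF.

```cpp
// Hypothetical types for illustration only; not taken from JSRF.
struct Texture {
    int id;
    Texture() : id(0) {}
    ~Texture() { id = -1; }  // non-trivial destructor
};

struct Sprite {
    Texture tex;  // must be destroyed if a later step of construction throws
    int frame;

    // In a constructor: once tex is built, the compiler has to be able
    // to run ~Texture() if anything after it throws.
    Sprite(int f) : tex(), frame(f) {}
};

Sprite* createSprite(int frame) {
    // Around new: if Sprite's constructor throws, the memory obtained
    // from operator new has to be released before the exception
    // propagates, so the compiler may emit hidden cleanup code here.
    return new Sprite(frame);
}
```

In 32-bit Visual C++ output, a function carrying this kind of cleanup typically begins by pushing -1 and a handler address and then reading fs:[0] to register an exception frame. Whether the compiler emits it for a given function depends on the compiler flags and on what it can prove about the constructors involved, so expect to experiment.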

Contributing to Delinking