
# Getting Started
Anybody is welcome to contribute to the decompilation effort!  There are two
main roles a contributor can fulfill:

- *Delinking*, which entails analyzing the JSRF executable in situ to figure
  out how to break it up into small chunks of code and data, and
- *Decompiling*, which is writing C++ code that compiles down to the same code
  and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a
large group of volunteers, so we'll begin there.  Those who want to participate
in the delinking effort can follow the decompilation guide and then continue on
to the delinking guide afterwards.


## Setting Up Decompilation
You'll need a few things to get a decompilation workflow ready:

- The JSRF executable (`default.xbe` in the root directory of the game disc) to
  provide the target compiled code to match
- The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile
  your C++ code
  - You'll also want to add its `Bin/` directory to your `PATH` so that objdiff
    can find it
- The [Git](https://git-scm.com/) version control tool to clone and work on
  this repository
- The [Ghidra](https://github.com/NationalSecurityAgency/ghidra) reverse
  engineering tool to analyze and browse the executable
- The [XBE extension](https://github.com/XboxDev/ghidra-xbe) for Ghidra to
  import and analyze the JSRF executable
- The [delinker extension](https://github.com/boricj/ghidra-delinker-extension)
  for Ghidra to export object files from the executable
- The [objdiff](https://github.com/encounter/objdiff) code diffing tool to
  compare your C++ code's compiled output to the delinked object files
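
As a rough illustration of the `PATH` step, the following sketch appends the
compiler's `Bin/` directory from a Unix-style shell such as Git Bash and then
checks whether `cl` (the compiler driver) can be found.  The install location
in `MSVC_BIN` is a hypothetical placeholder, not a path this project
prescribes; substitute your own.

```shell
# Hypothetical install location: replace with wherever your copy of the
# compiler actually lives.
MSVC_BIN='/c/Program Files/Microsoft Visual Studio .NET/Vc7/bin'

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
    *":$MSVC_BIN:"*) ;;
    *) PATH="$PATH:$MSVC_BIN" ;;
esac
export PATH

# Report whether the shell can now find the compiler driver.
if command -v cl >/dev/null 2>&1; then
    echo "cl found at $(command -v cl)"
else
    echo "cl not found; double-check MSVC_BIN"
fi
```

A change like this only affects the current shell session and programs started
from it.  If you launch objdiff some other way, set `PATH` through your
operating system's environment variable settings instead.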

Keep in mind that Ghidra and its extensions need to have their versions
coordinated.  The safest thing to do is to get the same version of each, e.g.
11.4.  The general flow for installing extensions is to download a release
`.zip` for the extension from the linked repository's releases page, open
Ghidra, open the `File > Install Extensions` menu, click the green plus at the
top right of the extensions window, and then select the `.zip` you just
downloaded.  Make sure the box to the left of the extension's name is checked
to enable it before clicking "OK" to close the extensions window.

With all these tools acquired, the last thing to get is this repository.  Clone
it with `git` in the usual fashion:
```
git clone https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
```

The following sections detail how to use all these tools to start writing
decompiled code.


### Creating a JSRF Ghidra Project
Even if you have no intention of analyzing the executable in Ghidra otherwise,
Ghidra is needed to produce the object files that objdiff will compare your
recompiled code against.  This section will only cover the steps needed to get
to that point.

Open Ghidra and create a new project (`File > New Project...`).  Select the
"Non-Shared Project" option, and set whatever location and name you'd like.
With the project created, open the file import dialogue
(`File > Import File...`) and select the `default.xbe` from JSRF.  Ensure that
the format in the next window is set to "Xbox Executable Format (XBE)" (if this
isn't an option, you need to install/enable the XBE extension), and that the
name is "default.xbe" (our tooling depends on it having this specific name).
Click "OK," and you should see a window with a successful import results
summary after a moment (you'll probably see the message
`[xboxkrnl.exe] -> not found in project`, but this is fine and expected).

`default.xbe` should now be visible in the file listing for the project.
Double click it to open it in the CodeBrowser.  The window that opens is where
you'll do all your in-situ analysis, should you choose to do so.  You'll be
asked whether you want to run analyzers; say yes.  Afterwards, simply clicking
"Analyze" in the analysis options window without changing anything is fine, and
the analysis will probably take a couple minutes.  You can tell that the
analysis is still running if there's a progress bar in the bottom right saying
what it's currently analyzing.

There's a small oddity that needs fixing: certain parts of memory are marked as
executable where objdiff doesn't expect them to be, which will mess up our
diffs.  To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

Now we'll import data types from the decompilation.  Open a shell in the
`ghidra/` directory of your copy of the repository and run `make_header.sh`,
which will produce a `jsrf.h` in the same directory with the combined contents
of every header in a format suitable for Ghidra.  Then, in Ghidra, select
`File > Parse C Source...` to open a window for importing C headers.  Remove
everything from the "Source files to parse" and "Parse options" boxes, and add
`jsrf.h` to the former (click the green + symbol on the right and select the
`jsrf.h` file).  Click the "..." on the "Program Architecture:" box and select
the row with the values "x86," "default," "32," "little," and "Visual Studio."
Finally, click the "Parse to Program" button, "Continue" to confirm, and
"Don't Use Open Archives" (the header is completely self-contained and doesn't
need any information from any other data type archives).  You should then see a
window reporting successful import, and you'll be able to find `jsrf.h` with
all of its definitions under `default.xbe` in the Data Type Manager window in
the bottom left.

Lastly, we'll import symbols from the JSRF decompilation repository.  Open the
script manager (`Window > Script Manager`) and select the "Data" folder in the
left pane.  Double click the script titled `ImportSymbolsScript.py`, and a file
picker will open after a moment.  Select `symboltable.tsv` from the `ghidra/`
directory of your cloned JSRF decompilation repository, and you should see a
bunch of `Created function...` and `Created label...` printed to the scripting
console window.  Save your changes (save icon in the top left of the
CodeBrowser window), and your Ghidra project should be all ready for creating
object files for objdiff.


### Producing Object Files
Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if
on Windows) in the decompilation repository's `delink/` directory.  The
`delink.sh` script is our automated tool for extracting all the object files
that have been identified so far.  Invoke it with three arguments:

- The path to your Ghidra installation (the directory with files like
  `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
  `Extensions/`)
- The path to your JSRF Ghidra project (the directory with a `.gpr` file and a
  directory with a name ending in `.rep`)
- The name of your JSRF Ghidra project

If you're on Windows, the paths you provide should be Windows filepaths, not
Unix-style paths.  Make sure the paths are surrounded by quotes, too (e.g.
`'C:\path\to\whatever'`), else the shell won't understand the backslashes
correctly.
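
To make the quoting advice concrete, here is a sketch of what an invocation
might look like from Git Bash.  All three values are hypothetical placeholders,
and the sketch assumes the arguments are positional in the order listed above.

```shell
# All three values below are hypothetical placeholders.
GHIDRA_DIR='C:\Tools\ghidra_11.4_PUBLIC'
PROJECT_DIR='C:\Users\you\ghidra-projects'
PROJECT_NAME='JSRF'

# Single quotes keep every backslash literal, so this prints the path
# exactly as it was typed.
printf '%s\n' "$GHIDRA_DIR"

# Only attempt the real run when started from the delink/ directory.
if [ -f ./delink.sh ]; then
    ./delink.sh "$GHIDRA_DIR" "$PROJECT_DIR" "$PROJECT_NAME"
fi
```

Without the quotes, the shell would treat each backslash as an escape
character and strip it, handing the script a mangled path.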

There are a few errors you might get here:

- `Unable to lock project!`: This means that Ghidra isn't fully closed.  Make
  sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and
  `Invalid script: DelinkProgram.java`: This means that either the Ghidra
  delinker extension isn't properly installed, or you've somehow invoked the
  script in a way that can't see the extension (e.g. installing Ghidra on
  Windows and then invoking the script from WSL).  Ensure it's installed and
  enabled first, and that you're not running in some kind of environment
  different from where you installed Ghidra.
- `java.lang.RuntimeException: Failed to export ...`: This means that the
  delinker extension doesn't like something about what it was told to delink.
  One known cause is duplicate symbol names.  If you haven't modified
  `objects.csv` or `symboltable.tsv`, let other people on the project know so
  that they can look into fixing it.

If all goes well, you'll see the message `Delinking complete!` at the end of
the script's output, and the extracted object files will be in the
`decompile/target/` directory of the repository.  Now we're ready to start
recompiling and diffing code with objdiff.


### Setting Up objdiff
Open the objdiff GUI program (by default named something like
`objdiff-os-arch`, e.g. `objdiff-windows-x86_64.exe`).  Click "Settings" in the
left sidebar and then "Select" next to "Project directory" in the popup window.
In the file picker, select the `decompile/` directory in the JSRF decompilation
repository.

The sidebar will now have a listing of all the extracted object files.  Click
on one, and you should see two panes: one on the left labelled "Target object"
that lists the contents of the extracted object file, and one on the right
listing the contents of the recompiled object file.  If the right pane displays
an error like "program not found," the Visual C++ 7.0 compiler probably wasn't
correctly set up on your `PATH`.

One important piece of information, to make sure you get the correct match
percentages: set `Diff Options > Function relocation diffs` to "None."
Otherwise, nearly all references to functions and non-local variables
will be marked as nonmatching (this has to do with the delinking process not
applying name mangling, which isn't expected to be fixed).


### Using objdiff
The basic idea of objdiff is to match up the contents of an object file
compiled from our own decompiled code to the contents of an object file
extracted from the game.  To that end, functions have to be matched up between
them.  In the best case, corresponding functions in each file will have the
same name and be in the same section, at which point objdiff can link them
automatically.  Otherwise, one has to click on one of the corresponding
functions in one pane and the other function in the other pane to tell objdiff
to link them.  Common cases of this are class methods (the names won't match)
and implicitly generated functions, such as exception handling code placed in
`.text$x` in the recompiled object file.  Keep in mind that objdiff's match
percentage isn't fully reliable in some cases.  In particular, data containing
external pointers (which appear as `?? ?? ?? ??`) isn't explicitly marked as
non-matching, yet still somehow reduces the match percentage, so you'll have to
use a little judgement to determine when you actually have a match.

Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences
highlighted.  The task at hand is to modify the function in the corresponding
source file (in the `decompile/src/` directory) such that the match percentage
reaches 100%.  Depending on how you configure objdiff, it will rebuild
automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane.

There are no concrete instructions to give for writing decompiled code.  Try
importing headers from `decompile/src/` into Ghidra
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, exercising whatever C++ and x86 assembly
knowledge you have.  If you have basic decompilation experience but are new to
decompiling C++ specifically, you might want to take a look at the
[Decompiling C++](decompilingcpp.md) article.

Whenever you have some decompiled code that you'd like to contribute to the
repository, commit it to your local copy of the repository and create a merge
request to merge it back into the online copy.


## Contributing to Delinking
Getting the JSRF binary delinked is just as important as decompiling the
resulting object files, but takes a bit more investment.  The concrete task of
a delinking contributor is to populate `symboltable.tsv` in the `ghidra/`
directory and `objects.csv` in the `delink/` directory, which together enable
consistent delinking of object files.  The former lists symbols at different
addresses through the whole executable, while the latter lists the address
ranges that have been identified as separable objects.  Both of these things
are figured out by combing over the whole executable in Ghidra.


### Updating `symboltable.tsv`
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a
workflow has been devised to generate it from your Ghidra project.  Before
regenerating the table, however, make sure that you have all of its symbols
already in your project so that you don't end up deleting any.  One option is
to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
this will overwrite any names you've assigned to the same symbols.  You will
also have to ensure that no two symbols share the same name.  Clashes can be
avoided by using namespaces if need be (e.g. `X::symbol` and `Y::symbol` may
coexist), but function overloading must be avoided (you may not have one
function with the signature `void X::f(int)` and another with the signature
`void X::f(float)`), else errors can arise when delinking, as the delinker
extension does not mangle symbol names.  Thunked functions can also cause
problems because Ghidra does not include them alongside other functions in the
symbol table, so convert them to regular functions (right click on the thunked
function in the symbol tree and unset it as a thunk in the `Function` submenu).
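
One way to spot clashing names before delinking is to scan a symbol table file
for repeated entries.  The helper below is a minimal sketch, not part of the
repository's tooling: `find_duplicate_names` is a made-up name, and it assumes
the file is tab-separated with the full symbol name in the first column.  Pass
a different column number as the second argument if that assumption doesn't
hold for your file.

```shell
# Print every value that occurs more than once in one column of a
# tab-separated file.  The column defaults to 1, which is an assumption
# about where the symbol names live.
find_duplicate_names() {
    cut -f "${2:-1}" -- "$1" | sort | uniq -d
}

# Example use from the repository's ghidra/ directory.
if [ -f symboltable.tsv ]; then
    find_duplicate_names symboltable.tsv
fi
```

No output means no repeated names were found in that column.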

Once you're ready to export your symbols, open the symbol table
(`Window > Symbol Table`).  Open the symbol filter window (cog button near the
top right), and uncheck everything but "User Defined" under "Symbol Source,"
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
Filters," and "Non-Externals" under "Non-Externals."  This ensures that you
only export symbols that you've defined and that are useful for delinking.

Now we need to configure the columns that we want to export.  Right-click on
one of the column headers, click "Add/Remove Columns..." to open the "Select
Columns" window, and in it check only "Location," "Name," "Namespace," and
"Type."  Click "OK" to close the window and ensure that the column order is
"Location," "Namespace," "Name," "Type" (you can drag the column headers to
reorder them if needed).

Now, to actually export the table, right-click on one of the table cells, click
"Select All," and then right-click again on a cell to select "Export > Export
to CSV..." before selecting where to save your exported symbol table.

The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`.  Open a shell in the repository's `ghidra/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.


### Updating `make_header.sh`
If you've added any header files, you'll want to add them to the `HEADERS`
variable in `ghidra/make_header.sh`.  Make sure that any other header files
they depend on are earlier in the list, as this script combines everything into
one file without any `#include` directives.  Make sure the script runs
successfully and Ghidra is able to import the resulting `jsrf.h`.

Keep in mind that `make_header.sh` uses a fairly rudimentary `awk` script to
convert C++ headers to C, which places some gentle constraints on how
declarations need to be written.  In general, it's enough to just keep things
simple and not do anything unusual (keep data type and variable declarations
separate, don't use macros for declarations, etc.), but the one big catch is
that the body of a data type definition must not be on the same line as the
opening or closing braces.  That is, do not write
```c++
struct X { unsigned x; };
```
but rather
```c++
struct X {
    unsigned x;
};
```
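
A quick way to catch the single-line form before running `make_header.sh` is
to search headers for lines where something shares a line with a brace.  The
sketch below is only a heuristic and not part of the repository's tooling:
`find_inline_braces` is a made-up name, and it can't tell type definitions
apart from other braces, so inline function bodies, initializers, and comments
after an opening brace are reported as well.

```shell
# List lines where something follows an opening brace or precedes a closing
# brace on the same line.  Output is in grep's usual "line:text" form, with
# a file name prefix when more than one file is given.
find_inline_braces() {
    grep -nE '\{[[:space:]]*[^[:space:]]|[^[:space:]][[:space:]]*\}' -- "$@"
}

# Example use (the file name is a placeholder):
#   find_inline_braces path/to/header.hpp
```

Each reported line is a candidate for splitting across lines as shown above.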


### Updating `objects.csv`
`objects.csv` is a listing of addresses for each object file or group of object
files that we've identified.  Each column after the first two corresponds to a
section of the executable, with filled cells indicating an address range
occupied by that object file, empty cells indicating that the object occupies
none of that section, and a `?` indicating an unknown address range or
boundary.  The `Object` column gives the path under `decompile/target/` to
extract the object file to if the `Delink?` column is `true`, otherwise it's
just a human-readable label for that row.  `delink.sh` parses this file and
uses any rows marked for delinking to produce object files.

A couple criteria should be fulfilled before marking a row in `objects.csv` for
extraction.  First, of course, the whole row should be filled with an object
path and with address ranges that we're certain of.  Make sure that not just
the `.text` section, but also `.text$x` (exception handling code), `.data`,
`.rdata`, and `.rdata$x` (data pointing to exception-handling code) are included
in the object file if applicable!  Address ranges also should not include any
padding before or after data or code.  Second, all of the symbols within those
address ranges need to be present in `symboltable.tsv`, else delinking after
only importing those symbols won't arrange the object file's internals
correctly (exception-handling code might be appended onto another function, for
example).  Because `symboltable.tsv` should only be populated with symbols that
have been manually defined as per the previous section, this means that you
need to define variable names and labels in Ghidra for everything therein (and
ideally everything referenced externally, as well).  Do try to maintain basic
consistency with the rest of the codebase: functions and methods begin with
lowercase letters, for instance, while class/struct/enum names begin with
capital letters, and special methods like constructors and destructors should
have the names they would have in real C++ code (i.e. `Class::Class` and
`Class::~Class`, respectively).

Once an object is ready for extracting, its `Delink?` column should be set to
`true` and the `objdiff.json` file in the `decompile/` directory should be
updated to include it (give it an entry in the `units` list, modelled after
other existing entries minus the `complete` and `symbol_mappings` fields), plus
a `.cpp` file (and `.hpp` file if suitable) for it should be added in
the `decompile/src/` directory.  Make sure that any relevant data structures
you've figured out are included in the new source files, then give extraction
via `delink.sh` a test.  Add a new prerequisite to `all:` at the top of the
`Makefile` in the root of the `decompile/` directory, and add an entry at the
bottom to record which header files need to be up to date to build the new
object file (including anything included transitively!).  Finally, make sure
that the new object file builds in objdiff, even if its functions haven't
actually been implemented yet.

When you have it all sorted out, make a merge request to share your work with
us!