Create docs directory; begin "Decompiling C++"

KeybadeBlox 2026-01-03 23:20:39 -05:00
parent 683818b637
commit 547f2ba179
3 changed files with 159 additions and 19 deletions


@ -0,0 +1,152 @@
# Decompiling C++
Like most (all?) Xbox titles and most sixth-generation games more generally,
JSRF is written not in assembly or C, as many earlier games were, but in C++.
C++ introduces features that both complicate the final machine code and
weaken the correspondence between that machine code and the original C++
source.
This guide will cover various C++ features appearing in JSRF, explaining how
they manifest in the game's executable and how to properly decompile them, to
the extent possible. Basic familiarity with C features (e.g. functions,
structs) and how to decompile them is assumed.
## Name Mangling
(TODO: cover name mangling, for the rare case where symbol names are available,
e.g. from debug info; also explain why symbol names don't match in objdiff)
## Classes
C++ classes extend the C struct by associating the data structure with
functions, which are called methods in this context. Classes can also inherit
from one or more other classes, sharing their data members and access to their
methods. Special methods called constructors and destructors can also be added
to a class; these are called implicitly when an instance is created or
destroyed, such as when a local variable enters or leaves scope. Classes can
also mark fields and methods as private, but these access permissions are
usually erased entirely during compilation and don't need to be respected by a
decompilation.
### `class` vs. `struct`
The `struct` keyword can still be used in C++ and is equivalent to `class`,
except that the former makes all members public by default and the latter makes
all members private by default. Since there's little reason to make anything
private in a decompilation, decompilations usually use `struct` declarations
rather than `class`.
```c++
// These two declarations are equivalent (apart from their names)
class SomeClass {
public: // Makes everything after this point public
    float someMemberVariable;
    unsigned anotherMemberVariable;
};
struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};
```
A reasonable way to implement an inherited struct in Ghidra is to define the
base class normally, and then define the child with a first member called
`super` of the parent class type. Members specific to the child class can then
be inserted afterwards.
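As a rough check of why this works, the sketch below compares a child declared
with real inheritance against the same child declared in the `super`-member
style. The names `SomeStructChildInherited`, `SomeStructChildGhidra`,
`inheritedOffset`, `ghidraOffset`, and `checkLayoutsMatch` are hypothetical.
With single inheritance and no virtual methods, mainstream compilers place the
parent's members first, so the two layouts line up; the C++ standard doesn't
strictly guarantee this, so treat it as an observation about typical compilers.

```c++
#include <cassert>
#include <cstddef>

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

// Child declared with real C++ inheritance
struct SomeStructChildInherited : SomeStruct {
    char * additionalMemberVariable;
};

// The same child as it might be defined in Ghidra: the parent embedded as a
// first member called `super`, followed by the child's own members
struct SomeStructChildGhidra {
    SomeStruct super;
    char * additionalMemberVariable;
};

// Byte offset of additionalMemberVariable in the inherited version
std::ptrdiff_t inheritedOffset() {
    SomeStructChildInherited c{};
    return reinterpret_cast<char *>(&c.additionalMemberVariable)
         - reinterpret_cast<char *>(&c);
}

// Byte offset of additionalMemberVariable in the `super` version
std::ptrdiff_t ghidraOffset() {
    return static_cast<std::ptrdiff_t>(
        offsetof(SomeStructChildGhidra, additionalMemberVariable));
}

// Checks that both versions have the same size and member placement
void checkLayoutsMatch() {
    assert(sizeof(SomeStructChildInherited) == sizeof(SomeStructChildGhidra));
    assert(inheritedOffset() == ghidraOffset());
}
```

If these checks hold, code that reads `child->super.someMemberVariable` in
Ghidra touches the same bytes as `child->someMemberVariable` in the original
source.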
### Class Methods
Methods are functions declared within a class's namespace, like so:
```c++
class SomeClass {
public:
    // Regular data members
    float someMemberVariable;
    unsigned anotherMemberVariable;
    // Methods declared in class definition
    SomeClass(int anArgument); // Constructor
    ~SomeClass(); // Destructor
    void regularMethod(unsigned anArgument);
    virtual void virtualMethod(char * anArgument);
    static void staticMethod(char * anArgument);
    // Can also provide entire definition in class
    float anotherMethod(float x) {
        this->someMemberVariable += x;
        return this->someMemberVariable;
    }
};
// Definition of a method declared in class
void SomeClass::regularMethod(unsigned anArgument) {
    this->anotherMemberVariable -= anArgument;
}
```
Methods can then be accessed and called with member access syntax, like
`classInstance.regularMethod(3)` and `instancePtr->anotherMethod(1.2)`.
Static methods don't have access to the `this` pointer that other types of
methods can use, so in compiled code they're indistinguishable from regular
functions and probably won't see much use in decompilations.
Regular methods are similar to regular functions, but have an implicit first
argument called `this` representing a pointer to the object that the method
was called from. Some C++ implementations also use a different calling
convention for method calls. Microsoft's compiler for the Xbox, for example,
uses the `__thiscall` convention, which passes the `this` pointer in the ECX
register and all other arguments on the stack.
Constructors and destructors function largely like regular methods, but
implicitly return the `this` pointer.
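To make the implicit `this` argument concrete, the sketch below writes
`regularMethod` and the constructor of a simplified `SomeClass` (no virtual
methods) both as real members and as plain functions that take the object
pointer explicitly, roughly the shape a decompiler shows them in. The names
`SomeClass_regularMethod`, `SomeClass_construct`, and `checkExplicitThis` are
hypothetical, the parameter is called `self` because `this` is a keyword, the
constructor body is made up, and the default calling convention stands in for
`__thiscall`.

```c++
#include <cassert>

// Simplified version of SomeClass (no virtual methods) for this sketch
struct SomeClass {
    float someMemberVariable;
    unsigned anotherMemberVariable;
    SomeClass(int anArgument);
    void regularMethod(unsigned anArgument);
};

// Hypothetical constructor body, just to give the members known values
SomeClass::SomeClass(int anArgument) {
    this->someMemberVariable = 0.0f;
    this->anotherMemberVariable = static_cast<unsigned>(anArgument);
}

void SomeClass::regularMethod(unsigned anArgument) {
    this->anotherMemberVariable -= anArgument;
}

// regularMethod written as a free function: `self` plays the role of `this`
void SomeClass_regularMethod(SomeClass * self, unsigned anArgument) {
    self->anotherMemberVariable -= anArgument;
}

// The constructor written the same way: it sets up the object at `self`
// and hands `self` back, mirroring a compiled constructor returning `this`
SomeClass * SomeClass_construct(SomeClass * self, int anArgument) {
    self->someMemberVariable = 0.0f;
    self->anotherMemberVariable = static_cast<unsigned>(anArgument);
    return self;
}

// Both forms should leave the objects in the same state
void checkExplicitThis() {
    SomeClass a(10);
    SomeClass b(10);
    a.regularMethod(3);              // `this` passed implicitly
    SomeClass_regularMethod(&b, 3);  // object pointer passed explicitly
    assert(a.anotherMemberVariable == b.anotherMemberVariable);

    SomeClass c(0);
    assert(SomeClass_construct(&c, 10) == &c);
    assert(c.anotherMemberVariable == 10u);
}
```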
Virtual methods are methods that can be overridden by child classes. Rather
than being called directly, they're called through a hidden first member that
points to an array of method function pointers; this array is usually called a
vtable (Visual C++ 7 calls it `` ClassName::`vftable' ``). If a destructor
specifically is made virtual, additional "deleting destructors" may be
generated as well. These are methods taking a single flag argument that call
the destructor and then, depending on that argument, free the object's memory.
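The sketch below models that mechanism by hand, writing the hidden member out
as an explicit `vtable` field and routing each call through the table. All
names (`SomeClassVtable`, `someClassVtable`, `baseVirtualMethod`,
`childVirtualMethod`, `callVirtualMethod`, and so on) are hypothetical, the
method bodies are invented, and `virtualMethod` is given an `int` return so
the dispatch can be observed. In the compiled game the entries would be
`__thiscall` methods, and the real table would also hold any other virtual
methods.

```c++
#include <cassert>
#include <cstring>

struct SomeClass;

// Table of virtual method pointers; each entry takes the object explicitly
struct SomeClassVtable {
    int (*virtualMethod)(SomeClass * self, const char * anArgument);
};

// Class with its vtable pointer written out as an explicit first member
struct SomeClass {
    const SomeClassVtable * vtable;
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

// Child class laid out with its parent as the first member
struct SomeClassChild {
    SomeClass super;
    int extraMemberVariable;
};

// Hypothetical parent implementation: string length plus a member
int baseVirtualMethod(SomeClass * self, const char * anArgument) {
    return static_cast<int>(std::strlen(anArgument))
         + static_cast<int>(self->anotherMemberVariable);
}

// Hypothetical child override: ignores the argument, returns its own member
int childVirtualMethod(SomeClass * self, const char *) {
    return reinterpret_cast<SomeClassChild *>(self)->extraMemberVariable;
}

// One table per class; every instance of a class points to the same table
const SomeClassVtable someClassVtable = { baseVirtualMethod };
const SomeClassVtable someClassChildVtable = { childVirtualMethod };

// What `obj->virtualMethod(arg)` boils down to: fetch the table pointer,
// fetch the entry, call it with the object pointer as the first argument
int callVirtualMethod(SomeClass * obj, const char * anArgument) {
    return obj->vtable->virtualMethod(obj, anArgument);
}

void checkDispatch() {
    SomeClass base = { &someClassVtable, 0.0f, 2 };
    SomeClassChild child = { { &someClassChildVtable, 0.0f, 0 }, 42 };
    assert(callVirtualMethod(&base, "abc") == 5);          // 3 + 2
    assert(callVirtualMethod(&child.super, "abc") == 42);  // child's entry
}
```

The caller never names the child's function; swapping the table pointer alone
changes which implementation runs, which is why decompiled virtual calls show
up as indirect calls through the object's first field.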
(TODO: how to implement methods and vtables in Ghidra)
### Inheritance
Child classes can be used in most places that their parent class can be used:
```c++
// Class inheriting from SomeStruct
struct SomeStructChild : SomeStruct {
    // Inherits these from SomeStruct:
    // float someMemberVariable;
    // unsigned anotherMemberVariable;
    char * additionalMemberVariable;
};
// Could call this with either a SomeStruct* or SomeStructChild* argument
float getSomeMemberVariable(SomeStruct const * const ss) {
    return ss->someMemberVariable;
}
```
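With single inheritance and no virtual methods, the parent's members sit at the
start of the child, so converting a `SomeStructChild *` to a `SomeStruct *`
typically doesn't change the address at all. The sketch below checks this on
a typical compiler; `upcastKeepsAddress` and `checkInheritance` are
hypothetical helpers, and the standard does not guarantee this layout in
general.

```c++
#include <cassert>

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

struct SomeStructChild : SomeStruct {
    char * additionalMemberVariable;
};

float getSomeMemberVariable(SomeStruct const * const ss) {
    return ss->someMemberVariable;
}

// Returns true if the implicit child-to-parent pointer conversion leaves
// the address unchanged (the usual result with single inheritance)
bool upcastKeepsAddress(SomeStructChild * child) {
    SomeStruct * asParent = child;  // implicit conversion, no cast needed
    return static_cast<void *>(asParent) == static_cast<void *>(child);
}

void checkInheritance() {
    SomeStruct parent{};
    parent.someMemberVariable = 1.5f;
    SomeStructChild child{};
    child.someMemberVariable = 2.5f;
    assert(getSomeMemberVariable(&parent) == 1.5f);
    assert(getSomeMemberVariable(&child) == 2.5f);  // child passed as parent
    assert(upcastKeepsAddress(&child));
}
```

In the compiled code this means a call like `getSomeMemberVariable(&child)`
simply passes the child's own address, with no pointer adjustment to spot.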
## The `new` and `delete` Operators
One way to allocate an object in C++ is using `new` and `delete`. The former
both allocates memory and constructs the object in it, while the latter
destroys the object (calling its destructor) and then frees the memory, much
like `free()`. Each implicitly calls a corresponding `operator new()` or
`operator delete()` function to do the actual allocation or deallocation.
The generated code for a use of `new` with a constructor (like
`SomeClass *sc = new SomeClass(7)`, using the `SomeClass` declared earlier)
performs the allocation and constructor calls separately, roughly as follows
(as it would appear in Ghidra; note that Ghidra explicitly shows the `this`
pointer being passed):
```c++
SomeClass *sc;
sc = (SomeClass *)operator_new(0xc);
if (sc == NULL) {
    sc = NULL; // No, I'm not sure what the point of reassigning NULL is
}
else {
    sc = SomeClass::SomeClass(sc, 7);
}
```
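The same steps can be written out by hand in C++ as a sketch of what the
compiler generates for `new` and `delete`. `newSomeClass`, `deleteSomeClass`,
and `checkNewDelete` are hypothetical helpers and the constructor body is
invented. The sketch uses the standard `std::nothrow` form of `operator new`
so that a failed allocation yields a null pointer, matching the NULL check in
the decompiled code; whether the game's own allocator behaves exactly this way
is an assumption.

```c++
#include <cassert>
#include <new>

struct SomeClass {
    float someMemberVariable;
    unsigned anotherMemberVariable;
    SomeClass(int anArgument)
        : someMemberVariable(0.0f),
          anotherMemberVariable(static_cast<unsigned>(anArgument)) {}
    ~SomeClass() {}
};

// Roughly what `new SomeClass(anArgument)` does: allocate, then run the
// constructor only if the allocation succeeded
SomeClass * newSomeClass(int anArgument) {
    void * memory = ::operator new(sizeof(SomeClass), std::nothrow);
    if (memory == nullptr) {
        return nullptr;
    }
    return new (memory) SomeClass(anArgument);  // placement new: construct only
}

// Roughly what `delete ptr` does: destruct, then free (null is a no-op)
void deleteSomeClass(SomeClass * ptr) {
    if (ptr != nullptr) {
        ptr->~SomeClass();
        ::operator delete(ptr);
    }
}

void checkNewDelete() {
    SomeClass * sc = newSomeClass(7);
    assert(sc != nullptr);
    assert(sc->anotherMemberVariable == 7u);
    deleteSomeClass(sc);
    deleteSomeClass(nullptr);  // safe, like deleting a null pointer
}
```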
## Exception Handling