diff --git a/documentation/decompilingcpp.md b/documentation/decompilingcpp.md
new file mode 100644
index 0000000..540025b
--- /dev/null
+++ b/documentation/decompilingcpp.md
@@ -0,0 +1,152 @@
+# Decompiling C++
+Like most (all?) Xbox titles and most sixth-generation games more generally,
+JSRF is not written in assembly or C as those before it were, but rather C++.
+C++ introduces new features that both complicate the final machine code and
+weaken the correspondence between said machine code and the original C++
+source.
+
+This guide will cover various C++ features appearing in JSRF, explaining how
+they manifest in the game's executable and how to properly decompile them, to
+the extent possible. Basic familiarity with C features (e.g. functions,
+structs) and how to decompile them is assumed.
+
+
+## Name Mangling
+(on the off chance you actually get symbol names, like from debug info; also
+why symbol names don't match in objdiff)
+
+
+## Classes
+C++ classes evolve the C struct to associate the data structure with functions,
+which are called methods in this context. Classes can also inherit from one or
+more other classes, sharing their data members and access to their methods.
+Certain special methods called constructors and destructors can also be added
+to a class, and these can be called implicitly when an instance of a class goes
+in or out of scope. Classes can also have fields and methods marked as
+private, but these permissions are usually completely erased during
+compilation and don't need to be respected by a decompilation.
+
+### `class` vs. `struct`
+The `struct` keyword can still be used in C++ and is equivalent to `class`,
+except that the former makes all members public by default and the latter makes
+them private by default. Since there's not much reason to make anything private
+in a decompilation, one will usually use `struct` declarations in
+decompilations rather than `class`.
+
+```c++
+// These two declarations are equivalent
+class SomeClass {
+public: // Makes everything after public
+    float someMemberVariable;
+    unsigned anotherMemberVariable;
+};
+
+struct SomeStruct {
+    float someMemberVariable;
+    unsigned anotherMemberVariable;
+};
+```
+
+A reasonable way to implement an inherited struct in Ghidra is to define the
+base class normally, and then define the child with a first member called
+`super` of the parent class type. Members specific to the child class can then
+be inserted afterwards.
+
+### Class Methods
+Methods are functions declared within a class's namespace, like so:
+```c++
+class SomeClass {
+public: // Needed so that code outside the class can call the methods
+    // Regular data members
+    float someMemberVariable;
+    unsigned anotherMemberVariable;
+
+    // Methods declared in class definition
+    SomeClass(int anArgument); // Constructor
+    ~SomeClass(); // Destructor
+
+    void regularMethod(unsigned anArgument);
+    virtual void virtualMethod(char * anArgument);
+    static void staticMethod(char * anArgument);
+
+    // Can also provide entire definition in class
+    float anotherMethod(float x) {
+        this->someMemberVariable += x;
+        return this->someMemberVariable;
+    }
+};
+
+// Definition of a method declared in class
+void SomeClass::regularMethod(unsigned anArgument) {
+    this->anotherMemberVariable -= anArgument;
+}
+```
+
+Methods can then be accessed and called with member access syntax, like
+`classInstance.regularMethod(3)` and `instancePtr->anotherMethod(1.2)`.
+
+Static methods are indistinguishable from regular functions in compiled code,
+so they probably won't see much use in decompilations. They don't have access
+to the `this` pointer that other types of methods can use.
+
+Regular methods are similar to regular functions, but have an implicit first
+argument called `this` representing a pointer to the object that the method
+was called on.
+Some C++ implementations use a different calling convention for method calls.
+Microsoft's implementation for the Xbox, for example, uses the `__thiscall`
+convention, in which the `this` pointer is passed in the ECX register while
+all other arguments are passed on the stack.
+
+Constructors and destructors function largely like regular methods, but
+implicitly return the `this` pointer.
+
+Virtual methods are methods that can be overridden on child classes. They're
+not called directly, but instead called through a hidden first member that
+points to an array of method function pointers, usually called a vtable (Visual
+C++ 7 calls it `` ClassName::`vftable' ``). If a destructor specifically is
+made virtual, additional "deleting destructors" may be generated as well, which
+are methods taking one boolean argument that call the destructor and then,
+depending on the argument, free the object's memory.
+
+(TODO: how to implement methods and vtables in Ghidra)
+
+### Inheritance
+Child classes can be used in most places that their parent class can be used:
+```c++
+// Class inheriting from SomeStruct
+struct SomeStructChild : SomeStruct {
+    // Inherits these from SomeStruct:
+    //     float someMemberVariable;
+    //     unsigned anotherMemberVariable;
+    char * additionalMemberVariable;
+};
+
+// Could call this with either a SomeStruct* or SomeStructChild* argument
+float getSomeMemberVariable(SomeStruct const * const ss) {
+    return ss->someMemberVariable;
+}
+```
+
+
+## The `new` and `delete` Operators
+One way to allocate an object in C++ is using `new` and `delete`. The former
+can both allocate and construct the object, while the latter runs the
+destructor and then frees the memory, analogous to calling `free()`. Each has
+a corresponding `operator new()` or `operator delete()` function called
+implicitly.
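As a rough illustration of those implicit calls, the sketch below gives a hypothetical `Tracked` struct its own `operator new` and `operator delete` and records the order in which the allocator, constructor, destructor, and deallocator run. None of these names come from JSRF, and the code assumes a modern standard compiler rather than Visual C++ 7.0, so treat it as a way to observe the sequence rather than as representative game code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Records each step in the order it runs. All names in this sketch are
// hypothetical and exist only for illustration.
static std::string g_trace;

struct Tracked {
    int value;

    explicit Tracked(int v) : value(v) { g_trace += "ctor;"; }
    ~Tracked() { g_trace += "dtor;"; }

    // Class-specific allocation functions, called implicitly by new/delete
    static void *operator new(std::size_t size) {
        g_trace += "operator new;";
        void *p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void *p) {
        g_trace += "operator delete;";
        std::free(p);
    }
};

// Returns the call order observed for a single new/delete pair
std::string traceNewDelete() {
    g_trace.clear();
    Tracked *t = new Tracked(7); // operator new first, then the constructor
    assert(t->value == 7);
    delete t;                    // destructor first, then operator delete
    return g_trace;
}
```

Calling `traceNewDelete()` returns `operator new;ctor;dtor;operator delete;`. The allocation and the construction are two distinct calls, which is why they also show up as two distinct calls in compiled code.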
+
+The generated code for a use of `new` with a constructor (like
+`SomeStruct *ss = new SomeStruct(7)`) performs the allocator and constructor
+calls separately, roughly as follows (as it would appear in Ghidra; note that
+Ghidra explicitly shows the `this` pointer being passed):
+```c++
+SomeStruct *ss;
+ss = (SomeStruct *)operator_new(0xc);
+if (ss == NULL) {
+    ss = NULL; // No, I'm not sure what the point of reassigning NULL is
+}
+else {
+    SomeStruct::SomeStruct(ss, 7);
+}
+```
+
+
+## Exception Handling
+
diff --git a/contributing.md b/documentation/gettingstarted.md
similarity index 94%
rename from contributing.md
rename to documentation/gettingstarted.md
index 3257217..3b1e9ac 100644
--- a/contributing.md
+++ b/documentation/gettingstarted.md
@@ -170,13 +170,9 @@ importing headers from `decompile/src/` into Ghidra (`File > Parse C
 Source...`) to get access to JSRF classes, and use Ghidra's decompilation of the
 function in the CodeBrowser as a starting point for writing your matching
 function, exercising whatever C++ and x86 assembly
-knowledge you have. Exception handling code in particular can appear in
-unexpected places (e.g. around `new` statements and in constructors) and has
-unambiguous but nonobvious signs in the disassembly, so it might be worth
-[reading](https://www.openrce.org/articles/full_view/21) up
-[on](https://www.openrce.org/articles/full_view/23) how they're
-[implemented](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx)
-to learn to recognize them in disassembly and recreate them in C++ code.
+knowledge you have. If you have basic decompilation experience but are new to
+decompiling C++ specifically, you might want to take a look at the
+[Decompiling C++](decompilingcpp.md) article.

 Whenever you have some decompiled code that you'd like to contribute to the
 repository, commit it to your local copy of the repository and create a merge
@@ -280,12 +276,3 @@ actually been implemented yet.
 When you have it all sorted out, make a merge request to share your work with
 us!
-
-
-# Special Topics
-This would be a good place to include guidance on some trickier aspects of
-reverse engineering C++ code, like an accessible explanation of navigating
-exception handling in Ghidra, implementing classes with virtual methods or
-inheritance Ghidra and writing decompiled code for them, or what in the world a
-COM object is and how to make Ghidra understand it (especially the one wrapping
-all of JSRF's Direct3D calls).
diff --git a/readme.md b/readme.md
index 273b5fd..a007c5f 100644
--- a/readme.md
+++ b/readme.md
@@ -16,9 +16,10 @@ The approach of this decompilation is to:

 We are currently engaging in the first two steps simultaneously, decompiling
 code as it's delinked. Further details on these steps can be found in the
-[contribution guide](contributing.md). Step 3 will use the linker from the
-same Visual C++ 7.0 already used to compile object files. Step 4 is expected
-to use the `cxbe` tool found in e.g. [nxdk](https://github.com/XboxDev/nxdk).
+[contribution guide](documentation/gettingstarted.md). Step 3 will use the
+linker from the same Visual C++ 7.0 already used to compile object files. Step
+4 is expected to use the `cxbe` tool found in e.g.
+[nxdk](https://github.com/XboxDev/nxdk).

 ## Contributing
 Anybody interested in joining the effort is welcome to read the