I see a lot about source codes being leaked and I’m wondering how it that you could make something like an exact replica of Super Mario Bros without the source code or how you can’t take the finished product and run it back through the compilation software?
It depends on the specifics of how the language is compiled. I’ll use C# as an example since that’s what I’m currently working with, but the process is different between all of them.
C#, when compiled, actually gets compressed down to what is known as an intermediate language (MSIL for C# specifically). This intermediate file is basically a set of genericized instructions that are not linked to any specific CPU. This is useful because different CPUs require different instructions.
Then, when the program is run, a second compiler known as the JIT (just-in-time) compiler takes the intermediate commands and translates them into something directly relevant to the CPU being used.
When we decompile a C# dll, we’re really converting from the intermediate language (generic CPU-agnostic instructions) and translating it back into source code.
To your second point, you are correct that the decompiled version will be more efficient from a processing perspective, but that efficiency comes at the direct cost of being able to easily understand what is happening at a human level. :)
Could I trouble you to go deeper? I’m think I’m getting it but if we were to say uncompile GTA V or Super Mario Bros, could we make changes and figure it out from there or would it be complete nonsense with no way points to jump in at and get a grip on what is being done.
On a side note I was told once that everything is 1s and 0s and as a result that someone could type a picture of you if they got the order right. This could be why I’m so wrong in my understanding given I’m now assuming this was bullshit.
Here is a real world example of someone doing some reverse engineering of compiled code. Might help you understand what is possible, and some of the processes. https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
At a very low level, yes, everything is 1s and 0s. However, virtually nobody deals with binary anymore. Programming languages are abstractions over abstractions over abstractions not to have to deal with typing binary.
The point of programming languages is for humans to be able to read it and make sense out of it. It’s a way to represent in a kind of intermediate language that’s halfway between something humans can read and computers can interpret.
Say the game’s programmer wants to handle moving your character right on pressing the right arrow key. They might write some function called “handleRightArrow()”, which does whatever. Then your compiler will turn this to some instructions - read stuff in RAM at address XYZ, copy it over, etc. The original code with readable names, comments, documentation, proper organization, it’s gone. Once you decompile, it’s gonna be random function/variable names, compiler might have rewritten some parts of the implementation as automatic optimizations, unlined some functions, etc. The human readable meaning of the code is lost. It does the same thing as the original code, but it isn’t the original code either.