I see a lot about source code being leaked, and I’m wondering how it is that you could make something like an exact replica of Super Mario Bros without the source code, or why you can’t take the finished product and run it back through the compilation software?
The implicit assumption with decompiling code is that the goal is either to inspect how the code works, or to try compiling for a different machine. I’ll try to explain why the latter is quite difficult.
As you said, compilation to machine code only keeps the details needed for the CPU to accomplish what was instructed. And indeed, that is supposed to be efficient to run on that CPU, by reason of being targeted exactly for that CPU. But when decompiling, the resulting code will reflect the specificity to that same CPU. If you then try to compile that code for a different CPU, it will likely work, but will likely be inefficient because the second CPU’s unique advantages won’t be leveraged.
To use an example, consider how someone might divide two large numbers. Person A learned long division in school, and so takes each number and breaks it down into a series of smaller multiplications and subtractions. Person B learned to do division using a calculator, which just involves entering the two numbers and requesting that they be divided.
Trying to do division by blindly giving Person B that series of multiplications and subtractions to do on the calculator is extremely inefficient, because Person B already knows how to do division easily. But Person B is following Person A’s methods without knowing that the whole point of the exercise is just to divide the two original numbers. Compilation loses context and intent, and for non-trivial programs decompilation cannot recover them.
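To make the analogy a bit more concrete, here is a rough Python sketch. The names `divide_recovered`, `divide_intended`, `v0`, `v1` and so on are made up for illustration; real decompiler output would look different. It assumes unsigned 32-bit inputs and a nonzero divisor. The first function is the "long division" that a CPU without a divide instruction might have been given, roughly as a decompiler could hand it back. The second is what the original author probably wrote.

```python
def divide_recovered(a1: int, a2: int) -> int:
    # Decompiler-style reconstruction (hypothetical names):
    # binary shift-and-subtract long division, one bit at a time.
    # Assumes 0 <= a1 < 2**32 and a2 > 0.
    v0 = 0  # accumulates the quotient, though nothing here says so
    v1 = 0  # running remainder
    for v2 in range(31, -1, -1):
        v1 = (v1 << 1) | ((a1 >> v2) & 1)  # bring down the next bit
        if v1 >= a2:
            v1 -= a2
            v0 |= 1 << v2
    return v0


def divide_intended(n: int, d: int) -> int:
    # What the source code most likely said: just divide.
    return n // d
```

Both give the same answer, but recompiling the first one for a CPU that has a fast divide instruction still runs the 32-step loop, unless the compiler happens to recognize what the loop is really doing.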
Here is an example of why source code is useful when it provides context: https://en.m.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code . Very few people would be able to figure out how this works from just the machine code.
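As a rough illustration, here is that routine transliterated into Python twice (the well-known version is in C). The function name `sub_401000` and the variables `a1`, `v1`, `v2` are invented to mimic the placeholder names a decompiler typically produces; they are not from any real tool's output. The second version is the same arithmetic with the names and comments a human would want. Both assume a positive input that fits in a 32-bit float.

```python
import struct


def sub_401000(a1: float) -> float:
    # Contextless version: correct, but what is it for?
    v1 = struct.unpack("<I", struct.pack("<f", a1))[0]
    v1 = 0x5F3759DF - (v1 >> 1)
    v2 = struct.unpack("<f", struct.pack("<I", v1))[0]
    return v2 * (1.5 - 0.5 * a1 * v2 * v2)


def fast_inverse_sqrt(x: float) -> float:
    # Approximates 1 / sqrt(x) for positive x.
    # Step 1: reinterpret the 32-bit float's bits as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Step 2: the magic constant minus half the bits gives a first guess,
    # because a float's bit pattern roughly encodes its logarithm.
    bits = 0x5F3759DF - (bits >> 1)
    # Step 3: reinterpret the integer bits as a float again.
    guess = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Step 4: one Newton-Raphson iteration to refine the guess.
    return guess * (1.5 - 0.5 * x * guess * guess)
```

The two functions compute exactly the same thing. The only difference is the context, and that is the part the compiler throws away.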
Follow-up: would it be easier to read this context-less source code, or to stay at assembly? For example, if you’d like to modify a closed-source app.
Probably depends on how comfortable you are at reading assembly instructions for your specific CPU, but I think generally the contextless source code is probably preferable. Either way you’ve got a headache of an investigation in front of you though.
here’s an example of what it might look like with either option
oh wow, I now respect pirates even more. No wonder there are only like 3 guys that can and will do this.
If you decompile, you need such an understanding of the language. I could see someone looking at this and going “oh yeah, that compares cases”, but then dying of old age before finishing the sentence.
And if you don’t decompile you are coding assembly.
Like many things, it’s very fact-intensive, varying in different circumstances. As others have noted, the abilities of the person undertaking the decompilation will influence the decision. But so will strategy: the overall goal can drive how decompilation is approached.
For example, suppose you’re working for an airline and need to rewrite some software that runs on an ancient IBM System/360 machine. It was written in COBOL, no source code is available, and you cannot find many people who even know COBOL. Here, since the task is to rewrite the code, decompilation only needs to tell you how it works; you’ll then want to write the new program in a modern language. If such a decompiler is available, it may be useful to decompile to a different language that you understand better, say C.
Sure, it may be that C isn’t what the new program will be written in, but if your C reading skills are sufficient, then this is a valid strategy.
The skill of a decompiling engineer – or any engineer really – is leveraging your skills and your tools to tractably attack the difficult problem at hand. Many equally-skilled engineers can plausibly approach the same problem differently.