HEX-RAYS DECOMPILER
We are pleased to present our flagship product, the Hex-Rays Decompiler, which brings binary software analysis within reach of millions of programmers. It converts native processor code into a human readable C-like pseudocode text.
In comparison to low level assembly language, high level language representation in the Decompiler has several advantages:
- concise: requires less time to read it
- structured: program logic is more obvious
- dynamic: variable names and types can be changed on the fly
- familiar: no need to learn the assembly language
- cool: the most advanced decompiler ever built!
The pseudocode text is generated on the fly. Our technology is fast enough to analyze 99% of functions within a couple of seconds.
Currently the decompiler supports compiler generated code for the x86, x64, ARM32, ARM64, and PowerPC processors. We plan to port it to other platforms in the future. The programmtic API allows our customers to improve the decompiler output. Vulnerability search, software validation, coverage analysis are the directions that immediately come to mind.
The decompiler runs on MS Windows, Linux, and Mac OS X. The GUI and text IDA versions are supported.
Decompilation vs. disassembly
Below you will find side-by-side comparisons of disassembly and decompilation outputs.
1. Division by two
Just note the difference in size! While the disassemble output requires you not only to know that the compilers generate such convoluted code for signed divisions and modulo operations, but you will also have to spend your time recognizing the patterns. Needless to say, the decompiler makes things really simple.
ASSEMBLER CODE
; =============== S U B R O U T I N E =======================================
; Attributes: bp-based frame
; mod_ll(long long)
public __Z6mod_llx
__Z6mod_llx proc near
var_10 = dword ptr -10h
var_C = dword ptr -0Ch
arg_0 = qword ptr 8
push ebp
mov ebp, esp
push ebx
sub esp, 0Ch
mov ecx, dword ptr [ebp+arg_0]
mov ebx, dword ptr [ebp+arg_0+4]
mov eax, ecx
mov edx, ebx
mov eax, edx
mov edx, eax
sar edx, 1Fh
sar eax, 1Fh
mov eax, edx
mov edx, 0
shr eax, 1Fh
add eax, ecx
adc edx, ebx
shrd eax, edx, 1
sar edx, 1
mov [ebp+var_10], eax
mov [ebp+var_C], edx
mov eax, [ebp+var_10]
mov edx, [ebp+var_C]
shld edx, eax, 1
add eax, eax
sub ecx, eax
sbb ebx, edx
mov [ebp+var_10], ecx
mov [ebp+var_C], ebx
mov eax, [ebp+var_10]
mov edx, [ebp+var_C]
add esp, 0Ch
pop ebx
pop ebp
retn
__Z6mod_llx endp
PSEUDOCODE
__int64 __cdecl mod_ll(__int64 a1)
{
return a1 % 2;
}
2. Simple enough?
Questions like
- What are the possible return values of the function?
- Does the function use any strings?
- What does the function do?
can be answered almost instantaneously looking at the decompiler output. Needless to say that it looks better because I renamed the local variables. In the disassembler, registers are renamed very rarely because it hides the register use and can lead to confusion.
ASSEMBLER CODE
; =============== S U B R O U T I N E =======================================
; int __cdecl sub_4061C0(char *Str, char *Dest)
sub_4061C0 proc near ; CODE XREF: sub_4062F0+15p
; sub_4063D4+21p …
Str = dword ptr 4
Dest = dword ptr 8
push esi
push offset aSmtp_ ; "smtp."
push [esp+8+Dest] ; Dest
call _strcpy
mov esi, [esp+0Ch+Str]
push esi ; Str
call _strlen
add esp, 0Ch
xor ecx, ecx
test eax, eax
jle short loc_4061ED
loc_4061E2: ; CODE XREF: sub_4061C0+2Bj
cmp byte ptr [ecx+esi], 40h
jz short loc_4061ED
inc ecx
cmp ecx, eax
jl short loc_4061E2
loc_4061ED: ; CODE XREF: sub_4061C0+20j
; sub_4061C0+26j
dec eax
cmp ecx, eax
jl short loc_4061F6
xor eax, eax
pop esi
retn
; —————————————————————————
loc_4061F6: ; CODE XREF: sub_4061C0+30j
lea eax, [ecx+esi+1]
push eax ; Source
push [esp+8+Dest] ; Dest
call _strcat
pop ecx
pop ecx
push 1
pop eax
pop esi
retn
sub_4061C0 endp
PSEUDOCODE
for ( i = 0; i < len; ++i )
{
if ( str2[i] == 64 )
break;
}
if ( i < len – 1 )
{
strcat(Dest, &str2[i + 1]);
result = 1;
}
else
{
result = 0;
}
return result;
}
3. Where’s my variable?
IDA highlights the current identifier. This feature turns out to be much more useful with high level output. In this sample, I tried to trace how the retrieved function pointer is used by the function. In the disassembly output, many wrong eax occurrences are highlighted while the decompiler did exactly what I wanted.
ASSEMBLER CODE
; =============== S U B R O U T I N E =======================================
; int __cdecl myfunc(wchar_t *Str, int)
myfunc proc near ; CODE XREF: sub_4060+76p
; .text:42E4p
Str = dword ptr 4
arg_4 = dword ptr 8
mov eax, dword_1001F608
cmp eax, 0FFFFFFFFh
jnz short loc_10003AB6
push offset aGetsystemwindo ; "GetSystemWindowsDirectoryW"
push offset aKernel32_dll ; "KERNEL32.DLL"
call ds:GetModuleHandleW
push eax ; hModule
call ds:GetProcAddress
mov dword_1001F608, eax
loc_10003AB6: ; CODE XREF: myfunc+8j
test eax, eax
push esi
mov esi, [esp+4+arg_4]
push edi
mov edi, [esp+8+Str]
push esi
push edi
jz short loc_10003ACA
call eax ; dword_1001F608
jmp short loc_10003AD0
; —————————————————————————
loc_10003ACA: ; CODE XREF: myfunc+34j
call ds:GetWindowsDirectoryW
loc_10003AD0: ; CODE XREF: myfunc+38j
sub esi, eax
cmp esi, 5
jnb short loc_10003ADD
pop edi
add eax, 5
pop esi
retn
; —————————————————————————
loc_10003ADD: ; CODE XREF: myfunc+45j
push offset aInf_0 ; "\\inf"
push edi ; Dest
call _wcscat
push edi ; Str
call _wcslen
add esp, 0Ch
pop edi
pop esi
retn
myfunc endp
PSEUDOCODE
4. Arithmetics is not a rocket science
Arithmetics is not a rocket science but it is always better if someone handles it for you. You have more important things to focus on.
ASSEMBLER CODE
; =============== S U B R O U T I N E =======================================
; Attributes: bp-based frame
; sgell(__int64, __int64)
public @sgell$qjj
@sgell$qjj proc near
arg_0 = dword ptr 8
arg_4 = dword ptr 0Ch
arg_8 = dword ptr 10h
arg_C = dword ptr 14h
push ebp
mov ebp, esp
mov eax, [ebp+arg_0]
mov edx, [ebp+arg_4]
cmp edx, [ebp+arg_C]
jnz short loc_10226
cmp eax, [ebp+arg_8]
setnb al
jmp short loc_10229
; —————————————————————————
loc_10226: ; CODE XREF: sgell(__int64,__int64)+Cj
setnl al
loc_10229: ; CODE XREF: sgell(__int64,__int64)+14j
and eax, 1
pop ebp
retn
@sgell$qjj endp
PSEUDOCODE
bool __cdecl sgell(__int64 a1, __int64 a2)
{
return a1 >= a2;
}
5. Sample window procedure
The decompiler recognized a switch statement and nicely represented the window procedure. Without this little help the user would have to calculate the message numbers herself. Nothing particularly difficult, just time consuming and boring. What if she makes a mistake?…
ASSEMBLER CODE
; =============== S U B R O U T I N E =======================================
wndproc proc near ; DATA XREF: sub_4010E0+21o
Paint = tagPAINTSTRUCT ptr -0A4h
Buffer = byte ptr -64h
hWnd = dword ptr 4
Msg = dword ptr 8
wParam = dword ptr 0Ch
lParam = dword ptr 10h
mov ecx, hInstance
sub esp, 0A4h
lea eax, [esp+0A4h+Buffer]
push 64h ; nBufferMax
push eax ; lpBuffer
push 6Ah ; uID
push ecx ; hInstance
call ds:LoadStringA
mov ecx, [esp+0A4h+Msg]
mov eax, ecx
sub eax, 2
jz loc_4013E8
sub eax, 0Dh
jz loc_4013B2
sub eax, 102h
jz short loc_401336
mov edx, [esp+0A4h+lParam]
mov eax, [esp+0A4h+wParam]
push edx ; lParam
push eax ; wParam
push ecx ; Msg
mov ecx, [esp+0B0h+hWnd]
push ecx ; hWnd
call ds:DefWindowProcA
add esp, 0A4h
retn 10h
; —————————————————————————
loc_401336: ; CODE XREF: wndproc+3Cj
mov ecx, [esp+0A4h+wParam]
mov eax, ecx
and eax, 0FFFFh
sub eax, 68h
jz short loc_40138A
dec eax
jz short loc_401371
mov edx, [esp+0A4h+lParam]
mov eax, [esp+0A4h+hWnd]
push edx ; lParam
push ecx ; wParam
push 111h ; Msg
push eax ; hWnd
call ds:DefWindowProcA
add esp, 0A4h
retn 10h
; —————————————————————————
loc_401371: ; CODE XREF: wndproc+7Aj
mov ecx, [esp+0A4h+hWnd]
push ecx ; hWnd
call ds:DestroyWindow
xor eax, eax
add esp, 0A4h
retn 10h
; —————————————————————————
loc_40138A: ; CODE XREF: wndproc+77j
mov edx, [esp+0A4h+hWnd]
mov eax, hInstance
push 0 ; dwInitParam
push offset DialogFunc ; lpDialogFunc
push edx ; hWndParent
push 67h ; lpTemplateName
push eax ; hInstance
call ds:DialogBoxParamA
xor eax, eax
add esp, 0A4h
retn 10h
; —————————————————————————
loc_4013B2: ; CODE XREF: wndproc+31j
push esi
mov esi, [esp+0A8h+hWnd]
lea ecx, [esp+0A8h+Paint]
push ecx ; lpPaint
push esi ; hWnd
call ds:BeginPaint
push eax ; HDC
push esi ; hWnd
call my_paint
add esp, 8
lea edx, [esp+0A8h+Paint]
push edx ; lpPaint
push esi ; hWnd
call ds:EndPaint
pop esi
xor eax, eax
add esp, 0A4h
retn 10h
; —————————————————————————
loc_4013E8: ; CODE XREF: wndproc+28j
push 0 ; nExitCode
call ds:PostQuitMessage
xor eax, eax
add esp, 0A4h
retn 10h
wndproc endp
PSEUDOCODE
6. Short-circuit evaluation
This is an excerpt from a big function to illustrate short-circuit evaluation. Complex things happen in long functions and it is very handy to have the decompiler to represent things in a human way. Please note how the code that was scattered over the address space is concisely displayed in two if
statements.
ASSEMBLER CODE
loc_804BCC7: ; CODE XREF: sub_804BB10+A42j
mov [esp+28h+var_24], offset aUnzip ; "unzip"
xor eax, eax
test esi, esi
setnz al
mov edx, 1
mov ds:dword_804FBAC, edx
lea eax, [eax+eax+1]
mov ds:dword_804F780, eax
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jz loc_804C4F1
loc_804BCFF: ; CODE XREF: sub_804BB10+9F8j
mov eax, 2
mov ds:dword_804FBAC, eax
loc_804BD09: ; CODE XREF: sub_804BB10+9FEj
mov [esp+28h+var_24], offset aZ2cat ; "z2cat"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jz loc_804C495
loc_804BD26: ; CODE XREF: sub_804BB10+99Cj
; sub_804BB10+9B9j …
mov eax, 2
mov ds:dword_804FBAC, eax
xor eax, eax
test esi, esi
setnz al
inc eax
mov ds:dword_804F780, eax
………………………… SKIP ……………………….
loc_804C495: ; CODE XREF: sub_804BB10+210j
mov [esp+28h+var_24], offset aZ2cat_0 ; "Z2CAT"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
mov [esp+28h+var_24], offset aZcat ; "zcat"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
mov [esp+28h+var_24], offset aZcat_0 ; "ZCAT"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
jmp loc_804BD3D
; —————————————————————————
loc_804C4F1: ; CODE XREF: sub_804BB10+1E9j
mov [esp+28h+var_24], offset aUnzip_0 ; "UNZIP"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BCFF
jmp loc_804BD09
PSEUDOCODE
dword_804F780 = 2 * (v9 != 0) + 1;
if ( strstr(dword_804FFD4, "unzip") || strstr(dword_804FFD4, "UNZIP") )
dword_804FBAC = 2;
if ( strstr(dword_804FFD4, "z2cat")
|| strstr(dword_804FFD4, "Z2CAT")
|| strstr(dword_804FFD4, "zcat")
|| strstr(dword_804FFD4, "ZCAT") )
{
dword_804FBAC = 2;
dword_804F780 = (v9 != 0) + 1;
}
7. Inlined string operations
The decompiler tries to recognize frequently inlined string functions such as strcmp, strchr, strlen, etc. In this code snippet, calls to the strlen
function has been recognized.
ASSEMBLER CODE
mov eax, [esp+argc]
sub esp, 8
push ebx
push ebp
push esi
lea ecx, ds:0Ch[eax*4]
push edi
push ecx ; unsigned int
call ??2@YAPAXI@Z ; operator new(uint)
mov edx, [esp+1Ch+argv]
mov ebp, eax
or ecx, 0FFFFFFFFh
xor eax, eax
mov esi, [edx]
add esp, 4
mov edi, esi
repne scasb
not ecx
dec ecx
cmp ecx, 4
jl short loc_401064
cmp byte ptr [ecx+esi-4], ‘.’
jnz short loc_401064
mov al, [ecx+esi-3]
cmp al, ‘e’
jz short loc_401047
cmp al, ‘E’
jnz short loc_401064
loc_401047: ; CODE XREF: _main+41j
mov al, [ecx+esi-2]
cmp al, ‘x’
jz short loc_401053
cmp al, ‘X’
jnz short loc_401064
loc_401053: ; CODE XREF: _main+4Dj
mov al, [ecx+esi-1]
cmp al, ‘e’
jz short loc_40105F
cmp al, ‘E’
jnz short loc_401064
loc_40105F: ; CODE XREF: _main+59j
mov byte ptr [ecx+esi-4], 0
loc_401064: ; CODE XREF: _main+32j _main+39j …
mov edi, esi
or ecx, 0FFFFFFFFh
xor eax, eax
repne scasb
not ecx
add ecx, 3
push ecx ; unsigned int
call ??2@YAPAXI@Z ; operator new(uint)
mov edx, eax
PSEUDOCODE
for more information & purchasing