Compiler Specification


Table of Contents

Overview
Varnode Tags
Compiler Specific P-code Interpretation
<context_data>
<callfixup>
<callotherfixup>
<prefersplit>
<aggressivetrim>
Compiler Datatype Organization
<data_organization>
<enum>
<funcptr>
Compiler Scoping and Memory Access
<global>
<readonly>
<nohighptr>
Compiler Special Purpose Registers
<stackpointer>
<returnaddress>
Parameter Passing
Describing Parameters and Allocation Strategies
<default_proto>
<prototype>

Overview

The compiler specification is a required part of a Ghidra language module for supporting disassembly and analysis of a particular processor. Its purpose is to encode information about a target binary which is specific to the compiler that generated that binary. Within Ghidra, the SLEIGH specification allows the decoding of machine instructions for a particular processor, like Intel x86, but more than one compiler can produce those instructions. For a particular target binary, understanding details about the specific compiler used to build it is important to the reverse engineering process. The compiler specification fills this need, allowing concepts like parameter passing conventions and stack mechanisms to be formally described.

A compiler specification is a single file with a ".cspec" suffix, contained in a module's data/languages directory. There may be more than one ".cspec" file in the directory if Ghidra supports multiple compilers for the same processor. The compiler specification is identified by the 5th field of Ghidra's processor id (for example, gcc in x86:LE:32:default:gcc). The id is explicitly linked with the ".cspec" file by adding a <compiler> tag to the root ".ldefs" file for the processor, which resides in the same directory.

Example 1. Defining the processor id x86:LE:32:default:gcc and associating it with the file x86gcc.cspec

<language_definitions>
  ...
  <language processor="x86"
            endian="little"
            size="32"
            variant="default"
            version="2.3"
            slafile="x86.sla"
            processorspec="x86.pspec"
            manualindexfile="../manuals/x86.idx"
            id="x86:LE:32:default">
    <description>Intel/AMD 32-bit x86</description>
    <compiler name="Visual Studio" spec="x86win.cspec" id="windows"/>
    <compiler name="gcc" spec="x86gcc.cspec" id="gcc"/>
    <compiler name="Borland" spec="x86borland.cspec" id="borland"/>
  </language>
  ...
</language_definitions>
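
As a sketch of how the language id and the compiler id combine, the entry below adds one more compiler to the <language> tag of Example 1. The name "mycc", the id "mycc", and the file x86mycc.cspec are hypothetical placeholders chosen for illustration; only the name, spec, and id attributes themselves are taken from the example.

```xml
<!-- Hypothetical extra entry, placed next to the other <compiler> tags. -->
<compiler name="mycc" spec="x86mycc.cspec" id="mycc"/>
<!-- The language id "x86:LE:32:default" plus the compiler id "mycc"
     would give the full processor id x86:LE:32:default:mycc. -->
```

Under these assumptions, a file named x86mycc.cspec would also have to be present in the same data/languages directory as the ".ldefs" file.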


A compiler specification is just an XML file, so it needs to start with the usual XML declaration, and it always has <compiler_spec> as the root XML tag. All specific compiler features are described using subtags of this tag. In principle, all the subtags are optional except the <default_proto> tag, but there is generally a minimum set of tags that are needed to create a useful specification (See ???). In general, the subtags can appear in any order. The only exception is that tags which define names, like <prototype>, must appear before other tags which use those names.
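
The overall shape of such a file can be sketched as follows. This is a minimal illustration for a 32-bit x86-like target, not a complete or authoritative specification: the register name ESP, the space name "ram", the prototype name __cdecl, and every numeric value are assumptions chosen for the example, and the <input>, <output>, <pentry>, <range>, and <addr> tags are shown here only to make the skeleton plausible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<compiler_spec>
  <!-- Memory treated as global; the space name "ram" is an assumption -->
  <global>
    <range space="ram"/>
  </global>
  <!-- Formal stack pointer; ESP is an illustrative register name -->
  <stackpointer register="ESP" space="ram"/>
  <!-- The one required subtag: the default calling convention -->
  <default_proto>
    <prototype name="__cdecl" extrapop="4" stackshift="4">
      <input>
        <!-- Illustrative: parameters passed on the stack, starting at offset 4 -->
        <pentry minsize="1" maxsize="500" align="4">
          <addr offset="4" space="stack"/>
        </pentry>
      </input>
      <output>
        <!-- Illustrative: return value of up to 4 bytes in EAX -->
        <pentry minsize="1" maxsize="4">
          <register name="EAX"/>
        </pentry>
      </output>
    </prototype>
  </default_proto>
</compiler_spec>
```

A real specification would normally carry more tags than this, but the sketch shows the root tag, the XML declaration, and a <prototype> defined inside <default_proto>.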

The rest of this document is broken up into sections that roughly correspond with aspects of compiler design, and then subsections within these address particular tags.

Varnode Tags

Many parts of the compiler specification use tags that describe a single varnode. Since architectures frequently name many of their registers or special memory locations, it is convenient for the specification designer to be able to use these names. But in some cases there is no name and the designer must fall back on the defining triple for a varnode: an address space, an offset and a size. Hence there are really two different XML tags that are used to describe varnodes and both are referred to as a varnode tag.

The <register> tag is used to specify formally named registers, usually defined by the SLEIGH specification for the processor. The name must be given in a name attribute for the tag.

The <varnode> tag is used to generically describe any varnode. It must take three attributes: space is the formal name of the address space containing the varnode, offset is an unsigned integer specifying the byte offset of the varnode within the space, and size is an integer specifying the size of the varnode in bytes. The <varnode> tag can be used to describe any varnode, including named registers, global RAM locations, and stack locations. For stack locations, the offset is interpreted relative to the function that is being decompiled or is otherwise in scope. An offset of 0, for instance, typically refers to the memory location on the stack pointed to by the formal stack pointer register upon entry to the function being analyzed.

Example 2. 

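  <!-- Named registers, as defined by the processor's SLEIGH specification -->
  <register name="EAX"/>
  <register name="r1"/>
  <!-- 4 bytes of global memory at offset 0x1020 in the "ram" space -->
  <varnode space="ram" offset="0x1020" size="4"/>
  <!-- Stack locations, relative to the stack pointer on function entry -->
  <varnode space="stack" offset="8" size="8"/>
  <!-- Assuming a 32-bit stack space, 0xfffffff8 is the unsigned encoding of offset -8 -->
  <varnode space="stack" offset="0xfffffff8" size="2"/>
  <!-- A location in the register space given by offset and size rather than by name -->
  <varnode space="register" offset="0" size="1"/>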
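
Varnode tags normally appear nested inside other tags rather than on their own. As a sketch, assuming the <returnaddress> tag listed in the table of contents takes a single varnode tag as its child, the two alternatives below show the same role filled once by a <varnode> tag and once by a <register> tag. The offset, the size, and the register name lr are illustrative assumptions and depend on the processor's SLEIGH specification.

```xml
<!-- Alternative A: return address held on the stack at function entry -->
<returnaddress>
  <varnode space="stack" offset="0" size="4"/>
</returnaddress>

<!-- Alternative B: return address held in a named link register -->
<returnaddress>
  <register name="lr"/>
</returnaddress>
```

A single specification would use only one of these forms; they are shown together purely to contrast the two kinds of varnode tag.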