Compiler Specific P-code Interpretation

<context_data>

Attributes and Children
<context_set> (0 or more) Set a context variable across a region of memory
<tracked_set> (0 or more) Set default value of register

A <context_data> tag consists of zero or more <context_set> and <tracked_set> subtags, which allow certain values to be assumed by analysis.

<context_set>

Attributes and Children
space Name of address space
first (Optional) Starting offset of range
last (Optional) Ending offset of range
<set> Specify the context variable and the new value
name Name of the context variable
val Integer value being set
description (Optional) Description of what is set

A <context_set> tag sets a SLEIGH context variable over a specified address range, which can affect how instructions are disassembled within that range. This is more commonly done in the processor specification file but can also be done here for specific compilers. The attributes space, first, and last describe the range. Omitting first starts the range at the beginning of the address space; omitting last extends it to the end of the address space. The <set> subtag names the variable and gives its value.

Example 3. 

  <context_data>
    <context_set space="ram">
      <set name="mode16" val="1" description="Set 16-bit mode across all of ram"/>
    </context_set>
  </context_data>
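
The first and last attributes can restrict the setting to part of the address space. The following is only a sketch: the offsets are hypothetical, and it assumes that offsets may be written in hexadecimal.

  <context_data>
    <!-- Hypothetical range: mode16 applies only to offsets 0x8000 through 0x8fff -->
    <context_set space="ram" first="0x8000" last="0x8fff">
      <set name="mode16" val="1" description="Set 16-bit mode for one region of ram"/>
    </context_set>
  </context_data>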

<tracked_set>

Attributes and Children
space Name of address space
first (Optional) Starting offset of range
last (Optional) Ending offset of range
<set> Specify the register and the new value
name Name of the register
val Integer value being set
description (Optional) Description of what is set

A <tracked_set> tag informs the decompiler that a register takes a specific value for any function whose entry point is in the indicated range. Compilers sometimes know or assume that registers have specific values on entry to the functions they produce. This tag allows the decompiler to make the same assumption and possibly use constant propagation to make further simplifications.

Example 4. 

  <context_data>
    <tracked_set space="ram">
      <set name="spsr" val="0"/>
    </tracked_set>
  </context_data>
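
As with <context_set>, the range can be narrowed with first and last so that the assumption applies only to functions whose entry point lies within it. This is a sketch under assumptions: the register name DF and the offsets are illustrative and must match what the SLEIGH specification and the program actually define.

  <context_data>
    <!-- Hypothetical range: assume DF is 0 on entry to functions starting in 0x10000-0x1ffff -->
    <tracked_set space="ram" first="0x10000" last="0x1ffff">
      <set name="DF" val="0" description="Direction flag assumed clear on entry"/>
    </tracked_set>
  </context_data>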

<callfixup>

Attributes and Children
name The identifier for this callfixup
<target> (0 or more) Map this callfixup to a specific symbol
name The specific symbol name
<pcode> Description of p-code to inject.

<pcode>

Attributes and Children
paramshift (Optional) Integer for shifting parameters at the callpoint.
<body> P-code to inject.
text

Compilers frequently make use of special bookkeeping functions that are really internal to the compiler and not a direct reflection of functions in the original source code. During analysis it can be helpful to replace a call to such a function with a snippet of p-code that inlines the behavior, or a portion of the behavior, so that the decompiler can use it during its simplification rather than displaying it as an opaque call. A typical use is to inline prologue functions that help set up a stack frame.

The name attribute can be used to identify the callfixup within the Ghidra CodeBrowser and manually force certain functions to be replaced. The name attribute of the <callfixup> tag and any optional <target> subtags identify function names which will automatically be replaced.

The text of the <body> subtag is fed directly to the SLEIGH semantic expression parser to create the p-code snippet. Identifiers are interpreted as formal registers, if the register exists, but are otherwise interpreted as temporary registers in the unique space of the processor. It's usually best to surround the text with the XML <![CDATA[ construct so that characters such as < and & do not need to be escaped.

Example 5. 

  <callfixup name="get_pc_thunk_bx">
    <target name="__i686.get_pc_thunk.bx"/>
    <pcode>
      <body><![CDATA[
      EBX = * ESP;
      ESP = ESP + 4;
      ]]></body>
    </pcode>
  </callfixup>
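
Because <target> may appear more than once, a single callfixup can cover several symbol names that share the same behavior. The sketch below assumes two naming variants of the same thunk; both symbol names are illustrative and should be checked against the symbols actually present in the binary.

  <callfixup name="get_pc_thunk_ax">
    <!-- Either symbol name triggers the same injected p-code -->
    <target name="__i686.get_pc_thunk.ax"/>
    <target name="__x86.get_pc_thunk.ax"/>
    <pcode>
      <body><![CDATA[
      EAX = * ESP;
      ESP = ESP + 4;
      ]]></body>
    </pcode>
  </callfixup>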

<callotherfixup>

Attributes and Children
targetop Name of the CALLOTHER operator to inject.
<pcode> Description of p-code to inject.

<pcode>

Attributes and Children
<input> (0 or more) Description of formal input parameter.
name Name of the specific input symbol.
size Expected size of the parameter in bytes.
<output> (0 or more) Description of formal output parameter.
name Name of the specific output symbol.
size Expected size of output in bytes.
<body> P-code to inject.
text

The <callotherfixup> is similar to a <callfixup> tag but is used to describe injections that replace user-defined p-code operations, rather than CALL operations. User-defined p-code operations, referred to generically as CALLOTHER operations, are black-box operations that a SLEIGH specification can define to encapsulate complicated (or esoteric) actions performed by the processor. The specification must define a unique name for each such operation. The targetop attribute links the p-code described here to the specific operation via this name.

As with any p-code operation, the CALLOTHER takes formal varnodes as inputs and/or outputs. These varnodes can be referred to in the injection <body> by predefining them using <input> or <output> tags. The sequence of <input> tags corresponds, in order, to the input parameters of the CALLOTHER, and an <output> tag corresponds to the output varnode, if present. The tags listed here must match the number of input and output parameters in the actual p-code operation, or an exception will be thrown during p-code generation. The optional size attribute in each tag, if present, also imposes a size restriction on the parameter.

As with a <callfixup>, the <body> tag is fed straight to the SLEIGH semantic parser. It can refer to registers via their symbolic name defined in SLEIGH, it can refer to the operator parameters via their <input> or <output> names, and it can also refer to inst_start and inst_next as addresses describing the instruction containing the CALLOTHER.

Example 6. 

  <callotherfixup targetop="saturate">
    <pcode>
      <input name="in1" size="4"/>
      <input name="in2" size="4"/>
      <body><![CDATA[
        in1 = in1 + in2;
        if (in1 < 0x10000) goto <end>;
        in1 = 0xffff;
        <end>
      ]]></body>
    </pcode>
  </callotherfixup>
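
When the CALLOTHER produces a result, an <output> tag gives that varnode a name that the <body> can assign to. The following is a minimal sketch only: the operation name swapbytes and the parameter names val and res are hypothetical, and it assumes the SLEIGH specification defines a user-defined operation with one 2-byte input and a 2-byte output.

  <callotherfixup targetop="swapbytes">
    <pcode>
      <input name="val" size="2"/>
      <output name="res" size="2"/>
      <body><![CDATA[
        res = (val << 8) | (val >> 8);
      ]]></body>
    </pcode>
  </callotherfixup>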

<prefersplit>

Attributes and Children
style Strategy for splitting: inhalf
<register> or <varnode> (1 or more) varnode tags

This tag is designed to mark specific registers as packed, containing multiple logical values that need to be split. The decompiler attempts to split up any operator that reads or writes the register into multiple p-code operations that operate on each logical value individually.

The tag lists one or more varnode tags describing the registers or other storage locations that need to be split. The style attribute indicates how the storage locations should be split. Currently the only accepted style value is "inhalf", which means that each varnode should be split into two equal pieces.

Splitting a varnode is only possible if none of the p-code operations it is involved in mix their action across the logical pieces. If any operation does, the p-code will not be altered for that particular varnode.

Example 7. 

  <prefersplit style="inhalf">
    <register name="xr0"/>
    <register name="xr1"/>
    <register name="xr2"/>
  </prefersplit>
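
Storage locations other than named registers can be listed with a <varnode> tag. The sketch below assumes the usual varnode form with space, offset, and size attributes; the address and size shown are hypothetical.

  <prefersplit style="inhalf">
    <register name="xr0"/>
    <!-- Hypothetical 8-byte location in ram, to be treated as two 4-byte halves -->
    <varnode space="ram" offset="0x2000" size="8"/>
  </prefersplit>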

<aggressivetrim>

Attributes and Children
signext (Optional) true if sign-extension should be aggressively trimmed

This tag tells the decompiler that p-code extension operations are likely to be a side-effect of the processor and are obscuring what is just the manipulation of the smaller logical value. The decompiler normally trims extensions and other operations where it can prove that the most significant bytes of the result are unused. This tag lets the decompiler be more aggressive when use of the extended bytes is more indeterminate. It can assume that extensions into sub-function parameters and into the return value are extraneous.

The signext attribute turns the behavior on specifically for the sign-extension operation. Currently there is no toggle for zero-extensions.

Example 8. 

  <aggressivetrim signext="true"/>