Implementing a brainfuck CPU in Ghidra - part 4: Renaming the analyzer and adding a manual
- Setup and disassembly
- Decompilation of pointer and arithmetic instructions
- Decompilation of I/O and control flow instructions
- Renaming the analyzer and adding a manual (current post)
- Recognizing common patterns (future post)
More posts may be added to this series in the future.
Now that our processor module can finally disassemble and decompile brainfuck binaries, we can make some slight improvements to the module. We’ll do two things in this blog post: first we’ll rename the analyzer, and then we’ll add a processor manual.
Renaming the analyzer
Right now, the analyzer that resolves branch destinations is called BrainfuckAnalyzer.java
. This name doesn’t really make it clear what it does. Maybe BranchDestinationResolver.java
would be a better name, as it’s more descriptive.
This sounds easy, right? Just right click BrainfuckAnalyzer.java
and rename it to BranchDestinationResolver.java
. Don’t forget to also rename the class and constructor to BranchDestinationResolver
. But it’s not that easy, or I wouldn’t have devoted half a blog post to it :)
If we now restart Ghidra, the analyzer has disappeared from the analysis window. There’s no BranchDestinationResolver
or BrainfuckAnalyzer
in the list. What happened?
As it turns out, Ghidra only shows analyzers that are valid extension points. An extension point is a class that extends the functionality of Ghidra. There are two requirements for a class to be an extension point:
- It must (directly or indirectly) derive from the ExtensionPoint class.
- The containing file must have a valid extension point suffix.
The first requirement is met. The BranchDestinationResolver
class indirectly extends ExtensionPoint
[1]. The problem is with the filename. By default, only certain suffixes (there are about fifty) are recognized as extension point suffixes, the Analyzer
suffix among them. Resolver
is not a valid extension point suffix, so Ghidra doesn’t recognize BranchDestinationResolver
as an analyzer, while BrainfuckAnalyzer
is fine because it ends with Analyzer
.
We could append a valid suffix to the analyzer name to ensure Ghidra recognizes it, but it would result in an awkward name (e.g. BranchDestinationResolverAnalyzer
or BranchDestinationAddressCorrelator
). We could also drop the Resolver
suffix and use the Analyzer
suffix instead (BranchDestinationAnalyzer
sounds better). Instead of doing this, we can also register a new extension point suffix.
To register a new extension point suffix, we create a file in the data
directory of the project called ExtensionPoint.manifest
. It contains only one line:
Resolver
That’s all. Ghidra recognizes the ExtensionPoint.manifest
file and registers all suffixes in the manifest (separated by newlines). This is also how the default extension point suffixes are registered. For example, the Analyzer
suffix is registered in Ghidra/Features/Base/data/ExtensionPoint.manifest
.
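To make the suffix rule a bit more concrete, here’s a small standalone Java sketch of the idea. It is not Ghidra’s actual implementation: the class name SuffixCheckSketch and both helper methods are made up for illustration, and the real logic may differ in its details. The sketch treats a manifest as one suffix per line and accepts a class name only if it ends with one of the registered suffixes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: mimics the idea of suffix-based filtering.
// This is NOT Ghidra code; all names here are hypothetical.
class SuffixCheckSketch {

    // Collect one suffix per non-blank line of a manifest's contents.
    static Set<String> parseManifest(String manifestText) {
        Set<String> suffixes = new LinkedHashSet<>();
        for (String line : manifestText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                suffixes.add(trimmed);
            }
        }
        return suffixes;
    }

    // A class name qualifies if it ends with any registered suffix.
    static boolean hasExtensionPointSuffix(String className, Set<String> suffixes) {
        for (String suffix : suffixes) {
            if (className.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two of the default suffixes mentioned in this post.
        Set<String> defaults = parseManifest("Analyzer\nAddressCorrelator\n");

        // The defaults plus the contents of our own ExtensionPoint.manifest.
        Set<String> withOurs = new LinkedHashSet<>(defaults);
        withOurs.addAll(parseManifest("Resolver\n"));

        System.out.println(hasExtensionPointSuffix("BrainfuckAnalyzer", defaults));
        System.out.println(hasExtensionPointSuffix("BranchDestinationResolver", defaults));
        System.out.println(hasExtensionPointSuffix("BranchDestinationResolver", withOurs));
    }
}
```

With only the default suffixes, BranchDestinationResolver is rejected. Once Resolver is added from our manifest, it is accepted.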
If we now start Ghidra and open the analysis window, the analyzer shows up again.
This was a very brief look at extension points. The whole extension point mechanism deserves a post of its own. If you’re interested in how it works, I suggest looking at the ClassSearcher
and ClassFinder
classes.
Adding a manual
Now something different: adding a manual to the processor module. As said in the first post, a language in the .ldefs
file can have a manualindexfile
attribute pointing to a processor manual index file. A processor manual consists of one or more PDF files that contain documentation for the instruction set. A processor manual index maps instruction mnemonics to their corresponding page in the manual.
The manual index is documented at GhidraDocs/languages/manual_index.txt
, but the documentation is incomplete. The code that loads the manual index file can be found here.
Suppose we’ve got a dummy manual, bfman.pdf
. Its manual index file would look like this:
data/manuals/brainfuck.idx
:
@ bfman.pdf [Brainfuck Manual]
>, 1
# the '<' instruction is intentionally omitted
+, 2
-, 2
,, 3
., 3
[, 4
], 4
The first line starting with @
is called a file switch. It sets the current manual file. The [Brainfuck Manual]
part provides a description for this manual. It’s optional and is shown when Ghidra can’t find the manual file, so the user can locate the manual elsewhere. There can be multiple file switches in a manual index. This is useful when the processor manual spans several volumes.
As stated in the index file, the <
instruction is omitted. This is because the <
character has a special (undocumented) meaning. It can be used to import another index file. There’s no way to escape this character, which makes it impossible to create an entry for the <
instruction. The only way to get around this is to rename the <
instruction. For now, we’ll just omit the <
instruction.
Also good to know (and again undocumented): a #
indicates a comment.
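To summarize the syntax rules, here’s a rough Java sketch of a parser for this format. It’s a toy under stated assumptions, not Ghidra’s loader: the class ManualIndexSketch and its Entry type are hypothetical, lines starting with < are simply skipped instead of being followed as imports, and splitting each entry on its last comma is my own guess at a rule that keeps the `,` mnemonic working.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a toy parser for the index format described above.
// This is NOT Ghidra's loader; all names here are hypothetical.
class ManualIndexSketch {

    // Where a single mnemonic is documented.
    static final class Entry {
        final String file;
        final String description; // null when the file switch has no [..] part
        final String page;

        Entry(String file, String description, String page) {
            this.file = file;
            this.description = description;
            this.page = page;
        }
    }

    static Map<String, Entry> parse(String indexText) {
        Map<String, Entry> entries = new LinkedHashMap<>();
        String file = null;
        String description = null;

        for (String raw : indexText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("<")) {
                continue; // import of another index file, not followed in this sketch
            }
            if (line.startsWith("@")) {
                // File switch: "@ file.pdf [optional description]"
                String rest = line.substring(1).trim();
                int open = rest.indexOf('[');
                int close = rest.lastIndexOf(']');
                if (open >= 0 && close > open) {
                    file = rest.substring(0, open).trim();
                    description = rest.substring(open + 1, close).trim();
                } else {
                    file = rest;
                    description = null;
                }
                continue;
            }
            // Entry: "mnemonic, page". Splitting on the LAST comma (an
            // assumption) lets ",, 3" yield the mnemonic ",".
            int comma = line.lastIndexOf(',');
            if (comma < 0 || file == null) {
                continue; // malformed line, or no file switch seen yet
            }
            String mnemonic = line.substring(0, comma).trim();
            String page = line.substring(comma + 1).trim();
            entries.put(mnemonic, new Entry(file, description, page));
        }
        return entries;
    }

    public static void main(String[] args) {
        String index = "@ bfman.pdf [Brainfuck Manual]\n>, 1\n# a comment\n+, 2\n,, 3\n";
        for (Map.Entry<String, Entry> e : parse(index).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().file + " p." + e.getValue().page);
        }
    }
}
```

Note that a line such as `<, 1` is swallowed by the import check before it can ever be read as an entry, which mirrors the limitation described above.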
The only thing left to do now is to add the manual index file to the language definition in brainfuck.ldefs
. This is done by adding the following attribute to the language tag: manualindexfile="../manuals/brainfuck.idx"
.
That’s all there is to manual index files. If you now right-click an instruction and click Processor Manual...
, Ghidra will show the documentation for that instruction!
In this post, we’ve improved the usability of the processor module. The module still produces poor decompilation output, though. Next time, we’ll look at improving it.
Footnotes
1. BranchDestinationResolver ⊂ AbstractAnalyzer ⊂ Analyzer ⊂ ExtensionPoint