Class SleighAssemblerBuilder

  • All Implemented Interfaces:
    AssemblerBuilder

    public class SleighAssemblerBuilder
    extends java.lang.Object
    implements AssemblerBuilder
    An AssemblerBuilder capable of supporting almost any SleighLanguage.

    To build an assembler, please use a static method of the Assemblers class.

    SLEIGH-based assembly is a bit of an experimental feature at this time. Nevertheless, it seems to have come along quite nicely. It's not quite as fast as disassembly, since after all, that's what SLEIGH was designed to do.

    Overall, the method is fairly simple, though its implementation is a bit more complex. First, we gather every pair of pattern and constructor by traversing the decision tree used by disassembly. We then use the "print pieces" to construct a context-free grammar. Each production is associated with the one-or-more constructors with the same sequence of print pieces. We then build a LALR(1) parser for the generated grammar. This now constitutes a generic parser for the given language. Note that this step takes some time, and may be better suited as a build-time step. Because SLEIGH specifications are not generally concerned with eliminating ambiguity of printed instructions (rather, they only do so for instruction bytes), we must consider that the grammar could be ambiguous. To handle this, the action/goto table is permitted multiple entries per cell, and we allow backtracking. There are also cases where tokens are not actually separated by spaces. For example, in the ia.sinc file, there is JMP ... and J^cc, meaning the lexer must consider J as a token as well as JMP, introducing another source of possible backtracking. Despite that, parsing is completed fairly quickly.

    To assemble, we first parse the textual instruction, yielding zero or more parse trees. No parse trees implies an error. For each parse tree, we attempt to resolve the instruction bytes, starting at the leaves and working upwards while tracking and solving context changes. The context changes must be considered in reverse. We read the context register of the children (a disassembler would write). We then assume there is at most one variable in the expression, solve for it, and write the solution to the appropriate field (a disassembler would read). If no solution exists, a semantic error is logged. Since it's possible a production in the parse tree is associated with multiple constructors, different combinations of constructors are explored as we move upward in the tree. If all possible combinations yield semantic errors, then the overall result is an error.

    Some productions are "purely recursive," e.g., :^instruction lines in the SLEIGH. These are ignored during parser construction. Let such a production be given as I => I. When resolving the parse tree to bytes, and we encounter a production with I on the left-hand side, we then consider the possible application of the production I => I and its consequential constructors. Ideally, we could repeat this indefinitely, stopping when all further applications result in semantic errors; however, there is no guarantee in the SLEIGH specification that such an algorithm will actually halt, so a maximum number (default of 1) of applications is attempted.

    After all the context changes and operands are resolved, we apply the constructor patterns and proceed up the tree. Thus, each branch yields zero or more "resolved constructors," which each specify two masked blocks of data: one for the instruction, and one for the context. These are passed up to the parent production, which, having obtained results from all its children, attempts to apply the corresponding constructors.

    Once we've resolved the root node, any resolved constructors returned are taken as successfully assembled instruction bytes. If applicable, the corresponding context registers are compared to the context at the target address in the program and filtered for compatibility.
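
    The idea of a "masked block" that is passed up the tree and checked for compatibility can be illustrated with a small, self-contained sketch. The MaskedBlock class and its combine method below are hypothetical stand-ins written for this example; they are not the assembler's actual pattern classes, and the real resolution logic may differ. A block defines only the bits set in its mask, and two blocks can be merged only if they agree on every bit both define.

```java
import java.util.Optional;

/** Hypothetical stand-in for a masked block: only bits set in mask are defined. */
final class MaskedBlock {
    final byte[] mask;
    final byte[] value;

    MaskedBlock(byte[] mask, byte[] value) {
        if (mask.length != value.length) {
            throw new IllegalArgumentException("mask and value must have equal length");
        }
        this.mask = mask.clone();
        this.value = new byte[value.length];
        for (int i = 0; i < value.length; i++) {
            // Normalize: bits outside the mask are stored as zero.
            this.value[i] = (byte) (value[i] & mask[i]);
        }
    }

    /** Merge two blocks, or return empty if they disagree on a bit both define. */
    Optional<MaskedBlock> combine(MaskedBlock other) {
        int len = Math.max(mask.length, other.mask.length);
        byte[] m = new byte[len];
        byte[] v = new byte[len];
        for (int i = 0; i < len; i++) {
            // Bytes past the end of a shorter block are treated as undefined.
            int m1 = i < mask.length ? mask[i] & 0xff : 0;
            int v1 = i < value.length ? value[i] & 0xff : 0;
            int m2 = i < other.mask.length ? other.mask[i] & 0xff : 0;
            int v2 = i < other.value.length ? other.value[i] & 0xff : 0;
            int shared = m1 & m2;
            if ((v1 & shared) != (v2 & shared)) {
                return Optional.empty(); // conflicting requirement on a shared bit
            }
            m[i] = (byte) (m1 | m2);
            v[i] = (byte) (v1 | v2);
        }
        return Optional.of(new MaskedBlock(m, v));
    }
}
```

    In this sketch, a conflict plays the role of a semantic error for one combination of constructors, and an empty result simply removes that combination from consideration. The same agreement test could serve as a model for filtering assembled results against the context at the target address.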
    • Constructor Detail

      • SleighAssemblerBuilder

        public SleighAssemblerBuilder(SleighLanguage lang)
        Construct an assembler builder for the given SLEIGH language
        Parameters:
        lang - the language
    • Method Detail

      • generateAssembler

        protected void generateAssembler()
                                  throws SleighException
        Do the actual work to construct an assembler from a SLEIGH language
        Throws:
        SleighException - if there's an issue accessing the language
      • invVarnodeList

        protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invVarnodeList(VarnodeListSymbol vnlist)
        Invert a varnode list to a map suitable for use with AssemblyStringMapTerminal
        Parameters:
        vnlist - the varnode list symbol
        Returns:
        the inverted string map
      • invValueMap

        protected java.util.Map<java.lang.Long,java.lang.Integer> invValueMap(ValueMapSymbol vm)
        Invert a value map to a map suitable for use with AssemblyNumericMapTerminal
        Parameters:
        vm - the value map symbol
        Returns:
        the inverted numeric map
      • invNameSymbol

        protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invNameSymbol(NameSymbol ns)
        Invert a name table to a map suitable for use with AssemblyStringMapTerminal
        Parameters:
        ns - the name symbol
        Returns:
        the inverted string map
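
        As an illustration of what "inverting" a table means here, the following sketch turns an index-to-name list into a name-to-indices map. It is an assumption-laden stand-in: it uses a plain Map<String, List<Integer>> from the JDK instead of MultiValuedMap, and the class and method names (NameTableInversionSketch, invert) are hypothetical. A multi-valued result is needed because the same name may appear at more than one index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: invert an index-to-name table into name-to-indices. */
final class NameTableInversionSketch {
    static Map<String, List<Integer>> invert(List<String> names) {
        // LinkedHashMap keeps names in first-seen order for predictable iteration.
        Map<String, List<Integer>> inverted = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            // A name may occur at several indices, so collect every index.
            inverted.computeIfAbsent(names.get(i), k -> new ArrayList<>()).add(i);
        }
        return inverted;
    }
}
```

        With such a map, a terminal that matches a name in the input text can look up every encoding value that would print as that name.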
      • getSymbolFor

        protected AssemblySymbol getSymbolFor(Constructor cons,
                                              OperandSymbol opsym)
        Convert the given operand symbol to an AssemblySymbol. For subtables, this results in a non-terminal; for all others, the result is a terminal.
        Parameters:
        cons - the constructor to which the operand belongs
        opsym - the operand symbol to convert
        Returns:
        the converted assembly grammar symbol
      • getBitSize

        protected int getBitSize(Constructor cons,
                                 OperandSymbol opsym)
        Obtain the size in bits of a textual operand. This is a little odd, since the variables in pattern expressions do not have an explicit size. However, the value exported by a constructor's p-code may have an explicit size given (in bytes). Thus, there is a special case, where a constructor prints just one operand and exports that same operand with an explicit size. In that case, the operand is printed according to that exported size. For disassembly, this information is used simply to truncate the bits before they are displayed. For assembly, we must do two things: 1) ensure that the provided value fits in the given size, and 2) mask the goal when solving the pattern expression for the operand.
        Parameters:
        cons - the constructor from which the production is being derived
        opsym - the operand symbol corresponding to the grammatical symbol, whose size we wish to determine.
        Returns:
        the size of the operand in bits
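
        The two assembly-side uses of the bit size can be sketched as follows. This is a minimal illustration under stated assumptions, not the assembler's actual code: the class OperandSizeSketch and its methods are hypothetical, and the choice to accept a value if it is representable either as signed or as unsigned in the given width is an assumption made for the example.

```java
/** Hypothetical sketch of the fit check and goal masking for a sized operand. */
final class OperandSizeSketch {
    /** True if value is representable in the given number of bits, signed or unsigned. */
    static boolean fits(long value, int bits) {
        if (bits <= 0) {
            return false; // no bits available, nothing fits
        }
        if (bits >= 64) {
            return true; // every long fits in 64 bits
        }
        long min = -(1L << (bits - 1)); // smallest signed value
        long max = (1L << bits) - 1;    // largest unsigned value
        return value >= min && value <= max;
    }

    /** Keep only the low-order bits of the goal before solving for the operand. */
    static long maskGoal(long goal, int bits) {
        if (bits <= 0) {
            return 0L;
        }
        if (bits >= 64) {
            return goal;
        }
        return goal & ((1L << bits) - 1);
    }
}
```

        For an 8-bit operand, this sketch accepts both 255 and -128, rejects 256, and masks a goal of -1 down to 0xFF, which mirrors the truncation a disassembler would apply before display.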
      • buildSubGrammar

        protected AssemblyGrammar buildSubGrammar(SubtableSymbol subtable)
        Build a portion of the grammar representing a table of constructors
        Parameters:
        subtable - the table
        Returns:
        the partial grammar
      • buildGrammar

        protected void buildGrammar()
        Build the full grammar for the language
      • buildContext

        protected void buildContext()
        Build the default context for the language
      • buildContextGraph

        protected void buildContextGraph()
        Build the context transition graph for the language
      • buildParser

        protected void buildParser()
        Build the parser for the language
      • getGrammar

        protected AssemblyGrammar getGrammar()
        Get the built grammar for the language
        Returns:
        the grammar
      • getParser

        protected AssemblyParser getParser()
        Get the built parser for the language
        Returns:
        the parser