next up previous
Next: About this document ... Up: Compiler construction assignments Previous: General remarks


The Asterix Compiler

This appendix provides a short introduction to the source code of the Asterix compiler.


The design of the compiler is straightforward. It has a lexical analysis, parsing, semantic analysis and code generation phase. The main() function (located in main.c), activates these phases in sequence.

    int main(int argc, char **argv)
        int yyparse(void);
        init(argc, argv);
        if (!error_seen)
        return error_seen ? 1 : 0;

Since the lexical analysis code is called by the parser, the first relevant call in main() is the one to yyparse(), the parser generated by Bison. The result of the semantic actions during parsing is an AST, containing a list of global declarations, which is stored in the variable global_declarations. Next, semantic_analyses() (invoking type_check_declaration_list(global_declarations)) and code_generation() (invoking generate_declaration_list(global_declarations)) are called to perform the semantic analyses and code generation, respectively. These functions start a traversal of the AST by recursively calling other functions to process the branches. The code for the type_check_ and generate_ functions is stored in a number of .c files, for instance, declaration.c for everything to do with declarations. These files also contain functions to create and delete the structs that make up the AST.

Flex and Bison

For the first assignment it is important to understand the way Flex and Bison depend on each other. This is summarized in the following table.

Tool Source Generates Depends on
Flex lex.l lex.yy.c yylval variable
    containing yylex() token #define's
    and yytext  
Bison grammar.y yylex()
    containing yyparse()  
    and yylval  
    containing token #define's  

The code generated by Bison (yyparse()) calls the yylex() function, which is generated by Flex. Flex also creates the yytext variable, containing the string representation of the last recognized token. In a bottom up parser like Bison, all tokens in a rule are consumed before the rule is reduced. Since we may need the actual text of the token (or the position in the source file), Bison creates the variable yylval, in which the C code in the lex file can store this information. The new_token() function in lex.l allocates a structure containing the line number and text of the token. Bison then stores this in the $n parameters, which can be used in the Bison code. In a top down parser, tokens are consumed one by one as we process a rule, so we do not need this mechanism. We can just use new_token() where we need and yytext will contain the last recognized token. Apart from this, the picture when we replace Bison with LLgen is similar:

Tool Source Generates Depends on
Flex lex.l lex.yy.c token #define's
    containing yylex()  
    and yytext  
LLgen grammar.g Lpars.c yylex()
    containing yyparse()  
    containing token #define's  

Note that the names of the generated files are different.


A Makefile is available in the source directory to build the compiler. It can be used to build the compiler either using Bison or using LLgen. Which tool is used can be selected by commenting the appropriate line at the top of the Makefile. To select LLgen:

    #VERSION = Bison
    VERSION = LLgen

Note that the Makefile assumes the grammar file for LLgen to be called grammar.g.

Run time system

Building the compiler produces an executable called cbf. With this we can compile an Asterix source file:

cbf -I$<$include directory$>$ $<$asterix source$>$

This results in two files containing C code: cb.c and cb.h. We can compile these with a normal C compiler like gcc to create an executable file. The compiler also has a run time system, containing code on which cb.c depends. The runtime system consists of the files rts.* and libcb.*. We need to link these files when compiling cb.c. For convenience a script called cbc is available that performs both steps. It compiles an Asterix source file into a.out.

cbc $<$asterix source$>$

Building the syntax tree

The .c files contain new_ and delete_ functions for every type of node in the syntax tree. These return/take as an argument a pointer to the node. Typically, a rule creates one node of the appropriate type. Rules have a parameter of type pointer to pointer of the node type, which is used to store the result of the new_ call in. Rules use a local pointer variables to store results from subrules in. In the following example (LLgen syntax) rulea creates a node using the result of another rule, ruleb.

    rulea( RuleANode **node ) { RuleBNode *nodeb; }:
        ruleb( &nodeb )
            *node = new_ruleanode( nodeb );

An exception to this approach are rules which produce a list. The easiest way to implement these is using two rules, one for the list and one for each individual item. The list rule creates a List node, and passes this to the item rule, which appends the item to the list instead of returning it through a pointer parameter.

    rulelist( List **list ):
            *list = new_list();
        ruleitem( *list )*

    ruleitem( List *list ) { RuleItemNode *item; } :
            item = new_rule_item();
            list_append( list, item ); 

See list.h for details on the available list operations.

File list

Finally, the following table summarizes the purpose of each source file.

lex.l Flex source file
lex.yy.c Flex output file
grammar.y Bison source file* Bison output files
grammar.g LLgen source file
Lpars.* LLgen output files
declaration.* Compiler sources
libcb.* Run time system

next up previous
Next: About this document ... Up: Compiler construction assignments Previous: General remarks
Koen Langendoen 2003-05-09