[ITP404/Compiler Theory]
Project II - manual.pdf
Author: Charmgil, Hong / 20200636 CSEE
0. Table of Contents
1. Introduction
2. Construction
3. Compilation
4. Execution
5. Running Results for Given Examples
6. Method List
7. Conclusion
8. Packaged File List
9. References
A. Appendix - minijava.txt
1. Introduction
This object module package includes an LL(1) parser object designed for the 'Mini Java'
Project, an instructional supplement for the ITP404 - Compiler Theory class (winter semester).
'Mini Java' is a simplified subset of the Java language built for educational purposes. It is
based on Java's character set, operators, naming rules, and grammar, and does not fully
implement class-related capabilities.
For the second part of the final term project, I designed a "dynamic" LL(1) parser program,
which reads an external grammar file written in a fixed format and automatically builds a
parsing table along with supporting data such as the first and follow sets. With this feature,
users can define their own grammar rules and have the parser apply them to the syntax
analysis of source programs.
In this version of the package, you will obtain an LL(1) parser, a parsing table generating
program, and an external grammar file that describes the grammar rules of Mini Java. I also
included updated versions of the scanner and the symbol table that were originally created for
the previous assignment.
2. Construction
The following shows the constructor of the parser:
public LLParser( String sourceFile, boolean verbose )
    sourceFile → source file location
    verbose    → verbose mode switch
Note that, by modifying some arguments inside the constructor, the user can change the
behavior and internal mechanisms of the parser components.
[constructor, LLParser.java]
// instantiate scanner / table mode
scanner = new JScan( sourceFile, false, "table" );
    false   → true/false switch that turns verbose mode on/off
    "table" → "table"/"dfa" indicates the method the scanner uses
// load grammar
grammar = new JGrammarDictionary( "minijava.txt", true );
    "minijava.txt" → location of the grammar file
    true           → true/false switch that turns verbose mode on/off
3. Compilation
To try out the driver program, compile the parser module by typing
the following command at your prompt:
javac LLParser.java
4. Execution
To execute the parser driver program, type the following command at your
prompt:
java LLParser <source_file_name>
5. Running Screen and Results for the given examples
The following is a sample result screen from the driver program in full verbose mode.
(The program prints less than what is shown below if you turn off verbose mode.)
[output on standard out]
Charmgil-MacBook:parser_LL1 charmgil$ java LLParser
( msg: grammar rules are successfully loaded... ) ← indicates that the grammar file
was loaded successfully.
( msg: indexing nts + ts is completed... ) ← indicates that the parser has finished
recognizing all nonterminals and
terminals within the grammar file.
--- new PASS ---
--- new PASS --- ← the number of "new PASS" lines is the number of passes needed
to compute the first sets completely.
( msg: FirstSets are obatined... )
--- new PASS ---
--- new PASS --- ← similarly, the number of "new PASS" lines here is the number of
passes needed to compute the follow sets completely.
( msg: FollowSets are obatined... )
( msg: Parsing Table is generated... ) ← indicates that the parsing table was
generated successfully.
[grammar rules]
SOURCE_CODE = IMPORT_STMT CLASS_DCLR ← dumps all grammar rules that were entered.
top(pStack): #e - } :19:cToken - SPECIAL_SYMBOL ← currently, '#e' is at the top of the stack and '}' is the next token; 19 is the line number.
$ } } #e
top(pStack): } - } :19:cToken - SPECIAL_SYMBOL
$ } } ← the second row of each pair shows the entire parsing stack.
top(pStack): } - } :20:cToken - SPECIAL_SYMBOL
syntax OK
( msg: parsing process is done successfully... )
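The repeated "--- new PASS ---" lines in the output above suggest that the first sets are computed by sweeping over the grammar rules until nothing changes. A minimal sketch of that kind of fixed-point computation is given here. It is not the code of LLParsingTable.java: the class name FirstSetSketch, the methods firstSets and sampleRules, and the rule HEADER are hypothetical names introduced for illustration, while the ACCESS, STATIC and TYPE rules are copied from the grammar listing in the appendix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSetSketch {
    static final String EPSILON = "#e"; // epsilon marker, as in minijava.txt

    // Each rule is {lhs, rhs symbols...}. HEADER is a hypothetical rule added
    // for illustration; the other rules come from the grammar listing.
    static String[][] sampleRules() {
        return new String[][] {
            {"HEADER", "ACCESS", "STATIC", "TYPE", "main"},
            {"ACCESS", "public"}, {"ACCESS", "private"},
            {"ACCESS", "protected"}, {"ACCESS", EPSILON},
            {"STATIC", "static"}, {"STATIC", EPSILON},
            {"TYPE", "int"}, {"TYPE", "void"},
            {"TYPE", "String"}, {"TYPE", "BufferedReader"},
        };
    }

    /** Sweeps over all rules until no first set changes any more. */
    static Map<String, Set<String>> firstSets(String[][] rules) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String[] rule : rules) {
            first.putIfAbsent(rule[0], new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            System.out.println("--- new PASS ---");
            for (String[] rule : rules) {
                Set<String> target = first.get(rule[0]);
                boolean allNullable = true; // true while every symbol so far can vanish
                for (int i = 1; i < rule.length && allNullable; i++) {
                    String sym = rule[i];
                    if (sym.equals(EPSILON)) {
                        continue; // an explicit epsilon adds no terminal
                    }
                    Set<String> symFirst = first.get(sym);
                    if (symFirst == null) {
                        // not a left-hand side anywhere, so treat it as a terminal
                        changed |= target.add(sym);
                        allNullable = false;
                    } else {
                        for (String t : new ArrayList<>(symFirst)) {
                            if (!t.equals(EPSILON)) changed |= target.add(t);
                        }
                        allNullable = symFirst.contains(EPSILON);
                    }
                }
                if (allNullable) changed |= target.add(EPSILON);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstSets(sampleRules()));
    }
}
```

The last pass is always one in which nothing changes, which is how the loop knows it can stop; the follow sets can be computed with the same pattern.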
Before the program actually starts parsing, it loads an external grammar file, which is designated in the LLParser (more specifically, by the LLParsingTable) when it is created. It then recognizes the nonterminals and terminals in the grammar file and, using a hash table, creates indices for all of those grammar symbols. After this preprocessing, the parser computes the first and follow sets and builds a parsing table accordingly. Once the parsing table is created, the program is ready to start parsing.
If the user runs in verbose mode, the status of the parsing stack and the matching-and-generating steps are displayed on screen. As preset, this release shows only the parsing stack status and the matching-and-generating steps. To see more detailed information, the user should turn on verbose mode when creating each component.
Example source code, proj2_test1.txt, and result from the parser:
[ Source Code ]
//level 1 test data
class Factors {
    void main ( ) {
        int count = 1, number;
        println ("Enter a positive number:");
        number = 4;
        println ("The factors are:");
        while (count <= number) {
            if (number%count == 0)
                println (count);
            count = count + 1;
            for (count = 0; count <6 ; count ++)
                number = number + count;
            if (count < number )
                println(count);
            else
                println(number);
        }
    } // method main
}
[ Result ]
Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_test1.txt
syntax OK
( msg: parsing process is done successfully... )
Result in verbose mode is stored in proj2_result_test1_CharmgilHong20200636.txt
Example source code, proj2_test2.txt, and result from the parser:
[ Source Code ]
//level 2 test data
import java.io.*;
class Factors {
    public static void main (String[] args) {
        int count = 1, number;
        System.out.println ("Enter a positive number:");
        number = 4;
        System.out.println ("The factors are:");
        while (count <= number) {
            if (number%count == 0)
                System.out.println (count);
            count = count + 1;
            for (count = 0; count <6 ; count ++)
                number = number + count;
            if (count < number )
                System.out.println(count);
            else
                System.out.println(number);
        }
    } // method main
}
[ Result ]
Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_test2.txt
syntax OK
( msg: parsing process is done successfully... )
Result in verbose mode is stored in proj2_result_test2_CharmgilHong20200636.txt
Example source code, proj2_test3.txt, and result from the parser:
[ Source Code ]
//level 3 test data: ok
import java.io.*;
class Factors {
    public static void main (String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader (new InputStreamReader(System.in));
        int count = 1, number;
        System.out.println ("Enter a positive number:");
        number = Integer.parseInt (stdin.readLine());
        System.out.println ("The factors of " + number + " are:");
Detection of duplicate declarations is implemented as well. If the user code contains duplicate
variable declarations, the parser issues the following syntax error message.
Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_bad1.txt
proj2_bad1.txt:4: illegal identifier(e:002)
123 - invalid name for variable declarator
^
proj2_bad1.txt:4: illegal identifier(e:002)
for - invalid name for variable declarator
^
proj2_bad1.txt:8: identifier is not declared(e:008)
undeclared - symbol needs to be declared
^
3 errors
( msg: parsing process is done with some errors... )
Result in verbose mode is stored in proj2_result_bad1_CharmgilHong20200636.txt
Example source code, proj2_bad2.txt, and result from the parser:
[ Source Code ]
//Another test case is:
class Factors {
    void main ( ) {
        int count $= 1, 123number;
        println ("Enter a positive number:");
        number = 4;
        println ("The factors are:")@
        while (count <= number) {
            if (number%count == 0)
                println (count);
            count = count + 1;
            for (count = 0; count <6 ; count ++)
                number = number + count;
            if (count < number )
                println(count);
            else
                println(number);
        }
    } // method main
}
[ Result ]
On line 4, a stray '$' sign breaks the grammatical correctness of the entire code. In addition,
the identifier that follows it, '123number', and the character '@' on line 7 are reported as
illegal. Furthermore, every use of the identifier 'number' produces an error, since it is never
declared.
Detecting these types of errors was a bit tricky. First of all, I added an extra grammar
transition to the parsing table (following section 4.5.2, Error Recovery in LL(1) Parsers, in
our textbook). Specifically, I added 'VAR_INIT → POP VAR_INIT' to the cell where the nonterminal
'VAR_INIT' and the terminal '$' cross (I have marked it on the parsing table). I also had to add some
code to my parser: if the top of the stack is POP, the parser simply discards the current token,
regardless of what it contains.
The '@' can be removed in a similar way, but without modifying the parsing table, since my
parsing table has no column for '@'. Instead, I found a suitable point within the parser to resolve
it, namely where control is waiting for the correct token. You can see how this works between
lines 269 and 282. When the program reaches this code section, the expected token is already
known, since the flow of logic determines it. Thus, by simply eliminating the illegal token together
with the unmatched grammar symbol, we can keep the program running until the end of the file.
The illegality of '123number' is actually caught at the scanner level. Since '123number'
violates the naming rules, the scanner marks the identifier as illegal and notifies the parser by
handing over a 'TokenInfoBlock', which is 'cTib' in my parser.
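The POP entry described above can be illustrated with a small table-driven LL(1) loop. The sketch below is an assumption-laden toy, not the code of LLParser.java: the class PopRecoverySketch, the DECL rule, the simplified token names ("id", "num"), the "EOF" end marker and the error wording are all hypothetical. Only the idea of placing 'VAR_INIT → POP VAR_INIT' in the cell for '$', and of discarding the current token when POP reaches the top of the stack, is taken from the text. A plain terminal mismatch simply stops the parse here; the '@' handling is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PopRecoverySketch {
    static final String POP = "POP";    // pseudo-symbol: discard the current token
    static final String EPSILON = "#e";
    static final String END = "EOF";    // hypothetical end-of-input marker

    // table.get(nonterminal).get(token) gives the right-hand side to push.
    static Map<String, Map<String, String[]>> buildTable() {
        Map<String, Map<String, String[]>> t = new HashMap<>();
        t.put("DECL", new HashMap<>());
        t.put("VAR_INIT", new HashMap<>());
        t.get("DECL").put("int", new String[] {"int", "id", "VAR_INIT", ";"});
        t.get("VAR_INIT").put("=", new String[] {"=", "num"});
        t.get("VAR_INIT").put(";", new String[] {EPSILON});
        // error-recovery entry: VAR_INIT -> POP VAR_INIT on a stray '$'
        t.get("VAR_INIT").put("$", new String[] {POP, "VAR_INIT"});
        return t;
    }

    /** Returns the collected error messages; an empty list means "syntax OK". */
    static List<String> parse(List<String> tokens) {
        Map<String, Map<String, String[]>> table = buildTable();
        List<String> errors = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(END);
        stack.push("DECL");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String tok = pos < tokens.size() ? tokens.get(pos) : END;
            if (top.equals(POP)) {
                errors.add("unresolved token: " + tok);
                pos++; // ignore the token, whatever it contains
            } else if (table.containsKey(top)) {
                String[] rhs = table.get(top).get(tok);
                if (rhs == null) {
                    errors.add("unexpected " + tok + " while expanding " + top);
                    return errors;
                }
                for (int i = rhs.length - 1; i >= 0; i--) {
                    if (!rhs[i].equals(EPSILON)) stack.push(rhs[i]);
                }
            } else if (top.equals(tok)) {
                pos++; // terminal matched
            } else {
                errors.add("expected " + top + " but found " + tok);
                return errors;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Token stream for a line shaped like: int count $= 1;
        System.out.println(parse(List.of("int", "id", "$", "=", "num", ";")));
    }
}
```

Because the right-hand side is pushed in reverse, POP ends up above VAR_INIT, so the stray token is dropped first and VAR_INIT is then expanded against the token that follows it.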
Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_bad2.txt
proj2_bad2.txt:4: syntax error(e:010)
$ - unresolved token
^
proj2_bad2.txt:4: illegal identifier(e:002)
123number - invalid name for variable declarator
^
proj2_bad2.txt:6: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:7: illegal control(e:009)
@ - ';', is expected
^
proj2_bad2.txt:8: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:9: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:13: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:13: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:14: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:17: identifier is not declared(e:008)
number - symbol needs to be declared
^
10 errors
( msg: parsing process is done with some errors... )
Result in verbose mode is stored in proj2_result_bad2_CharmgilHong20200636.txt
6. Method List
Users can use the following public methods when using the LLParser object in a project.
Detailed descriptions and usage, such as parameters and return values, are given in comments
within the code.
LLParser.java contains the following methods:
public LLParser( String sourceFile, boolean verbose );
public boolean parse();
public void dumpPStack();
bottom. So I decided to make my parser carry out the whole process of top-down parsing, which
includes computing the first and follow sets and building a parsing table from them. I thought
that was the best way to avoid the kinds of errors that humans are prone to make. Overall, I
believed that doing so would help me understand the material more profoundly and more
proficiently.
As I expected, I came to deeply appreciate how LL(1) parsers work, how syntax checking is
done in the top-down method, how a parsing table is built from a given grammar, and the
principles and techniques for designing grammars. I am glad that I created a well-refined
grammar that I can use in the future. I expect that it will be extremely helpful for further work:
building a bottom-up parser, either LR(1) or SLR(1).
8. Packaged File List
The package contains the following files:
source codes: LLParser.java
LLParsingTable.java
LLParsingStack.java
JGrammarDictionary.java
JScan.java
JSymbolTable.java
grammar file: minijava.txt
example sources: proj2_test1.txt
proj2_test2.txt
proj2_test3.txt
defective codes: proj2_bad1.txt
proj2_bad2.txt
results w/ stack: proj2_result_test1_CharmgilHong_20200636.txt
proj2_result_test2_CharmgilHong_20200636.txt
proj2_result_test3_CharmgilHong_20200636.txt
proj2_result_bad1_CharmgilHong_20200636.txt
proj2_result_bad2_CharmgilHong_20200636.txt
verbose sample: proj2_verbose_result_test1_CharmgilHong_20200636.txt
proj2_verbose_result_test2_CharmgilHong_20200636.txt
proj2_verbose_result_test3_CharmgilHong_20200636.txt
documentation: BNF_grammars_CharmgilHong_20200636.pdf
LLParser_UML_diagram_CharmgilHong_20200636.pdf
LLParsingTable_CharmgilHong_20200636.xls
manual_CharmgilHong_20200636.pdf
24 files in total.
9. References
Textbook
Compiler Construction: Principles and Practice. Kenneth C. Louden. PWS Publishing. 1997.
Web Documents
Sun's official Java documentation - for details and correctness regarding the API, operators,
characters, variables, and conventions.
http://java.sun.com/j2se/1.5.0/docs/api/
http://java.sun.com/docs/codeconv/html/CodeConventions.doc4.html
http://java.sun.com/docs/books/tutorial/java/nutsandbolts/variables.html
http://java.sun.com/docs/books/tutorial/java/nutsandbolts/operators.html
http://java.sun.com/docs/books/tutorial/java/data/characters.html
Regular Expression References
http://www.regular-expressions.info/reference.html
Tools
Visual Automata Simulator
http://www.cs.usfca.edu/~jbovet/vas.html
A. Appendix - minijava.txt
In this section, I have attached the full list of grammar rules from minijava.txt in order to
describe the syntactic rules applied to source programs. Before showing the grammar file,
however, I would like to note some key principles that were used when writing the rules:
1. Nonterminals are in CAPITAL LETTERS, and terminals are in lowercase letters.
2. Symbols that start with '#' indicate the token type.
3. The character '=' is used to separate the left-hand side and the right-hand side of a
rule.
4. To distinguish grammar transitions from equal signs, use ':=' for equal signs.
5. To indicate an epsilon transition, use '#e' for portability.
6. Do not leave blank lines between rules. (Doing so will cause an error while the program is
running.)
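The conventions listed above are enough to read one rule line mechanically. The following is a minimal sketch of such a reader; it is not the code of JGrammarDictionary.java, and the names GrammarLineSketch, Kind, classify and parseRule are hypothetical. It assumes that symbols are separated by whitespace, that a nonterminal consists of capital letters and underscores with an optional trailing backquote (as in PARAM`), and that ':=' in a rule stands for the '=' terminal.

```java
import java.util.ArrayList;
import java.util.List;

class GrammarLineSketch {
    enum Kind { NONTERMINAL, TOKEN_TYPE, EPSILON, TERMINAL }

    /** Classifies one grammar symbol according to the conventions above. */
    static Kind classify(String symbol) {
        if (symbol.equals("#e")) return Kind.EPSILON;           // principle 5
        if (symbol.startsWith("#")) return Kind.TOKEN_TYPE;     // principle 2
        if (symbol.matches("[A-Z][A-Z_]*`?")) return Kind.NONTERMINAL; // principle 1
        return Kind.TERMINAL;
    }

    /** Splits "LHS = sym sym ..." into [LHS, sym, sym, ...]. */
    static List<String> parseRule(String line) {
        String[] parts = line.trim().split("\\s+");
        // A blank or malformed line is rejected, mirroring principle 6.
        if (parts.length < 3 || !parts[1].equals("=")) {
            throw new IllegalArgumentException("malformed rule: '" + line + "'");
        }
        List<String> rule = new ArrayList<>();
        rule.add(parts[0]);
        for (int i = 2; i < parts.length; i++) {
            // principle 4: ':=' denotes the equal-sign terminal
            rule.add(parts[i].equals(":=") ? "=" : parts[i]);
        }
        return rule;
    }

    public static void main(String[] args) {
        String line = "FOR_STMT = for ( ASSIGN_STMT ; TEST ; U_EXPR ) STMT_SINGLE";
        for (String symbol : parseRule(line)) {
            System.out.println(symbol + " -> " + classify(symbol));
        }
    }
}
```

Note that terminals such as String and BufferedReader start with a capital letter, which is why the sketch requires a nonterminal to be entirely uppercase rather than checking only the first character.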
The following is the list of grammar rules.
SOURCE_CODE = IMPORT_STMT CLASS_DCLR
SOURCE_CODE = CLASS_DCLR
IMPORT_STMT = import #api_identifier * ;
CLASS_DCLR = class #identifier { CLASS_DEF }
CLASS_DEF = MAIN_METHOD
CLASS_DEF = #e
MAIN_METHOD = ACCESS STATIC TYPE main ( PARAM ) EXCEPTIONS { STMT }
ACCESS = public
ACCESS = private
ACCESS = protected
ACCESS = #e
STATIC = static
STATIC = #e
PARAM = TYPE PARAM`
PARAM = #e
PARAM` = #identifier [ ]
PARAM` = [ ] #identifier
TYPE = int
TYPE = void
TYPE = String
TYPE = BufferedReader
TEST = EXPR TEST_OP TEST_NORM
TEST_NORM = #identifier
TEST_NORM = #number
TEST_NORM = ( EXPR )
TEST_OP = <
TEST_OP = >
TEST_OP = <=
TEST_OP = >=
TEST_OP = ==
WHILE_STMT = while ( TEST ) STMT_SINGLE
FOR_STMT = for ( ASSIGN_STMT ; TEST ; U_EXPR ) STMT_SINGLE
U_EXPR = #identifier ++
U_EXPR = #identifier --
U_EXPR = ++ #identifier
U_EXPR = -- #identifier
End of Document.
[ITP404/Compiler Theory]
Project III - manual.pdf
Author: Charmgil, Hong / 20200636 CSEE
0. Table of Contents
1. Introduction
2. Construction
3. Compilation
4. Execution
5. Running Results for given examples
6. Method List
7. Conclusion
8. Packaged File List
9. References
A. Appendix 1 - First and Follow Set of Grammar Symbols
B. Appendix 2 - minijava.txt
1. Introduction
This object module package includes an SLR(1) parser object designed for the 'Mini
Java' Project, an instructional supplement for the ITP404 - Compiler Theory class (winter
semester 2008-2009).
'Mini Java' is a simplified subset of the Java language built for educational purposes. It is
based on Java's character set, operators, naming rules, and grammar, and does not fully
implement class-related capabilities.
For the final term project, I designed a Simple LR(1) parser program, which reads an
external parsing table and error handlers to parse a given source in the bottom-up method. (For
further research purposes, I intentionally placed the parsing table and error handlers in this
document.)
With this package, you will obtain an SLR(1) parser, a scanner, and a symbol table.
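As a rough illustration of what parsing from a prebuilt table in the bottom-up method involves, the sketch below drives a hand-built SLR(1) table for a toy grammar. The grammar, the table, the string encoding of the entries ("s2" for shift, "r1" for reduce, "acc" for accept) and the class name SlrDriverSketch are all assumptions made for this example; none of them is taken from SLRParser.java or from the Mini Java parsing table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SlrDriverSketch {
    // Toy grammar: (1) S -> ( S )    (2) S -> x
    static final int[] RHS_LENGTH = {0, 3, 1};   // indexed by rule number
    static final Map<String, String> ACTION = new HashMap<>(); // key: state + token
    static final Map<Integer, Integer> GOTO_S = new HashMap<>(); // goto on S

    static {
        ACTION.put("0(", "s2"); ACTION.put("0x", "s3");
        ACTION.put("1$", "acc");
        ACTION.put("2(", "s2"); ACTION.put("2x", "s3");
        ACTION.put("3)", "r2"); ACTION.put("3$", "r2");
        ACTION.put("4)", "s5");
        ACTION.put("5)", "r1"); ACTION.put("5$", "r1");
        GOTO_S.put(0, 1);
        GOTO_S.put(2, 4);
    }

    /** Returns true when the token list is accepted by the table. */
    static boolean parse(List<String> tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        while (true) {
            String tok = pos < tokens.size() ? tokens.get(pos) : "$";
            String entry = ACTION.get(states.peek() + tok);
            if (entry == null) return false;        // empty cell: syntax error
            if (entry.equals("acc")) return true;
            int n = Integer.parseInt(entry.substring(1));
            if (entry.charAt(0) == 's') {
                states.push(n);                     // shift: enter state n
                pos++;
            } else {
                for (int i = 0; i < RHS_LENGTH[n]; i++) {
                    states.pop();                   // reduce: pop one state per rhs symbol
                }
                states.push(GOTO_S.get(states.peek()));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("(", "(", "x", ")", ")")));
        System.out.println(parse(List.of("(", "x")));
    }
}
```

The reduce entries appear only under ')' and '$', which are the symbols in the follow set of S; that restriction is what distinguishes an SLR(1) table from a plain LR(0) one.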
2. Construction
The following shows the constructor of the parser:
public SLRParser( String sourceFile, boolean verbose )
    sourceFile → source file location
    verbose    → verbose mode switch
By declaring the above line in your program, you will be able to adapt the parser to your
own program. The verbose flag, the second parameter, is a switch that displays the
parsing stack contents.
There is also a debug mode, which lets you inspect the program in detail. The flag is
given as the boolean deb_mode, as follows:
private final boolean deb_mode = true;
By turning the flag on, you will see current tokens and the top of the parsing stack, as well
[ Source Code ]
//level 2 test data
import java.io.*;
class Factors {
    public static void main (String[] args) {
        int count = 1, number;
        System.out.println ("Enter a positive number:");
        number = 4;
        System.out.println ("The factors are:");
        while (count <= number) {
            if (number%count == 0)
                System.out.println (count);
            count = count + 1;
            for (count = 0; count <6 ; count ++)
                number = number + count;
            if (count < number )
                System.out.println(count);
            else
                System.out.println(number);
        }
    } // method main
}
[ Result ]
Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_test2.txt
syntax OK
( msg: parsing process is done successfully... )
Result in verbose mode is stored in proj3_result_test2_CharmgilHong20200636.txt
Example source code, proj3_test3.txt, and result from the parser:
[ Source Code ]
//level 3 test data: ok
import java.io.*;
class Factors {
    public static void main (String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader (new InputStreamReader(System.in));
        int count = 1, number;
        System.out.println ("Enter a positive number:");
        number = Integer.parseInt (stdin.readLine());
        System.out.println ("The factors of " + number + " are:");
        while (count <= (number/2)) {
            if (number%count == 0)
                System.out.println (count);
            count = count + 1;
        }
        for (count = 0; count <6 ; count ++)
            number = number + count;
        if (count < number )
            System.out.println(count);
        else
            System.out.println(number);
    } // method main
}
[ Result ]
Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_test3.txt
syntax OK
( msg: parsing process is done successfully... )
Result in verbose mode is stored in proj3_result_test3_CharmgilHong20200636.txt
The following source programs contain errors.
Example source code, proj3_bad1.txt, and result from the parser:
[ Source Code ]
//bad code with some illegal words
class Factors {
    void main ( ) {
        int count = 1, 123 , number, for;
        println ("Enter a positive number:");
        number = 4;
        println ("The factors are:");
        whileA (count <= undeclared) {
            if (number%count == 0)
                println (count);
            count = count + 1;
        }
    } // method main
}
[ Result ]
To detect illegal identifiers, I used the same method as in the previous assignment.
Since my scanner is able to tell the parser the type of the current token, the
parser will not accept numerals or reserved words while the grammar is waiting for
an identifier. To detect undeclared variables, I used a hash table as the symbol table.
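A hash-table-backed symbol table of the kind mentioned above can be sketched in a few lines. This is an illustrative assumption, not the interface of JSymbolTable.java: the class SymbolTableSketch and its methods declare and isDeclared are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTableSketch {
    // Maps each declared identifier to its declared type.
    private final Map<String, String> symbols = new HashMap<>();

    /** Records a declaration; returns false if the name was already declared. */
    boolean declare(String name, String type) {
        return symbols.putIfAbsent(name, type) == null;
    }

    /** Reports whether an identifier has been declared before this use. */
    boolean isDeclared(String name) {
        return symbols.containsKey(name);
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.declare("count", "int");
        table.declare("number", "int");
        // A use of 'undeclared' would be reported, as in proj3_bad1.txt.
        System.out.println(table.isDeclared("undeclared"));
    }
}
```

The parser would call declare when it matches an identifier inside a declaration and isDeclared when it matches one anywhere else, so both duplicate declarations and undeclared uses fall out of the same lookup.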
However, catching "whileA" was tricky. In all honesty, I was not able to catch it in the
manner I desired; I simply hard-coded a "whileA"-catching section. The implementation is
rudimentary at best, but there was a very good reason for doing what I did. When I tried to
compile the code using the Java compiler, it failed to compile and printed the following error
message:
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
The method whileA(boolean) is undefined for the type al
Syntax error, insert ";" to complete Statement
It became evident that the people behind Java do not have a solution for this problem either!
I was not able to catch this error procedurally. Since WhileA will be recognized as an
Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_bad2.txt
proj3_bad2.txt:4: syntax error(e:006)
$ - illegal syntax
^
proj3_bad2.txt:4: illegal identifier(e:001)
123number - invalid name for variable declarator
^
proj3_bad2.txt:6: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:7: illegal identifier(e:003)
@ - unresolved token
^
proj3_bad2.txt:8: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:9: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:13: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:13: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:14: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:17: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:19: syntax error(e:006)
else - illegal syntax
^
proj3_bad2.txt:19: identifier is not declared(e:004)
number - symbol needs to be declared
^
12 errors
( msg: parsing process is done with some errors... )
Result in verbose mode is stored in proj3_result_bad2_CharmgilHong20200636.txt
6. Method List
Users can use the following public methods when using the SLRParser object in a project.
Detailed descriptions and usage, such as parameters and return values, are given in comments
within the code.
SLRParser.java contains the following methods:
public SLRParser( String sourceFile, boolean verbose );
public boolean parse();
public void dumpPStack();
public static void main(String[] args);
SLRParsingTable.java contains the following methods:
public SLRParsingTable( boolean verbose );
public String getTransition( int state, int symbol );
public boolean isComplete( int state );
public boolean isError( int state );
public boolean isAccepted( int state );
public void dumpParsingTable();
SLRParsingUtils.java contains the following methods:
public SLRParserUtils( JGrammarDictionary gDic, boolean verbose );
public void automatedInit();
public void dumpFirstSet();
public void dumpFollowSet();
public void dumpNtListIndexTable();
public void dumpTListIndexTable();
public void dumpAll();
public char actionOf( String transition );
public int moveTo( String transition );
public Integer getIndex( String category, String key );
public void getEnter();
public boolean isNonterminal( String grammarSymbol );
public boolean isPattern( String grammarSymbol );
JGrammarDictionary.java contains the same methods from its previous releases.
JScan.java contains the same methods of its previous releases.
JSymbolTable.java contains the same methods from its previous releases.
7. Conclusion
While carrying out the assignment, I gained a more profound understanding of the
material on the bottom-up parsing mechanism and the SLR(1) parser. I found the entire
procedure - considering use cases, visualizing them into DFA, building a parsing table from
the DFA, and finally creating a parser program from the parsing table- complex and precision-