
Java Factors Program Analysis, Study notes of Compilers

Source code examples and results from scanning for three different java programs. The first program is a factor finding program with correct syntax, the second program contains illegal identifiers, and the third program has a missing 'while' keyword. The document also includes the contents of the llparser, llparsingtable, llparsingstack, jgrammardictionary, and jscan java classes, which are used for parsing and analyzing the source code.

---------------------------------------------------------------------
[ITP404/Compiler Theory]
Project II - manual.pdf
Author: Charmgil, Hong / 20200636 CSEE
---------------------------------------------------------------------
0. Table of Contents
1. Introduction
2. Construction
3. Compilation
4. Execution
5. Running Results for Given Examples
6. Method List
7. Conclusion
8. Packaged File List
9. References
A. Appendix - minijava.txt
1. Introduction
This object module package includes an LL(1) parser object that was designed for the 'Mini Java'
Project, an instructional supplement for the ITP404 - Compiler Theory class (winter semester
2008-2009).
'Mini Java' is a simplified subset of the Java language built for educational purposes. It is based on
Java's character set, operators, naming rules, and grammar, and does not fully
implement class-related capabilities.
For the second part of the final term project, I designed a "dynamic" LL(1) parser program,
which reads an external grammar file written in a specific format and
automatically builds a parsing table and supporting data such as the first and follow sets. With
this feature, users can define their own grammar rules and apply those rules
through the parser for syntax analysis of source programs.
This version of the package contains an LL(1) parser, a parsing table generator,
and an external grammar file that describes the grammar rules of Mini Java. I also
included updated versions of the scanner and symbol table that were originally created for
the previous assignment.
2. Construction
The following shows the constructor of the parser:
public LLParser( String sourceFile, boolean verbose )
sourceFile → source file location
verbose → verbose mode switch
Note that, by modifying some arguments in the constructor, the user can change the behavior
and internal mechanisms of the parser components.
[constructor, LLParser.java]
...
// instantiate scanner / table mode
scanner = new JScan( sourceFile, false, "table" );
false → true/false is the switch to turn verbose mode on/off
"table" → "table"/"dfa" indicates the method the scanner uses
// load grammar
grammar = new JGrammarDictionary( "minijava.txt", true );
"minijava.txt" → location of the grammar file
true → true/false is the switch to turn verbose mode on/off

3. Compilation

To try out the driver program, compile the parser module by typing the following command at your prompt:

javac LLParser.java

4. Execution

To execute the parser driver program, type the following command at your prompt:

java LLParser <source_file_name>

5. Running Screen and Results for the given examples

The following is a sample result screen from driver program in full verbose mode.

(The program may not be garrulous like below if you turn off the verbose mode.)

[output on standard out]

Charmgil-MacBook:parser_LL1 charmgil$ java LLParser

( msg: grammar rules are successfully loaded... ) ← indicates grammar file

is successfully loaded.

( msg: indexing nts + ts is completed... ) ← indicates the parser completed

to recognize all nonterminals and

terminals within the grammar file.

--- new PASS ---

--- new PASS --- ← the number of times the "new PASS" string appears is the number of passes needed to compute the first sets completely.

( msg: FirstSets are obatined... )

--- new PASS ---

--- new PASS --- ← similarly, the number of times the "new PASS" string appears here is the number of passes needed to compute the follow sets completely.

( msg: FollowSets are obatined... )

( msg: Parsing Table is generated... ) ← notifies that a parsing table is

generated successfully.

[grammar rules]

SOURCE_CODE = IMPORT_STMT CLASS_DCLR ← dump all grammar rules entered.

top(pStack): #e - } :19:cToken - SPECIAL_SYMBOL ← currently, '#e' is at the top of the stack and '}' is the next token; 19 is the line number.

$ } } #e

top(pStack): } - } :19:cToken - SPECIAL_SYMBOL

$ } } ← the second row of each pair shows the entire parsing stack

top(pStack): } - } :20:cToken - SPECIAL_SYMBOL

syntax OK

( msg: parsing process is done successfully... )
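
The repeated "--- new PASS ---" lines in the output above suggest that the first sets are computed by sweeping over the grammar rules until a sweep adds nothing new. The following is a minimal Java sketch of that fixed-point idea, not the author's code. The class name FirstSetDemo, its method names, and the reduced sample grammar (a few rules borrowed from minijava.txt, with CLASS_DEF cut down to its '#e' alternative) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSetDemo {
    static final String EPS = "#e"; // epsilon marker, as in minijava.txt

    // Hypothetical reduced grammar: nonterminal -> list of right-hand sides.
    static Map<String, List<List<String>>> sampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("SOURCE_CODE", List.of(
                List.of("IMPORT_STMT", "CLASS_DCLR"),
                List.of("CLASS_DCLR")));
        g.put("IMPORT_STMT", List.of(List.of("import", "#api_identifier", "*", ";")));
        g.put("CLASS_DCLR", List.of(List.of("class", "#identifier", "{", "CLASS_DEF", "}")));
        g.put("CLASS_DEF", List.of(List.of(EPS)));
        return g;
    }

    // Computes FIRST for every nonterminal by repeating passes until no set grows.
    static Map<String, Set<String>> firstSets(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : grammar.keySet()) {
            first.put(nt, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // one iteration corresponds to one pass
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = first.get(rule.getKey());
                for (List<String> rhs : rule.getValue()) {
                    boolean nullable = true; // true while every symbol so far can derive epsilon
                    for (String sym : rhs) {
                        if (sym.equals(EPS)) {
                            continue;
                        }
                        if (!grammar.containsKey(sym)) { // terminal or token type
                            changed |= target.add(sym);
                            nullable = false;
                            break;
                        }
                        Set<String> f = first.get(sym);
                        for (String t : new ArrayList<>(f)) {
                            if (!t.equals(EPS)) {
                                changed |= target.add(t);
                            }
                        }
                        if (!f.contains(EPS)) {
                            nullable = false;
                            break;
                        }
                    }
                    if (nullable) {
                        changed |= target.add(EPS);
                    }
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstSets(sampleGrammar()));
    }
}
```

With this sample grammar, FIRST(SOURCE_CODE) only fills in on the second pass, once FIRST(IMPORT_STMT) and FIRST(CLASS_DCLR) are known, which is why more than one pass is needed.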

Before parsing starts, the program loads the external grammar file designated in the LLParser (more specifically, in the LLParsingTable) when it is created. It then recognizes the nonterminals and terminals in the grammar file and, using a hash table, creates indices for all of those grammar symbols. After this preprocessing, the parser computes the first and follow sets and builds a parsing table from them. Once the parsing table is created, the program is ready to start parsing.

If the user enables verbose mode, the status of the parsing stack and the matching-and-generating steps are displayed on screen. By default, this release shows only the parsing stack status and the matching-and-generating steps. To see more detailed information, the user should turn on verbose mode when creating each component.

Example source code, proj2_test1.txt, and result from the parser:

[ Source Code ]

//level 1 test data
class Factors {
   void main ( ) {
      int count = 1, number;
      println ("Enter a positive number:");
      number = 4;
      println ("The factors are:");
      while (count <= number) {
         if (number%count == 0)
            println (count);
         count = count + 1;
         for (count = 0; count <6 ; count ++)
            number = number + count;
         if (count < number )
            println(count);
         else
            println(number);
      }
   } // method main
}

[ Result ]

Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_test1.txt
syntax OK
( msg: parsing process is done successfully... )

Result in verbose mode is stored in proj2_result_test1_CharmgilHong20200636.txt

Example source code, proj2_test2.txt, and result from the parser:

[ Source Code ]

//level 2 test data
import java.io.*;
class Factors {
   public static void main (String[] args) {
      int count = 1, number;
      System.out.println ("Enter a positive number:");
      number = 4;
      System.out.println ("The factors are:");
      while (count <= number) {
         if (number%count == 0)
            System.out.println (count);
         count = count + 1;
         for (count = 0; count <6 ; count ++)
            number = number + count;
         if (count < number )
            System.out.println(count);
         else
            System.out.println(number);
      }
   } // method main
}

[ Result ]

Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_test2.txt
syntax OK
( msg: parsing process is done successfully... )

Result in verbose mode is stored in proj2_result_test2_CharmgilHong20200636.txt

Example source code, proj2_test3.txt, and result from the parser:

[ Source Code ]

//level 3 test data: ok
import java.io.*;
class Factors {
   public static void main (String[] args) throws IOException {
      BufferedReader stdin = new BufferedReader
         (new InputStreamReader(System.in));
      int count = 1, number;
      System.out.println ("Enter a positive number:");
      number = Integer.parseInt (stdin.readLine());
      System.out.println ("The factors of " + number + " are:");

Detection of duplicated declaration is implemented as well. If the user code has duplicated

variable declarations, the parser will give out the following syntax error message.

Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_bad1.txt
proj2_bad1.txt:4: illegal identifier(e:002)
123 - invalid name for variable declarator
^
proj2_bad1.txt:4: illegal identifier(e:002)
for - invalid name for variable declarator
^
proj2_bad1.txt:8: identifier is not declared(e:008)
undeclared - symbol needs to be declared
^
3 errors
( msg: parsing process is done with some errors... )

Result in verbose mode is stored in proj2_result_bad1_CharmgilHong20200636.txt

Example source code, proj2_bad2.txt, and result from the parser:

[ Source Code ]

//Another test case is:
class Factors {
   void main ( ) {
      int count $= 1, 123number;
      println ("Enter a positive number:");
      number = 4;
      println ("The factors are:")@
      while (count <= number) {
         if (number%count == 0)
            println (count);
         count = count + 1;
         for (count = 0; count <6 ; count ++)
            number = number + count;
         if (count < number )
            println(count);
         else
            println(number);
      }
   } // method main
}

[ Result ]

On line 4, a '$' sign is unnecessarily placed and breaks the grammatical correctness of the entire code. The identifier that follows it, '123number', and the character '@' on line 7 are also flagged as illegal. In addition, every use of the identifier 'number' causes an error, since it is never declared.

Detecting these types of errors was a bit tricky. First, I added an extra grammar transition to the parsing table (following section 4.5.2, Error Recovery in LL(1) Parsers, in our textbook). Specifically, I added 'VAR_INIT → POP VAR_INIT' to the cell where the nonterminal 'VAR_INIT' and the terminal '$' cross (I have marked it on the parsing table). I also added code to my parser so that, whenever POP is on top of the stack, the current token is discarded regardless of its content.
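
The POP mechanism described above can be sketched as a small table-driven LL(1) loop. This is an illustration under assumptions, not the author's parser: the class name, the two-rule toy grammar (DECL and INIT standing in for a declaration and VAR_INIT), and the "$end" bottom-of-stack marker are invented for the example. Only the idea of a 'POP' entry in the table cell for '$' comes from the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PopRecoveryDemo {
    static final String POP = "POP";   // error-recovery marker named in the manual
    static final String EPS = "#e";    // epsilon
    static final String END = "$end";  // hypothetical end marker ('$' is an input token here)

    // TABLE.get(nonterminal).get(lookahead) -> right-hand side to push.
    static final Map<String, Map<String, List<String>>> TABLE = new HashMap<>();
    static {
        TABLE.put("DECL", Map.of("int", List.of("int", "id", "INIT", ";")));
        TABLE.put("INIT", Map.of(
                "=", List.of("=", "num"),
                ";", List.of(EPS),
                "$", List.of(POP, "INIT"))); // recovery entry: drop '$', then retry INIT
    }

    /** Returns the number of discarded tokens, or -1 if parsing cannot continue. */
    static int parse(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(END);
        stack.push("DECL");
        int pos = 0;
        int discarded = 0;
        while (!stack.isEmpty()) {
            String top = stack.peek();
            String tok = pos < tokens.size() ? tokens.get(pos) : END;
            if (top.equals(POP)) {
                // Whatever the token is, skip it and remove the marker.
                stack.pop();
                pos++;
                discarded++;
            } else if (TABLE.containsKey(top)) {
                List<String> rhs = TABLE.get(top).get(tok);
                if (rhs == null) {
                    return -1; // empty table cell
                }
                stack.pop();
                for (int i = rhs.size() - 1; i >= 0; i--) {
                    if (!rhs.get(i).equals(EPS)) {
                        stack.push(rhs.get(i));
                    }
                }
            } else if (top.equals(tok)) {
                stack.pop(); // match a terminal (or the end marker)
                if (!top.equals(END)) {
                    pos++;
                }
            } else {
                return -1; // terminal mismatch
            }
        }
        return discarded;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("int", "id", "$", "=", "num", ";")));
    }
}
```

For the token sequence in main, which mirrors 'int count $= 1;', the stray '$' is dropped once and the rest of the declaration parses normally.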

The '@' can be removed in a similar way, but without modifying the parsing table, since my parsing table has no column for '@'. Instead, I found a suitable point within my parser to resolve it, namely where control is waiting for the correct token. Lines 269 to 282 show how this works. By the time the program reaches this code section, what it expects is already known from the flow of logic. Thus, by simply discarding the illegal token that does not match the expected grammar symbol, we can keep the parser running until the end of the file.

The illegality of '123number' is actually caught at the scanner level. Since '123number' violates the naming rules, the scanner marks the identifier as illegal and informs the parser by handing over a 'TokenInfoBlock', which is 'cTib' in my parser.
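
As a rough illustration of a scanner-level check of this kind, the following sketch tests a lexeme against Java's identifier rules and a short reserved-word list. The class name, the method name, and the abbreviated reserved-word set are assumptions; JScan itself may work quite differently (for example, through its "table" or "dfa" mode).

```java
import java.util.Set;

class IdentifierCheckDemo {
    // Abbreviated, illustrative reserved-word list; not the scanner's real table.
    static final Set<String> RESERVED = Set.of(
            "class", "void", "int", "if", "else", "for", "while",
            "import", "public", "private", "protected", "static");

    // True if the lexeme follows Java's naming rules and is not a reserved word.
    // Note: Java itself permits '$' in identifiers; the manual's parser treats a
    // stray '$' as a separate syntax error, which this check does not model.
    static boolean isLegalIdentifier(String lexeme) {
        if (lexeme.isEmpty() || !Character.isJavaIdentifierStart(lexeme.charAt(0))) {
            return false;
        }
        for (int i = 1; i < lexeme.length(); i++) {
            if (!Character.isJavaIdentifierPart(lexeme.charAt(i))) {
                return false;
            }
        }
        return !RESERVED.contains(lexeme);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"number", "123number", "for", "123"}) {
            System.out.println(s + " -> "
                    + (isLegalIdentifier(s) ? "identifier" : "illegal identifier"));
        }
    }
}
```

A scanner could attach the result of such a check to the token record it hands to the parser, so the parser only needs to inspect a flag.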

Charmgil-MacBook:parser_LL1 charmgil$ java LLParser proj2_bad2.txt
proj2_bad2.txt:4: syntax error(e:010)
$ - unresolved token
^
proj2_bad2.txt:4: illegal identifier(e:002)
123number - invalid name for variable declarator
^
proj2_bad2.txt:6: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:7: illegal control(e:009)
@ - ';', is expected
^
proj2_bad2.txt:8: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:9: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:13: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:13: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:14: identifier is not declared(e:008)
number - symbol needs to be declared
^
proj2_bad2.txt:17: identifier is not declared(e:008)
number - symbol needs to be declared
^
10 errors
( msg: parsing process is done with some errors... )

Result in verbose mode is stored in proj2_result_bad2_CharmgilHong20200636.txt

6. Method List

Users can use the following public methods when using LLParser object in the project.

Detailed descriptions and usages -- such as parameters and returns -- are given in comments

within the code.

LLParser.java contains the following methods:

public LLParser( String sourceFile, boolean verbose );
public boolean parse();
public void dumpPStack();

bottom. So I decided to make my parser carry out the whole process of top-down parsing, which includes computing the first and follow sets and building a parsing table from them. I thought that was the best way to avoid the series of errors that humans are prone to make. Overall, I believed that doing so would help me understand the material more profoundly and more proficiently.

As I expected, I came to deeply appreciate how LL(1) parsers work, how syntax checking is done with the top-down method, how a parsing table is built from a given grammar, and the principles and techniques for designing grammars. I am glad that I created a well-refined grammar for me to use in the future. I expect that it will be extremely helpful for further work: building a bottom-up -- LR(1) or SLR(1) -- parser.

8. Packaged File List

The package contains the following files:

source codes: LLParser.java

LLParsingTable.java

LLParsingStack.java

JGrammarDictionary.java

JScan.java

JSymbolTable.java

grammar file: minijava.txt

example sources: proj2_test1.txt

proj2_test2.txt

proj2_test3.txt

defected codes: proj2_bad1.txt

proj2_bad2.txt

results w/ stack: proj2_result_test1_CharmgilHong_20200636.txt

proj2_result_test2_CharmgilHong_20200636.txt

proj2_result_test3_CharmgilHong_20200636.txt

proj2_result_bad1_CharmgilHong_20200636.txt

proj2_result_bad2_CharmgilHong_20200636.txt

verbose sample: proj2_verbose_result_test1_CharmgilHong_20200636.txt

proj2_verbose_result_test2_CharmgilHong_20200636.txt

proj2_verbose_result_test3_CharmgilHong_20200636.txt

documentation: BNF_grammars_CharmgilHong_20200636.pdf

LLParser_UML_diagram_CharmgilHong_20200636.pdf

LLParsingTable_CharmgilHong_20200636.xls

manual_CharmgilHong_20200636.pdf

total 24 of them.

9. References

Textbook

Compiler Construction. Kenneth C. Louden. PWS. 1997.

Web Documents

Sun's Java Official Documentations - for detail and correctness regarding API, operators,

characters, variables and conventions.

http://java.sun.com/j2se/1.5.0/docs/api/

http://java.sun.com/docs/codeconv/html/CodeConventions.doc4.html

http://java.sun.com/docs/books/tutorial/java/nutsandbolts/variables.html

http://java.sun.com/docs/books/tutorial/java/nutsandbolts/operators.html

http://java.sun.com/docs/books/tutorial/java/data/characters.html

Regular Expression References

http://www.regular-expressions.info/reference.html

Tools

Visual Automata Simulator

http://www.cs.usfca.edu/~jbovet/vas.html

A. Appendix - minijava.txt

In this section, I attached a full list of grammars from minijava.txt in order to describe the

syntactic rules used in the source program. Before showing the grammar file, however, I

would like to note some key principles that were used when setting grammar rules:

1. Nonterminals are in CAPITAL LETTERS, and terminals are in lowercase letters.

2. Symbols that start with '#' indicate the token type.

3. The character, '=', was used to separate the left hand side and the right hand side of the

rules

4. To distinguish grammar transitions and equal signs, use ':=' for the equal signs.

5. To indicate an Epsilon transition, use '#e' for portability.

6. Do not leave blank lines between rules. (This will result in error while program is

running.)

Now, the following is the list of grammars.

SOURCE_CODE = IMPORT_STMT CLASS_DCLR

SOURCE_CODE = CLASS_DCLR

IMPORT_STMT = import #api_identifier * ;
CLASS_DCLR = class #identifier { CLASS_DEF }
CLASS_DEF = MAIN_METHOD
CLASS_DEF = #e
MAIN_METHOD = ACCESS STATIC TYPE main ( PARAM ) EXCEPTIONS { STMT }
ACCESS = public
ACCESS = private
ACCESS = protected
ACCESS = #e
STATIC = static
STATIC = #e
PARAM = TYPE PARAM`
PARAM = #e
PARAM` = #identifier [ ]
PARAM` = [ ] #identifier
TYPE = int
TYPE = void
TYPE = String
TYPE = BufferedReader

TEST = EXPR TEST_OP TEST_NORM

TEST_NORM = #identifier
TEST_NORM = #number
TEST_NORM = ( EXPR )
TEST_OP = <
TEST_OP = >
TEST_OP = <=
TEST_OP = >=
TEST_OP = ==
WHILE_STMT = while ( TEST ) STMT_SINGLE
FOR_STMT = for ( ASSIGN_STMT ; TEST ; U_EXPR ) STMT_SINGLE
U_EXPR = #identifier ++
U_EXPR = #identifier --
U_EXPR = ++ #identifier
U_EXPR = -- #identifier
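
The rule format described in this appendix (one rule per line, '=' between the two sides, CAPITAL nonterminals, '#'-prefixed token types, '#e' for epsilon) could be read with a few lines of Java. The sketch below is only an assumption about how such a reader might look; the class GrammarLineDemo and its methods are hypothetical and are not taken from JGrammarDictionary.

```java
import java.util.Arrays;
import java.util.List;

class GrammarLineDemo {
    // One grammar rule: a left-hand nonterminal and its right-hand symbols.
    static final class Rule {
        final String lhs;
        final List<String> rhs;

        Rule(String lhs, List<String> rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    // Splits "LHS = sym sym ..." on whitespace; a blank or malformed line is rejected,
    // mirroring the note that blank lines between rules cause an error.
    static Rule parseRule(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3 || !parts[1].equals("=")) {
            throw new IllegalArgumentException("malformed rule: '" + line + "'");
        }
        return new Rule(parts[0], List.of(Arrays.copyOfRange(parts, 2, parts.length)));
    }

    // Nonterminals are written in capital letters (underscores and a trailing ` allowed).
    static boolean isNonterminal(String symbol) {
        return symbol.matches("[A-Z][A-Z_`]*");
    }

    // Symbols starting with '#' name a token type, except the epsilon marker.
    static boolean isTokenType(String symbol) {
        return symbol.startsWith("#") && !isEpsilon(symbol);
    }

    static boolean isEpsilon(String symbol) {
        return symbol.equals("#e");
    }

    public static void main(String[] args) {
        Rule r = parseRule("WHILE_STMT = while ( TEST ) STMT_SINGLE");
        for (String sym : r.rhs) {
            System.out.println(sym + (isNonterminal(sym) ? " : nonterminal" : " : terminal"));
        }
    }
}
```

Because symbols are separated by whitespace, the terminal '==' and the rule separator '=' never collide: only the second field of a line is treated as the separator.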


End of Document.

[ITP404/Compiler Theory]

Project III - manual.pdf

Author: Charmgil, Hong / 20200636 CSEE

0. Table of Contents

1. Introduction

2. Construction

3. Compilation

4. Execution

5. Running Results for given examples

6. Method List

7. Conclusion

8. Packaged File List

9. References

A. Appendix 1 - First and Follow Set of Grammar Symbols

B. Appendix 2 - minijava.txt

1. Introduction

This object module package includes an SLR(1) parser object that was designed for the 'Mini Java' Project, an instructional supplement for the ITP404 - Compiler Theory class (winter semester 2008-2009).

'Mini Java' is a simplified subset of the Java language built for educational purposes. It is based on Java's character set, operators, naming rules, and grammar, and does not fully implement class-related capabilities.

For the final term project, I designed a Simple LR(1) parser program, which reads an external parsing table and error handlers to parse a given source in a bottom-up manner. (For further research purposes, I intentionally placed the parsing table and error handlers in this document.)

This package provides an SLR(1) parser, a scanner, and a symbol table.

2. Construction

The following shows the constructor of the parser:

public SLRParser( String sourceFile, boolean verbose )
sourceFile → source file location
verbose → verbose mode switch

By declaring the above line in your program, you will be able to adjust the parser to suit your own program. The verbose flag, the second parameter, is a switch to display the parsing stack contents.

There is also a debug mode, which lets you inspect the program in detail. The flag is given as the boolean deb_mode, as follows:

private final boolean deb_mode = true;

By turning the flag on, you will see current tokens and the top of the parsing stack, as well

[ Source Code ]

//level 2 test data
import java.io.*;
class Factors {
   public static void main (String[] args) {
      int count = 1, number;
      System.out.println ("Enter a positive number:");
      number = 4;
      System.out.println ("The factors are:");
      while (count <= number) {
         if (number%count == 0)
            System.out.println (count);
         count = count + 1;
         for (count = 0; count <6 ; count ++)
            number = number + count;
         if (count < number )
            System.out.println(count);
         else
            System.out.println(number);
      }
   } // method main
}

[ Result ]

Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_test2.txt
syntax OK
( msg: parsing process is done successfully... )

Result in verbose mode is stored in proj3_result_test2_CharmgilHong20200636.txt

Example source code, proj3_test3.txt, and result from the parser:

[ Source Code ]

//level 3 test data: ok
import java.io.*;
class Factors {
   public static void main (String[] args) throws IOException {
      BufferedReader stdin = new BufferedReader
         (new InputStreamReader(System.in));
      int count = 1, number;
      System.out.println ("Enter a positive number:");
      number = Integer.parseInt (stdin.readLine());
      System.out.println ("The factors of " + number + " are:");
      while (count <= (number/2)) {
         if (number%count == 0)
            System.out.println (count);
         count = count + 1;
      }
      for (count = 0; count <6 ; count ++)
         number = number + count;
      if (count < number )
         System.out.println(count);
      else
         System.out.println(number);
   } // method main
}

[ Result ]

Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_test3.txt
syntax OK
( msg: parsing process is done successfully... )

Result in verbose mode is stored in proj3_result_test3_CharmgilHong20200636.txt

The following source programs contain errors.

Example source code, proj3_bad1.txt, and result from the parser:

[ Source Code ]

//bad code with some illegal words
class Factors {
   void main ( ) {
      int count = 1, 123 , number, for;
      println ("Enter a positive number:");
      number = 4;
      println ("The factors are:");
      whileA (count <= undeclared) {
         if (number%count == 0)
            println (count);
         count = count + 1;
      }
   } // method main
}

[ Result ]

To detect illegal identifiers, I used the same method as in the previous assignment. Since my scanner is able to tell the parser the type of the current token, the parser will not accept a number or a reserved word where the grammar expects an identifier. To detect undeclared variables, I used a hash table as the symbol table.
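
A hash-table symbol table of this kind can be sketched in a few lines. The class below is a simplified illustration with hypothetical names, not the JSymbolTable shipped in the package: it records the line of each declaration, rejects a second declaration of the same name, and answers whether a name has been declared.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTableDemo {
    // Maps an identifier to the line on which it was first declared.
    private final Map<String, Integer> declaredAt = new HashMap<>();

    // Returns true if the name is new; false signals a duplicate declaration.
    boolean declare(String name, int line) {
        return declaredAt.putIfAbsent(name, line) == null;
    }

    // A use of a name that was never declared should be reported as an error.
    boolean isDeclared(String name) {
        return declaredAt.containsKey(name);
    }

    public static void main(String[] args) {
        SymbolTableDemo table = new SymbolTableDemo();
        table.declare("count", 4);
        table.declare("number", 4);
        for (String used : new String[] {"count", "undeclared"}) {
            if (!table.isDeclared(used)) {
                System.out.println("identifier is not declared: " + used);
            }
        }
    }
}
```

The parser would call declare when it reduces a variable declarator and isDeclared whenever an identifier appears in an expression or statement.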

However, catching "whileA" was tricky. In all honesty, I was not able to catch it in the manner I desired; I simply hard-coded a "whileA"-catching section. The implementation is rudimentary at best, but there was a very good reason for doing what I did. When I tried to compile the code using the Java compiler, it failed to compile and printed the following error message:

Exception in thread "main" java.lang.Error: Unresolved compilation problems:
The method whileA(boolean) is undefined for the type al
Syntax error, insert ";" to complete Statement

It became evident that people at Java also do not have a solution for the same problem!

I was not able to catch this error procedurally. Since WhileA will be recognized as an

Charmgil-MacBook:parser_SLR1 charmgil$ java SLRParser proj3_bad2.txt
proj3_bad2.txt:4: syntax error(e:006)
$ - illegal syntax
^
proj3_bad2.txt:4: illegal identifier(e:001)
123number - invalid name for variable declarator
^
proj3_bad2.txt:6: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:7: illegal identifier(e:003)
@ - unresolved token
^
proj3_bad2.txt:8: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:9: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:13: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:13: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:14: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:17: identifier is not declared(e:004)
number - symbol needs to be declared
^
proj3_bad2.txt:19: syntax error(e:006)
else - illegal syntax
^
proj3_bad2.txt:19: identifier is not declared(e:004)
number - symbol needs to be declared
^
12 errors
( msg: parsing process is done with some errors... )

Result in verbose mode is stored in proj3_result_bad2_CharmgilHong20200636.txt

6. Method List

Users can use the following public methods when using SLRParser object in the project.

Detailed descriptions and usages -- such as parameters and returns -- are given in comments

within the code.

SLRParser.java contains the following methods:

public SLRParser( String sourceFile, boolean verbose );
public boolean parse();
public void dumpPStack();
public static void main(String[] args);

SLRParsingTable.java contains the following methods:

public SLRParsingTable( boolean verbose );
public String getTransition( int state, int symbol );
public boolean isComplete( int state );
public boolean isError( int state );
public boolean isAccepted( int state );
public void dumpParsingTable();

SLRParsingUtils.java contains the following methods:

public SLRParserUtils( JGrammarDictionary gDic, boolean verbose );
public void automatedInit();
public void dumpFirstSet();
public void dumpFollowSet();
public void dumpNtListIndexTable();
public void dumpTListIndexTable();
public void dumpAll();
public char actionOf( String transition );
public int moveTo( String transition );
public Integer getIndex( String category, String key );
public void getEnter();
public boolean isNonterminal( String grammarSymbol );
public boolean isPattern( String grammarSymbol );

JGrammarDictionary.java contains the same methods from its previous releases.

JScan.java contains the same methods of its previous releases.

JSymbolTable.java contains the same methods from its previous releases.

7. Conclusion

While carrying out the assignment, I was able to gain a more profound understanding of

materials regarding bottom-up parsing mechanism and SLR(1) parser. I found the entire

procedure - considering use cases, visualizing them into DFA, building a parsing table from

the DFA, and finally creating a parser program from the parsing table- complex and precision-