Lexical Analysis

Overview
- Main task: read input characters and group them into “tokens.”
- Secondary tasks:
  - Skip comments and whitespace.
  - Correlate error messages with the source program (e.g., report the line number where an error occurs).
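The secondary tasks can be illustrated with a small helper. This is a minimal sketch, assuming a hypothetical language in which comments run from # to the end of a line; skip_whitespace_and_comments and LexError are illustrative names, not part of any particular scanner. The helper advances past blanks and comments while counting newlines, so that an error raised later can name the source line.

```python
class LexError(Exception):
    """Lexical error that records the source line where it occurred."""

    def __init__(self, message: str, line: int) -> None:
        super().__init__(f"line {line}: {message}")
        self.line = line


def skip_whitespace_and_comments(src: str, pos: int, line: int) -> tuple[int, int]:
    """Advance past blanks, newlines, and '#' comments.

    Returns the new position and the updated line number.
    """
    while pos < len(src):
        ch = src[pos]
        if ch == "\n":
            line += 1          # track lines for error messages
            pos += 1
        elif ch in " \t\r":
            pos += 1           # ordinary whitespace is discarded
        elif ch == "#":
            # Hypothetical comment syntax: '#' up to (not including) the newline.
            while pos < len(src) and src[pos] != "\n":
                pos += 1
        else:
            break              # start of a real token
    return pos, line


pos, line = skip_whitespace_and_comments("  # comment\n\n  count", 0, 1)
# pos == 15 (the 'c' of "count"), line == 3
```

A caller would then start matching a token at pos and, on failure, raise LexError with the current line.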


Lexical Analysis: Terminology
- token: a name for a set of input strings with related structure.
  Example: “identifier,” “integer constant”
- pattern: a rule describing the set of strings associated with a token.
  Example: “a letter followed by zero or more letters, digits, or underscores.”
- lexeme: the actual input string that matches a pattern.
  Example: count
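The three terms can be related in a few lines of Python. This sketch assumes regular expressions as the notation for patterns; PATTERNS and lexeme_at are hypothetical names, and the integer_const regex is an assumption, since only the identifier rule is spelled out above. The dictionary key is the token, the compiled regex is its pattern, and the matched substring is the lexeme.

```python
import re
from typing import Optional

# token name -> pattern (regexes are an illustrative encoding of the rules)
PATTERNS = {
    # "a letter followed by zero or more letters, digits, or underscores"
    "identifier": re.compile(r"[A-Za-z][A-Za-z0-9_]*"),
    # assumed rule: one or more decimal digits
    "integer_const": re.compile(r"[0-9]+"),
}


def lexeme_at(token: str, text: str, pos: int = 0) -> Optional[str]:
    """Return the lexeme for `token` starting at `pos`, or None if its pattern does not match there."""
    m = PATTERNS[token].match(text, pos)
    return m.group() if m else None


print(lexeme_at("identifier", "count = 123"))        # count
print(lexeme_at("integer_const", "count = 123", 8))  # 123
```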
Examples
Input: count = 123
Tokens:
  Token           Rule                      Lexeme
  identifier      “letter followed by …”    count
  assg_op         =                         =
  integer_const   “digit followed by …”     123
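The example above can be reproduced with a small regex-driven scanner. This is a sketch under stated assumptions: the regexes stand in for the elided rules (“letter followed by …”, “digit followed by …”), and TOKEN_SPEC, MASTER, and tokenize are hypothetical names. Each token becomes a named group in one combined pattern, and Match.lastgroup reports which token matched.

```python
import re
from typing import List, Tuple

# One alternative per token; "skip" matches blanks that are then discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z][A-Za-z0-9_]*"),  # assumed form of "letter followed by ..."
    ("integer_const", r"[0-9]+"),              # assumed form of "digit followed by ..."
    ("assg_op", r"="),
    ("skip", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split `text` into (token, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("count = 123"))
# [('identifier', 'count'), ('assg_op', '='), ('integer_const', '123')]
```

Python's regex alternation takes the first alternative that matches rather than the longest, so the order of TOKEN_SPEC matters here; scanner generators such as lex instead prefer the longest match and break ties by rule order.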
Attributes for Tokens
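- If more than one lexeme can match the pattern for a token, the scanner must indicate the actual lexeme that matched.
- This information is given using an attribute associated with the token.
  Example: in count = 123, identifier carries the attribute count and integer_const carries 123; assg_op needs no attribute, since only = matches its pattern.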
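A token with an attribute can be modeled as a small record. This is a minimal sketch, assuming one common choice of attributes: the numeric value for integer_const and an index into a symbol table for identifier. Token, make_token, and the dictionary-based symbol table are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Token:
    name: str                         # e.g. "identifier", "integer_const", "assg_op"
    attribute: Optional[int] = None   # tells apart lexemes of the same token


def make_token(name: str, lexeme: str, symtab: Dict[str, int]) -> Token:
    """Attach an attribute that identifies which lexeme matched."""
    if name == "integer_const":
        return Token(name, int(lexeme))          # attribute: the numeric value
    if name == "identifier":
        # attribute: index of the lexeme in a (hypothetical) symbol table;
        # a new lexeme gets the next free index, a repeated one reuses its index
        index = symtab.setdefault(lexeme, len(symtab))
        return Token(name, index)
    return Token(name)                            # e.g. assg_op: only "=" matches


symtab: Dict[str, int] = {}
print(make_token("identifier", "count", symtab))    # Token(name='identifier', attribute=0)
print(make_token("integer_const", "123", symtab))   # Token(name='integer_const', attribute=123)
```

Storing an index rather than the raw string lets later phases look up everything recorded about an identifier in one place.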