Practical Parsing: A Linguistic Approach - CSCI 225

Parsing Basics

What is parsing?
According to Dick Grune and Ceriel J. H. Jacobs, "parsing is the process of structuring a linear representation in accordance with a given grammar." This seems somewhat abstract, but it is intended to be general and encompassing. The "linear representation" can be anything from a sentence to a line of digits in a file, from a musical phrase to a knitting pattern broken into a specific stitch sequence. (I bet you didn't think your grandma knew how to parse!)

Minsky, Hickey and Madhavapeddy describe parsing in this way: "Many programming tasks start with the interpretation of some form of structured textual data. Parsing is the process of converting such data into data structures that are easy to program against." One long string of data is difficult to manipulate, search, or organize. Once that data is parsed into tokens, each item becomes a distinct piece of data that can be manipulated as needed.
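
As a small illustration of turning one long string into a data structure that is easier to program against, here is a minimal Python sketch. The record format (key=value pairs separated by semicolons), the sample data, and the function name parse_record are all assumptions made up for this example; they do not come from either book quoted above.

```python
def parse_record(line):
    """Split a 'key=value; key=value' string into a dictionary."""
    record = {}
    for field in line.split(";"):
        # partition() splits on the first "=" only, so values may contain "=".
        key, _, value = field.partition("=")
        record[key.strip()] = value.strip()
    return record


# Hypothetical input: one long string that is awkward to search directly.
raw = "name=Ada; language=OCaml; year=2013"
record = parse_record(raw)
print(record["language"])  # prints "OCaml"
```

Once the string is parsed, looking up a single item is a dictionary access rather than a search through the whole string.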

Lexing and Parsing
Lexical analysis is "a kind of simplified parsing phase that converts a stream of characters into a stream of logical tokens" (Minsky). Lexing is part of parsing: it is the step where long inputs are broken up into smaller pieces.

Parsing is "converting a stream of tokens into the final representation" (Minsky). That conversion could be organizing the data into a search tree, hiding prohibited words in a chat setting, encrypting a credit card number so it cannot be easily hacked, compiling high-level language code into machine language, or even knowing to knit one and purl two.
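
The two phases can be sketched in Python as follows. This is only one possible arrangement, not a method taken from the quoted sources: the token pattern, the tiny arithmetic grammar, and the names lex and parse are assumptions chosen for illustration. The lexer turns characters into tokens, and the parser turns tokens into a nested tuple that serves as the "final representation."

```python
import re

# Hypothetical token pattern: an integer, or one of the operators + and *.
TOKEN_PATTERN = re.compile(r"\s*(\d+|[+*])")


def lex(text):
    """Lexing: convert a stream of characters into a list of tokens."""
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def parse(tokens):
    """Parsing: convert the tokens into a nested tuple (a small tree).

    Grammar assumed for this sketch:
        sum     -> product ('+' product)*
        product -> number ('*' number)*
    """
    index = 0

    def parse_number():
        nonlocal index
        if index >= len(tokens) or not tokens[index].isdigit():
            raise ValueError("expected a number")
        value = int(tokens[index])
        index += 1
        return value

    def parse_product():
        nonlocal index
        node = parse_number()
        while index < len(tokens) and tokens[index] == "*":
            index += 1
            node = ("*", node, parse_number())
        return node

    def parse_sum():
        nonlocal index
        node = parse_product()
        while index < len(tokens) and tokens[index] == "+":
            index += 1
            node = ("+", node, parse_product())
        return node

    tree = parse_sum()
    if index != len(tokens):
        raise ValueError("unexpected token after expression")
    return tree


tokens = lex("3 + 4 * 2")  # ['3', '+', '4', '*', '2']
tree = parse(tokens)       # ('+', 3, ('*', 4, 2))
print(tokens, tree)
```

The flat token list says nothing about which operation applies first; the tree produced by the parser does, because the multiplication is nested inside the addition.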

The terminology can be confusing because "parsing" names both the entire process (breaking down, organizing, compiling, and outputting the data) and the specific step within that process that organizes or converts the data.

How does a parser work?
There are four main parts of a parser. Each of these parts may have many smaller parts working within it, and each step can be carried out in a wide variety of ways. What follows is a very general outline of the parsing process.

1. Input the source string. This can be a file to be analyzed, keyboard input, or even sheet music in front of a musician.

2. Perform lexical analysis to create tokens. The larger source string is broken down into smaller tokens which can be manipulated and processed. This could be turning a sentence into individual words, a large block of numbers into smaller groups or individual digits, or turning an entire song into measures or individual notes.

3. Analyze and manipulate tokens. There are many ways to analyze and manipulate the tokens, and the same token can be processed more than once. For example, a sentence could be broken down into words, then the individual words converted to all capitals, then searched against a dictionary of prohibited words, then censored if necessary, and so on. A few other possibilities are creating search trees for data, encrypting data, or transposing notes into another key in music.

4. Output final structure. Once all of the analysis and manipulation is done for the entire string that is to be parsed, output the results. That could be displaying on the screen, writing to a file, or playing the music.
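
The four steps above can be sketched in Python using the censoring example from step 3. This is a minimal sketch under stated assumptions: the prohibited-word set, the sample sentence, and every function name are hypothetical, and a real parser would usually read its input from a file or the keyboard rather than a hard-coded string.

```python
# Hypothetical dictionary of prohibited words, stored in capitals.
PROHIBITED_WORDS = {"DARN", "HECK"}


def read_source():
    """Step 1: input the source string (hard-coded here for illustration)."""
    return "well darn that heck of a parser works"


def tokenize_words(source):
    """Step 2: lexical analysis, breaking the string into word tokens."""
    return source.split()


def censor(tokens):
    """Step 3: capitalize each token, look it up, and censor it if needed."""
    result = []
    for token in tokens:
        if token.upper() in PROHIBITED_WORDS:
            result.append("*" * len(token))
        else:
            result.append(token)
    return result


def write_output(tokens):
    """Step 4: build the final structure (rejoined into one string here)."""
    return " ".join(tokens)


censored = write_output(censor(tokenize_words(read_source())))
print(censored)  # prints "well **** that **** of a parser works"
```

Keeping each step in its own function mirrors the list above: any one step can be swapped out (for example, reading from a file in step 1) without touching the others.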

Chryssy Joski and Jack Ward
