Codebase list golang-github-pointlander-peg / fresh-snapshots/main

Tree @fresh-snapshots/main (Download .tar.gz)

# PEG, an Implementation of a Packrat Parsing Expression Grammar in Go

[![Go Report Card](](

A [Parsing Expression Grammar]( ( hence `peg`) is a way to create grammars similar in principle to [regular expressions]( but which allow better code integration. Specifically, `peg` is an implementation of the [Packrat]( parser generator originally implemented as [peg/leg]( by [Ian Piumarta]( in C. A Packrat parser is a "descent recursive parser" capable of backtracking and negative look-ahead assertions which are problematic for regular expression engines . 

## See Also

* <>
* <>
* <>

## Installing

`go get -u`

## Building

### Using Pre-Generated Files

`go install`

### Generating Files Yourself
You should only need to do this if you are contributing to the library, or if something gets messed up.

`go run build.go` or `go generate`

With tests:

`go run build.go test`

## Usage

peg [<option>]... <file>

Usage of peg:
      parse rule inlining
      disable AST
  -output string
      specify name of output file
      directly dump the syntax tree
      treat compiler warnings as errors
      replace if-else if-else like blocks with switch blocks
      print out the syntax tree
      print the version and exit

## Sample Makefile

This sample `Makefile` will convert any file ending with `.peg` into a `.go` file with the same name. Adjust as needed.

.SUFFIXES: .peg .go

	peg -noast -switch -inline -strict -output $@ $<

all: grammar.go

Use caution when picking your names to avoid overwriting existing `.go` files. Since only one PEG grammar is allowed per Go package (currently) the use of the name `grammar.peg` is suggested as a convention:


## PEG File Syntax

First declare the package name and any import(s) required:

package <package name>

import <import name>

Then declare the parser:

type <parser name> Peg {
	<parser state variables>

Next declare the rules. Note that the main rules are described below but are based on the [peg/leg rules]( which provide additional documentation.

The first rule is the entry point into the parser:

<rule name> <- <rule body>

The first rule should probably end with `!.` to indicate no more input follows. 

first <- . !.

This is often set to `END` to make PEG rules more readable:

END <- !.

`.` means any character matches. For zero or more character matches, use:

repetition <- .*

For one or more character matches, use:

oneOrMore <- .+

For an optional character match, use:

optional <- .?

If specific characters are to be matched, use single quotes:

specific <- 'a'* 'bc'+ 'de'?

This will match the string `"aaabcbcde"`.

For choosing between different inputs, use alternates:

prioritized <- 'a' 'a'* / 'bc'+ / 'de'?

This will match `"aaaa"` or `"bcbc"` or `"de"` or `""`. The matches are attempted in order.

If the characters are case insensitive, use double quotes:

insensitive <- "abc"

This will match `"abc"` or `"Abc"` or `"ABc"` and so on.

For matching a set of characters, use a character class:

class <- [a-z]

This will match `"a"` or `"b"` or all the way to `"z"`.

For an inverse character class, start with a caret:

inverse <- [^a-z]

This will match anything but `"a"` or `"b"` or all the way to `"z"`.

If the character class is case insensitive, use double brackets:

insensitive <- [[A-Z]]

(Note that this is not available in regular expression syntax.)

Use parentheses for grouping:

grouping <- (rule1 / rule2) rule3

For looking ahead a match (predicate), use:

lookAhead <- &rule1 rule2

For inverse look ahead, use:

inverse <- !rule1 rule2

Use curly braces for Go code:

gocode <- { fmt.Println("hello world") }

For string captures, use less than and greater than:

capture <- <'capture'> { fmt.Println(text) }

Will print out `"capture"`. The captured string is stored in `buffer[begin:end]`.

## Testing Complex Grammars

Testing a grammar usually requires more than the average unit testing with multiple inputs and outputs. Grammars are also usually not for just one language implementation. Consider maintaining a list of inputs with expected outputs in a structured file format such as JSON or YAML and parsing it for testing or using one of the available options for Go such as Rob Muhlestein's [`tinout`]( package.

## Files

* `bootstrap/main.go` - bootstrap syntax tree of peg
* `tree/peg.go` - syntax tree and code generator
* `peg.peg` - peg in its own language

## Author

Andrew Snodgrass

## Projects That Use `peg`

Here are some projects that use `peg` to provide further examples of PEG grammars:

* <> -  natural date/time parsing
* <> - easy date/time formats with duration spans
* <> - scientific names parsing