Go and Parsing
Go is one of the best languages for writing parsers and tools that need some form of parsing. This is mainly due to:
- Great string and regexp functions in the stdlib for parsing
- Easy and safe introspection of empty interfaces (interface{}, Go's equivalent of a QVariant or a void pointer)
- Labels. You have both gotos and the ability to break and continue deeply nested loops, which is great for handwritten parsers.
- Fast compilation makes for fast iteration.
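To make the point about labels concrete, here is a minimal sketch of a hand-written tokenizer. The tokenize function is a hypothetical helper made up for this illustration, not part of any library, and it only knows about identifiers and single-character punctuation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits src into identifiers and single-character punctuation.
// It is a hypothetical helper for illustration only.
func tokenize(src string) []string {
	var tokens []string
	runes := []rune(src)
	i := 0
scan:
	for i < len(runes) {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++ // skip whitespace
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) {
				c := runes[i]
				if !unicode.IsLetter(c) && !unicode.IsDigit(c) && c != '_' {
					// The identifier ended: record it and resume the outer loop.
					tokens = append(tokens, string(runes[start:i]))
					continue scan
				}
				i++
			}
			// End of input reached while still inside an identifier.
			tokens = append(tokens, string(runes[start:]))
			break scan
		default:
			// Anything else is treated as one-character punctuation.
			tokens = append(tokens, string(r))
			i++
		}
	}
	return tokens
}

func main() {
	fmt.Println(strings.Join(tokenize("request set_address(service_name string)"), " | "))
}
```

The labeled continue and break let the inner loop hand control straight back to the outer one, which is the kind of control flow that otherwise needs flag variables.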
Screw XML
XML is unwieldy to write and obtuse to read. Unfortunately, things like Wayland use it for protocol descriptions. Fortunately, Go can be used to author tools that generate XML from a more human-readable format.
Introducing the Participle
Participle is a Go library that makes parsing text into ASTs extremely easy. I'll demonstrate a simple use of it: authoring a better Wayland protocol syntax that can transpile to XML.
One: Designing a syntax
This is mostly a matter of taste. I like the look of Go, so I went with a very Go-like syntax:
protocol appmenu
interface zxdg_appmenu_v1 {
    version 1
    request set_address(service_name string, object_path string)
}
Simple, yet descriptive.
Two: Building trees
Participle by default uses the tokens that form the Go language itself, which is important to know. A grammar has to play by Go rules if you stick with the default tokens.
Let's start by defining a simple protocol struct:
type Protocol struct {
}
It's empty, which isn't very useful. Let's give it a name element since we want to be able to name our protocol.
type Protocol struct {
    Name string
}
This looks like a nice start to our tree, but how does the parsing work? We add some metadata.
type Protocol struct {
    Name string `"protocol" @Ident`
}
This will tell Participle two things:
- It should look for the string "protocol" in our protocol grammar
- It should grab the next Identifier token and put it into the field
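To show what those two steps amount to, here is a rough hand-written equivalent using only the standard library. It assumes that the default lexer tokenizes the way Go's text/scanner does; parseProtocolName is a hypothetical function for illustration, not something Participle exposes.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// parseProtocolName is a hypothetical hand-written equivalent of the tag
// `"protocol" @Ident`: match the literal word, then capture an identifier.
func parseProtocolName(src string) (string, error) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	// Step 1: the next token must be the literal word "protocol".
	if tok := s.Scan(); tok != scanner.Ident || s.TokenText() != "protocol" {
		return "", fmt.Errorf("expected \"protocol\", got %q", s.TokenText())
	}
	// Step 2: the token after that must be an identifier; capture its text.
	if tok := s.Scan(); tok != scanner.Ident {
		return "", fmt.Errorf("expected identifier, got %q", s.TokenText())
	}
	return s.TokenText(), nil
}

func main() {
	name, err := parseProtocolName("protocol appmenu")
	fmt.Println(name, err) // prints "appmenu <nil>"
}
```

The struct tag expresses the same thing in one line, which is most of the appeal.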
Now, we probably want to add a hook for an interface, as a protocol without interfaces is useless. Let's write that in:
type Protocol struct {
    Name string `"protocol" @Ident`
    Interfaces []Interface `{ @@ }`
}
The { @@ } will instruct the parser to capture as many interfaces as it can and stuff them into the slice.
Now let's write a description for what we want an interface to look like, starting with a name.
type Interface struct {
    Name string `"interface" @Ident "{"`
    // Put the goodies here!
    Ending struct{} `"}"`
}
The purpose of the Ending field is to make sure that our interfaces end with a closing brace.
An interface is composed of requests. Let's take a closer look at what our design looked like:
request set_address(service_name string, object_path string)
^
|
| always "request"
request set_address(service_name string, object_path string)
        ^
        |
        | Must be a valid identifier
request set_address(service_name string, object_path string)
                    ^^^^^^^^^^^^^^^^^^^
                    |
                    | One unit with two parts: identifier and type
request set_address(service_name string, object_path string)
                   ^                                       ^
                   |                                       |
                   | these surround our arguments          |
request set_address(service_name string, object_path string)
                                       ^
                                       |
          this separates our arguments |
Describing this will roughly look like this:
"request" @Ident "(" argument, argument ")"
Let's put that into a struct:
type Request struct {
    Name string `"request" @Ident "("`
    Arguments []Argument `{ @@ [","] } ")"`
}
{ @@ [","] } is a fancy way of making the Arguments field say “capture as many of me as possible, and we might have a comma separating us.”
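Written out by hand, that repetition is just a loop. The sketch below is a standard-library approximation of what the tag describes, again assuming text/scanner-style tokens; parseArguments and parseArgumentString are hypothetical helpers, and the Argument type here is a plain pair of strings without grammar tags.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// Argument is a plain name/type pair, mirroring the request syntax.
type Argument struct {
	Name string
	Type string
}

// parseArguments is a hypothetical hand-written version of `{ @@ [","] } ")"`.
// It expects the scanner to be positioned just after the opening "(".
func parseArguments(s *scanner.Scanner) ([]Argument, error) {
	var args []Argument
	tok := s.Scan()
	for {
		if tok == ')' {
			return args, nil // the closing ")" ends the list
		}
		if tok != scanner.Ident {
			return nil, fmt.Errorf("expected argument name or \")\", got %q", s.TokenText())
		}
		name := s.TokenText()
		if s.Scan() != scanner.Ident {
			return nil, fmt.Errorf("expected type after %q, got %q", name, s.TokenText())
		}
		args = append(args, Argument{Name: name, Type: s.TokenText()})

		// [","] makes the comma optional: skip it only if it is there.
		if tok = s.Scan(); tok == ',' {
			tok = s.Scan()
		}
	}
}

// parseArgumentString is a small convenience wrapper for trying things out.
func parseArgumentString(src string) ([]Argument, error) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	return parseArguments(&s)
}

func main() {
	args, err := parseArgumentString("service_name string, object_path string)")
	fmt.Println(args, err)
}
```

One consequence worth knowing: because the comma is optional, input such as "a int b int)" is accepted as well. If you want commas to be mandatory between arguments, the grammar needs to say so explicitly.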
Now let's write an Argument struct.
type Argument struct {
    Name string `@Ident`
    Type string `@Ident`
}
Since this is basically just a tuple of identifiers, that's exactly what we made this struct.
Because an interface can have multiple requests, we add the following field to our Interface struct:
Requests []Request `{ @@ }`
Like above, { @@ } will try and capture as many Requests as possible.
Put together, all our structs look like this:
type Interface struct {
    Name string `"interface" @Ident "{"`
    Requests []Request `{ @@ }`
    Ending struct{} `"}"`
}
type Protocol struct {
    Name string `"protocol" @Ident`
    Interfaces []Interface `{ @@ }`
}
type Request struct {
    Name string `"request" @Ident "("`
    Arguments []Argument `{ @@ [","] } ")"`
}
type Argument struct {
    Name string `@Ident`
    Type string `@Ident`
}
Three: Parsing trees
Now that we have our AST designed, let's hook it up to Participle.
// Build the parser from the root of our AST, the Protocol struct.
parser := participle.MustBuild(&Protocol{})
protocol := Protocol{}
if err := parser.Parse(os.Stdin, &protocol); err != nil {
    log.Fatal(err)
}
That's easy, eh? Building the XML output is fairly straightforward: define structs that mirror the XML you want, copy the AST into them, and marshal those structs into XML. I won't be covering that in detail here.
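For anyone who wants a starting point anyway, here is a minimal sketch of that marshaling step with encoding/xml. The AST types are repeated without grammar tags so the example needs only the standard library. The xml* types and toXML are hypothetical names chosen for this sketch, and the element and attribute names follow the Wayland protocol XML format as I understand it. The version attribute that Wayland interfaces carry is left out because the structs above never capture the version line.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AST types, shaped like the ones above but without grammar tags.
type Argument struct{ Name, Type string }
type Request struct {
	Name      string
	Arguments []Argument
}
type Interface struct {
	Name     string
	Requests []Request
}
type Protocol struct {
	Name       string
	Interfaces []Interface
}

// XML-side types (hypothetical names) that mirror the desired output.
type xmlArg struct {
	Name string `xml:"name,attr"`
	Type string `xml:"type,attr"`
}
type xmlRequest struct {
	Name string   `xml:"name,attr"`
	Args []xmlArg `xml:"arg"`
}
type xmlInterface struct {
	Name     string       `xml:"name,attr"`
	Requests []xmlRequest `xml:"request"`
}
type xmlProtocol struct {
	XMLName    xml.Name       `xml:"protocol"`
	Name       string         `xml:"name,attr"`
	Interfaces []xmlInterface `xml:"interface"`
}

// toXML copies the AST into the XML structs and marshals the result.
func toXML(p Protocol) ([]byte, error) {
	out := xmlProtocol{Name: p.Name}
	for _, i := range p.Interfaces {
		xi := xmlInterface{Name: i.Name}
		for _, r := range i.Requests {
			xr := xmlRequest{Name: r.Name}
			for _, a := range r.Arguments {
				xr.Args = append(xr.Args, xmlArg{Name: a.Name, Type: a.Type})
			}
			xi.Requests = append(xi.Requests, xr)
		}
		out.Interfaces = append(out.Interfaces, xi)
	}
	return xml.MarshalIndent(out, "", "  ")
}

func main() {
	p := Protocol{Name: "appmenu", Interfaces: []Interface{{
		Name: "zxdg_appmenu_v1",
		Requests: []Request{{
			Name: "set_address",
			Arguments: []Argument{
				{Name: "service_name", Type: "string"},
				{Name: "object_path", Type: "string"},
			},
		}},
	}}}
	out, err := toXML(p)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping the XML structs separate from the AST means the grammar tags and the xml tags never have to share a struct tag.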
From Here
Some links you may find useful:
blankInterface: A more complete Wayland protocol parser and XML generator.
Participle: The parser library used.
encoding/xml: XML library in Go's stdlib.