
Alternative Parsers

The language UTL is a general-purpose way of coding the Universal-Text, but many other formats and languages are better suited to a particular purpose. The Universal-Text Interpreter provides a way to use them inside a UTL expression or input file. With an alternative parser you can express the text in an arbitrary language or format; it is parsed into Universal-Text, so it integrates completely with the rest of the text and can be navigated and queried like any other part of it.

Calling a Parser by Name

Inside a UTL expression, a parser is invoked with a line containing an opening square bracket, an asterisk and the parser name. All lines up to the next line containing just a closing square bracket are parsed by this parser. Example calling the parser named "my-format":

[* my-format
    ... some contents with a custom format here ...
]

Calling a Type Parser

You can parse a unit with an alternative parser defined for its type. To parse a particular unit with its implicit parser one puts square brackets instead of curly brackets around its children.

For example if you have this text:

~webpage =index Overview {
    ~content
    ~h1 My Site
    Welcome to my site!
    This site is under construction.
}

Supposing a parser is defined for the type "webpage", you could write this instead:

~webpage =index Overview [
    ... some contents with an alternative language or format here ...
]

The lines between [ and ] are not UTL but an alternative encoding that the parser recognises.

Embedded Parser Calls

An explicit parser block can be embedded inside another parser block. Example:

[*script
  ... script instructions here...
  [*settings 
    ... settings here...
  ]
  ... script instructions here...
]

The embedded parser is recognized by the interpreter and called by it, not by the outer parser.

If this notation conflicts with the syntax of a particular parser, this parser can disable embedded blocks by setting the property EMBEDDED_PARSERS to 0:

$self->{EMBEDDED_PARSERS} = 0;

Implementing a Parser

To implement a parser, one writes a Perl function that parses the format, say parseWebpage, and then binds it to the type as its parser:

^webpage {
    ~parser main::parseWebpage
    ^title : string
    ^content {
        ^p : string
        ^h1 : string
    }
}

An explicit parser call with [* can invoke any parser, whatever type it is bound to.

When the interpreter reaches the text in square brackets, it calls the function parseWebpage to parse the lines. A parser has this form:

sub my_parser
{
    my ($ut, $uid) = @_;
    [...]
    my $lin = $ut->readline;
    [...]
}

The parser receives a UText object for feeding text. It also receives the unit Id of the parser being called, which is useful for implementing a family of parsers with a single Perl function.
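
The following is a minimal sketch of how one function might serve several parsers by looking at the unit Id. The class MockReader, the Ids csv and tsv and the function parse_separated are all invented for illustration, and the sketch assumes the unit Id arrives as a plain string. It returns the rows it reads only to keep the example short; a real parser would presumably feed its result to the UText object instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the UText object; it only serves lines.
package MockReader;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }

package main;

# Field separator per parser, keyed by unit Id. The Ids 'csv' and 'tsv'
# are assumptions made for this sketch.
my %SEPARATOR = (csv => ',', tsv => "\t");

# One function for a family of parsers: its behaviour depends on the
# unit Id it is called with.
sub parse_separated {
    my ($ut, $uid) = @_;
    my $sep = $SEPARATOR{$uid};
    die "no separator known for parser '$uid'\n" unless defined $sep;
    my @rows;
    while (defined(my $lin = $ut->readline)) {
        push @rows, [ split /\Q$sep\E/, $lin ];
    }
    return \@rows;
}

my $csv = parse_separated(MockReader->new('a,b', 'c,d'), 'csv');
my $tsv = parse_separated(MockReader->new("a\tb"), 'tsv');
print join('|', @{ $csv->[1] }), "\n";
```

Keeping the per-parser differences in a table means a new member of the family needs one more table entry rather than another function.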

The class UText exposes the function readline for parsers to get the next line to be parsed from the source file.

$ut->readline

It returns a string with the next line, or undef when the end of the region to be parsed is reached. The parser does not see the line marked ] that closes the parse region.

Note that the returned string can be an empty string if there is an empty line to be parsed. Thus when checking for the end of the input lines one cannot test for if(!$ut->readline) but must test for if(!defined($ut->readline)) instead.
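
The difference between the two tests can be tried out without the interpreter. The sketch below uses a hypothetical class MockReader in place of the UText object; it only assumes that readline hands out one line per call and undef at the end, as described above. Note that in Perl the string "0" is false as well, so a line containing just 0 would also end a truthiness loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the UText object: it serves lines from a
# list and returns undef once the list is exhausted.
package MockReader;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }

package main;

# Collect all lines of a region, including empty ones and the line "0".
sub collect_lines {
    my ($ut) = @_;
    my @seen;
    while (defined(my $lin = $ut->readline)) {
        push @seen, $lin;
    }
    return @seen;
}

my @all = collect_lines(MockReader->new('first', '', '0', 'last'));
print scalar(@all), " lines read with defined\n";

# A plain truthiness test ends the loop at the empty line.
my $ut = MockReader->new('first', '', '0', 'last');
my @truncated;
while (my $lin = $ut->readline) {
    push @truncated, $lin;
}
print scalar(@truncated), " line(s) read with the truthiness test\n";
```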

Back to our example. Suppose we want to enter web pages in this format:

first line containing the header
second line containing the first paragraph
new paragraph
etc.

Our parser could look like this:

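sub parseWebpage
{
    my $ut = shift;
    my $n = 0;                  # number of lines read so far
    $ut->enter();
    $ut->set({role=>'content'});
    $ut->enter();
    # Read until readline returns undef at the end of the region.
    while (defined(my $lin = $ut->readline)) {
        # The first line is the heading, all later lines are paragraphs.
        my $role = $n++ == 0 ? 'h1' : 'p';
        $ut->set({role=>$role, bin=>$lin});
    }
    $ut->leave();
    $ut->leave();
}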

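The function does this: it calls enter to step into the webpage unit, adds a unit with the role content and steps into that as well. It then reads the region line by line, adding the first line as h1 and every later line as p, and finally calls leave twice to return to the level where it started.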

Now we can enter our webpages with this code:

~webpage =index Overview [
My Site
Welcome to my site!
This site is under construction.
]
~webpage =contact Contact [
Contact Me
You can write to me at me@myweb.org
You can also contact me with the form below.
]

The generated text looks like this:

=index ~webpage {
        ~title Overview
        ~content {
                ~h1 My Site
                ~p Welcome to my site!
                ~p This site is under construction.
        }
}
=contact ~webpage {
        ~title Contact
        ~content {
                ~h1 Contact Me
                ~p You can write to me at me@myweb.org
                ~p You can also contact me with the form below.
        }
}

The file "parser.pl" in the samples directory of the distribution contains this example.
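
To watch the sequence of calls the parser makes without running the interpreter, one can hand it a stand-in object. The class MockUText below is hypothetical: it is not the real UText class, and it only imitates the four methods used in the example (readline, enter, set, leave) by logging each unit that is set, indented by the current nesting depth.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the UText object. It serves lines and logs
# every unit passed to set, indented by the current nesting depth.
package MockUText;
sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], depth => 0, log => [] }, $class;
}
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }
sub enter    { my ($self) = @_; $self->{depth}++; return }
sub leave    { my ($self) = @_; $self->{depth}--; return }
sub set {
    my ($self, $unit) = @_;
    my $text = defined $unit->{bin} ? " $unit->{bin}" : '';
    push @{ $self->{log} }, ('  ' x $self->{depth}) . "~$unit->{role}$text";
    return;
}

package main;

# The parser from the example above, with unchanged behaviour.
sub parseWebpage {
    my $ut = shift;
    my $n  = 0;
    $ut->enter();
    $ut->set({ role => 'content' });
    $ut->enter();
    while (defined(my $lin = $ut->readline)) {
        my $role = $n++ == 0 ? 'h1' : 'p';
        $ut->set({ role => $role, bin => $lin });
    }
    $ut->leave();
    $ut->leave();
}

my $ut = MockUText->new('My Site', 'Welcome to my site!');
parseWebpage($ut);
print "$_\n" for @{ $ut->{log} };
```

The log shows the content unit one level down and the h1 and p units one level below it, and the depth counter is back at zero afterwards, which confirms that every enter is matched by a leave.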

Parsers provided with UText

UText ships with the following alternative parsers:

The *script parser that interprets UText Script at read time.

The *settings parser that reads a format similar to common configuration files.

A type-bound parser to enter tables one row per line, provided by the Types add-in module.