Textantrieb | UText/1 | UText/1.2 Manual

Output Processors

Expanding output tags

The Output Function

A string can contain embedded tags, which are markers enclosed in square brackets. The function out expands each tag according to its binding, that is, the output processor currently bound to the tag's name.

Let us begin with an example.

~webpage =index
 ~title My amazing Website
 ~content
  ~h1 [v title]

If you execute this in a UText script:

out [v h1]

or this in Perl:

print $ut->out($ut->get_field('h1'));

you will get “My amazing Website”, which is the binary data of the unit that plays the role title.
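To illustrate what out does with the [v] tags above, here is a toy Perl sketch. It assumes a hypothetical %fields hash in place of the unit data (UText itself resolves [v ...] against its own document model) and handles only standalone [v name] tags. Field values may contain further tags, and an unknown field aborts, matching the behavior described under Specification.

```perl
use strict;
use warnings;

# Hypothetical field table standing in for the unit data of the example;
# real UText resolves [v ...] against its own document model.
my %fields = (
    title        => 'My amazing Website',
    'content.h1' => '[v title]',
);

# Toy expansion of standalone [v name] tags; values may contain more tags.
sub expand_v {
    my ($text, $depth) = @_;
    $depth //= 0;
    die "expansion too deep\n" if $depth > 20;    # guard against cycles
    $text =~ s{\[v\s+([^\[\]\s]+)\]}{
        exists $fields{$1}
            ? expand_v($fields{$1}, $depth + 1)
            : die "no field named '$1'\n"
    }ge;
    return $text;
}

print expand_v('<title>[v title]</title><h1>[v content.h1]</h1>'), "\n";
# prints <title>My amazing Website</title><h1>My amazing Website</h1>
```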

To generate an HTML page one can write in a UText script something like this:

out begin
<html>
<title>[v title]</title>
<body>
 <h1>[v content.h1]</h1>
[...]
end

Now let us see the output tags in detail.

Tag Syntax

Overview

An output tag is enclosed in balanced square brackets. For example, the string

my own [tag] is great

contains a tag named “tag”. The output processor expands the tag at evaluation time and leaves the rest of the string unchanged.

Tag names are case-sensitive: [x] and [X] are two different tag names that can be bound to separate output processors.

A tag can receive a parameter. The string

my own [tag by me] is great

will be expanded into “my own [...] is great”, where [...] stands for the output of the bound processor. That processor receives the tag “tag” with the parameter string “by me”.

A tag can also enclose a substring, as in:

my own tag [tag/](I did it myself)[/tag] is great

This will be expanded as “my own tag [...] is great”, and the bound output processor will receive the content string “(I did it myself)”. The opening and closing clauses of a tag can be on separate lines.

A tag can accept both a parameter and a content string at the same time:

my own tag [tag/ by me](I did it myself)[/tag] is great

A tag can also be affected by a modifier. A string such as

my own tag [tag.small]

causes the output processor to be called with the tag “tag” and the modifier “small”.

Tags can be embedded in other tags, both in the parameter and in the content:

[cite/]
 [author/]Immanuel Kant[/author]:
 [work/]Critique of Pure Reason[/work]
 Our age is, in especial degree, the age of criticism,
 and to criticism everything must submit.
[/cite]

The same content could alternatively be expressed by passing the author and the work as the parameter of [cite]:

[cite/
 [author/]Immanuel Kant[/author]:
 [work/]Critique of Pure Reason[/work]
 ]
 Our age is, in especial degree, the age of criticism,
 and to criticism everything must submit.
[/cite]

The characters [ and ] are always interpreted as tag markers. To output literal square brackets in your document, use the tag [sb]. For example, to output [not a tag] you write [sb not a tag]. You get a single opening bracket [ with [sb.o] and a single closing bracket ] with [sb.c].
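The behavior of [sb] described above can be sketched as a Perl output processor, using the argument order documented in A Perl Processor Function. This is a hypothetical illustration, not the predefined implementation. In particular, what a bare [sb] without a parameter produces is a guess.

```perl
use strict;
use warnings;

# Hypothetical [sb] processor using the documented argument order
# ($self, $all, $op, $mod, $param, $str).
sub out_sb {
    my ($self, $all, $op, $mod, $param, $str) = @_;
    return '[' if $mod eq 'o';    # [sb.o] -> opening bracket
    return ']' if $mod eq 'c';    # [sb.c] -> closing bracket
    # [sb text] -> the parameter wrapped in literal brackets, unexpanded
    return '[' . ($param // '') . ']';
}

print out_sb(undef, '[sb not a tag]', 'sb', '', 'not a tag', ''), "\n";  # [not a tag]
print out_sb(undef, '[sb v name]',    'sb', '', 'v name',    ''), "\n";  # [v name]
```

Because the processor returns its parameter without expanding it, [sb v name] yields the text [v name]. This is what makes [sb] usable for deferring tags, as shown later in this manual.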

Specification

The general syntax of an output tag, depending on whether it is standalone or encloses content, is:

[name.modifier parameters]
[name.modifier/ parameters]content[/name]

name - A tag name consists of letters and/or digits only, with no whitespace and no other characters.

modifier - A modifier consists of letters, digits and possibly other characters (including periods), but no whitespace. It is passed through to the output processor.

parameters and content - These are arbitrary strings, possibly including whitespace and embedded tags. Both can span multiple lines.
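The opening clause defined above can be recognized with a Perl regular expression. The following sketch is one possible reading of the specification, not the UText parser. It assumes that the modifier contains no slash or square brackets. It also uses a recursive pattern so that the parameter may contain balanced embedded tags. The names $open_re and parse_open are hypothetical.

```perl
use strict;
use warnings;

# One possible reading of the opening-clause syntax (not the UText parser):
#   [name.modifier parameters]   or   [name.modifier/ parameters]
my $open_re = qr{
    \A \[
    (?<name>  [A-Za-z0-9]+ )                 # letters and digits only
    (?: \. (?<mod> [^\s\[\]/]+ ) )?          # assumed: no slash or brackets
    (?<slash> / )?                           # present when content follows
    (?: \s+ (?<param> (?: [^\[\]]++ | (?&tag) )* ) )?
    \]
    (?(DEFINE) (?<tag> \[ (?: [^\[\]]++ | (?&tag) )* \] ) )
}x;

# Hypothetical helper: split the opening clause at the start of $text.
sub parse_open {
    my ($text) = @_;
    return unless $text =~ $open_re;
    return {
        name        => $+{name},
        modifier    => $+{mod}   // '',
        has_content => defined $+{slash} ? 1 : 0,
        param       => $+{param} // '',
    };
}

my $t = parse_open('[tag.small/ by me](I did it myself)[/tag]');
print "$t->{name} | $t->{modifier} | $t->{has_content} | $t->{param}\n";
# prints tag | small | 1 | by me
```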

At evaluation time the tag syntax is checked, and the program aborts if it is incorrect. The program also aborts if no processor is bound to a tag it encounters. Tags must be properly nested: the most recently opened tag must be the first one to be closed. Square brackets may only occur as tag delimiters; use [sb] to output them literally.
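The nesting rules above can be checked with a stack, as in this hypothetical Perl sketch (check_tags is not part of UText). An opening clause is pushed until its closing bracket. This also works when the clause's parameter contains embedded tags. A tag with content then stays on the stack until its matching [/name] appears.

```perl
use strict;
use warnings;

# Hypothetical balance checker (not the UText implementation). Dies with a
# message on the first error, returns 1 if the string is well formed.
sub check_tags {
    my ($text) = @_;
    my @stack;    # entries: { kind => 'open' | 'tag', name => ..., content => ... }
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ m{\G\[/([A-Za-z0-9]+)\]}gc) {
            # closing clause [/name]: must match the innermost open content tag
            my $top = pop @stack;
            die "unexpected [/$1]\n"
                unless $top && $top->{kind} eq 'tag' && $top->{name} eq $1;
        }
        elsif ($text =~ m{\G\[([A-Za-z0-9]+)(?:\.[^\s\[\]/]+)?(/?)}gc) {
            # start of an opening clause; its parameter may contain more tags
            push @stack, { kind => 'open', name => $1, content => $2 eq '/' };
        }
        elsif ($text =~ m{\G\]}gc) {
            # end of an opening clause; a content tag now waits for [/name]
            my $top = pop @stack;
            die "unbalanced ]\n" unless $top && $top->{kind} eq 'open';
            push @stack, { kind => 'tag', name => $top->{name} } if $top->{content};
        }
        elsif ($text =~ m{\G\[}gc) {
            die "[ not followed by a tag name\n";
        }
        else {
            $text =~ m{\G[^\[\]]+}gc;    # plain text
        }
    }
    die "unclosed tag $stack[-1]{name}\n" if @stack;
    return 1;
}

print check_tags('my own tag [tag/ by me](I did it myself)[/tag] is great'), "\n";  # 1
my $ok  = eval { check_tags('[a/]x[/b]') };
my $msg = $ok ? "ok\n" : "error: $@";
print $msg;    # error: unexpected [/b]
```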

Tag Semantics

The output tags define segments in a string to be expanded. The content of the tag expansion is defined by the bound output processor, allowing the same source string to produce different output in script commands depending on the binding context.

Tags are parsed from left to right. For the leftmost tag, the bound processor is called with the whole tag, from its opening clause up to its closing clause and including all embedded tags. Its result is then appended to the output string.

When a tag with embedded tags is evaluated, the inner tags are not necessarily evaluated; this is up to the particular output processor. This grants complete flexibility in the evaluation and allows an output processor to treat parameters and contents as literal strings, even if they contain substrings that could be interpreted as output tags.
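The left-to-right evaluation, and the processor's control over inner tags, can be illustrated with a toy expander in Perl. ToyUT, its methods and the sample bindings b, lit and v are hypothetical. They only mimic the calling convention documented in A Perl Processor Function; the real UText object does considerably more. Note that [b] expands its content while [lit] returns it unexpanded.

```perl
use strict;
use warnings;

# Toy expander mimicking the calling convention described in this manual;
# it is NOT the UText implementation.
package ToyUT;

sub new { my ($class) = @_; return bless { bind => {} }, $class }

sub set_binding { my ($self, $tag, $fn) = @_; $self->{bind}{$tag} = $fn; return }

# Index just past the ']' that ends an opening clause whose body starts at $i.
sub _clause_end {
    my ($text, $i) = @_;
    my $depth = 1;
    for (; $i < length $text; $i++) {
        my $c = substr($text, $i, 1);
        $depth++ if $c eq '[';
        $depth-- if $c eq ']';
        return $i + 1 if $depth == 0;
    }
    die "unterminated tag\n";
}

# Start and end of the [/op] matching a content tag whose content starts at $i,
# counting nested [op/ ...] tags of the same name.
sub _content_end {
    my ($text, $i, $op) = @_;
    my $depth = 1;
    pos($text) = $i;
    while ($text =~ m{(\[/\Q$op\E\])|\[\Q$op\E(?:\.[^\s\[\]/]+)?/}g) {
        if (defined $1) { return ($-[0], $+[0]) if --$depth == 0 }
        else            { $depth++ }
    }
    die "missing [/$op]\n";
}

sub out {
    my ($self, $text) = @_;
    my ($result, $i) = ('', 0);
    while ((my $open = index($text, '[', $i)) >= 0) {
        my $plain = substr($text, $i, $open - $i);
        die "stray ]\n" if index($plain, ']') >= 0;
        $result .= $plain;
        substr($text, $open) =~ m{\A\[([A-Za-z0-9]+)(?:\.([^\s\[\]/]+))?(/?)\s*}
            or die "invalid tag syntax\n";
        my ($op, $mod, $slash, $head) = ($1, $2 // '', $3, $open + $+[0]);
        my $close = _clause_end($text, $head);
        my $param = substr($text, $head, $close - 1 - $head);
        my ($str, $end) = ('', $close);
        if ($slash) {                     # content runs up to the matching [/op]
            my ($cstart, $cend) = _content_end($text, $close, $op);
            ($str, $end) = (substr($text, $close, $cstart - $close), $cend);
        }
        my $fn = $self->{bind}{$op} or die "no processor bound to $op\n";
        # The processor receives raw $param and $str and decides what to expand.
        $result .= $fn->($self, substr($text, $open, $end - $open),
                         $op, $mod, $param, $str);
        $i = $end;
    }
    my $rest = substr($text, $i);
    die "stray ]\n" if index($rest, ']') >= 0;
    return $result . $rest;
}

package main;

my $ut = ToyUT->new;
$ut->set_binding(b => sub {                      # expands its content, then wraps it
    my ($self, $all, $op, $mod, $param, $str) = @_;
    return '<b>' . $self->out($str) . '</b>';
});
$ut->set_binding(lit => sub { return $_[5] });   # returns its content unexpanded
my %field = (title => 'My amazing Website');     # hypothetical field data
$ut->set_binding(v => sub { return $field{ $_[4] } // die "no field $_[4]\n" });

print $ut->out('[b/][v title][/b] vs. [lit/][v title][/lit]'), "\n";
# prints <b>My amazing Website</b> vs. [v title]
```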

Output Processing

Some tags with basic functionality are bound by the base system through modules such as Tags.pm or Script.pm; others are provided by add-in modules such as cms. You can add your own tags with a script or even override the predefined ones. A custom Perl script or add-in module can define tags, too.

Let us see how to set output processors using UText Script. Each tag must first be declared:

declare tag <name>

To define how a tag will expand, one uses:

set tag <name> to <value>

or

set tag <name> do <script instruction>

or

set tag <name> begin <multiline script instructions> end

The keyword tag can be omitted from the set instruction if there is currently no other setting with the same name.

A declare and a set can be joined in a single instruction:

declare tag <name> to <value>

If a value is given, the tag expands to this value; any tags inside the value are expanded, too.

If script instructions are given, they are executed by the UText Script interpreter, and the tag expands to their concatenated output.

Note that when you bind values or instructions to a tag with UText Script, the set command (like any other script command) is itself expanded before it is executed. Therefore, any tags, settings or functions that should be expanded at evaluation time rather than at binding time must be escaped. Settings are escaped with a backslash, as in \%param; tags are escaped with the [sb] tag, as in [sb v name].

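The arguments available for tag expansion include %param (the tag parameters) and %str (the tag content). Inside a declare or set instruction, they are written \%param and \%str so that they are substituted at expansion time.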

Tags bound with declare are registered by default under the module “main”. To collect them under a specific module name, set the variable current module before declaring the tags:

set current module to HTML
declare tag b to <b>\%str</b>
declare tag i to <i>\%str</i>

This way you can remove all related bindings at once:

unbind HTML
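A tag declared with a value, such as b above, behaves roughly like a processor that substitutes the parameter and content into the stored value. The stored value is <b>%str</b>, because the backslash only protects % at declaration time. The following Perl sketch illustrates that idea under stated assumptions. The helper value_processor is hypothetical, and the subsequent expansion of tags inside the value is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: build a processor from a stored value such as
# '<b>%str</b>' by substituting the tag's parameter and content.
sub value_processor {
    my ($template) = @_;
    return sub {
        my ($self, $all, $op, $mod, $param, $str) = @_;
        my %arg = (param => $param // '', str => $str // '');
        (my $expansion = $template) =~ s/%(param|str)\b/$arg{$1}/g;
        return $expansion;    # tags inside the value are not expanded here
    };
}

my $bold = value_processor('<b>%str</b>');
my $wiki = value_processor('%param on Wikipedia');
print $bold->(undef, '[b/]bold[/b]', 'b', '', '', 'bold'), "\n";        # <b>bold</b>
print $wiki->(undef, '[t Computer]', 't', '', 'Computer', ''), "\n";    # Computer on Wikipedia
```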

A Perl Processor Function

An output processor can also be defined in Perl with a function like this:

sub <function name>
{
 my ($self,$all,$op,$mod,$param,$str) = @_;
 my $results;
 [...]
 return $results;
}

The output processor is called each time a tag must be evaluated. The parameters of the call have these meanings:

$op - operation

This is the tag name. For example for [mytag] it has the value "mytag".

$mod - the tag modifier

For example, for [mytag.a] it is "a". If no modifier is present, its value is an empty string.

$param - the tag parameters

The string in the opening clause after the tag name (and modifier, if any), up to the square bracket that closes the clause. For example, it is “alfa” when expanding [tag alfa], or even [tug/ beta][fox][/tug] if the string being processed is [tag/ [tug/ beta][fox][/tug]]something[/tag].

If $param admits tagged contents that are to be expanded, the processor function should do:

$param = $self->out($param) if $param;

Otherwise the tags contained in $param will remain unexpanded.

$str - the content enclosed between opening and closing tag

For example, “hello” when processing [tag/]hello[/tag].

If $str admits tagged contents that are to be expanded, the processor function should do:

$str = $self->out($str) if $str;

$self - the current UText object

This object can be used to obtain context-sensitive information about the current position and to trigger the evaluation of the parameter and/or the content string.

$all - the whole tag being expanded

This contains the whole tag with parameters and contents.
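Putting the parameters together, here is a processor for a hypothetical tag [case] that changes letter case. It expands $param and $str as recommended above. It uses $mod to choose the variant and $all in its error message. StubUT is a hypothetical stand-in for the UText object whose out simply returns its input, so the example runs without UText installed.

```perl
use strict;
use warnings;

# Hypothetical processor for a tag [case]: [case/]Text[/case] upper-cases the
# content, [case.lower/]Text[/case] lower-cases it; without content the
# parameter is used, as in [case Text].
sub out_case {
    my ($self, $all, $op, $mod, $param, $str) = @_;
    die "[$op] does not accept modifier '$mod' in: $all\n"
        unless $mod eq '' || $mod eq 'lower';
    # Expand embedded tags, since this processor wants expanded text.
    $param = $self->out($param) if $param;
    $str   = $self->out($str)   if $str;
    my $text = $str || $param || '';
    return $mod eq 'lower' ? lc $text : uc $text;
}

# Hypothetical stand-in for the UText object: out() returns its argument unchanged.
package StubUT { sub new { return bless {}, shift } sub out { return $_[1] } }

my $self = StubUT->new;
print out_case($self, '[case/]Hello[/case]', 'case', '', '', 'Hello'), "\n";        # HELLO
print out_case($self, '[case.lower World]', 'case', 'lower', 'World', ''), "\n";    # world
```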

Binding the processor

In Perl, an output processor is bound to a tag with this method of the UText object:

$ut->set_binding(<tag name>,<processor function>)

For example, after $ut->set_binding('n',\&out_name); all embedded tags [n name] or [n/ name]contents[/n] will be resolved by out_name().

There are two pseudotags named “.PRE” and “.POST”, whose output processors are called before and after each string is output, respectively.

To unset the bindings there are two functions:

$ut->remove_binding(<tag name>)
$ut->remove_bindings()

The first deletes the binding for a particular tag; the second deletes all bindings of the current module.

The bindings are set and removed for the current module, that is, the Perl package from which these functions are called.

To operate on the bindings of a specific module, the following functions are available:

$ut->set_out_binding(<module name>,<tag name>,<processor function>)
$ut->remove_out_binding(<module name>,<tag name>)
$ut->remove_out_bindings(<module name>)
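The module scoping of bindings can be pictured with a small Perl sketch in which the methods without a module name take it from the calling package via caller. BindingTable and the HTML package are hypothetical and only mirror the method names listed above. The sketch says nothing about how UText stores bindings internally, or which module wins when several bind the same tag.

```perl
use strict;
use warnings;

# Hypothetical per-module binding table (not UText's internals). The
# module-less methods use the calling package as the module name.
package BindingTable {
    sub new { return bless { by_module => {} }, shift }

    sub set_out_binding {
        my ($self, $module, $tag, $fn) = @_;
        $self->{by_module}{$module}{$tag} = $fn;
        return;
    }
    sub remove_out_binding {
        my ($self, $module, $tag) = @_;
        delete $self->{by_module}{$module}{$tag} if $self->{by_module}{$module};
        return;
    }
    sub remove_out_bindings {
        my ($self, $module) = @_;
        delete $self->{by_module}{$module};
        return;
    }

    # caller() in scalar context yields the package that called the method.
    sub set_binding     { my $self = shift; $self->set_out_binding(scalar(caller()), @_) }
    sub remove_binding  { my $self = shift; $self->remove_out_binding(scalar(caller()), @_) }
    sub remove_bindings { my $self = shift; $self->remove_out_bindings(scalar(caller())) }
}

# A hypothetical module that registers its tags from its own package.
package HTML {
    sub register { my ($table) = @_; $table->set_binding(b => sub { "<b>$_[5]</b>" }) }
}

my $table = BindingTable->new;
HTML::register($table);                                      # stored under "HTML"
$table->set_binding(i => sub { "<i>$_[5]</i>" });            # stored under "main"
print join(',', sort keys %{ $table->{by_module} }), "\n";   # HTML,main
$table->remove_bindings;                                     # removes main's only
print join(',', sort keys %{ $table->{by_module} }), "\n";   # HTML
```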

Sample: Wikipedia References

Let us see an example. Suppose we want to define a tag [wp] to refer to Wikipedia articles. We can use the following UText Script instruction:

declare tag wp to                         \
  [sb url/ en.wikipedia.org/wiki/\%param  \
  \%param on Wikipedia]\%str              \
  [sb z/ \%str]\%param[sb /z][sb /url]

The backslashes before % and the [sb] tags above are required so that the tags and parameters are evaluated when the tag is expanded, not when it is declared. The declaration therefore binds the tag to the following value (shown here on several lines, but stored without line breaks):

  [url/ en.wikipedia.org/wiki/%param  %param on Wikipedia]
   %str[z/ %str]%param[/z]
  [/url]

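The semantics of the tag are as follows. The parameter names the Wikipedia article. The content, if present, is used as the link caption; otherwise, the parameter itself serves as the caption.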

For example, [wp Computer] expands to:

[url/ en.wikipedia.org/wiki/Computer]Computer[/url]

and [wp/ Computer]Computation[/wp] expands to:

[url/ en.wikipedia.org/wiki/Computer]Computation[/url]

We could implement the same tag in Perl. In this case we would write a function, say out_wikipedia, that returns the expansion of a url tag:

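sub out_wikipedia
{
 my ($self,$all,$op,$mod,$param,$str) = @_;
 # Expand any tags embedded in the content and in the parameter
 $str = $self->out($str) if $str;
 $param = $self->out($param) if $param;
 # Use the content as caption if present, otherwise the article name
 my $caption = $str || $param;
 # Build a [url] tag pointing to the article and expand it
 return $self->out("[url/ en.wikipedia.org/wiki/$param]${caption}[/url]");
}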

We would then bind the function to our UText object before the website generation begins:

$ut->set_binding('wp',\&out_wikipedia);

Note that this tag does not necessarily produce an HTML hyperlink to the Wikipedia website. To generate web pages, you bind the tags from the cms add-in module, so that url expands to an <a href=...> element. However, you can also bind url to other processors to get LaTeX or some other kind of cross-reference formatting from the same source.

Tag List

For a complete list of the tags provided by the Universal-Text Interpreter, see the predefined tag list.