Universal-Text Script

This page describes the Universal-Text Script as implemented in UText/1.2.

The script language provides access to the Universal-Text interpreter in order to read source UTL files, transform and generate text units and produce output files. The script can define some simple output tags and extend itself with custom functions, but it currently does not replace the need for Perl for more demanding functionality in tags and functions and for building custom add-in modules.

Interfaces

To access the script interpreter one can 1. embed script instructions in UTL, 2. use the UText Shell, or 3. use a Perl script.

Embedded Script

The module Script.pm declares a parser named script. It executes UText script instructions once, as they are read; each instruction is executed before the next one is read:

[* script
 [... some script instructions here ...]
]

A script can also be embedded in UTL through an output tag:

~p ""
this is a string
[script/]
 [... some script instructions here ...]
[/script]
""

See Tags for the details about the tag [script].

One difference between the tag and the parser is that a script inside a tag [script] is executed each time the string is evaluated (possibly more than once), while a script in [*script] is executed exactly once at read time.

The second difference lies in the handling of the script's output.

When a tag [script] is evaluated, the tag expression is replaced with its expanded value.

When a script is executed inside [*script], it returns a string that in turn gets parsed by the UTL parser. Thus it is possible to generate text units programmatically by using commands such as

out ~title My Amazing Article

that produce UTL expressions that are fed at read time. Each instruction of the script is executed in turn and the result captured. After executing all instructions in a [*script] block, a string is built concatenating the results of every instruction call. This string is then parsed by the UTL parser.

The UText Shell

The Perl program UText Shell is located in the main directory of the Universal-Text interpreter distribution, UText. It is a standalone batch processor and interactive shell for UText scripts with a command line user interface. It shows the prompt ut> to indicate that it is waiting for user input.

The examples here assume a Unix-like shell with an alias ut that points to

[... Perl library directory ...]/UText/utshell.pl

Example starting a session:

francesc@pc64:~$ ut
Universal-Text Interpreter v1.2 beta; +http://u-tx.net
(c) 2004-2010 Francesc Hervada-Sala
Welcome to the UText Shell. Enter '.' to quit, 'help' for usage.
ut>

After the user enters a script instruction and hits ENTER, the interpreter executes it and prints out its result. If the instruction spans more than one line, the shell asks for the next line with the prompt ut...> until the end of the instruction is reached, and then executes it.

ut> read web
ut> select website
=cp ~website
ut> select website begin
ut...> v title
ut...> ln
ut...> v author
ut...> end
Computing Pages
Francesc Hervada-Sala
ut>

To abort a multiline instruction, enter a single dot. To quit the interpreter, one can enter a dot, too.

ut> select website begin
ut...> .
ut> .
francesc@pc64:~$

To quit the shell one can also use the control character ^D (Ctrl+D) as an end of file signal.

The shell receives its instructions through the standard input stream. Under Unix it can thus be combined with other commands through a pipe, or execute a script contained in a file:

ut < myscript.scr

The shell can also be instructed to read some files at startup:

ut book.utl

This will perform read book.utl and after that begin an interactive session. If you want no interactive session to take place, use the flag -b for batch:

ut book.utl -b

You can also load some modules at startup with the flag -m:

ut book.utl -m odt -b

There are some more parameters, which can be seen by invoking the shell with the command line option -h.

francesc@sil10 ~$ ut -h
Universal-Text Interpreter v1.2 beta; +http://u-tx.net
(c) 2004-2010 Francesc Hervada-Sala
Usage: ut file... [-m module]... [-b] [-r] [-v] [-h]
        file: one or more utl file to read
        -m one or more add-in module to load
        -b batch mode (no interactive session)
        -r show names of files when reading
        -v verbose log
        -h show these usage lines and quit
francesc@sil10 ~$

See UText Shell for more about the UText Shell.

Executing Scripts in Perl

Apart from embedding scripts in UTL or using the shell, you can also use a script interpreter object in your Perl scripts.

use UText::Script;
my $scr = new Script();
$scr->execute('load odt');

The script interpreter object executes the instructions immediately. The method execute returns the last result. Example:

$scr->execute('read geneaweb.utl');
print $scr->execute('select <family do ln v name');

This prints out:

Clark
Smith
Smith
Smithereen
Smithers

There is also the method evaluate, which receives a string and some arguments and interpolates the settings that are preceded by a percent sign. Example:

print $scr->evaluate('this is a %what from %whom', {what=>'web',whom=>'me'});

This prints "this is a web from me".

See Script.pm for a reference of all Script methods.

The UText Script Language

All interfaces to the script interpreter can use the script language without any restrictions. Let us now see what the script language looks like.

Syntax

Each line of a script contains an instruction. A single instruction can span multiple lines with begin ... end, and a single line can contain more than one instruction if they are separated by a colon : with both leading and trailing white space.

instruction1
instruction2 : instruction3
instruction4 begin
 part of instruction 4
 [...]
 part of instruction 4
end

All instructions have this structure:

<operation> <parameters> do <body>

Or for instructions that may span over more than one line:

<operation> <parameters> begin <body lines> end

The keyword do has the same meaning as the keyword pair begin ... end, but do terminates at the end of the current line (or at ":"), while a begin and its end can enclose more than one line.

If the parameters or body of an instruction contain the words begin or end, they must be escaped. Example:

feed begin
~p It is not easy to \begin with something.
end

For inline instructions (not between begin and end) the same applies to the words do and a standalone colon.

~feed ^ person \: string

If the line ends with \ the same instruction continues at the next line:

the instruction \
  continues here
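The splitting rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the code of the UText interpreter: the function name split_instructions is hypothetical, and it assumes that begin and end are recognized as whitespace-delimited words and that an instruction line containing begin is not split at colons.

```python
import re

def split_instructions(script: str) -> list[str]:
    """Split a script into instructions (illustrative model only)."""
    # A trailing backslash joins the line with the next one.
    text = re.sub(r'\\\n\s*', '', script)
    instructions = []
    pending = []   # lines of the instruction being collected
    depth = 0      # nesting level of begin ... end
    for line in text.splitlines():
        if not line.strip() and depth == 0:
            continue
        words = line.split()
        # Escaped words such as \begin are not equal to 'begin' and are ignored.
        depth = max(depth + words.count('begin') - words.count('end'), 0)
        pending.append(line.strip())
        if depth > 0:
            continue   # still inside a begin ... end block
        if len(pending) == 1 and 'begin' not in words:
            # A colon separates instructions only with white space on both
            # sides; an escaped colon (\:) has a backslash in front of it.
            instructions.extend(re.split(r'\s+:\s+', pending[0]))
        else:
            instructions.append('\n'.join(pending))
        pending = []
    return instructions

sample = r"""instruction1
instruction2 : instruction3
instruction4 begin
 part of instruction 4
end
~feed ^ person \: string
the instruction \
  continues here
"""

for instruction in split_instructions(sample):
    print(repr(instruction))
```

The sample yields six instructions: the begin ... end block stays together, the escaped colon does not split its line, and the continued line is joined into one instruction.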

Some functions accept item lists. A list is comma separated and can span over more than one line. Example:

for article in preface, first chapter,
  second chapter, third chapter,
  fourth chapter do generate \%article

Operations

An operation is identified by a single word. A modifier can be given with a period. For example, read <file> reads a plain text file, read.odt <file> reads a word processor file in OpenDocument Format.

Parameters and body can have more than one word or be empty. For example, the function help of the interactive shell gets no parameters and has no body:

help

The function select has both of them, the parameter being a selector and the body being executed for each unit that the selector returns.

select article
select article do v title

Note that the structure of the script instructions, consisting of operation, modifier, parameters and body, corresponds exactly to the structure of the output tags.
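To make this four-part structure concrete, here is a small Python model that breaks an instruction into operation, modifier, parameters and body. It is a sketch under assumptions, not the interpreter's parser: the function name parse_instruction is hypothetical, the result keys borrow the argument names op, mod, param and str described under Settings, and white space inside parameters and body is collapsed for simplicity.

```python
def parse_instruction(instruction: str) -> dict[str, str]:
    """Split an instruction into its four parts (illustrative model only)."""
    head, _, rest = instruction.strip().partition(' ')
    # The modifier follows the operation name after a period: read.odt
    op, _, mod = head.partition('.')
    words = rest.split()
    if 'do' in words:
        # do: the body runs to the end of the instruction
        i = words.index('do')
        params, body = words[:i], words[i + 1:]
    elif 'begin' in words and words[-1] == 'end':
        # begin ... end: the body lies between the two keywords
        i = words.index('begin')
        params, body = words[:i], words[i + 1:-1]
    else:
        params, body = words, []
    return {'op': op, 'mod': mod,
            'param': ' '.join(params), 'str': ' '.join(body)}

print(parse_instruction('help'))
print(parse_instruction('read.odt book'))
print(parse_instruction('select article do v title'))
print(parse_instruction('v title begin default value end'))
```

In this model, select article do v title has the parameter article and the body v title, while v title begin default value end has the parameter title and the body default value, mirroring the tag [v/ title]default value[/v].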

Each output tag can act as a script instruction. For example, the tag [v title] can be called in a script with:

v title

And the tag [v/ title]default value[/v] can be expressed in a script as:

v title begin default value end

Apart from the output tags, there are some script functions (these cannot be used as tags). Some of them are predefined; any script or module can define new ones. Let us now see some commonly used operations.

The operation read reads a UTL file and feeds its contents into the repository.

read geneaweb.utl

The file geneaweb.utl, which is also used in the following examples, can be found in the samples directory of the distribution files.

The above can also be written without the file extension, because "utl" is the default one.

read geneaweb

This reads the file geneaweb.utl, which is expected to be in the current working directory. If the file is in another directory, one can prepend the directory path to the file name:

read ../samples/geneaweb

or change the current directory with cd:

cd ../samples
read geneaweb

The path naming conventions depend on the operating system the script is running on. The initial current working directory of a script is the directory of the file that contains it.

The operation select chooses some text units according to a given selector and applies a transformation to them. For example:

select family do ln v name

This returns the binary contents of the child unit with role name for each text unit that has the role family. The operation ln adds a new line after each name. Example from the UText Shell:

ut> select family do ln v name
Smith
Smithers
Smithereen
Clark
Smith

The script function out outputs the given string using the Perl function $ut->out to expand all tags. Example:

ut> select family do out no. [cnum]: Family "[v name]"[lf]
no. 1: Family "Smith"
no. 2: Family "Smithers"
no. 3: Family "Smithereen"
no. 4: Family "Clark"
no. 5: Family "Smith"

To apply multiple operations to each unit, use begin...end instead of do:

select family begin
  [... some instructions here ...]
end

The operation select accepts any selector as its parameter. See Text Selectors for all possibilities.

The simplest procedure to generate target files is to first read some source files, then traverse the source text with one or more select instructions and output it in the desired format. The result can then be stored into a file with the operation save. Example:

read geneaweb
select website.webpage begin
  save [u].html begin
 [... generate webpage here ...]
  end
end

A save operation without a body saves the results of the last command into the file. This is useful in interactive shell sessions.

ut> select family do ln v name
Smith
Smithers
Smithereen
Clark
Smith
ut> save families.txt

The file families.txt now contains these lines:

Smith
Smithers
Smithereen
Clark
Smith

For interactive use in the shell, the select operation can be called without a body. In this case it shows the names (if any), types, roles and a summary of the binary data (if present) of the selected units.

ut> select family
=Smith ~family
=Smithers ~family
=Smithereen ~family
=Clark ~family
=Smith2 ~family

Select without parameters shows the current unit:

ut> select
^unit

In an interactive session, it can also be useful to see the contents of a unit expressed as UTL. This can be done with the operation utl:

ut> utl =Smith
=Smith ~family {
        ~name Smith
        =Mary ~parent :woman 
        =John ~parent :man 
}

To go to a particular text unit, use the operation cu (change unit):

ut> cu website
ut> select
=geneaweb ~website

For a full list of the predefined operations see Operation Index.

New functions can be defined by a custom Perl script or add-in module. See Functions.pm for more information.

Settings

UText Script has a general mechanism for defining named entities which receive a value that can change over time. These entities are called settings. There are several setting providers, each of which has a specific purpose.

A Perl script can add custom setting providers, see Settings.pm. To list all available setting providers, including the predefined ones, use show:

ut> show
Usage: 'show <provider>' lists all current settings by the provider.
Example: 'show tags' lists the current output bindings, 'show vars' lists the current variables.
Providers: argument, var, system, tag, function, module

Variables can be defined and used in a script this way:

ut> declare var my variable
ut> set my variable to hello
ut> out my variable says %{my variable}
my variable says hello

If the variable name is just one word, curly brackets are not needed:

declare var a
out my variable contains %a

To set an output tag, one proceeds similarly. Sample session:

ut> declare tag name
ut> set name to Jane
ut> out Hello, [name]
Hello, Jane
ut> set name to John
ut> out Hello, [name]
Hello, John
ut>

See Output Processors for more information about defining tags.

To see what tags are currently defined and by which modules, use show tags.

ut> show tags
.BASE: .POST .PRE bind cnum dump feed foreach if [...]
Script: get script
main: help

The function show can list all entities defined currently by a specific settings provider: show vars shows variables, show functions shows the UText script functions, and show modules the currently loaded add-in modules.

The setting provider argument supports arguments for tags and functions that are expanded at run time. They are not explicitly declared and assigned by a script, but implicitly. For example, inside each tag and function the argument %op is set to the operation name, %mod to its modifier, %param to the parameters and %str to the body. Sample:

ut> declare tag x to op:\%op, mod:\%mod
ut> x
op:x, mod:
ut> x.doc
op:x, mod:doc
ut> set x to (\%param) "\%str"
ut> out [x/ alfa]some words[/x]
(alfa) "some words"

Note that the percent sign is escaped with \ when defining the tag above, because otherwise it would be expanded by the interpreter when setting the tag and not when expanding it.
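The effect of the escape can be modelled with a two-pass expansion in Python. This is an illustrative sketch only: the function expand is hypothetical, the value of %op at definition time is an invented placeholder, and leaving unknown names untouched is an assumption rather than documented UText behaviour.

```python
import re

# A token is an escaped percent sign, %{multi word name} or %name.
_TOKEN = re.compile(r'\\%|%\{([^}]+)\}|%(\w+)')

def expand(template: str, settings: dict[str, str]) -> str:
    """Run a single expansion pass over template (illustrative model only)."""
    def replace(match: re.Match) -> str:
        if match.group(0) == '\\%':
            return '%'   # escape consumed; the reference survives this pass
        name = match.group(1) or match.group(2)
        # Assumption: names without a value are left as they are.
        return settings.get(name, match.group(0))
    return _TOKEN.sub(replace, template)

# Pass 1 runs when the tag is set, pass 2 when the tag is expanded.
define_time = {'op': 'declare'}          # hypothetical value while the tag is set
expand_time = {'op': 'x', 'mod': 'doc'}  # values when [x.doc] is expanded

escaped = expand(r'op:\%op, mod:\%mod', define_time)
print(escaped)                          # op:%op, mod:%mod
print(expand(escaped, expand_time))     # op:x, mod:doc

unescaped = expand('op:%op', define_time)
print(expand(unescaped, expand_time))   # op:declare
```

With the escape, the reference %op is stored in the tag and resolved at each expansion; without it, the value current at definition time is frozen into the tag.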

Finally the system setting provider is used to define things such as the output language or some debug options. For example:

set lang to de
set debug to files

Above, the current language is set to German (this is used by some tags such as d to output a date) and the debug level is set to files, which means that the names of the files are printed out at the console as they are being read.

One can use get to see the current value of any setting:

ut> get lang
de
ut> get preprocess out
0

For a complete description of the predefined setting providers and of the functions to create new ones in a custom add-in module see Settings.pm.

Config File Format

Values can be assigned to settings in a common config file format with the parser [*settings]. For example:

[*settings
      lang = de
      debug = files
      preprocess out = 0
]

The provider name can be omitted, if unambiguous. To avoid repeating parts of the setting names one can put them in sections under square brackets. For example, instead of writing:

[*settings
      navigator item page selector = category
      navigator item selector = entry
      navigator selector = menu
      navigator title field = caption
]

one can write:

[*settings
      [navigator item]
      page selector = category
      selector = entry
      [navigator]
      selector = menu
      title field = caption
]

These section groups are not registered anywhere else; they can be used at will. When the parser finds a new section, it simply prepends the section name to every following setting name until a new section is found.
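The prefixing rule can be sketched in Python as follows. This is a minimal model, not the implementation of the [*settings] parser: the function name parse_settings is hypothetical, and it assumes one setting per line with the section name and the setting name joined by a single space.

```python
def parse_settings(block: str) -> dict[str, str]:
    """Expand section prefixes in a settings block (illustrative model only)."""
    settings = {}
    prefix = ''
    for raw in block.splitlines():
        line = raw.strip()
        if line.startswith('[') and line.endswith(']'):
            # A new section replaces the prefix used for the following names.
            prefix = line[1:-1].strip()
            continue
        name, sep, value = line.partition('=')
        if not sep:
            continue   # skips blank lines and the [*settings ... ] wrapper
        settings[f'{prefix} {name.strip()}'.strip()] = value.strip()
    return settings

sectioned = """[*settings
      [navigator item]
      page selector = category
      selector = entry
      [navigator]
      selector = menu
      title field = caption
]"""

for name, value in parse_settings(sectioned).items():
    print(f'{name} = {value}')
```

Applied to the sectioned block above, the model produces the same four full setting names as the version written without sections.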