Textantrieb | UText/1 | UText/1.2 Manual

UText.pm

This module contains the kernel of the interpreter. It defines the class UText, which is the main class that is instantiated by the Perl scripts. It provides support for parsing UTL strings and files and for programmatically writing into the text repository. It also supports add-in modules and hooks.

Object management

new

$ut = new UText; is the standard UText constructor.

Can be called as $ut = new UText($debug);

If $debug == 1 then filesOn is activated.

If $debug == 2 then sayOn is activated.

clone

$ut2 = clone $ut is a constructor that clones the UText object $ut and creates a UText object $ut2 with this internal state:

abort

If a script or an add-in module finds an error, it should call this function to abort execution.

$ut->abort("File not found: $file_in");

If autodump is active, then before aborting the complete current text is saved in UTL format to the file out.utl and the memory contents are dumped to the file out.dump, both in the current working directory.

Then the add-in hook aborting is triggered, in order for each add-in to react to the abort.

Finally, the given message is written to the standard error stream, together with additional information about the current context, and the execution of the script stops.

Write Methods

These are the main methods to create a text. Normally they are called explicitly only in UText parsers.

set

Called as $ut->set({role=>...}). Hash Parameters:

bin is a scalar or a reference to a scalar (auto recognised)

ref references another unit:

A call to set() does the following: if the unit already exists, it is selected; if not, it is created. The call returns the $id of the unit, which becomes the current one.

def

Called as $ut->def({def=>...}). Hash Parameters:

If no type is given, the type is the same as def.

This method defines a new text unit.

enter

With $ut->enter() the current unit becomes the parent for the following set() instructions. After the call there is no current unit any more. New children are appended after the last existing child. If the entered unit has "loadfile" files, they are read before the call returns.

leave

$ut->leave() returns to the state prior to the last enter().

parse

Invoke a parser with $ut->parse('parser-name'). The source lines that follow, up to the closing ], are parsed by it instead of by the UTL parser.

A call $ut->parse() invokes the default parser for the current unit's type. It fails if there is none. To declare a default parser for a unit one adds a child ~parser <Perl sub name>.

A parser is called with the current UText object as its first parameter and the unit Id of the parser being called as its second parameter. The lines to be parsed are read with readline. See Alternative Parsers for more information.

readline

This function lets a parser get the next line to parse. It returns a string with the source line, or undef when there are no more lines. The lines do not have a newline character \n at the end. The line ] that ends the parse region is not returned by this function. The parser may leave lines unread.
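The read loop of a parser can be sketched as follows. This is only an illustration: MockText is a stand-in class invented here so that the snippet runs without UText, and it mimics nothing but the readline behaviour described above. The parser my_parser and its key=value input format are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a UText object. It only imitates the documented
# readline contract: no trailing newline, undef once the region has ended,
# and the closing "]" line is never handed to the parser.
package MockText;

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], done => 0 }, $class;
}

sub readline {
    my ($self) = @_;
    return undef if $self->{done};
    my $line = shift @{ $self->{lines} };
    if (!defined $line || $line eq ']') {
        $self->{done} = 1;          # region finished, stay finished
        return undef;
    }
    return $line;
}

package main;

# Hypothetical parser: first parameter is the text object, second the unit Id
# of the parser. It collects "key=value" lines until readline returns undef.
sub my_parser {
    my ($ut, $parser_id) = @_;
    my %pairs;
    while (defined(my $line = $ut->readline)) {
        next unless $line =~ /^(\w+)=(.*)$/;   # ignore lines in another format
        $pairs{$1} = $2;
    }
    return \%pairs;
}

my $ut = MockText->new('name=Smith', 'born=1901', ']', 'outside the region');
my $result = my_parser($ut, 42);
print "$_ => $result->{$_}\n" for sort keys %$result;
```

Testing the return value with defined, rather than for truth, matters here: an empty source line or a line containing only 0 is a valid line and must not end the loop.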

transform

A call $ut->transform(tform(1),tform(2),...,tform(N),unit) performs the following steps:

UTL Parsing

readfile

$ut->readfile($filename) creates a text reading the given file and parsing it. Example:

$ut = new UText;
$ut->readfile('geneaweb.utl');

It accepts more than one file name, with relative or absolute directory names:

$ut->readfile('geneaweb.utl','families/smiths.utl');

While a file is being read, the current working directory is set to the directory where the file is located.

readfiles

$ut->readfiles($filepattern) creates a text reading all files that match the pattern and parsing them. Example:

$ut = new UText;
$ut->readfiles('*.utl');

The files are read in alphabetical order by their name. File names ending with ~ or .bak are by default ignored (see SKIPFILE).

getfile

$ut->getfile($type,$utlpattern,@files) is an extended readfile.

This function requires the Script module to be already loaded. If it is not available, getfile fails.

files

One can read one file or more than one with a single call. Each file name can contain wildcards * or ? which are expanded by the OS, for example *.utl. When using wildcards, the files are read in alphabetical order by their name. File names ending with ~ or .bak are by default ignored (see SKIPFILE).

UTL pattern

It is possible to embed the file in a UTL expression that defines the context.

Example:

$ut->getfile('',<<END,'smith.utl');
=geneaweb
=smith ~family {
%content
}
END

The parser feeds the file contents as the child units of =smith. The following arguments are available:

Type

If the type is "utl" or empty, getfile expects a plain text file. If the add-in module odt is loaded, it is possible to read word processor documents in OpenDocument Format with type "odt".

In order for an add-in module to support reading other file types, it just needs to catch the hook getfile.

Add-In Modules Support

Functions to use existing modules and activate hooks.

load

UText::load(<module 1>, <module 2>, ...);

Loads the given modules. If they were already loaded, it does nothing.

is_load

$ut->is_load(<module>);

Returns whether the given module was loaded.

call_modules

$ut->call_modules(<function>, <parameter1>, <parameter 2>, ...);

Calls the function with the given name in all loaded add-in modules that have it. The modules are called in the order they were loaded. If there are no modules with this function, nothing happens.

The function is called with the UText object $ut as the first parameter and the given parameter 1 etc. as the next ones.

call_modules returns a list of the return values, one scalar value for each module that was called and returned a defined value.
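The dispatch rules above can be modelled in a few lines of plain Perl. The following is a sketch of the documented behaviour, not the UText implementation: the packages AddInA, AddInB and AddInC, the @loaded list and the name call_modules_sketch are all invented for this example, and a plain string stands in for the UText object.

```perl
use strict;
use warnings;

# Three hypothetical add-in modules. Only AddInA and AddInC define "greet".
package AddInA; sub greet { my ($ut, $who) = @_; return "A greets $who" }
package AddInB; sub other { return 'unused' }
package AddInC; sub greet { return undef }    # is called, but returns undef

package main;

# Assumed bookkeeping: module names in the order they were loaded.
my @loaded = qw(AddInA AddInB AddInC);

sub call_modules_sketch {
    my ($ut, $function, @params) = @_;
    my @results;
    for my $module (@loaded) {
        my $code = $module->can($function) or next;   # skip modules lacking it
        my $value = $code->($ut, @params);            # object first, then params
        push @results, $value if defined $value;      # keep defined values only
    }
    return @results;
}

my @values = call_modules_sketch('fake-ut', 'greet', 'world');
print scalar(@values), " value(s): @values\n";

my @none = call_modules_sketch('fake-ut', 'no_such_function');
print scalar(@none), " value(s) for an unknown function\n";
```

Because undefined results are dropped, the length of the returned list tells how many modules produced a value, not how many modules were called.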

can_module

$ut->can_module(<module>, <function>);

Returns whether a module supports a function.

may_call_module

$ut->may_call_module(<module>, <function>, <parameter 1>, <parameter 2>, ...);

If the module supports the function, it is called (see call_module below). If the module does not support it, it does nothing.

call_module

$ut->call_module(<module>, <function>, <parameter 1>, <parameter 2>, ...);

Calls the function with the given name in the given module.

The function is called with the UText object $ut as the first parameter and the given parameter 1 etc. as the next ones.

call_module returns the scalar value returned by the function call.

call_module fails if the module is not loaded or if it does not support the called function.
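The difference between can_module, may_call_module and call_module can be illustrated with a small stand-alone model. Everything named here is an assumption made for the example: the package AddInDemo, the %loaded hash and the *_sketch functions are invented, failure is modelled with die (how UText itself reports the failure is not shown here), and the return value of a skipped may_call_module is assumed to be undef.

```perl
use strict;
use warnings;

package AddInDemo;    # hypothetical add-in module
sub render { my ($ut, $text) = @_; return uc $text }

package main;

my %loaded = (AddInDemo => 1);    # assumed record of loaded modules

# True only if the module is loaded and defines the function.
sub can_module_sketch {
    my ($module, $function) = @_;
    return $loaded{$module} && $module->can($function) ? 1 : 0;
}

# Strict variant: fails if the module is not loaded or lacks the function.
sub call_module_sketch {
    my ($ut, $module, $function, @params) = @_;
    die "module $module is not loaded\n" unless $loaded{$module};
    my $code = $module->can($function)
        or die "module $module does not support $function\n";
    return scalar $code->($ut, @params);
}

# Lenient variant: silently does nothing if the function is not supported.
sub may_call_module_sketch {
    my ($ut, $module, $function, @params) = @_;
    return unless can_module_sketch($module, $function);
    return call_module_sketch($ut, $module, $function, @params);
}

my $ok      = may_call_module_sketch('fake-ut', 'AddInDemo', 'render', 'hi');
my $skipped = may_call_module_sketch('fake-ut', 'AddInDemo', 'missing');
my $failed  = eval { call_module_sketch('fake-ut', 'AddInDemo', 'missing'); 1 } ? 0 : 1;

print "ok=$ok failed=$failed\n";
```

The lenient form suits optional features that only some add-ins provide. The strict form suits calls whose absence indicates a configuration error.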

get_modules

UText::get_modules();

Returns a list of loaded modules.

print_modules

UText::print_modules();

Prints the names of the loaded modules to the standard output stream.

Add-In Hooks

The following hooks are set by this module (see Add-In Hooks for details):

Script Interpreter

get_script

$scr = $ut->get_script();

Returns the script interpreter object (class Script) that is currently bound to the UText object.

If no script interpreter is currently bound, a new one is instantiated.

Properties

INDEXFILE

Setting $UText::INDEXFILE=$filename defines the name of the root UTL file to be read at startup. By default it is root.utl.

This file is read when the first UText object is instantiated. If the variable is empty or no such file exists, it is ignored. The file must be in the directory where the interpreter module UText.pm is located or in the current working directory. If it exists in both places, both files are read, in that order.

For the usage of the root files see Feeding Text.

SKIPFILE

$UText::SKIPFILE contains a Perl regular expression. Files with matching names are ignored when reading files with wildcard patterns. By default files ending with ~ or .bak are ignored.

For example, if you set:

$UText::SKIPFILE='(~|\.bak|\.old)$';

then all subsequent reads with wildcard patterns skip files ending with ~, .bak or .old. Autodump files are also automatically excluded from being read through patterns.
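The effect of such a pattern can be tried out in isolation. The sketch below does not use UText: it holds the same regular expression in a local variable and applies it to a hypothetical list of file names, assuming that filtering amounts to a plain regular-expression match on each name followed by an alphabetical sort.

```perl
use strict;
use warnings;

# Same pattern as in the SKIPFILE example, kept in a local variable so that
# the snippet runs without UText.
my $skipfile = '(~|\.bak|\.old)$';

# Hypothetical file names, as a wildcard expansion might return them.
my @candidates = (
    'smith.utl', 'smith.utl~', 'jones.utl.bak',
    'root.utl',  'old.utl',    'notes.old',
);

# Drop the names that match the pattern, then sort alphabetically.
my @to_read = sort grep { !/$skipfile/ } @candidates;

print "$_\n" for @to_read;
```

The trailing $ anchors the match at the end of the name, so old.utl is kept while notes.old is skipped.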

GENERATOR

$UText::GENERATOR contains a string identifying the current version of the interpreter.

Debug Support

autodump

With autodump active, if the UText interpreter aborts execution because of an error, it dumps out the repository contents before exiting.

Autodump is by default active, except in an interactive shell session or a script object instantiated with new Script. To deactivate it in Perl, set $ut->{autodump}=0.

The names of the autodump files can be set through these variables:

$UText::AutodumpUtlFilename = 'somename.utl';
$UText::AutodumpListFilename = 'somename.txt';

The default names of these files are "out.utl" and "out.dump" respectively.

Setting debug

There is a setting debug that can be used to set the debug level in a UText script. Example:

set debug to files

The possible values are:

filesOn

After $$ut{dbg}->filesOn(), the names of the files being read are shown on the console. Note: this method activates filesOn on an existing UText object. It cannot show the root UTL file being read, because that file is read when the first object is instantiated. To see the root file name, instantiate the first object with new UText(1).

$$ut{dbg}->filesOff() deactivates file names output.

sayOn

$$ut{dbg}->sayOn() activates the verbose debug mode on the $ut utext object.

$$ut{dbg}->sayOff() deactivates the verbose debug mode on the $ut utext object.

statusOn

$$ut{dbg}->statusOn() activates a minimal debug mode on the $ut utext object.

$$ut{dbg}->statusOff() deactivates the debug mode on the $ut utext object. This is the default state for new UText objects.

TraceOutputCalls

With the setting set trace out to 1, all bound method calls made from the out() method are printed to the console. In Perl:

$UText::TraceOutputCalls=1

TraceBindingCalls

With the setting set trace bind to 1, all tag bind/unbind operations are printed to the console. In Perl:

$UText::TraceBindingCalls=1

TraceFunctionCalls

With the setting set trace functions to 1, all script function calls are printed to the console. In Perl:

$UText::TraceFunctionCalls=1

list

UText::list($filename[,$withTime])

Dumps the contents of the internal text repository to the file with the given name. If $withTime == 1, the creation and update times of all nodes are also output.