Textantrieb | UText/1 | UText/1.2 Manual
UText.pm
This module contains the kernel of the interpreter. It defines the class UText, which is the main class that is instantiated by the Perl scripts. It provides support for parsing UTL strings and files and for programmatically writing into the text repository. It also supports add-in modules and hooks.
Object management
new
$ut = new UText;
is the standard UText constructor.
It can also be called as $ut = new UText($debug);
If $debug == 1, filesOn is activated.
If $debug == 2, sayOn is activated.
clone
$ut2 = clone $ut
is a constructor that clones the UText object $ut and creates a UText object $ut2 with this internal state:
- The reading position (current unit ID) is kept, but opened cursors are lost.
- The bound output processors for output tags are kept.
- The debug mode is kept (if verbose debug or autodump was on, it is still on in the cloned object, etc.)
- The writing position is reset, that is, newly appended units are children of unit.
abort
If a script or an add-in module finds an error, it should call this function to abort execution.
$ut->abort("File not found: $file_in");
If autodump is active, then before aborting the complete current text is saved in UTL format to the file out.utl and the memory contents are dumped to the file out.dump, both in the current working directory.
Then the add-in hook aborting is triggered, so that each add-in can react to the abort.
Finally, the given message is written to the standard error stream together with additional information about the current context, and the execution of the script stops.
Write Methods
These are the main methods for creating a text. Normally they are called explicitly only in UText parsers.
set
Called as $ut->set({role=>...}). Hash parameters:
- role => name of the role
- type => name of the type
- ref => reference to a unit
- unit => name of the unit
- bin => binary data
bin is a scalar or a reference to a scalar (recognised automatically).
ref references another unit:
- To set a reference by name: ref => ref_unit_name
- To set a reference returned by a transformation, use an array: ref => (tform1, tform2, ..., tformN, ref_unit_name). This executes the transformations (see transform() below) and sets a reference to their results.
A call to set() does the following: if the unit already exists, it is selected; if not, it is created. The call returns the ID of that unit, which becomes the current one.
def
Called as $ut->def({def=>...}). Hash parameters:
- def => name of both the unit and its role
- type, ref, bin => same as for the method set (see above)
If no type is given, the type is the same as def.
This method defines a new text unit.
enter
With $ut->enter() the current unit becomes the parent for the following set() calls. After the call there is no current unit any more. New children are appended after the last existing child. If the entered unit has "loadfile" files, they are read before returning.
leave
$ut->leave() returns to the state prior to the last enter().
parse
Invoke a parser with $ut->parse('parser-name'); The source lines that follow, up to the closing ], are parsed by it instead of the UTL parser.
A call $ut->parse() invokes the default parser for the current unit's type. It fails if there is none. To declare a default parser for a unit, add a child ~parser <Perl sub name>.
A parser is called with the current UText object as its first parameter and the unit ID of the parser being called as its second parameter. The lines to be parsed are read with readline. See Alternative Parsers for more information.
readline
Parsers call this function to get the next line to parse. It returns a string with the source line, or undef when there are no more lines. The lines do not have a newline character \n at the end. The line ] that ends the parse region is not returned by this function. The parser may leave lines unread.
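The calling convention described above can be sketched without the interpreter. The following is a minimal illustration only: MockText is a hypothetical stand-in that mimics just the documented readline behaviour, and keyvalue_parser, its "key: value" line format and its return value are assumptions made for the example, not part of UText.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the UText object. It only mimics the documented
# readline behaviour: lines without "\n", undef when the region is exhausted.
package MockText;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }

package main;

# Hypothetical parser: receives the text object and the parser's unit ID,
# reads lines until readline returns undef, and collects "key: value" pairs.
sub keyvalue_parser {
    my ($ut, $parser_id) = @_;
    my %pairs;
    while (defined(my $line = $ut->readline)) {
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
        $pairs{$1} = $2;
    }
    return \%pairs;
}

my $ut    = MockText->new('name: Smith', 'born: 1901', 'not a pair');
my $pairs = keyvalue_parser($ut, 42);
print "$_ = $pairs->{$_}\n" for sort keys %$pairs;
```

A real parser would typically write units with set() and def() instead of returning a hash; the loop over readline is the part that carries over.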
transform
A call $ut->transform(tform(1),tform(2),...,tform(N),unit)
performs the following steps:
- (1) Executes transformation tform(N) on unit;
- (2) executes transformation tform(N-1) on the results from (1);
- (3) executes transformation tform(N-2) on the results from (2);
- ...
- (N-1) executes transformation tform(2) on the results from (N-2);
- (N) executes transformation tform(1) on the results from (N-1);
- returns results from (N).
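The right-to-left order of the steps above can be illustrated with plain Perl subs. This is a sketch of the ordering only: the %tform table, the two transformations and the helper apply_transforms are hypothetical and do not reflect how UText stores or runs transformations.

```perl
use strict;
use warnings;

# Hypothetical transformations: each takes a list of values and returns a list.
my %tform = (
    twice => sub { map { $_ . $_ } @_ },
    bang  => sub { map { $_ . '!' } @_ },
);

# Apply the named transformations right to left: the name written last
# (just before the unit) runs first, the name written first runs last.
sub apply_transforms {
    my @names   = @_;
    my $unit    = pop @names;       # the last argument is the unit
    my @results = ($unit);
    for my $name (reverse @names) {
        my $code = $tform{$name} or die "Unknown transformation: $name";
        @results = $code->(@results);
    }
    return @results;
}

# 'twice' runs first on 'ab', then 'bang' runs on its result.
my @out = apply_transforms('bang', 'twice', 'ab');
print "@out\n";
```

Swapping the two names changes the result, which is why the order of the arguments matters.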
UTL Parsing
readfile
$ut->readfile($filename)
creates a text reading the given file and parsing it. Example:
$ut = new UText;
$ut->readfile('geneaweb.utl');
It accepts more than one file name, with relative or absolute directory names:
$ut->readfile('geneaweb.utl','families/smiths.utl');
When reading a file, the current working directory is set to the directory where the file is placed.
readfiles
$ut->readfiles($filepattern)
creates a text reading all files that match the pattern and parsing them. Example:
$ut = new UText;
$ut->readfiles('*.utl');
The files are read in alphabetical order by their name. File names ending with ~ or .bak are ignored by default (see SKIPFILE).
getfile
$ut->getfile($type,$utlpattern,@files) is an extended readfile.
This function requires the Script module to be already loaded. If it is not available, getfile fails.
files
One or more files can be read with a single call. Each file name can contain the wildcards * or ?, which are expanded by the OS, for example *.utl. When wildcards are used, the files are read in alphabetical order by their name. File names ending with ~ or .bak are ignored by default (see SKIPFILE).
UTL pattern
It is possible to embed the file in a UTL expression that defines the context.
Example:
$ut->getfile('',<<END,'smith.utl');
=geneaweb
=smith ~family {
%content
}
END
The parser inserts the file contents as the child units of =smith. The following arguments are available:
- %content: the file contents
- %file: full file name
- %name: base name of the file without extension
- %timestamp: the modification time of the file (GMT) according to the OS
Type
If the type is "utl" or empty, getfile expects a plain text file. If the add-in module odt is loaded, it is possible to read word processor documents in OpenDocument Format with type "odt".
For an add-in module to support reading other file types, it only needs to catch the hook getfile.
Add-In Modules Support
Functions to use existing modules and activate hooks.
load
UText::load(<module 1>, <module 2>, ...);
Loads the given modules. Modules that are already loaded are not loaded again.
is_load
$ut->is_load(<module>);
Returns whether the given module has been loaded.
call_modules
$ut->call_modules(<function>, <parameter 1>, <parameter 2>, ...);
Calls the function with the given name in all loaded add-in modules that have it. The modules are called in the order they were loaded. If there are no modules with this function, nothing happens.
The function is called with the UText object $ut as the first parameter, followed by the given parameters.
call_modules returns a list of all the return values, one scalar value for each module that was called and returned a defined value.
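The dispatching behaviour described above can be sketched in plain Perl with the built-in can method. Everything named here is hypothetical: the packages AddinA and AddinB, the list @loaded and the helper call_all only illustrate the documented semantics and are not the UText implementation.

```perl
use strict;
use warnings;

# Two hypothetical add-in modules; only AddinA defines 'cleanup'.
package AddinA;
sub describe { my ($ut, $what) = @_; return "A saw $what" }
sub cleanup  { return }              # returns no defined value

package AddinB;
sub describe { my ($ut, $what) = @_; return "B saw $what" }

package main;

my @loaded = ('AddinA', 'AddinB');   # load order

# Call $function in every loaded module that has it, passing the object
# first, and collect the defined scalar return values in load order.
sub call_all {
    my ($ut, $function, @params) = @_;
    my @results;
    for my $module (@loaded) {
        my $code = $module->can($function) or next;
        my $value = $code->($ut, @params);
        push @results, $value if defined $value;
    }
    return @results;
}

my $ut = {};                         # placeholder for the UText object
print "$_\n" for call_all($ut, 'describe', 'smith.utl');
```

A function that no module defines, or that only returns undefined values, yields an empty list.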
can_module
$ut->can_module(<module>, <function>);
Returns whether a module supports a function.
may_call_module
$ut->may_call_module(<module>, <function>, <parameter 1>, <parameter 2>, ...);
If the module supports the function, it is called (see call_module below). If the module does not support it, nothing happens.
call_module
$ut->call_module(<module>, <function>, <parameter 1>, <parameter 2>, ...);
Calls the function with the given name in the given module.
The function is called with the UText object $ut as the first parameter, followed by the given parameters.
call_module returns the scalar value returned by the function call.
call_module fails if the module is not loaded or if it does not support the called function.
get_modules
UText::get_modules();
Returns a list of loaded modules.
print_modules
UText::print_modules();
Prints the names of the loaded modules to the standard output stream.
Add-In Hooks
The following hooks are set by this module (see Add-In Hooks for details):
- load: loading module
- init: instantiating UText object
- clone: cloning UText object
- getfile: reading a UTL source file
- aborting: aborting script execution
Script Interpreter
get_script
$scr = $ut->get_script();
Returns the script interpreter object (class Script) that is currently bound to the UText object.
If no script interpreter is currently bound, a new one is instantiated.
Properties
INDEXFILE
With $UText::INDEXFILE=$filename you can set the name of the root UTL file to be read at startup. By default it is root.utl.
This file is read when the first UText object is instantiated. If the variable is empty or no such file exists, it is ignored. The file must be in the directory where the interpreter module UText.pm is located or in the current working directory. If it exists in both places, both files are read, in that order.
For the usage of the root files see Feeding Text.
SKIPFILE
$UText::SKIPFILE contains a Perl regular expression. Files with matching names are ignored when reading files with wildcard patterns. By default, files ending with ~ or .bak are ignored.
For example, if you set:
$UText::SKIPFILE='(~|\.bak|\.old)$';
then from now on all read* calls skip files ending with ~, .bak or .old. Autodump files are also automatically excluded from being read through patterns.
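The effect of such a pattern on a list of file names can be checked with plain Perl, independently of the interpreter. This is a minimal sketch: the helper readable_files and the sample file names are hypothetical, and it only reproduces the documented filtering and alphabetical ordering, not the automatic exclusion of autodump files.

```perl
use strict;
use warnings;

# Same pattern as in the SKIPFILE example; it is matched against each name.
my $skipfile = '(~|\.bak|\.old)$';

# Hypothetical helper: drop the names that match the pattern and return
# the remaining names in alphabetical order.
sub readable_files {
    my ($pattern, @names) = @_;
    my @kept = grep { $_ !~ /$pattern/ } @names;
    return sort @kept;
}

my @names = ('smith.utl', 'smith.utl~', 'jones.utl.bak', 'index.old', 'adams.utl');
print join(', ', readable_files($skipfile, @names)), "\n";
```

Because the pattern is anchored with $, a name such as old.utl is still read; only the ending of the name is tested.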
GENERATOR
$UText::GENERATOR contains a string identifying the current version of the interpreter.
Debug Support
autodump
With autodump active, if the UText interpreter aborts execution because of an error, it dumps the repository contents before exiting.
Autodump is active by default, except in an interactive shell session or in a script object instantiated with new Script. To deactivate it in Perl, set $ut->{autodump}=0.
The names of the autodump files can be set through these variables:
$UText::AutodumpUtlFilename = 'somename.utl';
$UText::AutodumpListFilename = 'somename.txt';
The default names of these files are "out.utl" and "out.dump" respectively.
Setting debug
The setting debug can be used to set the debug level in a UText script. Example:
set debug to files
The possible values are:
- files: same as filesOn
- say: same as sayOn
- status: same as statusOn
- none: default value, same as statusOff
filesOn
After $$ut{dbg}->filesOn() the names of the files being read are shown on the console. (Note: this method activates filesOn on an existing UText object. It cannot show the root UTL file being read, because that file is read when the first object is instantiated. To see the root file name, instantiate the first object with new UText(1).)
$$ut{dbg}->filesOff() deactivates the output of file names.
sayOn
$$ut{dbg}->sayOn() activates the verbose debug mode on the UText object $ut.
$$ut{dbg}->sayOff() deactivates the verbose debug mode on the UText object $ut.
statusOn
$$ut{dbg}->statusOn() activates a minimal debug mode on the UText object $ut.
$$ut{dbg}->statusOff() deactivates the debug mode on the UText object $ut. This is the default state for new UText objects.
TraceOutputCalls
After set trace out to 1, all bound method calls made from the out() method are printed to the console. In Perl:
$UText::TraceOutputCalls=1
TraceBindingCalls
After set trace bind to 1, all tag bind/unbind operations are printed to the console. In Perl:
$UText::TraceBindingCalls=1
TraceFunctionCalls
After set trace functions to 1, all script function calls are printed to the console. In Perl:
$UText::TraceFunctionCalls=1
list
UText::list($filename[,$withTime])
Dumps the contents of the internal text repository to the given file. If $withTime == 1, the creation and update times of all nodes are also output.