Vibra Documentation
Vibra-script (vibra) is a Perl script that provides the customary
operations for manipulating ASCII arrays. The operations currently
available are projection, sorting, selection and counting.
A data file consists of two logical parts: a header
and the actual data. Bassist's output follows the syntax of vibra-script.
The source code is split into the files
vibra.pl and
read_header2.pl. Vibra uses the environment variable "TASAHOME", which
should be set to the directory that contains the files read_header2.pl
and vibra.pl. The examples below assume that you have
written a short wrapper script named vibra, as follows:
#!/bin/csh
setenv TASAHOME "path that contains vibra.pl and read_header2.pl"
vibra.pl $*
If you are from RNI, this is not necessary.
The author of the vibra-script is Mika Klemettinen.
Header
The input file is assumed to consist of a header part and the actual
data part. The header defines the number of attributes, the names of the
attributes, the record separator and the field separator. The necessary
information about the structure of the data is declared between the
markers "header", "attributes", "field separator"
and "record separator". The syntax of the header follows HTML. The header
starts with the marker
#<header>
and ends with the marker
#</header>
Names of the attributes are listed between "attributes" markers
#<attributes> attribute_1_name attribute_2_name ... attribute_n_name </attributes>
The record separator is declared between the "record separator" markers
#<record separator> '\n' </record separator>
and the field separator is declared in a similar way
#<field separator> ' ' </field separator>
With the above declarations,
records are separated by a newline character
and fields are separated by a single space character.
Within the "header" markers, a line that begins with the "#" character
and is not followed by any marker
is treated as a comment line.
For instance, a complete data file might look as follows:
#<header>
#<attributes> C1 C2 C3 C4 C5 </attributes>
#<record separator> '\n' </record separator>
#<field separator> ' ' </field separator>
# My comment line
#</header>
10 11 12 13 14
20 21 22 23 24
30 31 32 33 34
40 41 42 43 44
50 51 52 53 54
60 61 62 63 64
70 71 72 73 74
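The file layout described above can be read with a few lines of code. The following Python sketch is only an illustration of the format, not the logic of read_header2.pl; the names parse_datafile, unquote and MARKER are hypothetical, and the sketch assumes that the header is written one declaration per line and that separators are quoted with single quotes as in the example.

```python
import re

# Matches one declaration such as "#<attributes> C1 C2 </attributes>".
# The closing marker is accepted with either "/" or "\" before the name.
MARKER = re.compile(
    r"#[ \t]*<(attributes|record separator|field separator)>(.*?)<[/\\]\1>"
)

def unquote(token):
    """Turn a quoted separator such as ' ' or '\\n' into the real character(s)."""
    inner = token.strip()
    if len(inner) >= 2 and inner[0] == inner[-1] == "'":
        inner = inner[1:-1]
    # Assumption: only the escapes \n and \t need to be interpreted.
    return inner.replace("\\n", "\n").replace("\\t", "\t")

def parse_datafile(text):
    """Return (attributes, records) for a file with a header and a data part."""
    end = re.search(r"^#[ \t]*</header>[ \t]*\n?", text, flags=re.MULTILINE)
    if end is None:
        raise ValueError("no closing header marker found")
    header, body = text[:end.start()], text[end.end():]
    # Comment lines inside the header contain no marker and are skipped.
    declared = dict(MARKER.findall(header))
    attributes = declared["attributes"].split()
    record_sep = unquote(declared["record separator"])
    field_sep = unquote(declared["field separator"])
    records = [r.split(field_sep) for r in body.split(record_sep) if r]
    return attributes, records
```

Called on the complete data file shown above, parse_datafile would return the five attribute names and one list of five field strings per record.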
Operations
General command-line syntax is
vibra [-C | -O | -S | -P] [-h] {<parameter>} < input > output
Arguments: -C - count mode
-O - sort mode
-S - select mode
-P - project mode
-h - usage help and script information
Help for different modes: vibra [-C | -O | -S | -P] -h
Projection
vibra -P [-h] [-r] <attribute_1> ... <attribute_n> < input > output
Arguments: attr - picked attribute, the given attributes
and their order define the output order
-h - usage help and script information
-r - option for restrictive picking: if it
is given the script leaves the given
attributes out
Note that projection is not limited to the attributes whose names exactly match an
argument. It also projects every attribute whose name contains at least one of the
arguments as a substring. For instance, if we
have the attributes "AirTemp" and "Temp", then the command
"vibra -P Temp < inputfile" projects
both "Temp" and "AirTemp". We do not always want to
project all the attributes that have a common substring
in their names. The characters "^" and "$" are reserved
to work around this problem.
- vibra -P ^foo < inputfile
Projects all the attributes whose names begin with the substring "foo".
- vibra -P foo$ < inputfile
Projects all the attributes whose names end with the substring "foo".
For example, consider the following input file:
# <header>
# <attributes> Ca Cambigua Catchare </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
0.470000 0 3.566930
0.476230 0 2.705830
0.698130 0 3.170660
0.300100 0 3.879520
1.332370 0.833330 3.840540
Command: vibra -P -r a$ < inputfile (project all the attributes that do not end with the character "a").
Output:
#<header>
#<attributes> Catchare </attributes>
#<record separator> '\n' </record separator>
#<field separator> ' ' </field separator>
#</header>
3.566930
2.705830
3.170660
3.879520
3.840540
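The matching rules of projection can be summarised in a short Python sketch. It is an illustration only, not the code of vibra.pl: the function name project_columns is hypothetical, and the sketch assumes that each argument behaves like a regular expression searched anywhere in the attribute name, which is consistent with the substring, "^" and "$" behaviour described above. That a column matched by several arguments is emitted only once is also an assumption.

```python
import re

def project_columns(attributes, patterns, restrictive=False):
    """Return the indices of the columns that projection would keep.

    attributes  - attribute names from the header, in file order
    patterns    - the command-line arguments (substring, ^prefix or suffix$)
    restrictive - True models the -r option (leave the matches out)
    """
    regexes = [re.compile(p) for p in patterns]
    if restrictive:
        # Keep, in file order, every attribute matched by none of the patterns.
        return [i for i, name in enumerate(attributes)
                if not any(r.search(name) for r in regexes)]
    # Otherwise the order of the arguments defines the output order.
    picked = []
    for r in regexes:
        for i, name in enumerate(attributes):
            if r.search(name) and i not in picked:
                picked.append(i)
    return picked

# The -r example above: only "Catchare" does not end with "a".
print(project_columns(["Ca", "Cambigua", "Catchare"], ["a$"], restrictive=True))
```

With the attributes "AirTemp" and "Temp", the pattern "Temp" selects both columns in this sketch, while "^Temp" selects only the second.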
Selection
Syntax: vibra -S [-h] {<attr<n>> <op<n>> <value<n>>} <input >output
Arguments: attr - a name of attribute in the table
op - a perl operator: ==, <, >, <=, >=, != for
INTEGERS and correspondingly eq, lt, gt, le,
ge, ne for STRINGS. You can also use =~ and
!~ for pattern matching from STRINGS.
The select operation tests a logical condition against
each record. If the condition is true the record is selected; otherwise
the record is discarded. Records can also be filtered
with regular expressions using the Perl-like operators
"=~" and "!~". For example, consider the following input file:
# <header>
# <attributes> Lake Temp Latitude </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
lak00001 2.708050 66.919998
lak00002 2.639060 67.820000
lak00003 2.564950 67.849998
lak00004 2.541600 67.980003
lak00005 2.533700 68.010002
Command: vibra -S Lake eq lak00001 < inputfile
Output:
# <header>
# <attributes> Lake Temp Latitude </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
lak00001 2.708050 66.919998
Command: vibra -S "Temp < 2.70" < inputfile | vibra -S "Temp > 2.55"
Output:
# <header>
# <attributes> Lake Temp Latitude </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
lak00002 2.639060 67.820000
lak00003 2.564950 67.849998
Command: vibra -S "Latitude =~ /98/" < inputfile
Output:
# <header>
# <attributes> Lake Temp Latitude </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
lak00001 2.708050 66.919998
lak00003 2.564950 67.849998
lak00004 2.541600 67.980003
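The way a condition is applied to each record can be sketched in Python as follows. This is a hedged illustration rather than the implementation in vibra.pl: the names make_test and select are hypothetical, numeric operators are modelled with float conversion (Perl coerces more leniently), and a pattern is assumed to be written between slashes as in the example above.

```python
import operator
import re

NUMERIC_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
               ">": operator.gt, "<=": operator.le, ">=": operator.ge}
STRING_OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
              "gt": operator.gt, "le": operator.le, "ge": operator.ge}

def make_test(op, value):
    """Build a function that tests one field against the given condition."""
    if op in NUMERIC_OPS:
        number = float(value)
        return lambda field: NUMERIC_OPS[op](float(field), number)
    if op in STRING_OPS:
        return lambda field: STRING_OPS[op](field, value)
    if op in ("=~", "!~"):
        # Drop the surrounding slashes of a pattern written as /.../
        if len(value) >= 2 and value[0] == value[-1] == "/":
            value = value[1:-1]
        pattern = re.compile(value)
        want_match = (op == "=~")
        return lambda field: (pattern.search(field) is not None) == want_match
    raise ValueError("unknown operator: " + op)

def select(attributes, records, attr, op, value):
    """Keep the records whose field `attr` satisfies the condition."""
    column = attributes.index(attr)
    test = make_test(op, value)
    return [record for record in records if test(record[column])]
```

Chaining two calls, first with ("Temp", "<", "2.70") and then with ("Temp", ">", "2.55"), mirrors the pipeline of two vibra -S commands shown above.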
Count
Syntax: vibra -C [-h] <attr 1> ... <attr n> < input >output
Arguments: attr - a name of attribute in the input
stream file
-h - usage help and script information
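The count operation counts how many times each combination of values occurs
in the fields that are given as arguments. Counts are reported for the first
attribute alone, then for the first two attributes together, and so on.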
# <header>
# <attributes> SSN Name Employer Room </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
1 Uolevi TKTL A318
2 Helga RNI B532
1 Uolevi RNI B555
3 Helga TKK T222
4 Esko TKTL A321
Command: vibra -C SSN Name Employer < inputfile
Output:
# <header>
# <attributes> SSN Name Employer Room </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
####################################################
# Counted attributes are (in order of appearance): #
####################################################
# SSN (total: 5, unique: 4)
# SSN -- Name
# SSN -- Name -- Employer
####################################################
#
1: 2
1 -- Uolevi: 2
1 -- Uolevi -- RNI: 1
1 -- Uolevi -- TKTL: 1
2: 1
2 -- Helga: 1
2 -- Helga -- RNI: 1
3: 1
3 -- Helga: 1
3 -- Helga -- TKK: 1
4: 1
4 -- Esko: 1
4 -- Esko -- TKTL: 1
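The nested counts in the output above can be reproduced with the following Python sketch. It only models the counting as it appears in the example, not the code of vibra.pl; the names count_prefixes and format_counts are hypothetical, and the sorted output order is an assumption that happens to agree with the example.

```python
from collections import Counter

def count_prefixes(attributes, records, counted):
    """Count value combinations for every leading subset of `counted`.

    For counted = [SSN, Name, Employer] this counts the SSN values, the
    (SSN, Name) pairs and the (SSN, Name, Employer) triples.
    """
    columns = [attributes.index(name) for name in counted]
    counts = Counter()
    for record in records:
        values = tuple(record[c] for c in columns)
        for n in range(1, len(values) + 1):
            counts[values[:n]] += 1
    return counts

def format_counts(counts):
    """Render the counts as 'value -- value: count' lines, sorted by key."""
    return [" -- ".join(key) + ": " + str(n) for key, n in sorted(counts.items())]
```

Because a shorter tuple sorts before any longer tuple that starts with it, each total is listed directly above its breakdown, as in the output above.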
Sort
Syntax: vibra -O [-h] <attr 1> [-n] ... <attr n> [-n] < input >output
Arguments: attr - a name of attribute in the input
stream file
-h - usage help and script information
-n - sorting attribute followed by -n
is sorted as NUMBER, default is
STRING (text)
The sort operation sorts records by the given attributes, in alphabetical
order by default. To sort an attribute in numerical
order, follow it with the option "-n". Sorting is carried out primarily by
the attribute that is leftmost on the command line,
secondarily by the second leftmost, and so on.
# <header>
# <attributes> SSN Name Employer Room </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
1 Uolevi TKTL A318
2 Helga RNI B532
1 Uolevi RNI B555
3 Helga TKK T222
4 Esko TKTL A321
Command: vibra -O SSN Room < inputfile
Output:
# <header>
# <attributes> SSN Name Employer Room </attributes>
# <record separator> '\n' </record separator>
# <field separator> ' ' </field separator>
# </header>
1 Uolevi TKTL A318
1 Uolevi RNI B555
2 Helga RNI B532
3 Helga TKK T222
4 Esko TKTL A321
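The multi-key ordering described above can be sketched in Python. This is an illustration under assumptions, not the sorting code of vibra.pl: the name sort_records is hypothetical, each key is given as a pair of attribute name and a flag that models "-n", and numeric keys are converted with float.

```python
def sort_records(attributes, records, keys):
    """Sort records by several attributes, the leftmost key being primary.

    keys - list of (attribute_name, numeric) pairs; numeric=True models
           an attribute followed by -n, numeric=False the string default.
    """
    columns = [(attributes.index(name), numeric) for name, numeric in keys]

    def sort_key(record):
        # Lists compare element by element, so earlier keys take precedence.
        return [float(record[c]) if numeric else record[c]
                for c, numeric in columns]

    return sorted(records, key=sort_key)
```

The difference between the two modes shows up with values such as "10" and "9": compared as strings "10" comes first, compared as numbers 9 comes first.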