--- branches/CPAN/README	2000/04/28 15:41:10	11
+++ branches/CPAN/README	2000/05/09 11:29:45	19
@@ -1,6 +1,6 @@
-                             WAIT 1.6
+                             WAIT 1.8
 
-                  Copyright (c) 1996, Ulrich Pfeifer
+                  Copyright (c) 1996-2000, Ulrich Pfeifer
 
 ------------------------------------------------------------------------
     This program is free software; you can redistribute it and/or
@@ -11,48 +11,81 @@
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
 ------------------------------------------------------------------------
 
-This software is not actively maintained by it's author.
+News:
 
-For more two years now I tried to steal some time to clean this up
-without any luck. So I decided to pass the baton on. I consider the
-input part pretty satisfying. The query part - despite being operable
-and useful - needs a major overhaul. To provide a forum for further
-discussions an to coordinate further developement, I did setup a
-mailinglist.  Drop me a line if you want to participate.
+Locking
+=======
+
+WAIT now supports some basic locking.
+
+Speed
+=====
+
+Searching large collections is now considerably faster:
+
+        $table->search({attr  => 'text', 
+                        cont  => $query, 
+                        top   => 1, 
+                        picky => 0});
+
+Table indices may now be tuned to improve search performance.  The
+index tuning can be switched on and off using $table->set(top=>1/0) to
+allow for bulk inserts.
+
+Documentation
+=============
+
+WAIT is still not documented really.  But Andreas König took the
+trouble to comment the example scripts.  This will help you
+implementing your own applications.  I added some tiny scripts to
+index e.g. your .yow file or the fourtune databases.
+
+SourceForge
+===========
+
+WAIT is registered on SourceForge now:
+
+        http://wait.sourceforge.net/
+        https://sourceforge.net/project/?group_id=4814
+
+I will keep the CVS repository up to date.  If you have some spare
+tuits, feel free to contribute.
 
 Ulrich Pfeifer <upf@wait.de> 
 
 ------------------------------------------------------------------------
 NAME
-    WAIT - a rewrite of the freeWAIS-sf engine in Perl
+    WAIT - a rewrite of the freeWAIS-sf engine in Perl and XS
+
+SYNOPSIS
+    A Synopsis is not yet available.
 
 Status of this document
-    I started writing down some information about the implementation
-    before I forget them in my spare time. The stuff is incomplete
-    at least. Any additions, corrections, ... welcome.
+    I started writing down some information about the implementation before
+    I forget them in my spare time. The stuff is incomplete at least. Any
+    additions, corrections, ... welcome.
 
 PURPOSE
-    As you might know, I developed and maintained freeWAIS-sf (with
-    the help of many people in The Net). FreeWAIS-sf is based on
-    freeWAIS maintained by the Clearing House for Network
-    Information Retrieval (CNIDR) which in turn is based on wais-8-
-    b5 implemented by Thinking Machine et al. During this long
-    history - implementation started about 1989 - many people
-    contributed to the distribution and added features not foreseen
-    by the original design. While the system fulfills its task now,
-    the code has reached a state where adding new features is nearly
-    impossible and even fixing longstanding bugs and removing
-    limitations has become a very time consuming task.
-
-    Therefore I decided to pass the maintenance to WSC Inc. and
-    built a new system from scratch. For obvious reasons I choosed
-    Perl as implementation language.
+    As you might know, I developed and maintained freeWAIS-sf (with the help
+    of many people in The Net). FreeWAIS-sf is based on freeWAIS maintained
+    by the Clearing House for Network Information Retrieval (CNIDR) which in
+    turn is based on wais-8-b5 implemented by Thinking Machine et al. During
+    this long history - implementation started about 1989 - many people
+    contributed to the distribution and added features not foreseen by the
+    original design. While the system fulfills its task now, the code has
+    reached a state where adding new features is nearly impossible and even
+    fixing longstanding bugs and removing limitations has become a very time
+    consuming task.
+
+    Therefore I decided to pass the maintenance to WSC Inc. and built a new
+    system from scratch. For obvious reasons I choosed Perl as
+    implementation language.
 
 DESCRIPTION
     The central idea of the system is to provide a framework and the
-    building blocks for any indexing and search system the users
-    might want to build. Obviously the framework limits the class of
-    system which can be build.
+    building blocks for any indexing and search system the users might want
+    to build. Obviously the framework limits the class of system which can
+    be build.
 
            +------+     +-----+     +------+
        ==> |Access| ==> |Parse| ==> |      |
@@ -64,33 +97,30 @@
        <= |Display| <== |Query| <-> |      |
           +-------+     +-----+     +------+
 
-    A collection (aka table) is defined by the instances of the
-    access and parse module together with the filter definitions. At
-    query time in addition a query and a display module must be
-    choosen.
+    A collection (aka table) is defined by the instances of the access and
+    parse module together with the filter definitions. At query time in
+    addition a query and a display module must be choosen.
 
   Access
 
-    The access module defines which documents where members of a
-    database. Usually an access module is a tied hash, whose keys
-    are the Ids of the documents (did = document id) and whose
-    values are the documents themselves. The indexing process loops
-    over the keys using `FIRSTKEY' and `NEXTKEY'. Documents are
-    retrieved with `FETCH'.
-
-    By convention access modules should be members of the
-    `WAIT::Document' hierarchy. Have a look at the
-    `WAIT::Document::Split' module to get the idea.
+    The access module defines which documents are members of a database.
+    Usually an access module is a tied hash, whose keys are the Ids of the
+    documents (did = document id) and whose values are the documents
+    themselves. The indexing process loops over the keys using `FIRSTKEY'
+    and `NEXTKEY'. Documents are retrieved with `FETCH'.
+
+    By convention access modules should be members of the `WAIT::Document'
+    hierarchy. Have a look at the `WAIT::Document::Split' module to get the
+    idea.
 
   Parse
 
-    The task parse module is to split the documents into logical
-    parts via the `split' method. E.g. the `WAIT::Parse::Nroff'
-    splits manuals piped through nroff(1) into the sections *name*,
-    *synopsis*, *options*, *description*, *author*, *example*,
-    *bugs*, *text*, *see*, and *environment*. Here is the
-    implementation of `WAIT::Parse::Base' which handes documents
-    with a pretty simple tagged format:
+    The task of the parse module is to split the documents into logical
+    parts via the `split' method. E.g. the `WAIT::Parse::Nroff' splits
+    manuals piped through nroff(1) into the sections *name*, *synopsis*,
+    *options*, *description*, *author*, *example*, *bugs*, *text*, *see*,
+    and *environment*. Here is the implementation of `WAIT::Parse::Base'
+    which handles documents with a pretty simple tagged format:
 
       AU: Pfeifer, U.; Fuhr, N.; Huynh, T.
       TI: Searching Structured Documents with the Enhanced Retrieval
@@ -106,7 +136,7 @@
       sub split {                     # called as method
         my %result;
         my $fld;
-      
+
         for (split /\n/, $_[1]) {
           if (s/^(\S+):\s*//) {
             $fld = lc $1;
@@ -114,20 +144,19 @@
           $result{$fld} .= $_ if defined $fld;
         }
         return \%result;
-      } 
+      }
 
-    Since the original document cannot be reconstructed from its
-    attributes, we need a second method (*tag*) which marks the
-    regions of the document with tags for the different attributes.
-    This tagged form is used by the display module to hilight search
-    terms in the documents. Besides the tags for the attributes, the
-    method might assign the special tags `_b' and `_i' for
-    indicating bold and italic regions.
+    Since the original document cannot be reconstructed from its attributes,
+    we need a second method (*tag*) which marks the regions of the document
+    with tags for the different attributes. This tagged form is used by the
+    display module to hilight search terms in the documents. Besides the
+    tags for the attributes, the method might assign the special tags `_b'
+    and `_i' for indicating bold and italic regions.
 
       sub tag {
         my @result;
         my $tag;
-        
+
         for (split /\n/, $_[1]) {
           next if /^\w\w:\s*$/;
           if (s/^(\S+)://) {
@@ -141,65 +170,43 @@
           }
         }
         return @result;               # we don't go for speed
-      } 
+      }
 
-    Obviously one could implement `split' via `tag'. The reason for
-    having two functions is speed. We need to call `split' for each
-    document when indexing a collection. Therefore speed is
-    essential. On the other hand, `tag' is called in order to
-    display a single document and may be a little slower. It may
-    care about tagging bold and italic regions. See
+    Obviously one could implement `split' via `tag'. The reason for having
+    two functions is speed. We need to call `split' for each document when
+    indexing a collection. Therefore speed is essential. On the other hand,
+    `tag' is called in order to display a single document and may be a
+    little slower. It may care about tagging bold and italic regions. See
     `WAIT::Parse::Nroff' how this might decrease performance.
 
   Filter definition
 
-    From the Information Retrieval perspective, the hardest part of
-    the system is the filter module. The database administrator
-    defines for each attribute, how the contents should be processed
-    before it is stored in the index. Usually the processing
-    contains steps to restrict the character set, case
-    transformation, splitting to words and transforming to word
-    stems. In WAIT these steps are defined naturally as a pipeline
-    of processing steps. The pipelines are made up by functions in
-    the package WAIT::Filter which is pre-populated by the most
-    common functions but may be extended any time.
+    From the Information Retrieval perspective, the hardest part of the
+    system is the filter module. The database administrator defines for each
+    attribute, how the contents should be processed before it is stored in
+    the index. Usually the processing contains steps to restrict the
+    character set, case transformation, splitting to words and transforming
+    to word stems. In WAIT these steps are defined naturally as a pipeline
+    of processing steps. The pipelines are made up by functions in the
+    package WAIT::Filter which is pre-populated by the most common functions
+    but may be extended any time.
 
-    The equivalent for a typical freeWAIS-sf processing would be
-    this pipeline:
+    The equivalent for a typical freeWAIS-sf processing would be this
+    pipeline:
 
             [ 'isotr', 'isolc', 'split2', 'stop', 'Stem']
 
-    The function `isotr' replaces unknown characters by blanks.
-    `isolc' transforms to lower case. `split2' splits into words and
-    removes words shorter than two characters. `stop' removes the
-    freeWAIS-sf stopwords and `Stem' applies the Porter algorithm
-    for computing the stem of the words.
-
-    The filter definition for a collection defines a set of piplines
-    for the attributes and modifies the pipelines which should be
-    used for prefix and interval searches.
-
-    Here is a complete example:
-
-      my $stem  = [{
-                    'prefix'    => ['unroff', 'isotr', 'isolc'],
-                    'intervall' => ['unroff', 'isotr', 'isolc'],
-                   },'unroff', 'isotr', 'isolc', 'split2', 'stop', 'Stem'];
-      my $text  = [{
-                    'prefix'    => ['unroff', 'isotr', 'isolc'],
-                    'intervall' => ['unroff', 'isotr', 'isolc'],
-                   },
-                    'unroff', 'isotr', 'isolc', 'split2', 'stop'];
-      my $sound = ['unroff', 'isotr', 'isolc', 'split2', 'Soundex'];
-      
-      my $spec  = [
-          'name'         => $stem,
-          'synopsis'     => $stem,
-          'bugs'         => $stem,
-          'description'  => $stem,
-          'text'         => $stem,
-          'environment'  => $text,
-          'example'      => $text,  'example' => $stem,
-          'author'       => $sound, 'author'  => $stem,
-         ]
+    The function `isotr' replaces unknown characters by blanks. `isolc'
+    transforms to lower case. `split2' splits into words and removes words
+    shorter than two characters. `stop' removes the freeWAIS-sf stopwords
+    and `Stem' applies the Porter algorithm for computing the stem of the
+    words.
+
+    The filter definition for a collection defines a set of pipelines for
+    the attributes and modifies the pipelines which should be used for
+    prefix and interval searches.
+
+    Several complete working examples come with WAIT in the script
+    directory. It is recommended to follow the pattern of the scripts
+    smakewhatis and sman.