--- trunk/README	2005/07/06 11:47:56	29
+++ trunk/README.pod	2006/05/09 22:55:42	51
@@ -1,41 +1,70 @@
-1. pgestraier - search Hyper Estraier indexes from PostgreSQL
+=head1 pgestraier - search Hyper Estraier indexes from PostgreSQL
 
 This package is essentially PostgreSQL C function which calls Hyper Estraier API
 and returns results in user defined format.
 
-2. Why is it written?
+=head1 Why is it written?
 
 Aside from providing single API to query your RDBMS and full text index
 (using any language that has PostgreSQL client libraries), real power is
 hidden in ability to join results from full text index and structured data
 in RDBMS.
 
-3. How to install
+=head1 How to install
 
 Installation should be simple. However, you will have to have following
 software already installed before you try this function:
 
- * PostgreSQL (tested with version 7.4.8) with development libraries
- * Hyper Estraier (tested with versions 0.3.9 and 0.3.10)
+=over
+
+=item *
+
+PostgreSQL (tested with versions 7.4 and 8.0) with development libraries
+
+=item *
+
+Hyper Estraier (tested with 0.5.0-1.0.0+, version newer than 0.9.6 are
+recommended)
+
+=back
 
 To run tests you will also need:
 
- * working perl installation
- * perl modules DBI, DBD::Pg, Test::More
- * trivia.list.gz from Internet Movie Database in data/ directory
- * database "test" with permissions for current user
+=over
+
+=item *
+
+working perl installation
+
+=item *
+
+perl modules C, C, C and optionally C
+
+=item *
+
+C from Internet Movie Database in C directory
+
+=item *
+
+PostgreSQL database C with permissions for current user
+
+=item *
+
+Hyper Estraier node C with permissions for C user.
+
+=back
 
 If you have all that, you should be able to type
 
   make
 
 and see sample results. You will be asked your password once (via sudo) to
-install pgest.so shared library in system-wide location so that PostgreSQL
+install C shared library in system-wide location so that PostgreSQL
 could access it.
 
 Next, you will have to create test index. You have two options:
 
-3.1. Create index using estcmd
+=head2 Create index using estcmd
 
 This will create temporary files on disk and index them using estcmd gather
 
@@ -43,15 +72,31 @@
   make index
   cd ..
 
-3.2. Create index using Hyper Estraier perl bindings
-
-For this, you will have to install perl bindings from
-
-  http://tokuhirom.dnsalias.org/~tokuhirom/archive/hyper_estraier_wrappers-0.0.6.tar.gz
-
-If you installed bindings as documented in README file, you can issue
-following commands to create index about three times faster than using
-estcmd:
+B this method is incomplete and won't create node index needed
+to run last examples in C correctly. Solution is simple: either
+symlink your newly created index to Hyper Estraier C<_node> directory or
+create node and fill re-create index using C.
+
+=head2 Create index using Hyper Estraier perl bindings
+
+Perl bindings for Hyper Estraier are available at
+
+L
+
+However, they don't support node API (yet), so you will have to use
+my modified version which is available at
+L in C repository.
+
+If you installed bindings as documented in README file, you can use
+perl binding to create index about three times faster than using C
+(to be fair, I must say that creation of intermediate files take most time,
+not indexing).
+
+However, you will first need to create node I using Hyper Estraier's
+administration interface at L. You will also
+need user C with password C because those values are
+hard-coded in C. If you want to use different user on index
+name, feel free to change script.
 
   cd data
   make perl
@@ -63,10 +108,69 @@
 See also included file test.sql for more examples of usage.
 
-4. Who wrote this?
+=head1 Usage of pgest from SQL
+
+C PostgreSQL function has two different prototypes (number of arguments) depending on usage.
+
+  SELECT
+    -- columns to return (defined later)
+    id,title,size
+  FROM pgest(
+    -- path to index OR URL to node, user-name and password
+    -- you will need JUST ONE of following two lines, depending
+    -- on your usage described below, for direct access
+    '/full/path/to/casket',
+    -- or for node API specify node URI, login, password
+    -- and depth of search
+    'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
+    -- query
+    'blade runner',
+    -- additional attributes, use NULL or '' to disable
+    -- multiple attributes conditions can be separated by {{!}}
+    '@title ISTRINC blade',
+    -- order results by
+    '@title STRA',
+    -- limit, use NULL or 0 to disable
+    null,
+    -- offset, use NULL or 0 to disable
+    null,
+    -- attributes to return as columns
+    ARRAY['@id','@title','@size']
+  ) AS (
+    -- specify names and types of returned attributes
+    id text, title text, size text
+  );
+
+=head2 Accessing database directly
+
+If you want to access database directly (without running C process), first argument is full path to database file.
+
+Have in mind that C user under which PostgreSQL is running must
+have read permission on Hyper Estraier database files.
+
+This will work a bit faster on really small indexes. However, when your
+index grows bigger, you might consider using node API to remove overhead of
+database opening on each query.
+
+B
+
+=head2 Using index via C server process
+
+If first argument is URL to node (like C)
+and there are two additional parameters (user-name and password) after it,
+C will use node API and access index through C process which should be running on (local or remote) machine.
+
+This will remove database opening overhead, at a cost of additional network
+traffic. However, you can have Hyper Estraier C process running on
+different machine or update index while doing searches, so benefits of this
+approach are obvious.
+
+=head1 Who wrote this?
 
 Hyper Estraier is written by Mikio Hirabayashi.
+Perl bindings for Hyper Estraier are written by MATSUNO Tokuhiro.
+
 PostgreSQL is written by hackers calling themselves PostgreSQL Global
 Development Group.
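
The annotated SELECT in the added "Usage of pgest from SQL" section shows both call forms in one statement, which cannot be run as written. A minimal sketch of the two forms as separate statements follows. The argument order is copied from that annotated example. The C<movies> table, the limit of 10 and the join on title are hypothetical and serve only to illustrate the join mentioned under "Why is it written?". The prose mentions two extra arguments for the node form while the example also passes a search depth, so the depth argument here is an assumption taken from the example.

```sql
-- Form 1: direct access, first argument is the path to the index.
SELECT id, title, size
FROM pgest(
    '/full/path/to/casket',            -- path to index
    'blade runner',                    -- query
    '@title ISTRINC blade',            -- attribute condition, NULL or '' to disable
    '@title STRA',                     -- order results by
    10,                                -- limit (hypothetical value)
    0,                                 -- offset, 0 disables it
    ARRAY['@id','@title','@size']      -- attributes returned as columns
) AS (id text, title text, size text);

-- Form 2: node API, joined with a HYPOTHETICAL table movies(title text, year int).
SELECT m.title, m.year, e.size
FROM pgest(
    'http://localhost:1978/node/trivia',  -- node URL
    'admin', 'admin',                     -- user-name and password
    42,                                   -- depth of search (as in the example above)
    'blade runner',                       -- query
    NULL,                                 -- no attribute condition
    '@title STRA',                        -- order results by
    NULL,                                 -- no limit
    NULL,                                 -- no offset
    ARRAY['@title','@size']               -- attributes returned as columns
) AS e (title text, size text)
JOIN movies m ON m.title = e.title;
```

Both statements need the column definition list after AS, since the names and types of the returned attributes are specified by the caller. The included test.sql remains the authoritative set of examples.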