/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Diff of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 51 by dpavlin, Tue May 9 22:55:42 2006 UTC revision 87 by dpavlin, Fri Jul 13 10:12:54 2007 UTC
# Line 1  Line 1 
1  =head1 pgestraier - search Hyper Estraier indexes from PostgreSQL  =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2    
3  This package is essentially PostgreSQL C function which calls Hyper Estraier  This package is essentially composed of two different parts:
4  API and returns results in user defined format.  
5    =over 4
6    
7    =item search function
8    
9    PostgreSQL function to search Hyper Estraier full-text index, using
10    full-text queries and attribute filtering to return user-specified
11    table of results.
12    
13    This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14    functionality much faster than using those SQL constructs on search
15    results.
16    
17    =item trigger function
18    
19    PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20    It triggers after insert, update or delete and update full-text index
21    accordingly.
22    
23    =back
24    
25    Both functions are written in C, while test framework and supporting
26    utilities are written in perl.
27    
28    You can use just one of those functions. If you want just to search existing
29    Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30    example), just use search function.
31    
32    On the other hand, if you want just to keep your Hyper Estraier index in
33    sync with PostgreSQL data, you can use just trigger function to achieve that.
34    
35  =head1 Why is it written?  =head1 Why is it written?
36    
37  Aside from providing single API to query your RDBMS and full text index  Aside from providing single query language (SQL) to RDBMS and full text index
38  (using any language that has PostgreSQL client libraries), real power is  (using any language that has PostgreSQL client libraries), real power is
39  hidden in ability to join results from full text index and structured data  hidden in ability to join results from full text index and structured data
40  in RDBMS.  in RDBMS.
41    
42    For simple real-life example which address problem
43    C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44    is slow see L<Tutorial> and L<pgest-index> documentation.
45    
46  =head1 How to install  =head1 How to install
47    
48  Installation should be simple. However, you will have to have following  Installation should be simple. However, you will have to have following
49  software already installed before you try this function:  software already installed before you try this functions:
50    
51  =over  =over
52    
53  =item *  =item *
54    
55  PostgreSQL (tested with versions 7.4 and 8.0) with development libraries  PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56    
57    =item *
58    
59    Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60    with development headers
61    
62  =item *  =item *
63    
64  Hyper Estraier (tested with 0.5.0-1.0.0+, version newer than 0.9.6 are  working C compiler (tested with gcc)
 recommended)  
65    
66  =back  =back
67    
68  To run tests you will also need:  If you want to use helper script to create consistency triggers to keep
69    Hyper Estraier in sync with PostgreSQL database, you will also need:
70    
71  =over  =over
72    
# Line 38  working perl installation Line 76  working perl installation
76    
77  =item *  =item *
78    
79  perl modules C<DBI>, C<DBD::Pg>, C<Test::More> and optionally C<HyperEstraier>  perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80    
81    =back
82    
83    To run tests you will also need:
84    
85    =over
86    
87    =item *
88    
89    perl module C<Test::More>
90    
91  =item *  =item *
92    
93  C<trivia.list.gz> from Internet Movie Database in C<data/> directory  C<trivia.list.gz> from Internet Movie Database in C<data/> directory.
94    You can download it from L<http://www.imdb.com/interfaces>
95    
96  =item *  =item *
97    
# Line 50  PostgreSQL database C<test> with permiss Line 99  PostgreSQL database C<test> with permiss
99    
100  =item *  =item *
101    
102  Hyper Estraier node C<trivia> with permissions for C<admin> user.  Hyper Estraier C<estmaster> running with permissions for C<admin> user
103    to create C<trivia> node.
104    
105  =back  =back
106    
# Line 62  and see sample results. You will be aske Line 112  and see sample results. You will be aske
112  install C<pgest.so> shared library in system-wide location so that PostgreSQL  install C<pgest.so> shared library in system-wide location so that PostgreSQL
113  could access it.  could access it.
114    
115  Next, you will have to create test index. You have two options:  =head2 Create sample index using Hyper Estraier perl bindings
116    
117  =head2 Create index using estcmd  Perl bindings for Hyper Estraier are available at CPAN:
118    
119  This will create temporary files on disk and index them using estcmd gather  L<http://search.cpan.org/~dpavlin/Search-Estraier/>
120    
   cd data  
   make index  
   cd ..  
121    
122  B<Warning:> this method is incomplete and won't create node index needed  After installing C<Search::Estraier> you can create index using following commands:
 to run last examples in C<test.sql> correctly. Solution is simple: either  
 symlink your newly created index to Hyper Estraier C<_node> directory or  
 create node and fill re-create index using C<estcall>.  
   
 =head2 Create index using Hyper Estraier perl bindings  
   
 Perl bindings for Hyper Estraier are available at  
   
 L<http://hyperestraier.sourceforge.net/binding/>  
   
 However, they don't support node API (yet), so you will have to use  
 my modified version which is available at  
 L<http://svn.rot13.org/> in C<hyperestraier_wrappers> repository.  
   
 If you installed bindings as documented in README file, you can use  
 perl binding to create index about three times faster than using C<estcmd>  
 (to be fair, I must say that creation of intermediate files take most time,  
 not indexing).  
   
 However, you will first need to create node I<trivia> using Hyper Estraier's  
 administration interface at L<http://localhost:1978/masterui>. You will also  
 need user C<admin> with password C<admin> because those values are  
 hard-coded in C<indexer.pl>. If you want to use different user on index  
 name, feel free to change script.  
123    
124    cd data    cd data
125    make perl    make index
126    cd ..    cd ..
127    
128  To run tests (which require that you have estcmd in your $PATH) issue  To run tests (which require that you have estcmd in your $PATH) issue
129    
130    make test    make test
131    
132  See also included file test.sql for more examples of usage.  See also included file C<test.sql> for more examples of usage.
133    
134  =head1 Usage of pgest from SQL  =head1 Usage of search function pgest from SQL
135    
136  C<pgest> PostgreSQL function has two different prototypes (number of arguments) depending on usage.  C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
137    
138          SELECT          SELECT
139                  -- columns to return (defined later)                  -- columns to return (defined later)
140                  id,title,size                  id,title,size
141          FROM pgest(          FROM pgest(
142                  -- path to index OR URL to node, user-name and password                  -- node URI, login, password and depth of search
                 -- you will need JUST ONE of following two lines, depending  
                 -- on your usage described below, for direct access  
                 '/full/path/to/casket',  
                 -- or for node API specify node URI, login, password  
                 -- and depth of search  
143                  'http://localhost:1978/node/trivia', 'admin', 'admin', 42,                  'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
144                  -- query                  -- query
145                  'blade runner',                  'blade runner',
# Line 141  C<pgest> PostgreSQL function has two dif Line 159  C<pgest> PostgreSQL function has two dif
159                  id text, title text, size text                  id text, title text, size text
160          );          );
161    
162    You should note that Hyper Estraier uses UTF-8 encoding, while your
163    PostgreSQL installation might use different encoding. To fix that, use
164    C<convert> function in PostgreSQL to convert encodings.
165    
166    =head2 Using index via C<estmaster> server process
167    
168    This is default and recommended way to use C<pgest> functionality. In this
169    case, C<pgest> will use node API and access index through C<estmaster>
170    process which should be running on (local or remote) machine.
171    
172    This will remove database opening overhead, at a cost of (small) additional network
173    traffic. However, you can have Hyper Estraier C<estmaster> process running on
174    different machine or update index while doing searches, so benefits of this
175    approach are obvious.
176    
177  =head2 Accessing database directly  =head2 Accessing database directly
178    
179  If you want to access database directly (without running C<estmaster> process), first argument is full path to database file.  B<Please note that direct access to database is depreciated.> As such, it's
180    not stated in example, and it's kept just for backward compatibility, but it
181    will probably be removed in future versions of C<pgest>.
182    
183    If you want to access database directly (without running C<estmaster> process), you
184    have to replace node URI, login, password and depth with full path to database file.
185    
186  Have in mind that C<postgres> user under which PostgreSQL is running must  Have in mind that C<postgres> user under which PostgreSQL is running must
187  have read permission on Hyper Estraier database files.  have read permission on Hyper Estraier database files.
# Line 152  This will work a bit faster on really sm Line 190  This will work a bit faster on really sm
190  index grows bigger, you might consider using node API to remove overhead of  index grows bigger, you might consider using node API to remove overhead of
191  database opening on each query.  database opening on each query.
192    
193  B<Please note that direct access to database is depriciated.>  =head1 Usage of trigger function pgest_trigger from SQL
194    
195  =head2 Using index via C<estmaster> server process  Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
196    create triggers because it already supports following steps automatically:
197    
198  If first argument is URL to node (like C<http://localhost:1978/node/trivia>)  =over
 and there are two additional parameters (user-name and password) after it,  
 C<pgest> will use node API and access index through C<estmaster> process which should be running on (local or remote) machine.  
199    
200  This will remove database opening overhead, at a cost of additional network  =item begin transaction
201  traffic. However, you can have Hyper Estraier C<estmaster> process running on  
202  different machine or update index while doing searches, so benefits of this  Transaction is needed to catch updates which might happen while creation
203  approach are obvious.  of full-text index is in progress (and on huge collections this can take a while,
204    just like normal index creation in PostgreSQL).
205    
206    =item insert all existing data in full-text index
207    
208    This will be done directly from PostgreSQL database to Hyper Estraier index.
209    This is somewhat faster than waiting for trigger to fire for each existing
210    row.
211    
212    =item create insert, update and delete triggers
213    
214    Which will keep data in sync later
215    
216    =item commit transaction
217    
218    =back
219    
220    If you still want to do that manually, you will need to know format of
221    C<pgest_trigger> function:
222    
223            CREATE TRIGGER pgest_trigger_insert AFTER INSERT
224                    ON table FOR EACH ROW
225                    EXECUTE PROCEDURE pgest_trigger(
226                            -- node URI, login and password
227                            'http://localhost:1978/node/trivia', 'admin', 'admin',
228                            -- name of primary key column
229                            'id',
230                            -- names of all other columns to index (one or more)
231                            'column', 'another_one', 'and_another'
232                    )
233    
234    You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
235    
236  =head1 Who wrote this?  =head1 Who wrote this?
237    
238  Hyper Estraier is written by Mikio Hirabayashi.  Hyper Estraier is written by Mikio Hirabayashi.
239    
 Perl bindings for Hyper Estraier are written by MATSUNO Tokuhiro.  
   
240  PostgreSQL is written by hackers calling themselves PostgreSQL Global  PostgreSQL is written by hackers calling themselves PostgreSQL Global
241  Development Group.  Development Group.
242    
243  This small C function is written by Dobrica Pavlinusic, dpavlin@rot13.org.  This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
244    
245    =head1 See also
246    
247    =over
248    
249    =item *
250    
251    L<Tutorial> - how to create first full-text index in under 10 minutes!
252    
253    =item *
254    
255    L<ChangeLog> - what has changed since last version
256    
257    =item *
258    
259    L<pgest-index> - helper script to create index and triggers
260    
261    =item *
262    
263    L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
264    
265    =item *
266    
267    L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
268    has a documentaton about query format. C<pgestraier> is using noraml queries (with
269    C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
270    
271    =back

Legend:
Removed from v.51  
changed lines
  Added in v.87

  ViewVC Help
Powered by ViewVC 1.1.26