/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Diff of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

trunk/README revision 23 by dpavlin, Thu May 26 20:22:44 2005 UTC trunk/README.pod revision 81 by dpavlin, Wed Aug 9 17:25:31 2006 UTC
# Line 1  Line 1 
1  1. pgestraier - search Hyper Estraier indexes from PostgreSQL  =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2    
3  This package is essentially PostgreSQL C function which calls Hyper Estraier  This package is essentially composed of two different parts:
 API and returns results in user defined format.  
4    
5  2. Why is it written?  =over 4
6    
7  Aside from providing single API to query your RDBMS and full text index  =item search function
8    
9    PostgreSQL function to search Hyper Estraier full-text index, using
10    full-text queries and attribute filtering to return user-specified
11    table of results.
12    
13    This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14    functionality much faster than using those SQL constructs on search
15    results.
16    
17    =item trigger function
18    
19    PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20    It triggers after insert, update or delete and update full-text index
21    accordingly.
22    
23    =back
24    
25    Both functions are written in C, while test framework and supporting
26    utilities are written in perl.
27    
28    You can use just one of those functions. If you want just to search existing
29    Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30    example), just use search function.
31    
32    On the other hand, if you want just to keep your Hyper Estraier index in
33    sync with PostgreSQL data, you can use just trigger function to achieve that.
34    
35    =head1 Why is it written?
36    
37    Aside from providing single query language (SQL) to RDBMS and full text index
38  (using any language that has PostgreSQL client libraries), real power is  (using any language that has PostgreSQL client libraries), real power is
39  hidden in ability to join results from full text index and structured data  hidden in ability to join results from full text index and structured data
40  in RDBMS.  in RDBMS.
41    
42  3. How to install  For simple real-life example which address problem
43    C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44    is slow see L<Tutorial> and L<pgest-index> documentation.
45    
46    =head1 How to install
47    
48  Installation should be simple. However, you will have to have following  Installation should be simple. However, you will have to have following
49  software already installed before you try this function:  software already installed before you try this functions:
50    
51    =over
52    
53    =item *
54    
55    PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56    
57    =item *
58    
59    Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60    with development headers
61    
62    =item *
63    
64    working C compiler (tested with gcc)
65    
66    =back
67    
68    If you want to use helper script to create consistency triggers to keep
69    Hyper Estraier in sync with PostgreSQL database, you will also need:
70    
71   * PostgreSQL (tested with version 7.4.8) with development libraries  =over
72   * Hyper Estraier (tested with versions 0.3.9 and 0.3.10)  
73    =item *
74    
75    working perl installation
76    
77    =item *
78    
79    perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80    
81    =back
82    
83  To run tests you will also need:  To run tests you will also need:
84    
85   * working perl installation  =over
86   * perl modules DBI, DBD::Pg, Test::More  
87   * trivia.list.gz from Internet Movie Database in data/ directory  =item *
88   * database "test" with permissions for current user  
89    perl module C<Test::More>
90    
91    =item *
92    
93    C<trivia.list.gz> from Internet Movie Database in C<data/> directory
94    
95    =item *
96    
97    PostgreSQL database C<test> with permissions for current user
98    
99    =item *
100    
101    Hyper Estraier C<estmaster> running with permissions for C<admin> user
102    to create C<trivia> node.
103    
104    =back
105    
106  If you have all that, you should be able to type  If you have all that, you should be able to type
107    
108    make    make
109    
110  and see sample results. You will be asked your password once (via sudo) to  and see sample results. You will be asked your password once (via sudo) to
111  install pgest.so shared library in system-wide location so that PostgreSQL  install C<pgest.so> shared library in system-wide location so that PostgreSQL
112  could access it.  could access it.
113    
114    =head2 Create sample index using Hyper Estraier perl bindings
115    
116    Perl bindings for Hyper Estraier are available at CPAN:
117    
118    L<http://search.cpan.org/~dpavlin/Search-Estraier/>
119    
120    
121    After installing C<Search::Estraier> you can create index using following commands:
122    
123      cd data
124      make index
125      cd ..
126    
127  To run tests (which require that you have estcmd in your $PATH) issue  To run tests (which require that you have estcmd in your $PATH) issue
128    
129    make test    make test
130    
131  See also included file test.sql for more examples of usage.  See also included file C<test.sql> for more examples of usage.
132    
133  4. Who wrote this?  =head1 Usage of search function pgest from SQL
134    
135    C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
136    
137            SELECT
138                    -- columns to return (defined later)
139                    id,title,size
140            FROM pgest(
141                    -- node URI, login, password and depth of search
142                    'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
143                    -- query
144                    'blade runner',
145                    -- additional attributes, use NULL or '' to disable
146                    -- multiple attributes conditions can be separated by {{!}}
147                    '@title ISTRINC blade',
148                    -- order results by
149                    '@title STRA',
150                    -- limit, use NULL or 0 to disable
151                    null,
152                    -- offset, use NULL or 0 to disable
153                    null,
154                    -- attributes to return as columns
155                    ARRAY['@id','@title','@size']
156            ) AS (
157                    -- specify names and types of returned attributes
158                    id text, title text, size text
159            );
160    
161    You should note that Hyper Estraier uses UTF-8 encoding, while your
162    PostgreSQL installation might use different encoding. To fix that, use
163    C<convert> function in PostgreSQL to convert encodings.
164    
165    =head2 Using index via C<estmaster> server process
166    
167    This is default and recommended way to use C<pgest> functionality. In this
168    case, C<pgest> will use node API and access index through C<estmaster>
169    process which should be running on (local or remote) machine.
170    
171    This will remove database opening overhead, at a cost of (small) additional network
172    traffic. However, you can have Hyper Estraier C<estmaster> process running on
173    different machine or update index while doing searches, so benefits of this
174    approach are obvious.
175    
176    =head2 Accessing database directly
177    
178    B<Please note that direct access to database is depreciated.> As such, it's
179    not stated in example, and it's kept just for backward compatibility, but it
180    will probably be removed in future versions of C<pgest>.
181    
182    If you want to access database directly (without running C<estmaster> process), you
183    have to replace node URI, login, password and depth with full path to database file.
184    
185    Have in mind that C<postgres> user under which PostgreSQL is running must
186    have read permission on Hyper Estraier database files.
187    
188    This will work a bit faster on really small indexes. However, when your
189    index grows bigger, you might consider using node API to remove overhead of
190    database opening on each query.
191    
192    =head1 Usage of trigger function pgest_trigger from SQL
193    
194    Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
195    create triggers because it already supports following steps automatically:
196    
197    =over
198    
199    =item begin transaction
200    
201    Transaction is needed to catch updates which might happen while creation
202    of full-text index is in progress (and on huge collections this can take a while,
203    just like normal index creation in PostgreSQL).
204    
205    =item insert all existing data in full-text index
206    
207    This will be done directly from PostgreSQL database to Hyper Estraier index.
208    This is somewhat faster than waiting for trigger to fire for each existing
209    row.
210    
211    =item create insert, update and delete triggers
212    
213    Which will keep data in sync later
214    
215    =item commit transaction
216    
217    =back
218    
219    If you still want to do that manually, you will need to know format of
220    C<pgest_trigger> function:
221    
222            CREATE TRIGGER pgest_trigger_insert AFTER INSERT
223                    ON table FOR EACH ROW
224                    EXECUTE PROCEDURE pgest_trigger(
225                            -- node URI, login and password
226                            'http://localhost:1978/node/trivia', 'admin', 'admin',
227                            -- name of primary key column
228                            'id',
229                            -- names of all other columns to index (one or more)
230                            'column', 'another_one', 'and_another'
231                    )
232    
233    You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
234    
235    =head1 Who wrote this?
236    
237  Hyper Estraier is written by Mikio Hirabayashi.  Hyper Estraier is written by Mikio Hirabayashi.
238    
239  PostgreSQL is written by hackers calling themselves PostgreSQL Global  PostgreSQL is written by hackers calling themselves PostgreSQL Global
240  Development Group.  Development Group.
241    
242  This small C function is written by Dobrica Pavlinusic, dpavlin@rot13.org.  This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
243    
244    =head1 See also
245    
246    =over
247    
248    =item *
249    
250    L<Tutorial> - how to create first full-text index in under 10 minutes!
251    
252    =item *
253    
254    L<ChangeLog> - what has changed since last version
255    
256    =item *
257    
258    L<pgest-index> - helper script to create index and triggers
259    
260    =item *
261    
262    L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
263    
264    =item *
265    
266    L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
267    has a documentaton about query format. C<pgestraier> is using noraml queries (with
268    C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
269    
270    =back

Legend:
Removed from v.23  
changed lines
  Added in v.81

  ViewVC Help
Powered by ViewVC 1.1.26