/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Annotation of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log


Revision 87 - (hide annotations)
Fri Jul 13 10:12:54 2007 UTC (16 years, 9 months ago) by dpavlin
File size: 7725 byte(s)
added IMDB download link for test database (trivia)
1 dpavlin 64 =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2 dpavlin 23
3 dpavlin 64 This package is essentially composed of two different parts:
4 dpavlin 23
5 dpavlin 64 =over 4
6    
7     =item search function
8    
9     PostgreSQL function to search Hyper Estraier full-text index, using
10     full-text queries and attribute filtering to return user-specified
11     table of results.
12    
13     This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14     functionality much faster than using those SQL constructs on search
15     results.
16    
17     =item trigger function
18    
19     PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20     It triggers after insert, update or delete and update full-text index
21     accordingly.
22    
23     =back
24    
25     Both functions are written in C, while test framework and supporting
26     utilities are written in perl.
27    
28     You can use just one of those functions. If you want just to search existing
29     Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30     example), just use search function.
31    
32     On the other hand, if you want just to keep your Hyper Estraier index in
33     sync with PostgreSQL data, you can use just trigger function to achieve that.
34    
35 dpavlin 33 =head1 Why is it written?
36 dpavlin 23
37 dpavlin 64 Aside from providing single query language (SQL) to RDBMS and full text index
38 dpavlin 23 (using any language that has PostgreSQL client libraries), real power is
39     hidden in ability to join results from full text index and structured data
40     in RDBMS.
41    
42 dpavlin 59 For simple real-life example which address problem
43 dpavlin 64 C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44 dpavlin 75 is slow see L<Tutorial> and L<pgest-index> documentation.
45 dpavlin 59
46 dpavlin 33 =head1 How to install
47 dpavlin 23
48     Installation should be simple. However, you will have to have following
49 dpavlin 64 software already installed before you try this functions:
50 dpavlin 23
51 dpavlin 33 =over
52 dpavlin 23
53 dpavlin 33 =item *
54    
55 dpavlin 64 PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56 dpavlin 33
57     =item *
58    
59 dpavlin 64 Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60     with development headers
61 dpavlin 33
62 dpavlin 64 =item *
63    
64     working C compiler (tested with gcc)
65    
66 dpavlin 33 =back
67    
68 dpavlin 64 If you want to use helper script to create consistency triggers to keep
69     Hyper Estraier in sync with PostgreSQL database, you will also need:
70 dpavlin 23
71 dpavlin 33 =over
72 dpavlin 23
73 dpavlin 33 =item *
74    
75     working perl installation
76    
77     =item *
78    
79 dpavlin 64 perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80 dpavlin 33
81 dpavlin 64 =back
82    
83     To run tests you will also need:
84    
85     =over
86    
87 dpavlin 33 =item *
88    
89 dpavlin 64 perl module C<Test::More>
90    
91     =item *
92    
93 dpavlin 87 C<trivia.list.gz> from Internet Movie Database in C<data/> directory.
94     You can download it from L<http://www.imdb.com/interfaces>
95 dpavlin 33
96     =item *
97    
98 dpavlin 43 PostgreSQL database C<test> with permissions for current user
99 dpavlin 33
100 dpavlin 43 =item *
101    
102 dpavlin 59 Hyper Estraier C<estmaster> running with permissions for C<admin> user
103     to create C<trivia> node.
104 dpavlin 43
105 dpavlin 33 =back
106    
107 dpavlin 23 If you have all that, you should be able to type
108    
109     make
110    
111     and see sample results. You will be asked your password once (via sudo) to
112 dpavlin 43 install C<pgest.so> shared library in system-wide location so that PostgreSQL
113 dpavlin 23 could access it.
114    
115 dpavlin 54 =head2 Create sample index using Hyper Estraier perl bindings
116 dpavlin 29
117 dpavlin 54 Perl bindings for Hyper Estraier are available at CPAN:
118 dpavlin 29
119 dpavlin 54 L<http://search.cpan.org/~dpavlin/Search-Estraier/>
120 dpavlin 29
121    
122 dpavlin 54 After installing C<Search::Estraier> you can create index using following commands:
123 dpavlin 42
124 dpavlin 29 cd data
125 dpavlin 54 make index
126 dpavlin 29 cd ..
127    
128 dpavlin 23 To run tests (which require that you have estcmd in your $PATH) issue
129    
130     make test
131    
132 dpavlin 54 See also included file C<test.sql> for more examples of usage.
133 dpavlin 23
134 dpavlin 64 =head1 Usage of search function pgest from SQL
135 dpavlin 42
136 dpavlin 54 C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
137 dpavlin 42
138     SELECT
139     -- columns to return (defined later)
140     id,title,size
141     FROM pgest(
142 dpavlin 54 -- node URI, login, password and depth of search
143 dpavlin 49 'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
144 dpavlin 42 -- query
145     'blade runner',
146     -- additional attributes, use NULL or '' to disable
147     -- multiple attributes conditions can be separated by {{!}}
148     '@title ISTRINC blade',
149     -- order results by
150     '@title STRA',
151     -- limit, use NULL or 0 to disable
152     null,
153     -- offset, use NULL or 0 to disable
154     null,
155     -- attributes to return as columns
156     ARRAY['@id','@title','@size']
157     ) AS (
158     -- specify names and types of returned attributes
159     id text, title text, size text
160     );
161    
162 dpavlin 54 You should note that Hyper Estraier uses UTF-8 encoding, while your
163     PostgreSQL installation might use different encoding. To fix that, use
164     C<convert> function in PostgreSQL to convert encodings.
165    
166     =head2 Using index via C<estmaster> server process
167    
168     This is default and recommended way to use C<pgest> functionality. In this
169     case, C<pgest> will use node API and access index through C<estmaster>
170     process which should be running on (local or remote) machine.
171    
172     This will remove database opening overhead, at a cost of (small) additional network
173     traffic. However, you can have Hyper Estraier C<estmaster> process running on
174     different machine or update index while doing searches, so benefits of this
175     approach are obvious.
176    
177 dpavlin 42 =head2 Accessing database directly
178    
179 dpavlin 54 B<Please note that direct access to database is depreciated.> As such, it's
180     not stated in example, and it's kept just for backward compatibility, but it
181     will probably be removed in future versions of C<pgest>.
182 dpavlin 42
183 dpavlin 54 If you want to access database directly (without running C<estmaster> process), you
184     have to replace node URI, login, password and depth with full path to database file.
185    
186 dpavlin 42 Have in mind that C<postgres> user under which PostgreSQL is running must
187     have read permission on Hyper Estraier database files.
188    
189     This will work a bit faster on really small indexes. However, when your
190     index grows bigger, you might consider using node API to remove overhead of
191     database opening on each query.
192    
193 dpavlin 64 =head1 Usage of trigger function pgest_trigger from SQL
194    
195     Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
196     create triggers because it already supports following steps automatically:
197    
198     =over
199    
200     =item begin transaction
201    
202     Transaction is needed to catch updates which might happen while creation
203     of full-text index is in progress (and on huge collections this can take a while,
204     just like normal index creation in PostgreSQL).
205    
206     =item insert all existing data in full-text index
207    
208     This will be done directly from PostgreSQL database to Hyper Estraier index.
209     This is somewhat faster than waiting for trigger to fire for each existing
210     row.
211    
212     =item create insert, update and delete triggers
213    
214     Which will keep data in sync later
215    
216     =item commit transaction
217    
218     =back
219    
220     If you still want to do that manually, you will need to know format of
221     C<pgest_trigger> function:
222    
223     CREATE TRIGGER pgest_trigger_insert AFTER INSERT
224     ON table FOR EACH ROW
225     EXECUTE PROCEDURE pgest_trigger(
226     -- node URI, login and password
227     'http://localhost:1978/node/trivia', 'admin', 'admin',
228     -- name of primary key column
229     'id',
230     -- names of all other columns to index (one or more)
231     'column', 'another_one', 'and_another'
232     )
233    
234     You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
235    
236 dpavlin 33 =head1 Who wrote this?
237 dpavlin 23
238     Hyper Estraier is written by Mikio Hirabayashi.
239    
240     PostgreSQL is written by hackers calling themselves PostgreSQL Global
241     Development Group.
242    
243 dpavlin 75 This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
244    
245     =head1 See also
246    
247 dpavlin 81 =over
248    
249     =item *
250    
251 dpavlin 75 L<Tutorial> - how to create first full-text index in under 10 minutes!
252    
253 dpavlin 81 =item *
254    
255 dpavlin 75 L<ChangeLog> - what has changed since last version
256    
257 dpavlin 81 =item *
258    
259 dpavlin 75 L<pgest-index> - helper script to create index and triggers
260 dpavlin 81
261     =item *
262    
263     L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
264    
265     =item *
266    
267     L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
268     has a documentaton about query format. C<pgestraier> is using noraml queries (with
269     C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
270    
271     =back

  ViewVC Help
Powered by ViewVC 1.1.26