/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Contents of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log


Revision 87 - (show annotations)
Fri Jul 13 10:12:54 2007 UTC (16 years, 8 months ago) by dpavlin
File size: 7725 byte(s)
added IMDB download link for test database (trivia)
1 =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2
3 This package is essentially composed of two different parts:
4
5 =over 4
6
7 =item search function
8
9 PostgreSQL function to search Hyper Estraier full-text index, using
10 full-text queries and attribute filtering to return user-specified
11 table of results.
12
13 This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14 functionality much faster than using those SQL constructs on search
15 results.
16
17 =item trigger function
18
19 PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20 It triggers after insert, update or delete and update full-text index
21 accordingly.
22
23 =back
24
25 Both functions are written in C, while test framework and supporting
26 utilities are written in perl.
27
28 You can use just one of those functions. If you want just to search existing
29 Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30 example), just use search function.
31
32 On the other hand, if you want just to keep your Hyper Estraier index in
33 sync with PostgreSQL data, you can use just trigger function to achieve that.
34
35 =head1 Why is it written?
36
37 Aside from providing single query language (SQL) to RDBMS and full text index
38 (using any language that has PostgreSQL client libraries), real power is
39 hidden in ability to join results from full text index and structured data
40 in RDBMS.
41
42 For simple real-life example which address problem
43 C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44 is slow see L<Tutorial> and L<pgest-index> documentation.
45
46 =head1 How to install
47
48 Installation should be simple. However, you will have to have following
49 software already installed before you try this functions:
50
51 =over
52
53 =item *
54
55 PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56
57 =item *
58
59 Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60 with development headers
61
62 =item *
63
64 working C compiler (tested with gcc)
65
66 =back
67
68 If you want to use helper script to create consistency triggers to keep
69 Hyper Estraier in sync with PostgreSQL database, you will also need:
70
71 =over
72
73 =item *
74
75 working perl installation
76
77 =item *
78
79 perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80
81 =back
82
83 To run tests you will also need:
84
85 =over
86
87 =item *
88
89 perl module C<Test::More>
90
91 =item *
92
93 C<trivia.list.gz> from Internet Movie Database in C<data/> directory.
94 You can download it from L<http://www.imdb.com/interfaces>
95
96 =item *
97
98 PostgreSQL database C<test> with permissions for current user
99
100 =item *
101
102 Hyper Estraier C<estmaster> running with permissions for C<admin> user
103 to create C<trivia> node.
104
105 =back
106
107 If you have all that, you should be able to type
108
109 make
110
111 and see sample results. You will be asked your password once (via sudo) to
112 install C<pgest.so> shared library in system-wide location so that PostgreSQL
113 could access it.
114
115 =head2 Create sample index using Hyper Estraier perl bindings
116
117 Perl bindings for Hyper Estraier are available at CPAN:
118
119 L<http://search.cpan.org/~dpavlin/Search-Estraier/>
120
121
122 After installing C<Search::Estraier> you can create index using following commands:
123
124 cd data
125 make index
126 cd ..
127
128 To run tests (which require that you have estcmd in your $PATH) issue
129
130 make test
131
132 See also included file C<test.sql> for more examples of usage.
133
134 =head1 Usage of search function pgest from SQL
135
136 C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
137
138 SELECT
139 -- columns to return (defined later)
140 id,title,size
141 FROM pgest(
142 -- node URI, login, password and depth of search
143 'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
144 -- query
145 'blade runner',
146 -- additional attributes, use NULL or '' to disable
147 -- multiple attributes conditions can be separated by {{!}}
148 '@title ISTRINC blade',
149 -- order results by
150 '@title STRA',
151 -- limit, use NULL or 0 to disable
152 null,
153 -- offset, use NULL or 0 to disable
154 null,
155 -- attributes to return as columns
156 ARRAY['@id','@title','@size']
157 ) AS (
158 -- specify names and types of returned attributes
159 id text, title text, size text
160 );
161
162 You should note that Hyper Estraier uses UTF-8 encoding, while your
163 PostgreSQL installation might use different encoding. To fix that, use
164 C<convert> function in PostgreSQL to convert encodings.
165
166 =head2 Using index via C<estmaster> server process
167
168 This is default and recommended way to use C<pgest> functionality. In this
169 case, C<pgest> will use node API and access index through C<estmaster>
170 process which should be running on (local or remote) machine.
171
172 This will remove database opening overhead, at a cost of (small) additional network
173 traffic. However, you can have Hyper Estraier C<estmaster> process running on
174 different machine or update index while doing searches, so benefits of this
175 approach are obvious.
176
177 =head2 Accessing database directly
178
179 B<Please note that direct access to database is depreciated.> As such, it's
180 not stated in example, and it's kept just for backward compatibility, but it
181 will probably be removed in future versions of C<pgest>.
182
183 If you want to access database directly (without running C<estmaster> process), you
184 have to replace node URI, login, password and depth with full path to database file.
185
186 Have in mind that C<postgres> user under which PostgreSQL is running must
187 have read permission on Hyper Estraier database files.
188
189 This will work a bit faster on really small indexes. However, when your
190 index grows bigger, you might consider using node API to remove overhead of
191 database opening on each query.
192
193 =head1 Usage of trigger function pgest_trigger from SQL
194
195 Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
196 create triggers because it already supports following steps automatically:
197
198 =over
199
200 =item begin transaction
201
202 Transaction is needed to catch updates which might happen while creation
203 of full-text index is in progress (and on huge collections this can take a while,
204 just like normal index creation in PostgreSQL).
205
206 =item insert all existing data in full-text index
207
208 This will be done directly from PostgreSQL database to Hyper Estraier index.
209 This is somewhat faster than waiting for trigger to fire for each existing
210 row.
211
212 =item create insert, update and delete triggers
213
214 Which will keep data in sync later
215
216 =item commit transaction
217
218 =back
219
220 If you still want to do that manually, you will need to know format of
221 C<pgest_trigger> function:
222
223 CREATE TRIGGER pgest_trigger_insert AFTER INSERT
224 ON table FOR EACH ROW
225 EXECUTE PROCEDURE pgest_trigger(
226 -- node URI, login and password
227 'http://localhost:1978/node/trivia', 'admin', 'admin',
228 -- name of primary key column
229 'id',
230 -- names of all other columns to index (one or more)
231 'column', 'another_one', 'and_another'
232 )
233
234 You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
235
236 =head1 Who wrote this?
237
238 Hyper Estraier is written by Mikio Hirabayashi.
239
240 PostgreSQL is written by hackers calling themselves PostgreSQL Global
241 Development Group.
242
243 This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
244
245 =head1 See also
246
247 =over
248
249 =item *
250
251 L<Tutorial> - how to create first full-text index in under 10 minutes!
252
253 =item *
254
255 L<ChangeLog> - what has changed since last version
256
257 =item *
258
259 L<pgest-index> - helper script to create index and triggers
260
261 =item *
262
263 L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
264
265 =item *
266
267 L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
268 has a documentaton about query format. C<pgestraier> is using noraml queries (with
269 C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
270
271 =back

  ViewVC Help
Powered by ViewVC 1.1.26