/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Contents of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log


Revision 81 - (show annotations)
Wed Aug 9 17:25:31 2006 UTC (17 years, 8 months ago) by dpavlin
File size: 7665 byte(s)
finished see also, with note about queryies
1 =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2
3 This package is essentially composed of two different parts:
4
5 =over 4
6
7 =item search function
8
9 PostgreSQL function to search Hyper Estraier full-text index, using
10 full-text queries and attribute filtering to return user-specified
11 table of results.
12
13 This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14 functionality much faster than using those SQL constructs on search
15 results.
16
17 =item trigger function
18
19 PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20 It triggers after insert, update or delete and update full-text index
21 accordingly.
22
23 =back
24
25 Both functions are written in C, while test framework and supporting
26 utilities are written in perl.
27
28 You can use just one of those functions. If you want just to search existing
29 Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30 example), just use search function.
31
32 On the other hand, if you want just to keep your Hyper Estraier index in
33 sync with PostgreSQL data, you can use just trigger function to achieve that.
34
35 =head1 Why is it written?
36
37 Aside from providing single query language (SQL) to RDBMS and full text index
38 (using any language that has PostgreSQL client libraries), real power is
39 hidden in ability to join results from full text index and structured data
40 in RDBMS.
41
42 For simple real-life example which address problem
43 C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44 is slow see L<Tutorial> and L<pgest-index> documentation.
45
46 =head1 How to install
47
48 Installation should be simple. However, you will have to have following
49 software already installed before you try this functions:
50
51 =over
52
53 =item *
54
55 PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56
57 =item *
58
59 Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60 with development headers
61
62 =item *
63
64 working C compiler (tested with gcc)
65
66 =back
67
68 If you want to use helper script to create consistency triggers to keep
69 Hyper Estraier in sync with PostgreSQL database, you will also need:
70
71 =over
72
73 =item *
74
75 working perl installation
76
77 =item *
78
79 perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80
81 =back
82
83 To run tests you will also need:
84
85 =over
86
87 =item *
88
89 perl module C<Test::More>
90
91 =item *
92
93 C<trivia.list.gz> from Internet Movie Database in C<data/> directory
94
95 =item *
96
97 PostgreSQL database C<test> with permissions for current user
98
99 =item *
100
101 Hyper Estraier C<estmaster> running with permissions for C<admin> user
102 to create C<trivia> node.
103
104 =back
105
106 If you have all that, you should be able to type
107
108 make
109
110 and see sample results. You will be asked your password once (via sudo) to
111 install C<pgest.so> shared library in system-wide location so that PostgreSQL
112 could access it.
113
114 =head2 Create sample index using Hyper Estraier perl bindings
115
116 Perl bindings for Hyper Estraier are available at CPAN:
117
118 L<http://search.cpan.org/~dpavlin/Search-Estraier/>
119
120
121 After installing C<Search::Estraier> you can create index using following commands:
122
123 cd data
124 make index
125 cd ..
126
127 To run tests (which require that you have estcmd in your $PATH) issue
128
129 make test
130
131 See also included file C<test.sql> for more examples of usage.
132
133 =head1 Usage of search function pgest from SQL
134
135 C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
136
137 SELECT
138 -- columns to return (defined later)
139 id,title,size
140 FROM pgest(
141 -- node URI, login, password and depth of search
142 'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
143 -- query
144 'blade runner',
145 -- additional attributes, use NULL or '' to disable
146 -- multiple attributes conditions can be separated by {{!}}
147 '@title ISTRINC blade',
148 -- order results by
149 '@title STRA',
150 -- limit, use NULL or 0 to disable
151 null,
152 -- offset, use NULL or 0 to disable
153 null,
154 -- attributes to return as columns
155 ARRAY['@id','@title','@size']
156 ) AS (
157 -- specify names and types of returned attributes
158 id text, title text, size text
159 );
160
161 You should note that Hyper Estraier uses UTF-8 encoding, while your
162 PostgreSQL installation might use different encoding. To fix that, use
163 C<convert> function in PostgreSQL to convert encodings.
164
165 =head2 Using index via C<estmaster> server process
166
167 This is default and recommended way to use C<pgest> functionality. In this
168 case, C<pgest> will use node API and access index through C<estmaster>
169 process which should be running on (local or remote) machine.
170
171 This will remove database opening overhead, at a cost of (small) additional network
172 traffic. However, you can have Hyper Estraier C<estmaster> process running on
173 different machine or update index while doing searches, so benefits of this
174 approach are obvious.
175
176 =head2 Accessing database directly
177
178 B<Please note that direct access to database is depreciated.> As such, it's
179 not stated in example, and it's kept just for backward compatibility, but it
180 will probably be removed in future versions of C<pgest>.
181
182 If you want to access database directly (without running C<estmaster> process), you
183 have to replace node URI, login, password and depth with full path to database file.
184
185 Have in mind that C<postgres> user under which PostgreSQL is running must
186 have read permission on Hyper Estraier database files.
187
188 This will work a bit faster on really small indexes. However, when your
189 index grows bigger, you might consider using node API to remove overhead of
190 database opening on each query.
191
192 =head1 Usage of trigger function pgest_trigger from SQL
193
194 Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
195 create triggers because it already supports following steps automatically:
196
197 =over
198
199 =item begin transaction
200
201 Transaction is needed to catch updates which might happen while creation
202 of full-text index is in progress (and on huge collections this can take a while,
203 just like normal index creation in PostgreSQL).
204
205 =item insert all existing data in full-text index
206
207 This will be done directly from PostgreSQL database to Hyper Estraier index.
208 This is somewhat faster than waiting for trigger to fire for each existing
209 row.
210
211 =item create insert, update and delete triggers
212
213 Which will keep data in sync later
214
215 =item commit transaction
216
217 =back
218
219 If you still want to do that manually, you will need to know format of
220 C<pgest_trigger> function:
221
222 CREATE TRIGGER pgest_trigger_insert AFTER INSERT
223 ON table FOR EACH ROW
224 EXECUTE PROCEDURE pgest_trigger(
225 -- node URI, login and password
226 'http://localhost:1978/node/trivia', 'admin', 'admin',
227 -- name of primary key column
228 'id',
229 -- names of all other columns to index (one or more)
230 'column', 'another_one', 'and_another'
231 )
232
233 You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
234
235 =head1 Who wrote this?
236
237 Hyper Estraier is written by Mikio Hirabayashi.
238
239 PostgreSQL is written by hackers calling themselves PostgreSQL Global
240 Development Group.
241
242 This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
243
244 =head1 See also
245
246 =over
247
248 =item *
249
250 L<Tutorial> - how to create first full-text index in under 10 minutes!
251
252 =item *
253
254 L<ChangeLog> - what has changed since last version
255
256 =item *
257
258 L<pgest-index> - helper script to create index and triggers
259
260 =item *
261
262 L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
263
264 =item *
265
266 L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
267 has a documentaton about query format. C<pgestraier> is using noraml queries (with
268 C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
269
270 =back

  ViewVC Help
Powered by ViewVC 1.1.26