/[pgestraier]/trunk/README.pod
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Annotation of /trunk/README.pod

Parent Directory Parent Directory | Revision Log Revision Log


Revision 64 - (hide annotations)
Mon Aug 7 14:32:21 2006 UTC (17 years, 8 months ago) by dpavlin
File size: 7044 byte(s)
improved documentation, now this project is full-text index for PostgreSQL and not just
search function :-)
1 dpavlin 64 =head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
2 dpavlin 23
3 dpavlin 64 This package is essentially composed of two different parts:
4 dpavlin 23
5 dpavlin 64 =over 4
6    
7     =item search function
8    
9     PostgreSQL function to search Hyper Estraier full-text index, using
10     full-text queries and attribute filtering to return user-specified
11     table of results.
12    
13     This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
14     functionality much faster than using those SQL constructs on search
15     results.
16    
17     =item trigger function
18    
19     PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
20     It triggers after insert, update or delete and update full-text index
21     accordingly.
22    
23     =back
24    
25     Both functions are written in C, while test framework and supporting
26     utilities are written in perl.
27    
28     You can use just one of those functions. If you want just to search existing
29     Hyper Estraier index or generate it off-line (after nightly batch jobs, for
30     example), just use search function.
31    
32     On the other hand, if you want just to keep your Hyper Estraier index in
33     sync with PostgreSQL data, you can use just trigger function to achieve that.
34    
35 dpavlin 33 =head1 Why is it written?
36 dpavlin 23
37 dpavlin 64 Aside from providing single query language (SQL) to RDBMS and full text index
38 dpavlin 23 (using any language that has PostgreSQL client libraries), real power is
39     hidden in ability to join results from full text index and structured data
40     in RDBMS.
41    
42 dpavlin 59 For simple real-life example which address problem
43 dpavlin 64 C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
44     is slow see L<Tutorial>.
45 dpavlin 59
46 dpavlin 33 =head1 How to install
47 dpavlin 23
48     Installation should be simple. However, you will have to have following
49 dpavlin 64 software already installed before you try this functions:
50 dpavlin 23
51 dpavlin 33 =over
52 dpavlin 23
53 dpavlin 33 =item *
54    
55 dpavlin 64 PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
56 dpavlin 33
57     =item *
58    
59 dpavlin 64 Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
60     with development headers
61 dpavlin 33
62 dpavlin 64 =item *
63    
64     working C compiler (tested with gcc)
65    
66 dpavlin 33 =back
67    
68 dpavlin 64 If you want to use helper script to create consistency triggers to keep
69     Hyper Estraier in sync with PostgreSQL database, you will also need:
70 dpavlin 23
71 dpavlin 33 =over
72 dpavlin 23
73 dpavlin 33 =item *
74    
75     working perl installation
76    
77     =item *
78    
79 dpavlin 64 perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
80 dpavlin 33
81 dpavlin 64 =back
82    
83     To run tests you will also need:
84    
85     =over
86    
87 dpavlin 33 =item *
88    
89 dpavlin 64 perl module C<Test::More>
90    
91     =item *
92    
93 dpavlin 49 C<trivia.list.gz> from Internet Movie Database in C<data/> directory
94 dpavlin 33
95     =item *
96    
97 dpavlin 43 PostgreSQL database C<test> with permissions for current user
98 dpavlin 33
99 dpavlin 43 =item *
100    
101 dpavlin 59 Hyper Estraier C<estmaster> running with permissions for C<admin> user
102     to create C<trivia> node.
103 dpavlin 43
104 dpavlin 33 =back
105    
106 dpavlin 23 If you have all that, you should be able to type
107    
108     make
109    
110     and see sample results. You will be asked your password once (via sudo) to
111 dpavlin 43 install C<pgest.so> shared library in system-wide location so that PostgreSQL
112 dpavlin 23 could access it.
113    
114 dpavlin 54 =head2 Create sample index using Hyper Estraier perl bindings
115 dpavlin 29
116 dpavlin 54 Perl bindings for Hyper Estraier are available at CPAN:
117 dpavlin 29
118 dpavlin 54 L<http://search.cpan.org/~dpavlin/Search-Estraier/>
119 dpavlin 29
120    
121 dpavlin 54 After installing C<Search::Estraier> you can create index using following commands:
122 dpavlin 42
123 dpavlin 29 cd data
124 dpavlin 54 make index
125 dpavlin 29 cd ..
126    
127 dpavlin 23 To run tests (which require that you have estcmd in your $PATH) issue
128    
129     make test
130    
131 dpavlin 54 See also included file C<test.sql> for more examples of usage.
132 dpavlin 23
133 dpavlin 64 =head1 Usage of search function pgest from SQL
134 dpavlin 42
135 dpavlin 54 C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
136 dpavlin 42
137     SELECT
138     -- columns to return (defined later)
139     id,title,size
140     FROM pgest(
141 dpavlin 54 -- node URI, login, password and depth of search
142 dpavlin 49 'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
143 dpavlin 42 -- query
144     'blade runner',
145     -- additional attributes, use NULL or '' to disable
146     -- multiple attributes conditions can be separated by {{!}}
147     '@title ISTRINC blade',
148     -- order results by
149     '@title STRA',
150     -- limit, use NULL or 0 to disable
151     null,
152     -- offset, use NULL or 0 to disable
153     null,
154     -- attributes to return as columns
155     ARRAY['@id','@title','@size']
156     ) AS (
157     -- specify names and types of returned attributes
158     id text, title text, size text
159     );
160    
161 dpavlin 54 You should note that Hyper Estraier uses UTF-8 encoding, while your
162     PostgreSQL installation might use different encoding. To fix that, use
163     C<convert> function in PostgreSQL to convert encodings.
164    
165     =head2 Using index via C<estmaster> server process
166    
167     This is default and recommended way to use C<pgest> functionality. In this
168     case, C<pgest> will use node API and access index through C<estmaster>
169     process which should be running on (local or remote) machine.
170    
171     This will remove database opening overhead, at a cost of (small) additional network
172     traffic. However, you can have Hyper Estraier C<estmaster> process running on
173     different machine or update index while doing searches, so benefits of this
174     approach are obvious.
175    
176 dpavlin 42 =head2 Accessing database directly
177    
178 dpavlin 54 B<Please note that direct access to database is depreciated.> As such, it's
179     not stated in example, and it's kept just for backward compatibility, but it
180     will probably be removed in future versions of C<pgest>.
181 dpavlin 42
182 dpavlin 54 If you want to access database directly (without running C<estmaster> process), you
183     have to replace node URI, login, password and depth with full path to database file.
184    
185 dpavlin 42 Have in mind that C<postgres> user under which PostgreSQL is running must
186     have read permission on Hyper Estraier database files.
187    
188     This will work a bit faster on really small indexes. However, when your
189     index grows bigger, you might consider using node API to remove overhead of
190     database opening on each query.
191    
192 dpavlin 64 =head1 Usage of trigger function pgest_trigger from SQL
193    
194     Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
195     create triggers because it already supports following steps automatically:
196    
197     =over
198    
199     =item begin transaction
200    
201     Transaction is needed to catch updates which might happen while creation
202     of full-text index is in progress (and on huge collections this can take a while,
203     just like normal index creation in PostgreSQL).
204    
205     =item insert all existing data in full-text index
206    
207     This will be done directly from PostgreSQL database to Hyper Estraier index.
208     This is somewhat faster than waiting for trigger to fire for each existing
209     row.
210    
211     =item create insert, update and delete triggers
212    
213     Which will keep data in sync later
214    
215     =item commit transaction
216    
217     =back
218    
219     If you still want to do that manually, you will need to know format of
220     C<pgest_trigger> function:
221    
222     CREATE TRIGGER pgest_trigger_insert AFTER INSERT
223     ON table FOR EACH ROW
224     EXECUTE PROCEDURE pgest_trigger(
225     -- node URI, login and password
226     'http://localhost:1978/node/trivia', 'admin', 'admin',
227     -- name of primary key column
228     'id',
229     -- names of all other columns to index (one or more)
230     'column', 'another_one', 'and_another'
231     )
232    
233     You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
234    
235 dpavlin 33 =head1 Who wrote this?
236 dpavlin 23
237     Hyper Estraier is written by Mikio Hirabayashi.
238    
239     PostgreSQL is written by hackers calling themselves PostgreSQL Global
240     Development Group.
241    
242 dpavlin 60 This small C function is written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.

  ViewVC Help
Powered by ViewVC 1.1.26