This is a repository of my old source code, which isn't updated any more. Go to git.rot13.org for current projects!

Annotation of /trunk/README.pod



Revision 49
Sat Oct 29 18:54:40 2005 UTC by dpavlin
File size: 5363 byte(s)
added depth to node API version of pgest, note that you have to use modified
perl wrapper with node API

=head1 pgestraier - search Hyper Estraier indexes from PostgreSQL

This package is essentially a PostgreSQL C function which calls the Hyper
Estraier API and returns results in a user-defined format.

=head1 Why was it written?

Aside from providing a single API to query both your RDBMS and your full-text
index (from any language that has PostgreSQL client libraries), its real power
lies in the ability to join full-text search results with structured data in
the RDBMS.

=head1 How to install

Installation should be simple. However, you will need the following
software installed before you try this function:

=over

=item *

PostgreSQL (tested with versions 7.4 and 8.0) and its development libraries

=item *

Hyper Estraier (tested with versions 0.5.0 to 1.0.0+; versions newer than
0.9.6 are recommended)

=back

To run the tests you will also need:

=over

=item *

a working Perl installation

=item *

Perl modules C<DBI>, C<DBD::Pg>, C<Test::More> and, optionally, C<HyperEstraier>

=item *

C<trivia.list.gz> from the Internet Movie Database in the C<data/> directory

=item *

a PostgreSQL database C<test> with permissions for the current user

=item *

a Hyper Estraier node C<trivia> with permissions for the C<admin> user

=back

If you have all that, you should be able to type

    make

and see sample results. You will be asked for your password once (via sudo)
to install the C<pgest.so> shared library in a system-wide location so that
PostgreSQL can access it.

Next, you will have to create a test index. You have two options:

=head2 Create index using estcmd

This will create temporary files on disk and index them using C<estcmd gather>:

    cd data
    make index
    cd ..

B<Warning:> this method is incomplete and won't create the node index needed
to run the last examples in C<test.sql> correctly. The solution is simple:
either symlink your newly created index into the Hyper Estraier C<_node>
directory, or create the node and fill it (re-creating the index) using
C<estcall>.

=head2 Create index using Hyper Estraier Perl bindings

Perl bindings for Hyper Estraier are available at

L<http://hyperestraier.sourceforge.net/binding/>

However, they don't support the node API (yet), so you will have to use
my modified version, which is available at L<http://svn.rot13.org/> in the
C<hyperestraier_wrappers> repository.

If you installed the bindings as documented in their README file, you can use
them to create the index about three times faster than with C<estcmd> (to be
fair, most of that time goes into creating the intermediate files, not into
indexing).

However, you first need to create the node I<trivia> using Hyper Estraier's
administration interface at L<http://localhost:1978/masterui>. You will also
need the user C<admin> with password C<admin>, because those values are
hard-coded in C<indexer.pl>. If you want to use a different user or index
name, feel free to change the script.

    cd data
    make perl
    cd ..

To run the tests (which require C<estcmd> in your C<$PATH>), issue

    make test

See also the included file C<test.sql> for more usage examples.

=head1 Usage of pgest from SQL

The C<pgest> PostgreSQL function has two different prototypes (numbers of
arguments), depending on whether it opens the index directly or goes through
the node API.

    SELECT
        -- columns to return (defined below)
        id, title, size
    FROM pgest(
        -- path to the index OR node URL, user name, password and depth;
        -- you need JUST ONE of the following two lines, depending on
        -- the usage described below: for direct access, the casket path
        '/full/path/to/casket',
        -- or, for the node API, node URI, login, password and search depth
        -- 'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
        -- query
        'blade runner',
        -- attribute conditions, use NULL or '' to disable;
        -- multiple conditions can be separated by {{!}}
        '@title ISTRINC blade',
        -- order results by
        '@title STRA',
        -- limit, use NULL or 0 to disable
        NULL,
        -- offset, use NULL or 0 to disable
        NULL,
        -- attributes to return as columns
        ARRAY['@id', '@title', '@size']
    ) AS (
        -- names and types of the returned attributes
        id text, title text, size text
    );

=head2 Accessing database directly

If you want to access the database directly (without running the
C<estmaster> process), the first argument is the full path to the database
(casket).

Keep in mind that the C<postgres> user, under which PostgreSQL is running,
must have read permission on the Hyper Estraier database files.

This is a bit faster on really small indexes. However, as your index grows,
you might consider using the node API to avoid the overhead of opening the
database on every query.

=head2 Using index via C<estmaster> server process

If the first argument is a node URL (like C<http://localhost:1978/node/trivia>)
followed by three additional parameters (user name, password and search
depth), C<pgest> will use the node API and access the index through the
C<estmaster> process, which must be running on a local or remote machine.

This removes the database-opening overhead at the cost of additional network
traffic. In exchange, the C<estmaster> process can run on a different machine,
and you can update the index while searches are running.

=head1 Who wrote this?

Hyper Estraier is written by Mikio Hirabayashi.

Perl bindings for Hyper Estraier are written by MATSUNO Tokuhiro.

PostgreSQL is written by the hackers who call themselves the PostgreSQL Global
Development Group.

This small C function is written by Dobrica Pavlinusic, dpavlin@rot13.org.
