Revision 60 - Tue Jul 11 14:11:42 2006 UTC by dpavlin
make it in sync with reality

=head1 pgestraier - search Hyper Estraier indexes from PostgreSQL

This package is essentially a PostgreSQL C function which calls the Hyper
Estraier API and returns results in a user-defined format.

=head1 Why was it written?

Aside from providing a single API to query both your RDBMS and your full text
index (from any language that has PostgreSQL client libraries), the real power
lies in the ability to join results from the full text index with structured
data in the RDBMS.

For a simple real-life example which addresses the problem
I<where like '%foo%' is slow>, see L<Tutorial>.
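
The query below is a minimal sketch of such a join, using the call syntax described in L</"Usage of pgest from SQL">. It assumes a hypothetical table C<movies> whose C<title> column holds the same values as the C<@title> attribute stored in the C<trivia> node. Neither the table nor the join key comes from this package.

```sql
  -- hypothetical table with structured data (not part of pgestraier):
  -- CREATE TABLE movies (title text PRIMARY KEY, year int);

  SELECT m.title, m.year
  FROM movies m
  JOIN pgest(
    -- node URI, login, password and depth of search
    'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
    -- full text query
    'blade runner',
    -- no attribute conditions
    NULL,
    -- order inside Hyper Estraier (the join may reorder rows)
    '@title STRA',
    -- no limit, no offset
    NULL,
    NULL,
    -- attributes to return as columns
    ARRAY['@title']
  ) AS hits (title text)
    ON hits.title = m.title
  WHERE m.year < 1990
  ORDER BY m.year;
```

Because the join runs inside PostgreSQL, ordinary C<WHERE>, C<ORDER BY> and aggregate clauses apply to the combined rows. The ordering argument passed to C<pgest> is not guaranteed to survive the join, which is why the final C<ORDER BY> is explicit.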

=head1 How to install

Installation should be simple. However, the following software must already
be installed before you try this function:

=over

=item *

PostgreSQL (tested with versions 7.4 and 8.0) with development libraries

=item *

Hyper Estraier (tested with various versions; 1.2.4 or newer is recommended)

=back

To run tests you will also need:

=over

=item *

a working Perl installation

=item *

Perl modules C<DBI>, C<DBD::Pg>, C<Test::More> and, optionally, C<Search::Estraier>

=item *

C<trivia.list.gz> from the Internet Movie Database in the C<data/> directory

=item *

a PostgreSQL database named C<test> with permissions for the current user

=item *

a Hyper Estraier C<estmaster> running with permissions for the C<admin> user
to create the C<trivia> node

=back

If you have all that, you should be able to type

  make

and see sample results. You will be asked for your password once (via sudo) to
install the C<pgest.so> shared library in a system-wide location so that
PostgreSQL can access it.

=head2 Create a sample index using Hyper Estraier Perl bindings

Perl bindings for Hyper Estraier are available at CPAN:

L<http://search.cpan.org/~dpavlin/Search-Estraier/>

After installing C<Search::Estraier> you can create the index using the following commands:

  cd data
  make index
  cd ..

To run the tests (which require C<estcmd> in your C<$PATH>), issue

  make test

See also the included file C<test.sql> for more usage examples.

=head1 Usage of pgest from SQL

The C<pgest> PostgreSQL function tries to mimic the usage of normal database tables (with support for attribute filtering, limit and offset) in the following way:

  SELECT
    -- columns to return (defined later)
    id, title, size
  FROM pgest(
    -- node URI, login, password and depth of search
    'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
    -- query
    'blade runner',
    -- additional attribute conditions, use NULL or '' to disable
    -- multiple attribute conditions can be separated by {{!}}
    '@title ISTRINC blade',
    -- order results by
    '@title STRA',
    -- limit, use NULL or 0 to disable
    null,
    -- offset, use NULL or 0 to disable
    null,
    -- attributes to return as columns
    ARRAY['@id','@title','@size']
  ) AS (
    -- specify names and types of returned attributes
    id text, title text, size text
  );
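
As a further sketch of the same call, the query below fetches the second page of ten results and combines two attribute conditions with the C<{{!}}> separator. The C<@size NUMGT 1000> condition and its threshold are arbitrary illustrations. It is also an assumption here that the separated conditions must all match (each becoming its own Hyper Estraier attribute condition) rather than acting as alternatives.

```sql
  SELECT id, title, size
  FROM pgest(
    'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
    'blade runner',
    -- two conditions joined by {{!}} (the @size threshold is arbitrary)
    '@title ISTRINC blade{{!}}@size NUMGT 1000',
    '@title STRA',
    -- limit: ten rows per page
    10,
    -- offset: skip the first page
    10,
    ARRAY['@id','@title','@size']
  ) AS (id text, title text, size text);
```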

Note that Hyper Estraier uses UTF-8 encoding, while your PostgreSQL
installation might use a different encoding. To fix that, use the C<convert>
function in PostgreSQL to convert between encodings.

=head2 Using the index via the C<estmaster> server process

This is the default and recommended way to use C<pgest> functionality. In this
case, C<pgest> uses the node API and accesses the index through the
C<estmaster> process, which should be running on a (local or remote) machine.

This removes the overhead of opening the database on each query, at the cost
of (small) additional network traffic. In return, the Hyper Estraier
C<estmaster> process can run on a different machine, and the index can be
updated while searches are running.

=head2 Accessing the database directly

B<Please note that direct access to the database is deprecated.> For that
reason it is not shown in the example above. It is kept only for backward
compatibility and will probably be removed in future versions of C<pgest>.

If you want to access the database directly (without running the C<estmaster>
process), you have to replace the node URI, login, password and depth with the
full path to the database file.

Keep in mind that the C<postgres> user under which PostgreSQL is running must
have read permission on the Hyper Estraier database files.

This will work a bit faster on really small indexes. However, as your index
grows bigger, you might consider using the node API to remove the overhead of
opening the database on each query.

=head1 Who wrote this?

Hyper Estraier is written by Mikio Hirabayashi.

PostgreSQL is written by hackers calling themselves the PostgreSQL Global
Development Group.

This small C function is written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
