Revision 311, Tue Dec 20 23:16:21 2005 UTC, by dpavlin: small documentation update

=head1 WebPAC - Search engine or data-warehouse manual

It's quite hard to explain concisely what WebPAC is. It's a mix between
a search engine and a data warehousing application. Let's see that in detail...

WebPAC was originally written to search CDS/ISIS records using C<swish-e>.
Since then it has adopted various other input formats and added
support for alphabetical lists (earlier described as indexes).

As this concept evolved, we settled on the following work-flow
for your data:

 step

        source file         CDS/ISIS, MARC, Excel, robots, ...
            |
   1        | apply input normalisation rules (xml or yaml)
            V
        intermediate        this data is re-formatted source data converted
        data                to chunks based on tag names from config/input/
            |
   2        | optionally apply output filter (TT2)
            V
        data                search engine, HTML, OAI, RDBMS
            |
   3        | filter using query in REST format
   4        | apply output filter (TT2)
            V
        client              Web browser, SOAP

=head2 Normalisation and Intermediate data

This is the first step in working with your data.

You create one-to-one mappings from source data records to documents
in WebPAC. You can split or merge data from input records, apply filters
(perl subroutines), use lookups within the same source file or do simple
evaluations while producing output.

All of that is controlled by the C<config/input/> configuration file. You
will want to create fine-grained chunks of data (like separate first and
last name), which will later be used to produce output. You can think of
the conversion process as applying the C<config/input/> recipe to
every input record.

Each tag within the recipe creates one new record as long as there are
fields in the input format (which can be repeatable) that satisfy at least one
field within the tag.
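
As a rough illustration of the idea only (WebPAC itself is written in Perl and reads its rules from C<config/input/>), the Python sketch below applies a made-up recipe to one input record. The field numbers, tag names, the C<RECIPE> structure and the C<normalise> and C<split_name> helpers are all assumptions invented for this example, not WebPAC's real recipe format.

```python
# Hypothetical sketch of applying a normalisation recipe to one input record.
# Field numbers, tag names and helper names are invented for illustration;
# the real rules live under config/input/ and are processed by WebPAC's Perl code.

def split_name(value):
    """Filter: split 'Last, First' into fine-grained chunks."""
    last, _, first = value.partition(",")
    return {"last_name": last.strip(), "first_name": first.strip()}

# Each tag lists the input fields that can feed it, plus an optional filter.
RECIPE = {
    "personal_name": {"fields": ["700", "701"], "filter": split_name},
    "title": {"fields": ["200"], "filter": None},
}

def normalise(record, recipe=RECIPE):
    """Return tag -> list of chunks, one per matching (repeatable) field value."""
    out = {}
    for tag, rule in recipe.items():
        for field in rule["fields"]:
            for value in record.get(field, []):
                chunk = rule["filter"](value) if rule["filter"] else value
                out.setdefault(tag, []).append(chunk)
    return out

# Input fields are repeatable, so every field maps to a list of values.
record = {"200": ["Sample title"], "700": ["Doe, Joe"], "701": ["Roe, Jane"]}
normalised = normalise(record)
```

A tag that finds no matching field in the record simply produces nothing, which mirrors the rule that a tag only yields output when at least one of its fields is present.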

Users of older WebPAC versions should note that this file no longer contains
any formatting or specification of output type, and that the granularity of
each tag has increased.

B<This document should really be updated to reflect the Webpacus front-end from
this point...>

=head2 Output filter

Now that we have a normalised record, we can create some output. You can create
HTML from it, data files for a search engine, or insert it into an RDBMS.

The twist is that application of output filters can be recursive, allowing
you to query data generated in the previous step. This enables you to represent
lists or trees from source data that has structure. It also requires you to
produce structured data in step 2, which can be filtered and queried in steps
3 and 4 to produce the final output.

Note that you can also query intermediate data in step 4, not
just data produced in step 2.

Output filters use Template Toolkit 2, so you have the full power of a simple
procedural language (loops, conditions) and handy built-in functions to
produce output.
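
To make the "loops and conditions" point concrete, here is a small stand-in written in Python rather than Template Toolkit 2. It only mimics what an output filter does with a normalised record. The C<render_html> function and the C<title> and C<authors> keys are hypothetical and not part of WebPAC.

```python
from html import escape

def render_html(doc):
    """Render one normalised document as an HTML fragment (illustrative only)."""
    lines = ["<h1>%s</h1>" % escape(doc["title"])]
    if doc.get("authors"):              # condition: skip the list when empty
        lines.append("<ul>")
        for author in doc["authors"]:   # loop over repeatable values
            lines.append("  <li>%s</li>" % escape(author))
        lines.append("</ul>")
    return "\n".join(lines)

html_out = render_html({"title": "Tom & Jerry", "authors": ["Joe Doe"]})
```

In a real template the same structure would be expressed with the template language's own loop and conditional directives instead of Python statements.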

=head2 REST Query Format

The design decision is to use a REST query format. This has the benefit of
simplicity and the ability to create unique URLs for all content within WebPAC.
A simple query looks like this:

  http://webpac/search/html/personal_name/Joe%20Doe/AND/year/LT%201995

This REST query can be broken down into:

=over

=item http://webpac

Hostname on which the service is running. Not required when doing lookups, only
for browser usage.

=item search

Name of the output filtering method. This specifies the search engine.

=item html

The template that will be used to produce output.

=item personal_name/Joe%20Doe...

URL-encoded query string. It is specific to the filtering method used.

=back
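
The breakdown above can be sketched in a few lines of Python. This is not WebPAC's own parser; the C<parse_rest_query> function and the keys of the dictionary it returns are assumptions chosen for illustration. The query part is left as a list of decoded segments because its meaning depends on the filtering method.

```python
from urllib.parse import urlsplit, unquote

def parse_rest_query(url):
    """Split a REST query URL into host, filtering method, template and query."""
    parts = urlsplit(url)
    # Split on "/" first and decode afterwards, so an encoded slash inside
    # a value does not create an extra segment.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected /<method>/<template>/<query...>")
    return {
        "host": parts.netloc,        # empty when only a path is given
        "method": segments[0],       # e.g. "search"
        "template": segments[1],     # e.g. "html"
        "query": segments[2:],       # specific to the filtering method
    }

parsed = parse_rest_query(
    "http://webpac/search/html/personal_name/Joe%20Doe/AND/year/LT%201995"
)
```

Passing only the path, for example C</search/html/personal_name/Joe%20Doe>, gives the same result with an empty host, which matches the note that the hostname is not needed for lookups.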

You can easily produce an RSS feed for the same query using the following REST URL:

  http://webpac/search/rss/personal_name/Joe%20Doe/AND/year/LT%201995

Yes, it really is that simple. As it should be.
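
Since only the template segment differs between the two URLs, switching output formats amounts to replacing one path segment. A minimal Python sketch, assuming the template is always the second path segment as in the examples here (the C<with_template> helper is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def with_template(url, template):
    """Return the same REST query URL with a different template segment."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is "" (leading slash), segments[1] the filtering method,
    # segments[2] the template; everything after that is the query.
    if len(segments) < 3:
        raise ValueError("expected /<method>/<template>/<query...>")
    segments[2] = template
    return urlunsplit(parts._replace(path="/".join(segments)))

rss_url = with_template(
    "http://webpac/search/html/personal_name/Joe%20Doe/AND/year/LT%201995",
    "rss",
)
```

The query segments are left untouched, including their URL encoding.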

=head1 Technical stuff

The following text is more hard-core technical material about how WebPAC is
implemented and why.

=head2 Search Engine

We are using the Hyper Estraier search engine through the pgestraier PostgreSQL
bindings for it.

It should be relatively easy to plug in another one if the need arises.

=head2 Data Warehouse

In a nutshell, WebPAC has evolved to support hybrid data as input. That
means it has become a kind of data-warehouse application. It doesn't directly
support roll-up and roll-down operations, but they can be emulated using
the intermediate data step or the output step.
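
One way to picture such an emulation, purely as a sketch: group the intermediate documents by one of their chunks to roll up, and list the members of a group to go back down. The C<roll_up> helper, the sample documents and the C<year> key below are invented for this example and are not part of WebPAC.

```python
from collections import defaultdict

def roll_up(docs, key):
    """Group documents by the value of one chunk (hypothetical helper)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.get(key)].append(doc)
    return dict(groups)

docs = [
    {"title": "A", "year": 1994},
    {"title": "B", "year": 1994},
    {"title": "C", "year": 1996},
]

by_year = roll_up(docs, "year")                       # roll-up: one group per year
counts = {year: len(members) for year, members in by_year.items()}
titles_1994 = [d["title"] for d in by_year[1994]]     # roll-down: members of a group
```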