12 |
|
|
13 |
step |
step |
14 |
|
|
15 |
source file CDS/ISIS, MARC, Excel, robots, ... |
source data CDS/ISIS, MARC, Excel, robots, ... |
16 |
| |
| |
17 |
|
0 | apply lookup rules (optional) |
18 |
1 | apply input normalisation rules (xml or yaml) |
1 | apply input normalisation rules (xml or yaml) |
19 |
V |
V |
20 |
intermediate this data is re-formatted source data converted |
intermediate this data is re-formatted source data converted |
21 |
data to chunks based on tag names from config/input/ |
data to chunks based on tag names from config/input/ |
22 |
| |
| |
23 |
2 | optionally apply output filter (TT2) |
2 | optionally apply output filter (TT2) |
24 |
V |
V |
25 |
data search engine, HTML, OAI, RDBMS |
data search engine, HTML, OAI, RDBMS |
26 |
| |
| |
27 |
3 | filter using query in REST format |
3 | filter using query in REST format |
28 |
4 | apply output filter (TT2) |
4 | apply output filter (TT2) |
29 |
V |
V |
30 |
client Web browser, SOAP |
client Web browser (html), JSON |
31 |
|
|
32 |
=head2 Normalisation and Intermediate data |
=head2 Source data |
33 |
|
|
34 |
This is the first step in working with your data. |
WebPAC supports various input formats: |
35 |
|
|
36 |
|
=over 2 |
37 |
|
|
38 |
|
=item L<WebPAC::Input::ISIS> CDS/ISIS data |
39 |
|
|
40 |
|
=item L<WebPAC::Input::MARC> for MARC records |
41 |
|
|
42 |
|
=item L<WebPAC::Input::Excel> Microsoft Excel C<.xls> support |
43 |
|
|
44 |
|
=item L<WebPAC::Input::DBF> support legacy tables (e.g. Clipper) |
45 |
|
|
46 |
|
=item L<WebPAC::Input::Gutemberg> for RDF catalog data from Project Gutenberg |
47 |
|
|
48 |
|
=back |
49 |
|
|
50 |
|
=head2 Create data lookups |
51 |
|
|
52 |
|
Before you can begin normalisation, you might want to create lookups which store |
53 |
|
C<< key -> value(s) >> pair(s). Lookups are especially useful if you want to |
54 |
|
I<well> look up the value of some other record using some sort of identifier. |
55 |
|
|
56 |
|
Lookups are described in more detail in L<WebPAC::Lookup>. |
57 |
|
|
58 |
|
=head2 Normalisation to intermediate data |
59 |
|
|
60 |
|
Intermediate data is the internal representation of data on which WebPAC operates. |
61 |
|
|
62 |
You are creating mappings, one-to-one from source data records to documents |
You are creating mappings, one-to-one from source data records to documents |
63 |
in webpac. You can split or merge data from input records, apply filters |
in WebPAC. You can split or merge data from input records, apply regexes, |
64 |
(perl subroutines), use lookups within the same source file or do simple |
use lookups within the same source file, apply conditions, branches and/or |
65 |
evaluations while producing output. |
simple evaluations while producing intermediate data. |
66 |
|
|
67 |
|
All that is controlled with the C<config/config.yml> configuration file. |
68 |
|
This file is in human-readable YAML format, and it describes all configuration of |
69 |
|
WebPAC and its front-end Webpacus. |
70 |
|
|
71 |
|
|
72 |
All that is controlled with C<config/input/> configuration file. You |
All that is controlled with C<config/input/> configuration files. You |
73 |
will want to create fine-grained chunks of data (like separate first and |
will want to create fine-grained chunks of data (like separate first and |
74 |
last name), which will later be used to produce output. You can think of |
last name), which will later be used to produce output. You can think of |
75 |
conversion process as application of C<config/input/> recipe on |
conversion process as application of C<config/input/> recipe on |