How to look up some value in my output?


You might want to use this feature if you want to display something that is
related to the current record.

All lookups are modelled around the key => value(s) idea, so you can store any
value attached to a unique key. Both the key and the value can be built from
fields of any import format and from fixed values (delimiters, prefixes etc.).

First, it's important that the database which creates the key => value data
is specified in all2xml.conf before the database that uses those values.

Second, that usually means that you will need two database
configurations in all2xml.conf which point to the same database if you want to
look up records from that same database. I would suggest having two import_xml/
files: one which just stores lookup keys and values (and thus executes
faster) and another which creates output for swish and the indexer and just
uses the lookup.


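The ordering rule can be pictured with a small Python sketch. It only models
the idea of "build the lookup first, use it second"; it is not how all2xml.pl
is written. The field names ("id", "term", "term_id") and the behaviour on a
missing key (the raw reference is shown) are assumptions made for the sketch.

```python
# Illustrative model of the ordering rule, not WebPAC code.

def build_lookup(records, key_field, value_field):
    """First pass: store value_field under key_field for every record."""
    lookup = {}
    for record in records:
        lookup[record[key_field]] = record[value_field]
    return lookup


def render(records, ref_field, lookup):
    """Second pass: replace each reference with the stored value.

    Falling back to the raw reference is an assumption of this sketch;
    it just makes the effect of a wrong order visible.
    """
    return [lookup.get(r[ref_field], r[ref_field]) for r in records]


thesaurus = [{"id": "T1", "term": "Libraries"}]   # hypothetical data
bibliography = [{"term_id": "T1"}]

too_early = render(bibliography, "term_id", {})   # lookup not built yet
lookup = build_lookup(thesaurus, "id", "term")
in_order = render(bibliography, "term_id", lookup)

print(too_early)   # ['T1']
print(in_order)    # ['Libraries']
```

If the database that uses the lookup runs first, there is simply nothing to
find yet, which is why the order in all2xml.conf matters.
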
1. Lookup to other database (using type="lookup_key" and lookup="1")

For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms which
have unique identifiers in field 900, and we want those terms for display.

The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
which holds the value of field 900 from the thesaurus entry. While that's
enough to create links in search results (using links and format, see
doc/links.txt), we would like to display the term from the thesaurus and not
the value of field 900.

In the first step, we store fields from the thesaurus (as value) that relate to
field 900 of that entry (which is the key) using the following XML (in
import_xml/isis_hidra_ths.xml):

    <IDths name="ID" order="300">
        <isis type="lookup_key">900</isis>
    </IDths>

    <SubjectIndex name="Predmetno kazalo" order="301">
        <isis type="lookup_val">[5624] 562a</isis>
    </SubjectIndex>

This will create a lookup which you might write like this:

    900 => "[5624] 562a"

Quotes are added to denote that the value is a single entry.

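To make the notation concrete, here is a rough Python model of how such a
key => value pair could be filled from one thesaurus record. The token pattern
(three digits plus an optional subfield character), the empty string for a
missing field and the sample record are all assumptions of this sketch, not
something taken from all2xml.pl.

```python
import re

# Assumed token shape: 3-digit field number plus optional subfield (562a, 5624).
FIELD_TOKEN = re.compile(r"\d{3}[0-9a-z]?")


def fill(template, record):
    """Replace every field token in template with its value from record."""
    return FIELD_TOKEN.sub(lambda m: record.get(m.group(0), ""), template)


def store(lookup, record, key_template, value_template):
    """Store one key => value pair built from the two templates."""
    lookup[fill(key_template, record)] = fill(value_template, record)


# Hypothetical thesaurus record.
record = {"900": "T42", "5624": "02", "562a": "Libraries"}

lookup = {}
store(lookup, record, "900", "[5624] 562a")
print(lookup)   # {'T42': '[02] Libraries'}
```

The brackets and the space in the value template are fixed text, so they end
up in the stored value unchanged.
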
We also have to specify in all2xml.conf something like:

    lookup_newfile=/data/webpac/thes.lookup

which will create a new lookup file.

For the bibliographic database, which will do lookups in the previously
created file, all2xml.conf must have:

    lookup_open=/data/webpac/thes.lookup

and then in import_xml/ we use:

    <isis lookup="1">6013</isis>

The value of that field (6013 in the line above) must exactly match field 900
(which is the key) from the thesaurus. You can, however, add an arbitrary
prefix or suffix to store unrelated keys and values in the same lookup.


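Both points, exact matching and sharing one lookup through prefixes, can be
shown with a plain dictionary. The prefixes "ths:" and "lang:" and the stored
values are made up for this sketch.

```python
# One lookup holding two unrelated sets of keys, kept apart by prefix.
lookup = {
    "ths:T42": "[02] Libraries",   # thesaurus terms (hypothetical prefix)
    "lang:hr": "Croatian",         # something unrelated in the same lookup
}


def look_up(lookup, prefix, field_value):
    """Exact-match lookup: no trimming and no case folding is done."""
    return lookup.get(prefix + field_value)


exact = look_up(lookup, "ths:", "T42")
wrong_case = look_up(lookup, "ths:", "t42")
trailing_space = look_up(lookup, "ths:", "T42 ")
other_set = look_up(lookup, "lang:", "hr")

print(exact)            # [02] Libraries
print(wrong_case)       # None
print(trailing_space)   # None
print(other_set)        # Croatian
```

So a stray space or different case in the bibliographic field is enough for
the lookup to come back empty.
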
1.1 NOTE about memory usage:

These lookups are created on disk. The default configuration also creates
a memory cache for faster indexing, which you can turn off by changing the line

    my $use_lhash_cache = 1;

in all2xml.pl to

    my $use_lhash_cache = 0;

You probably won't need to do that, so it's not a configuration option.


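The trade-off behind that switch is the usual one: the cache costs memory but
saves reads from disk. A minimal Python sketch of the idea follows, using the
standard dbm.dumb module as the on-disk store. The class name, the
use_cache flag (standing in for $use_lhash_cache) and the read counter are
inventions of this sketch; the real script may organise this differently.

```python
import dbm.dumb
import os
import tempfile


class CachedLookup:
    """Disk-backed key => value store with an optional in-memory cache."""

    def __init__(self, path, use_cache=True):
        self.db = dbm.dumb.open(path, "c")      # "c": create if missing
        self.cache = {} if use_cache else None
        self.disk_reads = 0

    def store(self, key, value):
        self.db[key] = value                    # str is stored as UTF-8 bytes
        if self.cache is not None:
            self.cache[key] = value

    def fetch(self, key):
        if self.cache is not None and key in self.cache:
            return self.cache[key]              # served from memory
        self.disk_reads += 1
        raw = self.db.get(key)
        value = None if raw is None else raw.decode("utf-8")
        if self.cache is not None and value is not None:
            self.cache[key] = value
        return value

    def close(self):
        self.db.close()


path = os.path.join(tempfile.mkdtemp(), "thes.lookup")

cached = CachedLookup(path, use_cache=True)
cached.store("T42", "[02] Libraries")
cached.fetch("T42")
cached.fetch("T42")
reads_with_cache = cached.disk_reads
cached.close()

uncached = CachedLookup(path, use_cache=False)
value = uncached.fetch("T42")
uncached.fetch("T42")
reads_without_cache = uncached.disk_reads
uncached.close()

print(value, reads_with_cache, reads_without_cache)
```

With the cache on, repeated fetches of the same key never touch the file;
with it off, every fetch does.
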
2. Lookup that has to store more than one value

While the lookups described above are sufficient when you want to store just
one value associated with one key, they don't quite help us if we need to have
more than one value for each key.

A typical example of that might be displaying narrower terms in a thesaurus.
Each narrower term has the id of its parent term (which is enough to display
the narrower term), but we would also like to display all sibling terms with
each term.

So, under the key of the parent term we'll store the keys of all terms which
are siblings. But we would also like to display terms and not term numbers.
That requires first finding all sibling terms (a lookup returning one or more
term ids) and then looking up the names of those returned terms for display.

This is usually called an indirect lookup, and is much hated by CS majors in
their freshman year. Later, it becomes so natural that you think it's the only
way to solve a problem. So, you are stuck with it :-)

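In plain Python, the two steps of an indirect lookup might look like the
sketch below: one lookup from parent id to child ids, and a second one from id
to display name. The term ids and names are invented for the example.

```python
from collections import defaultdict

names = {}                     # term id -> display name (one value per key)
children = defaultdict(list)   # parent id -> child ids (many values per key)

# Hypothetical thesaurus: (id, parent id, name)
terms = [
    ("10", None, "Information science"),
    ("11", "10", "Libraries"),
    ("12", "10", "Archives"),
]

for term_id, parent_id, name in terms:
    names[term_id] = name
    if parent_id is not None:
        children[parent_id].append(term_id)


def sibling_names(parent_id):
    """Step 1: parent id -> child ids. Step 2: each child id -> name."""
    ids = children.get(parent_id, [])
    return [names[i] for i in ids]


print(sibling_names("10"))   # ['Libraries', 'Archives']
print(sibling_names("11"))   # []
```

The first lookup alone would only give "11" and "12"; the second one turns
those ids into something a user wants to read.
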
Since lookups can return more than one value, and we would like to use format
to create links, this lookup is implemented as filter="mem_lookup". Let's
look at an example.

    <LookupThesNT name="lookup for thesaurus narrow term">
        <!--
            Store the value of field 250a (for display) in a key composed
            of prefix "d:" and the value of field 900.
            This is a one key - one value lookup.
        -->
        <isis filter="mem_lookup" type="display">d:900 => 250a</isis>

        <!--
            Now, for each entry generate the parent ID (using fields
            5614, 5624, 4611, with prefix "a:" added, as the key)
            and use the value of field 900 as the value.
            That will create a lookup which can (and will) have
            more than one value for each key (because a parent
            term has more than one child).
        -->
        <isis filter="mem_lookup" type="display">a:5614:5624:4611 => 900</isis>

    </LookupThesNT>

So, after we index the database with an import_xml file which has the
mem_lookup filter (which won't create any output for swish or the index), we
have just two lookups stored in memory (that's where the name mem_lookup comes
from):

    d:900 => 250a

    a:5614:5624:4611 => 900 900 900 900 900 ...

The actual key of the second ("a:") lookup can have the form a:5614,
a:5614:5624 or a:5614:5624:4611, depending on the record (micro-thesaurus
terms have just 5614, and descriptors have 5614 and 5624 or all of them,
depending on level).

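A small Python sketch of how such variable-length keys could be put together
and filled: only the parent fields that are present in a record end up in the
key, and every record appends its own id (field 900) under that key. Skipping
empty fields this way is an assumption of the sketch, and the records are made
up.

```python
from collections import defaultdict


def parent_key(record, fields=("5614", "5624", "4611"), prefix="a"):
    """Join the prefix and the values of the parent fields that are present."""
    parts = [record[f] for f in fields if record.get(f)]
    return ":".join([prefix] + parts)


# Hypothetical thesaurus records at different levels.
records = [
    {"900": "2", "5614": "25"},
    {"900": "3", "5614": "25", "5624": "251"},
    {"900": "4", "5614": "25", "5624": "251"},
]

children = defaultdict(list)   # one key, possibly many values
for record in records:
    children[parent_key(record)].append(record["900"])

print(dict(children))   # {'a:25': ['2'], 'a:25:251': ['3', '4']}
```

Records with the same parent fields land under the same key, which is how one
key collects more than one value.
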
Now, let's display some of those lookups.

First, we can display all ids of fields which are children of field 251:

    <isis type="display" filter="mem_lookup">[a:251]</isis>

That's not very useful, because we would like to display terms and not
ids, possibly separated by " * ":

    <isis type="display" filter="mem_lookup" delimiter=" * ">[d:[a:251]]</isis>

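The nested form [d:[a:251]] reads inside-out: the inner lookup gives ids, and
each id is then looked up with the "d:" prefix. Here is one way to model that
in Python. It assumes field references were already replaced by their values
(so 251 is taken as a literal key part), that values themselves contain no
square brackets, and it uses invented ids and names.

```python
import re

# An innermost lookup is a [...] pair with no brackets inside it.
INNERMOST = re.compile(r"\[([^\[\]]+)\]")


def expand(expression, lookup):
    """Return every string the expression expands to, innermost lookup first."""
    match = INNERMOST.search(expression)
    if match is None:
        return [expression]
    results = []
    for value in lookup.get(match.group(1), []):
        replaced = expression[:match.start()] + value + expression[match.end():]
        results.extend(expand(replaced, lookup))
    return results


# Hypothetical contents of the in-memory lookups.
lookup = {
    "a:251": ["2511", "2512"],
    "d:2511": ["Libraries"],
    "d:2512": ["Archives"],
}

ids = expand("[a:251]", lookup)
terms = expand("[d:[a:251]]", lookup)

print(ids)                  # ['2511', '2512']
print(" * ".join(terms))    # Libraries * Archives
```

The delimiter only comes into play at the very end, when the list of values
is joined for display.
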
That's great. But let's link those fields using format:

    <format name="IDths"><![CDATA[
        <a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>
    ]]></format>

    <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">[a:251];;[d:[a:251]]</isis>


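One plausible reading of that last tag, sketched in Python: the content is
split on format_delimiter (";;") into one part per %s, each part expands to a
list of values, the lists are paired up position by position, and the results
are joined with delimiter. The positional pairing is a guess made for this
sketch, and the ids and names are invented.

```python
# The format from above, with its two %s placeholders.
LINK = '<a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>'


def apply_format(fmt, parts, delimiter):
    """parts holds one list of values per ';;'-separated section.

    Values are paired by position: first id with first name, and so on.
    """
    return delimiter.join(fmt % row for row in zip(*parts))


ids = ["2511", "2512"]               # what [a:251] might return
names = ["Libraries", "Archives"]    # what [d:[a:251]] might return

html = apply_format(LINK, [ids, names], " * ")
print(html)
```

Each id goes into the link target and the matching name into the link text,
so the user sees terms but the link still searches by id.
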
There is only one problem left. Since we want to display just the child
records of the current record, we have to use three different tags to display
child records (for field, micro-thesaurus and term). However, that means that
a term will also display all child fields and child micro-thesaurus terms,
which isn't what's needed.

But each record also has its own level written in 901a, so we can filter
just the correct child entries using something like:

    <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]</isis>

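The effect of such a guard can be modelled in a few lines of Python: each of
the three tags carries a required level, and only the tag whose level equals
the record's 901a produces output. Only "Područje" comes from the example
above; the other two level names and the placeholder outputs are made up for
this sketch.

```python
def guarded(record, required_level, produce):
    """Return produce() only when the record's level (901a) matches."""
    if record.get("901a") != required_level:
        return ""
    return produce()


# One guarded tag per level; the last two level names are hypothetical.
tags = [
    ("Područje", lambda: "children of this field"),
    ("Mikrotezaurus", lambda: "children of this micro-thesaurus"),
    ("Deskriptor", lambda: "children of this term"),
]

record = {"900": "25", "901a": "Područje"}   # hypothetical record
output = [guarded(record, level, produce) for level, produce in tags]

print(output)   # ['children of this field', '', '']
```

All three tags stay in the import_xml file, but for any given record two of
them stay silent.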