ffzg/doc/lookup.txt

How to look up some value in my output?


You might want to use this feature if you want to display something that is
related to the current record.

All lookups are modelled around the key => value(s) idea, so you can store any
value attached to a unique key. Both the key and the value can be built from
fields of any import format or from fixed values (delimiters, prefixes etc.)

First, it's important that the database which creates the key => value data
is specified in all2xml.conf before the database that uses those values.

Second, that usually means that you will need two database configurations
in all2xml.conf which point to the same database if you want to look up
records from that same database. I would suggest having two import_xml/
files: one which just stores lookup keys and values (and thus executes
faster) and another which creates output for swish and the indexer and just
uses the lookup.


1. Lookup to other database (using type="lookup_key" and lookup="1")

For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms which
have unique identifiers in field 900, and we want those terms for display.

The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
which holds the value of field 900 from the thesaurus entry. While that's
enough to create links in search results (using links and format, see
doc/links.txt), we would like to display the term from the thesaurus and not
the value of field 900.

In the first step, we store fields from the thesaurus (as value) that relate
to field 900 of that entry (which is the key) using the following XML (in
import_xml/isis_hidra_ths.xml):

        <IDths name="ID" order="300">
                <isis type="lookup_key">900</isis>
        </IDths>

        <SubjectIndex name="Predmetno kazalo" order="301">
                <isis type="lookup_val">[5624] 562a</isis>
        </SubjectIndex>

This will create a lookup which you might write like this:

        900 => "[5624] 562a"

Quotes are added to denote that the value is a single entry.
We also have to specify in all2xml.conf something like:

        lookup_newfile=/data/webpac/thes.lookup

This will create a new lookup file.

For the bibliographic database, which will do lookups in the previously
created file, all2xml.conf must have:

        lookup_open=/data/webpac/thes.lookup

and then in import_xml/ we use:

        <isis lookup="1">6013</isis>

The value of that field (6013 in the example above) must exactly match field
900 (which is the key) from the thesaurus. You can, however, add an arbitrary
prefix or suffix to keys to store unrelated keys and values in the same lookup.
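
The two steps above can be sketched in a few lines of Python. This is only
an illustration of the idea, not how all2xml.pl is implemented: the record
dicts, the sample values and the names build_lookup and resolve are all made
up for this example, and returning an empty string for an unknown key is an
assumption.

```python
# Hypothetical thesaurus records: ISIS field name -> value (sample data).
thesaurus = [
    {"900": "T1", "5624": "01.02", "562a": "Libraries"},
    {"900": "T2", "5624": "01.03", "562a": "Archives"},
]

def build_lookup(records):
    """First pass: field 900 is the key, "[5624] 562a" is the single value."""
    lookup = {}
    for rec in records:
        lookup[rec["900"]] = "[{}] {}".format(rec["5624"], rec["562a"])
    return lookup

def resolve(lookup, record, field):
    """Second pass: the field value must match a stored key exactly.

    Returning "" for a missing key is an assumption of this sketch.
    """
    return lookup.get(record[field], "")

thes_lookup = build_lookup(thesaurus)
bib_record = {"6013": "T2"}  # hypothetical bibliographic record
print(resolve(thes_lookup, bib_record, "6013"))  # [01.03] Archives
```

Keeping the two passes separate mirrors the advice above: the pass that
fills the lookup has to run before the pass that reads it.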


1.1 NOTE about memory usage:

These lookups are created on disk. The default configuration also creates a
memory cache for faster indexing, which you can turn off by changing the line

 my $use_lhash_cache = 1;

in all2xml.pl to

 my $use_lhash_cache = 0;

You probably won't need to do that, so it's not a configuration option.


2. Lookup that has to store more than one value

While the lookups described above are sufficient when you want to store just
one value associated with one key, they don't quite help us if we need to have
more than one value for each key.

A typical example of that is displaying narrower terms in a thesaurus.
Each narrower term has the id of its parent term (which is enough to display
the narrower term), but we would also like to display all sibling terms with
each term.

So, under the key of the parent term we'll store the keys of all its child
terms (which are siblings of each other). But we would also like to display
terms and not term numbers. That requires first finding all sibling terms
(a lookup returning one or more term ids) and then looking up the names of
those returned terms for display.

It's usually called an indirect lookup, and is much hated by CS majors in
their freshman year. Later, it becomes so natural that you think it's the only
way to solve the problem. So, you are stuck with it :-)

Since lookups can return more than one value, and we would like to use format
to create links, this lookup is implemented as filter="mem_lookup". Let's
look at an example.

        <LookupThesNT name="lookup for thesaurus narrow term">
                <!--
                        Store value of field 250a (for display) in key composed
                        of prefix "d:" and value of field 900.
                        This is one key - one value lookup.
                -->
                <isis filter="mem_lookup" type="display">d:900 => 250a</isis>

                <!--
                        Now, for each entry generate the parent ID (using
                        fields 5614, 5624, 4611 with prefix "a:" as the key)
                        and use the value of field 900 as the value.
                        That will create a lookup which can (and will) have
                        more than one value for each key (because a parent
                        term has more than one child).
                -->
                <isis filter="mem_lookup" type="display">a:5614:5624:4611 => 900</isis>

        </LookupThesNT>

So, after we index the database with an import_xml file which has the
mem_lookup filter (which won't create any output for swish or the index), we
have just two lookups stored in memory (that's where the name mem_lookup
comes from):

        d:900 => 250a

        a:5614:5624:4611 => 900 900 900 900 900 ...

The actual key of the second ("a:") lookup can have the form a:5614,
a:5614:5624 or a:5614:5624:4611 depending on the record (micro-thesaurus terms
have just 5614, and descriptors have 5614 and 5624 or all of them, depending
on level).
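
To make the shape of those two lookups concrete, here is a small Python
sketch. It is an illustration under assumptions, not the mem_lookup
implementation: the sample records and the names parent_key and
build_mem_lookups are invented, and joining the parent fields with ":" while
skipping missing ones is inferred from the key forms listed above.

```python
from collections import defaultdict

def parent_key(rec):
    """Build "a:5614[:5624[:4611]]" from whichever parent fields exist."""
    parts = [rec[f] for f in ("5614", "5624", "4611") if rec.get(f)]
    return "a:" + ":".join(parts)

def build_mem_lookups(records):
    display = {}                  # d:<900> => 250a (one value per key)
    children = defaultdict(list)  # a:<parent id> => 900 900 900 ...
    for rec in records:
        display["d:" + rec["900"]] = rec["250a"]
        children[parent_key(rec)].append(rec["900"])
    return display, children

# Hypothetical thesaurus records (sample data for this sketch only).
records = [
    {"900": "101", "250a": "Cataloguing", "5614": "7"},
    {"900": "102", "250a": "Classification", "5614": "7"},
    {"900": "201", "250a": "Subject headings", "5614": "7", "5624": "101"},
]
display, children = build_mem_lookups(records)
print(children["a:7"])      # ['101', '102']
print(children["a:7:101"])  # ['201']
print(display["d:201"])     # Subject headings
```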

Now, let's display some of those lookups.

First, we can display the ids of all entries which are children of field 251:

        <isis type="display" filter="mem_lookup">[a:251]</isis>

That's not very useful, because we would like to display terms, and not
ids, possibly separated by " * ".

        <isis type="display" filter="mem_lookup" delimiter=" * ">[d:[a:251]]</isis>

That's great. But, let's link those fields using format:

        <format name="IDths"><![CDATA[
                <a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>
        ]]></format>

        <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">[a:251];;[d:[a:251]]</isis>
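
The nested [d:[a:251]] form is the indirect lookup described earlier: the
inner lookup returns ids, the outer one turns each id into a term, and the
format wraps each id/term pair in a link. A rough Python sketch of that
chain follows. The lookup contents and the name child_links are invented for
illustration, and falling back to the bare id when no term is stored is an
assumption.

```python
# Hypothetical lookup contents, in the shape described above.
display = {"d:101": "Cataloguing", "d:102": "Classification"}
children = {"a:7": ["101", "102"]}

# Same pattern as the IDths format: first %s is the id, second is the term.
LINK = '<a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>'

def child_links(parent_id, delimiter=" * "):
    """Resolve parent id -> child ids -> terms, then join formatted links."""
    ids = children.get("a:" + parent_id, [])
    return delimiter.join(
        LINK % (child_id, display.get("d:" + child_id, child_id))
        for child_id in ids
    )

print(child_links("7"))
```

A parent with no stored children simply yields an empty string here.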


There is only one problem left. Since we want to display just the child
records of the current record, we have to use three different tags to display
child records (for field, micro-thesaurus and term). However, that means that
a term will also display all child fields and child micro-thesaurus terms,
which isn't what's needed.

But each record also has its own level written in 901a, so we can select
just the correct child entries using something like:

        <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]</isis>
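
The effect of the eval{} prefix can be pictured as a guard in front of the
tag: output is produced only when the condition holds for the current
record. A minimal Python sketch of that guard, with made-up names
(display_if_level, show_children) and a made-up second level value
("Deskriptor"):

```python
def display_if_level(record, level, render):
    """Return render(record) only when field 901a equals level, else ""."""
    if record.get("901a") != level:
        return ""
    return render(record)

def show_children(rec):
    # Stand-in for the mem_lookup display of child entries.
    return "children of " + rec["900"]

area = {"900": "7", "901a": "Područje"}
term = {"900": "201", "901a": "Deskriptor"}  # hypothetical level name

print(display_if_level(area, "Područje", show_children))  # children of 7
print(repr(display_if_level(term, "Područje", show_children)))  # ''
```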
