* what makes ISIS ISIS ?

Andrew Giles-Peters raised the important question
"What is it about ISIS that makes it ISIS?"


So here are some thoughts on this topic from the OpenIsis team:

- As a database used for bibliographic data (among others),
  ISIS must be able to store and retrieve records as exchanged via
  ISO2709 efficiently and with no or minimal loss of information.
- Besides the ability to retrieve records by number,
  ISIS must support an indexing mechanism which is essentially
  "function based": index entries are not the immediate
  field values, but rather the values of a "view" derived from
  the record by some computation.
- ISIS must efficiently support typical query elements commonly
  used on bibliographic databases, like looking up a value without
  regard for the field or in several fields at once, and specifying
  a distance within which search terms should occur.
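
The "function based" indexing named in the list above can be sketched
as follows. This is only a minimal illustration in Python, not OpenIsis
code: the record layout, the field numbers 24 and 70 and the two view
functions are assumptions made up for the example.

```python
from collections import defaultdict

# A record is modelled as a list of (tag, value) pairs. Tags 24 and 70 are
# arbitrary example field numbers, not taken from any real ISIS schema.


def title_words(record):
    """Hypothetical view: every word of every field 24, upper-cased."""
    return [w.upper() for tag, value in record if tag == 24 for w in value.split()]


def author_names(record):
    """Hypothetical view: each field 70 as a whole, upper-cased."""
    return [value.upper() for tag, value in record if tag == 70]


def build_index(db, views):
    """Index the computed view values, not the raw field contents."""
    index = defaultdict(set)
    for mfn, record in db.items():
        for view in views:
            for key in view(record):
                index[key].add(mfn)
    return index


db = {
    1: [(24, "Introduction to cataloguing"), (70, "Smith, J.")],
    2: [(24, "Cataloguing rules"), (70, "Jones, A.")],
}
index = build_index(db, [title_words, author_names])
print(sorted(index["CATALOGUING"]))  # [1, 2]
print(sorted(index["SMITH, J."]))    # [1]
```

Because the index holds only computed keys, a lookup does not need to
know which field a value originally came from.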

Since these are minimal requirements,
they would not stop anybody from adding tons of features on top.
For example, it's relatively easy to store ISO2709 data in a
relational database like Sybase (used by OCLC/Pica),
each record covering several rows (mfn, field number, field occ, value),
then compute a second similar table for the index and so on.
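
A rough sketch of such a relational layout, using SQLite (from the
Python standard library) merely as a stand-in for Sybase. Table names,
column names and sample data are assumptions for illustration; nothing
here reflects the actual OCLC/Pica schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One row per field occurrence: (mfn, field number, field occ, value).
con.execute(
    "CREATE TABLE field ("
    " mfn INTEGER, tag INTEGER, occ INTEGER, value TEXT,"
    " PRIMARY KEY (mfn, tag, occ))"
)
# A second, similar table for the index: one row per derived term.
con.execute("CREATE TABLE idx (term TEXT, mfn INTEGER, tag INTEGER, occ INTEGER)")

rows = [
    (1, 24, 1, "Introduction to cataloguing"),
    (1, 70, 1, "Smith, J."),
    (1, 70, 2, "Jones, A."),
]
con.executemany("INSERT INTO field VALUES (?, ?, ?, ?)", rows)

# Compute the index table from the field table (here: upper-cased words).
for mfn, tag, occ, value in rows:
    for word in value.upper().split():
        con.execute("INSERT INTO idx VALUES (?, ?, ?, ?)", (word, mfn, tag, occ))

# Reassembling one record takes several rows.
record = con.execute(
    "SELECT tag, occ, value FROM field WHERE mfn = ? ORDER BY tag, occ", (1,)
).fetchall()
hits = [mfn for (mfn,) in con.execute(
    "SELECT DISTINCT mfn FROM idx WHERE term = ?", ("CATALOGUING",)
)]
print(record)
print(hits)
```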


However, there is the word "efficiently",
which in practice turns out to put some restrictions on the
feature load, especially when combined with:

- ISIS must be widely usable even in the face of *very* low budgets.
  Therefore, not only must the software itself be available for at most
  a nominal fee, but it also must not require very new, very powerful
  or otherwise expensive hardware and systems.
  Even very large catalogs should get by with moderate system costs.

The OCLC/Pica system, for example, requires one to spend
hundreds of thousands of dollars for powerful Sun machines.


* end of story ?

Still, it would be very nice if more areas of application could
be explored for ISIS, both so that librarians can use their
favourite DB (i.e. ISIS) for a broader range of
tasks and to expand the user community, possibly leading
to more support for everybody.

One important question is whether ISIS needs some fundamental changes
deep in its guts, or whether it already has everything that's needed
to build a broad range of sophisticated solutions on top of it.
As you might expect, we are pretty well convinced of the latter.


* file formats

Just as it doesn't harm a database much to be exported to and
imported from ISO2709, there is not much of a problem with different
file formats, as long as conversion tools exist.
As you know, CISIS/Unix DBs are incompatible with WinIsis/DOS DBs,
but may be converted via ISO files.
As long as the basic data structures are the same,
lossless conversion is just a matter of tools.
It's even less of a problem if the software itself can read
several file formats (like openisis does).
You wouldn't care much whether your word processor is reading
a .doc or .rtf file, would you?
We did an interesting and very successful study implementing
an ISIS-like DB in pure Java using a plaintext masterfile
very similar to the Mbox mailfolder format
(we hope to be able to release the code soon).
Likewise there is no reason why one should not be able to read
directly from an ISO2709 file.
Besides convertible masterfile formats, one might well use other
formats for xref and index, which can always be reconstructed as needed.
There are several reasons to do so, such as improved performance
or robustness.
So I don't think ISIS is defined in terms of detailed file formats,
but rather in terms of the basic data structures.

One problem that might come to mind when talking about file formats
is that of limits. While the maximum number of records per DB as well
as the maximum total file sizes are bypassed relatively easily
by logically joining several databases, the maximum record size of
about 32K is a limit which might be unacceptable for some applications
(although it can partly be worked around by deploying external files,
as OCLC/Pica does to circumvent Sybase's varchar limits).
Raising this limit would clearly restrict lossless conversion to one way,
from small to large DB. Where a large DB model is needed,
all parties developing ISIS software should agree on one format
to allow for as-painless-as-possible interoperability.
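
Logically joining several databases can be pictured as a mapping from
one global record number to a (database, local MFN) pair. The sketch
below is an assumption about how such a join might be organised, not a
description of any existing ISIS implementation; the class and method
names are hypothetical.

```python
from bisect import bisect_right


class JoinedDb:
    """Presents several databases as one continuous MFN range."""

    def __init__(self, sizes):
        # offsets[i] = number of records in all databases before database i
        self.offsets = [0]
        for n in sizes:
            self.offsets.append(self.offsets[-1] + n)

    def locate(self, mfn):
        """Map a 1-based global MFN to (database index, 1-based local MFN)."""
        if not 1 <= mfn <= self.offsets[-1]:
            raise KeyError(mfn)
        db = bisect_right(self.offsets, mfn - 1) - 1
        return db, mfn - self.offsets[db]


joined = JoinedDb([3, 2])   # two member databases with 3 and 2 records
print(joined.locate(3))     # (0, 3): last record of the first database
print(joined.locate(4))     # (1, 1): first record of the second database
```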


* so what kind of database is ISIS ?

Classical database theory basically distinguishes ISAM,
network, hierarchical and relational database systems.
ISIS is strongly related to ISAM DBs; however, its flexible
indexing is rarely paralleled by any of these systems,
and its non-flat data model is targeted by hierarchical DBs
only (in greater generality and with much higher costs).

- Although direct joins by MFN shouldn't be too costly,
  ISIS is not the database of choice when several records
  typically need to be combined in queries or transactions.
  However, in many application cases, only one ISIS record is
  needed as opposed to several relational table rows.
  In such situations, ISIS is even an excellent and efficient
  transaction (OLTP) database (since safe writing of an ISIS
  record is much simpler than other DBs' undo/redo logs).
- ISIS is not the database of choice when records are updated
  by the hour. However, where only about 10% of records are
  changed between two (monthly, weekly or daily) runs of backup
  and compaction, the space overhead is not a big problem.
  Where old versions of data need to be retained anyway
  (as often needed and supported, for example, by postgres history),
  you would hardly find a more efficient solution.
- ISIS is not the database of choice when it comes to high volume online
  analytical processing (querying statistics on several dimensions, OLAP).
  However, after reading some database books and Oracle manuals,
  one learns that OLAP requires a well designed ("star schema")
  database separate from the transactional one, anyway.
- ISIS does not, in itself, provide any concurrency control
  (actual implementations do, to some extent). This doesn't
  hurt when running a read-only multi-user catalogue,
  a stand-alone application or in some insert-only situations.
  For distributed multi-client update, there are mechanisms based
  on timestamps or stored procedures that need to be supported
  by some ISIS server to come.
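
One timestamp-style mechanism for multi-client update is an optimistic
check: a client may only write back a record if the version it read is
still the current one. The following is a hypothetical in-memory sketch
(a version counter stands in for the timestamp, and all names are
invented), not an existing ISIS server API.

```python
class StaleRecord(Exception):
    """Raised when a record changed since the client read it."""


class RecordStore:
    def __init__(self):
        self._rows = {}  # mfn -> (version, record)

    def read(self, mfn):
        """Return (version, record) for the given MFN."""
        return self._rows[mfn]

    def write(self, mfn, record, seen_version=0):
        """Store the record only if nobody else wrote it in the meantime."""
        current_version, _ = self._rows.get(mfn, (0, None))
        if current_version != seen_version:
            raise StaleRecord(mfn)
        self._rows[mfn] = (current_version + 1, record)
        return current_version + 1


store = RecordStore()
store.write(1, "first draft")                 # new record, becomes version 1
version, _ = store.read(1)
store.write(1, "client A edit", version)      # succeeds, becomes version 2
try:
    store.write(1, "client B edit", version)  # B still holds version 1
except StaleRecord:
    print("client B must re-read record 1")
```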


While these data models are strongly tied to the logical
nature and physical organisation of the data,
newer notions like that of an 'object oriented' or 'XML'
database rather describe a way to use and access a database.
Actually, OO or XML DBs are usually based on one of
the above-mentioned systems (mostly relational ones).
For the most part, using a DB as OO or XML storage requires nothing
but some libraries and optionally precompilers for C++ or Java
-- these can be built on top of existing ISIS without changing it,
and ISIS will be an excellent choice for many applications.
Some aspects of increased functionality and performance will
require some sort of "stored procedures" running inside the database.
In the case of an XML DB they are used, for example, to decompose structures;
in the OO case they might need some sort of "magic switch" (method
overriding) to perform differently for some records than for others.
We believe that all this magic can be achieved based on ISIS.
The concepts of an ISIS database server and a scripting language as an
alternative to formatting exits are to be discussed elsewhere ...

First we want to shed some more light on the great flexibility the
ISIS database system has by its very nature.


* ISIS is a mail database

Looking at http://www.faqs.org/rfcs/rfc822.html (or its updates)
one will find many similarities between ISO2709 records and internet mails,
which are, after all, essentially a series of header names and values.
After assigning numbers to the 100 or 200 most commonly used headers
and choosing some sort of subfield encoding (e.g. "^nname^vvalue",
"name<TAB>value" or simply "name: value") to store other header lines
with a special field number, mails are easily and very efficiently
stored in an ISIS database. Given the enormous number of communication,
groupware and workflow systems that are nowadays built upon standard plain
internet mails (typically using a set of special mail headers),
this is a very large area to be served by ISIS databases.
The above-mentioned Mbox-style implementation of ISIS tends towards
that direction, building upon the javax.mail standard.
IMAP mail servers could greatly benefit from the powerful indexing
and retrieval system of ISIS databases.
If the mail-sending application also allows selecting special headers
from an entry form prepared by a skilled librarian with thesauri
and systematics, an institution or company could really come to a new
way of using mail as a system of qualified, living information.
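
The header mapping described above might look roughly like this.
It is a sketch using Python's standard email package; the tag numbers,
the catch-all field 99 and the sample message are assumptions chosen
for illustration only.

```python
from email import message_from_string

# Hypothetical tag assignments; a real table would number the
# 100 or 200 most commonly used headers.
HEADER_TAGS = {"from": 1, "to": 2, "subject": 3, "date": 4}
OTHER_HEADER_TAG = 99  # special field number for all other header lines


def mail_to_record(raw):
    """Turn the headers of an RFC 822 message into (tag, value) fields."""
    msg = message_from_string(raw)
    record = []
    for name, value in msg.items():
        tag = HEADER_TAGS.get(name.lower())
        if tag is not None:
            record.append((tag, value))
        else:
            # unnumbered header: keep the name in a "^nname^vvalue" subfield pair
            record.append((OTHER_HEADER_TAG, f"^n{name}^v{value}"))
    return record


raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: New acquisitions\n"
    "X-Thesaurus-Term: cataloguing\n"
    "\n"
    "See the attached list.\n"
)
for tag, value in mail_to_record(raw):
    print(tag, value)
```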


* ISIS is a multimedia database

After all, a mail has not only headers, but also a body.
A plaintext body of reasonable length (some KB, as sent by nice people)
fits without problem in a field whose number means "body".
A multipart body is easily decomposed into a series of body fields.
Whether larger or non-plaintext bodies are stored within or outside
the masterfile is a matter of the actual implementation and doesn't
need to be discussed here; both approaches have their pros and cons.
Anyway, the MIME standard, up and running since 1992,
allows for storage and transmission of anything that uses bytes,
and is easily integrated with ISIS databases
(we partly did it, code to be released).
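
Decomposing a multipart body into a series of body fields can be
sketched with Python's standard email package. The field number 100
for "body" and the sample message are assumptions; whether the bytes
end up inside or outside the masterfile is left open here as well.

```python
from email.message import EmailMessage

BODY_TAG = 100  # hypothetical field number meaning "body"


def body_fields(msg):
    """Return one (tag, content type, bytes) field per non-container part."""
    fields = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True)  # undo base64 etc.
        fields.append((BODY_TAG, part.get_content_type(), payload))
    return fields


msg = EmailMessage()
msg["Subject"] = "Scan"
msg.set_content("Cover page attached.")
msg.add_attachment(
    b"\x89PNG (not a real image)",
    maintype="image", subtype="png", filename="cover.png",
)

fields = body_fields(msg)
for tag, content_type, payload in fields:
    print(tag, content_type, len(payload))
```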


* ISIS is an XML database

Likewise XML, which basically is text, can be stored in an ISIS database
(subject to the implementation's maximum record length).
Add some formatting exits to address the XML node content via
a DOM-style a.b.c notation as used in javascript, use them in your FST
and you will for sure have one of the world's best indexed and fastest
XML databases -- most others are using a relational DB as basis.
So indexing, retrieving and displaying XML data is more or less
simply a matter of some formatting functions.
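
What such a formatting function could do is sketched below in Python
with the standard xml.etree module. The function name and the exact
meaning given to the dotted path are assumptions; an actual formatting
exit would be written against the ISIS formatting language instead.

```python
import xml.etree.ElementTree as ET


def xml_values(xml_text, dotted_path):
    """Return the text of all nodes matching a path like 'book.author.name'.

    The first component must name the root element; the remaining
    components are followed through child elements.
    """
    root = ET.fromstring(xml_text)
    head, _, rest = dotted_path.partition(".")
    if root.tag != head:
        return []
    nodes = root.findall(rest.replace(".", "/")) if rest else [root]
    return [(node.text or "").strip() for node in nodes]


doc = (
    "<book>"
    "<title>Cataloguing rules</title>"
    "<author><name>Smith</name></author>"
    "<author><name>Jones</name></author>"
    "</book>"
)
print(xml_values(doc, "book.title"))        # ['Cataloguing rules']
print(xml_values(doc, "book.author.name"))  # ['Smith', 'Jones']
```

The returned values are what an index entry would be built from,
so repeated nodes simply yield several index terms.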

However, when thinking about data entry forms, for example,
the dark side of the force shows up:
even with a very sophisticated database system with the ability to make
sense out of XML DTDs, things are potentially much more complicated.
XML was meant to provide arbitrary complexity in the first place.
And when it comes to DTDs like that of XHTML, which will carry just about
the same content as any HTML page, one easily understands that reasonable
automatic processing becomes nearly impossible -- that's the reason why
HTML pages are largely beefed up with headers (Dublin Core and others).
If you really desperately need it, it's good to have it,
but otherwise using it may be asking for trouble.

When having to work with XML structures for one reason or another,
typically because they should be imported or exported,
one should think of a mapping between XML and ISIS structures.
In many situations XML structures are shallow and can be ISIfied by
simply mapping the first level of sister nodes to ISIS fields and the
second level to subfields (may require repeated subfield support).
In other situations a closer look at the data structure may reveal
that it is not well designed with regard to Ockham's razor but contains
totally unnecessary depth which may be collapsed to the first case.
Actually, during several years of work with XML structures as
suggested by several "standards", I rarely found a reasonable
structure which cannot be mapped to a field-subfield schema.
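
The shallow mapping described above (first level to fields, second
level to subfields) could be sketched like this. The element names,
tag numbers and subfield letters are made up for the example and would
come from the actual schema in practice.

```python
import xml.etree.ElementTree as ET


def xml_to_record(xml_text, tags, subfields):
    """Map first-level nodes to fields and second-level nodes to subfields."""
    root = ET.fromstring(xml_text)
    record = []
    for child in root:
        tag = tags[child.tag]
        if len(child) == 0:
            # leaf node: the text becomes the plain field value
            record.append((tag, (child.text or "").strip()))
        else:
            # nested node: each grandchild becomes a "^x..." subfield
            value = "".join(
                f"^{subfields[g.tag]}{(g.text or '').strip()}" for g in child
            )
            record.append((tag, value))
    return record


doc = (
    "<book>"
    "<title>Rules</title>"
    "<author><name>Smith</name><role>ed</role></author>"
    "</book>"
)
record = xml_to_record(
    doc,
    tags={"title": 24, "author": 70},
    subfields={"name": "a", "role": "r"},
)
print(record)  # [(24, 'Rules'), (70, '^aSmith^red')]
```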


But even if you really need XML structures "as is",
they can be stored very efficiently in ISIS (see "Struct"),
with all the benefits of the flexible index
(cf. "unirec", the universal ISIS record).
Anyway, Dublin Core metadata or other RDF (resource description framework)
headers are conveniently stored in ISIS just like mail headers.
Maybe, as this schema was created to suit the needs of the very
old science of bibliographic knowledge management, much of that
experience was built into it.

On the other hand, XML's ancestor SGML was conceived for a document's body,
not the head, and I guess that is still its place, in spite of
the programming industry's hype. The use of XML for structuring documents
that are meant to be read by humans rather than machines is of course
perfectly reasonable. Transparent access to file-based data associated
with a record and an XML add-on to the formatting language could aid
in converting extracts of document contents to metadata accessible
in the ISIS database and/or its index.


To wrap it up, I'd suggest looking at XML as an optional add-on to ISIS
rather than an integral part. ISIS already has all the functionality
needed to support any reasonable use of XML. ISIS data can much more
efficiently contain XML structures than the other way round.


* ISIS is a database for document/content management systems

It follows that ISIS may very well support the needs of
systems for XML documents or website content in XML or HTML.
With increasing experience with such systems, people tend to
understand that content metadata should be organized according
to bibliographic principles. (Not that surprising, is it?)
In cooperation with oc4science.org there are projects at German
universities to integrate publishing, document management and website CMS,
based on an (Open)ISIS DB and directed by the librarian.


-----------
$Id: whatabout.txt,v 1.8 2003/02/14 17:30:33 kripke Exp $