24 |
<h1>Programming Guide</h1> |
<h1>Programming Guide</h1> |
25 |
|
|
26 |
<div class="note">Copyright (C) 2004-2005 Mikio Hirabayashi</div> |
<div class="note">Copyright (C) 2004-2005 Mikio Hirabayashi</div> |
27 |
<div class="note">Last Update: Tue, 07 Jun 2005 06:17:00 +0900</div> |
<div class="note">Last Update: Mon, 01 Aug 2005 00:50:38 +0900</div> |
28 |
<div class="navi">[<a href="pguide-ja.html" hreflang="ja">Japanese</a>] [<a href="index.html">HOME</a>]</div> |
<div class="navi">[<a href="pguide-ja.html" hreflang="ja">Japanese</a>] [<a href="index.html">HOME</a>]</div> |
29 |
|
|
30 |
<hr /> |
<hr /> |
49 |
|
|
50 |
<p>This document describes how to use the API of Hyper Estraier. If you have never read <a href="uguide-en.html">the user's guide</a> yet, please do it beforehand.</p> |
<p>This document describes how to use the API of Hyper Estraier. If you have never read <a href="uguide-en.html">the user's guide</a> yet, please do it beforehand.</p> |
51 |
|
|
52 |
<p>The API enables to realize many requirements which is impossible with `estcmd' and `estsearch.cgi' only. Whlie `estcmd' can handle documents as files, it is possible to make an application to handle records in a relational database as a document by using the library. While `estseek.cgi' is accessed with a web browser, it is possible to make an application with a GUI based on the native OS.</p> |
<p>The API makes it possible to meet many requirements that cannot be met with `estcmd' and `estseek.cgi' alone. While `estcmd' handles documents as files, an application using the library can treat records in a relational database as documents. While `estseek.cgi' is accessed with a web browser, an application can provide a GUI based on the native OS.</p> |
53 |
|
|
54 |
<p>The core API of Hyper Estraier provides functions to manage the inverted index only. That is, the processes of retrieving documents and calculating them are assigned to the application. Likewise, the processes of displaying the search result are assigned to the application. Consequently, Hyper Estraier does not depend on any document repository, file format, or user interface. They can be selected by the author of the application.</p> |
<p>The core API of Hyper Estraier provides functions to manage the inverted index only. That is, the processes of retrieving documents and calculating them are assigned to the application. Likewise, the processes of displaying the search result are assigned to the application. Consequently, Hyper Estraier does not depend on any document repository, file format, or user interface. They can be selected by the author of the application.</p> |
55 |
|
|
57 |
|
|
58 |
<p>One of the characteristics of Hyper Estraier is high scalability. So, the author of the application does not need to consider scalability as long as the application uses the API of Hyper Estraier.</p> |
<p>One of the characteristics of Hyper Estraier is high scalability. So, the author of the application does not need to consider scalability as long as the application uses the API of Hyper Estraier.</p> |
59 |
|
|
60 |
<p>As this document descibes the core API, Hyper Estraier provides the node API based on P2P architecture. Refer to <a href="nguide-en.html">the P2P Guide</a> for the node API.</p> |
<p>While this document describes the core API, Hyper Estraier also provides the node API based on P2P architecture. Refer to <a href="nguide-en.html">the P2P Guide</a> for the node API.</p> |
61 |
|
|
62 |
<hr /> |
<hr /> |
63 |
|
|
64 |
<h2 id="architecture">Architecture</h2> |
<h2 id="architecture">Architecture</h2> |
65 |
|
|
66 |
<p>This section describes the arhcitecture of the core API of Hyper Estraier.</p> |
<p>This section describes the architecture of the core API of Hyper Estraier.</p> |
67 |
|
|
68 |
<h3>Gatherer and Filter</h3> |
<h3>Gatherer and Filter</h3> |
69 |
|
|
70 |
<p>The term `gatherer' means functions to register documents to the index. A gatherer is to be implemented in an application. For example, `estcmd' has functions to collect documents by scanning the file system. There are the following procedures.</p> |
<p>The term `gatherer' means functions to register documents to the index. A gatherer is to be implemented in an application. For example, `estcmd' has functions to collect documents by scanning the file system. There are the following procedures.</p> |
71 |
|
|
72 |
<ul> |
<ul> |
73 |
<li>To specifiy the name of the index and the entry point of scanning, by parsing the command line arguments.</li> |
<li>To specify the name of the index and the entry point of scanning, by parsing the command line arguments.</li> |
74 |
<li>To open the index.</li> |
<li>To open the index.</li> |
75 |
<li>To scan the file system and specify the paths of the target files.</li> |
<li>To scan the file system and specify the paths of the target files.</li> |
76 |
<li>For each file of the list above --<ul> |
<li>For each file of the list above --<ul> |
253 |
|
|
254 |
<dl> |
<dl> |
255 |
<dt><kbd>int est_doc_id(ESTDOC *<var>doc</var>);</kbd></dt> |
<dt><kbd>int est_doc_id(ESTDOC *<var>doc</var>);</kbd></dt> |
256 |
<dd>`doc' specifies a document object. The return value is the ID number of the document object. If the object has never been registered, -1 is returned.</dd> |
<dd>`doc' specifies a document object. The return value is the ID number of the document object. If the object has not been registered, -1 is returned.</dd> |
257 |
</dl> |
</dl> |
258 |
|
|
259 |
<p>The function `est_doc_attr_names' is used in order to get a list of attribute names of a document object.</p> |
<p>The function `est_doc_attr_names' is used in order to get a list of attribute names of a document object.</p> |
281 |
|
|
282 |
<dl> |
<dl> |
283 |
<dt><kbd>char *est_doc_cat_texts(ESTDOC *<var>doc</var>);</kbd></dt> |
<dt><kbd>char *est_doc_cat_texts(ESTDOC *<var>doc</var>);</kbd></dt> |
284 |
<dd>`doc' specifies a document object. The return value is concatenated sentences of a document object. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
<dd>`doc' specifies a document object. The return value is concatenated sentences of the document object. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
285 |
</dl> |
</dl> |
286 |
|
|
287 |
<p>The function `est_doc_dump_draft' is used in order to dump draft data of a document object.</p> |
<p>The function `est_doc_dump_draft' is used in order to dump draft data of a document object.</p> |
288 |
|
|
289 |
<dl> |
<dl> |
290 |
<dt><kbd>char *est_doc_dump_draft(ESTDOC *<var>doc</var>);</kbd></dt> |
<dt><kbd>char *est_doc_dump_draft(ESTDOC *<var>doc</var>);</kbd></dt> |
291 |
<dd>`doc' specifies a document object. The return value is draft data of a document object. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
<dd>`doc' specifies a document object. The return value is draft data of the document object. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
292 |
</dl> |
</dl> |
293 |
|
|
294 |
<p>The function `est_doc_make_snippet' is used in order to make a snippet of the body text of a document object.</p> |
<p>The function `est_doc_make_snippet' is used in order to make a snippet of the body text of a document object.</p> |
295 |
|
|
296 |
<dl> |
<dl> |
297 |
<dt><kbd>char *est_doc_make_snippet(ESTDOC *<var>doc</var>, const CBLIST *<var>words</var>, int <var>wwidth</var>, int <var>hwidth</var>, int <var>awidth</var>);</kbd></dt> |
<dt><kbd>char *est_doc_make_snippet(ESTDOC *<var>doc</var>, const CBLIST *<var>words</var>, int <var>wwidth</var>, int <var>hwidth</var>, int <var>awidth</var>);</kbd></dt> |
298 |
<dd>`doc' specifies a document object. `word' specifies a list object of words to be highlight. `wwitdh' specifies whole width of the result. `hwitdh' specifies width of strings picked up from the beginning of the text. `awitdh' specifies width of strings picked up around each highlighted word. The return value is a snippet string of the body text of a document object. There are tab separated values. Each line is a string to be shown. Though most lines have only one field, some lines have two fields. If the second field exists, the first field is to be shown with highlighted, and the second field means its normalized form. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
<dd>`doc' specifies a document object. `words' specifies a list object of words to be highlighted. `wwidth' specifies the whole width of the result. `hwidth' specifies the width of strings picked up from the beginning of the text. `awidth' specifies the width of strings picked up around each highlighted word. The return value is a snippet string of the body text of the document object. It consists of tab separated values. Each line is a string to be shown. Though most lines have only one field, some lines have two fields. If the second field exists, the first field is to be shown highlighted, and the second field is its normalized form. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd> |
299 |
</dl> |
</dl> |
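<p>The snippet format described for `est_doc_make_snippet' can be consumed roughly as in the following sketch. It does not call the library: `render_snippet' is a hypothetical helper, the bracket marking is an arbitrary choice for illustration, and skipping empty lines is an assumption of this sketch rather than documented behavior.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Hyper Estraier API.
   Takes a snippet string in the tab separated format described above and
   returns a newly allocated string in which every line that has a second
   field is shown as [first field]. Lines with one field are copied as is.
   The caller releases the result with `free'. */
char *render_snippet(const char *snippet){
  size_t len;
  char *out, *wp;
  const char *line;
  assert(snippet != NULL);
  len = strlen(snippet);
  /* each input line grows by at most one byte, so 2 * len is enough */
  out = malloc(len * 2 + 3);
  if(!out) return NULL;
  wp = out;
  line = snippet;
  while(*line != '\0'){
    const char *eol = strchr(line, '\n');
    size_t llen = eol ? (size_t)(eol - line) : strlen(line);
    const char *tab = memchr(line, '\t', llen);
    if(tab){
      /* two fields: the first one is the text to be shown highlighted */
      size_t flen = (size_t)(tab - line);
      *wp++ = '[';
      memcpy(wp, line, flen);
      wp += flen;
      *wp++ = ']';
    } else {
      /* one field: plain text (an empty line contributes nothing) */
      memcpy(wp, line, llen);
      wp += llen;
    }
    line += llen;
    if(*line == '\n') line++;
  }
  *wp = '\0';
  return out;
}
```

<p>A real application would pass the string returned by `est_doc_make_snippet' to such a helper and then release both strings with `free'. A web application would typically emit markup around the first field instead of brackets.</p>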
300 |
|
|
301 |
<p>The function `est_doc_scan_words' is used in order to check whether the text of a document object includes all of the specified words.</p> |
<p>The function `est_doc_scan_words' is used in order to check whether the text of a document object includes all of the specified words.</p> |
384 |
|
|
385 |
<dl> |
<dl> |
386 |
<dt><kbd>void est_cond_set_options(ESTCOND *<var>cond</var>, int <var>options</var>);</kbd></dt> |
<dt><kbd>void est_cond_set_options(ESTCOND *<var>cond</var>, int <var>options</var>);</kbd></dt> |
387 |
<dd>`cond' specifies a condition object. `options' specifies options: `ESTCONDSURE' specifies that it checks every N-gram key, `ESTCONDUSU', which is the default, specifies that it checks N-gram keys with skipping one key, `ESTCONDFAST' skips two keys, `ESTCONDAGIT' skips three keys, `ESTCONDNOIDF' specifies not to perform TF-IDF tuning, `ESTCONDSIMPLE' specifies to use simplefied phrase. Each option can be specified at the same time by bitwise or. If keys are skipped, though search speed is improved, the relevance ratio grows less.</dd> |
<dd>`cond' specifies a condition object. `options' specifies options: `ESTCONDSURE' specifies that it checks every N-gram key, `ESTCONDUSU', which is the default, specifies that it checks N-gram keys with skipping one key, `ESTCONDFAST' skips two keys, `ESTCONDAGIT' skips three keys, `ESTCONDNOIDF' specifies not to perform TF-IDF tuning, `ESTCONDSIMPLE' specifies to use simplified phrase. Each option can be specified at the same time by bitwise or. If keys are skipped, though search speed is improved, the relevance ratio grows less.</dd> |
388 |
</dl> |
</dl> |
389 |
|
|
390 |
<hr /> |
<hr /> |
448 |
|
|
449 |
<dl> |
<dl> |
450 |
<dt><kbd>ESTDB *est_db_open(const char *<var>name</var>, int <var>omode</var>, int *<var>ecp</var>);</kbd></dt> |
<dt><kbd>ESTDB *est_db_open(const char *<var>name</var>, int <var>omode</var>, int *<var>ecp</var>);</kbd></dt> |
451 |
<dd>`name' specifies the name of a database directory. `mode' specifies open modes: `ESTDBWRITER' as a writer, `ESTDBREADER' as a reader. If the mode is `ESTDBWRITER', the following may be added by bitwise or: `ESTDBCREAT', which means it creates a new database if not exist, `ESTDBTRUNC', which means it creates a new database regardless if one exists. Both of `ESTDBREADER' and `ESTDBWRITER' can be added to by bitwise or: `ESTDBNOLCK', which means it opens a database file without file locking, or `ESTDBLCKNB', which means locking is performed without blocking. If `ESTDBNOLCK' is used, the application is responsible for exclusion control. `ESTDBCREAT' can be added to by bitwise or: `ESTDBPERFNG', which means N-gram analysis is performed against Europian text also. `ecp' specifies the pointer to a variable to which the error code is assigned. The return value is a database object of the database or `NULL' if failure.</dd> |
<dd>`name' specifies the name of a database directory. `omode' specifies open modes: `ESTDBWRITER' as a writer, `ESTDBREADER' as a reader. If the mode is `ESTDBWRITER', the following may be added by bitwise or: `ESTDBCREAT', which means it creates a new database if one does not exist, `ESTDBTRUNC', which means it creates a new database regardless of whether one exists. The following may be added to both `ESTDBREADER' and `ESTDBWRITER' by bitwise or: `ESTDBNOLCK', which means it opens a database file without file locking, or `ESTDBLCKNB', which means locking is performed without blocking. If `ESTDBNOLCK' is used, the application is responsible for exclusion control. `ESTDBPERFNG' may be added to `ESTDBCREAT' by bitwise or, which means N-gram analysis is performed against European text also. `ecp' specifies the pointer to a variable to which the error code is assigned. The return value is a database object of the database or `NULL' on failure.</dd> |
452 |
</dl> |
</dl> |
453 |
|
|
454 |
<p>The function `est_db_close' is used in order to close a database.</p> |
<p>The function `est_db_close' is used in order to close a database.</p> |
458 |
<dd>`db' specifies a database object. `ecp' specifies the pointer to a variable to which the error code is assigned. The return value is true if success, else it is false.</dd> |
<dd>`db' specifies a database object. `ecp' specifies the pointer to a variable to which the error code is assigned. The return value is true if success, else it is false.</dd> |
459 |
</dl> |
</dl> |
460 |
|
|
461 |
<p>The function `est_db_error' is used in order to get the last happended error code of a database.</p> |
<p>The function `est_db_error' is used in order to get the last happened error code of a database.</p> |
462 |
|
|
463 |
<dl> |
<dl> |
464 |
<dt><kbd>int est_db_error(ESTDB *<var>db</var>);</kbd></dt> |
<dt><kbd>int est_db_error(ESTDB *<var>db</var>);</kbd></dt> |
465 |
<dd>`db' specifies a database object. The return value is the last happended error code of the database.</dd> |
<dd>`db' specifies a database object. The return value is the last happened error code of the database.</dd> |
466 |
</dl> |
</dl> |
467 |
|
|
468 |
<p>The function `est_db_fatal' is used in order to check whether a database has a fatal error.</p> |
<p>The function `est_db_fatal' is used in order to check whether a database has a fatal error.</p> |
469 |
|
|
470 |
<dl> |
<dl> |
471 |
<dt><kbd>int est_db_fatal(ESTDB *<var>db</var>);</kbd></dt> |
<dt><kbd>int est_db_fatal(ESTDB *<var>db</var>);</kbd></dt> |
472 |
<dd>`db' specifies a database object. The return value is true if the database has fatal erroor, else it is false.</dd> |
<dd>`db' specifies a database object. The return value is true if the database has fatal error, else it is false.</dd> |
473 |
</dl> |
</dl> |
474 |
|
|
475 |
<p>The function `est_db_flush' is used in order to flush index words in the cache of a database.</p> |
<p>The function `est_db_flush' is used in order to flush index words in the cache of a database.</p> |
490 |
|
|
491 |
<dl> |
<dl> |
492 |
<dt><kbd>int est_db_optimize(ESTDB *<var>db</var>, int <var>options</var>);</kbd></dt> |
<dt><kbd>int est_db_optimize(ESTDB *<var>db</var>, int <var>options</var>);</kbd></dt> |
493 |
<dd>`db' specifies a database object connected as a writer. `options' specifies options: `ESTOPTNOPURGE' to omit purging dispensable region of deleted documents, `ESTOPTNODBOPT' to omit optimizization of the database files. The two can be specified at the same time by bitwise or. The return value is true if success, else it is false.</dd> |
<dd>`db' specifies a database object connected as a writer. `options' specifies options: `ESTOPTNOPURGE' to omit purging dispensable region of deleted documents, `ESTOPTNODBOPT' to omit optimization of the database files. The two can be specified at the same time by bitwise or. The return value is true if success, else it is false.</dd> |
494 |
</dl> |
</dl> |
495 |
|
|
496 |
<p>The function `est_db_put_doc' is used in order to add a document to a database.</p> |
<p>The function `est_db_put_doc' is used in order to add a document to a database.</p> |
694 |
printf("%s\n", value); |
printf("%s\n", value); |
695 |
} |
} |
696 |
|
|
697 |
/* destloy the document object */ |
/* destroy the document object */ |
698 |
est_doc_delete(doc); |
est_doc_delete(doc); |
699 |
|
|
700 |
} |
} |