
Diff of /upstream/0.5.3/doc/pguide-en.html


upstream/0.5.2/doc/pguide-en.html revision 9 by dpavlin, Wed Aug 3 15:21:15 2005 UTC upstream/0.5.3/doc/pguide-en.html revision 10 by dpavlin, Wed Aug 3 15:25:48 2005 UTC
# Line 24  Line 24 
24  <h1>Programming Guide</h1>  <h1>Programming Guide</h1>
25    
26  <div class="note">Copyright (C) 2004-2005 Mikio Hirabayashi</div>  <div class="note">Copyright (C) 2004-2005 Mikio Hirabayashi</div>
27  <div class="note">Last Update: Tue, 07 Jun 2005 06:17:00 +0900</div>  <div class="note">Last Update: Mon, 01 Aug 2005 00:50:38 +0900</div>
28  <div class="navi">[<a href="pguide-ja.html" hreflang="ja">Japanese</a>] [<a href="index.html">HOME</a>]</div>  <div class="navi">[<a href="pguide-ja.html" hreflang="ja">Japanese</a>] [<a href="index.html">HOME</a>]</div>
29    
30  <hr />  <hr />
# Line 49  Line 49 
49    
50  <p>This document describes how to use the API of Hyper Estraier.  If you have never read <a href="uguide-en.html">the user's guide</a> yet, please do it beforehand.</p>  <p>This document describes how to use the API of Hyper Estraier.  If you have never read <a href="uguide-en.html">the user's guide</a> yet, please do it beforehand.</p>
51    
52  <p>The API enables to realize many requirements which is impossible with `estcmd' and `estsearch.cgi' only.  Whlie `estcmd' can handle documents as files, it is possible to make an application to handle records in a relational database as a document by using the library.  While `estseek.cgi' is accessed with a web browser, it is possible to make an application with a GUI based on the native OS.</p>  <p>The API enables to realize many requirements which is impossible with `estcmd' and `estsearch.cgi' only.  While `estcmd' can handle documents as files, it is possible to make an application to handle records in a relational database as a document by using the library.  While `estseek.cgi' is accessed with a web browser, it is possible to make an application with a GUI based on the native OS.</p>
53    
54  <p>The core API of Hyper Estraier provides some functions to manage the inverted index only.  That is, processes of retrieving documents and calculating them are assigned to an application.  Also, processes to display the search result is assigned to the application.  Consequently, Hyper Estraier does not depend on any document repository, any file format, nor any user interface.  They can be selected by the author of the application.</p>  <p>The core API of Hyper Estraier provides some functions to manage the inverted index only.  That is, processes of retrieving documents and calculating them are assigned to an application.  Also, processes to display the search result is assigned to the application.  Consequently, Hyper Estraier does not depend on any document repository, any file format, nor any user interface.  They can be selected by the author of the application.</p>
55    
# Line 57  Line 57 
57    
58  <p>One of characteristics of Hyper Estraier is high scalability.  So, the author of the application does not need to consider the scalability as long as using the API of Hyper Estraier.</p>  <p>One of characteristics of Hyper Estraier is high scalability.  So, the author of the application does not need to consider the scalability as long as using the API of Hyper Estraier.</p>
59    
60  <p>As this document descibes the core API, Hyper Estraier provides the node API based on P2P architecture.  Refer to <a href="nguide-en.html">the P2P Guide</a> for the node API.</p>  <p>As this document describes the core API, Hyper Estraier provides the node API based on P2P architecture.  Refer to <a href="nguide-en.html">the P2P Guide</a> for the node API.</p>
61    
62  <hr />  <hr />
63    
64  <h2 id="architecture">Architecture</h2>  <h2 id="architecture">Architecture</h2>
65    
66  <p>This section describes the arhcitecture of the core API of Hyper Estraier.</p>  <p>This section describes the architecture of the core API of Hyper Estraier.</p>
67    
68  <h3>Gatherer and Filter</h3>  <h3>Gatherer and Filter</h3>
69    
70  <p>The term `gatherer' means functions to register documents to the index.  A gatherer is to be implemented in an application.  For example, `estcmd' has functions to collect documents by scanning the file system.  There are the following procedures.</p>  <p>The term `gatherer' means functions to register documents to the index.  A gatherer is to be implemented in an application.  For example, `estcmd' has functions to collect documents by scanning the file system.  There are the following procedures.</p>
71    
72  <ul>  <ul>
73  <li>To specifiy the name of the index and the entry point of scanning, by parsing the command line arguments.</li>  <li>To specify the name of the index and the entry point of scanning, by parsing the command line arguments.</li>
74  <li>To open the index.</li>  <li>To open the index.</li>
75  <li>To scan the file system and specify the paths of the target files.</li>  <li>To scan the file system and specify the paths of the target files.</li>
76  <li>For each file of the list above --<ul>  <li>For each file of the list above --<ul>
# Line 253  est_doc_delete(doc); Line 253  est_doc_delete(doc);
253    
254  <dl>  <dl>
255  <dt><kbd>int est_doc_id(ESTDOC *<var>doc</var>);</kbd></dt>  <dt><kbd>int est_doc_id(ESTDOC *<var>doc</var>);</kbd></dt>
256  <dd>`doc' specifies a document object.  The return value is the ID number of the document object.  If the object has never been registered, -1 is returned.</dd>  <dd>`doc' specifies a document object.  The return value is the ID number of the document object.  If the object has not been registered, -1 is returned.</dd>
257  </dl>  </dl>
258    
259  <p>The function `est_doc_attr_names' is used in order to get a list of attribute names of a document object.</p>  <p>The function `est_doc_attr_names' is used in order to get a list of attribute names of a document object.</p>
# Line 281  est_doc_delete(doc); Line 281  est_doc_delete(doc);
281    
282  <dl>  <dl>
283  <dt><kbd>char *est_doc_cat_texts(ESTDOC *<var>doc</var>);</kbd></dt>  <dt><kbd>char *est_doc_cat_texts(ESTDOC *<var>doc</var>);</kbd></dt>
284  <dd>`doc' specifies a document object.  The return value is concatenated sentences of a document object.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>  <dd>`doc' specifies a document object.  The return value is concatenated sentences of the document object.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>
285  </dl>  </dl>
286    
287  <p>The function `est_doc_dump_draft' is used in order to dump draft data of a document object.</p>  <p>The function `est_doc_dump_draft' is used in order to dump draft data of a document object.</p>
288    
289  <dl>  <dl>
290  <dt><kbd>char *est_doc_dump_draft(ESTDOC *<var>doc</var>);</kbd></dt>  <dt><kbd>char *est_doc_dump_draft(ESTDOC *<var>doc</var>);</kbd></dt>
291  <dd>`doc' specifies a document object.  The return value is draft data of a document object.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>  <dd>`doc' specifies a document object.  The return value is draft data of the document object.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>
292  </dl>  </dl>
293    
294  <p>The function `est_doc_make_snippet' is used in order to make a snippet of the body text of a document object.</p>  <p>The function `est_doc_make_snippet' is used in order to make a snippet of the body text of a document object.</p>
295    
296  <dl>  <dl>
297  <dt><kbd>char *est_doc_make_snippet(ESTDOC *<var>doc</var>, const CBLIST *<var>words</var>, int <var>wwidth</var>, int <var>hwidth</var>, int <var>awidth</var>);</kbd></dt>  <dt><kbd>char *est_doc_make_snippet(ESTDOC *<var>doc</var>, const CBLIST *<var>words</var>, int <var>wwidth</var>, int <var>hwidth</var>, int <var>awidth</var>);</kbd></dt>
298  <dd>`doc' specifies a document object.  `word' specifies a list object of words to be highlight.  `wwitdh' specifies whole width of the result.  `hwitdh' specifies width of strings picked up from the beginning of the text.  `awitdh' specifies width of strings picked up around each highlighted word.  The return value is a snippet string of the body text of a document object.  There are tab separated values.  Each line is a string to be shown.  Though most lines have only one field, some lines have two fields.  If the second field exists, the first field is to be shown with highlighted, and the second field means its normalized form.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>  <dd>`doc' specifies a document object.  `word' specifies a list object of words to be highlight.  `wwitdh' specifies whole width of the result.  `hwitdh' specifies width of strings picked up from the beginning of the text.  `awitdh' specifies width of strings picked up around each highlighted word.  The return value is a snippet string of the body text of the document object.  There are tab separated values.  Each line is a string to be shown.  Though most lines have only one field, some lines have two fields.  If the second field exists, the first field is to be shown with highlighted, and the second field means its normalized form.  Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>
299  </dl>  </dl>
300    
301  <p>The function `est_doc_scan_words' is used in order to check whether the text of a document object includes every specified words.</p>  <p>The function `est_doc_scan_words' is used in order to check whether the text of a document object includes every specified words.</p>
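The tab-separated result format described for `est_doc_make_snippet' in this hunk can be consumed with a few lines of plain C. The following is a minimal sketch, not part of Hyper Estraier: the helper names `snippet_split_line' and `snippet_render' are hypothetical, and marking highlighted words with square brackets is an arbitrary choice made for the example. It assumes only what the text states: each line is a string to be shown, and a line with a second field is a highlighted word followed by its normalized form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split one snippet line in place at the first tab.
   `*shown' receives the string to display; `*normalized' receives the
   second field, or NULL when the line has only one field.
   Returns 1 if the line is a highlighted word, else 0. */
static int snippet_split_line(char *line, char **shown, char **normalized){
  char *tab;
  assert(line && shown && normalized);
  *shown = line;
  *normalized = NULL;
  tab = strchr(line, '\t');
  if(!tab) return 0;
  *tab = '\0';
  *normalized = tab + 1;
  return 1;
}

/* Hypothetical helper: concatenate the lines of a snippet into `out',
   wrapping each highlighted word in square brackets.  The snippet buffer
   is modified in place.  Returns the number of highlighted words, or -1
   if `out' is too small. */
static int snippet_render(char *snippet, char *out, size_t outsize){
  size_t used = 0;
  int hits = 0;
  char *line = snippet;
  assert(snippet && out);
  if(outsize < 1) return -1;
  out[0] = '\0';
  while(line && *line != '\0'){
    char *next = strchr(line, '\n');
    char *shown, *norm;
    size_t len;
    int hl;
    if(next) *next++ = '\0';            /* terminate the current line */
    hl = snippet_split_line(line, &shown, &norm);
    len = strlen(shown);
    if(used + len + (hl ? 2 : 0) + 1 > outsize) return -1;
    if(hl) out[used++] = '[';
    memcpy(out + used, shown, len);
    used += len;
    if(hl) out[used++] = ']';
    out[used] = '\0';
    hits += hl;
    line = next;
  }
  return hits;
}
```

In a real application the input would be the string returned by `est_doc_make_snippet', which, as the text notes, must be released with `free' after use. An application producing HTML would emit markup instead of brackets and escape the shown text.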
# Line 384  est_cond_delete(cond); Line 384  est_cond_delete(cond);
384    
385  <dl>  <dl>
386  <dt><kbd>void est_cond_set_options(ESTCOND *<var>cond</var>, int <var>options</var>);</kbd></dt>  <dt><kbd>void est_cond_set_options(ESTCOND *<var>cond</var>, int <var>options</var>);</kbd></dt>
387  <dd>`cond' specifies a condition object.  `options' specifies options: `ESTCONDSURE' specifies that it checks every N-gram key, `ESTCONDUSU', which is the default, specifies that it checks N-gram keys with skipping one key, `ESTCONDFAST' skips two keys, `ESTCONDAGIT' skips three keys, `ESTCONDNOIDF' specifies not to perform TF-IDF tuning, `ESTCONDSIMPLE' specifies to use simplefied phrase.  Each option can be specified at the same time by bitwise or.  If keys are skipped, though search speed is improved, the relevance ratio grows less.</dd>  <dd>`cond' specifies a condition object.  `options' specifies options: `ESTCONDSURE' specifies that it checks every N-gram key, `ESTCONDUSU', which is the default, specifies that it checks N-gram keys with skipping one key, `ESTCONDFAST' skips two keys, `ESTCONDAGIT' skips three keys, `ESTCONDNOIDF' specifies not to perform TF-IDF tuning, `ESTCONDSIMPLE' specifies to use simplified phrase.  Each option can be specified at the same time by bitwise or.  If keys are skipped, though search speed is improved, the relevance ratio grows less.</dd>
388  </dl>  </dl>
389    
390  <hr />  <hr />
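The phrase "can be specified at the same time by bitwise or" in the description of `est_cond_set_options' may be easier to follow with a small illustration. The sketch below does not use the library: the `DEMO_COND_*' names and their numeric values are stand-ins invented for this example, since the real `ESTCOND*' constants are defined in the Hyper Estraier header and their values are not given in this text. The precedence applied when several skip levels are set together is also an assumption made only for the example.

```c
#include <assert.h>

/* Stand-in flags for illustration only.  The real ESTCOND* constants come
   from the library header; these names and values are hypothetical. */
enum {
  DEMO_COND_SURE   = 1 << 0,   /* check every N-gram key */
  DEMO_COND_USU    = 1 << 1,   /* skip one key (the default) */
  DEMO_COND_FAST   = 1 << 2,   /* skip two keys */
  DEMO_COND_AGIT   = 1 << 3,   /* skip three keys */
  DEMO_COND_NOIDF  = 1 << 4,   /* no TF-IDF tuning */
  DEMO_COND_SIMPLE = 1 << 5    /* simplified phrase */
};

/* Test whether one flag is present in a combined option value. */
static int demo_has_option(int options, int flag){
  return (options & flag) != 0;
}

/* Number of N-gram keys skipped for a combined option value.
   The order of the checks (SURE, then AGIT, then FAST) is an assumption
   of this sketch; with no skip flag set, the default of one key applies. */
static int demo_skipped_keys(int options){
  assert(options >= 0);
  if(demo_has_option(options, DEMO_COND_SURE)) return 0;
  if(demo_has_option(options, DEMO_COND_AGIT)) return 3;
  if(demo_has_option(options, DEMO_COND_FAST)) return 2;
  return 1;
}
```

With the real API, the equivalent combination would be passed as the second argument of `est_cond_set_options', for example a skip level combined with the option that disables TF-IDF tuning.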
# Line 448  if(!est_db_close(db, &amp;ecode)){ Line 448  if(!est_db_close(db, &amp;ecode)){
448    
449  <dl>  <dl>
450  <dt><kbd>ESTDB *est_db_open(const char *<var>name</var>, int <var>omode</var>, int *<var>ecp</var>);</kbd></dt>  <dt><kbd>ESTDB *est_db_open(const char *<var>name</var>, int <var>omode</var>, int *<var>ecp</var>);</kbd></dt>
451  <dd>`name' specifies the name of a database directory.  `mode' specifies open modes: `ESTDBWRITER' as a writer, `ESTDBREADER' as a reader.  If the mode is `ESTDBWRITER', the following may be added by bitwise or: `ESTDBCREAT', which means it creates a new database if not exist, `ESTDBTRUNC', which means it creates a new database regardless if one exists.  Both of `ESTDBREADER' and  `ESTDBWRITER' can be added to by bitwise or: `ESTDBNOLCK', which means it opens a database file without file locking, or `ESTDBLCKNB', which means locking is performed without blocking.  If `ESTDBNOLCK' is used, the application is responsible for exclusion control.  `ESTDBCREAT' can be added to by bitwise or: `ESTDBPERFNG', which means N-gram analysis is performed against Europian text also.  `ecp' specifies the pointer to a variable to which the error code is assigned.  The return value is a database object of the database or `NULL' if failure.</dd>  <dd>`name' specifies the name of a database directory.  `mode' specifies open modes: `ESTDBWRITER' as a writer, `ESTDBREADER' as a reader.  If the mode is `ESTDBWRITER', the following may be added by bitwise or: `ESTDBCREAT', which means it creates a new database if not exist, `ESTDBTRUNC', which means it creates a new database regardless if one exists.  Both of `ESTDBREADER' and  `ESTDBWRITER' can be added to by bitwise or: `ESTDBNOLCK', which means it opens a database file without file locking, or `ESTDBLCKNB', which means locking is performed without blocking.  If `ESTDBNOLCK' is used, the application is responsible for exclusion control.  `ESTDBCREAT' can be added to by bitwise or: `ESTDBPERFNG', which means N-gram analysis is performed against European text also.  `ecp' specifies the pointer to a variable to which the error code is assigned.  The return value is a database object of the database or `NULL' if failure.</dd>
452  </dl>  </dl>
453    
454  <p>The function `est_db_close' is used in order to close a database.</p>  <p>The function `est_db_close' is used in order to close a database.</p>
# Line 458  if(!est_db_close(db, &amp;ecode)){ Line 458  if(!est_db_close(db, &amp;ecode)){
458  <dd>`db' specifies a database object.  `ecp' specifies the pointer to a variable to which the error code is assigned.  The return value is true if success, else it is false.</dd>  <dd>`db' specifies a database object.  `ecp' specifies the pointer to a variable to which the error code is assigned.  The return value is true if success, else it is false.</dd>
459  </dl>  </dl>
460    
461  <p>The function `est_db_error' is used in order to get the last happended error code of a database.</p>  <p>The function `est_db_error' is used in order to get the last happened error code of a database.</p>
462    
463  <dl>  <dl>
464  <dt><kbd>int est_db_error(ESTDB *<var>db</var>);</kbd></dt>  <dt><kbd>int est_db_error(ESTDB *<var>db</var>);</kbd></dt>
465  <dd>`db' specifies a database object.  The return value is the last happended error code of the database.</dd>  <dd>`db' specifies a database object.  The return value is the last happened error code of the database.</dd>
466  </dl>  </dl>
467    
468  <p>The function `est_db_fatal' is used in order to check whether a database has a fatal error.</p>  <p>The function `est_db_fatal' is used in order to check whether a database has a fatal error.</p>
469    
470  <dl>  <dl>
471  <dt><kbd>int est_db_fatal(ESTDB *<var>db</var>);</kbd></dt>  <dt><kbd>int est_db_fatal(ESTDB *<var>db</var>);</kbd></dt>
472  <dd>`db' specifies a database object.  The return value is true if the database has fatal erroor, else it is false.</dd>  <dd>`db' specifies a database object.  The return value is true if the database has fatal error, else it is false.</dd>
473  </dl>  </dl>
474    
475  <p>The function `est_db_flush' is used in order to flush index words in the cache of a database.</p>  <p>The function `est_db_flush' is used in order to flush index words in the cache of a database.</p>
# Line 490  if(!est_db_close(db, &amp;ecode)){ Line 490  if(!est_db_close(db, &amp;ecode)){
490    
491  <dl>  <dl>
492  <dt><kbd>int est_db_optimize(ESTDB *<var>db</var>, int <var>options</var>);</kbd></dt>  <dt><kbd>int est_db_optimize(ESTDB *<var>db</var>, int <var>options</var>);</kbd></dt>
493  <dd>`db' specifies a database object connected as a writer.  `options' specifies options: `ESTOPTNOPURGE' to omit purging dispensable region of deleted documents, `ESTOPTNODBOPT' to omit optimizization of the database files.  The two can be specified at the same time by bitwise or.  The return value is true if success, else it is false.</dd>  <dd>`db' specifies a database object connected as a writer.  `options' specifies options: `ESTOPTNOPURGE' to omit purging dispensable region of deleted documents, `ESTOPTNODBOPT' to omit optimization of the database files.  The two can be specified at the same time by bitwise or.  The return value is true if success, else it is false.</dd>
494  </dl>  </dl>
495    
496  <p>The function `est_db_put_doc' is used in order to add a document to a database.</p>  <p>The function `est_db_put_doc' is used in order to add a document to a database.</p>
# Line 694  int main(int argc, char **argv){ Line 694  int main(int argc, char **argv){
694        printf("%s\n", value);        printf("%s\n", value);
695      }      }
696    
697      /* destloy the document object */      /* destroy the document object */
698      est_doc_delete(doc);      est_doc_delete(doc);
699    
700    }    }
