This project is a side effect of the WebPac indexer. I learned so much about
Perl and swish-e while working on WebPac that I decided to write a smaller
CGI interface to swish for general-purpose web crawling and searching.


For this to work, you have to create a symlink from swish.cgi for each
configuration, like this:

	cd html
	ln -s swish.cgi rot13.cgi

rot13.cgi will then use the file rot13.xml as its configuration.
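
The mapping from symlink name to configuration file can be sketched in shell. This is only an illustration: it assumes the rule is "replace .cgi with .xml in the script's file name", config_for is a hypothetical helper, and the real swish.cgi is written in Perl and may locate the file differently.

```shell
#!/bin/sh
# Hypothetical helper: map the name a CGI script was invoked as to the
# name of its XML configuration file (assumed rule: foo.cgi -> foo.xml).
config_for() {
	name=$(basename "$1" .cgi)
	printf '%s.xml\n' "$name"
}

config_for html/rot13.cgi   # prints rot13.xml
```

With this rule, adding another index only needs another symlink and a matching .xml file next to it.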


This Perl CGI front-end has the following interesting features:
1. the old (easy to understand :-) swish spider, modified to support a
   "no parent" URL (a URL above which it will stop spidering -- useful if
   you want to spider just your personal pages under some URL)
2. a separate XML configuration file for each index (all indexes use the
   same CGI script)
3. no need to design HTML pages (but that limits you to one rather ugly
   design with some fill-in words -- someone could change that to use
   templates, as I probably will in the future)
4. support for searching with the Lingua::Spelling::Alternative module,
   which uses ispell affix or findaffix data to create variations of the
   entered words (you can use affix and findaffix together, even with
   more than one file, separated by commas [,] or spaces)
5. support for converting swish UTF-8 output to some other encoding for
   the web using iconv.
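
The "no parent" rule from item 1 can be illustrated with a small shell sketch. It assumes the check is a plain string-prefix comparison of each URL against the "no parent" URL; the modified spider itself is Perl and may normalize URLs first. The function name and the example URLs are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if URL $2 lies at or below the
# "no parent" URL $1 (assumed to be a plain string-prefix test).
below_no_parent() {
	case "$2" in
		"$1"*) return 0 ;;
		*)     return 1 ;;
	esac
}

no_parent="http://host/~user/"   # example value, not from the project
for url in "http://host/~user/notes.html" "http://host/other/"; do
	if below_no_parent "$no_parent" "$url"; then
		echo "spider $url"
	else
		echo "skip   $url"
	fi
done
```

Quoting "$1" inside the case pattern makes the shell compare it literally, so characters such as ? or * in a URL are not treated as wildcards.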


For an example of how to create a useful configuration file, take a look at
the included Makefile. In short, make_config.pl can create the configuration
file for swish, the XML configuration for the CGI script and the necessary
symlink with just:

	$ make_config.pl name_of_config http://host/url_to_index/ [strip from url]

The optional "strip from url" argument removes that part of the path when
storing URLs in the swish index. That lets you create indexes which can later
be merged easily into one combined index. It's also useful if you want to
index your whole /doc directory and don't want that prefix in each and every
entry in the index (which should save some disk space too!).
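
The effect of "strip from url" can be sketched in shell as removing a leading string from each stored URL. This is an assumption about the behaviour based on the description above, not the actual make_config.pl or spider code; strip_from_url and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove the leading part $1 from URL $2, leaving
# the URL unchanged when it does not start with that part.
strip_from_url() {
	printf '%s\n' "${2#"$1"}"
}

strip_from_url "http://host/doc" "http://host/doc/perl/index.html"
# prints /perl/index.html
```

Indexes built from different hosts with their own prefixes stripped end up with entries of the same shape, which is what makes a later merge straightforward.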


Dobrica Pavlinusic <dpavlin@rot13.org> 2003-04-26