output modules can now define add_row as opposed to add to get input row before normalization
don't export anything from LWP::Simple by default
create out/report and out/marc if they don't exist
fix --mirror option to create directories and return nicer reports
r1830@llin: dpavlin | 2009-04-25 16:46:33 +0200 implement --mirror http://www.example.com/ to create local mirror of remote paths behind http
implement skip on inputs
r1765@llin: dpavlin | 2009-04-21 23:03:52 +0200 don't hide indexer errors
Clean up encodings, moving webpac closer to having an internal utf-8 representation. This will break current code, but it is a really necessary step toward checking input encoding for validity
dump combined normalize rules into out/debug/all-normalize.pl if out/debug exists
support databases without normalization (useful for input files which are used just for lookups)
r1679@llin: dpavlin | 2007-11-28 12:14:40 +0100 better output
r1669@llin: dpavlin | 2007-11-27 23:42:31 +0100 don't process output modules whose init returned false
push debug levels > 2 to WebPAC::Normalize, just like tests do
fix MARC creation for new WebPAC::Normalize::MARC
comment out debugging
call parser with --only filter, create MARC files even when normalization rules don't return any data_structure
r1481@llin: dpavlin | 2007-11-02 15:28:43 +0100 support input module definitions right within the database definition in yaml. This basically means that you can replace type with module, so type will be deprecated soon
r1475@llin: dpavlin | 2007-11-02 14:11:19 +0100 fix ./run.pl --config conf/foobar.yml to actually work!
r1455@llin: dpavlin | 2007-11-02 11:53:42 +0100 comment out debug output
r1454@llin: dpavlin | 2007-11-02 00:39:10 +0100 check if normalization produces any data_structure, and complain if it doesn't
r1422@llin: dpavlin | 2007-10-31 14:31:11 +0100 fix output module name
r1413@llin: dpavlin | 2007-10-31 13:13:58 +0100 Add input name to all output filters
r1394@llin: dpavlin | 2007-10-31 01:26:46 +0100 add new (exported by default) function force_array used all over the place
r1389@llin: dpavlin | 2007-10-31 00:31:54 +0100 support multiple output modules
r1378@llin: dpavlin | 2007-10-30 21:32:15 +0100 add database => 'name of current database' to every output plugin
r1322@llin: dpavlin | 2007-09-03 16:44:01 +0200 - replace Data::Dumper usage with Data::Dump - rewrite WebPAC::Store to use Class::Accessor
r1316@llin: dpavlin | 2007-08-23 22:57:13 +0200 call finish for outputs
r1312@llin: dpavlin | 2007-08-23 22:28:19 +0200 added generic output handler to run.pl (if this design proves itself, I will port all output to it)
r1289@llin: dpavlin | 2007-06-21 23:26:10 +0200 * transfer input configuration hash as input_config to input module
r1285@llin: dpavlin | 2007-06-21 14:53:43 +0200 make indexers optional [2.30]
r1280@llin: dpavlin | 2007-05-28 00:24:21 +0200 various fixes to make save_delimiters_tamplates produce output file
r1279@llin: dpavlin | 2007-05-28 00:15:04 +0200 make --validate and --validate-delimiters independent from each other again
implemented read_validate_file and read_validate_delimiters file, so you can now change paths and data for each input (which run.pl does if you use $database and/or $input variable substitution)
added fill_in to create dynamic values and some tests
r1258@llin: dpavlin | 2007-05-27 13:14:58 +0200 Changed delimiter_templates arguments to be more intuitive, disable MARC generation when running delimiters validation (this is just disk overhead)
r1254@llin: dpavlin | 2007-05-27 12:51:21 +0200 typo
r1253@llin: dpavlin | 2007-05-27 12:49:54 +0200 added accumulated delimiters templates to hold all templates which are found in this run (and thus saved to --validate-delimiters file) while generating correct stats for each input
bug fix
r1232@llin: dpavlin | 2007-05-24 14:44:53 +0200 added storing and retrieving of delimiters templates to file and some basic tests
r1227@llin: dpavlin | 2007-05-24 10:26:01 +0200 generate human-readable or machine-readable report
r1225@llin: dpavlin | 2007-05-23 22:04:17 +0200 use delimiters from config.yml file and dump report to file
r1185@llin: dpavlin | 2007-04-01 23:47:05 +0200 r1183@llin: dpavlin | 2007-04-01 23:47:01 +0200 propagate changes through documentation and actual running code :-)
added --[no-]marc-generate
r1154@llin: dpavlin | 2006-12-13 11:13:28 +0100 added -h option using Pod::Usage
r1130@llin: dpavlin | 2006-11-05 13:29:36 +0100 rotate log file on startup
another sweeping API change: input->dump is gone, replaced with input->dump_ascii which is more understandable. If you want to override the default behaviour (which is to use Data::Dump's dump in input->fetch_rec), define dump_ascii in the low-level WebPAC::Input:: API
r1119@llin: dpavlin | 2006-11-03 21:21:35 +0100 --validate will automatically turn on --stats and won't produce any output
r1117@llin: dpavlin | 2006-11-03 20:42:24 +0100 cleanup API a bit. validate_errors is now validate_rec [0.10]
moved two more dump()s to closure, saving 2/3rd of memory and some CPU time on output which is not needed without debug level
use save_row and load_row to share data between lookups and input->fetch, added some timing for loading of lookups which revealed a big performance impact of one debug(dump())
r1097@llin: dpavlin | 2006-10-08 22:24:54 +0200 replaced generate_marc with universal have_rules [0.08]
r1076@llin: dpavlin | 2006-10-08 02:31:04 +0200 don't dump undef data_structure in debug log
r1069@llin: dpavlin | 2006-10-05 16:43:16 +0200 remove --marc-normalize and --marc-output and generate marc output only if normalize rules have marc directives
r1065@llin: dpavlin | 2006-10-05 14:54:48 +0200 actually we don't need *_load_ds, but _load_row for lookups
r1057@llin: dpavlin | 2006-09-29 22:23:05 +0200 less chatty
r1053@llin: dpavlin | 2006-09-29 22:16:01 +0200 moving to the final piece of the puzzle: run.pl which uses new APIs
r1020@llin: dpavlin | 2006-09-26 14:40:34 +0200 refactored WebPAC::Store
r1018@llin: dpavlin | 2006-09-26 12:20:52 +0200 correct creation of lookups (by database and input)
r1014@llin: dpavlin | 2006-09-25 20:56:33 +0200 save lookups using WebPAC::Store
r1008@llin: dpavlin | 2006-09-25 17:23:42 +0200 lookup creation somewhat works
r1006@llin: dpavlin | 2006-09-25 16:04:39 +0200 added have_lookup_create
r998@llin: dpavlin | 2006-09-25 15:06:05 +0200 first cuts at depends
r992@llin: dpavlin | 2006-09-25 13:48:56 +0200 tweaks
r990@llin: dpavlin | 2006-09-25 13:12:42 +0200 new depends method to track dependencies, input in most places can be input name or hash with key 'name' which will be used as input (for example, from configuration file), database and input names will have correctly stripped quotes, begin removal of old lookup support
r962@llin: dpavlin | 2006-09-24 17:51:45 +0200 first cut at using WebPAC::Config
fix warning
r947@llin: dpavlin | 2006-09-12 16:45:55 +0200 fixed indexing with EstraierNative (it will be used for every database, not just first one), and create separate indexes for each database (we'll merge them at end)
work without report, too :-)
r942@llin: dpavlin | 2006-09-11 17:58:32 +0200 generate reports (validation and stats) for each input in out/report/
r928@llin: dpavlin | 2006-09-09 20:24:06 +0200 a first attempt at implementing a validation reporter
get validation_errors
added reset_errors and all_errors to validator (real reporter is still pending), rewritten validator tests
refactored internal WebPAC::Input::* API a bit, added dump_rec, validate is now more clever and reports all errors from database at end
disable modification of records if --stats is in use
implement new modify_file format which is (hopefully) simpler than yaml and/or perl [2.27] (yes, I know... It's a sin...)
r884@llin: dpavlin | 2006-09-05 17:13:36 +0200 added preliminary support for perl native Hyper Estraier bindings
r867@llin: dpavlin | 2006-08-25 14:32:05 +0200 statistics now show data before modify_records
r857@llin: dpavlin | 2006-08-23 13:04:58 +0200 modify_records is now applied only once for each field to prevent looping of regexps
support local configuration files in conf/{hostname}.yml
added --merge option which shuts down Hyper Estraier, merges databases which have indexes and starts estmaster again.
added option --only-links which just re-creates links in index (and fixed link creation which was broken for quite some time)
added --parallel option to utilize multiple CPUs in machine
added modify_records
r827@llin: dpavlin | 2006-07-10 12:17:16 +0200 add config() and id() to WebPAC::Normalize
r815@llin: dpavlin | 2006-07-07 23:19:04 +0200 delete input->lookup so that 0.03 WebPAC::Lookup don't die on it
r810@llin: dpavlin | 2006-07-05 21:53:01 +0200 change of parameters to WebPAC::Input
r801@llin: dpavlin | 2006-07-04 13:19:45 +0200 use MARC::Normalize 0.11
r800@llin: dpavlin | 2006-07-04 13:11:42 +0200 marc_duplicate now creates duplicate MARC records
r796@llin: dpavlin | 2006-07-04 12:34:13 +0200 created WebPAC::Output::MARC and cleanup run.pl
r792@llin: dpavlin | 2006-07-04 00:11:52 +0200 improved marc log output
much better output
r772@llin: dpavlin | 2006-07-02 22:14:37 +0200 rough implementation of marc_leader (not tested enough)
make --debug incremental (e.g. use -d -d -d for debug level 3)
added --dump-marc
typo
r760@llin: dpavlin | 2006-07-01 22:29:21 +0200 added MARC::Lint to check generated MARC records and report warnings
r751@llin: dpavlin | 2006-06-30 22:43:07 +0200 added --marc-normalize to specify normalization.pl and --marc-output for (optional) output marc file
r742@llin: dpavlin | 2006-06-30 01:21:24 +0200 added marc_repetable_subfield and marc_indicators, renamed marc21 to marc [2.23]
r740@llin: dpavlin | 2006-06-30 00:55:19 +0200 fix warning
r730@llin: dpavlin | 2006-06-29 21:33:48 +0200 use MARC::Record 2.0 to support utf-8 encoding in MARC http://marcpm.sourceforge.net/
r726@llin: dpavlin | 2006-06-29 17:31:13 +0200 add marc21 to normalize and create MARC file from those data [2.22]
r725@llin: dpavlin | 2006-06-29 15:48:38 +0200 support arrays for normalize in config.yml [2.21]
r719@llin: dpavlin | 2006-06-26 18:40:57 +0200 big refactoring: deprecate and remove all normalisation formats except .pl sets (but old code is still available in WebPAC::Lookup::Normalize because lookups use it) [2.20]
r706@llin: dpavlin | 2006-05-22 21:39:37 +0200 create links at end of indexing (using last indexer which might be wrong!)
transfer all input variables to open_db in input module
r690@llin: dpavlin | 2006-05-18 15:47:03 +0200 make lookup optional
r682@llin: dpavlin | 2006-05-16 17:27:02 +0200 final touches on validation, added --validate to run.pl
fixes
even better output
added input filter to --only
disable indexing if --stats option is used, added delimiter line to the beginning of log output
fix pod
r669@llin: dpavlin | 2006-05-15 15:18:36 +0200 added nicely formatted stats and --stats flag to run.pl
don't mess with value of --clean
r646@llin: dpavlin | 2006-05-14 15:45:57 +0200 dump --force-set debug (not warn)
r644@llin: dpavlin | 2006-05-14 15:29:07 +0200 added --force-set and implemented usage of WebPAC::Normalize::Set if normalize->path ends in .pl [2.13]
r527@llin: dpavlin | 2006-04-17 18:50:54 +0200 call finish at end of indexing, force create if index_path doesn't exist
r520@llin: dpavlin | 2006-04-17 17:09:51 +0200 added support for KinoSearch [2.12]
r519@llin: dpavlin | 2006-04-17 16:40:54 +0200 make support for WebPAC::Output::Estraier optional
r508@llin: dpavlin | 2006-03-22 02:24:34 +0100 renamed --one to --only (with legacy support for --one)
r494@llin: dpavlin | 2006-02-27 00:22:59 +0100 implemented recode option to input (for now, just for MARC)
r466@llin: dpavlin | 2006-02-19 17:45:26 +0100 create label from database name, move from Text::Iconv to Encode
report lookup which is used
r443@llin: dpavlin | 2006-01-22 14:41:18 +0100 output of rec/s and elapsed time added
r339@llin: dpavlin | 2005-12-31 15:02:38 +0100 added --one=database_name option to reindex just single database
moved clean into WebPAC::Output::Estraier, cleanup
r322@athlon: dpavlin | 2005-12-19 22:27:06 +0100 make run.pl moderately chatty (along with other modules), added command line options (try perldoc run.pl), new targets index (to reindex all) and run (to index first 100 records of each database)
r11789@llin: dpavlin | 2005-12-19 06:29:24 +0100 final tweaks, version bumping [2.00_6]
r11787@llin: dpavlin | 2005-12-19 06:10:47 +0100 MARC indexing seems to work
r11783@llin: dpavlin | 2005-12-19 05:01:56 +0100 fix
r11779@llin: dpavlin | 2005-12-19 04:07:22 +0100 and fixes to make it work
r11778@llin: dpavlin | 2005-12-19 03:59:54 +0100 move work on input
r11777@llin: dpavlin | 2005-12-19 00:02:47 +0100 refactor Input::ISIS::* [0.02]
r11743@llin: dpavlin | 2005-12-17 02:09:53 +0100 added YAML as normalization input format
r11742@llin: dpavlin | 2005-12-17 01:26:41 +0100 cleanup
r11716@llin: dpavlin | 2005-12-16 05:04:26 +0100 links to other databases almost work
various updates to make lookups work (but they still don't)
r243@athlon: dpavlin | 2005-12-06 17:44:43 +0100 support multiple inputs into same database and multiple databases [2.00_1]
r11543@llin: dpavlin | 2005-12-05 16:48:20 +0100 added prefix
r11536@llin: dpavlin | 2005-12-05 15:29:47 +0100 change to load_ds and save_ds which now accept ONLY hash (and optional database name if not specified when calling new WebPAC::Store)
r11527@llin: dpavlin | 2005-12-05 02:29:24 +0100 unused var
r11526@llin: dpavlin | 2005-12-05 02:21:09 +0100 index multiple input sources in one database
r11519@llin: dpavlin | 2005-12-05 00:20:06 +0100 partial changes to support multiple databases
r11518@llin: dpavlin | 2005-12-04 19:43:29 +0100 renamed WebPAC::DB to WebPAC::Store
r9121@llin: dpavlin | 2005-11-25 01:24:43 +0100 read conf/config.yml which is the same file that Webpacus uses.
r9091@llin: dpavlin | 2005-11-24 12:49:05 +0100 small tweaks
r9064@llin: dpavlin | 2005-11-23 01:15:24 +0100 minor tweak for database routines, run.pl now iterates through all entries (to fix problem with stopping at first deleted entry)
r8992@llin: dpavlin | 2005-11-20 21:33:56 +0100 minor tweaks
r8988@llin: dpavlin | 2005-11-20 20:46:12 +0100 added real implementation for WebPAC::Output::Estraier along with run.pl script which runs test indexing (which will at some point move to WebPAC::Simple or something like that)