improved debugging output
More DWIM changes: scrape can now also return multiple elements, which are separated in the results by <hr/>. Attribute values are now treated as words surrounded by word boundaries (\b), so multiple classes separated by spaces are now handled correctly.
Very experimental support for selecting multiple wrapper divs in which we then try to find search results. This change is mostly needed for sites with so little semantic markup that we need to pass several divs, of which just one has results. For Source modules everything should "just work"(tm). The PunBB forum is to blame for this feature, so it's a new source.
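The word-boundary matching described in this entry can be illustrated with a small sketch. It is written in Python for illustration only and is not the module's actual code; the function names `attribute_matches` and `join_results` are hypothetical.

```python
import re


def attribute_matches(attr_value, wanted):
    """Return True if `wanted` occurs in `attr_value` as a whole word.

    Hypothetical helper: wraps the wanted value in \\b anchors, as the
    log entry describes, so "result" matches class="post result odd"
    but not class="results".
    """
    pattern = r"\b" + re.escape(wanted) + r"\b"
    return re.search(pattern, attr_value) is not None


def join_results(fragments):
    """Join several scraped HTML fragments with <hr/>, as the entry describes."""
    return "<hr/>".join(fragments)
```

One consequence of using `\b` is that a hyphen also counts as a boundary, so under this sketch `result` would match `class="search-result"` as well.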
Implemented redirect_single_result option for sources (MoinMoin uses that), and documented element_by_triplet and scrape
remove debugging snippet
rename templates to triplets to be in sync with method name
fix to really support triplets in templates
support legacy single result
Refactor scraping by extracting element_by_triplet into its own method; every parameter now accepts either one argument (tag) or any number of triplets (tag, attribute, value)
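One way to read the "one tag or several triplets" convention is sketched below. It is a Python illustration under assumptions, not the real element_by_triplet: the name `parse_triplets` and the flat-list input shape are hypothetical.

```python
def parse_triplets(spec):
    """Normalize a flat spec into a list of (tag, attribute, value) triplets.

    Assumed input shape (not confirmed by the source): either a single
    tag name, or a flat sequence whose length is a multiple of three.
    """
    if not spec:
        raise ValueError("spec must contain a tag or at least one triplet")
    if len(spec) == 1:
        # A bare tag: no attribute constraint.
        return [(spec[0], None, None)]
    if len(spec) % 3 != 0:
        raise ValueError("expected (tag, attribute, value) triplets")
    return [tuple(spec[i:i + 3]) for i in range(0, len(spec), 3)]
```

With this shape, `parse_triplets(["div"])` describes any `div`, while `parse_triplets(["div", "class", "results", "ul", "id", "hits"])` describes two candidate elements.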
dump number of result nodes for better debugging
another bunch of various tweaks, but Lucene still doesn't lock index right
rewrite Grep::Search to be isa Jifty::Object
fetch maximum of 15 pages from remote wiki when scraping results
removed all debug warn(s) or moved them to $self->log->debug
isa Jifty::Object
explicitly destroy $parent passed to plugins as another try to get around Lucene's locking problems; use HTML::ResolveLink to resolve all links before add_record
redirect errors and warnings to warn so they are all non-fatal
treat missing results div as no results and don't die
added hooks to Grep::Source->save to keep useful snippets of html in /tmp/grep (if writable)
refactor most of code for scraping into common code making plugins really simple
really work as designed, as opposed to returning the first available plugin (ouch!)
fix warning
each feed now has a default source class which is called for it. Added PhpWiki source. Code still has problems with Lucene locking.
another great refactoring: added new Source object which implements searching within a feed (which can now be anything, as long as it produces fields which somewhat resemble an RSS feed). Source plugins implement just the (site or source format specific) fetching of items. Sample implementation of a MoinMoin scraper, which fetches full pages from the wiki for results, so it has a performance impact on the remote wiki; be kind to it.