This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!

Log of /lib/Grep/Source.pm

Revision 173
Modified Wed Jul 4 20:20:48 2007 UTC (16 years, 9 months ago) by dpavlin
File length: 10606 byte(s)
Diff to previous 171
single result redirect isn't fatal any more

Revision 171
Modified Wed Jul 4 14:10:43 2007 UTC (16 years, 9 months ago) by dpavlin
File length: 10606 byte(s)
Diff to previous 165
added link_current to Grep::Model::Item, which checks whether to use the
locally cached version (if newer than 24h) or fetch a new one.
This *dramatically* decreases load on the wikis we are scraping ;-)
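
The freshness check described in this revision can be sketched as follows. This is only an illustration in Python, not the module's Perl code: the function name `use_cached_copy`, its parameters, and the `MAX_AGE` constant are assumptions made for the example; only the 24-hour threshold comes from the log message.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a cached copy is reusable while it is younger than 24 hours,
# which is the threshold mentioned in the log message.
MAX_AGE = timedelta(hours=24)


def use_cached_copy(fetched_at, now=None, max_age=MAX_AGE):
    """Return True if a copy fetched at `fetched_at` is fresh enough to reuse."""
    if fetched_at is None:
        # Nothing has been cached yet, so the page must be fetched.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - fetched_at < max_age


# Hypothetical timestamps, used only to exercise the check.
now = datetime(2007, 7, 4, 14, 0, tzinfo=timezone.utc)
print(use_cached_copy(now - timedelta(hours=3), now))   # recent copy: reuse
print(use_cached_copy(now - timedelta(hours=30), now))  # stale copy: refetch
```

Serving the cached copy whenever the check passes is what keeps repeated searches from hitting the remote wiki again.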

Revision 165
Modified Mon Jun 11 22:56:53 2007 UTC (16 years, 10 months ago) by dpavlin
File length: 10420 byte(s)
Diff to previous 135
minor tweak when there are no results found (and page doesn't have result wrapper div)

Revision 135
Modified Wed May 2 09:37:06 2007 UTC (17 years ago) by dpavlin
File length: 10299 byte(s)
Diff to previous 134
improved debugging output

Revision 134
Modified Tue May 1 21:06:10 2007 UTC (17 years ago) by dpavlin
File length: 10306 byte(s)
Diff to previous 133
More DWIM changes: scrape can now also return multiple elements, which will
be separated in results by <hr/>.
Attribute values are now treated as words surrounded by word boundaries (\b),
so multiple classes separated by spaces will now be treated correctly.
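
The word-boundary matching of attribute values can be sketched as follows. This is an illustration in Python rather than the module's Perl code; the function name `attr_matches` and the sample class names are assumptions made for the example.

```python
import re


def attr_matches(attr_value, wanted):
    """Return True if `wanted` occurs in `attr_value` as a whole word."""
    # re.escape keeps characters in `wanted` from being read as regex syntax.
    pattern = r"\b" + re.escape(wanted) + r"\b"
    return re.search(pattern, attr_value) is not None


# Hypothetical class attribute values.
print(attr_matches("searchresults odd", "searchresults"))   # one class of two
print(attr_matches("odd searchresults", "searchresults"))   # order is irrelevant
print(attr_matches("searchresultsbox", "searchresults"))    # longer word: no match
print(attr_matches("searchresults-box", "searchresults"))   # hyphen is a boundary
```

One consequence of using `\b` is visible in the last call: a hyphen counts as a word boundary, so `searchresults` also matches inside `searchresults-box`. Splitting the attribute on whitespace would be stricter, but that is a different technique from the one the log describes.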

Revision 133
Modified Tue May 1 20:50:14 2007 UTC (17 years ago) by dpavlin
File length: 10058 byte(s)
Diff to previous 132
Very experimental support for selecting multiple wrapper divs in which
we will then try to find search results -- this change is mostly needed
for sites which have so little semantic markup that we need to pass
several divs, of which just one has results.
For Source modules everything should "just work"(tm).
PunBB forum is to blame for this feature, so it is a new source.

Revision 132
Modified Tue May 1 12:20:49 2007 UTC (17 years ago) by dpavlin
File length: 9736 byte(s)
Diff to previous 130
Implemented redirect_single_result option for sources (MoinMoin uses that), and documented
element_by_triplet and scrape

Revision 130
Modified Sun Apr 29 12:07:06 2007 UTC (17 years ago) by dpavlin
File length: 8332 byte(s)
Diff to previous 126
remove debugging snippet

Revision 126
Modified Sat Apr 28 22:53:37 2007 UTC (17 years ago) by dpavlin
File length: 8364 byte(s)
Diff to previous 125
rename templates to triplets to be in sync with method name

Revision 125
Modified Sat Apr 28 22:52:08 2007 UTC (17 years ago) by dpavlin
File length: 8382 byte(s)
Diff to previous 123
fix to really support triplets in templates

Revision 123
Modified Sat Apr 28 17:55:37 2007 UTC (17 years ago) by dpavlin
File length: 8348 byte(s)
Diff to previous 121
support legacy single result

Revision 121
Modified Sat Apr 28 13:08:50 2007 UTC (17 years ago) by dpavlin
File length: 8126 byte(s)
Diff to previous 113
Refactor scraping by extracting element_by_triplet into its own method; now
every parameter accepts one argument (tag) or any number of triplets
(tag, attribute, value)
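
The argument convention described here (a bare tag, or a flat list of triplets) can be sketched as follows. This is an illustration in Python and does not reproduce element_by_triplet itself; the helper name `parse_triplets` and the sample tag, attribute, and value names are assumptions made for the example.

```python
def parse_triplets(*args):
    """Normalize a selector into a list of (tag, attribute, value) triplets.

    A single argument is treated as a bare tag name; otherwise the arguments
    are read as a flat sequence of (tag, attribute, value) groups.
    """
    if len(args) == 1:
        # Bare tag: no attribute constraint.
        return [(args[0], None, None)]
    if len(args) == 0 or len(args) % 3 != 0:
        raise ValueError("expected one tag or a multiple of three arguments")
    # Cut the flat argument list into consecutive groups of three.
    return [tuple(args[i:i + 3]) for i in range(0, len(args), 3)]


# Hypothetical selectors.
print(parse_triplets("div"))
print(parse_triplets("div", "class", "searchresults", "dl", "id", "hits"))
```

Normalizing both forms into one list lets the matching code loop over triplets without treating the bare-tag case separately.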

Revision 113
Modified Sun Mar 25 11:41:54 2007 UTC (17 years, 1 month ago) by dpavlin
File length: 7359 byte(s)
Diff to previous 110
dump number of result nodes for better debugging

Revision 110
Modified Wed Mar 14 20:02:19 2007 UTC (17 years, 1 month ago) by dpavlin
File length: 7281 byte(s)
Diff to previous 109
another bunch of various tweaks, but Lucene still doesn't lock index right

Revision 109
Modified Wed Mar 14 18:46:37 2007 UTC (17 years, 1 month ago) by dpavlin
File length: 7177 byte(s)
Diff to previous 103
rewrite Grep::Search to be isa Jifty::Object

Revision 103
Modified Sun Mar 4 22:51:01 2007 UTC (17 years, 1 month ago) by dpavlin
File length: 7156 byte(s)
Diff to previous 102
fetch maximum of 15 pages from remote wiki when scraping results

Revision 102
Modified Sun Mar 4 22:16:23 2007 UTC (17 years, 1 month ago) by dpavlin
File length: 7155 byte(s)
Diff to previous 100
removed all debug warn(s) or moved them to $self->log->debug

Revision 100
Modified Sat Feb 24 12:32:31 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 7094 byte(s)
Diff to previous 96
isa Jifty::Object

Revision 96
Modified Sat Feb 24 11:56:18 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 7024 byte(s)
Diff to previous 92
explicitly destroy $parent passed to plugins as another try to get around
Lucene's locking problems
use HTML::ResolveLink to resolve all links before add_record

Revision 92
Modified Sat Feb 24 11:16:05 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 6793 byte(s)
Diff to previous 88
redirect errors and warnings to warn so they are all non-fatal

Revision 88
Modified Fri Feb 23 21:52:29 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 6635 byte(s)
Diff to previous 86
treat missing results div as no results and don't die

Revision 86
Modified Fri Feb 23 21:16:44 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 6619 byte(s)
Diff to previous 85
added hooks to Grep::Source->save to keep useful snippets of html in /tmp/grep (if writable)

Revision 85
Modified Fri Feb 23 20:47:08 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 6446 byte(s)
Diff to previous 82
refactor most of code for scraping into common code making plugins really simple

Revision 82
Modified Fri Feb 23 18:10:26 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 3498 byte(s)
Diff to previous 74
really work like designed as opposed to returning first available plugin (ouch!)

Revision 74
Modified Fri Feb 23 17:16:33 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 3260 byte(s)
Diff to previous 73
fix warning

Revision 73
Modified Fri Feb 23 11:48:39 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 3251 byte(s)
Diff to previous 72
each feed now has default source class which is called for it. Added PhpWiki
source. Code still has problems with Lucene locking.

Revision 72
Added Fri Feb 23 09:54:28 2007 UTC (17 years, 2 months ago) by dpavlin
File length: 2562 byte(s)
another great refactoring: added new Source object which implements
searching within feed (which can now be anything as long as it produces fields
which somewhat resemble an RSS feed). Source plugins implement just (site or
source format specific) fetching of items.

Sample implementation of MoinMoin scraper, which fetches full pages from the wiki
for results, so it has a performance impact on the remote wiki; be kind to it.
