This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!
Log of /lib/Grep/Source
Revision
169 -
Directory Listing
Modified
Wed Jul 4 09:58:06 2007 UTC
(17 years ago)
by
dpavlin
update rest of code to new created_on and last_update
Revision
146 -
Directory Listing
Modified
Sun May 20 11:39:51 2007 UTC
(17 years, 1 month ago)
by
dpavlin
Yahoo Groups search provider
Revision
134 -
Directory Listing
Modified
Tue May 1 21:06:10 2007 UTC
(17 years, 2 months ago)
by
dpavlin
More DWIM changes: scrape can now also return multiple elements, which will
be separated in results by <hr/>.
Attribute values are now treated as words surrounded by word boundaries (\b),
so multiple classes separated by spaces will now be treated correctly.
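
The word-boundary matching and the <hr/> joining described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's Perl code; the function names attribute_matches and join_results are hypothetical.

```python
import re


def attribute_matches(attr_value, wanted):
    """Return True if `wanted` occurs in `attr_value` as a whole word.

    Hypothetical helper: the value is wrapped in \\b anchors, so one class
    in a space-separated class list matches, while a longer class name that
    merely starts with the same letters does not.
    """
    pattern = r"\b" + re.escape(wanted) + r"\b"
    return re.search(pattern, attr_value) is not None


def join_results(fragments):
    """Join several scraped HTML fragments, separated by <hr/>."""
    return "<hr/>".join(fragments)
```

One caveat of this approach: \b also sees a boundary next to a hyphen, so a class such as "result-item" would match a search for "result".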
Revision
133 -
Directory Listing
Modified
Tue May 1 20:50:14 2007 UTC
(17 years, 2 months ago)
by
dpavlin
Very experimental support for selecting multiple wrapper divs in which
we will then try to find search results -- this change is mostly needed
for sites which have so little semantic markup that we need to pass
several divs of which just one has results.
For Source modules, everything should "just work"(tm).
The PunBB forum is to blame for this feature, so it is added as a new source.
Revision
132 -
Directory Listing
Modified
Tue May 1 12:20:49 2007 UTC
(17 years, 2 months ago)
by
dpavlin
Implemented redirect_single_result option for sources (MoinMoin uses that), and documented
element_by_triplet and scrape
Revision
122 -
Directory Listing
Modified
Sat Apr 28 17:54:53 2007 UTC
(17 years, 2 months ago)
by
dpavlin
from my limited sample, I would say that all DokuWiki results are inside
<div class="search_result">
Revision
121 -
Directory Listing
Modified
Sat Apr 28 13:08:50 2007 UTC
(17 years, 2 months ago)
by
dpavlin
Refactor scraping by extracting element_by_triplet into its own method; now
every parameter accepts either one argument (tag) or one or more triplets
(tag, attribute, value)
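
A rough sketch of the argument convention described above, in Python for illustration only. The real element_by_triplet is a Perl method whose signature is not shown in this log, so the name parse_selector and its behaviour are assumptions.

```python
def parse_selector(*args):
    """Normalize selector arguments into (tag, attribute, value) triplets.

    Hypothetical helper: a single argument is taken as a bare tag name;
    otherwise the arguments must come in groups of three.
    """
    if len(args) == 1:
        # Bare tag: no attribute constraint.
        return [(args[0], None, None)]
    if len(args) == 0 or len(args) % 3 != 0:
        raise ValueError(
            "expected a tag or one or more (tag, attribute, value) triplets"
        )
    # Split the flat argument list into consecutive triplets.
    return [tuple(args[i:i + 3]) for i in range(0, len(args), 3)]
```

For example, parse_selector("div", "class", "search_result") describes the DokuWiki wrapper mentioned in revision 122, while parse_selector("dd") selects by tag alone.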
Revision
115 -
Directory Listing
Modified
Sun Mar 25 11:55:08 2007 UTC
(17 years, 3 months ago)
by
dpavlin
collect dd instead of dt
Revision
110 -
Directory Listing
Modified
Wed Mar 14 20:02:19 2007 UTC
(17 years, 3 months ago)
by
dpavlin
another bunch of various tweaks, but Lucene still doesn't lock index right
Revision
104 -
Directory Listing
Modified
Sun Mar 4 23:29:37 2007 UTC
(17 years, 4 months ago)
by
dpavlin
isa Jifty::Object (so that ->log works)
Revision
102 -
Directory Listing
Modified
Sun Mar 4 22:16:23 2007 UTC
(17 years, 4 months ago)
by
dpavlin
removed all debug warn(s) or moved them to $self->log->debug
Revision
101 -
Directory Listing
Modified
Sun Mar 4 22:04:58 2007 UTC
(17 years, 4 months ago)
by
dpavlin
added support for DokuWiki (not superb, but working)
Revision
99 -
Directory Listing
Modified
Sat Feb 24 12:32:09 2007 UTC
(17 years, 4 months ago)
by
dpavlin
really save feed.xml content, put item link under title if there is no title
Revision
93 -
Directory Listing
Modified
Sat Feb 24 11:16:17 2007 UTC
(17 years, 4 months ago)
by
dpavlin
WebGUI search scraper
Revision
90 -
Directory Listing
Modified
Fri Feb 23 23:57:42 2007 UTC
(17 years, 4 months ago)
by
dpavlin
skip form submit for MoinMoin
Revision
87 -
Directory Listing
Modified
Fri Feb 23 21:17:54 2007 UTC
(17 years, 4 months ago)
by
dpavlin
added two variants of MediaWiki
Revision
86 -
Directory Listing
Modified
Fri Feb 23 21:16:44 2007 UTC
(17 years, 4 months ago)
by
dpavlin
added hooks to Grep::Source->save to keep useful snippets of html in /tmp/grep (if writable)
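
The "if writable" behaviour could look roughly like the sketch below. It is Python for illustration and not the actual Perl hook; save_snippet and its parameters are hypothetical, and only the /tmp/grep default comes from the log message.

```python
import os


def save_snippet(name, html, directory="/tmp/grep"):
    """Write `html` to `directory/name` if the directory is usable.

    Hypothetical helper: returns the path written, or None when the
    directory does not exist or is not writable, so callers can ignore
    the debugging hook silently.
    """
    if not (os.path.isdir(directory) and os.access(directory, os.W_OK)):
        return None
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return path
```

Skipping silently when the directory is missing makes the hook opt-in: creating /tmp/grep turns snippet saving on without any configuration change.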
Revision
85 -
Directory Listing
Modified
Fri Feb 23 20:47:08 2007 UTC
(17 years, 4 months ago)
by
dpavlin
refactor most of code for scraping into common code making plugins really simple
Revision
83 -
Directory Listing
Modified
Fri Feb 23 18:38:48 2007 UTC
(17 years, 4 months ago)
by
dpavlin
remove *all* arguments from page uris
Revision
77 -
Directory Listing
Modified
Fri Feb 23 17:33:43 2007 UTC
(17 years, 4 months ago)
by
dpavlin
remove arguments from page uri to make it unique
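
Stripping arguments so that each page maps to a single URI might be sketched as follows. This is Python for illustration, not the project's Perl code; strip_arguments is a hypothetical name, and keeping the fragment is an assumption, since the log mentions only arguments.

```python
from urllib.parse import urlsplit


def strip_arguments(uri):
    """Return `uri` without its query string.

    Hypothetical helper: two links to the same page that differ only in
    their arguments collapse to one URI. The fragment, if any, is kept.
    """
    return urlsplit(uri)._replace(query="").geturl()
```

With this, strip_arguments("http://wiki.example.org/FrontPage?action=show") and strip_arguments("http://wiki.example.org/FrontPage?highlight=foo") both give "http://wiki.example.org/FrontPage" (an example host, not one taken from the log).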
Revision
75 -
Directory Listing
Modified
Fri Feb 23 17:16:51 2007 UTC
(17 years, 4 months ago)
by
dpavlin
remove page_tree when not needed any more
Revision
73 -
Directory Listing
Modified
Fri Feb 23 11:48:39 2007 UTC
(17 years, 4 months ago)
by
dpavlin
each feed now has a default source class which is called for it. Added PhpWiki
source. Code still has problems with Lucene locking.
Revision
72 -
Directory Listing
Added
Fri Feb 23 09:54:28 2007 UTC
(17 years, 4 months ago)
by
dpavlin
another great refactoring: added new Source object which implements
searching within a feed (which can now be anything, as long as it produces fields
which somewhat resemble an RSS feed). Source plugins implement just the (site or
source format specific) fetching of items.
Sample implementation of a MoinMoin scraper, which fetches full pages from the wiki
for results, so it has a performance impact on the remote wiki; be kind to it.