This is a repository of my old source code, which is no longer updated. Go to git.rot13.org for current projects!

Log of /lib/Grep/Source




Revision 146 - Directory Listing
Modified Sun May 20 11:39:51 2007 UTC (16 years, 11 months ago) by dpavlin
Yahoo Groups search provider

Revision 134 - Directory Listing
Modified Tue May 1 21:06:10 2007 UTC (16 years, 11 months ago) by dpavlin
More DWIM changes: scrape can now also return multiple elements, which will
be separated in results by <hr/>.
Attribute values are now treated as words surrounded by word boundaries (\b),
so multiple classes separated by spaces will now be treated correctly.
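
The word-boundary matching described in revision 134 can be illustrated with a minimal Python sketch. The original project is Perl, and the function name `attr_has_word` is hypothetical and does not come from the Grep::Source code; it only shows how `\b` lets one class be found among several space-separated ones.

```python
import re


def attr_has_word(attr_value: str, wanted: str) -> bool:
    """Return True if `wanted` occurs in `attr_value` as a whole word.

    Hypothetical helper for illustration; not part of Grep::Source.
    """
    # re.escape guards against regex metacharacters in the wanted value
    pattern = r"\b" + re.escape(wanted) + r"\b"
    return re.search(pattern, attr_value) is not None


# One class among several space-separated classes is found
print(attr_has_word("search_result odd", "search_result"))
# A longer word that merely contains the wanted value is not
print(attr_has_word("results", "result"))
```

Note that `\b` treats a hyphen as a boundary as well, so a value such as `search-result` would also match `result` under this approach.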

Revision 133 - Directory Listing
Modified Tue May 1 20:50:14 2007 UTC (16 years, 11 months ago) by dpavlin
Very experimental support for selecting multiple wrapper divs in which
we will then try to find search results -- this change is mostly needed
for sites which have so little semantic markup that we need to pass
several divs, of which only one has results.
For Source modules, everything should "just work"(tm).
The PunBB forum is to blame for this feature, so it is the new source.

Revision 132 - Directory Listing
Modified Tue May 1 12:20:49 2007 UTC (16 years, 11 months ago) by dpavlin
Implemented redirect_single_result option for sources (MoinMoin uses that), and documented
element_by_triplet and scrape

Revision 122 - Directory Listing
Modified Sat Apr 28 17:54:53 2007 UTC (16 years, 11 months ago) by dpavlin
from my limited sample, I would say that all DokuWiki results are inside
<div class="search_result">

Revision 121 - Directory Listing
Modified Sat Apr 28 13:08:50 2007 UTC (16 years, 11 months ago) by dpavlin
Refactor scraping by extracting element_by_triplet into its own method; now
every parameter accepts one argument (tag) or one or more triplets
(tag, attribute, value)
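
The calling convention from revision 121 (a single tag, or a flat list of tag/attribute/value triplets) might be normalised roughly as follows. This is a Python sketch under stated assumptions: `parse_triplets` is a hypothetical name, and the real element_by_triplet method is Perl and may handle its arguments differently.

```python
def parse_triplets(*args):
    """Normalise arguments into a list of (tag, attribute, value) tuples.

    Hypothetical helper; it only mirrors the convention described in the
    log message, not the actual element_by_triplet implementation.
    """
    if len(args) == 1:
        # A lone argument is just a tag, with no attribute constraint
        return [(args[0], None, None)]
    if len(args) == 0 or len(args) % 3 != 0:
        raise ValueError("expected one tag or a multiple of three arguments")
    # Group the flat argument list into consecutive triplets
    return [tuple(args[i:i + 3]) for i in range(0, len(args), 3)]


print(parse_triplets("dd"))
print(parse_triplets("div", "class", "search_result"))
```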

Revision 115 - Directory Listing
Modified Sun Mar 25 11:55:08 2007 UTC (17 years ago) by dpavlin
collect dd instead of dt

Revision 114 - Directory Listing
Modified Sun Mar 25 11:42:49 2007 UTC (17 years ago) by dpavlin
scraper for unknown wiki engine at http://www.nslu2-linux.org/

Revision 110 - Directory Listing
Modified Wed Mar 14 20:02:19 2007 UTC (17 years, 1 month ago) by dpavlin
another bunch of various tweaks, but Lucene still doesn't lock index right

Revision 104 - Directory Listing
Modified Sun Mar 4 23:29:37 2007 UTC (17 years, 1 month ago) by dpavlin
isa Jifty::Object (so that ->log works)

Revision 102 - Directory Listing
Modified Sun Mar 4 22:16:23 2007 UTC (17 years, 1 month ago) by dpavlin
removed all debug warn(s) or moved them to $self->log->debug

Revision 101 - Directory Listing
Modified Sun Mar 4 22:04:58 2007 UTC (17 years, 1 month ago) by dpavlin
added support for DokuWiki (not superb, but working)

Revision 99 - Directory Listing
Modified Sat Feb 24 12:32:09 2007 UTC (17 years, 1 month ago) by dpavlin
really save feed.xml content, put item link under title if there is no title

Revision 93 - Directory Listing
Modified Sat Feb 24 11:16:17 2007 UTC (17 years, 1 month ago) by dpavlin
WebGUI search scraper

Revision 90 - Directory Listing
Modified Fri Feb 23 23:57:42 2007 UTC (17 years, 1 month ago) by dpavlin
skip form submit for MoinMoin

Revision 87 - Directory Listing
Modified Fri Feb 23 21:17:54 2007 UTC (17 years, 1 month ago) by dpavlin
added two variants of MediaWiki

Revision 86 - Directory Listing
Modified Fri Feb 23 21:16:44 2007 UTC (17 years, 1 month ago) by dpavlin
added hooks to Grep::Source->save to keep useful snippets of html in /tmp/grep (if writable)
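
The "if writable" guard from revision 86 could look something like the Python sketch below. The function name `save_snippet` and its parameters are assumptions made for illustration; the only details taken from the log are the /tmp/grep directory and the writability condition.

```python
import os


def save_snippet(directory, name, html):
    """Write `html` to `directory/name` only if the directory is writable.

    Hypothetical helper; returns the written path, or None when skipped.
    """
    if not (os.path.isdir(directory) and os.access(directory, os.W_OK)):
        # Debugging aid only: silently skip when the directory is unusable
        return None
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return path


# Skipped unless /tmp/grep exists and is writable
save_snippet("/tmp/grep", "result.html", "<div class='search_result'>...</div>")
```

Skipping silently keeps the hook harmless on machines where the directory was never created.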

Revision 85 - Directory Listing
Modified Fri Feb 23 20:47:08 2007 UTC (17 years, 1 month ago) by dpavlin
refactor most of code for scraping into common code making plugins really simple

Revision 83 - Directory Listing
Modified Fri Feb 23 18:38:48 2007 UTC (17 years, 1 month ago) by dpavlin
remove *all* arguments from page uris
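
Stripping the arguments (query string) from a page URI, so that the same page always maps to one URI, can be sketched in Python as follows. `strip_arguments` is a hypothetical name, the example URL is made up, and the original Perl code may differ in details such as fragment handling.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_arguments(uri: str) -> str:
    """Return `uri` without its query string (hypothetical helper)."""
    parts = urlsplit(uri)
    # Rebuild the URI with an empty query component; keep the rest as-is
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(strip_arguments("http://wiki.example.org/FrontPage?action=raw&rev=3"))
```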

Revision 77 - Directory Listing
Modified Fri Feb 23 17:33:43 2007 UTC (17 years, 1 month ago) by dpavlin
remove arguments from page uri to make it unique

Revision 75 - Directory Listing
Modified Fri Feb 23 17:16:51 2007 UTC (17 years, 1 month ago) by dpavlin
remove page_tree when not needed any more

Revision 73 - Directory Listing
Modified Fri Feb 23 11:48:39 2007 UTC (17 years, 1 month ago) by dpavlin
each feed now has a default source class which is called for it. Added PhpWiki
source. Code still has problems with Lucene locking.

Revision 72 - Directory Listing
Added Fri Feb 23 09:54:28 2007 UTC (17 years, 1 month ago) by dpavlin
another great refactoring: added new Source object which implements
searching within a feed (which can now be anything as long as it produces fields
which somewhat resemble an RSS feed). Source plugins implement just the (site or
source format specific) fetching of items.

Sample implementation of a MoinMoin scraper, which fetches full pages from the wiki
for results, so it has a performance impact on the remote wiki; be kind to it.
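
The split described in revision 72, where a generic Source searches items and plugins only fetch them, might be sketched in Python as below. The class names, the `fetch`/`search` methods, and the `title`/`link`/`content` field names are all assumptions chosen to resemble an RSS item; the real Grep::Source is a Perl module and is not reproduced here.

```python
class Source:
    """Generic part: searches items that expose RSS-like fields."""

    def fetch(self):
        # Plugins override this with site-specific fetching
        raise NotImplementedError("plugins provide fetch()")

    def search(self, query):
        needle = query.lower()
        return [
            item for item in self.fetch()
            if needle in (item.get("title", "") + " " + item.get("content", "")).lower()
        ]


class StaticSource(Source):
    """Toy plugin that 'fetches' items from memory instead of a remote site."""

    def __init__(self, items):
        self.items = items

    def fetch(self):
        return list(self.items)


src = StaticSource([
    {"title": "Lucene locking", "link": "http://example.org/1", "content": "index lock"},
    {"title": "Other page", "link": "http://example.org/2", "content": "nothing here"},
])
print([item["link"] for item in src.search("lucene")])
```

A plugin for a real site would replace only `fetch`, which is where the cost on the remote server is incurred.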
