Sack - sharding memory hash in perl

The main design goal is to provide an interactive environment
for querying perl hashes which are bigger than the memory of a
single machine.

It is implemented using TCP sockets between perl processes.
This allows horizontal scaling both across the cores of a single
machine and across the network to additional machines.

Reading data into the hash is done using any perl module which
returns a perl hash and supports offset and limit to select just
a subset of the data (this is required to create disjoint shards).
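
A minimal sketch of how offset and limit can produce disjoint
shards. The read_shard function and the in-memory @records list
are hypothetical stand-ins for a real input module; they are not
part of Sack.

```perl
use strict;
use warnings;

# Hypothetical input: 10 records, standing in for a real input module.
my @records = map { +{ id => $_, title => "record $_" } } 1 .. 10;

# Hypothetical reader: returns a hash of the records selected by
# offset and limit, keyed by record id.
sub read_shard {
    my (%args) = @_;
    my $offset = $args{offset} // 0;
    my $limit  = $args{limit}  // scalar @records;
    my $last   = $offset + $limit - 1;
    $last = $#records if $last > $#records;
    my %shard;
    $shard{ $_->{id} } = $_ for @records[ $offset .. $last ];
    return \%shard;
}

# Advance offset by limit on each step so that no record lands
# in two shards.
my $limit = 4;
my @shards;
for ( my $offset = 0 ; $offset < @records ; $offset += $limit ) {
    push @shards, read_shard( offset => $offset, limit => $limit );
}

for my $n ( 0 .. $#shards ) {
    my @ids = sort { $a <=> $b } keys %{ $shards[$n] };
    printf "shard %d: %s\n", $n, join( ',', @ids );
}
```

With ten records and a limit of 4 this gives three shards holding
records 1-4, 5-8 and 9-10.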

Views are small perl snippets which are called for each record
on each shard with $rec. Views create data in the $out hash,
which is automatically merged into the output.
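
A minimal sketch of a view run over two shards. How Sack itself
compiles and calls views is not described here, so the eval
wrapper, the run_view helper and the plain key-wise merge are all
assumptions made for illustration.

```perl
use strict;
use warnings;

# Two hypothetical shards, each a hash of records keyed by id.
my @shards = (
    { 1 => { lang => 'en' }, 2 => { lang => 'hr' } },
    { 3 => { lang => 'en' }, 4 => { lang => 'de' } },
);

# A view as a snippet of perl source: it reads $rec, writes $out.
my $view_src = q{ $out->{ $rec->{lang} } = 1; };

# Compile the snippet once into a sub taking ($rec, $out).
my $view = eval "sub { my (\$rec, \$out) = \@_; $view_src }"
    or die "view failed to compile: $@";

# Run a compiled view over every record of one shard and return
# that shard's $out hash.
sub run_view {
    my ( $shard, $code ) = @_;
    my $out = {};
    $code->( $shard->{$_}, $out ) for sort keys %$shard;
    return $out;
}

# Assumed merge for this sketch: copy keys, later shards overwrite.
my %merged;
for my $shard (@shards) {
    my $out = run_view( $shard, $view );
    @merged{ keys %$out } = values %$out;
}

print join( ', ', sort keys %merged ), "\n";
```

Each shard produces its own $out, so the per-shard work is
independent and only the small $out hashes need to be combined.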

You can influence the default shard merge by adding + (plus sign)
to the name of your key to indicate that the key => value pairs
below it should have their values summed when combining shards.
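
A minimal sketch of such a summing merge. The merge_out function
and the sample data are hypothetical, and the "last shard wins"
fallback for keys without + is an assumption, not Sack's
documented default.

```perl
use strict;
use warnings;

# Hypothetical per-shard $out hashes produced by a counting view.
my @shard_out = (
    { 'lang+' => { en => 3, hr => 1 }, note => 'shard 1' },
    { 'lang+' => { en => 2, de => 4 }, note => 'shard 2' },
);

# Merge one shard's $out into the combined result.
sub merge_out {
    my ( $merged, $out ) = @_;
    for my $key ( keys %$out ) {
        if ( index( $key, '+' ) >= 0 && ref( $out->{$key} ) eq 'HASH' ) {
            # Key name contains "+": sum the values below it.
            $merged->{$key}{$_} += $out->{$key}{$_}
                for keys %{ $out->{$key} };
        }
        else {
            # Assumed fallback for other keys: last shard wins.
            $merged->{$key} = $out->{$key};
        }
    }
    return $merged;
}

my $merged = {};
merge_out( $merged, $_ ) for @shard_out;

for my $k ( sort keys %{ $merged->{'lang+'} } ) {
    printf "%s => %d\n", $k, $merged->{'lang+'}{$k};
}
```

Here the en counts from both shards (3 and 2) add up to 5, while
hr and de keep the counts from the one shard that saw them.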

If you have long field names, add # to the name of the key above
the value which you want to turn into an integer. This reduces
memory usage on the master node.
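
The exact mechanism is not spelled out above. One possible
reading, sketched here under that assumption, is a dictionary
which replaces each long name below a # key with a small integer.
The to_int function, the lookup tables and the sample data are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical dictionary: long name -> small integer, and back.
my ( %id_of, @name_of );

sub to_int {
    my ($name) = @_;
    unless ( exists $id_of{$name} ) {
        push @name_of, $name;
        $id_of{$name} = $#name_of;
    }
    return $id_of{$name};
}

# Hypothetical view output: "#" in the key name marks the names
# below it for replacement.
my $out = {
    'journal#' => {
        'Journal of Extremely Long Titles, Series A' => 2,
        'Annals of Another Very Long Journal Name'   => 5,
    },
};

# Replace the names below "#" keys with integers; copy the rest.
my %compact;
for my $key ( keys %$out ) {
    if ( index( $key, '#' ) >= 0 ) {
        $compact{$key}{ to_int($_) } = $out->{$key}{$_}
            for sort keys %{ $out->{$key} };
    }
    else {
        $compact{$key} = $out->{$key};
    }
}

# The original names stay recoverable through @name_of.
for my $id ( sort { $a <=> $b } keys %{ $compact{'journal#'} } ) {
    printf "%d (%s) => %d\n", $id, $name_of[$id],
        $compact{'journal#'}{$id};
}
```

The saving in this sketch comes from storing each long name once
in the lookup table instead of repeating it in every merged hash.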


USAGE

1. create cloud definition

 etc/cloud-name       IP addresses of nodes similar to /etc/hosts
 etc/cloud-name.ssh   ssh configuration (user, compression etc)

2. shard data

 ./bin/shards.pl (hard-coded to use WebPAC::Input::ISI for now)

 ./bin/couchdb2shards.pl

3. start server

 CLOUD=etc/cloud-name ./lib/Sack/Server.pm

4. start repl

 ./lib/Sack/REPL.pm

5. start locally for development

 ./bin/split.sh
