Sack - sharding memory hash in perl

The main design goal is to provide an interactive environment for querying
perl hashes which are bigger than the memory of a single machine.

It is implemented using TCP sockets between perl processes.
This allows horizontal scalability both on multi-core machines
and across the network to additional machines.

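A minimal sketch of two perl processes exchanging hashes over a TCP
socket. It is not Sack's actual protocol: the use of Storable for
serialization, the loopback address, the query format and the sample
shard are all assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Storable qw(nstore_fd fd_retrieve);

# Listen on the loopback interface; port 0 lets the OS pick a free port.
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen: $!";
my $port = $listener->sockport;

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child process plays the node: it holds one shard and answers one query.
    my %shard = (apple => 3, pear => 5);    # made-up sample data
    my $conn  = $listener->accept or die "accept: $!";
    my $query = fd_retrieve($conn);
    my %found = map { ($_ => $shard{$_}) }
                grep { exists $shard{$_} } @{ $query->{keys} };
    nstore_fd(\%found, $conn);
    $conn->flush;
    close $conn;
    exit 0;
}

# Parent process plays the master: send a query, then read the reply.
close $listener;
my $sock = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
nstore_fd({ keys => [ 'apple', 'plum' ] }, $sock);
$sock->flush;
my $reply = fd_retrieve($sock);
close $sock;
waitpid($pid, 0);

print "$_ => $reply->{$_}\n" for sort keys %$reply;
```

Because each node is a separate process behind a socket, the same
exchange works whether the node runs on another core or another machine.
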
Reading data into a hash is done using any perl module which
returns a perl hash and supports offset and limit to select just
a subset of the data (this is required to create disjoint shards).

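A minimal sketch of how offset and limit could carve a data set into
disjoint shards. The shard_ranges helper and its arguments are
hypothetical and not part of Sack; a real input module would be called
once with each offset/limit pair.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sack): split $total records into
# $shards contiguous, non-overlapping (offset, limit) ranges.
sub shard_ranges {
    my ($total, $shards) = @_;
    my $size = int($total / $shards);
    my $rest = $total % $shards;    # the first $rest shards get one extra record
    my @ranges;
    my $offset = 0;
    for my $i (0 .. $shards - 1) {
        my $limit = $size + ($i < $rest ? 1 : 0);
        push @ranges, { offset => $offset, limit => $limit };
        $offset += $limit;
    }
    return @ranges;
}

# Example: 10 records spread over 3 shards.
for my $r (shard_ranges(10, 3)) {
    print "offset=$r->{offset} limit=$r->{limit}\n";
}
```

Each range starts where the previous one ends, so no record lands in
two shards and none is skipped.
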
Views are small perl snippets which are called for each record
on each shard with $rec. Views create data in the $out hash, which
is automatically merged into the output.

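A sketch of how such a view might be evaluated on one shard. Only $rec
and $out come from the description above; the run_view helper, the
sample records and the year field are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sack): compile a view snippet once and
# call it for every record of a shard, collecting the results in $out.
sub run_view {
    my ($view_code, $records) = @_;
    my $out = {};
    # Wrap the snippet in a sub so that it sees $rec and $out as lexicals.
    my $view = eval "sub { my (\$rec, \$out) = \@_; $view_code }";
    die "view does not compile: $@" if $@;
    $view->($_, $out) for @$records;
    return $out;
}

# Made-up shard and a view which counts records per year.
my @shard = ( { year => 2009 }, { year => 2010 }, { year => 2009 } );
my $out   = run_view('$out->{year}->{ $rec->{year} }++;', \@shard);

print "$_: $out->{year}->{$_}\n" for sort keys %{ $out->{year} };
```
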
You can influence the default shard merge by adding + (plus sign)
to the name of your key to indicate that the key => value pairs below
it should have their values summed when combining shards.

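A sketch of what a merge honouring the + marker could look like. This
is not Sack's actual merge code: the merge_out helper is hypothetical,
and the rule that a later shard overwrites keys without + is only an
assumption about the default behaviour.

```perl
use strict;
use warnings;

# Hypothetical merge (not Sack's actual code): fold the $out hash of one
# shard into $merged. Keys containing "+" have their nested values summed;
# for all other keys this sketch assumes that the later shard simply wins.
sub merge_out {
    my ($merged, $out) = @_;
    for my $key (keys %$out) {
        if ($key =~ /\+/ && ref($out->{$key}) eq 'HASH') {
            $merged->{$key}->{$_} += $out->{$key}->{$_}
                for keys %{ $out->{$key} };
        } else {
            $merged->{$key} = $out->{$key};
        }
    }
    return $merged;
}

# Two made-up shard results combined on the master.
my $merged = {};
merge_out($merged, { 'year+' => { 2009 => 2, 2010 => 1 }, last => 'shard 1' });
merge_out($merged, { 'year+' => { 2009 => 5 },            last => 'shard 2' });

print "$_: $merged->{'year+'}->{$_}\n" for sort keys %{ $merged->{'year+'} };
```
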
If you have long field names, add # to the name of the key above the
value which you want to turn into an integer value. This will reduce
memory usage on the master node.

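One way to picture the # marker is as a lookup table which swaps long
strings for small integers. The sketch below is an assumption about the
idea, not Sack's actual code: the compact_out helper, the %ids table
and the sample journal name are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch (not Sack's actual code): replace long string keys
# below any key containing "#" with small integers, remembering the
# mapping in %$ids so that the names can be recovered later.
sub compact_out {
    my ($out, $ids) = @_;
    my %compact;
    for my $key (keys %$out) {
        if ($key =~ /#/ && ref($out->{$key}) eq 'HASH') {
            for my $name (keys %{ $out->{$key} }) {
                # Assign the next free integer the first time a name is seen.
                unless (exists $ids->{$name}) {
                    my $next = keys(%$ids) + 1;
                    $ids->{$name} = $next;
                }
                $compact{$key}->{ $ids->{$name} } = $out->{$key}->{$name};
            }
        } else {
            $compact{$key} = $out->{$key};
        }
    }
    return \%compact;
}

my %ids;
my $small = compact_out(
    { 'journal#' => { 'Journal of Very Long Names' => 3 } },
    \%ids,
);
my $id = $ids{'Journal of Very Long Names'};
print "id=$id count=$small->{'journal#'}->{$id}\n";
```

The long name is stored once in the lookup table, while the merged
hash only holds the integer.
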
USAGE

1. create cloud definition

    etc/cloud-name        IP addresses of nodes similar to /etc/hosts
    etc/cloud-name.ssh    ssh configuration (user, compression etc)

2. shard data

    ./bin/shards.pl       (hard-coded to use WebPAC::Input::ISI for now)

    ./bin/couchdb2shards.pl

3. start server

    CLOUD=etc/cloud-name ./lib/Sack/Server.pm

4. start repl

    ./lib/Sack/REPL.pm

5. start locally for development

    ./bin/split.sh