Sack - sharding memory hash in perl

The main design goal is to provide an interactive environment for querying
Perl hashes which are bigger than the memory of a single machine.

It is implemented using TCP sockets between perl processes.
This allows horizontal scalability both on multi-core machines
and across the network to additional machines.
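
As a rough illustration of passing a Perl hash between two socket endpoints,
here is a minimal sketch. It is not Sack's actual wire protocol: the use of
Storable, the helper names send_struct and recv_struct, and the local
socketpair (standing in for a real TCP connection between processes) are all
assumptions made for the example.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;
use Storable qw(nstore_fd fd_retrieve);

# Hypothetical helper: send one Perl data structure over a connected socket.
sub send_struct {
    my ($fh, $data) = @_;
    nstore_fd($data, $fh) or die "store failed";
    $fh->flush;    # make sure the serialized bytes leave the buffer
}

# Hypothetical helper: receive one Perl data structure from a socket.
sub recv_struct {
    my ($fh) = @_;
    return fd_retrieve($fh);
}

# Assumption: a local socket pair stands in for the TCP connection
# between a shard node and the master.
socketpair(my $node, my $master, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

send_struct($node, { shard => 1, count => 42 });
my $got = recv_struct($master);
print "shard $got->{shard} reported $got->{count} records\n";
```

With real nodes the same two helpers could be used on handles from
IO::Socket::INET instead of socketpair.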

Reading data into a hash is done using any Perl module which
returns a Perl hash and supports offset and limit to select just
a subset of the data (this is required to create disjoint shards).
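
To show how offset and limit can produce disjoint shards, here is a small
sketch. The function name shard_ranges and the even split are assumptions
for illustration; the text above only requires that each shard reads its
own non-overlapping range.

```perl
use strict;
use warnings;

# Hypothetical helper: split $total records into at most $shards
# disjoint (offset, limit) ranges that together cover every record.
sub shard_ranges {
    my ($total, $shards) = @_;
    my $limit = int(($total + $shards - 1) / $shards);    # ceiling division
    my @ranges;
    for my $i (0 .. $shards - 1) {
        my $offset = $i * $limit;
        last if $offset >= $total;
        # the last shard may be shorter than the others
        my $n = $offset + $limit > $total ? $total - $offset : $limit;
        push @ranges, { offset => $offset, limit => $n };
    }
    return @ranges;
}

printf "offset=%d limit=%d\n", $_->{offset}, $_->{limit}
    for shard_ranges(10, 3);
```

For 10 records and 3 shards this gives the ranges (0, 4), (4, 4) and (8, 2).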

Views are small Perl snippets which are called for each record
on each shard with $rec. Views create data in the $out hash, which
is automatically merged into the output.
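
A view and the per-shard loop around it might look roughly like this. The
"year" field, the run_view helper, and passing $rec and $out as arguments
are assumptions for the sketch; how Sack actually exposes $rec and $out to
a view is not specified above.

```perl
use strict;
use warnings;

# Hypothetical view: count records per value of an assumed "year" field.
my $view = sub {
    my ($rec, $out) = @_;
    $out->{year}{ $rec->{year} }++;
};

# Hypothetical helper: run a view over every record of one shard
# and return the $out hash it built.
sub run_view {
    my ($view, $shard) = @_;
    my $out = {};
    $view->($_, $out) for @$shard;
    return $out;
}

my $shard = [ { year => 2008 }, { year => 2009 }, { year => 2009 } ];
my $out   = run_view($view, $shard);
print "$_: $out->{year}{$_}\n" for sort keys %{ $out->{year} };
```

Each shard would produce its own $out in this way, and the results are then
combined on the master.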

You can influence the default shard merge by adding + (plus sign)
to the name of your key to indicate that the key => value pairs below it
should have their values summed when combining shards.
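
The effect of + on merging can be sketched as follows. The merge_out helper
and the key name 'year+' are hypothetical, and treating the default merge as
a plain overwrite is an assumption made only so the example has something to
contrast with.

```perl
use strict;
use warnings;

# Hypothetical helper: merge one shard's result into the combined result.
# Values below keys whose name contains "+" are summed; for all other
# keys this sketch assumes a later shard simply overwrites an earlier one.
sub merge_out {
    my ($into, $from) = @_;
    for my $key (keys %$from) {
        for my $k (keys %{ $from->{$key} }) {
            if ($key =~ /\+/) {
                $into->{$key}{$k} += $from->{$key}{$k};
            } else {
                $into->{$key}{$k} = $from->{$key}{$k};
            }
        }
    }
    return $into;
}

my $merged = {};
merge_out($merged, { 'year+' => { 2008 => 1, 2009 => 2 } });    # shard 1
merge_out($merged, { 'year+' => { 2009 => 3 } });               # shard 2
print "$_: $merged->{'year+'}{$_}\n" for sort keys %{ $merged->{'year+'} };
```

Here the two shards report 2 and 3 records for 2009, so the merged value
is 5, while 2008 stays at 1.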

If you have long field names, add # to the name of the key above the values
which you want to turn into integer values. This will reduce
memory usage on the master node.
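
One possible reading of the # marker is that long strings below such a key
are replaced by small integer ids, with a lookup table kept for translating
them back. The sketch below follows that reading; the intern and compact
helpers, the lookup tables, and the 'publisher#' key are all hypothetical
and not taken from Sack itself.

```perl
use strict;
use warnings;

my (%id_of, @name_of);    # string -> id and id -> string lookup tables

# Hypothetical helper: return a small integer standing in for a string.
sub intern {
    my ($name) = @_;
    unless (exists $id_of{$name}) {
        push @name_of, $name;
        $id_of{$name} = $#name_of;
    }
    return $id_of{$name};
}

# Hypothetical helper: replace string keys below any key whose name
# contains "#" with integer ids.
sub compact {
    my ($out) = @_;
    for my $key (grep { /#/ } keys %$out) {
        my %small;
        $small{ intern($_) } = $out->{$key}{$_} for keys %{ $out->{$key} };
        $out->{$key} = \%small;
    }
    return $out;
}

my $out = compact({ 'publisher#' => { 'A very long publisher name' => 7 } });
my ($id) = keys %{ $out->{'publisher#'} };
print "$id => $name_of[$id] ($out->{'publisher#'}{$id})\n";
```

The long name is stored once in the lookup table, and the result hash only
carries the id.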