=for comment
use 'perldoc ./rserv-explained.pod' to read this doc with formatting

=head1 How Rserv Works

Rserv is made up of the following components:

=over 4

=item *

several functions written in 'C' and compiled into the shared library rserv.so

=item *

several database tables '_rserv_*' used to track replication metadata

=item *

one trigger for each replicated table that fires on every insert/update/delete
and calls one of the 'C' functions

=item *

a collection of Perl scripts for initialising the metadata and replicating the
database updates. Most of the Perl code is in Rserv.pm, and the routines can be
run from custom scripts or from the simple wrapper scripts that come with the
distribution.

=back

Rserv assumes each table has a single column that uniquely identifies each row.

When the master database is first created, the 'rserv_init.pl' script should be
run with the '-m' option to do the following:

=over 4

=item *

create four tables:

    _rserv_tables_    stores the name of the unique column for each table
    _rserv_log_       tracks which rows of each table have been updated
    _rserv_servers_   details of slave servers (not used?)
    _rserv_sync_      tracks which updates have been seen by each slave

=item *

call MasterAddTable once for each table in the database, to add one row to
_rserv_tables_ and to create a trigger. (Note: rserv_init.pl identifies the
unique column by locating the first column with a unique index.)

=back

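The unique-column rule can be sketched as a small Python helper. This is hypothetical and is not the code in rserv_init.pl. The table's columns and unique indexes are passed in as plain lists rather than read from the PostgreSQL catalogs. The sketch also assumes two things: "first" means first in table column order, and only single-column unique indexes count.

```python
# Hypothetical helper (not rserv_init.pl itself): pick the replication key as
# the first column, in table order, covered by a single-column unique index.
from typing import Optional, Sequence


def find_unique_column(columns: Sequence[str],
                       unique_indexes: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the key column, or None if no single-column unique index exists."""
    # Multi-column unique indexes are skipped (an assumption): Rserv identifies
    # each row by exactly one column.
    candidates = {cols[0] for cols in unique_indexes if len(cols) == 1}
    for col in columns:
        if col in candidates:
            return col
    return None
```

For a table with columns id, email and name, where both email and id have unique indexes, this returns "id". A None result corresponds to a table that does not meet Rserv's single-column assumption.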
Once the initialisation script has been run, the master is ready. Whenever a
database update occurs, a trigger fires and a row is added to the _rserv_log_
table. Note that this table only records which row in which table was updated;
it does not log the values that were changed.

When a slave database is first created, the 'rserv_init.pl' script should be
run with the '-s' option to do the following:

=over 4

=item *

create two tables:

    _rserv_slave_tables_   stores the name of the unique column for each table
    _rserv_slave_sync_     tracks which updates have been seen by this slave

=item *

call SlaveAddTable once for each table in the database, to add one row to
_rserv_slave_tables_. (Note: rserv_init.pl identifies the unique column by
locating the first column with a unique index.)

=back

Once the initialisation script has been run, replication can begin. One
replication cycle between a master and one slave consists of the following
steps:

=over 4

=item *

PrepareSnapshot (either the function in Rserv.pm or the wrapper script of the
same name) is used to create a text file of all updates the slave has not yet
seen.

=item *

ApplySnapshot is used to apply those updates to the slave.

=item *

GetSyncID is used to retrieve the SyncID just applied to the slave.

=item *

MasterSync is used to store that SyncID in the master's _rserv_sync_ record for
the slave.

=back

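To make the bookkeeping in one cycle concrete, here is a small in-memory model in Python. Everything in it is an assumption for illustration:

=over 4

=item *

The classes and functions are hypothetical stand-ins that mirror the four steps. They are not the routines in Rserv.pm.

=item *

There is a single replicated table keyed by a string.

=item *

Concurrent transactions are ignored.

=item *

A snapshot is labelled with the SyncID in force just before the sequence is incremented.

=back

```python
# Hypothetical in-memory sketch of one replication cycle. Dicts and lists stand
# in for the replicated table and the _rserv_* metadata; the function names
# mirror the steps in the text but are NOT the Rserv.pm API.
from typing import Dict, List, Optional, Set, Tuple

Snapshot = Tuple[int, Set[str], Dict[str, int]]  # (syncid, keys to delete, rows to insert)


class Master:
    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}          # the replicated table
        self.log: List[Tuple[str, int]] = []    # (key, syncid), like _rserv_log_
        self.seq = 1                            # current SyncID sequence value
        self.synced: Dict[int, int] = {0: 0}    # slave -> last confirmed SyncID

    def write(self, key: str, value: Optional[int]) -> None:
        """Insert/update (value given) or delete (value None). The 'trigger'
        logs only which key changed, tagged with the current SyncID."""
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append((key, self.seq))


class Slave:
    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}
        self.syncid = 0                         # like _rserv_slave_sync_


def prepare_snapshot(master: Master, slave_id: int) -> Snapshot:
    last, syncid = master.synced[slave_id], master.seq
    master.seq += 1                             # later writes get the next SyncID
    keys = {k for k, s in master.log if last < s <= syncid}
    # The log holds no values, so row data is read from the master's table now.
    return syncid, keys, {k: master.rows[k] for k in keys if k in master.rows}


def apply_snapshot(slave: Slave, snap: Snapshot) -> None:
    syncid, keys, inserts = snap
    for k in keys:
        slave.rows.pop(k, None)                 # delete every listed row
    slave.rows.update(inserts)                  # re-insert rows that still exist
    slave.syncid = syncid                       # the slave's own record


def get_sync_id(slave: Slave) -> int:
    return slave.syncid


def master_sync(master: Master, slave_id: int, syncid: int) -> None:
    master.synced[slave_id] = syncid            # the master's record for the slave


def replicate(master: Master, slave: Slave, slave_id: int = 0) -> None:
    """Run the four steps in order for one slave."""
    apply_snapshot(slave, prepare_snapshot(master, slave_id))
    master_sync(master, slave_id, get_sync_id(slave))
```

Because the log holds only keys, a row deleted on the master reaches the slave as a delete with no matching insert. The slave's record (set in apply_snapshot) and the master's record (set in master_sync) are kept separately, just as the text describes.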
The next time PrepareSnapshot is run, it will only include updates that
occurred since the last snapshot the slave has seen. However, the log entries
for older updates will still exist in the _rserv_log_ table; the CleanLog
routine can be used to purge entries up to a specified SyncID.

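In terms of the log alone, purging up to a SyncID simply discards the older entries. The Python sketch below is hypothetical: the log is a list of (table, key, syncid) tuples rather than the _rserv_log_ table. The helper for choosing a safe SyncID, the lowest SyncID confirmed by any slave, is our own suggestion. It is not something this document says CleanLog does.

```python
# Hypothetical sketch of "purge entries up to a specified SyncID" on an
# in-memory log; the real CleanLog works on the _rserv_log_ table.
from typing import Dict, List, Tuple

LogEntry = Tuple[str, object, int]   # (table, key, syncid)


def clean_log(log: List[LogEntry], upto_syncid: int) -> List[LogEntry]:
    """Keep only entries newer than upto_syncid."""
    return [entry for entry in log if entry[2] > upto_syncid]


def safe_purge_point(confirmed: Dict[int, int]) -> int:
    """Assumption, not documented Rserv behaviour: only purge what every slave
    has confirmed, i.e. the smallest SyncID recorded for any slave."""
    return min(confirmed.values())
```

Purging past the lowest confirmed SyncID would drop entries that some slave has not yet received, so those updates would never appear in its snapshots.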
The Replicate script performs all the steps listed above except CleanLog.
Note: although the metadata framework and Rserv.pm support multiple slaves, the
wrapper scripts are all hardcoded for a single slave (number 0).

=head2 SyncIDs

Rserv uses the concept of 'SyncIDs' to track how up-to-date a slave is.
SyncIDs are merely an ascending series of numbers derived from a PostgreSQL
sequence. A group of transactions may share the same SyncID:

=over 4

=item *

As updates are logged on the master, they are assigned the current value of the
_rserv_sync_seq_ sequence (not the next value).

=item *

When a snapshot is prepared, the sequence is incremented.

=item *

One snapshot will include updates for all SyncIDs that the slave has not yet
seen.

=item *

Applying a snapshot to a slave updates I<the slave's> record of which snapshots
it has seen.

=item *

GetSyncID + MasterSync are used to update I<the master's> record of which
snapshots a slave has seen.

=back

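The rules above can be modelled in a few lines of Python. This is a hypothetical sketch, not Rserv code:

=over 4

=item *

A plain integer stands in for the _rserv_sync_seq_ sequence.

=item *

The log is a list of (table, key, syncid) tuples.

=item *

A snapshot is assumed to be labelled with the sequence value in force just before it is incremented.

=back

```python
# Hypothetical model of SyncID assignment (not Rserv code). A plain integer
# stands in for the _rserv_sync_seq_ sequence and a list for _rserv_log_.
from typing import List, Tuple


class SyncIdTracker:
    def __init__(self) -> None:
        self.current = 1                           # sequence's current value
        self.log: List[Tuple[str, int, int]] = []  # (table, key, syncid)

    def log_update(self, table: str, key: int) -> None:
        # Updates take the *current* value, so every update logged between two
        # snapshots shares one SyncID.
        self.log.append((table, key, self.current))

    def prepare_snapshot(self, last_seen: int) -> Tuple[int, List[Tuple[str, int]]]:
        """Close the current SyncID and return it with every changed row the
        slave has not seen (all SyncIDs after last_seen, up to the closing one)."""
        closing = self.current
        self.current += 1                          # later updates get the next SyncID
        rows = sorted({(t, k) for t, k, s in self.log if last_seen < s <= closing})
        return closing, rows
```

For example, two updates logged before the first snapshot both get SyncID 1, and that snapshot is labelled 1. An update logged afterwards gets SyncID 2. If the slave still reports 0, because MasterSync was never run, the next snapshot covers SyncIDs 1 and 2 together.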
=head2 Snapshots

A snapshot is a sequence of instructions that should be applied to a slave to
bring it up to date with a given SyncID. The snapshot file does not contain SQL
statements; it contains commands/comments (preceded by '--') and tab-delimited
data.

There are two types of instruction: DELETE and UPDATE. When the snapshot is
applied, all the records listed in the snapshot are first deleted from the
slave database, and then the records listed in UPDATE instructions are
inserted.

One consequence of this design is that it is perfectly safe to apply the same
snapshot more than once.

Another consequence is that if the SyncID is not updated on the master, the
next snapshot will include everything from the last snapshot plus all updates
since then. This is also perfectly safe.

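The delete-then-insert rule, and the two safety properties that follow from it, can be checked with a short Python model. This is hypothetical and is not the ApplySnapshot code. A snapshot is reduced to a list of keys from DELETE instructions plus a mapping of rows from UPDATE instructions. The slave table is a dict keyed by the unique column. The snapshot file format itself is not modelled.

```python
# Hypothetical model of applying a snapshot (not the ApplySnapshot code): the
# slave table is a dict keyed by the unique column, and a snapshot is reduced
# to the keys named in DELETE instructions plus the rows in UPDATE ones.
from typing import Dict, Hashable, Iterable, Mapping, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def apply_snapshot(slave: Mapping[K, V], deletes: Iterable[K],
                   updates: Mapping[K, V]) -> Dict[K, V]:
    """Return a new table: delete every record listed in the snapshot, then
    insert the rows carried by UPDATE instructions."""
    result = dict(slave)
    for key in [*deletes, *updates]:   # updated rows are deleted too
        result.pop(key, None)
    result.update(updates)
    return result
```

Applying the same snapshot twice yields the same table as applying it once. A larger snapshot that repeats changes the slave already has leaves a stale slave and an up-to-date slave in the same final state.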
=cut