
=for comment
use 'perldoc ./rserv-explained.pod' to read this doc with formatting


=head1 How Rserv Works

Rserv is made up of the following components:

=over 4

=item *

several functions written in 'C' and compiled into the shared library rserv.so

=item *

several database tables '_rserv_*' used to track replication metadata

=item *

one trigger for each replicated table that fires on every insert/update/delete
and calls one of the 'C' functions

=item *

a collection of Perl scripts for initialising the metadata and replicating the
database updates.  Most of the Perl code is in Rserv.pm and the routines can be
run from custom scripts or from simple wrapper scripts that come with the
distribution

=back

Rserv assumes each table has a single column to uniquely identify each row.

When the master database is first created, the 'rserv_init.pl' script should be
run with the '-m' option to do the following:

=over 4

=item *

create four tables:

  _rserv_tables_   stores the name of the unique column for each table
  _rserv_log_      tracks which rows of each table have been updated
  _rserv_servers_  details of slave servers (not used?)
  _rserv_sync_     tracks which updates have been seen by each slave

=item *

call MasterAddTable once for each table in the database, to add one row to
_rserv_tables_ and to create a trigger.  (Note: rserv_init.pl identifies the
unique column by locating the first column with a unique index).

=back

Once the initialisation script has been run, the master is ready for use.
Whenever a database update occurs, a trigger fires and a row is added to the
_rserv_log_ table.  Note that this table only tracks which row in which table
was updated; it does not log the values which were changed.
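
The logging behaviour described above can be illustrated with a small model.
This is only a sketch in Python, not the code in rserv.so: the names
MasterModel, LogEntry, upsert and delete are hypothetical, and the 'deleted'
flag is an assumption about how deletes are told apart from other changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    table: str     # table that was modified
    key: str       # value of the table's unique column for the affected row
    deleted: bool  # assumption: the log distinguishes deletes from other changes


class MasterModel:
    """Toy stand-in for a master database with one logging trigger per table."""

    def __init__(self, unique_columns):
        self.unique_columns = dict(unique_columns)  # table name -> unique column
        self.rows = {table: {} for table in self.unique_columns}
        self.log = []  # plays the role of _rserv_log_

    def _trigger(self, table, key, deleted):
        # Record only *which* row changed, never the changed values.
        self.log.append(LogEntry(table, key, deleted))

    def upsert(self, table, row):
        key = str(row[self.unique_columns[table]])
        self.rows[table][key] = dict(row)
        self._trigger(table, key, deleted=False)

    def delete(self, table, key):
        self.rows[table].pop(str(key), None)
        self._trigger(table, str(key), deleted=True)


master = MasterModel({"users": "id"})
master.upsert("users", {"id": 1, "name": "ann"})
master.upsert("users", {"id": 1, "name": "anne"})
master.delete("users", 1)
for entry in master.log:
    print(entry)
```

Every change produces one log entry, and none of the entries carries the
'name' value; the current values have to be read from the table itself when a
snapshot is prepared.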

When a slave database is first created, the 'rserv_init.pl' script should be
run with the '-s' option to do the following:

=over 4

=item *

create two tables:

  _rserv_slave_tables_  stores the name of the unique column for each table
  _rserv_slave_sync_    tracks which updates have been seen by this slave

=item *

call SlaveAddTable once for each table in the database, to add one row to
_rserv_slave_tables_.  (Note: rserv_init.pl identifies the unique column by
locating the first column with a unique index).

=back

Once the initialisation script has been run, replication can begin.  One
replication cycle between a master and one slave consists of the following
steps:

=over 4

=item *

PrepareSnapshot (either the function in Rserv.pm or the wrapper script of the
same name) is used to create a text file of all updates the slave has not yet
seen.

=item *

ApplySnapshot is used to apply those updates to the slave

=item *

GetSyncID is used to retrieve the syncid just applied to the slave

=item *

MasterSync is used to store the syncid in the master's _rserv_sync_ record for
the slave

=back

The next time PrepareSnapshot is run, it will only include updates which
occurred since the last snapshot the slave has seen; however, the log entries
will still exist in the _rserv_log_ table.  The CleanLog routine can be used to
purge entries up to a specified syncid.

The Replicate script performs all the steps listed above except the CleanLog.
Note: although the metadata framework and Rserv.pm support multiple slaves, the
wrapper scripts are all hardcoded for a single slave (number 0).
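
The bookkeeping of one replication cycle, followed by a CleanLog, can be
sketched as below.  This is a simplified Python model rather than the code in
Rserv.pm: the function names merely echo the Rserv routines, the data
structures are hypothetical, and it assumes a snapshot is labelled with the
SyncID that was current when it was prepared.

```python
class Master:
    def __init__(self):
        self.sync_seq = 1     # stands in for the SyncID sequence
        self.log = []         # (syncid, table, key) tuples, like _rserv_log_
        self.slave_sync = {}  # slave number -> last confirmed SyncID, like _rserv_sync_

    def record_update(self, table, key):
        # What the trigger does in this model: tag the change with the current SyncID.
        self.log.append((self.sync_seq, table, key))


class Slave:
    def __init__(self):
        self.last_syncid = 0  # like _rserv_slave_sync_
        self.applied = []     # entries received so far


def prepare_snapshot(master, slave_no):
    # Collect everything the master believes this slave has not yet seen.
    seen = master.slave_sync.get(slave_no, 0)
    syncid = master.sync_seq
    master.sync_seq += 1  # later updates belong to the next snapshot
    entries = [(table, key) for (s, table, key) in master.log if s > seen]
    return {"syncid": syncid, "entries": entries}


def apply_snapshot(slave, snapshot):
    slave.applied.extend(snapshot["entries"])
    slave.last_syncid = snapshot["syncid"]


def get_sync_id(slave):
    return slave.last_syncid


def master_sync(master, slave_no, syncid):
    master.slave_sync[slave_no] = syncid


def clean_log(master, syncid):
    # Purge log entries up to and including the given SyncID.
    master.log = [entry for entry in master.log if entry[0] > syncid]


master, slave = Master(), Slave()
master.record_update("users", "1")
master.record_update("users", "2")

# One full cycle for slave number 0.
first = prepare_snapshot(master, 0)
apply_snapshot(slave, first)
master_sync(master, 0, get_sync_id(slave))

# A later update only appears in the next snapshot ...
master.record_update("orders", "7")
second = prepare_snapshot(master, 0)
print(second["entries"])

# ... but the old entries stay in the log until it is cleaned.
print(len(master.log))
clean_log(master, get_sync_id(slave))
print(len(master.log))
```

In this model the log is only safe to clean up to a SyncID that every slave
has confirmed; with a single slave that is simply the value returned by
get_sync_id.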

=head2 SyncIDs

Rserv uses the concept of 'SyncIDs' to track how up-to-date a slave is.
SyncIDs are merely an ascending series of numbers which are derived from a
PostgreSQL sequence.  A group of transactions may share the same SyncID:

=over 4

=item *

As updates are logged on the master, they are assigned the current value of the
_rserv_sync_seq_ sequence (not the next value)

=item *

When a snapshot is prepared, the sequence is incremented

=item *

One snapshot will include updates for all SyncIDs that the slave has not yet
seen

=item *

Applying a snapshot to a slave updates I<the slave's> record of which snapshots
it has seen

=item *

GetSyncID and MasterSync are used to update I<the master's> record of which
snapshots a slave has seen

=back
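
The way several updates come to share one SyncID can be sketched as follows.
This is a Python illustration only: SyncSequence is a hypothetical stand-in
for the PostgreSQL sequence, and the assumption that a snapshot covers the
value that was current before the increment is ours, not something stated by
the Rserv sources.

```python
from collections import defaultdict


class SyncSequence:
    """Hypothetical stand-in for the _rserv_sync_seq_ sequence."""

    def __init__(self, start=1):
        self._value = start

    def current(self):
        # Read the value without changing it.
        return self._value

    def advance(self):
        # Increment the sequence and return the new value.
        self._value += 1
        return self._value


seq = SyncSequence()
log = []


def log_update(description):
    # Updates take the current value, not the next one.
    log.append((seq.current(), description))


def prepare_snapshot():
    # Assumption: the snapshot covers the SyncID that was current when it was
    # prepared; the increment makes later updates fall into the next one.
    covered = seq.current()
    seq.advance()
    return covered


log_update("insert users 1")
log_update("update users 1")
first = prepare_snapshot()
log_update("delete users 1")
second = prepare_snapshot()

by_syncid = defaultdict(list)
for syncid, description in log:
    by_syncid[syncid].append(description)
print(dict(by_syncid))
```

The first two updates are grouped under one SyncID because no snapshot was
prepared between them; the third falls under the next SyncID.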

=head2 Snapshots

A snapshot is a sequence of instructions that should be applied to a slave to
bring it up to date with a given SyncID.  The file does not contain SQL
statements, but commands/comments (preceded by '--') and tab-delimited data.

There are two types of instruction: a DELETE and an UPDATE.  When the snapshot
is applied, all the records listed in the snapshot will be deleted from the
slave database and then the records listed in UPDATE instructions will be
inserted.

One consequence of this design is that it is perfectly safe to apply the same
snapshot more than once.

Another consequence of the design is that if the SyncID is not updated on the
master then the next snapshot will include everything from the last snapshot
plus all updates since then.  This is also perfectly safe.
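
Both consequences follow from the delete-then-insert approach, which can be
sketched in a few lines.  This is a Python illustration under stated
assumptions, not the ApplySnapshot code: the in-memory snapshot layout (a list
of tuples) is hypothetical and stands in for the tab-delimited file format.

```python
import copy


def apply_snapshot(slave_rows, snapshot):
    """Apply a snapshot given as (kind, table, key, row) tuples."""
    # Pass 1: delete every record named in the snapshot, whatever its kind.
    for kind, table, key, row in snapshot:
        slave_rows.setdefault(table, {}).pop(key, None)
    # Pass 2: insert the records carried by UPDATE instructions.
    for kind, table, key, row in snapshot:
        if kind == "UPDATE":
            slave_rows[table][key] = dict(row)
    return slave_rows


original = {
    "users": {
        "1": {"id": "1", "name": "ann"},
        "2": {"id": "2", "name": "bob"},
    }
}

snapshot_a = [
    ("DELETE", "users", "2", None),
    ("UPDATE", "users", "1", {"id": "1", "name": "anne"}),
]
# A later snapshot that repeats snapshot_a and adds one newer change.
snapshot_b = snapshot_a + [
    ("UPDATE", "users", "3", {"id": "3", "name": "cat"}),
]

once = apply_snapshot(copy.deepcopy(original), snapshot_a)
twice = apply_snapshot(copy.deepcopy(once), snapshot_a)
a_then_b = apply_snapshot(copy.deepcopy(once), snapshot_b)
b_only = apply_snapshot(copy.deepcopy(original), snapshot_b)

print(once == twice)       # applying the same snapshot again changes nothing
print(a_then_b == b_only)  # the cumulative snapshot gives the same end state
```

Because every listed record is removed before anything is inserted, the end
state depends only on the contents of the snapshot, not on how often it is
applied.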

=cut
