=for comment
use 'perldoc ./rserv-explained.pod' to read this doc with formatting


=head1 How Rserv Works

Rserv is made up of the following components:

=over 4

=item *

several functions written in 'C' and compiled into the shared library rserv.so

=item *

several database tables '_rserv_*' used to track replication metadata

=item *

one trigger for each replicated table that fires on every insert/update/delete
and calls one of the 'C' functions

=item *

a collection of Perl scripts for initialising the metadata and replicating the
database updates.  Most of the Perl code is in Rserv.pm and the routines can be
run from custom scripts or from simple wrapper scripts that come with the
distribution

=back

Rserv assumes each table has a single column to uniquely identify each row.

When the master database is first created, the 'rserv_init.pl' script should be
run with the '-m' option to do the following:

=over 4

=item *

create four tables:

  _rserv_tables_   stores the name of the unique column for each table
  _rserv_log_      tracks which rows of each table have been updated
  _rserv_servers_  details of slave servers (not used?)
  _rserv_sync_     tracks which updates have been seen by each slave

=item *

call MasterAddTable once for each table in the database, to add one row to
_rserv_tables_ and to create a trigger.  (Note: rserv_init.pl identifies the
unique column by locating the first column with a unique index).

=back

Once the initialisation script has been run, the master is ready for use.
Whenever a database update occurs, a trigger fires and a row is added to the
_rserv_log_ table.  Note that this table only tracks which row in which table
was updated; it does not log the values that were changed.
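
As a rough illustration of what such a log holds, the sketch below models the
trigger in Python rather than C.  The LogEntry fields, the log_change helper
and the 'users' table are all assumptions made for this example; the real
_rserv_log_ columns may differ.  The point is only that an entry identifies a
row and carries none of its column values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical shape of one log row; not Rserv's actual schema."""
    table: str      # name of the replicated table
    key: str        # value of the row's unique column
    deleted: bool   # True if the row was removed (assumed field)
    syncid: int     # value of the sync sequence when the change was logged


def log_change(log, table, key, deleted, current_syncid):
    """Stand-in for the trigger: record which row changed, never its values."""
    log.append(LogEntry(table, str(key), deleted, current_syncid))


log = []
log_change(log, "users", 42, deleted=False, current_syncid=1)  # insert or update
log_change(log, "users", 42, deleted=False, current_syncid=1)  # same row again
log_change(log, "users", 7, deleted=True, current_syncid=1)    # delete

# The distinct rows that a later snapshot would have to look up.
changed_rows = {(entry.table, entry.key) for entry in log}
print(sorted(changed_rows))
```

Because only keys are logged, a row that changes several times between
snapshots still needs to be copied only once, at whatever value it holds when
the snapshot is prepared.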

When a slave database is first created, the 'rserv_init.pl' script should be
run with the '-s' option to do the following:

=over 4

=item *

create two tables:

  _rserv_slave_tables_  stores the name of the unique column for each table
  _rserv_slave_sync_    tracks which updates have been seen by this slave

=item *

call SlaveAddTable once for each table in the database, to add one row to
_rserv_slave_tables_.  (Note: rserv_init.pl identifies the unique column by
locating the first column with a unique index).

=back

Once the initialisation script has been run, replication can begin.  One
replication cycle between a master and one slave consists of the following
steps:

=over 4

=item *

PrepareSnapshot (either the function in Rserv.pm or the wrapper script of the
same name) is used to create a text file of all updates the slave has not yet
seen.

=item *

ApplySnapshot is used to apply those updates to the slave

=item *

GetSyncID is used to retrieve the syncid just applied to the slave

=item *

MasterSync is used to store the syncid in the master's _rserv_sync_ record for
the slave

=back

The next time PrepareSnapshot is run, it will only include updates which
occurred since the last snapshot the slave has seen.  However, the log entries
will still exist in the _rserv_log_ table.  The CleanLog routine can be used to
purge entries up to a specified syncid.

The Replicate script performs all the steps listed above except CleanLog.
Note: although the metadata framework and Rserv.pm support multiple slaves, the
wrapper scripts are all hardcoded for a single slave (number 0).
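
The cycle can be modelled in a few lines of Python.  This is a toy in-memory
sketch, not the code in Rserv.pm: the Master and Slave classes, their fields
and the function signatures are assumptions, and only the step names follow
the routines described above.

```python
class Master:
    def __init__(self):
        self.rows = {}          # key -> value, standing in for one table
        self.log = []           # (syncid, key) pairs written by the "trigger"
        self.seq = 1            # stand-in for the sync sequence
        self.slave_synced = {}  # slave number -> last syncid confirmed

    def write(self, key, value):
        """Insert/update (value given) or delete (value None), then log the key."""
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append((self.seq, key))


class Slave:
    def __init__(self):
        self.rows = {}
        self.syncid = 0         # last snapshot applied


def prepare_snapshot(master, slave_no):
    """Collect every logged key the slave has not confirmed, at current values."""
    seen = master.slave_synced.get(slave_no, 0)
    keys = {key for syncid, key in master.log if syncid > seen}
    snapshot = {"syncid": master.seq,
                "rows": {key: master.rows.get(key) for key in keys}}
    master.seq += 1             # later updates belong to the next snapshot
    return snapshot


def apply_snapshot(slave, snapshot):
    for key, value in snapshot["rows"].items():
        slave.rows.pop(key, None)       # delete first ...
        if value is not None:
            slave.rows[key] = value     # ... then re-insert rows that still exist
    slave.syncid = snapshot["syncid"]


def get_sync_id(slave):
    return slave.syncid


def master_sync(master, slave_no, syncid):
    master.slave_synced[slave_no] = syncid


def clean_log(master, syncid):
    """Purge log entries up to and including syncid."""
    master.log = [(s, k) for s, k in master.log if s > syncid]


def replicate(master, slave, slave_no=0):
    """The four steps in order; clean_log is left to the caller."""
    apply_snapshot(slave, prepare_snapshot(master, slave_no))
    master_sync(master, slave_no, get_sync_id(slave))


master, slave = Master(), Slave()
master.write("a", 1)
master.write("b", 2)
replicate(master, slave)
master.write("a", None)         # delete on the master
replicate(master, slave)
clean_log(master, get_sync_id(slave))
print(slave.rows, master.slave_synced, master.log)
```

Keeping clean_log outside replicate mirrors the text: purging is a separate
decision, which matters once more than one slave reads from the same log.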

=head2 SyncIDs

Rserv uses the concept of 'SyncIDs' to track how up-to-date a slave is.
SyncIDs are merely an ascending series of numbers which are derived from a
PostgreSQL sequence.  A group of transactions may share the same SyncID:

=over 4

=item *

As updates are logged on the master, they are assigned the current value of the
_rserv_sync_seq_ sequence (not the next value)

=item *

When a snapshot is prepared, the sequence is incremented

=item *

One snapshot will include updates for all SyncIDs that the slave has not yet
seen

=item *

Applying a snapshot to a slave updates I<the slave's> record of which snapshots
it has seen

=item *

GetSyncID + MasterSync are used to update I<the master's> record of which
snapshots a slave has seen

=back
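
The numbering rules above can be illustrated with a small Python model.  The
SyncSequence class and the unseen helper are hypothetical stand-ins for the
PostgreSQL sequence and for the query a snapshot would run; they are not part
of Rserv.

```python
class SyncSequence:
    """Hypothetical stand-in for the _rserv_sync_seq_ sequence."""

    def __init__(self):
        self.value = 1

    def current(self):
        return self.value

    def advance(self):
        self.value += 1
        return self.value


def unseen(log, last_seen):
    """Changes whose SyncID is newer than the last one the slave has seen."""
    return [change for syncid, change in log if syncid > last_seen]


seq = SyncSequence()
log = []                                   # (syncid, change) pairs

for change in ("t1", "t2", "t3"):
    log.append((seq.current(), change))    # current value, not the next one
seq.advance()                              # a snapshot is prepared
log.append((seq.current(), "t4"))
seq.advance()                              # another snapshot is prepared
log.append((seq.current(), "t5"))

print(log)
print(unseen(log, 1))                      # what a slave at SyncID 1 still needs
```

In this model the first three changes share one SyncID because no snapshot was
prepared between them, and a slave that missed a cycle picks up several
SyncIDs in a single snapshot.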

=head2 Snapshots

A snapshot is a sequence of instructions that should be applied to a slave to
bring it up to date with a given SyncID.  The file does not contain SQL
statements, but commands/comments (preceded by '--') and tab-delimited data.

There are two types of instruction: a DELETE and an UPDATE.  When the snapshot
is applied, all the records listed in the snapshot will be deleted from the
slave database and then the records listed in UPDATE instructions will be
inserted.

One consequence of this design is that it is perfectly safe to apply the same
snapshot more than once.

Another consequence of the design is that if the SyncID is not updated on the
master then the next snapshot will include everything from the last snapshot
plus all updates since then.  This is also perfectly safe.
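
A minimal Python sketch of the delete-then-insert behaviour shows why both
cases are safe.  The tuple layout used for instructions here is an assumption
for illustration and is not the snapshot file format.

```python
def apply_instructions(table, snapshot):
    """table: dict of key -> row; snapshot: list of (op, key, row) tuples."""
    # Pass 1: remove every record the snapshot mentions.
    for _op, key, _row in snapshot:
        table.pop(key, None)
    # Pass 2: re-insert the records carried by UPDATE instructions.
    for op, key, row in snapshot:
        if op == "UPDATE":
            table[key] = row
    return table


slave = {1: "alice", 2: "bob"}
first = [("UPDATE", 1, "alice v2"), ("DELETE", 2, None), ("UPDATE", 3, "carol")]

once = apply_instructions(dict(slave), first)
twice = apply_instructions(dict(once), first)      # same snapshot applied again

# A later snapshot prepared without the master's SyncID having been updated:
# it repeats the rows from `first` (at their current values) plus newer changes.
cumulative = [("UPDATE", 1, "alice v3"), ("DELETE", 2, None),
              ("UPDATE", 3, "carol"), ("UPDATE", 4, "dave")]

after_first = apply_instructions(dict(once), cumulative)
skipped_first = apply_instructions(dict(slave), cumulative)
print(once == twice, after_first == skipped_first)
```

The result depends only on the snapshot's contents and not on what the slave
held for those keys beforehand, which is what makes re-application harmless.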

=cut

169