/[rserv]/doc/ErServer_internals.txt
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Annotation of /doc/ErServer_internals.txt

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.1 - (hide annotations)
Mon Oct 27 20:55:12 2003 UTC (20 years, 6 months ago) by dpavlin
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
How ErServer 1.2 (newer code than rservimp is based on) works

1 dpavlin 1.1 This article discuss ErServer 1.2, new version which uses Java replication
2     engine. URL: http://www.varlena.com/GeneralBits/Tidbits/erserver_works.html
3    
4    
5     Quick Outline of ErServer Internals
6    
7     by Andrew Sullivan
8     23-Oct-2003
9    
10     Abstract
11    
12     This is a quick conceptual outline of what erserver does once you have
13     it working, if you ever get it there. That is, this is not an outline
14     of how the setup scripts, etc., work (or don't), but how a tuple on
15     the master ends up on the slave(s).
16    
17     I. Required support
18    
19     A. On the master
20    
21     The master must have the following tables set up:
22     _rserv_log_1_
23     _rserv_log_2_
24     _rserv_sync_
25     _rserv_tables_
26    
27     Any other tables are not strictly required. There is a table that
28     normally gets set up by the system, _rserv_servers_, but it is never
29     actually used. It was apparently a stub for some planned features.
30     There is a server number in _rserv_sync_, and you could therefore in
31     principle use _rserv_servers_ for convenient lookups, but it won't get
32     used by anything.
33    
34     The master must also have a couple of installed sequences:
35     _rserv_active_log_id_
36     _rserv_old_log_status_
37     _rserv_sync_seq_
38    
39     Again, there is a sequence which exists for _rserv_servers_. It's not
40     actually used.
41    
42     The function for the replication log trigger must be created. It's
43     generally called _rserv_log_, but the important detail is that it is
44     what is called by the trigger created on each replicated table, and
45     that it in turn is a C function which is compiled from src/erserver.c;
46     it should compile to a function called erserver.so, which will be in
47     the directory where you installed erserver. (In case you're wondering,
48     by the way, yes this breaks the nice $libdir trick in 7.2 and later.)
49    
50     B. On the slave
51    
52     On the slave, you need exact copies of the tables you are going to
53     replicate, plus you need _rserv_slave_tables_ and _rserv_slave_sync_.
54    
55     C. General
56    
57     All this is set up as a database super user. In fact, you really only
58     need to be super user to install the function, because it's a C
59     function. In practice, it's easier to set up the relevant bits as
60     super user, and then make some limited GRANTs to the relevant tables.
61    
62     II. Queueing things to be replicated
63    
64     A table which is to be replicated must meet several conditions on the
65     master.
66    
67     A. It must be listed in the table _rserv_tables_:
68    
69     Column Type Modifiers
70     tname name
71     cname name
72     reloid oid
73     key integer
74    
75     The tname is the table name. The cname is the unique, not null key of
76     the table for replication purposes (the setup script creates
77     _ers_uniq, but there's nothing requiring that; if you had
78     single-column primary keys on all your tables, you could use them).
79     The reloid is the oid in pg_class for that table (i.e. select a.oid
80     from pg_class a, _rserv_tables_ b where a.relname = b.tname and
81     b.tname = [target table]), and the key is the attnum from pg_attribute
82     for that field (i.e. select attnum from pg_attribute a, _rserv_tables_
83     b where a.attname = b.cname and a.attrelid = b.reloid and b.tname =
84     [target table]; note that you'd have to have reloid set correctly for
85     this to work).
86    
87     B. Trigger Installed
88    
89     On the master, it must have the replication trigger installed. It is
90     normally called _rserv_trigger_t_, but the name is not important. What
91     _is_ important is that the trigger calls the _rserv_log_ C function
92     (the actual relevant bit is that the function which is called is the
93     function, distributed with the erserver source, which compiles to
94     rserv.so).
95    
96     C. Permissions
97    
98     Users writing to a replicated table must have permissions to read and
99     write on _rserv_log_1_ and _rserv_log_2_, and must be able to select
100     from _rserv_active_log_id_.
101    
102     D. Trigger action
103    
104     Suppose we are replicating target_table, then. We have the relevant
105     entries and the trigger is set up. When you insert a record into
106     target_table, you will cause the trigger to fire. It will insert into
107     the current log table (see below about cleanlog, but if this is the
108     first time, it should be _rserv_log_1_) a record with the reloid (see
109     _rserv_tables_), the logid (which is the xid from the transaction),
110     the timestamp, whether the record is deleted (-1 is an insert, 0 is an
111     update, 1 a delete), and the value of the key (that is, the actual
112     value in the key field for that row).
113    
114     III. Replicating
115    
116     A. Things on the slave
117    
118     In order to replicate to a slave, you must have set up
119     _rserv_slave_tables_ on the slave analogously to how it is set up on
120     the master. Note that the oids are unlikely to be the same, so you
121     can't just copy the values from the master. Also, your slave tables
122     must have _exactly_ the same structure as they do on the master. In
123     particular, the key field must be available on the slave. Some
124     versions of the setup scripts apparently have done things in the wrong
125     order, and the _ers_uniq field is not correctly set up on the slave.
126     You'll be wanting to fix that. If you don't have the slave completely
127     set up before the sync starts, you will lose data.
128    
129     B. Configuration file.
130    
131     If people ask for an annotation of the config file, I'll provide one.
132     I think the comments are not too bad in there, though. It should be
133     relatively obvious if you have a misconfiguration there. (Send mail to
134     the [1]erserver-general list if not.)
135    
136     C. Run the replication engine
137    
138     You _must_ use the Sun JDK to run this software: apparently there are
139     bits of it which depend on special Sun bits. Java is only actually
140     portable if it's written that way, and this isn't. Patches are
141     welcome. The logs normally end up /path/to/erserver/logs.
142    
143     D. Worth noting
144    
145     If you look at the query generated by the replication engine, you will
146     note that it joins _rserv_log_1_ (or _2_) to the replicated tables,
147     and replicates the version of the row in the table. That means that
148     _old versions of the row are not replicated_. This is efficient, but
149     can surprise you if you're not expecting it.
150    
151     E. What the engine does
152    
153     The engine looks at _rserv_sync_ for the master and the current
154     thread's target slave, and goes through the current log (either
155     _rserv_log_1_ or _2_, depending on the value in _rserv_active_log_id_
156     and the state of _rserv_old_log_status_) and gets everything that is
157     affected _after_ the most recent successful sync for the target
158     server, and which is not currently active. It joins that result to the
159     tables in question (using LEFT JOIN so that it gets null rows for the
160     missing ones), and then applies all the affected rows in the result
161     set for each table to the slave. All this is done in one transaction
162     on the master and one on the slave. The master commits last (which
163     means it is possible that the master crashes without knowing a slave
164     has been updated, but not that the slave crashes when the master
165     thinks rows have been committed, but haven't been).
166    
167     IV. CleanLog
168    
169     A. What is a CleanLog?
170    
171     The erserver engine uses more than one log into which it writes in
172     order to keep track of the rows to be replicated. That could be
173     inefficient, or it could cause maintenance to have to be done
174     regularly. Instead, a strategy of log rotation is used.
175    
176     B. How does it work?
177    
178     The _rserv_log_ trigger inserts its results into whichever table is
179     dictated by the current value of _rserv_active_log_id_. So, when it is
180     time to switch logs, the clean log portion of the program just selects
181     nextval('_rserv_active_log_id_'). The sequence cycles, so it returns
182     to 1 after 2.
183    
184     Of course, immediately after the switch, there is a period in which
185     old transactions still are using _rserv_log_1_ while _rserv_log_2_ is
186     filling up. The replication engine knows to use both logs by the value
187     of _rserv_old_log_status_. Once it has caught up all the slaves with
188     all the items in _rserv_log_1_, it sets the _rserv_old_log_status_
189     back to normal, and then clean log knows it can proceed with issuing a
190     TRUNCATE on _rserv_log_1_.
191    
192     The interval between clean log events is controlled in the
193     configuration file by replic.server.CleanLogInterval. The default is
194     once every 24 hours. You'll find that is too infrequent on a busy
195     database.
196    
197     That's everything I can think of for now. I hope these notes are
198     useful. Feel free to ask questions.
199    
200     Andrew Sullivan
201     andrew at libertyrms.info
202     Formatting by [2]A. Elein Mustain
203    
204     References
205    
206     1. mailto:erserver-general@postgresql
207     2. mailto:elein@varlena.com

  ViewVC Help
Powered by ViewVC 1.1.26