vdw/trunk/readme_vdw.txt

Object Server (Virtual Data Warehouse)
======================================

I.  What is it?

    Object Server is a sql parser with perl embedded, that maps virtual table
    column names onto perl subroutines, and runs the subs needed based on the
    columns in the "select" and "where" clauses of the query, then returns the
    data after filtering it through the conditions in the "where" clause.

    The column-to-subroutine map is in obj_srvr.tables, a file which obj_srvr.pm
    reads at "embed time" (when the parser is started).  (The parser embeds
    obj_srvr.pl, which is a wrapper for obj_srvr.pm.)

    The data model relies on various Banner SIS related perl modules in
        woup:[wou_sis_mods.com]
 
            WOU_Admit.pm
            WOU_Person.pm
            WOU_SIS_Util.pm
            WOU_Student.pm
            WOU_Util.pm

    However, obj_srvr.pm can work with ANY data model.  All you need is the
    map in obj_srvr.tables, and the subs to return the data.  (* obj_srvr.pm
    needs subs in WOU_Util.pm, regardless of data model used. *)

    Conventions of the Data Model:

        1.  "from" clause can only have one table.  

            You can build whatever joins you need with wrapper subs, see
            stu_schedule "table" for example.

        2.  Driver sub (the sub associated with the virtual table in %table_objs)
            returns a ref to an array of hash refs.  The array is like a table,
            and the hashes are like records.  The data looks like this:

                term_students = [ 
                  { "pidm"         => 12345,
                    "term"         => "200203",
                    "level"        => "UG",
                    "confidential" => "N",
                    "dead"         => "N",
                    "majr_code"    => "5CSC",
                    "minr_code"    => "4PSY",
                    "con1_code"    => "",  # actually, might be null
                    "con2_code"    => "",
                    "con3_code"    => "",
                    "major"        => "Computer Science",
                    "minor"        => "Pyschology",
                    "conc1"        => "",
                    "conc2"        => "",
                    "conc3"        => "",
                    "resident"     => "R",
                    "deg_plan"     => "PPF",
                    "admt_code"    => "FT",
                    "styp_code"    => "C",
                    "orsn_code"    => "E",
                    "rate_code"    => "BASIC" },

                  { "pidm"         => 12346,
                    "term"         => "200203",
                    "level"        => "UG",
                    "confidential" => "N",
                    "dead"         => "N",
                    "majr_code"    => "5CSM",
                    "minr_code"    => "",
                    "con1_code"    => "",
                    "con2_code"    => "",
                    "con3_code"    => "",
                    "major"        => "Computer Science/Math",
                    "minor"        => "",
                    "conc1"        => "",
                    "conc2"        => "",
                    "conc3"        => "",
                    "resident"     => "N",
                    "deg_plan"     => "",
                    "admt_code"    => "FR",
                    "styp_code"    => "C",
                    "orsn_code"    => "S1",
                    "rate_code"    => "BASIC" },
                                        .
                                        .
                                        .  ];

        3.  Supporting subs (subs associated with the columns in %obj_accessor)
            return a hash ref (a single record).  Data looks like this for
            get_addr ($dbh, $pidm, $atyp1, $atyp2, $atyp3):

           { "atyp"      => "CU",
             "street"    => "223 NW Front Street",
             "street2"   => "",  # actually, might be null
             "street3"   => "",
             "city"      => "Salem",
             "state"     => "OR",
             "zip"       => "97301",
             "nation"    => "",
             "natn_code" => "",
             "county"    => "Marion" };

            For supporting subs that return a ref to an array of hashes (a table),
            make them part of the driver sub results by using a wrapper sub to join
            (see stu_schedule example below).

        # =========================================================
        #  stu_schedule - All students and their class schedule for
        #  given term and level.  Level may use wildcards.
        # =========================================================
        sub stu_schedule {
            my ($dbh, $term, $level) = @_;

            my (@stu_sched, $ra_students, $rh_stu, $ra_sched, $rh_class, $rh_new);

            $ra_students = term_students($dbh, $term, $level);

            foreach $rh_stu ( @{ $ra_students } ) {
                $ra_sched = schedule($dbh, $rh_stu->{pidm}, $term);
                foreach $rh_class ( @{ $ra_sched } ) {

                    $rh_new = { };  # get new memory
                    add2hash($rh_new, $rh_stu);  # "dup" $rh_stu

                    add2hash($rh_new, $rh_class);
                    push @stu_sched, $rh_new;
                }
            }

            return \@stu_sched;
        }

II.  Why?

      1.  Any Programmer can pull reliable data from system

          The problem with Database-level data models (i.e. the tables in
          Banner, for instance) is that even a good programmer can't trust
          herself to write reliable queries without a lot of research, until
          she has had a year of experience working with the data.  So maybe
          only one programmer on the staff can do quick queries from a given
          module of Banner, for example.  But with higher-level objects (like
          address, grades, schedule, gpa) available in subroutines, any
          programmer can hook these subs together and get quick, reliable
          results.  This functionality is provided by the perl modules mentioned
          above, and doesn't need obj_srvr.  An experienced programmer can
          benefit from the higher-level subs too, because they eliminate a lot
          of error checking, and guarantee uniformity in results.

      2.  Users can get good answers to ad hoc questions without asking a
          programmer

          obj_srvr provides an interface for hooking the subs together.  Users
          can do this through BI Query (in alpha testing), and programmers
          can do it by doing sql queries inside the obj_srvr parser.  For a
          programmer this is still better than writing even a simple perl script
          to hook together the subs, because the script would have to be checked
          for errors, since things can go wrong due to oversight in only a few
          lines of code.

III.  Why not just use a real Data Warehouse?

      1.  Data is live

      2.  Easier to change and add to model

          Changes can be made by merely adding field, table, and/or subnames 
          to obj_srvr.tables, and writing the needed perl subs.  With a
          warehouse you'd have to add fields to database tables or add tables,
          and mod the warehouse populate program accordingly.

      3.  Cleaner and more trustworthy

          Warehouse populate programs are multi-step processes that can fail
          at any point due to lack of disk space, lack of tablespace space,
          violation of integrity constraints, etc. If you plan on using the data
          you need to verify that the process ran to completion every day. 
          Initially you need to confirm for every step that any failure will
          abort the whole process.  If you modify the process you must be careful
          not to introduce code that could fail without aborting the process.
          

      4.  Enhancements to obj_srvr data model are available to regular programs,
          and vice-versa

          The subs for the data model are in perl modules, and can be used by
          the warehouse or by other perl programs and reports.  It doesn't
          matter if they were created for use in a report, or for use in the
          warehouse.

IV.  How implemented

      obj_srvr.exe is generated from obj_srvr.l and obj_srvr.y (lex and yacc),
      then compiled with exec_sql.c.  The SQL lexer and parser were copied from
      the O'Reilly Lex & Yacc book, 2nd Edition, 1995, by John R. Levine,
      Tony Mason, and Doug Brown, and then enhanced (embedded perl and some
      parser "action" blocks).

V. Building

     from this directory (developer$disk:[wosc_common.uid.obj_srvr]

         @make clean
         @make

     (There is a unix makefile in the directory, but didn't have make on VMS
      so wrote make.com.  Later installed make, but makefile will need mod'ed
      a bit to work in VMS.)

     Will work on unix using the unix makefile.  (at least it DID on SunOS, may
     have done some stuff in the VMS version that screwed it up for unix, will
     have to check.)  obj_srvr.pm handles unix or VMS (only a couple lines of code
     where it matters).  The Banner SIS perl modules have a little code that is
     specific to VMS, need to mod this for unix (mostly the @INC path).

VI. Using

     As a SQL-like Query Tool
     ========================

     from this directory (developer$disk:[wosc_common.uid.obj_srvr]

     orauser wouprd

     runpgm := $developer$disk:[wosc_common.uid.obj_srvr]obj_srvr.exe

     runpgm <uid> <passwd> I

     (I is for Interactive, i.e. command line rather than BI-Query through tnslsnr.)

     (wait about 10 seconds for the parser to embed perl)

     sql

     SQL> select lname, fname, major, city from term_students
          where term = '200203' and level = '%' and
          city in ('Monmouth', 'Independence') and state = 'OR' and
          majr_code like '5%' order by major, city, lname, fname;

     (will take about 2 minutes)

     sql

     SQL> (another query)

     quit

     (back to the vms prompt - you can also type quit from the SQL> prompt).

     ** any "where clause" operators that don't translate directly to perl need
        to be implemented in obj_srvr.y.  (I.e., "and" and "or" mean the same
        thing in perl as in sql, as do parentheses.  "=", "!=", "<", ">" do
        not, if the item being compared is a string, rather than a number.)
        Currently, am always translating "=" to "eq", need to leave alone for
        numbers.  (looks like it's okay to use eq, ne, lt, gt on numbers as long as
        both items being compared are numbers, so maybe this is okay.)
        HAVE implemented "like", "not like", "in", "not in", =, <>, !=, <>, <=, >=. 
        (=, !=, >, etc. implementation is translating to stringwise equivalent
        (eq, ne, gt, etc.), which seems to work for either strings or numbers, as
        long as both comparison items are either strings, or numbers (but not
        mixed, which seems pretty unlikely).
        Everything else is coming across unchanged. **

        (am using eval (actually reval) on the "where clause" expression to
        determine if any given record satisfies the "where clause". )


     From BI Query
     =============

     1.  Put a tnsnames.ora with the entry for obj_srvr in your windows directory
         (use tnsnames.ora in [.related] subdir as example) and set the properties
         for BI Query User to start in the windows directory.

     2.  In BI Query User, create a connection to obj_srvr.

     3.  STILL TO DO (see Spoofing Oracle, below) 
         ** This now works:  is in alpha testing. Need to add handling
         for rows longer than 2K, and clean up some stuff, etc. **


VII.  Spoofing Oracle (so a desktop client like BI Query can be used as a front-end)

    If you start a tns listener using the listener.ora and tnsnames.ora files
    in the [.related] subdir, it will look for the oracle server executable
    in the ORACLE_HOME/bin of each SID_NAME in the SID_LIST_LISTENER parameter.
    I have it set to /home/hjd so it will look for /home/hjd/bin/oracle on
    my linux PC (where I was testing this).  This "oracle" is a perl script.
    When it is started by tnslsnr, it sends tnslsnr a service_register command
    (See SERVICE_REGISTER entry in tns_cmds.txt).  This means that when tnslsnr
    gets connection requests for obj_srvr, it routes them to "oracle".  (You
    can call it another name, say obj_srvr, if you have a (PROGRAM=obj_srvr)
    entry in the SID_DESC in listener.ora.  Then the spoofing script should be
    named obj_srvr.)

    TO DO:

    Make "oracle" do connection conversation with BI Query client, get sql
    request from BI Query, and pass it to obj_srvr.exe. 
    ** Now Works:  connects, sends query, "oracle" sends back result from wouprd,
    and BI Query displays **

    You can see the bytes in the packets exchanged between the sqlplus client and
    the DB by creating a .sqlnet.ora file in your home directory (at least on
    unix), setting the trace level at the max, and running sqlplus and executing
    a query.  Use the .sqlnet.ora in the [.related] subdir, but set
    trace_level_client=16.  See cli_trace.log in [.related] for an example of the
    info this gives you.  In windows you can see the bytes from a BI-Query/DB
    conversation.  Go into Oracle Net8 Assistant and click on "Profile".  Set
    trace level to "SUPPORT" in "Tracing" - it may then force you to put in a TNS
    Time Out value in "Advanced".  It seems to have a problem closing the log file
    after a session; you may have to do some gyrations to be able to read it.
    Logging off the network and back on again frees up the file.

    The oracle packet type is given in bytes 11, 12.  13 is a sequence number.

----------------

Jeremy Hickerson, 6/13/2003
                  7/20/2003
                  8/24/2004
1	dpavlin	1	Object Server (Virtual Data Warehouse)
2			======================================
3
4			I. What is it?
5
6			Object Server is a sql parser with perl embedded, that maps virtual table
7			column names onto perl subroutines, and runs the subs needed based on the
8			columns in the "select" and "where" clauses of the query, then returns the
9			data after filtering it through the conditions in the "where" clause.
10
11			The column-to-subroutine map is in obj_srvr.tables, a file which obj_srvr.pm
12			reads at "embed time" (when the parser is started). (The parser embeds
13			obj_srvr.pl, which is a wrapper for obj_srvr.pm.)
14
15			The data model relies on various Banner SIS related perl modules in
16			woup:[wou_sis_mods.com]
17
18			WOU_Admit.pm
19			WOU_Person.pm
20			WOU_SIS_Util.pm
21			WOU_Student.pm
22			WOU_Util.pm
23
24			However, obj_srvr.pm can work with ANY data model. All you need is the
25			map in obj_srvr.tables, and the subs to return the data. (* obj_srvr.pm
26			needs subs in WOU_Util.pm, regardless of data model used. *)
27
28			Conventions of the Data Model:
29
30			1. "from" clause can only have one table.
31
32			You can build whatever joins you need with wrapper subs, see
33			stu_schedule "table" for example.
34
35			2. Driver sub (the sub associated with the virtual table in %table_objs)
36			returns a ref to an array of hash refs. The array is like a table,
37			and the hashes are like records. The data looks like this:
38
39			term_students = [
40			{ "pidm" => 12345,
41			"term" => "200203",
42			"level" => "UG",
43			"confidential" => "N",
44			"dead" => "N",
45			"majr_code" => "5CSC",
46			"minr_code" => "4PSY",
47			"con1_code" => "", # actually, might be null
48			"con2_code" => "",
49			"con3_code" => "",
50			"major" => "Computer Science",
51			"minor" => "Pyschology",
52			"conc1" => "",
53			"conc2" => "",
54			"conc3" => "",
55			"resident" => "R",
56			"deg_plan" => "PPF",
57			"admt_code" => "FT",
58			"styp_code" => "C",
59			"orsn_code" => "E",
60			"rate_code" => "BASIC" },
61
62			{ "pidm" => 12346,
63			"term" => "200203",
64			"level" => "UG",
65			"confidential" => "N",
66			"dead" => "N",
67			"majr_code" => "5CSM",
68			"minr_code" => "",
69			"con1_code" => "",
70			"con2_code" => "",
71			"con3_code" => "",
72			"major" => "Computer Science/Math",
73			"minor" => "",
74			"conc1" => "",
75			"conc2" => "",
76			"conc3" => "",
77			"resident" => "N",
78			"deg_plan" => "",
79			"admt_code" => "FR",
80			"styp_code" => "C",
81			"orsn_code" => "S1",
82			"rate_code" => "BASIC" },
83			.
84			.
85			. ];
86
87			3. Supporting subs (subs associated with the columns in %obj_accessor)
88			return a hash ref (a single record). Data looks like this for
89			get_addr ($dbh, $pidm, $atyp1, $atyp2, $atyp3):
90
91			{ "atyp" => "CU",
92			"street" => "223 NW Front Street",
93			"street2" => "", # actually, might be null
94			"street3" => "",
95			"city" => "Salem",
96			"state" => "OR",
97			"zip" => "97301",
98			"nation" => "",
99			"natn_code" => "",
100			"county" => "Marion" };
101
102			For supporting subs that return a ref to an array of hashes (a table),
103			make them part of the driver sub results by using a wrapper sub to join
104			(see stu_schedule example below).
105
106			# =========================================================
107			# stu_schedule - All students and their class schedule for
108			# given term and level. Level may use wildcards.
109			# =========================================================
110			sub stu_schedule {
111			my ($dbh, $term, $level) = @_;
112
113			my (@stu_sched, $ra_students, $rh_stu, $ra_sched, $rh_class, $rh_new);
114
115			$ra_students = term_students($dbh, $term, $level);
116
117			foreach $rh_stu ( @{ $ra_students } ) {
118			$ra_sched = schedule($dbh, $rh_stu->{pidm}, $term);
119			foreach $rh_class ( @{ $ra_sched } ) {
120
121			$rh_new = { }; # get new memory
122			add2hash($rh_new, $rh_stu); # "dup" $rh_stu
123
124			add2hash($rh_new, $rh_class);
125			push @stu_sched, $rh_new;
126			}
127			}
128
129			return \@stu_sched;
130			}
131
132			II. Why?
133
134			1. Any Programmer can pull reliable data from system
135
136			The problem with Database-level data models (i.e. the tables in
137			Banner, for instance) is that even a good programmer can't trust
138			herself to write reliable queries without a lot of research, until
139			she has had a year of experience working with the data. So maybe
140			only one programmer on the staff can do quick queries from a given
141			module of Banner, for example. But with higher-level objects (like
142			address, grades, schedule, gpa) available in subroutines, any
143			programmer can hook these subs together and get quick, reliable
144			results. This functionality is provided by the perl modules mentioned
145			above, and doesn't need obj_srvr. An experienced programmer can
146			benefit from the higher-level subs too, because they eliminate a lot
147			of error checking, and guarantee uniformity in results.
148
149			2. Users can get good answers to ad hoc questions without asking a
150			programmer
151
152			obj_srvr provides an interface for hooking the subs together. Users
153			can do this through BI Query (in alpha testing), and programmers
154			can do it by doing sql queries inside the obj_srvr parser. For a
155			programmer this is still better than writing even a simple perl script
156			to hook together the subs, because the script would have to be checked
157			for errors, since things can go wrong due to oversight in only a few
158			lines of code.
159
160			III. Why not just use a real Data Warehouse?
161
162			1. Data is live
163
164			2. Easier to change and add to model
165
166			Changes can be made by merely adding field, table, and/or subnames
167			to obj_srvr.tables, and writing the needed perl subs. With a
168			warehouse you'd have to add fields to database tables or add tables,
169			and mod the warehouse populate program accordingly.
170
171			3. Cleaner and more trustworthy
172
173			Warehouse populate programs are multi-step processes that can fail
174			at any point due to lack of disk space, lack of tablespace space,
175			violation of integrity constraints, etc. If you plan on using the data
176			you need to verify that the process ran to completion every day.
177			Initially you need to confirm for every step that any failure will
178			abort the whole process. If you modify the process you must be careful
179			not to introduce code that could fail without aborting the process.
180
181
182			4. Enhancements to obj_srvr data model are available to regular programs,
183			and vice-versa
184
185			The subs for the data model are in perl modules, and can be used by
186			the warehouse or by other perl programs and reports. It doesn't
187			matter if they were created for use in a report, or for use in the
188			warehouse.
189
190			IV. How implemented
191
192			obj_srvr.exe is generated from obj_srvr.l and obj_srvr.y (lex and yacc),
193			then compiled with exec_sql.c. The SQL lexer and parser were copied from
194			the O'Reilly Lex & Yacc book, 2nd Edition, 1995, by John R. Levine,
195			Tony Mason, and Doug Brown, and then enhanced (embedded perl and some
196			parser "action" blocks).
197
198			V. Building
199
200			from this directory (developer$disk:[wosc_common.uid.obj_srvr]
201
202			@make clean
203			@make
204
205			(There is a unix makefile in the directory, but didn't have make on VMS
206			so wrote make.com. Later installed make, but makefile will need mod'ed
207			a bit to work in VMS.)
208
209			Will work on unix using the unix makefile. (at least it DID on SunOS, may
210			have done some stuff in the VMS version that screwed it up for unix, will
211			have to check.) obj_srvr.pm handles unix or VMS (only a couple lines of code
212			where it matters). The Banner SIS perl modules have a little code that is
213			specific to VMS, need to mod this for unix (mostly the @INC path).
214
215			VI. Using
216
217			As a SQL-like Query Tool
218			========================
219
220			from this directory (developer$disk:[wosc_common.uid.obj_srvr]
221
222			orauser wouprd
223
224			runpgm := $developer$disk:[wosc_common.uid.obj_srvr]obj_srvr.exe
225
226			runpgm <uid> <passwd> I
227
228			(I is for Interactive, i.e. command line rather than BI-Query through tnslsnr.)
229
230			(wait about 10 seconds for the parser to embed perl)
231
232			sql
233
234			SQL> select lname, fname, major, city from term_students
235			where term = '200203' and level = '%' and
236			city in ('Monmouth', 'Independence') and state = 'OR' and
237			majr_code like '5%' order by major, city, lname, fname;
238
239			(will take about 2 minutes)
240
241			sql
242
243			SQL> (another query)
244
245			quit
246
247			(back to the vms prompt - you can also type quit from the SQL> prompt).
248
249			** any "where clause" operators that don't translate directly to perl need
250			to be implemented in obj_srvr.y. (I.e., "and" and "or" mean the same
251			thing in perl as in sql, as do parentheses. "=", "!=", "<", ">" do
252			not, if the item being compared is a string, rather than a number.)
253			Currently, am always translating "=" to "eq", need to leave alone for
254			numbers. (looks like it's okay to use eq, ne, lt, gt on numbers as long as
255			both items being compared are numbers, so maybe this is okay.)
256			HAVE implemented "like", "not like", "in", "not in", =, <>, !=, <>, <=, >=.
257			(=, !=, >, etc. implementation is translating to stringwise equivalent
258			(eq, ne, gt, etc.), which seems to work for either strings or numbers, as
259			long as both comparison items are either strings, or numbers (but not
260			mixed, which seems pretty unlikely).
261			Everything else is coming across unchanged. **
262
263			(am using eval (actually reval) on the "where clause" expression to
264			determine if any given record satisfies the "where clause". )
265
266
267			From BI Query
268			=============
269
270			1. Put a tnsnames.ora with the entry for obj_srvr in your windows directory
271			(use tnsnames.ora in [.related] subdir as example) and set the properties
272			for BI Query User to start in the windows directory.
273
274			2. In BI Query User, create a connection to obj_srvr.
275
276			3. STILL TO DO (see Spoofing Oracle, below)
277			** This now works: is in alpha testing. Need to add handling
278			for rows longer than 2K, and clean up some stuff, etc. **
279
280
281			VII. Spoofing Oracle (so a desktop client like BI Query can be used as a front-end)
282
283			If you start a tns listener using the listener.ora and tnsnames.ora files
284			in the [.related] subdir, it will look for the oracle server executable
285			in the ORACLE_HOME/bin of each SID_NAME in the SID_LIST_LISTENER parameter.
286			I have it set to /home/hjd so it will look for /home/hjd/bin/oracle on
287			my linux PC (where I was testing this). This "oracle" is a perl script.
288			When it is started by tnslsnr, it sends tnslsnr a service_register command
289			(See SERVICE_REGISTER entry in tns_cmds.txt). This means that when tnslsnr
290			gets connection requests for obj_srvr, it routes them to "oracle". (You
291			can call it another name, say obj_srvr, if you have a (PROGRAM=obj_srvr)
292			entry in the SID_DESC in listener.ora. Then the spoofing script should be
293			named obj_srvr.)
294
295			TO DO:
296
297			Make "oracle" do connection conversation with BI Query client, get sql
298			request from BI Query, and pass it to obj_srvr.exe.
299			** Now Works: connects, sends query, "oracle" sends back result from wouprd,
300			and BI Query displays **
301
302			You can see the bytes in the packets exchanged between the sqlplus client and
303			the DB by creating a .sqlnet.ora file in your home directory (at least on
304			unix), setting the trace level at the max, and running sqlplus and executing
305			a query. Use the .sqlnet.ora in the [.related] subdir, but set
306			trace_level_client=16. See cli_trace.log in [.related] for an example of the
307			info this gives you. In windows you can see the bytes from a BI-Query/DB
308			conversation. Go into Oracle Net8 Assistant and click on "Profile". Set
309			trace level to "SUPPORT" in "Tracing" - it may then force you to put in a TNS
310			Time Out value in "Advanced". It seems to have a problem closing the log file
311			after a session; you may have to do some gyrations to be able to read it.
312			Logging off the network and back on again frees up the file.
313
314			The oracle packet type is given in bytes 11, 12. 13 is a sequence number.
315
316			----------------
317
318			Jeremy Hickerson, 6/13/2003
319			7/20/2003
320			8/24/2004