Monday, March 15, 2010

Schema Extensibility in Commercial Enterprise Software

This describes some work I did in the late 90s. It has some limitations today, primarily because modern object-relational mapping systems, like JPA, do not handle run-time schema changes very well. Treat this as a historical document telling you how enterprise software was built in the days when every company also needed to write its own application platform.

This document describes how schema extensibility is typically achieved in commercial enterprise software applications; in particular, in customer relationship management (CRM) applications.


Flexibility Requirements

In CRM applications, implementing organizations need to extend the customer data model with business-specific information. The data model extensions consist of adding single-valued and/or multiple-valued properties to the customer data record.
  • The data model extensions must be efficient. The customer data tables typically have tens of thousands to hundreds of thousands of records in a business-to-business setting, and millions to tens of millions of records in a business-to-consumer setting.
  • The data model extensions must facilitate efficient query and join processing in the database.
  • The data model extensions must not prevent the vendor from easily upgrading the implementation in the field.
  • The data model extensions must allow the vendor to ship a single binary code image that can be run in all customer implementations.

General Approach

Vendors typically use a metadata-driven approach:
  • Data model extensions are limited to certain tables and are described with schema extension tables.
  • The customer of the application can extend the schema by describing the schema extension in a UI, which saves records in the schema extension tables.
  • Once the customer has finished designing the schema extension, the application extends the schema by adding columns to the extensible tables (for single-valued properties) or by creating new sub-tables (for multi-valued properties).
Example: Single-Valued Schema Extension

The implementing organization wishes to add a "favorite color" field to their customer records. The extension table has the following schema:


Extension Table Schema:

COLUMN NAMECOLUMN TYPE
Table_nameVarchar
Prop_nameVarchar
Prop_typeVarchar
Prop_lengthInteger
Prop_precisionInteger
Prop_singlevaluedBoolean

Through a UI, the user creates the following record:

COLUMN NAMECOLUMN VALUE
Table_nameCustomer
Prop_nameFavorite_Color
Prop_typeVarchar
Prop_length25
Prop_precisionNull
Prop_singlevaluedTrue

When the user is satisfied (ie when the user hits a button), the application executes the schema change by issuing the following SQL statement:

ALTER TABLE Customer
ADD COLUMN Favorite_Color varchar(25);

Notes:

The Column_precision column will only be used for NUMERIC fields.

Extension columns must typically be NULL-able or the user must provide a default value for the new column.

Example: Multi-Valued Schema Extension

The implementing organization wishes to associate a list of purchased products with a customer record. The schema extension record looks like this:

COLUMN NAMECOLUMN VALUE
Table_nameCustomer
Prop_namePurchases
Prop_typeInteger
Prop_length11 (or NULL)
Prop_precisionNull
Prop_singlevaluedFalse


The application executes the following schema change in response:

CREATE TABLE Customer_Purchases(
Customer_id INT NOT NULL,
Purchases_id INT NOT NULL,
Purchases_Value INT (11),
PRIMARY KEY (customer_id, purchases_id),
FOREIGN KEY (customer_id)
REFERENCES Customer (customer_id)
);

Thus, the Customer_Purchases table associates a set of Purchases values with the Customer record.

Scalability Issues

Record Length Limits and Views

Databases typically limit the size of a single record. Implementations deal with this by putting all the extension fields in a separate table joined back to the main table. If necessary, a view can be created to maintain the illusion of a single table:

-- this table is not extensible:
CREATE TABLE Customer_Base (
Customer_id INT NOT NULL PRIMARY KEY,
[OTHER FIELDS GO HERE]
);
-- this table is extensible
CREATE TABLE Customer_Ext (
Customer_id INT NOT NULL PRIMARY KEY,
);
-- add view
CREATE VIEW Customer AS
SELECT CB.*, CX.*
FROM Customer_Base CB JOIN Customer_Ext CX
ON CB.customer_id = CX.customer_id

The view is mainly there to hide the details of the customer schema from report writers.

Indexes

It is generally not a good idea to allow users to easily create indexes on the single-valued schema extension columns. Indexes impose overhead and should only be added to extension fields after it is determined that the index will result in a speedup that justifies the index maintenance overhead.

2 comments:

  1. Hi, found in google that you blogged on "NullPointerException: Illegal null name at com.tagish.auth.TypedPrincipal.".

    I have the same problem and am wondering on how you solved it; your post on this seems to be gone...

    ReplyDelete
  2. Move-yo: The tagish post is kind of obsolete. I've been using pluggable ldap authentication providers if I want to use Active Directory to authenticate java web apps. e.g. http://community.jboss.org/wiki/ldapextloginmodule

    ReplyDelete