CENTRAL DATA MANAGEMENT

The guiding principles that (1) corporate data will be centrally managed as a corporate resource and (2) corporate data will be managed according to operational type drive the central data management concept. Sound central data management both recognizes and exploits the distinct characteristics of the four types of data store: Reference, Transaction, Control and Decision Support.

Reference Data Management

Reference Data are data collected and maintained for use in processing business transactions that may not be specifically known when the data is obtained. One of the easiest examples to understand is a manufacturing company's product and price list. Normally the marketing staff conceives of a product in anticipation of demand. The process of finalizing product details is usually independent of any specific product orders. Great care must be exercised to ensure that every one of the product's details is precisely recorded in whatever reference medium the company uses. Whether the product is defined in a computer data base, on a product specification sheet, or in a product manual, accuracy is critical. Whenever reference data is accessed (which could be some time after it was recorded), the user needs to be able to take data accuracy for granted.

Reference subject area databases are independent of all applications which require access to data within the areas they cover. While there are many desirable benefits from having data about important corporate subjects maintained in common, this separation means that the transaction-specific data management functions those various applications perform could also be lost. For example, in a non-central data management environment each application which collects client data should have its own rules concerning what constitutes valid client data. Similarly, each application should have a process to handle changes in data (such as a client's moving to a new address) if such a change is relevant to the application. Separating data from process means, among other things, that we must separate the process which records the fact that a client has moved from the processes which handle the business implications of that move. Applications must not be free to change shared data merely to suit their own purposes, since such changes may adversely affect other applications which use the same data. On the other hand, the answer cannot be to mandate that shared data must remain unchangeable. If it did, it would quickly become stale and out-dated. We need to be able to manage certain types of data (reference data) centrally, on behalf of all users, in an application-neutral way.

When it comes to the data maintained in shared subject data bases, existing application-centric data management processes will have to be replaced by common, application-independent subject data management. As a corporate asset, shared data must be safeguarded to preserve its value and ensure a good return on investment. These business objectives require conscious attention to the process of collecting, validating, storing, updating and applying data across the various competing business activities of the enterprise.

The advantages of subject data management have been well known for a long time. There are the benefits of consistency which come from having all business processes accessing the same data values; there are the hardware savings from having to store and process only one version of data rather than having the same data showing up in the files of the various applications; and, there are the operational savings which flow from only having to apply update and maintenance activities on the data in one place. Obtaining these benefits, however, requires some investment.

What this means for Client data, for example, is that a Central Client Reference File will come not only with a data base and viewing screens; it will also come with standard routines for creating and for modifying client data. All applications which are concerned with the identification of a new client will use the same process to submit the new client data to the Central Client Reference File. Applying the same edits regardless of which business transaction initiates the addition will help ensure corporate-wide standardization in the quality of newly posted data. Furthermore, the ongoing monitoring of the client data base will help ensure that the data remains accurate and reliable. For example, our clients can change addresses, addresses can change owners, and so forth. Part of the job of managing shared client data is the generation and enforcement of standard rules to ensure consistency in the modification of existing data. A second, equally critical, responsibility is helping affected applications process the implications of such changes.
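As an illustration only, the idea of one shared set of routines for creating and modifying client data might be sketched as follows. The class, field and method names are hypothetical and are not taken from any actual Central Client Reference File; the two edits shown (name and address required) are stand-ins for whatever rules the enterprise actually adopts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Client:
    client_id: int
    name: str
    address: str


class ClientValidationError(ValueError):
    """Raised when submitted client data fails the shared edits."""


class CentralClientReferenceFile:
    """Hypothetical single point of entry for creating and modifying client data."""

    def __init__(self):
        self._clients = {}
        self._subscribers = []  # callbacks registered by affected applications
        self._next_id = 1

    @staticmethod
    def _validate(name, address):
        # The same edits apply no matter which application submits the data.
        if not name.strip():
            raise ClientValidationError("client name is required")
        if not address.strip():
            raise ClientValidationError("client address is required")

    def subscribe(self, callback):
        """Let an application ask to be told about changes to client data."""
        self._subscribers.append(callback)

    def create_client(self, name, address):
        self._validate(name, address)
        client = Client(self._next_id, name.strip(), address.strip())
        self._clients[client.client_id] = client
        self._next_id += 1
        return client

    def change_address(self, client_id, new_address):
        old = self._clients[client_id]
        self._validate(old.name, new_address)
        new = replace(old, address=new_address.strip())
        self._clients[client_id] = new
        # Recording the move is kept separate from handling its business
        # implications: each subscribed application decides what it means.
        for notify in self._subscribers:
            notify(old, new)
        return new

    def get(self, client_id):
        return self._clients[client_id]
```

An application would call `create_client` or `change_address` rather than writing to the client data base directly, and would register a callback with `subscribe` if address changes matter to it.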

Transaction Data Management

Transaction data is the primary category of corporate data. It includes the details involved in any business transaction, whether that transaction is to issue an invoice, process a payment, prepare a proposal, add a new employee to the payroll system, or order a box of pencils. Most of the details in the transaction (e.g., date, quantity, name, item description) are either immediately verifiable by the person creating the transaction or are reference data elements whose accuracy is to be assumed. As is true with all four of the data store types, transaction data may exist in any form; e.g., tabular, text, images, notes, annotations, etc.

Managing transaction data can be contrasted quite easily with reference data management. Reference data is shared and managed on behalf of multiple applications. Transaction data is not shared but is typically specific to one application. The data is essentially managed by and for the application itself. Whether managed by an application group or a central organization the data will still have to be secure, retained, accessible and well defined within the context of the application. All legacy data (that is, corporate data managed by systems that were implemented prior to the adoption of central data management and not yet converted to that concept) can be managed and is, to varying degrees, managed like transaction data.

Control Data Management

Control data is by definition derived data. It is a by-product of conducting business transactions. Examples include number of cases processed daily, net change in inventory, payroll liability for a pay period, number of cases outstanding, and transaction status counts. All financial accounting data are in the control category. Because control data is derived, it is based on algorithms. For example, the number of cases processed daily is determined by counting; the payroll liability for a pay period is determined by adding the wages due to all of the employees. Accuracy of control data, therefore, is assured when an accurate algorithm is applied to correct data.
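The two derivations mentioned above (counting cases per day and summing wages due) can be sketched in a few lines. The record layouts and sample values below are invented purely for illustration.

```python
from collections import Counter
from datetime import date
from decimal import Decimal

# Hypothetical transaction records; the field names are illustrative only.
cases = [
    {"case_id": 1, "processed_on": date(2024, 1, 2)},
    {"case_id": 2, "processed_on": date(2024, 1, 2)},
    {"case_id": 3, "processed_on": date(2024, 1, 3)},
]
wages_due = {"alice": Decimal("1200.50"), "bob": Decimal("980.25")}


def cases_processed_daily(case_records):
    """Control data by counting: number of cases processed on each day."""
    return Counter(record["processed_on"] for record in case_records)


def payroll_liability(wages):
    """Control data by adding: total wages due for the pay period."""
    return sum(wages.values(), Decimal("0"))


daily_counts = cases_processed_daily(cases)
liability = payroll_liability(wages_due)
```

Both results are only as accurate as the input records and the algorithm, which is the point the paragraph above makes.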

Control data is the easiest type of data to manage since it is not subject to change. For example, the number of orders taken on a given day is an historical fact. If it was calculated correctly (and proper control over input data and calculation algorithms should ensure that), then it cannot change. Control data may become obsolete and be replaced by more recent data, but its inherent stability makes its data management primarily a matter of safeguarding the data bases, files, tables, etc. in which it is kept.

Decision Support Data Management

The difference between decision support data and the other three data store types is not related to content. Decision Support files or tables contain the same data as is found in Reference, Transaction, and Control data stores. The difference is based entirely on format and usage. Decision Support draws its data from the other three data store types and presents the data in a manner specifically designed to meet the needs of ad hoc query and analysis. The difference is one entirely of style, not content. Since the data already exists (except, perhaps, for occasional summarization and enumeration), no special effort is required to ensure the accuracy of decision support data other than to guarantee that the correct data is being copied into the decision support files, that the data is copied correctly, and that it is protected from loss and destruction. (Efforts to "clean up" data as it is loaded into decision support data stores are, for the most part, misplaced. The focus should be on ensuring that the source data is "clean" in the first place.)

The independent management of subject area data bases also offers us the opportunity to gain business benefits from direct analysis of the data. For example, having all the facts which the company has on a client in one data base greatly simplifies efforts to perform cross-marketing studies. Likewise, a central Contract Data Base can expedite all manner of decision support analyses. In both situations, however, it is essential that there be no misunderstanding concerning the meaning and reliability of the data. Otherwise, well-intentioned but misinformed users could easily draw incorrect and potentially damaging conclusions from even the most sophisticated and comprehensive decision support sources. A Central Data Management service ensures that there is a custodian who can provide such background information (i.e., metadata) and who can ensure the valid application of the data.

Central Data Management Services

Beyond a clear appreciation for these four types of data stores, fully understanding Central Data Management requires recognizing that there is a broad range of specific services involved. The current state of our understanding in this area is as follows:

Data Security: the ability to ensure that specific data values can only be seen, created, modified, and/or deleted by authorized individuals and processes.

This service is enforced by the security functions of the data base management system (e.g., DB2, Oracle), but the administration and maintenance of authorization tables is sometimes a big enough job to require supplemental software.
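A minimal sketch of what an authorization table expresses is given below. It is not how DB2 or Oracle implement security; the principals, data store names and operation names are hypothetical, and a real DBMS would hold this information in its own catalog.

```python
# Hypothetical authorization table: (principal, data store) -> permitted operations.
AUTHORIZATIONS = {
    ("billing_app", "client"): {"read"},
    ("client_admin", "client"): {"read", "create", "modify", "delete"},
}


def is_authorized(principal, data_store, operation):
    """Return True only if the table explicitly grants the operation."""
    return operation in AUTHORIZATIONS.get((principal, data_store), set())
```

Anything not explicitly granted is refused, so an unknown individual or process sees nothing by default.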

Data Retention: the assurance that data once posted to a DBMS will not be lost or destroyed.

This service is primarily provided by the backup, recovery and audit trail capabilities of the DBMS but is often augmented by physical backup and recovery procedures and physical security controls.

Data Access: the ability to post and retrieve required data under a broad range of operating conditions.

This service is provided by the interaction of the DBMS and the computer operating system. Its efficiency and effectiveness are primarily a function of the physical distribution of data, the setting of various DBMS performance parameters and the capacity of the platform.

Data Model Enforcement: the review and approval of physical data implementations to ensure conformity to proper data models.

Sound data management must be based on proper data models which, in turn, must drive the design and implementation of the data bases. With shared data, in particular, it is important to coordinate physical data structures to common logical and conceptual data models.

Data Creation Control: the enforcement of posting edits, cross-edits and referential integrity. This ensures that actual data conforms to the definitions and specifics indicated by the corresponding metadata.

It is a data administration axiom that the more this activity is handled by the DBMS the better. For shared data, in particular, centralized control of data creation controls is essential. Data creation control includes the ability to ensure that all transactions within a unit of work are either committed or not committed in all appropriate places.
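The following sketch shows posting edits, referential integrity and an all-or-nothing unit of work being handled by the DBMS itself. It uses Python's standard `sqlite3` module with an in-memory database purely for convenience; the table and column names are hypothetical, and a production DBMS would be configured differently.

```python
import sqlite3


def open_database():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE client ("
        " client_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL CHECK (length(trim(name)) > 0))"
    )
    conn.execute(
        "CREATE TABLE contract ("
        " contract_id INTEGER PRIMARY KEY,"
        " client_id INTEGER NOT NULL REFERENCES client(client_id),"
        " amount NUMERIC NOT NULL CHECK (amount > 0))"
    )
    conn.commit()
    return conn


def post_client_with_contract(conn, client_id, name, contract_client_id, amount):
    """Post a client and a contract as one unit of work.

    Returns True if both rows were committed, False if the DBMS rejected
    either one, in which case neither row is kept.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO client (client_id, name) VALUES (?, ?)",
                (client_id, name),
            )
            conn.execute(
                "INSERT INTO contract (client_id, amount) VALUES (?, ?)",
                (contract_client_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the edits live in the table definitions, every application that posts to these tables is held to the same rules without having to re-implement them.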

Metadata Management: the organization, evaluation, retention and application of our knowledge about data.

Data about data (such as definitions, system names, formats, etc.) is itself a corporate asset and warrants special management regardless of the actual data store in which it is kept. The vision for managing metadata is repository-based central metadata management. The repository is just what its name implies: a storage place, or database, for all models, including diagrams, entity descriptions and attributes, along with linkages between the business, systems and technology models. The repository should be managed as one logical unit, although multiple repositories might exist on multiple platforms. Figure 1 is a logical view of the repository and where it fits. It is important to note that it contains not only metadata components (often referred to as the Data Dictionary) but also process and technology components. It is also important to note that it contains metadata about all phases of a system's life cycle, from preliminary analysis through design, construction, implementation and production.

Figure 1 - Metadata Management - Logical View of Repository

Data Evolution Monitoring: the creation and processing of transactions to modify data as a result of the recognition that business conditions so warrant.

As business situations change, existing data may become incorrect or obsolete. Users of shared corporate data need assurance that the values they retrieve are, in fact, correct. This requires that there be processes in place which are able to detect situations in which existing data must be updated in order to maintain conformity to the underlying metadata.

Data Synchronization: the establishment of strict correspondence among various copies of centrally managed data.

For performance reasons it may be necessary to replicate data; that is, to make a copy of a file or table to reduce response time or to ensure local functionality. Users of the copy should be entitled to expect a specific level of correspondence (ideally 100%) between the data in the copy and the data in the original location. Ensuring this correspondence is usually based on the coordination of DBMS, network and operating system capabilities and managed by a data administration function. Data Synchronization may involve services for converting and/or transferring data from one operating platform to another (e.g., from mainframe DB2 to server-based Oracle).
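One simple way to express the "level of correspondence" between an original and its copy is the fraction of rows that match exactly. The sketch below assumes each store can be read as a mapping from key to row, which is an assumption made for illustration; real replication tools work at the DBMS or network level.

```python
_MISSING = object()  # sentinel so that a missing row never equals a stored value


def out_of_sync_keys(original, copy):
    """Return the sorted keys whose rows differ or exist in only one store."""
    keys = set(original) | set(copy)
    return sorted(
        key
        for key in keys
        if original.get(key, _MISSING) != copy.get(key, _MISSING)
    )


def correspondence(original, copy):
    """Return the fraction (0.0 to 1.0) of keys whose rows match exactly."""
    keys = set(original) | set(copy)
    if not keys:
        return 1.0  # two empty stores correspond completely
    return 1 - len(out_of_sync_keys(original, copy)) / len(keys)
```

A data administration function could run such a check on a schedule and re-copy only the keys reported by `out_of_sync_keys`.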

Data Archiving: the determination and implementation of decisions to move some of the contents of a data store to a more cost-effective medium (or to delete it altogether) due to age or obsolescence.

Central data management includes the management of archives so that data with a relatively low level of usage can be retrieved when required in a standard and predictable way.
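An age-based archiving rule, together with a retrieval routine that behaves the same way whether a record is active or archived, might look like the sketch below. The `last_used` field, the in-memory dictionaries and the age threshold are all assumptions made for illustration.

```python
from datetime import date, timedelta


def archive_old_records(active, archive, today, max_age_days):
    """Move records not used within max_age_days from active to archive.

    Returns the sorted list of keys that were moved.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = sorted(key for key, rec in active.items() if rec["last_used"] < cutoff)
    for key in stale:
        archive[key] = active.pop(key)
    return stale


def retrieve(active, archive, key):
    """Fetch a record the same way regardless of where it is currently kept."""
    if key in active:
        return active[key]
    return archive[key]  # raises KeyError if the record exists nowhere
```

Callers use `retrieve` without needing to know whether the record has been archived, which is the "standard and predictable" behaviour described above.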

I/O Support: the procedures which instruct the DBMS to post, modify, record, and/or delete data.

The overall management of shared corporate data requires considerable control and standardization in the way in which shared data resources are used. The central data management service includes standard I/O support for shared data. Application specific data, on the other hand, may require more customized I/O routines even while benefiting from the data security, retention and access services of central data management.

Subject Area Management: the establishment and discharge of policies, procedures and functions designed to administer the corporation's data assets in a particular subject area for the common good.

This is a business activity (although that does not necessarily mean that it cannot be discharged by an IS unit) which employs the above technical capabilities to provide the company with reliable and responsive access to data in a particular subject area as defined in the Enterprise Data Model. Subject area management of common data may be provided by Data Administration or by a business unit chartered to provide such services in an application-neutral way.

Figure 2 illustrates the nesting characteristics of the services described above; that is, many of the services depend upon the proper functioning of others. For example, no one can get to any data without passing through the data security layer. As the diagram illustrates, the arrangement is not entirely symmetric. The anomaly is based on the distinction between common data (Reference) and application-specific data (Legacy or Transaction). Both types can and will use Central Data Management services. Central Data Management does not, however, provide subject area management services to data which is specific to a single application. On the other hand, subject area management services, by definition, are common across the applications which do use them. Also, for performance reasons, application development teams will usually prefer to develop application-specific I/O services where possible. Since there is no possibility of cross-application interference where data is not being shared, the use of support services is not required for application-specific data.

Figure 2 - Central Data Management


© Peter V. Cooper
DataAssets@aol.com
781/449-1861