Star Schemas for Integration

 

 

 

The physical design of data can be seen in the schema. In this case, you can see that the star schema has been used. The star schema is a common and simple schema style.

The star schema consists of one or more fact tables which reference any number of dimension tables. The dimension tables provide the ability to view the data with different lenses. You will need to look at both the facts and the dimensions in order to identify the best way to connect then integrate the data.

It is possible that even though all of the schemas are similarly designed that attributes in the dimension tables or the dimension tables themselves may share names but have completely different content and definitions.

There are a few ways to standardize dimensions. The simplest of all is to define the taxonomy and definitions as standards for data management and system development. While this may be the simplest method, it can often turn out to be ineffective over time. The better option is to look toward a centralized solution.

It is possible to build a central reference/repository with shared dimensions and this is a common approach in many companies. Other companies choose to go with metadata and master data management systems.

For this case, we have decided to build a central repository with all of the shared dimensions. This means you need to homogenize the dimensional data. To do this you will need to work across all of the repositories and gather information about each of the dimension tables – for example:

  • Field names and examples of data in fields
  • Field definitions
  • Format definitions, such as: Char limits (character limits), Char types (character types), language, time zone, etc.

This is probably one of the most important parts of an integration project. Mainly because not only does it determine how you will integrate data now, but it will define how everyone views data across the company going forward. This activity is often one of the cornerstones of information governance: common and shared dimensions.

As you look at the dimensions think about future growth for the company and ensure flexibility. This may mean you need to allow for new currencies, languages and time zones. Before setting your dimensions in stone – it is a great idea to use FMEA to ensure you have not constrained future flexibility and expansion. One example would be having Customer ID fields that are unrealistically limited by the total number of characters.

What are some creative ways to manage customer IDs in a limitless way in your databases? (Remember to think about the dimensions) Answers:

  • Regional Customer IDs combined with geography dimensions
  • Customer account deprecation and ID re-use
  • Alpha-numeric ids
  • first/last name initials followed by X numbers
  • Combined initials, region and numeric id
  • Etc.

Jumping ahead. We now have commonly defined dimension tables in a system. Now you have to map the existing dimension table data to the new dimensions and without losing integrity. In some cases this will simply involve data migration and then setting the new dependency on the common dimension table. In other cases, you may actually have to assign new data.

Let’s say that we had to assign new customer IDs to our international customers.

What are some ways to manage a change to your customer IDs?

  • Change the ID and tell the customer their new number
  • Use a reference table and let the customer maintain their original number
  • IMPORTANT: think about your customer!!!!
The new IDs have been assigned and we have a reference table with the original customer IDs. This allows the customer and the service team to see the history of their account.