This article cites its sources but its page references ranges are too broad or incorrect. Please help adding a more precise page range. (June 2018) (Learn how and when to remove this message)

Dimensional modeling (DM) is part of the Business Dimensional Lifecycle methodology developed by Ralph Kimball which includes a set of methods, techniques and concepts for use in data warehouse design.[1]: 1258–1260 [2] The approach focuses on identifying the key business processes within a business and modelling and implementing these first before adding additional business processes, as a bottom-up approach.[1]: 1258–1260  An alternative approach from Inmon advocates a top down design of the model of all the enterprise data using tools such as entity-relationship modeling (ER).[1]: 1258–1260 

Description

Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas.[citation needed]

Dimensional modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as multidimensional database or even flat files. It is oriented around understandability and performance.[citation needed]

Design method

Designing the model

The dimensional model is built on a star-like schema or snowflake schema, with dimensions surrounding the fact table.[3][4] To build the schema, the following design model is used:

  1. Choose the business process
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the fact
Choose the business process

The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics in the design build on the actual business process which the data warehouse should cover. Therefore, the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can choose to do this in plain text or use basic Business Process Model and Notation (BPMN) or other design guides like the Unified Modeling Language |UML).

Declare the grain

After describing the business process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore, the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained on what your model is supposed to be able to deliver.

Identify the dimensions

The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and is where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.

Identify the facts

After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore, most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.

Dimension normalization

The neutrality of this section is disputed. Relevant discussion may be found on the talk page. Please do not remove this message until conditions to do so are met. (June 2018) (Learn how and when to remove this message)

Dimensional normalization or snowflaking removes redundant attributes, which are known in the normal flatten de-normalized dimensions. Dimensions are strictly joined together in sub dimensions.

Snowflaking has an influence on the data structure that differs from many philosophies of data warehouses.[4] Single data (fact) table surrounded by multiple descriptive (dimension) tables

Developers often don't normalize dimensions due to several reasons:[5]

  1. Normalization makes the data structure more complex
  2. Performance can be slower, due to the many joins between tables
  3. The space savings are minimal
  4. Bitmap indexes can't be used
  5. Query performance. 3NF databases suffer from performance problems when aggregating or retrieving many dimensional values that may require analysis. If you are only going to do operational reports then you may be able to get by with 3NF because your operational user will be looking for very fine grain data.

There are some arguments on why normalization can be useful.[4] It can be an advantage when part of hierarchy is common to more than one dimension. For example, a geographic dimension may be reusable because both the customer and supplier dimensions use it.

Benefits of dimensional modeling

This section may rely excessively on sources too closely associated with the subject, potentially preventing the article from being verifiable and neutral. Please help improve it by replacing them with more appropriate citations to reliable, independent, third-party sources. (June 2018) (Learn how and when to remove this message)

Benefits of the dimensional model are the following:[6]

Dimensional models, Hadoop, and big data

The neutrality of this section is disputed. Relevant discussion may be found on the talk page. Please do not remove this message until conditions to do so are met. (June 2018) (Learn how and when to remove this message)

We still get the benefits of dimensional models on Hadoop and similar big data frameworks. However, some features of Hadoop require us to slightly adapt the standard approach to dimensional modelling.[citation needed]

Literature

References

  1. ^ a b c Connolly, Thomas; Begg, Carolyn (26 September 2014). Database Systems - A Practical Approach to Design, Implementation and Management (6th ed.). Pearson. Part 9 Business Intelligence. ISBN 978-1-292-06118-4.
  2. ^ Moody, Daniel L.; Kortink, Mark A.R. "From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design" (PDF). Dimensional Modelling. Archived (PDF) from the original on 17 May 2017. Retrieved 3 July 2018.
  3. ^ Ralph Kimball; Margy Ross; Warren Thornthwaite; Joy Mundy (10 January 2008). The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses (Second ed.). Wiley. ISBN 978-0-470-14977-5.
  4. ^ a b c Matteo Golfarelli; Stefano Rizzi (26 May 2009). Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill Osborne Media. ISBN 978-0-07-161039-1.
  5. ^ Ralph Kimball; Margy Ross (26 April 2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second ed.). Wiley. ISBN 0-471-20024-7.
  6. ^ Ralph Kimball; Margy Ross; Warren Thornthwaite; Joy Mundy; Bob Becker (January 2008). The Data Warehouse Lifecycle Toolkit (Second ed.). Wiley. ISBN 978-0-470-14977-5.