In a data warehouse designed for a large retail chain, the schema is organized as follows:
- Fact Constellation Schema with multiple fact tables for Sales, Inventory, and Customer Reviews.
- The Product Dimension has a hierarchy: Supplier → Category → Subcategory → Product.
- The Sales Measure includes metrics like Total Sales and Average Sales Price.
You need to perform an analysis that involves aggregating Total Sales across different Product Subcategories, and then using these aggregated results to compute the Average Sales Price at the Category level. Which schema and measure types best support this analysis while ensuring minimal redundancy and efficient aggregation?
Star Schema with Distributive Measures
Snowflake Schema with Holistic Measures
Fact Constellation Schema with Algebraic Measures
Snowflake Schema with Distributive Measures
Solution
- Star Schema with Distributive Measures: While a star schema simplifies queries and distributive measures can be aggregated efficiently, it is less suited for multiple fact tables and complex hierarchical aggregations.
- Snowflake Schema with Holistic Measures: A snowflake schema supports normalized dimensions, which helps in minimizing redundancy, but holistic measures require the entire dataset for accurate computation and aren’t ideal for hierarchical aggregation.
- Fact Constellation Schema with Algebraic Measures (Correct Answer): The fact constellation schema is well-suited for handling multiple fact tables and shared dimensions. Algebraic measures like Average Sales Price can be computed by aggregating distributive measures like Total Sales, making this combination optimal for complex aggregations and maintaining efficiency.
Snowflake Schema with Distributive Measures: Although a snowflake schema is effective in reducing redundancy, it doesn’t handle multiple fact tables as well as the fact constellation schema. Distributive measures are good for simple aggregation but may not fully address complex hierarchical analysis involving multiple fact tables.