|
|
12.1 ROLLUPIn Chapter 4, you saw how the GROUP BY clause, along with the aggregate functions, can be used to produce summary results. For example, if you want to print the monthly total sales for each region, you would probably execute the following query: SELECT R.NAME REGION,
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME, O.MONTH;
REGION MONTH SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic January 610697
Mid-Atlantic February 428676
Mid-Atlantic March 637031
Mid-Atlantic April 541146
Mid-Atlantic May 592935
Mid-Atlantic June 501485
Mid-Atlantic July 606914
Mid-Atlantic August 460520
Mid-Atlantic September 392898
Mid-Atlantic October 510117
Mid-Atlantic November 532889
Mid-Atlantic December 492458
New England January 509215
New England February 615746
New England March 566483
New England April 597622
New England May 566285
New England June 503354
New England July 559334
New England August 547656
New England September 575589
New England October 549648
New England November 461395
New England December 533314
SouthEast US January 379021
SouthEast US February 618423
SouthEast US March 655993
SouthEast US April 610017
SouthEast US May 661094
SouthEast US June 568572
SouthEast US July 556992
SouthEast US August 478765
SouthEast US September 635211
SouthEast US October 536841
SouthEast US November 553866
SouthEast US December 613700
36 rows selected.
As expected, this report prints the total sales for each region and month combination. However, in a more complex application, you may also want to have the subtotal for each region over all months, along with the total for all regions, or you may want the subtotal for each month over all regions, along with the total for all months. In short, you may need to generate subtotals and totals at more than one level. 12.1.1 Using UNION (The Old Way)In data warehouse applications, you frequently need to generate summary information over various dimensions, and subtotal and total across those dimensions. Generating and retrieving this type of summary information is a core goal of almost all data warehouse applications. By this time, you have realized that a simple GROUP BY query is not sufficient to generate the subtotals and totals described in this section. In order to illustrate the complexity of the problem, let's attempt to write a query that would return the following in a single output:
One way to generate multiple levels of summary (the only way prior to Oracle8i) is to write a UNION query. For example, the following UNION query will give us the desired three levels of subtotals: SELECT R.NAME REGION,
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME, O.MONTH
UNION ALL
SELECT R.NAME REGION, NULL, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME
UNION ALL
SELECT NULL, NULL, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID;
REGION MONTH SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic January 610697
Mid-Atlantic February 428676
Mid-Atlantic March 637031
Mid-Atlantic April 541146
Mid-Atlantic May 592935
Mid-Atlantic June 501485
Mid-Atlantic July 606914
Mid-Atlantic August 460520
Mid-Atlantic September 392898
Mid-Atlantic October 510117
Mid-Atlantic November 532889
Mid-Atlantic December 492458
New England January 509215
New England February 615746
New England March 566483
New England April 597622
New England May 566285
New England June 503354
New England July 559334
New England August 547656
New England September 575589
New England October 549648
New England November 461395
New England December 533314
SouthEast US January 379021
SouthEast US February 618423
SouthEast US March 655993
SouthEast US April 610017
SouthEast US May 661094
SouthEast US June 568572
SouthEast US July 556992
SouthEast US August 478765
SouthEast US September 635211
SouthEast US October 536841
SouthEast US November 553866
SouthEast US December 613700
Mid-Atlantic 6307766
New England 6585641
SouthEast US 6868495
19761902
40 rows selected.
This query produced 40 rows of output, 36 of which are the sales for each month for every region. The last 4 rows are the subtotals and the total. The three rows with region names and NULL values for the month are the subtotals for each region over all the months, and the last row with NULL values for both the region and month is the total sales for all the regions over all the months. Now that you have the desired result, try to analyze the query a bit. You have a very small orders table with only 720 rows in this example. You wanted to have summary information over just two dimensions—region and month. You have 3 regions and 12 months. To get the desired summary information from this table, you have to write a query consisting of 3 SELECT statements combined together using UNION ALL. The EXPLAIN PLAN on this query is: Query Plan
---------------------------------------
SELECT STATEMENT Cost = 15
UNION-ALL
SORT GROUP BY
HASH JOIN
TABLE ACCESS FULL REGION
TABLE ACCESS FULL ORDERS
SORT GROUP BY
HASH JOIN
TABLE ACCESS FULL REGION
TABLE ACCESS FULL ORDERS
SORT AGGREGATE
NESTED LOOPS
TABLE ACCESS FULL ORDERS
INDEX UNIQUE SCAN PK7
14 rows selected.
As indicated by the EXPLAIN PLAN output, Oracle needs to perform the following operations to get the results:
In any practical application the orders table will consist of hundreds of thousands of rows, and performing all these operations would be time-consuming. Even worse, if you have more dimensions for which to prepare summary information than the two shown in this example, you have to write an even more complex query. The bottom line is that such a query badly hurts performance. 12.1.2 Using ROLLUP (The New Way)Oracle8i introduced several new features for generating multiple levels of summary information with one query. One such feature is a set of extensions to the GROUP BY clause. In Oracle8i, the GROUP BY clause comes with two extensions: ROLLUP and CUBE. Oracle9i introduces another extension: GROUPING SETS. We discuss ROLLUP in this section. CUBE and GROUPING SETS are discussed later in this chapter. ROLLUP is an extension to the GROUP BY clause, and therefore can only appear in a query with a GROUP BY clause. The ROLLUP operation groups the selected rows based on the expressions in the GROUP BY clause, and prepares a summary row for each group. The syntax of ROLLUP is: SELECT ... FROM ... GROUP BY ROLLUP (ordered list of grouping columns) Using ROLLUP, you can generate the summary information discussed at the beginning of this section in a much easier way than in our UNION ALL query. For example: SELECT R.NAME REGION,
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY ROLLUP (R.NAME, O.MONTH);
REGION MONTH SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic January 610697
Mid-Atlantic February 428676
Mid-Atlantic March 637031
Mid-Atlantic April 541146
Mid-Atlantic May 592935
Mid-Atlantic June 501485
Mid-Atlantic July 606914
Mid-Atlantic August 460520
Mid-Atlantic September 392898
Mid-Atlantic October 510117
Mid-Atlantic November 532889
Mid-Atlantic December 492458
Mid-Atlantic 6307766
New England January 509215
New England February 615746
New England March 566483
New England April 597622
New England May 566285
New England June 503354
New England July 559334
New England August 547656
New England September 575589
New England October 549648
New England November 461395
New England December 533314
New England 6585641
SouthEast US January 379021
SouthEast US February 618423
SouthEast US March 655993
SouthEast US April 610017
SouthEast US May 661094
SouthEast US June 568572
SouthEast US July 556992
SouthEast US August 478765
SouthEast US September 635211
SouthEast US October 536841
SouthEast US November 553866
SouthEast US December 613700
SouthEast US 6868495
19761902
40 rows selected.
As you can see in this output, the ROLLUP operation produced subtotals across the specified dimensions and a grand total. The argument to the ROLLUP operation is an ordered list of grouping columns. Since the ROLLUP operation is used in conjunction with the GROUP BY clause, it first generates aggregate values based on the GROUP BY operation on the ordered list of columns. Then it generates higher level subtotals and finally a grand total. ROLLUP not only simplifies the query, it results in more efficient execution. The explain plan for this ROLLUP query is as follows: Query Plan
--------------------------------------
SELECT STATEMENT Cost = 7
SORT GROUP BY ROLLUP
HASH JOIN
TABLE ACCESS FULL REGION
TABLE ACCESS FULL ORDERS
Rather than the multiple table scans, joins, and other operations required by the UNION version of the query, the ROLLUP query needs just one full table scan on REGION, one full table scan on ORDERS, and one join to generate the required output. If you want to generate subtotals for each month instead of for each region, all you need to do is change the order of columns in the ROLLUP operation, as in the following example: SELECT R.NAME REGION,
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY ROLLUP (O.MONTH, R.NAME);
REGION MONTH SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic January 610697
New England January 509215
SouthEast US January 379021
January 1498933
Mid-Atlantic February 428676
New England February 615746
SouthEast US February 618423
February 1662845
Mid-Atlantic March 637031
New England March 566483
SouthEast US March 655993
March 1859507
Mid-Atlantic April 541146
New England April 597622
SouthEast US April 610017
April 1748785
Mid-Atlantic May 592935
New England May 566285
SouthEast US May 661094
May 1820314
Mid-Atlantic June 501485
New England June 503354
SouthEast US June 568572
June 1573411
Mid-Atlantic July 606914
New England July 559334
SouthEast US July 556992
July 1723240
Mid-Atlantic August 460520
New England August 547656
SouthEast US August 478765
August 1486941
Mid-Atlantic September 392898
New England September 575589
SouthEast US September 635211
September 1603698
Mid-Atlantic October 510117
New England October 549648
SouthEast US October 536841
October 1596606
Mid-Atlantic November 532889
New England November 461395
SouthEast US November 553866
November 1548150
Mid-Atlantic December 492458
New England December 533314
SouthEast US December 613700
December 1639472
19761902
49 rows selected.
Adding dimensions does not result in additional complexity. The following query rolls up subtotals for the region, the month, and the year for the first quarter: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME);
YEAR MONTH REGION SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
2000 January Mid-Atlantic 1221394
2000 January New England 1018430
2000 January SouthEast US 758042
2000 January 2997866
2000 February Mid-Atlantic 857352
2000 February New England 1231492
2000 February SouthEast US 1236846
2000 February 3325690
2000 March Mid-Atlantic 1274062
2000 March New England 1132966
2000 March SouthEast US 1311986
2000 March 3719014
2000 10042570
2001 January Mid-Atlantic 610697
2001 January New England 509215
2001 January SouthEast US 379021
2001 January 1498933
2001 February Mid-Atlantic 428676
2001 February New England 615746
2001 February SouthEast US 618423
2001 February 1662845
2001 March Mid-Atlantic 637031
2001 March New England 566483
2001 March SouthEast US 655993
2001 March 1859507
2001 5021285
15063855
27 rows selected.
12.1.3 Generating Partial ROLLUPsIn a ROLLUP query with N dimensions, the grand total is considered the top level. The various subtotal rows of N-1 dimensions constitute the next lower level, the subtotal rows of (N-2) dimensions constitute yet another level down, and so on. In the most recent example, you have three dimensions (year, month, and region), and the total row is the top level. The subtotal rows for the year represent the next lower level, because these rows are subtotals across two dimensions (month and region). The subtotal rows for the year and month combination are one level lower, because these rows are subtotals across one dimension (region). The rest of the rows are the result of the regular GROUP BY operation (without ROLLUP), and form the lowest level. If you want to exclude some subtotals and totals from the ROLLUP output, you can only move top to bottom, i.e., exclude the top-level total first, then progressively go down to the next level subtotals, and so on. To do this, you have to take out one or more columns from the ROLLUP operation, and put them in the GROUP BY clause. This is called a partial ROLLUP. As an example of a partial ROLLUP, let's see what happens when you take out the first column, which is O.YEAR, from the previous ROLLUP operation and move it into the GROUP BY clause. SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR ROLLUP (O.MONTH, R.NAME);
YEAR MONTH REGION SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
2000 January Mid-Atlantic 1221394
2000 January New England 1018430
2000 January SouthEast US 758042
2000 January 2997866
2000 February Mid-Atlantic 857352
2000 February New England 1231492
2000 February SouthEast US 1236846
2000 February 3325690
2000 March Mid-Atlantic 1274062
2000 March New England 1132966
2000 March SouthEast US 1311986
2000 March 3719014
2000 10042570
2001 January Mid-Atlantic 610697
2001 January New England 509215
2001 January SouthEast US 379021
2001 January 1498933
2001 February Mid-Atlantic 428676
2001 February New England 615746
2001 February SouthEast US 618423
2001 February 1662845
2001 March Mid-Atlantic 637031
2001 March New England 566483
2001 March SouthEast US 655993
2001 March 1859507
2001 5021285
26 rows selected.
The query in this example excludes the grand-total row from the output. By taking out O.YEAR from the ROLLUP operation, you are asking the database not to roll up summary information over the years. Therefore, the database rolls up summary information on region and month. When you proceed to remove O.MONTH from the ROLLUP operation, the query will not generate the roll up summary for the month dimension, and only the region-level subtotals will be printed in the output. For example: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR, O.MONTH ROLLUP (R.NAME);
YEAR MONTH REGION SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
2000 January Mid-Atlantic 1221394
2000 January New England 1018430
2000 January SouthEast US 758042
2000 January 2997866
2000 February Mid-Atlantic 857352
2000 February New England 1231492
2000 February SouthEast US 1236846
2000 February 3325690
2000 March Mid-Atlantic 1274062
2000 March New England 1132966
2000 March SouthEast US 1311986
2000 March 3719014
2001 January Mid-Atlantic 610697
2001 January New England 509215
2001 January SouthEast US 379021
2001 January 1498933
2001 February Mid-Atlantic 428676
2001 February New England 615746
2001 February SouthEast US 618423
2001 February 1662845
2001 March Mid-Atlantic 637031
2001 March New England 566483
2001 March SouthEast US 655993
2001 March 1859507
24 rows selected.
|
|
|