Outer Joins and Optimizer Issues - forum topic by wpitterl

Hi all,

I'm having some performance issues due to the way certain queries are being processed by the optimizer. I have a set of views that I want to join together, and each of these views has the same JOIN/WHERE clause combination:
SELECT ...

FROM datatable t1 INNER JOIN sectable t2ON t1.FilterCol = t2.FilterColSELECT ... FROM datatable t1 INNER JOIN sectable t2 ON t1.FilterCol = t2.FilterCol WHERE USER = t2.USERNAME
WHERE USER = t2.USERNAME

The query joins together 7 of these views, inner joining between 2 and then left outer joining to 5 more:

SELECT v1.TYPE_CD, v2.TYPE_CD, v3.TYPE_CD, v4.TYPE_CD, v5.TYPE_CD, v6.TYPE_CD, v7.TYPE_CD, COUNT(*) FROM VIEW1 v1 INNER JOIN VIEW2 v2 ON (v1.SK = v2.SK AND v1.DT_SK = v2.DT_SK AND v1.TYPE_CD = v2.TYPE_CD AND v1.NUM_SK = v2.NUM_SK) LEFT OUTER JOIN VIEW3 v3 ON (v1.SK = v3.SK AND v1.DT_SK = v3.DT_SK AND v1.TYPE_CD = v3.TYPE_CD AND v1.NUM_SK = v3.NUM_SK) LEFT OUTER JOIN VIEW4 v4 ON (v1.SK = v4.SK AND v1.DT_SK = v4.DT_SK AND v1.TYPE_CD = v4.TYPE_CD AND v1.NUM_SK = v4.NUM_SK) LEFT OUTER JOIN VIEW5 v5 ON (v2.SK = v5.SK AND v2.DT_SK = v5.DT_SK AND v2.TYPE_CD = v5.TYPE_CD AND v2.NUM_SK = v5.NUM_SK AND v2.LINE_NUM_SK = v5.LINE_NUM_SK) LEFT OUTER JOIN VIEW6 v6 ON (v2.SK = v6.SK AND v2.DT_SK = v6.DT_SK AND v2.TYPE_CD = v6.TYPE_CD AND v2.NUM_SK = v6.NUM_SK AND v2.LINE_NUM_SK = v6.LINE_NUM_SK) LEFT OUTER JOIN VIEW7 v7 ON (v2.SK = v7.SK AND v2.DT_SK = v7.DT_SK AND v2.TYPE_CD = v7.TYPE_CD AND v2.NUM_SK = v7.NUM_SK AND v2.LINE_NUM_SK = v7.LINE_NUM_SK) WHERE v1.FROM_DT = '2012-12-01' GROUP BY 1,2,3,4,5,6,7 ORDER BY 1,2,3,4,5,6,7 ;

These tables are all quite large (some in the billions of rows). The basic breakdown of how the optimizer executes this is that it filters the first view (v1) on the date in the WHERE clause, then filters it on the join condition from the view definition, then spools that result. Next it joins that spool with the second view (v2), matching on the conditions in the ON clauses, which effectively filters that table based on the date in the WHERE clause as well. However, for the other five views (the ones that are outer joined), the optimizer decides to first join each of these tables to the tables from their view definitions individually (doing an all-rows scan in the process) before joining them to the spool that contains the data which has already been filtered on the WHERE clause. For some queries it makes almost no difference in execution time, but for a query like this in which the WHERE clause should be eliminating 99% of the processing, its quite a problem. The same query executed on the same data set, but with no joins in the view definitions, executes in a couple minutes versus a couple hours with the view definition give above.

I'm having trouble tracking down the reason why the optimizer isn't choosing a better way to do this. Removing the JOIN/WHERE clauses from the view definitions results in the query executing as expected - filtering the first table on the WHERE clause from the query, then joining each additional table to those results one by one, effectively filtering every join based on the WHERE clause. I can see from the first part of the execution plan that the optimizer knows that it can do the same thing with the new view definitions, because thats exactly what it does when it joins the first two tables, but I can't figure out why it won't do it for the rest of them. I'm guessing it has something to do with the outer joins, but why does the same query work as expected when the view definitions don't contain any joins?

Appreciate any insight on this!

Tags: