Channel: Teradata Forums - Database

Query on Skew-Sensitivity in the TD12 above Optimizer - response (12) by Santanu84


Hi All

 

I have not received any reply to my query above. However, I found a few things on my own, and I am sharing them here in the hope that they help others.

 

1. I created the table below.

 

 

CREATE TABLE SCPLN.CLASSES, NO FALLBACK
(
    CLASSUID INTEGER NOT NULL,
    COURSEUID INTEGER NOT NULL,
    CREATEDDATE TIMESTAMP(6),
    ETL_ACTION VARCHAR(1)
)
UNIQUE PRIMARY INDEX(CLASSUID)
;

 

Each course ID may have many class IDs, so course:class is a 1:many relation. I then collected statistics on the join column and the filter column:

 

COLLECT STATS ON SCPLN.CLASSES COLUMN(COURSEUID);
COLLECT STATS ON SCPLN.CLASSES COLUMN(ETL_ACTION);
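
To confirm that both statistics exist and to see when they were last collected, a statement like the one below can be used. This is only a sketch: it was not part of my original test, and the exact output columns depend on the Teradata release.

```sql
-- List the statistics currently collected on CLASSES.
-- On TD12/TD13 this typically shows, per statistic, the collection
-- date and time, the number of unique values and the column name.
HELP STATISTICS SCPLN.CLASSES;
```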

 

 

 

2. The second table is:

 

CREATE TABLE SCPLN.COURSES, NO FALLBACK
(
    COURSEUID INTEGER NOT NULL,
    COURSENAME VARCHAR(20)
)
UNIQUE PRIMARY INDEX(COURSEUID)
;

 

This table has around 30,000 rows, evenly distributed across the AMPs.

 

 

 

3. Then I ran the SQL below, which joins the two tables.

 

 

 

SELECT O.CLASSUID, O.CREATEDDATE, C.COURSEUID, C.COURSENAME
FROM SCPLN.COURSES C
INNER JOIN SCPLN.CLASSES O
    ON C.COURSEUID = O.COURSEUID
WHERE O.ETL_ACTION = 'I'
;

 

 

The initial row count of CLASSES was 246,077, of which 4,529 rows (about 1.8%) had COURSEUID = 383712.
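
Counts like these can be obtained with a simple frequency query on the join column. The query below is a sketch rather than the exact statement I ran; the alias CLASS_COUNT is only illustrative.

```sql
-- Show the ten most frequent COURSEUID values in CLASSES.
-- A single value that dominates the list is a candidate for join skew.
SELECT TOP 10
       COURSEUID,
       COUNT(*) AS CLASS_COUNT
FROM SCPLN.CLASSES
GROUP BY COURSEUID
ORDER BY CLASS_COUNT DESC;
```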

 

The first time, the EXPLAIN output for the SQL was:

 

  4) We do an all-AMPs RETRIEVE step from SCPLN.O by way of an
     all-rows scan with a condition of ("SCPLN.O.ETL_ACTION = 'I'")
     into Spool 2 (all_amps), which is redistributed by the hash code
     of (SCPLN.O.COURSEUID) to all AMPs.  The size of Spool 2 is
     estimated with high confidence to be 246,077 rows (7,628,387
     bytes).  The estimated time for this step is 0.08 seconds.
  5) We do an all-AMPs JOIN step from SCPLN.C by way of an all-rows
     scan with no residual conditions, which is joined to Spool 2 (Last
     Use) by way of an all-rows scan.  SCPLN.C and Spool 2 are joined
     using a single partition hash join, with a join condition of (
     "SCPLN.C.COURSEUID = COURSEUID").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with low confidence to be 246,077 rows (
     15,502,851 bytes).  The estimated time for this step is 0.03
     seconds.

 

 

 

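4. I inserted more rows (approximately 100,000). The row count for CLASSES is now 327,292, of which 85,744 rows (about 26%) have COURSEUID = 383712. Going by these counts, the net increase is 81,215 rows, all of them for that one COURSEUID.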
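
For anyone repeating this test: the optimizer generally needs statistics that reflect the newly inserted rows before it can see the new skew on COURSEUID. The sketch below assumes the statistics from step 1 are refreshed after the insert and the plan is then requested again; whether a refresh is needed on your system is an assumption to verify.

```sql
-- Refresh the statistics collected in step 1 so that the value
-- frequencies include the newly inserted rows.
COLLECT STATS ON SCPLN.CLASSES COLUMN(COURSEUID);
COLLECT STATS ON SCPLN.CLASSES COLUMN(ETL_ACTION);

-- Ask the optimizer for the plan of the join without executing it.
EXPLAIN
SELECT O.CLASSUID, O.CREATEDDATE, C.COURSEUID, C.COURSENAME
FROM SCPLN.COURSES C
INNER JOIN SCPLN.CLASSES O
    ON C.COURSEUID = O.COURSEUID
WHERE O.ETL_ACTION = 'I';
```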

 

 

5. I ran the SQL once again. The EXPLAIN output this time was:

 

  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from SCPLN.O by way of an
          all-rows scan with a condition of ("SCPLN.O.ETL_ACTION =
          'I'") into Spool 2 (all_amps), which is built locally on the
          AMPs.  The size of Spool 2 is estimated with high confidence
          to be 327,292 rows (10,146,052 bytes).  The estimated time
          for this step is 0.02 seconds.
       2) We do an all-AMPs RETRIEVE step from SCPLN.C by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps), which is duplicated on all AMPs.  The size of
          Spool 3 is estimated with high confidence to be 2,758,860
          rows (68,971,500 bytes).  The estimated time for this step is
          0.06 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using a single
     partition hash join, with a join condition of ("COURSEUID =
     COURSEUID").  The result goes into Spool 1 (group_amps), which is
     built locally on the AMPs.  The size of Spool 1 is estimated with
     low confidence to be 327,292 rows (20,619,396 bytes).  The
     estimated time for this step is 0.05 seconds.

 

 

 

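So, I think the optimizer is smart enough to detect when redistributing CLASSES by COURSEUID would skew the data. In the first plan it redistributed the CLASSES rows by the hash of COURSEUID and joined them to COURSES in place. In the second plan it left the CLASSES rows local on the AMPs and instead duplicated the smaller COURSES table to all AMPs, so that the join processing stays similar on each AMP.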
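
One way to check this from the data side is to count how many CLASSES rows would land on each AMP if the table were redistributed by the hash of COURSEUID, as in the first plan. The query below is a sketch using Teradata's HASHROW, HASHBUCKET and HASHAMP functions; the aliases AMP_NO and ROW_COUNT are only illustrative.

```sql
-- Simulate redistribution by COURSEUID: map each row to the AMP
-- that owns the hash of its COURSEUID and count the rows per AMP.
SELECT HASHAMP(HASHBUCKET(HASHROW(COURSEUID))) AS AMP_NO,
       COUNT(*) AS ROW_COUNT
FROM SCPLN.CLASSES
WHERE ETL_ACTION = 'I'
GROUP BY 1
ORDER BY 2 DESC;
```

All rows with the same COURSEUID hash to the same AMP, so with 85,744 of 327,292 rows sharing one value, a single AMP would receive at least about 26% of the redistributed spool. That is the kind of imbalance the second plan avoids.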

 

Please correct me if I have missed anything.

 

Thanks

Santanu

