This is my query:
CREATE TABLE rtl.intermediate AS (
    SELECT
        customer_id,
        MAX(new_to) AS new_to,
        MIN(age) AS age,
        MIN(gender) AS gender,
        MIN(existing) AS existing
    FROM rtl.base
    WHERE country = 'China'
      AND product = 'cereal'
      AND dt BETWEEN '2015-01-01' AND '2016-01-01'
    GROUP BY customer_id
) WITH DATA
UNIQUE PRIMARY INDEX (customer_id, new_to, gender);
It currently takes about 10 seconds to run, and I would like to bring it down to 2 seconds. The rtl.base table is partitioned on date (every 7 days) and has a primary index on customer_id, product, country, date (called dt). I have collected statistics on the partition and the age column.
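The relevant parts of the base table look roughly like this (types simplified and the partition range approximate, just to illustrate the layout described above):

CREATE TABLE rtl.base (
    customer_id INTEGER,
    product     VARCHAR(30),
    country     VARCHAR(30),
    dt          DATE,
    new_to      BYTEINT,
    age         INTEGER,
    gender      CHAR(1),
    existing    BYTEINT
    -- plus other columns not used in this query
)
PRIMARY INDEX (customer_id, product, country, dt)
PARTITION BY RANGE_N(dt BETWEEN DATE '2010-01-01' AND DATE '2016-12-31' EACH INTERVAL '7' DAY);  -- range is approximate

-- statistics collected so far
COLLECT STATISTICS COLUMN (PARTITION), COLUMN (age) ON rtl.base;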
This is the explain:
1) First, we lock a distinct rtl."pseudo table" for read on a
RowHash to prevent global deadlock for
rtl.base.
2) Next, we lock rtl.intermediate for
exclusive use, and we lock rtl.base for read.
3) We lock a distinct DBC."pseudo table" for read on a RowHash for
deadlock prevention.
4) We lock DBC.DBase for read on a RowHash.
5) We do a single-AMP ABORT test from DBC.DBase by way of the unique
primary index "Field_1 = 'rtl'" with a residual condition of (
"'0000BF0A'XB= DBC.DBase.Field_2").
6) We create the table header.
7) We do an all-AMPs SUM step to aggregate from 53 partitions of
rtl.base with a condition of (
"(rtl.base.dt >= DATE '2015-01-01') AND
((rtl.base.dt <= DATE '2016-01-01') AND
((rtl.base.country = 'CHN') AND
(rtl.base.product = 'cereal')))")
, grouping by field1 ( rtl.base.customer_id).
Aggregate Intermediate Results are computed globally, then placed
in Spool 3. The size of Spool 3 is estimated with no confidence
to be 8,142,324 rows (293,123,664 bytes). The estimated time for
this step is 0.28 seconds.
8) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps), which is redistributed
by the hash code of (rtl.base.customer_id,
rtl.base.new_to,
rtl.base.gender) to all AMPs. Then we do a
SORT to order Spool 1 by row hash. The size of Spool 1 is
estimated with no confidence to be 8,142,324 rows (227,985,072
bytes). The estimated time for this step is 0.15 seconds.
9) We do an all-AMPs MERGE into
rtl.intermediate from Spool 1 (Last Use).
The size is estimated with no confidence to be 8,142,324 rows.
The estimated time for this step is 1 minute and 27 seconds.
10) We lock a distinct DBC."pseudo table" for write on a RowHash for
deadlock prevention, we lock a distinct DBC."pseudo table" for
write on a RowHash for deadlock prevention, and we lock a distinct
DBC."pseudo table" for write on a RowHash for deadlock prevention.
11) We lock DBC.Indexes for write on a RowHash, we lock DBC.TVFields
for write on a RowHash, we lock DBC.TVM for write on a RowHash,
and we lock DBC.AccessRights for write on a RowHash.
12) We execute the following steps in parallel.
1) We do a single-AMP ABORT test from DBC.TVM by way of the
unique primary index "Field_1 = '0000BF0A'XB, Field_2 =
'INTERMEDIATE'".
2) We do an INSERT into DBC.Indexes (no lock required).
3) We do an INSERT into DBC.Indexes (no lock required).
4) We do an INSERT into DBC.Indexes (no lock required).
5) We do an INSERT into DBC.TVFields (no lock required).
6) We do an INSERT into DBC.TVFields (no lock required).
7) We do an INSERT into DBC.TVFields (no lock required).
8) We do an INSERT into DBC.TVFields (no lock required).
9) We do an INSERT into DBC.TVFields (no lock required).
10) We do an INSERT into DBC.TVM (no lock required).
11) We INSERT default rights to DBC.AccessRights for
rtl.intermediate.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
thanks!
It runs out of CPU time and does not complete? That sounds like a workload CPU limit.
Changing the granularity to '7' DAY will not help; it's still sorting the same number of rows.
Columnar will not help either, as it needs more CPU.
You probably need to change the index order to
PARTITION BY (COLUMN, RANGE_N(dt BETWEEN '2010-01-01' AND '2016-01-01' EACH INTERVAL '7' DAY))
,UNIQUE INDEX (dt, country, product, channel);
But why do you want a USI on those columns? It's just CPU/IO/perm overhead and probably never used. And if this combination is unique, why isn't it defined as a UPI in your first CREATE?
Do you really need to change the PI? Otherwise it should be faster, because there's no redistribution needed.
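E.g. keeping the PI on customer_id alone, something like this (just a sketch, assuming you don't actually need new_to/gender in the PI for later joins):

CREATE TABLE rtl.intermediate AS (
    SELECT
        customer_id,
        MAX(new_to) AS new_to,
        MIN(age) AS age,
        MIN(gender) AS gender,
        MIN(existing) AS existing
    FROM rtl.base
    WHERE country = 'China'
      AND product = 'cereal'
      AND dt BETWEEN '2015-01-01' AND '2016-01-01'
    GROUP BY customer_id
) WITH DATA
UNIQUE PRIMARY INDEX (customer_id);  -- unique anyway, it's the GROUP BY column

The aggregate spool is already hashed by customer_id, so the redistribution and sort in step 8 of your explain should go away.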
Btw, if you run TD 15.10, there are additional options for columnar tables, like a Primary AMP Index...
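E.g. something like this (syntax sketch from memory; the column types are just placeholders, check the 15.10 DDL manual):

CREATE TABLE rtl.intermediate (
    customer_id INTEGER,
    new_to      BYTEINT,
    age         INTEGER,
    gender      CHAR(1),
    existing    BYTEINT
)
PRIMARY AMP INDEX (customer_id)  -- PA index: rows are distributed to AMPs by customer_id hash
PARTITION BY COLUMN;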