The "SHOW STATISTICS VALUES COLUMN col ON db.tab; " shows the Biased values of the statistics. In my opinion this is exactly a very good list for multi value compression.
The following scripts should give a starting point for very fast multi value compression out of actual statistics. It is not tested for every feature the teradata database provides.
In worst case you loose statistics for one table, but this statistics is saved in the dump files and can submitted again.
The success of compression is connected to the amount of actual field statistics.
To limit the number of values a percentage of 1% is used to get the most used values. The number of null values for compression is also checked.
The advantages are:
- No costs for getting the values for compression
- Good compression results with easy algorithmus
This easy solution has some disadvantages:
- Compress is only performed on columns with statistics
- Statistics have to be actual
- Procedure doesn't take care of previous values list
First generate as list of commands to get statistics for regenerating statistics:
SELECT
'SHOW STATISTICS VALUES COLUMN '||(trim (both from a.columnname))||' on '||(trim(both from a.databasename))||'.'||(trim(both from a.tablename))||';' as stmt
FROM
dbc.ColumnStatsV a
INNER JOIN
dbc.columns b
ON
a.databasename=b.databasename
AND
a.tablename=b.tablename
AND
a.columnname=b.columnname
LEFT OUTER JOIN
dbc.PartitioningConstraintsV c
ON
a.databasename=c.databasename
AND
a.tablename=c.tablename
AND
c.constrainttext LIKE '%'||a.columnname||'%'
WHERE
c.constrainttext is null
AND
a.indexnumber is null
AND
a.databasename='dbtest'
AND
b.columntype<>'TS'
AND
(a.databasename,a.tablename,a.columnname) not in (select databasename,tablename,columnname from dbc.indices)
order by a.databasename,a.tablename,a.columnname;
Which produces:
SHOW STATISTICS VALUES COLUMN col1 on dbtest.tab1;
SHOW STATISTICS VALUES COLUMN col2 on dbtest.tab1;
SHOW STATISTICS VALUES COLUMN col3 on dbtest.tab1;
SHOW STATISTICS VALUES COLUMN col1 on dbtest.tab2;
SHOW STATISTICS VALUES COLUMN col2 on dbtest.tab2;
Putting the bteq output of these command in the following gawk scriptBEGIN { CUTPERCENTAGE=1;
print ".errorlevel (3582) severity 0";
}
/ COLUMN \(/ { COL=$3; }
/ ON / { DBTAB=$2; }
/^ \/\*\* / { BIASEDON=0; }
/NumOfRows/ { CUTROWS=$4*CUTPERCENTAGE/100;
BIASED=="";
if (0+CUTROWS<0+NULLROWS) BIASED="NULL,";
}
/\/\* NumOfNulls/ { NULLROWS=$4; }
/^ \/\*\* Biased:/ { BIASEDON=1;}
/^ \/\* / { if (BIASEDON==1)
{
if (CUTROWS < 0+gensub(".*,","","",gensub(",? ?$","","g")))
BIASED=BIASED gensub("^ */[^/]*/","","g",gensub(",[0-9 ]*,? ?$","","g")) ",";
}
}
/^COLLECT STATISTICS/ { COLSTATON=1; }
{ if (COLSTATON==1) COLSTAT=COLSTAT "\n" $0; }
/^);/ { BIASEDON=0;
COLSTATON=0;
if (BIASED=="")
{
COLSTAT="";
next;
}
if (DBTAB!=DBTABOLD)
{
if (DBTABOLD!="")
{
print DROPSTATS;
print ALTERTABLE ";";
print COLSTATALL;
COLSTATALL="";
}
ALTERTABLE="ALTER TABLE " DBTAB " ADD " COL " COMPRESS (" gensub(",$","","",BIASED) ")";
DROPSTATS="DROP STATISTICS COLUMN " COL " ON " DBTAB ";";
DBTABOLD=DBTAB;
}
else
{
ALTERTABLE=ALTERTABLE "\n"" ,ADD " COL " COMPRESS (" gensub(",$","","",BIASED) " )";
DROPSTATS=DROPSTATS "\n""DROP STATISTICS COLUMN " COL " ON " DBTAB ";";
}
COLSTATALL=COLSTATALL "\n" COLSTAT;
COLSTAT="";
BIASED="";
}
END {
print ";";
}
Produces:
DROP STATISTICS column col1 on dbtest.tab1;
DROP STATISTICS column col2 on dbtest.tab1;
DROP STATISTICS column col3 on dbtest.tab1;
ALTER TABLE dbtest.tab1 add col1 compress ( ...)
, add col2 compress ( ...)
, add col3 compress ( ...)
;
COLLECT STATISTICS COLUMN ( col1 ) ON dbtest.tab1 VALUES (...);
COLLECT STATISTICS COLUMN ( col2 ) ON dbtest.tab1 VALUES (...);
COLLECT STATISTICS COLUMN ( col3 ) ON dbtest.tab1 VALUES (...);
DROP STATISTICS column col1 on dbtest.tab2;
DROP STATISTICS column col2 on dbtest.tab2;
ALTER TABLE dbtest.tab2 add col1 compress ( ...)
, add col2 compress ( ...)
;
COLLECT STATISTICS COLUMN ( col1 ) ON dbtest.tab2 VALUES (...);
COLLECT STATISTICS COLUMN ( col2 ) ON dbtest.tab2 VALUES (...);
Executing this output compresses the fields. Done :-)
Comments, remarks and improvements are very welcomed.
Best Regards,
Roland Sagner