We recently started with a teradata database, before TD we worked mainly with MS-sqlserver and Oracle databases. Now I have a small challenge, so it looked like
We receive often small or medium sized datasets to upload for cases we have to resolve one of our databases. The data format is never the same, the challenge lays in format string for a US-English and European data (number) formats.
US/English are delivering number string usually; 987,654,321.01
European clients differ in their stile; 987.654.321,01
and also the lazy ones ; 987654321,01 / 987654321.01
with or without leading zero or spaces. You name it we recieve it!
A slight difference, slowly a head deck for me. US/ENG use the comma as thousand separator and the dot as decimal separator. In Europa it usually reversed.
I’m trying to load my example data by BTEQ (I know SQL assistant proved also an import option, but some datasets are just a little to big)
I’m using BTEQ in this case from desktop windows pc. I know larger load can be done by mload, that is my next challenge and not to speak of repeating everything on the Unix platform J
Data string_example.txt
------------------------------------------------------------------------------------------
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
1|2,339.23|2339.23|2.339,23|2339,23|two
or
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
2|32,734.952|32734.952|32.734,952|32734,952|three
or
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
3|1.1|1.1|1,1|1,1|one
or
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
4|1.814|1814|1,814|1814|zero
or
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
5|001.814|001814|001,814|001814|zero_zero
or
REC_ID|OPS_US1|OPS_US2|OPS_EU1|OPS_EU2|OPS|AREACODE
6| 1.814| 1814| 1,814| 1814|space_zero
To demonstrate how data arrives, all kind of examples above. Usually only one of the examples applies :-)
Created a table:
------------------------------------------------------------------------------------------
CREATESETTABLE SANDBOX.CASEDATA,
NOFALLBACK,
NOBEFOREJOURNAL,
NOAFTERJOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
( REC_ID INTEGER notnull,
OPS_US1 DECIMAL(10,3),
OPS_US2 DECIMAL(10,3),
OPS_EU1 DECIMAL(10,3),
OPS_EU2 DECIMAL(10,3),
AREACODE INTEGER VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,
CONSTRAINT pk_CASEDATA PRIMARYKEY ( REC_ID ) );
Created a BTEQ script (that currently is not working), can’t find right formatting :-(
------------------------------------------------------------------------------------------
/* leave first line empty to prevent log-date entry*/
.LOGON xx.xx.xx.xx/user,pwd;
DATABASE SANDBOX;
.SET TITLEDASHES OFF;
.IMPORT VARTEXT '|' FILE='\\serv.local\ifs\ string_example.txt' skip=1;
.QUIET ON;
.REPEAT * PACK 10
USING REC_ID (VARCHAR(25)),
OPS_US1 (VARCHAR(25)),
OPS_US2 (VARCHAR(25)),
OPS_EU1 (VARCHAR(25)),
OPS_EU2 (VARCHAR(25)),
AREACODE (VARCHAR(25))
INSERTINTO SANDBOX.CASEDATA values (
:REC_ID,
cast(:OPS_US1 as decimal(10,3) format 'xxxxxxxxxxxxxxxxxxxxx'),
cast(:OPS_US2 as decimal(10,3) format 'xxxxxxxxxxxxxxxxxxxxx'),
cast(:OPS_EU1 as decimal(10,3) format 'xxxxxxxxxxxxxxxxxxxxx'),
cast(:OPS_EU2 as decimal(10,3) format 'xxxxxxxxxxxxxxxxxxxxx'),
:AREACODE );
.QUIET OFF;
.LOGOFF;
.EXIT;
I found out the hard way that with USING, I always use CHAR or VARCHAR. But currently I’m can’t find the right format string, search the forum and internet, tried RTFM, but until now not successful..
Whom can help (will) me?
THX!
Forums: