The Explain looks as expected: CUST_T is redistributed, followed by a local join to KEY_T.
There's "no confidence" for the join and the aggregate (which stats are collected?). If the actual number of rows is much larger than estimated, the query will be much slower. But you should never use wall-clock time as a measure, as it's dependent on the load on your system.
Do you have access to the Querylog? The QryLogStepsV will have all details about estimated vs. actual.
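Something like this would show estimated vs. actual per step. It's only a sketch: it assumes step-level logging is enabled (BEGIN QUERY LOGGING WITH STEPINFO) and that you have SELECT access on the DBC view. The QueryID is a placeholder, and the column names are the ones I'd expect in QryLogStepsV, so check them against your release.

```sql
-- Sketch only: requires step-level query logging (WITH STEPINFO).
-- The QueryID value below is a placeholder, not a real query.
SELECT StepLev1Num,
       StepLev2Num,
       StepName,
       EstRowCount,   -- optimizer's estimated rows for the step
       RowCount,      -- actual rows produced by the step
       CPUTime,       -- CPU seconds used by the step
       IOCount        -- logical I/Os used by the step
FROM DBC.QryLogStepsV
WHERE QueryID = 123456789012345678   -- hypothetical, take it from DBC.QryLogV
ORDER BY StepLev1Num, StepLev2Num;
```

Steps where RowCount is far above EstRowCount are the ones to look at first, and CPUTime/IOCount are a better basis for comparison than elapsed time.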
How often is this query supposed to run?