Did you compare the actual runtime and CPU/IO from DBQL (preferably QryLogSteps)?
You didn't show the full SQL, but I assume there are some joins in the previous steps. Depending on the actual data/PK/FK you might try to aggregate the large table(s) before the join. Otherwise, materialize the data without aggregation in a Volatile Table with a PI on the GROUP BY columns and then aggregate on this VT. Since the rows are then already distributed by the grouping columns, the aggregation step in Explain should show "computed locally".
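A rough sketch of the Volatile Table approach, since the original query wasn't posted. All table and column names here (big_fact, dim, dim_id, cust_id, region, amount) are made up for illustration, so adjust them to your actual join and GROUP BY list:

```sql
-- Hypothetical names throughout; replace with your own tables/columns.

-- 1) Materialize the joined, un-aggregated rows in a VT whose PI
--    matches the GROUP BY columns.
CREATE VOLATILE TABLE vt_detail AS
(
  SELECT f.cust_id, d.region, f.amount
  FROM big_fact AS f
  JOIN dim AS d
    ON d.dim_id = f.dim_id
)
WITH DATA
PRIMARY INDEX (cust_id, region)
ON COMMIT PRESERVE ROWS;

-- 2) Aggregate on the VT; rows of each group already sit on the same AMP.
SELECT cust_id, region, SUM(amount) AS total_amount
FROM vt_detail
GROUP BY cust_id, region;
```

Run EXPLAIN on the second statement to check whether the aggregate step is now computed locally, and compare CPU/IO of both variants before deciding, because the insert into the VT adds its own cost.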
Dieter