Hi San,
The main goal when defining an MLPPI is to improve access, so start from how the table is queried and then decide which column should be at the top level and which at the lowest level.
The most fundamental factor in MLPPI is the number of data blocks you are going to touch. The more data blocks that are read, the worse the performance.
So first of all, you need to check the data blocks that will be built for your combined partitions. If the number of data blocks for the combined partitions is large, then the order of the partitioning levels does not matter.
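To make that check concrete, here is a rough Python sketch. It assumes rows are spread evenly over the combined partitions, and the function name, row counts, and rows-per-block figure are all made up for illustration; they are not taken from your table.

```python
from math import prod

def combined_partition_stats(total_rows, partitions_per_level, rows_per_block):
    """Return (combined partitions, rows per combined partition,
    data blocks per combined partition), assuming an even row spread."""
    combined = prod(partitions_per_level)          # e.g. 10 * 50 = 500
    rows_per_combined = total_rows / combined
    blocks_per_combined = rows_per_combined / rows_per_block
    return combined, rows_per_combined, blocks_per_combined

# Hypothetical table: 5,000,000 rows, levels of 10 and 50 partitions,
# about 1,000 rows fitting in one data block.
print(combined_partition_stats(5_000_000, [10, 50], 1000))
# (500, 10000.0, 10.0)
```

A result well above 1 block per combined partition means each combined partition fills whole blocks on its own; a result well below 1 means several combined partitions share one block, and that is the case where the level order starts to matter.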
Second, put at the top the partitioning expression that is accessed most often and has a good chance of partition elimination. Third, define the expression with the most partitions at the lowest level, unless it is the one accessed most, in which case you can move it to the top.
So in all cases, you need to consider the number of data blocks that will be accessed. Try to design your partitioning expressions so that they ultimately lead to the minimum number of data blocks being accessed.
For your scenario, Col1 with 50000 rows per value will create 50 partitions and Col2 with 1000 distinct values will create 10 partitions, so I suggest defining the Col2 partitions at level 1. Say a query probes 2 of these 10 partitions; each of these two partitions then leads to the second level. The second level contains more values per block, so the number of data blocks read for these 2 level 1 partitions will be lower.
On the other hand, if you define Col1 at level 1 and, say, 5 of the 50 partitions are left after elimination, then more second-level partitions will be probed for each of them, and because the number of rows per block is lower, there is a chance that more blocks will be accessed, which will degrade performance.
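The effect of the level order can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how Teradata actually allocates blocks: rows are assumed to be stored in combined partition order, every block holds the same number of rows, rows are spread evenly, and the query filters only on the 10-partition column. The function name and all figures other than the 10 and 50 partition counts are hypothetical.

```python
def blocks_touched(n_upper, n_lower, rows_per_combined, rows_per_block,
                   hit_upper, hit_lower):
    """Count distinct data blocks read for the qualifying combined partitions.

    n_upper, n_lower      : partitions at level 1 and level 2
    hit_upper, hit_lower  : 0-based partition numbers left after elimination
    """
    touched = set()
    for u in hit_upper:
        for l in hit_lower:
            c = u * n_lower + l                     # combined partition number
            first_row = c * rows_per_combined
            last_row = first_row + rows_per_combined - 1
            touched.update(range(first_row // rows_per_block,
                                 last_row // rows_per_block + 1))
    return len(touched)

# Query keeps 2 of the 10 partitions; the 50-partition column is not filtered.
# 100 rows per combined partition, 1,000 rows per block (both hypothetical).
filtered_on_top = blocks_touched(10, 50, 100, 1000, {0, 1}, range(50))
filtered_at_bottom = blocks_touched(50, 10, 100, 1000, range(50), {0, 1})
print(filtered_on_top, filtered_at_bottom)  # 10 50

# Same query, but 10,000 rows per combined partition (10 full blocks each).
print(blocks_touched(10, 50, 10_000, 1000, {0, 1}, range(50)),
      blocks_touched(50, 10, 10_000, 1000, range(50), {0, 1}))  # 1000 1000
```

In this model, with the filtered column at level 1 the qualifying rows sit together and only 10 blocks are read; with it at level 2 the same rows are scattered across all 50 blocks. Once each combined partition fills several blocks by itself, both orders read the same number of blocks.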
I hope I was able to explain what I wanted to say. Please let me know if anything is unclear.