Channel: SCN: Message List

Re: Partition Pruning in HANA


This is off topic, but it relates to the statement you made above about inefficient compression. 

 

We will probably move to SALES_DATE as our partitioning column, since that is how the model is currently designed, and the designers tell me that whatever date-related filtering is plugged in should translate to SALES_DATE. There is no hash partitioning here; it is single-level range partitioning only.

 

 

 

But your comment about compression concerns me. Until now, we have had 4 or 5 fiscal weeks per partition (making up one fiscal month). For 5 years of data, this gives us 12 partitions per year and 60 partitions in total with actual data, at 50-70 million rows per partition. We had the table partitioned out into the future and the past so that we could simply drop partitions as they roll off, so we also had a number of empty partitions. All of this is spread across 3 nodes.
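As a rough check on those figures, here is a minimal Python sketch of the volume arithmetic. It uses only the numbers quoted above and assumes, for simplicity, that partitions are spread evenly across the nodes; the variable names are illustrative.

```python
# Figures taken from the post; nothing here is measured.
YEARS = 5
PARTITIONS_PER_YEAR = 12                         # one partition per fiscal month
ROWS_PER_PARTITION = (50_000_000, 70_000_000)    # stated low/high range
NODES = 3

populated_partitions = YEARS * PARTITIONS_PER_YEAR
total_rows = tuple(populated_partitions * r for r in ROWS_PER_PARTITION)

# ASSUMPTION: an even spread of partitions and rows across the nodes.
partitions_per_node = populated_partitions // NODES
rows_per_node = tuple(t // NODES for t in total_rows)

print(f"populated partitions: {populated_partitions}")
print(f"total rows: {total_rows[0]:,} to {total_rows[1]:,}")
print(f"per node: {partitions_per_node} partitions, "
      f"{rows_per_node[0]:,} to {rows_per_node[1]:,} rows")
```

On these numbers the table holds roughly 3.0 to 4.2 billion rows, or about 1.0 to 1.4 billion per node.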

 

 

 

With SALES_DATE, if we keep the same month logic, that is 28 or 35 days per fiscal month and 364 days per year, with the same number of partitions.
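To make the 28/35-day logic concrete, the sketch below generates twelve contiguous date ranges for one fiscal year. It assumes a 4-4-5 week pattern per quarter, which is only one layout consistent with the figures above (4-5-4 or 5-4-4 would fit equally well), and the fiscal year start date is hypothetical.

```python
from datetime import date, timedelta

# ASSUMPTION: a 4-4-5 week pattern per quarter. The post only says that each
# fiscal month has 4 or 5 weeks (28 or 35 days) and a year has 364 days.
WEEKS_PER_FISCAL_MONTH = (4, 4, 5) * 4


def fiscal_month_ranges(year_start: date):
    """Return 12 half-open (start, end) SALES_DATE ranges for one fiscal year."""
    ranges = []
    start = year_start
    for weeks in WEEKS_PER_FISCAL_MONTH:
        end = start + timedelta(weeks=weeks)
        ranges.append((start, end))   # start inclusive, end exclusive
        start = end
    return ranges


# Hypothetical fiscal year start, chosen only for illustration.
ranges = fiscal_month_ranges(date(2015, 2, 1))
for start, end in ranges:
    print(f"{start} <= SALES_DATE < {end}  ({(end - start).days} days)")
```

Each range could serve as the bounds of one monthly range partition. The twelve ranges cover exactly 364 days, so the fiscal year start drifts against the calendar unless an extra week is added from time to time.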

 

 

 

Do you mean, applying the general rule of thumb that fewer distinct values give better compression, that spreading similar data across more partitions means more compression dictionaries and higher memory utilization? I can understand that. I suppose in that case fewer partitions would be better. It seems difficult to judge the balance between reducing the number of rows touched and increasing the degree of parallelism on the one hand, and the size of the structures in memory on the other. Following the SAP recommendation of 250+ million rows per partition, we could go with quarterly divisions. But conventional wisdom seems to suggest that scanning fewer rows across more partitions is faster. We were not necessarily thinking in terms of memory consumption.
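The dictionary concern can be illustrated with a toy model. This is not HANA's actual storage layout: it simply assumes that every partition keeps its own dictionary of distinct values for a column, and counts the total number of dictionary entries under monthly, quarterly, and single-partition layouts. The data is synthetic, and the function name is illustrative.

```python
def dictionary_entries(rows, partition_of):
    """Sum of per-partition distinct-value counts for (key, value) rows."""
    seen = {}
    for key, value in rows:
        seen.setdefault(partition_of(key), set()).add(value)
    return sum(len(values) for values in seen.values())


# Synthetic data: 100 hypothetical core SKUs, each sold in every month (0..11).
rows = [(month, sku) for month in range(12) for sku in range(100)]

monthly = dictionary_entries(rows, lambda month: month)         # 12 partitions
quarterly = dictionary_entries(rows, lambda month: month // 3)  # 4 partitions
single = dictionary_entries(rows, lambda month: 0)              # 1 partition

print(f"monthly={monthly} quarterly={quarterly} single={single}")
```

When the same values recur in every partition, the total number of dictionary entries grows in proportion to the partition count, which is the memory effect asked about above. On the scan side, at the volumes quoted earlier, three fiscal months per partition would work out to roughly 150 to 210 million rows.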

 

 

 

With regard to data distribution: although we sell a large number of core items, part of our business is seasonal in nature. Because this is transactional data and we sell a lot more in the last quarter of the year, the number of records will be somewhat skewed from month to month. But we can balance the years and months out across the nodes. For the seasonal part of our business, the data in some columns, such as SKU number and vendor number, would be somewhat skewed across partitions. I admit that I do not know the extent of that skew in raw numbers. I would think that the bulk of the columns would be more evenly distributed.
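One way to picture balancing skewed months across the nodes is a simple greedy assignment: take the partitions from largest to smallest and always place the next one on the node that currently holds the fewest rows. The sketch below is only an illustration of that idea, not a HANA feature. The row counts are hypothetical, chosen within the 50-70 million range mentioned earlier with the heavier months in the last quarter.

```python
import heapq


def assign_to_nodes(partition_rows, node_count):
    """Greedy balance: largest partitions first, each onto the lightest node."""
    heap = [(0, node) for node in range(node_count)]   # (rows so far, node id)
    heapq.heapify(heap)
    placement = {node: [] for node in range(node_count)}
    for name, rows in sorted(partition_rows.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (load + rows, node))
    loads = {node: sum(partition_rows[n] for n in names)
             for node, names in placement.items()}
    return placement, loads


# Hypothetical row counts in millions: nine ordinary months, three heavy Q4 months.
partition_rows = {f"M{m:02d}": 50 for m in range(1, 10)}
partition_rows.update({"M10": 70, "M11": 70, "M12": 70})

placement, loads = assign_to_nodes(partition_rows, node_count=3)
for node, names in placement.items():
    print(f"node {node}: {loads[node]}M rows in {names}")
```

With these made-up numbers each node ends up with one heavy month and three ordinary ones. Real skew in the data would make the split less tidy, and row count alone ignores column-level skew such as SKU or vendor distribution.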

 

 

 

Your book has helped me understand some of this a bit better as it relates to HANA.

 

Thanks.

