Intro To Histograms. Three Basic Questions Answered.
As a DBA, one day or another you will come across histograms.
Either you will need to implement/delete/gather histograms to fix a performance problem,
or
you will need to explain someone what histograms are,
or
you might even be part of a conversation about histograms.
Either way, you will need a basic understanding of histograms.
And as a good DBA, you want to feel valuable and useful, so it is a good idea to sharpen up your skills and learn about histograms.
As always, I’ll try to break down the subject into small, bite size, easy to understand pieces.
1.What Are Histograms?
2.Why Do We Need Histograms?
3.When Are Histograms Created?
1.What Are Histograms?
Histograms are additional, “special” column statistics.
These statistics have information about the data distribution in the column.
The values of the column that has histograms are sorted into buckets. These buckets are statistic buckets, as the data itself on the disk is not sorted.
Based on the number of distinct values in the column, the database will pick one of the four types of histograms (as of 12c and up):
-Frequency Histograms
-Top Frequency Histograms
-Height-Balanced Histograms (legacy)
-Hybrid Histograms
These histogram types, will be discussed in a future blog post.
2.Why Do We Need Histograms?
Without histograms, the optimizer assumes the data is distributed uniformly in the column. How would that look like?
Lets take for example table A that has 100 rows, with 4 distinct values for the column color:
red, blue, green, yellow.
Without histograms, the optimizer assumes that there are 25 rows with color red, 25 rows with color blue, 25 rows with color green, 25 rows with color yellow.
Why is this a bad assumption?
What if there is only one row with color red, 1 row with color blue, and 1 row with color green, and then there are 97 rows of color yellow.
In this case the data distribution is not uniform. We can say that the data is skewed. If the data is skewed, then the optimizer might (and most likely will) generate inaccurate cardinality estimates, which can lead to sub-optimal execution plans.
Histograms could make the difference in the optimizer picking an execution plan with a full table scan or an execution plan with an index scan.
3.When Are Histograms Created?
If you gather stats on a table using dbms_stats, with the METHOD_OPT set to SIZE AUTO (which is the default), then the database will create histograms automatically if needed.
How does the database know that histograms are needed?
After you gathered stats on the table, and ran some select statements, the following dictionary table: SYS.COL_USAGE$ is updated, with information about previous predicates used in queries.
Then you gather stats again. Now the database will check SYS.COL_USAGE$, to see which columns will need histograms. If needed, it will gather histograms.
If you enjoyed this article, and would like to learn more about databases, please sign up below, and you will receive
The Ultimate 3 Step Guide To Find The Root Cause Of The Slow Running SQL!
–Diana
Awesome thanks.
[…] If you need an introduction to histograms, you can check out last week’s post: Intro To Histograms. Three Basic Questions Answered. […]
BIGGGGGG Like!!!
thank you!