If you really want to confuse a candidate for a DBA job interview, just ask the question:

“Can you explain to me the difference between selectivity and cardinality?” and look for their face expression.

Depending on how long they’ve been a DBA, the face expression might tell you different things.

Some DBAs will be surprised as they have no idea what these words are, and no idea they should know the meaning.
Other DBAs will panic, and secretly wish you would not have asked that questions.
While other DBAS will keep on talking, just for the sake of talking.
And finally there will be those few DBAs that will know the answer.

Which DBA would you be, if you were to get asked this questions? Don’t need to tell me, but answer that questions honestly to yourself!

I’ll be honest with you.
When I read about selectivity and cardinality, for the next few days I am the DBA that knows exactly what each one is.
And then…a week or two later, life happens, and I forget all about it.

Why is that? Well, because I don’t use these terms all the time, and they are not ingrained in my brain. I bet I am not the only one!

So here it is for those DBAs that want to know the answer, and want to have an easy to remember reference on selectivity and cardinality, this post is for you!

1. What Is Selectivity?
2. What Is Cardinality?
3. Three Simple Examples On Selectivity And Cardinality
4. Example When Things Get More Complex
5. Why Do I Need To Know All This?

1. What Is Selectivity?

Before coming up with a definition on Selectivity, remember that selectivity is a number between 0 and 1, or, it can be a number expressed as a percentage, ie. 20%.

Selectivity represents a fraction, or a percentage of the data (rows) returned or accessed by an operation (select, update …).

Think about it this way. If a table has 100 rows, and I am selecting only 10 rows from the table, then the fraction or the rows returned is 0.1 or 10%. How did I get to this result?

100rows .... 100%
 10rows ....  x

x=(10 * 100)/100 = 10 % or 0.1

Selectivity = number of rows returned / total number of rows

2. What Is Cardinality?

Cardinality is the number of rows returned or accessed by an operation. From the above example, when the query returned 10 rows, cardinality equals 10.
Some publications define the cardinality as number of distinct rows returned. I like to use the definition that Christian Antognini is using as well, and the one that Oracle is also referencing.

The relationship between Cardinality and Selectivity is the following:

Cardinality = Selectivity * Total Number Of Rows

3. Three Simple Examples On Selectivity And Cardinality

How do you better remember a concept? If I give you the definitions, or, if I show you some examples?
I believe the latter is the answer.

Lets go through some easy to understand examples.

Lets say we have a table: tickets with 180 rows

a. select * from tickets;

selectivity = ?
cardinality = ?

As there is no where clause, all of the rows will be returned from the table:

selectivity = number of rows returned/total number of rows = 180/180 = 1 (100% of the rows were returned)
cardinality = number of rows returned = 180

b. select * from tickets where country=’CANADA’;

lets assume there are 10 rows returned.

selectivity = ?
cardinality = ?

selectivity = number of rows returned/total number of rows = 10/180 = 0.05 (5% of the rows were returned)
cardinality = number of rows returned = 10

c. select * from tickets where price = 0 ;

lets assume there are 0 rows returned.

selectivity = ?
cardinality = ?

selectivity = number of rows returned/total number of rows = 0/180 = 0 (0% of the rows were returned)
cardinality = number of rows returned = 0

4. Examples When Things Get More Complex

Did you notice something in common with the examples from the previous paragraph?

These examples were all very easy to figure out. But what happens when we introduce aggregate functions, such as sum, count, min, max?
Lets see below!

a. select max(price) from tickets;

selectivity = ?
cardinality = ?

The above query returns one row. But in order to return that one row, it has to actually read all the rows in the tickets table, so it has to access all the rows.

selectivity = number of rows accessed/total number of rows = 180/180 = 1 (100% of the rows were accessed)
cardinality = number of rows accessed = 180

b. select max(price) from tickets where country = ‘CANADA’;

Based on the above example, there are 10 rows in the table for country = ‘CANADA’. The query returns only one row, the max(price).

selectivity = ?
cardinality = ?

selectivity = number of rows accessed/total number of rows = 10/180 = 0.05 (5% of the rows were accessed)
cardinality = number of rows accessed = 10

5. Why Do I Need To Know All This?

You are probably wondering why do you need to know all of this selectivity and cardinality stuff. How will this help you in your DBA job?

If I were you, I would ask the exact same questions!

The Oracle optimizer, uses the selectivity of an operation to determine the best access path, that is, the most efficient access path for a query or an operation.

How does the optimizer know about the selectivity?

It uses statistics, if available, or it uses dynamic statistics or an internal default value.
Selectivity is not visible in execution plans.
What you see in execution plans, is the cardinality estimate: “rows” column of the execution plan.

Do you want to know what does weak selectivity or strong selectivity mean? Don’t miss next week’s post!

Until then, here’s a riddle for you!

What do you think is more important to know by the optimizer, that an operations returns 100,000 rows (cardinality), OR, that the operation returns 1% of the data (selectivity)?

Comment below, I read all the comments!

If you enjoyed this article, and would like to learn more about databases, please sign up below, and you will receive
The Ultimate 3 Step Guide To Find The Root Cause Of The Slow Running SQL!

–Diana


13 thoughts on “5 Things To Better Understand Selectivity And Cardinality

  1. Hello Ma’am,
    In order to generate a optimized plan, optimizer needs both the information otherwise it might lead to generate a sub optimal execution plan. Your valuable inputs is always welcome.

    Regards,
    OrclFan

    1. Hi OrclFan! The selectivity is more important than cardinality, when Oracle generates the execution plan!
      Cheers!

  2. Selectivity is more important for the optimizer.Knowing that the operation returns 1% of the data will make use of an index access path if available

  3. Hello,
    I would like to congratulate you for the post, I think it’s great to remember concepts that I do not use in day-to-day.

    I was left with a doubt about cardinality, at one point it is explained: “Cardinality is the number of rows returned or accessed by an operation. From the above example, when the query returned 10 rows, cardinality equals 10.”

    “Cardinality = Selectivity * Total Number Of Rows”

    B. Select * from tickets where country = ‘CANADA’;

    Lets assume there are 10 rows returned.

    Number of rows returned = total number of rows = 10/180 = 0.05 (5% of the rows were returned)
    Cardinality = 0.05 * 180 = 9 (based on the formula “Cardinality = Selectivity * Total Number Of Rows”)

    My doubt is that the result of the formula is not equal to the answer presented.

    1. Hi Juliano, thank you for your kind words, and thanks for catching the error! I should have not rounded 10/180 to 0.05. I rounded it for simplicity. If you use a calculator the answer will be 0.0555555…I will make a correction on the blog post.
      However, cardinality is indeed the number of rows returned or accessed by the operation, and the formula is correct: cardinality=selectivity * total number of rows.
      Cheers!
      Diana

Leave a Reply

Your email address will not be published. Required fields are marked *