MySQL Online Help

Everything about MySQL

How MySQL Uses Indexes

Posted by Nilnandan Joshi on November 13, 2008

How MySQL Uses Indexes

Types of Indexes

There are several types of indexes to choose from in MySQL:

  1. “Normal” Indexes“Normal” indexes are the most basic indexes, and have no restraints such as uniqueness. These can be added by creating an index (CREATE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD INDEX [name_of_index] (columns_to_index);), or when creating the table (CREATE TABLE tablename ( [...], INDEX [name_of_index] (columns_to_index) );).
  2. Unique Indexes – Unique indexes are the same as “Normal” indexes with one difference: all values of the indexed column(s) must only occur once. These can be added by creating an index (CREATE UNIQUE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD UNIQUE [name_of_index] (columns_to_index);) or when creating the table (CREATE TABLE tablename ( [...], UNIQUE [name_of_index] (columns_to_index) );).
  3. Primary keys – Primary keys are unique indexes that must be named “PRIMARY”. If you have used AUTO_INCREMENT columns, you’re probably familiar with these. These indexes are almost always added when creating the table (CREATE TABLE tablename ( [...], PRIMARY KEY (columns_to_index) );), but may also be added by altering the table (ALTER TABLE tablename ADD PRIMARY KEY (columns_to_index);). Note that you may only have one primary key per table.
  4. Full-text indexesFull-text indexes are used by MySQL in full-text searches. Because full-text search is so new and would add unnecessary complexity to this article, I won’t explain it here. Should you want more information?

How MySQL Uses Indexes

Indexes are used to find rows with specific column values fast. Without an index MySQL has to start with the first record and then read through the whole table to find the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly get a position to seek to in the middle of the data file without having to look at all the data. If a table has 1000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1000 rows, it is faster to read sequentially, because that minimizes disk seeks.

All MySQL indexes (PRIMARY KEY, UNIQUE, and INDEX) are stored in B-trees. Strings are automatically prefix- and end-space compressed.

Indexes are used in the following ways:

  • To quickly find the rows that match a WHERE clause.
  • To retrieve rows from other tables when performing joins.
  • To find the MAX() or MIN() value for a specific indexed column. This is optimised by a preprocessor that checks if you are using WHERE key_part_# = constant on all key parts < N. In this case MySQL will do a single key lookup and replace the MIN() expression with a constant. If all expressions are replaced with constants, the query will return at once:
·                SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10
  • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part_1,key_part_2 ). The key is read in reverse order if all key parts are followed by DESC.
  • In some cases a query can be optimised to retrieve values without consulting the datafile. If all used columns for some table are numeric and form a leftmost prefix for some key, the values may be retrieved from the index tree for greater speed:
·                SELECT key_part3 FROM table_name WHERE key_part1=1

Suppose you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimiser to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL can’t use a partial index if the columns don’t form a leftmost prefix of the index. Suppose you have the SELECT statements shown here:

mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn’t start with a wildcard character. For example, the following SELECT statements use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Pat%_ck%";

In the first statement, only rows with "Patrick" <= key_col < "Patricl" are considered. In the second statement, only rows with "Pat" <= key_col < "Pau" are considered.

The following SELECT statements will not use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

MySQL 4.0 does another optimization on LIKE. If you use ... LIKE "%string%" and string is longer than 3 characters, MySQL will use the Turbo Boyer-Moore algorithm to initialise the pattern for the string and then use this pattern to perform the search quicker.

Searching using column_name IS NULL will use indexes if column_name is an index.

MySQL normally uses the index that finds the smallest number of rows. An index is used for columns that you compare with the following operators: =, >, >=, <, <=, BETWEEN, or a LIKE with a pattern that begins with a non-wildcard prefix like 'something%'.

Any index that doesn’t span all AND levels in the WHERE clause is not used to optimise the query. In other words: To be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
... WHERE index=1 OR A=10 AND index=2      /* index = 1 OR index = 2 */
... WHERE index_part1='hello' AND index_part_3=5
          /* optimised like "index_part1='hello'" */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
          /* Can use index on index1 but not on index2 or index 3 */

These WHERE clauses do not use indexes:

... WHERE index_part2=1 AND index_part3=2  /* index_part_1 is not used */
... WHERE index=1 OR A=10                  /* Index is not used in
                                                        both AND parts */
... WHERE index_part1=1 OR index_part2=10  /* No index spans all rows  */

Note that sometime MySQL will not use an index, even if one is available. One instance of this is when use of the index would require MySQL to access more than 30% of the rows in the table. (In this case a table scan is probably much faster, as it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, as it can much more quickly find the few rows to return in the result.

Analyzing Index Efficiency

You have some ideas on which indexes to use, but you’re not sure which is the most efficient. Well, you’re in luck, because MySQL has a built-in SQL statement to do this, known as EXPLAIN. The general syntax for this is EXPLAIN select statement;Here’s an example:

EXPLAIN SELECT peopleid FROM people WHERE firstname='Mike'
AND lastname='Sullivan' AND age='17';

This will return a somewhat cryptic result that will look usually look similar to this:

[Note: table split across two rows for readability]

+--------+------+-----------------+-----------------+
| table  | type | possible_keys   | key             |
+--------+------+-----------------+-----------------+  ...
| people | ref  | fname_lname_age | fname_lname_age |
+--------+------+-----------------+-----------------+

+---------+-------------------+------+------------+
| key_len | ref               | rows | Extra      |
... +---------+-------------------+------+------------+
| 102     | const,const,const | 1    | Where used |
+---------+-------------------+------+------------+

Let’s break this down column by column.

  • table – This is the name of the table. This will become important when you have large joins, as each table will get a row.
  • type – The type of the join. Here’s what the MySQL documentation has to say about the ref type: All rows with matching index values will be read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key, or if the key is not UNIQUE or a PRIMARY KEY (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this join type is good. In this case, since our index isn’t UNIQUE, this is the best join type we can get. In summary, if the join type is listed as “ALL” and you aren’t trying to select most of the rows in the table, then MySQL is doing a full table scan which is usually very bad. You can fix this by adding more indexes. If you want more information, the MySQL manual covers this value with much more depth.
  • possible_keys – The name of the indexes that could possibly be used. This is where nicknaming your index helps. If you leave the name field blank, the name defaults to the name of the first column in the index (in this case, it would be “firstname”), which isn’t very descriptive.
  • key – This shows the name of the index that MySQL actually uses. If this is empty (or NULL), then MySQL isn’t using an index.
  • key_len – The length, in bytes, of the parts of the index being used. In this case, it’s 102 because firstname takes 50 bytes, lastname takes 50, and age takes 2. If MySQL were only using the firstname part of the index, this would be 50.
  • ref – This shows the name of the columns (or the word “const”) that MySQL will use to select the rows. Here, MySQL references three constants to find the rows.
  • rows – The number of rows MySQL thinks it has to go through before knowing it has the correct rows. Obviously, one is the best you can get.
  • Extra – There are many different options here, most of which will have an adverse effect on the query. In this case, MySQL is simply reminding us that it used the WHERE clause to limit the results.
Disadvantages of Indexing

So far, I’ve only discussed why indexes are great. However, they do have several disadvantages.

First, they take up disk space. Usually this isn’t significant, but if you decided to index every column in every possible combination, your index file would grow much more quickly than the data file. If you have a large table, the index file could reach your operating system’s maximum file size.

Second, they slow down the speed of writing queries, such as DELETE, UPDATE, and INSERT. This is because not only does MySQL have to write to the data file, it has to write everything to the index file as well. However, you may be able to write your queries in such a way that the performance degradation is not very noticeable.

Conclusion

Indexes are one of the keys to speed in large databases. No matter how simple your table, a 500,000-row table scan will never be fast. If you have a site with a 500,000-row table, you should really spend time analyzing possible indexes and possibly consider rewriting queries to optimize your application.

Query Reference

Adding a “normal” index via CREATE INDEX:
CREATE INDEX [index_name] ON tablename (index_columns);

Example: CREATE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a unique index via CREATE INDEX:
CREATE UNIQUE INDEX [index_name] ON tablename (index_columns);

Example: CREATE UNIQUE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a “normal” index via ALTER TABLE:
ALTER TABLE tablename ADD INDEX [index_name] (index_columns);

Example: ALTER TABLE people ADD INDEX fname_lname_age (firstname,lastname,age);

Adding a unique index via ALTER TABLE:
ALTER TABLE tablename ADD UNIQUE [index_name] (index_columns);

Example: ALTER TABLE people ADD UNIQUE fname_lname_age (firstname,lastname,age);

Adding a primary key via ALTER TABLE:
ALTER TABLE tablename ADD PRIMARY KEY (index_columns);

Example: ALTER TABLE people ADD PRIMARY KEY (peopleid);

Adding a “normal” index via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
INDEX [index_name] (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
INDEX fname_lname_age (firstname,lastname,age)
);

Adding a unique index via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
UNIQUE [index_name] (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
UNIQUE fname_lname_age (firstname,lastname,age)
);

Adding a primary key via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
INDEX [index_name] (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT NOT NULL AUTO_INCREMENT,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
PRIMARY KEY (peopleid)
);

Dropping (removing) a “normal” or unique index via ALTER TABLE:
ALTER TABLE tablename DROP INDEX index_name; Example: ALTER TABLE people DROP INDEX fname_lname_age;

Dropping (removing) a primary key via ALTER TABLE:
ALTER TABLE tablename DROP PRIMARY KEY; Example: ALTER TABLE people DROP PRIMARY KEY;

When MySQL uses indexes

  • Using >, >=, =, <, <=, IF NULL and BETWEEN on a key.
    • SELECT * FROM table_name WHERE key_part1=1 and key_part2 > 5;
    • SELECT * FROM table_name WHERE key_part1 IS NULL;
  • When you use a LIKE that doesn’t start with a wildcard.
    • SELECT * FROM table_name WHERE key_part1 LIKE 'jani%'
  • Retrieving rows from other tables when performing joins.
    • SELECT * from t1,t2 where t1.col=t2.key_part
  • Find the MAX() or MIN() value for a specific index.
    • SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10
  • ORDER BY or GROUP BY on a prefix of a key.
    • SELECT * FROM foo ORDER BY key_part1,key_part2,key_part3
  • When all columns used in the query are part of one key.
    • SELECT key_part3 FROM table_name WHERE key_part1=1

When MySQL doesn’t use an index

  • Indexes are NOT used if MySQL can calculate that it will probably be faster to scan the whole table. For example if key_part1 is evenly distributed between 1 and 100, it’s not good to use an index in the following query:
    • SELECT * FROM table_name where key_part1 > 1 and key_part1 < 90
  • If you are using HEAP tables and you don’t search on all key parts with =
  • When you use ORDER BY on a HEAP table
  • If you are not using the first key part
    • SELECT * FROM table_name WHERE key_part2=1
  • If you are using LIKE that starts with a wildcard
    • SELECT * FROM table_name WHERE key_part1 LIKE '%jani%'
  • When you search on one index and do an ORDER BY on another
    • SELECT * from table_name WHERE key_part1 = # ORDER BY key2

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>