How MySQL Uses Indexes

Types of Indexes

There are several types of indexes to choose from in MySQL:

  1. “Normal” Indexes – “Normal” indexes are the most basic indexes, and have no constraints such as uniqueness. These can be added by creating an index (CREATE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD INDEX [name_of_index] (columns_to_index);), or when creating the table (CREATE TABLE tablename ( [...], INDEX [name_of_index] (columns_to_index) );).
  2. Unique Indexes – Unique indexes are the same as “Normal” indexes with one difference: all values of the indexed column(s) must only occur once. These can be added by creating an index (CREATE UNIQUE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD UNIQUE [name_of_index] (columns_to_index);) or when creating the table (CREATE TABLE tablename ( [...], UNIQUE [name_of_index] (columns_to_index) );).
  3. Primary keys – Primary keys are unique indexes that must be named “PRIMARY”. If you have used AUTO_INCREMENT columns, you’re probably familiar with these. These indexes are almost always added when creating the table (CREATE TABLE tablename ( [...], PRIMARY KEY (columns_to_index) );), but may also be added by altering the table (ALTER TABLE tablename ADD PRIMARY KEY (columns_to_index);). Note that you may only have one primary key per table.
  4. Full-text indexes – Full-text indexes are used by MySQL in full-text searches. Because full-text search is so new and would add unnecessary complexity to this article, I won’t explain it here. Should you want more information, consult the MySQL manual.

How MySQL Uses Indexes

Indexes are used to find rows with specific column values fast. Without an index MySQL has to start with the first record and then read through the whole table to find the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly get a position to seek to in the middle of the data file without having to look at all the data. If a table has 1000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1000 rows, it is faster to read sequentially, because that minimizes disk seeks.

Most MySQL indexes (PRIMARY KEY, UNIQUE, and INDEX) are stored in B-trees; HEAP tables, mentioned later, use hash indexes by default. Strings are automatically prefix- and end-space compressed.

Indexes are used in the following ways:

  • To quickly find the rows that match a WHERE clause.
  • To retrieve rows from other tables when performing joins.
  • To find the MAX() or MIN() value for a specific indexed column. This is optimised by a preprocessor that checks if you are using WHERE key_part_# = constant on all key parts < N. In this case MySQL will do a single key lookup and replace the MIN() expression with a constant. If all expressions are replaced with constants, the query will return at once:
      SELECT MIN(key_part2),MAX(key_part2) FROM table_name WHERE key_part1=10;
  • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part_1,key_part_2 ). The key is read in reverse order if all key parts are followed by DESC.
  • In some cases a query can be optimised to retrieve values without consulting the datafile. If all used columns for some table are numeric and form a leftmost prefix for some key, the values may be retrieved from the index tree for greater speed:
      SELECT key_part3 FROM table_name WHERE key_part1=1;

Suppose you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimiser to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL can’t use a partial index if the columns don’t form a leftmost prefix of the index. Suppose you have the SELECT statements shown here:

mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
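The leftmost-prefix rule can be checked with EXPLAIN. What follows is only a minimal sketch: the table t, the payload column, and the index name idx_c1_c2_c3 are hypothetical, and the plan MySQL actually reports depends on the server version and on how many rows the table holds.

```sql
-- Hypothetical table with a three-column index. The payload column is
-- not indexed, so SELECT * cannot be answered from the index alone.
CREATE TABLE t (
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  col3 INT NOT NULL,
  payload CHAR(20) NOT NULL,
  INDEX idx_c1_c2_c3 (col1, col2, col3)
);

-- (col1) is a leftmost prefix of the index, so the "key" column of the
-- EXPLAIN output is expected to name idx_c1_c2_c3.
EXPLAIN SELECT * FROM t WHERE col1 = 1;

-- (col2) and (col2, col3) are not leftmost prefixes, so the index
-- cannot be used to look these rows up.
EXPLAIN SELECT * FROM t WHERE col2 = 2;
EXPLAIN SELECT * FROM t WHERE col2 = 2 AND col3 = 3;
```

Run the EXPLAIN statements against a table that already contains a realistic number of rows; on an empty or tiny table the optimizer may report something quite different.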

MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn’t start with a wildcard character. For example, the following SELECT statements use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Pat%_ck%";

In the first statement, only rows with "Patrick" <= key_col < "Patricl" are considered. In the second statement, only rows with "Pat" <= key_col < "Pau" are considered.

The following SELECT statements will not use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

MySQL 4.0 does another optimization on LIKE. If you use ... LIKE "%string%" and string is longer than 3 characters, MySQL will use the Turbo Boyer-Moore algorithm to initialise the pattern for the string and then use this pattern to perform the search quicker.

Searching using column_name IS NULL will use indexes if column_name is indexed.

MySQL normally uses the index that finds the smallest number of rows. An index is used for columns that you compare with the following operators: =, >, >=, <, <=, BETWEEN, or a LIKE with a pattern that begins with a non-wildcard prefix like 'something%'.

Any index that doesn’t span all AND levels in the WHERE clause is not used to optimise the query. In other words: To be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
... WHERE index=1 OR A=10 AND index=2      /* index = 1 OR index = 2 */
... WHERE index_part1='hello' AND index_part3=5
          /* optimised like "index_part1='hello'" */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
          /* Can use index on index1 but not on index2 or index3 */

These WHERE clauses do not use indexes:

... WHERE index_part2=1 AND index_part3=2  /* index_part1 is not used */
... WHERE index=1 OR A=10                  /* Index is not used in
                                                        both AND parts */
... WHERE index_part1=1 OR index_part2=10  /* No index spans all rows  */

Note that sometimes MySQL will not use an index, even if one is available. One instance of this is when use of the index would require MySQL to access more than 30% of the rows in the table. (In this case a table scan is probably much faster, as it will require far fewer seeks.) However, if such a query uses LIMIT to retrieve only part of the rows, MySQL will use an index anyway, as it can much more quickly find the few rows to return in the result.

Analyzing Index Efficiency

You have some ideas on which indexes to use, but you’re not sure which is the most efficient. Well, you’re in luck, because MySQL has a built-in SQL statement to do this, known as EXPLAIN. The general syntax for this is EXPLAIN select_statement. Here’s an example:

EXPLAIN SELECT peopleid FROM people WHERE firstname='Mike'
AND lastname='Sullivan' AND age='17';

This will return a somewhat cryptic result that will usually look similar to this:

[Note: table split across two rows for readability]

+--------+------+-----------------+-----------------+
| table  | type | possible_keys   | key             |
+--------+------+-----------------+-----------------+  ...
| people | ref  | fname_lname_age | fname_lname_age |
+--------+------+-----------------+-----------------+

    +---------+-------------------+------+------------+
    | key_len | ref               | rows | Extra      |
... +---------+-------------------+------+------------+
    | 102     | const,const,const | 1    | Where used |
    +---------+-------------------+------+------------+

Let’s break this down column by column.

  • table – This is the name of the table. This will become important when you have large joins, as each table will get a row.
  • type – The type of the join. Here’s what the MySQL documentation has to say about the ref type: “All rows with matching index values will be read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key, or if the key is not UNIQUE or a PRIMARY KEY (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this join type is good.” In this case, since our index isn’t UNIQUE, this is the best join type we can get. In summary, if the join type is listed as “ALL” and you aren’t trying to select most of the rows in the table, then MySQL is doing a full table scan, which is usually very bad. You can fix this by adding more indexes. If you want more information, the MySQL manual covers this value in much more depth.
  • possible_keys – The names of the indexes that could possibly be used. This is where nicknaming your index helps. If you leave the name field blank, the name defaults to the name of the first column in the index (in this case, it would be “firstname”), which isn’t very descriptive.
  • key – This shows the name of the index that MySQL actually uses. If this is empty (or NULL), then MySQL isn’t using an index.
  • key_len – The length, in bytes, of the parts of the index being used. In this case, it’s 102 because firstname takes 50 bytes, lastname takes 50, and age takes 2. If MySQL were only using the firstname part of the index, this would be 50.
  • ref – This shows the name of the columns (or the word “const”) that MySQL will use to select the rows. Here, MySQL references three constants to find the rows.
  • rows – The number of rows MySQL thinks it has to go through before knowing it has the correct rows. Obviously, one is the best you can get.
  • Extra – There are many different options here, most of which will have an adverse effect on the query. In this case, MySQL is simply reminding us that it used the WHERE clause to limit the results.
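As a sketch of the “ALL” case described under type, consider a lookup on townid, a column of the people table that the fname_lname_age index does not start with. The index name townid_idx and the value 12 are hypothetical, and the actual EXPLAIN output varies with the MySQL version and the data in the table.

```sql
-- No index starts with townid, so this is expected to report
-- type = ALL (a full table scan) and key = NULL.
EXPLAIN SELECT peopleid FROM people WHERE townid = 12;

-- Add an index on the column being searched (the name is an assumption).
ALTER TABLE people ADD INDEX townid_idx (townid);

-- Re-running EXPLAIN should now list townid_idx under possible_keys.
EXPLAIN SELECT peopleid FROM people WHERE townid = 12;
```

Whether the optimizer then chooses the new index still depends on how selective townid is, as discussed in the section on when MySQL skips an index.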

Disadvantages of Indexing

So far, I’ve only discussed why indexes are great. However, they do have several disadvantages.

First, they take up disk space. Usually this isn’t significant, but if you decided to index every column in every possible combination, your index file would grow much more quickly than the data file. If you have a large table, the index file could reach your operating system’s maximum file size.

Second, they slow down write queries, such as DELETE, UPDATE, and INSERT. This is because MySQL has to write not only to the data file but also to the index file. However, you may be able to write your queries in such a way that the performance degradation is not very noticeable.

Conclusion

Indexes are one of the keys to speed in large databases. No matter how simple your table, a 500,000-row table scan will never be fast. If you have a site with a 500,000-row table, you should really spend time analyzing possible indexes and possibly consider rewriting queries to optimize your application.

Query Reference

Adding a “normal” index via CREATE INDEX:
CREATE INDEX index_name ON tablename (index_columns);

Example: CREATE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a unique index via CREATE INDEX:
CREATE UNIQUE INDEX index_name ON tablename (index_columns);

Example: CREATE UNIQUE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a “normal” index via ALTER TABLE:
ALTER TABLE tablename ADD INDEX [index_name] (index_columns);

Example: ALTER TABLE people ADD INDEX fname_lname_age (firstname,lastname,age);

Adding a unique index via ALTER TABLE:
ALTER TABLE tablename ADD UNIQUE [index_name] (index_columns);

Example: ALTER TABLE people ADD UNIQUE fname_lname_age (firstname,lastname,age);

Adding a primary key via ALTER TABLE:
ALTER TABLE tablename ADD PRIMARY KEY (index_columns);

Example: ALTER TABLE people ADD PRIMARY KEY (peopleid);

Adding a “normal” index via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
INDEX [index_name] (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
INDEX fname_lname_age (firstname,lastname,age)
);

Adding a unique index via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
UNIQUE [index_name] (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
UNIQUE fname_lname_age (firstname,lastname,age)
);

Adding a primary key via CREATE TABLE:
CREATE TABLE tablename (
rest of columns,
PRIMARY KEY (index_columns)
[other indexes]
);

Example:
CREATE TABLE people (
peopleid SMALLINT NOT NULL AUTO_INCREMENT,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
PRIMARY KEY (peopleid)
);

Dropping (removing) a “normal” or unique index via ALTER TABLE:
ALTER TABLE tablename DROP INDEX index_name;

Example: ALTER TABLE people DROP INDEX fname_lname_age;

Dropping (removing) a primary key via ALTER TABLE:
ALTER TABLE tablename DROP PRIMARY KEY;

Example: ALTER TABLE people DROP PRIMARY KEY;
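After adding or dropping indexes, it can be useful to confirm what a table currently has. One way to do that, sketched here against the people table from the examples above, is SHOW INDEX; SHOW CREATE TABLE gives the same information as part of the full table definition.

```sql
-- One row per indexed column; Key_name, Seq_in_index and Column_name
-- show which columns belong to which index and in what order.
SHOW INDEX FROM people;

-- The full definition, including every index, as a CREATE TABLE statement.
SHOW CREATE TABLE people;
```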

When MySQL uses indexes

  • Using >, >=, =, <, <=, IS NULL and BETWEEN on a key.
    • SELECT * FROM table_name WHERE key_part1=1 and key_part2 > 5;
    • SELECT * FROM table_name WHERE key_part1 IS NULL;
  • When you use a LIKE that doesn’t start with a wildcard.
    • SELECT * FROM table_name WHERE key_part1 LIKE 'jani%'
  • Retrieving rows from other tables when performing joins.
    • SELECT * from t1,t2 where t1.col=t2.key_part
  • Find the MAX() or MIN() value for a specific index.
    • SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10
  • ORDER BY or GROUP BY on a prefix of a key.
    • SELECT * FROM foo ORDER BY key_part1,key_part2,key_part3
  • When all columns used in the query are part of one key.
    • SELECT key_part3 FROM table_name WHERE key_part1=1

When MySQL doesn’t use an index

  • Indexes are NOT used if MySQL can calculate that it will probably be faster to scan the whole table. For example, if key_part1 is evenly distributed between 1 and 100, it’s not good to use an index in the following query:
    • SELECT * FROM table_name where key_part1 > 1 and key_part1 < 90
  • If you are using HEAP tables and you don’t search on all key parts with =
  • When you use ORDER BY on a HEAP table
  • If you are not using the first key part
    • SELECT * FROM table_name WHERE key_part2=1
  • If you are using LIKE that starts with a wildcard
    • SELECT * FROM table_name WHERE key_part1 LIKE '%jani%'
  • When you search on one index and do an ORDER BY on another
    • SELECT * from table_name WHERE key_part1 = # ORDER BY key2

Full–Text Index For MyISAM in MySQL

  • A query expansion search is a modification of a natural language search. The search string is used to perform a natural language search. Then words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The WITH QUERY EXPANSION modifier specifies a query expansion search.

  • When MATCH() is used in a WHERE clause, the rows returned are automatically sorted with the highest relevance first. Relevance values are non-negative floating point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.
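To make the relevance values concrete, here is a minimal sketch. It assumes a MyISAM table named articles with a FULLTEXT index on (title, body), matching the table queried in the query expansion examples further down; the exact definition shown here and the column alias score are assumptions.

```sql
-- Assumed definition of the articles table used in this section.
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- MATCH() in the select list returns the relevance value for every row.
-- There is no automatic sorting in this form, hence the explicit ORDER BY.
SELECT id, title,
       MATCH (title, body) AGAINST ('database') AS score
FROM articles
ORDER BY score DESC;
```

Rows that do not match at all come back with a score of 0, which is the “zero relevance” case described above.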

MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier.
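As a rough illustration, assuming an articles table with a FULLTEXT index on (title, body) like the one queried in the query expansion examples later in this article, a boolean-mode search might be written like this:

```sql
-- Rows must contain "MySQL" and must not contain "YourSQL".
SELECT * FROM articles
WHERE MATCH (title, body)
      AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
```

The + and - characters inside the search string are the boolean operators described below.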

Boolean full-text searches have these characteristics:

  • They do not use the 50% threshold.
  • They do not automatically sort rows in order of decreasing relevance, so the row with the highest relevance may be listed last rather than first.
  • They can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow.
  • The minimum and maximum word length full-text parameters apply.
  • The stopword list applies.

The boolean full-text search capability supports the following operators:

+

A leading plus sign indicates that this word must be present in each row that is returned.

-

A leading minus sign indicates that this word must not be present in any of the rows that are returned.

Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return “all rows except those containing any of the excluded terms.”

(no operator)

By default (when neither + nor - is specified) the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() … AGAINST() without the IN BOOLEAN MODE modifier.

> <

These two operators are used to change a word’s contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. See the example following this list.

( )

Parentheses group words into subexpressions. Parenthesized groups can be nested.

~

A leading tilde acts as a negation operator, causing the word’s contribution to the row’s relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.

*

The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

A phrase that is enclosed within double quote ('"') characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include non-word characters in the phrase. As of MySQL 5.0.3, non-word characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, “test phrase” matches “test, phrase” in MySQL 5.0.3, but not before.

If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.

The following examples demonstrate some search strings that use boolean full-text operators:

'apple banana'

Find rows that contain at least one of the two words.

'+apple +juice'

Find rows that contain both words.

'+apple macintosh'

Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.

'+apple -macintosh'

Find rows that contain the word “apple” but not “macintosh”.

'+apple ~macintosh'

Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if the row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.

'+apple +(>turnover <strudel)'

Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.

'apple*'

Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

'"some words"'

Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the '"' characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotes that enclose the search string itself.

Full-Text Searches with Query Expansion

Full-text search supports query expansion (and in particular, its variant “blind query expansion”). This is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “databases” and should be returned, too. This is implied knowledge.

Blind query expansion (also known as automatic relevance feedback) is enabled by adding WITH QUERY EXPANSION following the search phrase. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”. The following example shows this difference:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)
 
mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' WITH QUERY EXPANSION);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  3 | Optimizing MySQL  | In this tutorial we will show ...        |
+----+-------------------+------------------------------------------+
3 rows in set (0.00 sec)

Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell “Maigret”. A search for “Megre and the reluctant witnesses” finds only “Maigret and the Reluctant Witnesses” without query expansion. A search with query expansion finds all books with the word “Maigret” on the second pass.

Note: Because blind query expansion tends to increase noise significantly by returning non-relevant documents, it is meaningful to use only when a search phrase is rather short.

Fine-Tuning MySQL Full-Text Search

MySQL’s full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications.

Note that full-text search is carefully tuned for the most effectiveness. Modifying the default behavior in most cases can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.

Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running.

Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing this are given at the end of this section.

· The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:

[mysqld]
ft_min_word_len=3

Then you must restart the server and rebuild your FULLTEXT indexes. Note particularly the remarks regarding myisamchk in the instructions following this list.

· To override the default stopword list, set the ft_stopword_file system variable. The variable value should be the pathname of the file containing the stopword list, or the empty string to disable stopword filtering. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.

The stopword list is free-form. That is, you may use any non-alphanumeric character such as newline, space, or comma to separate stopwords. Exceptions are the underscore character (‘_’) and a single apostrophe (‘'’) which are treated as part of a word. The character set of the stopword list is the server’s default character set.

· The 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in myisam/ftdefs.h:

#define GWS_IN_USE GWS_PROB

Change that line to this:

#define GWS_IN_USE GWS_FREQ

Then recompile MySQL. There is no need to rebuild the indexes in this case. Note: By making this change, you severely decrease MySQL’s ability to provide adequate relevance values for the MATCH() function. If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold.

· To change the operators used for boolean full-text searches, set the ft_boolean_syntax system variable. This variable can be changed while the server is running, but you must have the SUPER privilege to do so. No rebuilding of indexes is necessary in this case.

· If you want to change the set of characters that are considered word characters, you can do so in two ways. Suppose that you want to treat the hyphen character (‘-’) as a word character. Use either of these methods:

o Modify the MySQL source: In myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL.

o Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes.

After making the modification, you must rebuild the indexes for each table that contains any FULLTEXT indexes.

If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file), or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server. To rebuild the indexes in this case, it is sufficient to do a QUICK repair operation:

mysql> REPAIR TABLE tbl_name QUICK;

Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the server to see the table as corrupt and in need of repair.
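Before rebuilding, it may help to confirm which full-text settings the running server is actually using. A small sketch; tbl_name is a placeholder, as in the statement above:

```sql
-- Lists the full-text system variables (ft_min_word_len, ft_max_word_len,
-- ft_stopword_file, ft_boolean_syntax, ...) with their current values.
SHOW VARIABLES LIKE 'ft%';

-- Once the values are as intended, rebuild each affected table.
REPAIR TABLE tbl_name QUICK;
```

If the values shown are still the defaults, the option file change was probably not picked up, and rebuilding at that point would simply reproduce the old index.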

Note that if you use myisamchk to perform an operation that modifies table indexes (such as repair or analyze), the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length, and stopword file unless you specify otherwise. This can result in queries failing.

The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values to myisamchk that you use for mysqld. For example, if you have set the minimum word length to 3, you can repair a table with myisamchk like this:

shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI

To ensure that myisamchk and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk] sections of an option file:

[mysqld]
ft_min_word_len=3
 
[myisamchk]
ft_min_word_len=3

An alternative to using myisamchk is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements. These statements are performed by the server, which knows the proper full-text parameter values to use.

InnoDB Performance Tuning Tips

• In InnoDB, having a long PRIMARY KEY wastes a lot of disk space because its value must be stored with every secondary index record. Create an AUTO_INCREMENT column as the primary key if your primary key is long.

• If the UNIX top tool or the Windows Task Manager shows that the CPU usage percentage with your workload is less than 70%, your workload is probably disk-bound. Maybe you are making too many transaction commits, or the buffer pool is too small. Making the buffer pool bigger can help, but do not set it equal to more than 80% of physical memory.

• Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second, which limits the number of commits to at most 167 per second if the disk does not “fool” the operating system.

• If you can afford the loss of some of the latest committed transactions if a crash occurs, you can set the innodb_flush_log_at_trx_commit parameter to 0. InnoDB tries to flush the log once per second anyway, although the flush is not guaranteed.

• Make your log files big, even as big as the buffer pool. When InnoDB has written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files cause many unnecessary disk writes. The drawback of big log files is that the recovery time is longer.

• Make the log buffer quite large as well (on the order of 8MB).

• Use the VARCHAR data type instead of CHAR if you are storing variable-length strings or if the column may contain many NULL values. A CHAR(N) column always takes N characters to store data, even if the string is shorter or its value is NULL. Smaller tables fit better in the buffer pool and reduce disk I/O. When using row_format=compact (the default InnoDB record format in MySQL 5.0) and variable-length character sets, such as utf8 or sjis, CHAR(N) will occupy a variable amount of space, at least N bytes.

• In some versions of GNU/Linux and Unix, flushing files to disk with the Unix fsync() call (which InnoDB uses by default) and other similar methods is surprisingly slow. If you are dissatisfied with database write performance, you might try setting the innodb_flush_method parameter to O_DSYNC. Although O_DSYNC seems to be slower on most systems, yours might not be one of them.

• When using the InnoDB storage engine on Solaris 10 for x86_64 architecture (AMD Opteron), it is important to mount any filesystems used for storing InnoDB-related files using the forcedirectio option. (The default on Solaris 10/x86_64 is not to use this option.) Failure to use forcedirectio causes a serious degradation of InnoDB's speed and performance on this platform. When using the InnoDB storage engine with a large innodb_buffer_pool_size value on any release of Solaris 2.6 and up and any platform (sparc/x86/x64/amd64), a significant performance gain can be achieved by placing InnoDB data files and log files on raw devices or on a separate direct I/O UFS filesystem (using mount option forcedirectio; see mount_ufs(1M)). Users of the Veritas filesystem VxFS should use the mount option convosync=direct. Other MySQL data files, such as those for MyISAM tables, should not be placed on a direct I/O filesystem. Executables or libraries must not be placed on a direct I/O filesystem.

• When importing data into InnoDB, make sure that MySQL does not have autocommit mode enabled because that requires a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET AUTOCOMMIT and COMMIT statements:

SET AUTOCOMMIT=0;

… SQL import statements …

COMMIT;

If you use the mysqldump option --opt, you get dump files that are fast to import into an InnoDB table, even without wrapping them with the SET AUTOCOMMIT and COMMIT statements.

• Beware of big rollbacks of mass inserts: InnoDB uses the insert buffer to save disk I/O in inserts, but no such mechanism is used in a corresponding rollback. A disk-bound rollback can take 30 times as long to perform as the corresponding insert. Killing the database process does not help because the rollback starts again on server startup. The only way to get rid of a runaway rollback is to increase the buffer pool so that the rollback becomes CPU-bound and runs fast, or to use a special procedure.

• Beware also of other big disk-bound operations. Use DROP TABLE and CREATE TABLE to empty a table, not DELETE FROM tbl_name.

• Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows:

INSERT INTO yourtable VALUES (1,2), (5,5), …;

This tip is valid for inserts into any table, not just InnoDB tables.

• If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:

SET UNIQUE_CHECKS=0;

… import operation …

SET UNIQUE_CHECKS=1;

For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys. UNIQUE_CHECKS allows but does not require storage engines to ignore duplicate keys.

• If you have FOREIGN KEY constraints in your tables, you can speed up table imports by turning the foreign key checks off for the duration of the import session:

SET FOREIGN_KEY_CHECKS=0;

… import operation …

SET FOREIGN_KEY_CHECKS=1;

For big tables, this can save a lot of disk I/O.

• If you often have recurring queries for tables that are not updated frequently, use the query cache:

[mysqld]

query_cache_type = ON

query_cache_size = 10M

• Unlike MyISAM, InnoDB does not store an index cardinality value in its tables. Instead, InnoDB computes a cardinality for a table the first time it accesses it after startup. With a large number of tables, this might take significant time. It is the initial table open operation that is important, so to “warm up” a table for later use, you might want to use it immediately after start up by issuing a statement such as SELECT 1 FROM tbl_name LIMIT 1.
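The import tips above (disabling autocommit, uniqueness checks and foreign key checks) can be combined into one wrapper around a set of import statements. The following Python sketch only builds the text of such a script and never talks to a server. The function name wrap_import and the sample statement are hypothetical, and the sketch assumes you have already verified that the data contains no duplicate keys.

```python
def wrap_import(statements):
    """Wrap import statements with the session settings described above.

    Only text is produced; nothing is sent to a MySQL server.
    """
    header = [
        "SET AUTOCOMMIT=0;",          # one log flush at COMMIT, not per insert
        "SET UNIQUE_CHECKS=0;",       # assumes the data has no duplicate keys
        "SET FOREIGN_KEY_CHECKS=0;",  # assumes the data is referentially sound
    ]
    footer = [
        "COMMIT;",
        "SET UNIQUE_CHECKS=1;",
        "SET FOREIGN_KEY_CHECKS=1;",
    ]
    return header + list(statements) + footer


if __name__ == "__main__":
    script = wrap_import(["INSERT INTO yourtable VALUES (1,2), (5,5);"])
    print("\n".join(script))
```

The resulting text can be saved to a file and fed to the mysql client in the usual way.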

Reset MySQL root password

Step # 1: Stop mysql service

# /etc/init.d/mysql stop
Output:

Stopping MySQL database server: mysqld.

Step # 2: Start the MySQL server without a password:

# mysqld_safe --skip-grant-tables &
Output:

[1] 5988
Starting mysqld daemon with databases from /var/lib/mysql
mysqld_safe[6025]: started

Step # 3: Connect to mysql server using mysql client:

# mysql -u root
Output:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.15-Debian_1-log
 
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
 
mysql>

Step # 4: Setup new MySQL root user password

mysql> use mysql;
mysql> update user set password=PASSWORD("NEW-ROOT-PASSWORD") where User='root';
mysql> flush privileges;
mysql> quit

Step # 5: Stop MySQL Server:

# /etc/init.d/mysql stop
Output:

Stopping MySQL database server: mysqld
STOPPING server from pid file /var/run/mysqld/mysqld.pid
mysqld_safe[6186]: ended
 
[1]+  Done                    mysqld_safe --skip-grant-tables

Step # 6: Start MySQL server and test it

# /etc/init.d/mysql start
# mysql -u root -p

How to Monitor MySQL’s performance

Here are some ideas on how you can monitor the database performance of your MySQL installation. Monitoring is always an iterative and continuous process. You need to learn which patterns are normal for your database and which are signs of slight problems or even dangerous situations.

Below are the main items you can use to monitor your system:
- mysqladmin extended (absolute values)
- mysqladmin extended -i10 -r (relative values)
- mysqladmin processlist
- mysql -e "show innodb status"
- OS data. vmstat/iostat
- MySQL error log
- InnoDB tablespace info.

1) mysqladmin extended (absolute values)

The values making most sense to monitor are:
* Slave_running: If the system is a slave replication server, this is an indication of the slave’s health.
* Threads_connected: The number of clients currently connected. This should be less than some preset value (like 200), but you can also check that it is larger than some minimum value to ensure that clients are active.
* Threads_running: If the database is overloaded, you’ll get an increased number of queries running. This should also be less than some preset value (20?). It is OK to have values over the limit for very short times. You can then monitor other values whenever Threads_running exceeded the preset value and did not fall back within 5 seconds.
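The Threads_running rule above (tolerate short spikes, react when the value stays high for more than 5 seconds) can be sketched in Python as follows. The function name, the sample data and the default limit of 20 are illustrative assumptions; how the samples are collected is left out.

```python
def sustained_overload(samples, limit=20, grace_seconds=5):
    """Find the moments at which Threads_running has stayed too high.

    samples: list of (timestamp_seconds, threads_running) in time order.
    Returns the timestamps at which the value had been above `limit`
    continuously for more than `grace_seconds`.
    """
    alerts = []
    over_since = None  # time at which the value first exceeded the limit
    for timestamp, running in samples:
        if running > limit:
            if over_since is None:
                over_since = timestamp
            elif timestamp - over_since > grace_seconds:
                alerts.append(timestamp)
        else:
            over_since = None  # fell back under the limit: reset
    return alerts


if __name__ == "__main__":
    data = [(0, 10), (1, 25), (3, 30), (5, 28), (7, 26), (8, 5), (9, 40)]
    print(sustained_overload(data))
```

A short spike resets as soon as one sample drops back under the limit, so only sustained overload produces alerts.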

2) mysqladmin extended (counters)
The idea is that you store the performance counter value and compute the difference with the new values. The interval between the recordings should be more than 10 seconds. The following values are good candidates for checking:

* Aborted_clients: The number of clients that were aborted (because they did not properly close the connection to the MySQL server). For some applications this can be OK, but for some other applications you might want to track the value, as aborted connects may indicate some sort of application failure.
* Questions: The total number of queries the server has received since startup, not a per-second figure. To get the average number per second, divide Questions by Uptime.
* Handler_*: If you want to monitor low-level database load, these are good values to track. If the value of Handler_read_rnd_next is abnormal relative to the value that you would normally expect, it may indicate optimization or index problems. Handler_rollback shows the number of rollback requests made to the storage engine. You might want to investigate them.
* Opened_tables: Number of table cache misses. If the value is large, you probably need to increase table_cache. Typically you would want this to be less than 1 or 2 opened tables per second.
* Select_full_join: Joins performed without keys. This should be zero. This is a good way to catch development errors, as just a few such queries can degrade the system’s performance.
* Select_scan: Number of queries that performed a full table scan. In some cases these are OK, but their ratio to all queries should be constant. If the value is growing, it can indicate a problem with the optimizer, a lack of indexes or some other problem.
* Slow_queries: Number of queries that took longer than --long-query-time or that are not using indexes. These should be a small fraction of all queries. If it grows, the system will have performance problems.
* Threads_created: This should be low. Higher values may mean that you need to increase the value of thread_cache_size or that the number of connections is increasing, which also indicates a potential problem.
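The counter approach described above (store a sample, then compute the difference against a later one) comes down to a small calculation. Here is a minimal Python sketch, assuming the counters have already been read into dictionaries keyed by status variable name. The function name and the sample numbers are hypothetical.

```python
def counter_rates(previous, current, interval_seconds):
    """Return the per-second rate of every counter present in both samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, new_value in current.items():
        if name in previous:
            rates[name] = (new_value - previous[name]) / interval_seconds
    return rates


if __name__ == "__main__":
    before = {"Questions": 1000, "Opened_tables": 50}
    after = {"Questions": 1600, "Opened_tables": 55}
    # Samples taken 10 seconds apart.
    print(counter_rates(before, after, 10))
```

A negative rate means the counters were reset between the two samples (for example by a server restart), so that interval should be discarded.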

3) mysqladmin processlist or “SHOW FULL PROCESSLIST” command
You can get the number of threads connected and running from other statistics, but this is a good way to check how long the currently running queries have been executing. If there are some very long-running queries (e.g. because they are badly formulated), the admin should be informed. You might also want to check how many queries are in the “Locked” state – these are not counted as running but are inactive, i.e. a user is waiting on the database to respond.
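As a rough illustration of the processlist check, the following Python sketch counts locked and long-running entries. It assumes the rows have already been fetched and converted into dictionaries with the Command, Time and State columns; the function name and the 60-second threshold are arbitrary choices for the example.

```python
def summarize_processlist(rows, slow_seconds=60):
    """Count locked and long-running entries in a process list.

    rows: list of dicts holding the 'Command', 'Time' and 'State' columns.
    Idle connections (Command == 'Sleep') are skipped.
    """
    active = [row for row in rows if row["Command"] != "Sleep"]
    locked = [row for row in active if row["State"] == "Locked"]
    slow = [row for row in active if int(row["Time"]) >= slow_seconds]
    return {"locked": len(locked), "slow": len(slow)}


if __name__ == "__main__":
    sample = [
        {"Command": "Query", "Time": "120", "State": "Locked"},
        {"Command": "Sleep", "Time": "900", "State": ""},
        {"Command": "Query", "Time": "2", "State": "Sending data"},
    ]
    print(summarize_processlist(sample))
```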

4) “SHOW INNODB STATUS”
This statement produces a great deal of information, from which you should extract the parts in which you are interested. The first thing you need to check is: “Per second averages calculated from the last xx seconds”. InnoDB rounds stats each minute.

* Pending normal aio reads: These are InnoDB IO request queue sizes. If they are bigger than 10-20, you might have a bottleneck.
* reads/s, avg bytes/read, writes/s, fsyncs/s: These are IO statistics. Large values for reads/writes mean the IO subsystem is heavily loaded. Proper values for these depend on your system configuration.
* Buffer pool hit rate: The hit rate also depends a lot on your application. Check your hit rate when there are problems.
* inserts/s, updates/s, deletes/s, reads/s: These are low-level row operations that InnoDB does. You might use these to check whether your load is in the expected range.
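To extract just the buffer pool hit rate from the status output, a small parser is enough. The Python sketch below assumes the output contains a line of the form "Buffer pool hit rate 987 / 1000"; the function name and the sample text are made up for the example, and other server versions may format the line differently.

```python
import re

# Matches e.g. "Buffer pool hit rate 987 / 1000"
_HIT_RATE = re.compile(r"Buffer pool hit rate\s+(\d+)\s*/\s*(\d+)")


def buffer_pool_hit_rate(status_text):
    """Return the hit rate as a fraction between 0 and 1, or None if absent."""
    match = _HIT_RATE.search(status_text)
    if match is None:
        return None
    hits, total = int(match.group(1)), int(match.group(2))
    return hits / total if total else None


if __name__ == "__main__":
    sample = "Buffer pool size   512\nBuffer pool hit rate 987 / 1000\n"
    print(buffer_pool_hit_rate(sample))
```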

5) OS Data. Good tools to see the system status are vmstat/iostat/mpstat.

To see what kind of information these tools can provide for you, read their man pages.

6) MySQL error log – Nothing should be written to the error log after the server has completed its initialization sequence, so everything appearing in the log should be brought to the admin’s attention immediately.

7) InnoDB tablespace info.
With InnoDB the only danger is that the tablespace gets full – the logs can’t get full. The best way to check this is to run: show table status;

You can use any InnoDB table for monitoring the free space in the InnoDB tablespace.
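The free space check can be automated by reading the Comment column of the table status output. The Python sketch below assumes a server version that reports the free space there in the form "InnoDB free: N kB"; the function name is hypothetical, and fetching the comment text from the server is left out.

```python
import re

# Matches e.g. "InnoDB free: 4096 kB"
_INNODB_FREE = re.compile(r"InnoDB free:\s*(\d+)\s*kB")


def innodb_free_kb(table_comment):
    """Return the free tablespace in kB from a table comment, or None."""
    match = _INNODB_FREE.search(table_comment)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(innodb_free_kb("InnoDB free: 4096 kB"))
```

A monitoring script could raise an alert when the returned value drops below a threshold that suits your data growth.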

You can check the current values of the server variables with this command:

shell> mysqladmin variables

MySQL uses algorithms that are very scalable, so you can usually run with very little memory. If you, however, give MySQL more memory, you will normally also get better performance.

When tuning a MySQL server, the two most important variables to use are key_buffer_size and table_cache. You should first feel confident that you have these right before trying to change any of the other variables.

If you have much memory (>=256M) and many tables and want maximum performance with a moderate number of clients, you should use something like this:

shell> safe_mysqld -O key_buffer=64M -O table_cache=256 \
-O sort_buffer=4M -O read_buffer_size=1M &

If you have only 128M and only a few tables, but you still do a lot of sorting, you can use something like:

shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M

If you have little memory and lots of connections, use something like this:

shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
-O read_buffer_size=100k &
or even:

shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
-O table_cache=32 -O read_buffer_size=8k -O net_buffer_length=1K &

If you are doing a GROUP BY or ORDER BY on tables that are much bigger than your available memory, you should increase the value of record_rnd_buffer to speed up the reading of rows after the sorting is done.

When you have installed MySQL, the support-files directory will contain several example my.cnf files (my-huge.cnf, my-large.cnf, my-medium.cnf, and my-small.cnf) that you can use as a base to optimise your system.

If there are very many connections, “swapping problems” may occur unless mysqld has been configured to use very little memory for each connection. mysqld performs better if you have enough memory for all connections, of course.
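A rough way to judge whether many connections could push the server into swapping is to add the shared key buffer to the per-connection buffers multiplied by the number of connections. The Python sketch below is only an approximation: it ignores all other memory the server uses and treats every connection as if it allocated its buffers in full. The function name and the figure of 200 connections are assumptions for the example; the buffer sizes are taken from the "little memory and lots of connections" settings above.

```python
KB = 1024


def estimate_memory_bytes(key_buffer, per_connection_buffers, max_connections):
    """Worst-case estimate: shared key buffer plus per-connection buffers."""
    return key_buffer + sum(per_connection_buffers) * max_connections


if __name__ == "__main__":
    # key_buffer=512k, sort_buffer=100k, read_buffer_size=100k
    total = estimate_memory_bytes(512 * KB, [100 * KB, 100 * KB], 200)
    print(total, "bytes, about", round(total / (1024 * 1024), 1), "MB")
```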

Note that if you change an option to mysqld, it remains in effect only for that instance of the server.

To see the effects of a parameter change, do something like this:

shell> mysqld -O key_buffer=32m --help

Make sure that the --help option is last; otherwise, the effect of any options listed after it on the command-line will not be reflected in the output.

Here is one example of a well-tuned my.cnf file:

[mysqld]
back_log = 75
skip-innodb
max_connections = 500
key_buffer = 384M
myisam_sort_buffer_size = 64M
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 2M
table_cache = 1800
thread_cache_size = 384
wait_timeout = 7200
connect_timeout = 10
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 64M
max_connect_errors = 1000
read_rnd_buffer_size = 524288
bulk_insert_buffer_size = 8M
query_cache_limit = 4M
query_cache_size = 128M
query_cache_type = 1
query_prealloc_size = 65536
query_alloc_block_size = 131072
default-storage-engine = MyISAM

[mysqld_safe]
nice = -5
open_files_limit = 8192

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer = 64M
sort_buffer = 64M
read_buffer = 16M
write_buffer = 16M

Let’s just look at the important bits.

max_connections = 500 – Use the mysqlreport tool (http://hackmysql.com/mysqlreport) to check how many concurrent connections you actually have. Even under very heavy load (2000 simultaneous users) you may rarely hit 400 concurrent connections to the database, because most connections only last for a few milliseconds.

key_buffer = 384M – When tuning a MySQL server, key_buffer_size is very important. This number works well for me: according to the mysqlreport script, I rarely use 50% of the available memory.

table_cache = 1800 – After key_buffer, the next most important variable is the table cache. This value is set for vBulletin, so you may be able to reduce it significantly depending on the number of tables in your database.

wait_timeout = 7200 – This variable determines the timeout in seconds before MySQL drops a connection. If it is set too low, you will likely see “MySQL server has gone away” errors in your log, which in vBulletin’s case is quite common.

max_allowed_packet = 16M – Again, if this is set too low (the default is 8M), users will likely experience errors. 16M has always worked fine for my production environments.

One more important tool, already mentioned above, is mysqlreport. It is the best free tool available for checking the performance of your MySQL database server.

Optimize MySQL Database response time

The most important part for getting a system fast is of course the basic design. You also need to know what kinds of things your system will be doing, and what your bottlenecks are.

The most common bottlenecks are:

Disk seeks. It takes time for the disk to find a piece of data. With modern disks in 1999, the mean time for this is usually lower than 10ms, so we can in theory do about 100 seeks a second. This time improves slowly with new disks and is very hard to optimise for a single table. The way to optimise this is to spread the data on more than one disk.

Disk reading/writing. When the disk is at the correct position we need to read the data. With modern disks in 1999, one disk delivers something like 10-20 MB/s. This is easier to optimise than seeks because you can read in parallel from multiple disks.

CPU cycles. When we have the data in main memory (or if it was already there) we need to process it to get to our result. Having small tables compared to the memory is the most common limiting factor. But then, with small tables speed is usually not the problem.

Memory bandwidth. When the CPU needs more data than can fit in the CPU cache the main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for most systems, but one should be aware of it.
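The disk figures above translate into simple back-of-the-envelope numbers. The following Python sketch works through them using the 10 ms seek time and 20 MB/s throughput quoted in the text; the function names, the table size and the row count are made up for the illustration.

```python
def seeks_per_second(mean_seek_ms):
    """How many random seeks one disk can do per second."""
    return 1000.0 / mean_seek_ms


def random_read_seconds(rows, mean_seek_ms):
    """Time to fetch `rows` rows if each one costs a separate seek."""
    return rows * mean_seek_ms / 1000.0


def scan_seconds(table_mb, throughput_mb_per_s):
    """Time to read a table sequentially at the given throughput."""
    return table_mb / throughput_mb_per_s


if __name__ == "__main__":
    print(seeks_per_second(10))            # seeks per second at 10 ms each
    print(random_read_seconds(10000, 10))  # 10,000 rows, one seek per row
    print(scan_seconds(1000, 20))          # 1000 MB table read at 20 MB/s
```

Under these assumptions, fetching 10,000 rows with one seek each takes longer than scanning a 1000 MB table sequentially, which is why seeks are usually the first bottleneck to address.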

We will now look at some of the available options.

Disable DNS Hostname Lookup

Open your /etc/mysql/my.cnf file and add the skip-name-resolve option. The file should look like this:

[mysqld]
…..
……
skip-name-resolve

When this option is activated, you can only use IP numbers in the MySQL Grant table.

How MySQL uses DNS

When a new client connects to mysqld, mysqld will spawn a new thread to handle the request. This thread will first check if the hostname is in the hostname cache. If not, the thread will call gethostbyaddr_r() and gethostbyname_r() to resolve the hostname.

If the operating system doesn’t support the above thread-safe calls, the thread will lock a mutex and call gethostbyaddr() and gethostbyname() instead. Note that in this case no other thread can resolve hostnames that are not in the hostname cache until the first thread has finished.

You can disable DNS host lookup by starting mysqld with --skip-name-resolve. In this case, however, you can only use IP numbers in the MySQL privilege tables.

If you have a very slow DNS and many hosts, you can get more performance by either disabling DNS lookup with --skip-name-resolve or by increasing the HOST_CACHE_SIZE define (default: 128) and recompiling mysqld.

You can disable the hostname cache with --skip-host-cache. You can clear the hostname cache with FLUSH HOSTS or mysqladmin flush-hosts.

If you don’t want to allow connections over TCP/IP, you can do this by starting mysqld with --skip-networking.

Activate Query Cache

mysql> SHOW STATUS LIKE 'Qcache%';

If the output shows that little query cache memory is left (a low Qcache_free_memory value), you need to increase the query cache size.

To get an overview of the state of your query cache variables, use the following statement:

mysql> SHOW VARIABLES LIKE '%query_cache%';

You need to have the query cache available in the first place (have_query_cache | YES), and query_cache_type must be set to ON. This is usually activated by default on most Linux distributions.

Now you can increase the query cache size (let's say you want 50M) using:

mysql> SET GLOBAL query_cache_size = 52428800;

If you want this setting to be kept when restarting MySQL, add the following to my.cnf:

[mysqld]
query_cache_size = 52428800
query_cache_type = 1
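The value 52428800 used for query_cache_size is simply 50 × 1024 × 1024 bytes. If you need to convert other sizes, a small helper like the Python sketch below can do the arithmetic. The function name is hypothetical, and only the K, M and G suffixes are handled.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text):
    """Convert a size such as '50M' or '512k' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # plain number: already in bytes


if __name__ == "__main__":
    print(size_to_bytes("50M"))
```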