MySQL Server Performance Tips

Specific Query Performance Tips (see also database design tips for tips on indexes):

1. Use EXPLAIN to profile the query execution plan
2. Use Slow Query Log (always have it on!)
3. Don’t use DISTINCT when you have or could use GROUP BY
4. Insert performance
   1. Batch INSERT and REPLACE
   2. Use LOAD DATA instead of INSERT
5. LIMIT m,n may not be as fast as it sounds
6. Don’t use ORDER BY RAND() if you have > ~2K records
7. Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
8. avoid wildcards at the start of LIKE queries
9. avoid correlated subqueries in the SELECT and WHERE clauses (and try to avoid IN)
10. no calculated comparisons — isolate indexed columns
11. ORDER BY and LIMIT work best with equalities and covering indexes
12. separate text/blobs from metadata, don’t put text/blobs in results if you don’t need them
13. derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs w/out sorting them. (self-join can speed up a query if 1st part finds the IDs and use it to fetch the rest)
14. ALTER TABLE…ORDER BY can take data sorted chronologically and re-order it by a different field — this can make queries on that field run faster (maybe this goes in indexing?)
15. Know when to split a complex query and join smaller ones
16. Delete small amounts at a time if you can
17. make similar queries consistent so cache is used
18. Have good SQL query standards
19. Don’t use deprecated features
20. Turning OR on multiple index fields (<5.0) into UNION may speed things up (with LIMIT), after 5.0 the index_merge should pick stuff up.
21. Don’t use COUNT(*) on InnoDB tables for every search; run it occasionally and/or keep summary tables. If you need the total number of rows, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()
22. Use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first
23. use groupwise maximum instead of subqueries
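
Tips 1 and 10 above can be tried out in a small sketch. It uses Python with an in-memory SQLite database as a stand-in, purely so that it runs anywhere; MySQL's EXPLAIN output looks different, but the idea of leaving the indexed column bare on one side of the comparison is the same. The table `orders`, its columns, and the helper `plan` are made up for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's description strings for a query (SQLite-specific)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_day INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_created_day ON orders (created_day)")
conn.executemany(
    "INSERT INTO orders (created_day, total) VALUES (?, ?)",
    [(day, day * 1.5) for day in range(1, 201)],
)

# Calculation wrapped around the indexed column: the index cannot narrow the range.
calculated = "SELECT id FROM orders WHERE created_day + 7 > 100"
# Same condition with the column left bare and the arithmetic moved to the constant.
isolated = "SELECT id FROM orders WHERE created_day > 100 - 7"

calculated_plan = plan(conn, calculated)
isolated_plan = plan(conn, isolated)
print(calculated_plan)
print(isolated_plan)

# Both forms must return the same rows; only the access path differs.
same_rows = sorted(conn.execute(calculated).fetchall()) == sorted(
    conn.execute(isolated).fetchall()
)
```

On a real MySQL server the equivalent check would be to run EXPLAIN on both forms of the query and compare the access type and the estimated row counts.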

Scaling Performance Tips:

1. Use benchmarking
2. isolate workloads: don’t let administrative work (e.g. backups) interfere with customer performance.
3. Debugging sucks, testing rocks!
4. as your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.

Network Performance Tips:

1. Minimize traffic by fetching only what you need.
   1. Page or chunk data retrieval to limit how much is fetched at once
   2. Don’t use SELECT *
   3. Be wary of lots of small quick queries if a longer query can be more efficient
2. use multi_query if appropriate to reduce round-trips
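
One possible way to do the paged retrieval mentioned above is "keyset" paging: remember the last id seen and ask for the next rows after it, instead of using an ever-growing LIMIT offset. The sketch below is only an illustration; it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the `articles` table and the helper names are hypothetical. It also names just the columns it needs rather than using SELECT *, so the large `body` column never crosses the wire.

```python
import sqlite3

def fetch_page(conn, after_id, page_size):
    """Fetch the next page of rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()

def iter_pages(conn, page_size):
    """Yield pages until the table is exhausted, tracking the last id seen."""
    last_id = 0  # assumes ids are positive integers
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [(i, f"title {i}", "x" * 1000) for i in range(1, 26)],
)

pages = list(iter_pages(conn, page_size=10))
page_sizes = [len(page) for page in pages]
all_ids = [row[0] for page in pages for row in page]
print(page_sizes)
```

Because each page starts from an indexed key, the server does not have to read and discard the rows of earlier pages, which is the usual cost of a large offset.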

OS Performance Tips:

1. Use proper data partitions
   1. For Cluster: start thinking about Cluster *before* you need it
2. Keep the database host as clean as possible. Do you really need a windowing system on that server?
3. Utilize the strengths of the OS
4. pare down cron scripts
5. create a test environment
6. source control schema and config files
7. for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward
8. partition appropriately
9. partition your database when you have real data — do not assume you know your dataset until you have real data

MySQL Server Overall Tips:

1. innodb_flush_log_at_trx_commit=0 can help slave lag (at the cost of possibly losing about the last second of transactions in a crash)
2. Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine the smallest data type for your needs.
3. use optimistic locking, not pessimistic locking. try to use shared lock, not exclusive lock. share mode vs. FOR UPDATE
4. if you can, compress text/blobs
5. compress static data
6. don’t back up static data as often
7. enable and increase the query and buffer caches if appropriate
8. config params — http://docs.cellblue.nl/easy_mysql_performance_tweaks/ is a good reference

Config variables & tips:

1. use one of the supplied config files
2. the main memory settings to size are key_buffer, the unix file cache (leave some RAM free for it), per-connection variables, and the innodb memory variables
3. be aware of global vs. per-connection variables
4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
5. be aware of swapping esp. with Linux, “swappiness” (bypass OS filecache for innodb data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))
6. defragment tables, rebuild indexes, do table maintenance
7. If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware write-cache controller
8. more RAM is good, and so is faster disk
9. use 64-bit architectures
10. --skip-name-resolve
11. increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
12. look up memory tuning parameter for on-insert caching
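13. increase tmp_table_size in a data warehousing environment (default is 32MB) so temporary tables don’t get written to disk (the in-memory size is also constrained by max_heap_table_size, default 16MB)
14. Run with a strict SQL_MODE (STRICT_TRANS_TABLES or STRICT_ALL_TABLES) so that problems otherwise reported only as warnings become errors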
15. /tmp dir on battery-backed write cache
16. consider battery-backed RAM for innodb logfiles
17. use --safe-updates for client
18. Redundant data is redundant

Storage Engine Performance Tips:

1. InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large
2. Utilize different storage engines on master/slave, e.g. if you need fulltext indexing on a table.
3. BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
4. Know your storage engines and what performs best for your needs, know that different ones exist.
   1. e.g. use MERGE tables or ARCHIVE tables for logs
   2. Archive old data; don’t be a pack-rat! Two common engines for this are ARCHIVE tables and MERGE tables
5. use row-level instead of table-level locking for OLTP workloads
6. try out a few schemas and storage engines in your test environment before picking one.

Database Design Performance Tips:

1. Design sane query schemas. don’t be afraid of table joins, often they are faster than denormalization
2. Don’t use boolean flags
3. Use Indexes
4. Don’t Index Everything
5. Do not duplicate indexes
6. Do not use large columns in indexes if the ratio of SELECTs:INSERTs is low.
7. be careful of redundant columns in an index or across indexes
8. Use a clever key and ORDER BY instead of MAX
9. Normalize first, and denormalize where appropriate.
10. Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn’t a real database
11. use INET_ATON and INET_NTOA for IP addresses, not char or varchar
12. make it a habit to REVERSE() email addresses, so you can easily search domains (this will help avoid wildcards at the start of LIKE queries if you want to find everyone whose e-mail is in a certain domain)
13. In 5.1 BOOL/BIT NOT NULL type is 1 bit, in previous versions it’s 1 byte.
14. A NULL data type can take more room to store than NOT NULL
15. Choose appropriate character sets & collations — UTF16 will store each character in 2 bytes, whether it needs it or not, latin1 is faster than UTF8.
16. Use Triggers wisely
17. use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
18. Use HASH indexing for indexing across columns with similar data prefixes
19. Use the PACK_KEYS table option (MyISAM) for int data
20. be able to change your schema without ruining functionality of your code
21. segregate tables/databases that benefit from different configuration variables
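
Tips 11 and 12 above can be illustrated with a short sketch. MySQL provides INET_ATON, INET_NTOA and REVERSE as built-in SQL functions; the Python helpers below only mirror what they do, and the `users` table, its columns, and the sample addresses are invented for the example. SQLite is used as a stand-in database so the sketch runs without a server.

```python
import ipaddress
import sqlite3

def inet_aton(dotted):
    """Convert a dotted-quad IPv4 string to an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def inet_ntoa(number):
    """Convert an unsigned 32-bit integer back to a dotted-quad string."""
    return str(ipaddress.IPv4Address(number))

def reverse_email(email):
    """Store e-mail addresses reversed so a domain search becomes a prefix search."""
    return email.lower()[::-1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (ip INTEGER NOT NULL, email_reversed TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_email_reversed ON users (email_reversed)")

sample = [
    ("192.168.0.1", "alice@example.com"),
    ("10.0.0.7", "bob@example.org"),
    ("172.16.5.9", "carol@example.com"),
]
conn.executemany(
    "INSERT INTO users (ip, email_reversed) VALUES (?, ?)",
    [(inet_aton(ip), reverse_email(email)) for ip, email in sample],
)

# The wildcard is now at the END of the pattern, so an index can be used.
pattern = reverse_email("@example.com") + "%"
matches = conn.execute(
    "SELECT ip, email_reversed FROM users WHERE email_reversed LIKE ?",
    (pattern,),
).fetchall()
found = sorted((rev[::-1], inet_ntoa(ip)) for ip, rev in matches)
print(found)
```

Storing the address as an integer takes 4 bytes instead of up to 15 characters, and range checks on a subnet become simple numeric comparisons.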

Discussion about optimizing MySQL to handle a high-traffic website.


MySQL settings, many concurrent users. 
 
I run a site for a client that has over 3000 users, each of whom logs in for about 5-7 hours per day. So, at peak times, we have to handle about 2000 concurrent users. When configured correctly, PHP and MySQL can handle this load wonderfully on fairly cheap Intel architecture. First off, hardware.
 
1) It is better to have 2 separate servers for Apache/PHP and MySQL with the Linux of your choice. 
2) Try not to run too much else on either box; leave the resources for Apache/PHP and MySQL. 
 
Here are the specs on each box in my config: 
1) Apache/PHP: Pentium 3, 600 MHZ, 512 megs ram. 
2) MySQL: Dual Pentium 3, 750 MHZ (1500 MHZ total), 2 gigs ram. 
 
The reason for this configuration is that the site is very database heavy: it is a members-only web site with username and password required for login, fully personalized. It is an online school, so each student has their suite of tools for attending school, their courses, report cards, time logging, and much more. Teachers have web based tools to create their courses, including lessons, text to speech audio, and more.
 
1) PHP coding: be sure to use persistent connections! Opening and closing a connection from your Apache/PHP box to your MySQL box is a very heavy load. By using persistent connections, a high capacity site will open connections and share them to exchange data rather than opening a connection on each page request, sending the data, then closing, and repeating that process at least once for every user click! Be sure to use "mysql_pconnect" instead of "mysql_connect" and also that appropriate changes are made in "php.ini" or overridden by using the command "ini_set". 
You can find more documentation on doing this at the php web site. 
 
2) Apache set up ("httpd.conf"): I've changed these various settings, and played with them until they seem to keep the most "idle %" reported in "top". 
 
   MinSpareServers 10 
   MaxSpareServers 20 
   StartServers 70 
   MaxClients 255 
 
3) MySQL set up ("my.cnf").
   
Here is what to add under the [mysqld] heading. The two lines, "max_connections" and "max_user_connections", are where the magic happens. Since your Apache/PHP box is connecting to MySQL, it appears as a single user. MySQL's default "max_connections" is fairly low (100 in older releases, 151 in later ones), and "max_user_connections" defaults to 0, which means no separate per-user limit. The following lines make it so your Apache/PHP box can connect to your MySQL box up to the number you have set "MaxClients" to in the Apache config above. By using persistent connections, you can pretty much get Apache up, have it connect to MySQL upon start up, and just use the persistent connections to pass data between the two boxes rather than opening connections. It's much more efficient that way.
 
set-variable = max_connections = 300 
(this must be higher than "MaxClients" set in Apache, or you won't fully maximize use) 
set-variable = max_user_connections = 300 
set-variable = table_cache=1200 
(Max number of tables in join multiplied by max_user_connections) 
 
A few other MySQL tunings: 
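set-variable = max_allowed_packet=1M (largest single packet or query string the server accepts; a sanity check to stop runaway queries) 
set-variable = max_connect_errors=999999 
(stops mysqld from blocking the Apache/PHP host after repeated interrupted connection attempts. The default is low, 10 in older releases, and once it is exceeded the host stays blocked until FLUSH HOSTS is run; mysqld itself does not shut down.)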
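
The sizing rules in this post (max_connections above Apache's MaxClients, table_cache as tables per join times connections) can be written down as a little arithmetic. This is only a sketch: the function name is made up, and the headroom of 45 connections and the 4 tables per join are assumptions chosen because they reproduce the 300 and 1200 used above, not values stated in the post.

```python
def suggest_mysql_limits(apache_max_clients, max_tables_per_join, headroom=45):
    """Derive my.cnf connection limits from Apache's MaxClients.

    headroom is an assumed number of spare connections (admin shells,
    cron jobs, monitoring) kept above what Apache can open.
    """
    max_connections = apache_max_clients + headroom
    return {
        "max_connections": max_connections,
        # The web box logs in as a single MySQL user, so the per-user
        # limit needs to be as high as the global one.
        "max_user_connections": max_connections,
        # Rule of thumb from the post: tables per join times connections.
        "table_cache": max_tables_per_join * max_connections,
    }

limits = suggest_mysql_limits(apache_max_clients=255, max_tables_per_join=4)
for name, value in limits.items():
    print(f"{name} = {value}")
```

If MaxClients is raised later, rerunning the calculation keeps the MySQL side in step with the Apache side.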