Whether you're coding on SQL Server, Oracle, DB2, Sybase, MySQL, or some other relational platform, your goal is the same: You want the database to support as many concurrent users as practical while processing queries as quickly as it can. That means you need to minimize locking, I/O, and network traffic, while optimizing space and resource management.
Tuning a database is both an art and a science. Here are 21 tried-and-true rules for making your database faster and more efficient.
1. Avoid cursors when you can
This should be a no-brainer. Cursors not only suffer from speed problems, they can also cause one operation to block other operations for longer than is necessary. This greatly decreases concurrency in your system.
2. When you can't avoid cursors, use temp tables
There may be times when you need to use a cursor. In these cases, it's better to run cursor operations against a temp table instead of a live table. You'll then have a single UPDATE statement against the live table that's much smaller. It holds locks only for a short time, which greatly increases concurrency.
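Here's a minimal T-SQL sketch of the pattern, using hypothetical dbo.Customers and CustomerRank names (the per-row logic is a placeholder for whatever the cursor actually has to compute):

```sql
-- Copy the working set into a temp table so the cursor never touches live data.
SELECT CustomerID, CAST(NULL AS varchar(20)) AS CustomerRank
INTO   #CustomerWork
FROM   dbo.Customers;

DECLARE @CustomerID int;
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT CustomerID FROM #CustomerWork;
OPEN cur;
FETCH NEXT FROM cur INTO @CustomerID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Expensive per-row work happens here, locking only the temp table.
    UPDATE #CustomerWork SET CustomerRank = 'Preferred'
    WHERE  CustomerID = @CustomerID;
    FETCH NEXT FROM cur INTO @CustomerID;
END
CLOSE cur;
DEALLOCATE cur;

-- One short, set-based UPDATE is the only statement that locks the live table.
UPDATE c
SET    c.CustomerRank = w.CustomerRank
FROM   dbo.Customers AS c
JOIN   #CustomerWork  AS w ON w.CustomerID = c.CustomerID;
```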
3. Use temp tables wisely
You can use temp tables in a number of other situations as well. For example, if you must join a table to a large table and there's a condition on that large table, you can improve performance by pulling out the subset of data you need from the large table into a temp table and joining with that instead. This will greatly decrease the processing power required, and can be helpful if you have several queries in the procedure that have to make similar joins to the same table.
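A quick sketch of the idea, with hypothetical table and column names: pull the filtered slice of the large table once, then join against the much smaller temp table.

```sql
-- Pull only the needed subset of the large table, applying its condition once.
SELECT OrderID, CustomerID, OrderTotal
INTO   #RecentOrders
FROM   dbo.Orders
WHERE  OrderDate >= '2024-01-01';

-- Subsequent joins (possibly several in the same procedure) hit the small
-- temp table instead of re-scanning dbo.Orders.
SELECT c.CustomerID, c.CustomerName, o.OrderTotal
FROM   dbo.Customers  AS c
JOIN   #RecentOrders  AS o ON o.CustomerID = c.CustomerID;
```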
4. Pre-stage your data
This is an old technique that's often overlooked. If you have reports or procedures that will do similar joins to large tables, pre-stage the data by joining the tables ahead of time and persisting them into a table. Now the reports can run against that pre-staged table and avoid the large join.
You may not always be able to use this technique, but in most environments there are popular tables that get joined all the time. There's no reason why they can't be pre-staged, and it's an excellent way to save server resources.
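As a sketch (again with hypothetical names), a scheduled job might persist the popular join once, so reports read the flat result instead of repeating it:

```sql
-- Nightly job: materialize the commonly used join into a persisted table.
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.OrderTotal
INTO   dbo.CustomerOrdersStaged     -- drop/truncate and reload on each run
FROM   dbo.Customers AS c
JOIN   dbo.Orders    AS o ON o.CustomerID = c.CustomerID;

-- Reports then avoid the large join entirely:
SELECT CustomerName, SUM(OrderTotal) AS Total
FROM   dbo.CustomerOrdersStaged
GROUP  BY CustomerName;
```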
5. Minimize your nested views
Views are great for obscuring large queries from users, but when you start to nest one view inside another view that's inside yet another view (and so on) you can cause severe performance lags. Too many nested views can result in massive amounts of data returned for every query, slowing your database to a crawl. Or, worse, the query optimizer will simply give up and return nothing.
Unraveling nested views can drop query response times from minutes to seconds.
6. Use CASE instead of UPDATE
Take this scenario: You're inserting data into a temp table and you need it to display a certain value if another value exists. Maybe you're pulling from the Customer table and you want anyone with more than $100,000 in orders to be labeled as "Preferred." Thus, you insert the data into the table and run an UPDATE statement to set the CustomerRank column to "Preferred" for anyone who has more than $100,000 in orders.
Seems logical, right? The problem is that the UPDATE statement is logged, which means it has to write twice for every single write to the table. The way around this, of course, is to use an inline CASE statement in the SQL query itself. This tests every row for the order amount condition and sets the "Preferred" label before it's written to the table. The performance increase can be staggering.
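Here are both versions side by side, using the hypothetical Customer table from the scenario above:

```sql
-- Two-step version: insert, then a second, fully logged pass over the rows.
SELECT CustomerID, OrderTotal, CAST(NULL AS varchar(20)) AS CustomerRank
INTO   #Customers
FROM   dbo.Customer;

UPDATE #Customers
SET    CustomerRank = 'Preferred'
WHERE  OrderTotal > 100000;

-- One-step version: decide the label inline, before the row is ever written.
SELECT CustomerID,
       OrderTotal,
       CASE WHEN OrderTotal > 100000 THEN 'Preferred' END AS CustomerRank
INTO   #Customers2
FROM   dbo.Customer;
```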
7. Use table-valued functions instead of scalar functions
Here's a pro tip: When you use a scalar function in the SELECT list of a query, you can boost performance by converting it to a table-valued function and using CROSS APPLY in the query. This can slash your query times in half.
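A sketch of the conversion, with hypothetical function and table names:

```sql
-- Scalar version: dbo.fn_GetCustomerName is invoked once per row.
-- SELECT o.OrderID, dbo.fn_GetCustomerName(o.CustomerID) FROM dbo.Orders AS o;

-- Inline table-valued equivalent, which the optimizer can expand into the plan:
CREATE FUNCTION dbo.tvf_GetCustomerName (@CustomerID int)
RETURNS TABLE
AS RETURN
(
    SELECT CustomerName
    FROM   dbo.Customers
    WHERE  CustomerID = @CustomerID
);
GO

SELECT o.OrderID, n.CustomerName
FROM   dbo.Orders AS o
CROSS APPLY dbo.tvf_GetCustomerName(o.CustomerID) AS n;
```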
8. Use partitions in SQL Server
SQL Server Enterprise users can take advantage of the data engine's automatic partitioning features to speed performance. In SQL Server, even simple tables are created as single partitions, which you can later split into multiple ones as needed. So when you need to move large amounts of data between tables, you can use the SWITCH command instead of DELETE. Because you're only changing the metadata for a single table, instead of deleting and inserting large amounts of data between tables, it takes mere seconds to run.
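A minimal sketch, assuming hypothetical dbo.Sales and dbo.SalesArchive tables that share the same schema, indexes, and filegroup (SWITCH also requires the target partition to be empty):

```sql
-- Move all rows of a single-partition table as a metadata-only operation:
ALTER TABLE dbo.Sales SWITCH TO dbo.SalesArchive;

-- Or, on a partitioned table, move just one partition:
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.SalesArchive;
```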
9. Delete and update in batches
Deleting or updating large amounts of data from huge tables can be a nightmare. The problem is that both of these statements run as a single transaction, and if you need to kill them, or if something happens while they're working, the system has to roll back the entire transaction. This can take a very long time and can also block other transactions for its duration, essentially bottlenecking the system.
The solution is to do deletes or updates in smaller batches. If the transaction gets killed, it only has a small number of rows to roll back, so the database comes back online much faster. And while the smaller batches are committing to disk, others can sneak in and do some work, so concurrency is greatly enhanced.
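A common T-SQL shape for this, with hypothetical table and column names and a batch size you'd tune for your system:

```sql
-- Delete in batches of 10,000 rows so each transaction stays small
-- and other work can interleave between batches.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000)
    FROM   dbo.BigTable
    WHERE  CreatedDate < '2020-01-01';

    SET @rows = @@ROWCOUNT;   -- loop ends when a batch deletes nothing
END
```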
10. Take your time
Some developers have it stuck in their heads that these delete and update operations must be completed in the same day. That's not always true, especially if you're archiving. You can stretch that operation out as long as you need to, and the smaller batches help accomplish that. If you can take longer to do these intensive operations, spend the extra time and don't bring your system down.
11. Shun the ORMs
Object-relational mappers (ORMs) produce some of the worst code on the planet and are responsible for many of the performance issues you're likely to encounter. If you can't avoid them, you can minimize the downside by writing your own stored procedures and having the ORM call them instead of creating its own queries.
12. Use stored procedures whenever you can
Besides leading to more elegant code, stored procedures have a number of other advantages. They greatly reduce network traffic, because the calls are much shorter. They're easier to trace using tools like Profiler, which allows you to get performance statistics and identify potential issues faster. Their calls are more consistent, which means you're more likely to re-use your execution plans, and they're easier to use for edge cases and auditing than ad-hoc queries.
It's true that many .NET coders believe business logic belongs on the front end of the application, not in the database. But they're wrong (well, most of the time).
13. Avoid double-dipping
Sometimes using stored procedures can lead to "double-dipping" - running separate queries on large tables, putting them into temp tables, then joining the tables together. This can be a huge drag on performance. It's much better to query large tables only once whenever possible.
A slightly different scenario is when a subset of a large table is needed by several steps in a process, which causes the large table to be queried each time. Avoid this by querying for the subset and persisting it elsewhere, then pointing the subsequent steps to your smaller data set.
14. Split large transactions into smaller ones
Handling multiple tables in a single transaction can lock all of them until the transaction is finished, leading to tons of blocking. The solution is to break this transaction into multiple routines, each operating on a single table at a time. This reduces the amount of blocking and frees up the other tables so other operations can continue to work.
15. Don't pull those triggers
Using triggers can lead to a similar problem, in that whatever you want them to do will be performed in the same transaction as the original operation. That means you could end up locking multiple tables until the trigger has completed. Splitting the work into separate transactions locks up fewer resources and makes it easier to roll back changes if needed. Avoid triggers whenever you can.
16. Avoid clustering on GUIDs
Don't use globally unique identifiers (GUIDs) to order table data. These randomly generated 16-byte values can cause your table to fragment much faster. It's much better to order your data using a steadily increasing value such as an IDENTITY column. The same rule applies to any volatile column: tables can fragment dramatically in just a few minutes.
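Here's the contrast as a sketch, with hypothetical table names:

```sql
-- Fragmentation-prone: clustering on a random GUID means each new row
-- lands at an arbitrary point in the index, forcing page splits.
CREATE TABLE dbo.BadOrders
(
    OrderID   uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    OrderDate datetime NOT NULL
);

-- Friendlier: a steadily increasing IDENTITY appends new rows at the
-- end of the clustered index.
CREATE TABLE dbo.GoodOrders
(
    OrderID   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    OrderDate datetime NOT NULL
);
```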
17. Don't count everything in the table
Let's say you need to see whether any data exists in a table, say, for a particular customer, and based on the result of that check, you're going to perform some action.
I can't tell you how often I've seen someone do a SELECT COUNT(*) FROM dbo.T1 to check for the existence of that data: SET @CT = (SELECT COUNT(*) FROM dbo.T1); IF @CT > 0 BEGIN
It's completely unnecessary. If you want to check for existence, then do this: IF EXISTS (SELECT 1 FROM dbo.T1) BEGIN
In other words, don't count everything in the table. Just get back the first row you find. SQL Server is smart enough to use EXISTS properly, and the second block of code returns super fast. The larger the table, the bigger the difference this will make.
18. Use the system table to count rows
If you really do need to count the rows in a big table, you can pull the count from the system table. SELECT rows FROM sysindexes will get you the row counts for all of the indexes. And because the clustered index represents the data itself, you can get the table's row count by adding WHERE indid = 1. Then simply include the table name and you're golden. So the final query is SELECT rows FROM sysindexes WHERE object_name(id) = 'T1' AND indid = 1
19. Only pull the number of columns you need
It's all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem, again, is that it pulls more data than you need. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're not only processing much more data than you need, you're also taking resources away from other processes.
20. Rewrite queries to avoid negative searches
When you need to compare data row by row with a query that can't use an index, like SELECT * FROM Customers WHERE RegionID <> 3, it's better to rewrite the query so it can use the index. Like so: SELECT * FROM Customers WHERE RegionID < 3 UNION ALL SELECT * FROM Customers WHERE RegionID > 3
If your data set is large, using the index could greatly outperform the table scan version. Of course, it could also perform worse, so test this before you implement it.
I realize this query breaks Tip No. 13 ("no double dipping"), but that goes to show there are no hard and fast rules. Though we're double dipping here, we're doing it to avoid a costly table scan.
21. Don't blindly re-use code
It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains by trimming re-used code to your exact needs.
And there you have it. Just remember that not all of these techniques will work for every situation. You'll have to experiment to see which methods work best. In general though, careful use of these SQL tips will increase concurrency, speed performance, and make everyone's lives - from DBAs to end users - a lot easier.