Wednesday, February 14, 2018

Performance Basics: Key lookup operator and query optimization

One of the most basic ways to improve query performance is using indexes, but that doesn't mean we can blindly create them for all our tables, or assume it is enough to create them only on the columns we use to "search". That may be good enough while our tables are not yet very big and our system is not under heavy I/O pressure.

But for high-concurrency systems, queries against large tables, or more complex T-SQL statements, we can tune indexes a little further and avoid as many disk reads as possible.

What we can do to improve performance is to avoid key lookups whenever possible.

Key lookup icon
The Key Lookup operator works this way: you have an index defined for your search argument, the index is used to locate the row, and once it is located, an additional lookup against the base table is made to retrieve the columns the query needs that are not covered by the index.

So, for example, if the query is not run often and only locates a single record, this is not an issue; but for queries that manipulate or select large sets of data, an additional read for each row can painfully affect your performance.

I will demonstrate in the following example how to locate a key lookup and how to fix it.

  • Let us locate the query. I use a simple select from the AdventureWorks database, with the current index definition:

  • select ProductNumber
    from Production.Product
    where ProductNumber = 'CA-7457'
    
    --Index
    CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_ProductNumber] 
    ON [Production].[Product]
    (
     [ProductNumber] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, 
    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    

  • With the current index definition for that table, if we execute the select statement, an index seek is performed (everything is fine so far)


  • But what if we add another column to the SELECT statement, let's say Name:

    select ProductNumber, Name
    from Production.Product
    where ProductNumber = 'CA-7457'
    

  • Now a key lookup is performed because the index is used just to locate the product number record, and an additional disk read must be performed to retrieve the name:


  • What if we change the index definition to include the Name column?

    CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_ProductNumber]
    ON [Production].[Product]
    (
     [ProductNumber] ASC
    )
    INCLUDE ([Name]) 
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
    SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
    DROP_EXISTING = OFF, ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    
  • Now, if we execute the same query, an index seek is performed: we eliminated the key lookup and removed the additional I/O overhead just by including that column in the index.



  • The previous example was a simple select over just one table, and performance is almost the same in both cases, but in the next example we can see how query execution improves by using index seeks instead of key lookups:

    First execution: the indexes don't have included columns, so key lookups are used.

    Second execution: after adding included columns to the index in one of the tables.

    Third execution: after adding included columns to the index in the remaining table.


Always remember that you must test any of these changes before applying them to production: measure performance before and after the change to check whether the additional overhead of modifying your indexes is worth the performance gain you could obtain.
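For example, a quick way to measure the difference is to capture the I/O and timing statistics for the statement before and after the index change; this is a minimal sketch against the same AdventureWorks query used above, comparing the "logical reads" reported in the Messages tab for each run:

    -- Minimal sketch: run once with the original index and once with the
    -- INCLUDE (Name) version, then compare the reported logical reads.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT ProductNumber, Name
    FROM Production.Product
    WHERE ProductNumber = 'CA-7457';

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;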




Thursday, February 1, 2018

Considerations for dealing with big databases

Stress everywhere, users want everything "para ayer" (for yesterday)
Photo by Bernard Goldbach


If you are like me and have SQL Server instances with databases of terabytes of data, also known as very large databases or VLDBs, you have to perform all kinds of tasks in a reasonable time window, while minimizing outages and keeping users happy with the overall performance of the instance.

I want to share with you some tips to help make your life easier with a variety of common tasks you may encounter.

 

Optimize disk space


Maybe the most common task when dealing with VLDBs, and a real headache if the server was handed over to you with no further information and you have to optimize the disk space.

SQL Files location

Be aware of the location of each data and log file in your instance; find out how much space each database file is taking and whether databases are split across several files/folders.
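As a starting point, a quick query against sys.master_files lists every data and log file on the instance with its location and current size (size is stored in 8 KB pages, so the sketch below converts it to MB):

    -- Sketch: location and size of every database file on the instance.
    SELECT
        DB_NAME(database_id)  AS database_name,
        name                  AS logical_name,
        type_desc             AS file_type,      -- ROWS = data, LOG = log
        physical_name,
        size * 8 / 1024       AS size_mb         -- size is in 8 KB pages
    FROM sys.master_files
    ORDER BY database_id, file_id;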
The next step is to arrange the datafiles following best practices; separate them onto different drives as follows (if possible):
  • System databases (except for tempdb).
  • Tempdb database.
  • User datafiles (.mdf and .ndf files).
  • User logfiles (.ldf files) 

 

Use multiple filegroups

Most databases start with only one default filegroup. As your database becomes bigger and bigger, you should consider splitting it into multiple filegroups: you can move big historical or log tables into a different filegroup and even mark it as read-only, or separate them by business rules so that more critical data can be backed up more often using filegroup backups (more on this later).
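As a rough sketch (the database, filegroup and file names below are made up for illustration), adding a filegroup for historical data, giving it its own datafile and marking it read-only looks like this:

    -- Sketch with hypothetical names: new filegroup for historical data.
    ALTER DATABASE SalesDB ADD FILEGROUP HistoricalData;

    ALTER DATABASE SalesDB
    ADD FILE
    (
        NAME = SalesDB_Historical_01,
        FILENAME = 'D:\SQLData\SalesDB_Historical_01.ndf',
        SIZE = 10GB
    )
    TO FILEGROUP HistoricalData;

    -- Once the historical tables have been moved into the new filegroup:
    ALTER DATABASE SalesDB MODIFY FILEGROUP HistoricalData READ_ONLY;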

Monitor File growth

Put regular database growth monitoring in place (how you achieve that is up to you); a proper alarm raised in time can save you from an unpleasant outage or a critical ticket.
You can set an alarm (email, SMS, IM) when a certain threshold is reached, for example a disk with 80% of its space used, so you have enough response time to check what is going on.
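One way to get that number is sys.dm_os_volume_stats, which reports the total and available space of the volume hosting each database file; a sketch of the check (the 80% threshold is just the example above) could be:

    -- Sketch: volumes hosting database files that are more than 80% full.
    SELECT DISTINCT
        vs.volume_mount_point,
        vs.total_bytes / 1024 / 1024 / 1024      AS total_gb,
        vs.available_bytes / 1024 / 1024 / 1024  AS free_gb,
        CAST(100.0 * (vs.total_bytes - vs.available_bytes)
             / vs.total_bytes AS decimal(5, 2))  AS percent_used
    FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
    WHERE 100.0 * (vs.total_bytes - vs.available_bytes) / vs.total_bytes > 80;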

When creating new datafiles, set the expected size upon creation and don't rely on the auto-growth feature; while it is recommended to keep auto-growth enabled as a safeguard, always expand your datafiles manually to the expected size.
Avoid percentage-based autogrowth: 10% of 1 GB is not the same as 10% of 1 TB, so always prefer fixed sizes.
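As a sketch (the database and file names are hypothetical), pre-sizing a datafile manually and switching its autogrowth from a percentage to a fixed increment looks like this:

    -- Sketch with hypothetical names: grow the file to the expected size
    -- and keep autogrowth only as a safeguard, with a fixed increment.
    ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, SIZE = 500GB);

    ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, FILEGROWTH = 1GB);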

Data compression

Another SQL Server feature is row and page compression, which reduces disk space; the only drawback is that more system resources are consumed to decompress the data when it is accessed.
Page compression usually saves more disk space, but also has a higher resource cost to compress and decompress the data.
If your database already has high resource consumption, maybe this is not for you.
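Before committing to it, you can estimate the savings for a given table and then rebuild it with the chosen compression level; a sketch using an AdventureWorks table as an example:

    -- Sketch: estimate how much space page compression would save,
    -- then rebuild the table with page compression applied.
    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'Production',
         @object_name      = 'TransactionHistory',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    ALTER TABLE Production.TransactionHistory
    REBUILD WITH (DATA_COMPRESSION = PAGE);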


Backup/restore


If you don't have an adequate configuration for your VLDBs, the backup and restore windows can also be an issue, especially if you have tight RTOs for critical systems.

Backup compression

Available since SQL Server 2008, this feature is not commonly used because it is not active by default; it can save you a lot of disk space for backups and reduce backup time.
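You can turn it on per backup with the COMPRESSION option, or change the instance default so every backup is compressed unless stated otherwise; a quick sketch (the database name and path are illustrative):

    -- Sketch: enable backup compression instance-wide (off by default)...
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE;

    -- ...or request it explicitly for a single backup (hypothetical path).
    BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_full.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;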

Filegroup backup

As stated above, consider separating your data into multiple datafiles and filegroups; this way, using the filegroup backup option, you can apply whatever business logic you want, so more critical data can be backed up more frequently than the whole set. This also reduces the RTOs for your databases.
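A sketch of a filegroup backup (names and path are hypothetical), backing up only the filegroup that holds the critical data on a more frequent schedule:

    -- Sketch with hypothetical names: back up only the critical filegroup,
    -- leaving the historical/read-only filegroup for a less frequent schedule.
    BACKUP DATABASE SalesDB
    FILEGROUP = 'PRIMARY'
    TO DISK = 'E:\Backups\SalesDB_primary_fg.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;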

Differential backup is your friend

A good backup strategy for big databases must consider differential backups. Since these backups only store the changes since the last full backup, the differential file will become larger as time since that full backup increases, so you have to find the balance between full and differential backups that best suits your needs.
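A sketch of such a pair, for example a weekly full backup plus a daily differential (names and paths are illustrative):

    -- Sketch: weekly full backup, plus a daily differential that only stores
    -- the extents changed since that last full backup (hypothetical paths).
    BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_full.bak'
    WITH COMPRESSION, CHECKSUM;

    BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;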



High Availability solutions


Mandatory for critical databases. Even though configuring them can be a little more difficult due to the size of the databases, in case of a failure this can save your life: it is always faster to fail over to your secondary database copy than to perform a full restore from backups.
Which high availability solution you use depends a lot on your budget and system configuration, but anything is better than nothing at all.


Transaction log


A couple of times in my personal experience, sudden transaction log growth has filled the disks and alarms started firing everywhere. None of those times was fun at all!


Backup your transaction log regularly

The more frequently you back up your log, the smaller the files will be. Since you need all the transaction log backups taken since the last full/differential backup, you also have to find the balance between backup frequency and file size.
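The log backup itself is a single statement; the frequency (every 15 minutes, every hour, and so on, scheduled from a job) is what you tune. A sketch with illustrative names:

    -- Sketch: transaction log backup, intended to run from a scheduled job
    -- (for example every 15 minutes); database name and path are illustrative.
    BACKUP LOG SalesDB
    TO DISK = 'E:\Backups\SalesDB_log.trn'
    WITH COMPRESSION, CHECKSUM;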

Simple recovery mode

If your database is big because of a lot of historical data used mainly as read-only, why bother with log backups? This can save space you will never use anyway.
If your database mixes historical and highly transactional data, consider splitting it into separate databases and configuring the reporting ones with the simple recovery model.
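Switching such a reporting or historical database to the simple recovery model is a single statement (the database name is hypothetical):

    -- Sketch with a hypothetical name: reporting/historical database that does
    -- not need point-in-time recovery, so log backups are no longer required.
    ALTER DATABASE SalesReporting SET RECOVERY SIMPLE;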

Avoid shrinking whenever possible

Database shrinking can lead to fragmentation; always use it with caution and only in exceptional situations.
If your log grows too often, then you have an issue with a query or queries that need to be optimized.


T-SQL performance


For application performance and object maintenance, especially if you are a pure database administrator and another team is in charge of the database programming, you really cannot do much about the code itself; what you can do is generate reports and advise users on improvements that could benefit everyone.

Check for unused database objects

You will be surprised by the number of unused tables and indexes old databases have. Check regularly for unused indexes and tables (we will do a post on how to check this in the near future); this can help you free up some space.
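As a rough sketch, sys.dm_db_index_usage_stats can show indexes that are being maintained on every write but never read since the last instance restart; treat the output as a hint to review, not a list of indexes to drop blindly:

    -- Sketch: nonclustered indexes in the current database with no reads
    -- (seeks, scans or lookups) since the last restart.
    SELECT
        OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_updates,
        us.user_seeks, us.user_scans, us.user_lookups
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id = i.object_id
          AND us.index_id = i.index_id
          AND us.database_id = DB_ID()
    WHERE i.type_desc = 'NONCLUSTERED'
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
      AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
          + ISNULL(us.user_lookups, 0) = 0
    ORDER BY us.user_updates DESC;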

Optimize your T-SQL code

Bad programming techniques, missing indexes, and not using search arguments can lead to poorly performing queries; in a big database that means queries, reports, and processes that take a long time to complete, use additional transaction log space and also additional tempdb space. All of these things are bad for a highly responsive database, so always identify your top resource-consuming queries and try to optimize them (a sketch of how to find them follows below).
Also, as stated above, separating your tables across partitions/filegroups/datafiles/databases can help overall performance.
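A common starting point for finding the top resource-consuming queries is sys.dm_exec_query_stats, sorted by whichever resource hurts the most; this sketch lists the top readers among the plans currently in cache:

    -- Sketch: top 10 cached statements by total logical reads.
    SELECT TOP (10)
        qs.total_logical_reads,
        qs.execution_count,
        qs.total_elapsed_time / 1000 AS total_elapsed_ms,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset END
              - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;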

Reorganize and rebuild indexes

If your database is small, this is not an issue, but for big databases, think twice before reorganizing or rebuilding an index: it can be a very resource- and time-consuming task, and both your database's transaction log and tempdb can grow significantly in a matter of minutes.
Avoid using generic maintenance plans for big databases; instead, implement specially tuned index and statistics maintenance jobs that can determine whether maintenance is really needed for a particular object.
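The usual starting point for such a job is sys.dm_db_index_physical_stats, acting only when the fragmentation justifies the cost; the 5%/30% thresholds in this sketch are the commonly quoted rule of thumb, not a hard rule:

    -- Sketch: check fragmentation first and only act when it is worth the cost.
    SELECT
        OBJECT_NAME(ps.object_id) AS table_name,
        i.name                    AS index_name,
        ps.avg_fragmentation_in_percent,
        ps.page_count,
        CASE
            WHEN ps.avg_fragmentation_in_percent >= 30 THEN 'REBUILD'
            WHEN ps.avg_fragmentation_in_percent >= 5  THEN 'REORGANIZE'
            ELSE 'NONE'
        END AS suggested_action
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id = ps.index_id
    WHERE ps.page_count > 1000   -- ignore tiny indexes
      AND ps.index_id > 0        -- skip heaps
    ORDER BY ps.avg_fragmentation_in_percent DESC;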


Please be aware that everything described above is a suggestion, in no particular order, and can vary drastically from one environment to another, so always keep this in mind!