How is Fill Factor Impacting My Indexes?

The theme for this month’s T-SQL Tuesday is indexes, so it seemed like the perfect excuse to blog about a script I have written to see what the fill factors chosen for indexes actually do to the structure of those indexes. I have to give special thanks to Brent Ozar (Blog|Twitter) for taking the time to review the query and offer his thoughts. I have to admit that I was nervous to publish the script because I have not seen anything else like it and figured there must have been a reason for that.

For those that are unfamiliar, fill factor is an optional parameter that can be specified when adding or rebuilding an index. Specifying a fill factor tells SQL Server to leave a certain percentage of each data page open for future inserts in order to lessen the likelihood of page splits. A page split happens when SQL Server tries to add a row to a data page that does not have room for it. Most page splits involve taking half of the rows on the page and putting them onto a newly allocated page somewhere else in the data file, leaving sufficient room for the new row to be added to either page. If you are lucky enough that the row being added would be the last row on the page, the existing page is left as is and the new row is added to the newly allocated page. Regardless of how the page splits, the new page is almost never anywhere near the other pages of the index it belongs to. The scattering of index pages means that the disk heads have to move around a lot more, leading to poor performance.
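To show what specifying a fill factor looks like, here is a minimal sketch of the syntax when creating an index; the table, index, and column names are made up for illustration:

-- Hypothetical names: leave 10% of each leaf page free for future inserts
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (FILLFACTOR = 90);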

Now that we have talked about the problems that fill factor can help us with, we should talk about the dark side. Yes, the dark side. Setting the fill factor to anything other than the default decreases the rows per page for that index, thereby increasing the number of pages that must be read. According to Books Online, a fill factor of 50% can cause read performance to decrease by two times, because twice as many pages have to be read to return the same data. Even a more reasonable number like 90% still means roughly 10% more pages, and therefore roughly 10% more reads, for the same data.

By now it should be clear that choosing the right fill factor for your indexes is one of the more important steps in creating an index, right behind picking the right key columns. The problem is knowing how to pick a good number, and here is where it gets tough because, like everything else: It Depends and It Changes. My method of setting fill factors is to calculate the rows per page of an index, then use the expected change in rows between reindex operations to figure out what percentage of rows needs to be left free per page. The exception to this process is an index on an ever-increasing value, like an identity column, in which case the fill factor is automatically 100.
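To make that arithmetic concrete, here is a sketch of the calculation using made-up numbers (a 200-byte average record and 5% expected growth between reindex operations are assumptions for illustration only):

-- Hypothetical numbers: ~200-byte average record, ~5% growth between reindexes
DECLARE @avg_record_size_in_bytes decimal(10, 2),
        @expected_growth_percent  decimal(10, 2)

SELECT  @avg_record_size_in_bytes = 200,
        @expected_growth_percent  = 5

SELECT  8096 / (@avg_record_size_in_bytes + 2.00) AS rows_per_full_page,          -- ~40 rows
        8096 / (@avg_record_size_in_bytes + 2.00)
            * (@expected_growth_percent / 100.00) AS free_rows_needed_per_page,   -- ~2 rows
        100 - @expected_growth_percent            AS candidate_fill_factor        -- 95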

My process works very well for the “It Depends” part of setting a fill factor but completely ignores the “It Changes” part. Over time as tables get larger, the fill factor setting on a table needs to be adjusted down. I have also run into servers where the default fill factor has been set to a value other than 0 (same as 100%), creating a need to quickly identify indexes that could perform better. What I needed was a simple query that I could run that would very quickly give me an idea of where I can adjust fill factors to improve performance.
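As a quick aside, checking whether an instance-wide default fill factor has been set only takes a moment. A minimal sketch against sys.configurations (remember that 0 and 100 both mean completely full pages):

-- Instance-wide default fill factor; 0 is the out-of-the-box value
SELECT  name, value, value_in_use
FROM    sys.configurations
WHERE   name LIKE 'fill factor%'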

Here is that query:

SELECT      OBJECT_NAME(ips.object_id) AS table_name,
            ips.index_type_desc,
            ISNULL(i.name, ips.index_type_desc) AS index_name,
            ISNULL(REPLACE(RTRIM((  SELECT      c.name + CASE WHEN c.is_identity = 1 THEN ' (IDENTITY)' ELSE '' END + CASE WHEN ic.is_descending_key = 0 THEN '  ' ELSE ' DESC  ' END
                                    FROM        sys.index_columns ic
                                                    INNER JOIN sys.columns c
                                                          ON ic.object_id = c.object_id
                                                                AND ic.column_id = c.column_id
                                    WHERE       ic.object_id = ips.object_id
                                                          AND ic.index_id = ips.index_id
                                                                AND ic.is_included_column = 0
                                    ORDER BY    ic.key_ordinal
                                    FOR XML PATH(''))), '  ', ', '), ips.index_type_desc)  AS index_keys,
            ips.record_count,
            (ips.page_count / 128.0) AS space_used_in_MB,
            ips.avg_page_space_used_in_percent,
            CASE WHEN i.fill_factor = 0 THEN 100 ELSE i.fill_factor END AS fill_factor,
            8096 / (ips.max_record_size_in_bytes + 2.00) AS min_rows_per_page,
            8096 / (ips.avg_record_size_in_bytes + 2.00) AS avg_rows_per_page,
            8096 / (ips.min_record_size_in_bytes + 2.00) AS max_rows_per_page,
            8096 * ((100 - (CASE WHEN i.fill_factor = 0 THEN 100.00 ELSE i.fill_factor END)) / 100.00) / (ips.avg_record_size_in_bytes + 2.0000) AS defined_free_rows_per_page,
            8096 * ((100 - ips.avg_page_space_used_in_percent) / 100.00) / (ips.avg_record_size_in_bytes + 2) AS actual_free_rows_per_page,
            reads = ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0),
            writes =  ISNULL(ius.user_updates, 0),
            1.00 * (ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0)) / ISNULL(CASE WHEN ius.user_updates > 0 THEN ius.user_updates END, 1) AS reads_per_write
FROM        sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
            INNER JOIN sys.indexes i
                ON ips.object_id = i.object_id
                    AND ips.index_id = i.index_id
            LEFT OUTER JOIN sys.dm_db_index_usage_stats ius
                ON ius.database_id = DB_ID()
                    AND ips.object_id = ius.object_id
                        AND ips.index_id = ius.index_id
WHERE       ips.alloc_unit_type_desc != 'LOB_DATA'
ORDER BY    ips.index_type_desc,
            OBJECT_NAME(ips.object_id),
            (ips.page_count / 128.0)

The query should be very familiar to anyone that has looked at index fragmentation in SQL 2005 or newer. The same rules apply; the only difference is the columns being returned. For larger databases, consider limiting the scan to a single table or even a single index. It is also a good idea to ignore smaller tables here. I leave it up to the individual running the script to define a small table. For some that will be 100 pages, for others 500 pages, but anything over 1,000 pages should probably be looked at.
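For example, the scan can be limited to a single table by passing an object_id to the DMV and small indexes can be skipped with a page_count filter. A sketch, with dbo.Orders standing in as a hypothetical table name and 1,000 pages as the cutoff:

-- Hypothetical table name; adjust the page_count cutoff to your own definition of "small"
SELECT  OBJECT_NAME(ips.object_id) AS table_name,
        ips.index_id,
        ips.page_count,
        ips.avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'SAMPLED') ips
WHERE   ips.alloc_unit_type_desc != 'LOB_DATA'
        AND ips.page_count > 1000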

The size calculations used in the query are based on the formulas found here: http://msdn.microsoft.com/en-us/library/ms178085(SQL.90).aspx, although the math is quite simple because the DMV accounts for things like null bitmaps and row version information.

I assume that everyone will come up with slightly different ways to use the query. I like to make two passes over the data, the first in the morning and the second after the end of the business day. My first pass through the results is used to look for indexes whose fill factor does not leave room for even one extra row per page. They are easy to find because their free rows per page numbers are less than 1. A value of less than 1 means the fill factor either needs to be lowered enough to actually allow some free rows per page or raised to 100 to be honest about the actual number of free rows per page. My second pass is used to look at the change over the course of the day. The best way to do the comparison is to paste both result sets into Excel and use formulas to look for differences. The second pass will show the indexes that have their fill factor set either too high or too low. The idea is to focus just as much on the indexes that show significant changes as on those that do not show any changes at all.
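Once an index that needs a different setting has been identified, the adjustment itself is just a rebuild with a new fill factor. A minimal sketch, again with hypothetical table and index names:

-- Hypothetical names; pick the new fill factor from the free-rows-per-page numbers in the query results
ALTER INDEX IX_Orders_CustomerID
    ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90);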

So there it is, a query to tell how good the current fill factor settings are.

To make sure that all users stay as happy as possible it is best to run the query the first time during an off-peak time so that impact can be safely gauged.

Please let me know if you run into any issues, have any ideas that would make this script better or just want to share how you are using it. As always scripts from the internet are like Halloween candy, inspect before consumption. I offer no warranty beyond a sympathetic ear if you should run into any issues.

A Stored Procedure to Move SSIS Packages Between Servers

Today’s post is one that I have been debating whether to publish for a while. The purpose of the stored procedure I am sharing is to move SSIS packages stored via SQL Server storage from one SQL Server 2005 server to another in a way that can easily be invoked by any release management system that can call stored procedures. The part I have reservations about is that it uses linked servers. I almost never allow linked servers to be created on the servers I manage, mostly because they can be a security problem. Breaking the rules in this case was what was right for the particular problems I was trying to solve. Please consider whether you can implement this logic another way before using this stored procedure in your environment.
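If you do decide linked servers are acceptable, the two that the procedure depends on need to exist before it is called. A minimal sketch with made-up server names; review the login mapping (sp_addlinkedsrvlogin) options against your own security requirements:

-- Hypothetical server names; creates linked servers pointing at other SQL Server instances
EXEC sp_addlinkedserver @server = N'SOURCESQL01', @srvproduct = N'SQL Server'
EXEC sp_addlinkedserver @server = N'DESTSQL01', @srvproduct = N'SQL Server'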

This stored procedure is not terribly complicated so I will run through what it does fairly quickly. The first step is to get the folder_id of the package we want to copy. If more than one row comes back, the package exists in more than one folder and an error is thrown because the procedure does not know which copy to move. If the folder_id returned is null, an error is thrown because the folder does not exist on the destination server. If the stored procedure makes it through those checks, the current version at the destination is deleted and the new version is copied there.

Here is the code:

CREATE PROCEDURE dbo.move_ssis_package @from_server_name varchar(256), @to_server_name varchar(256), @package_name sysname

AS

DECLARE @sql_command nvarchar(4000),
        @folder_id uniqueidentifier

-- Find the folder on the destination server that matches the folder the package lives in on the source server
SELECT @sql_command = 'SELECT @folder_id = pf2.[folderid]
FROM [' + @from_server_name + '].[msdb].[dbo].[sysdtspackagefolders90] pf
    INNER JOIN [' + @from_server_name + '].[msdb].[dbo].[sysdtspackages90] p
        ON pf.folderid = p.folderid
    LEFT OUTER JOIN [' + @to_server_name + '].[msdb].[dbo].[sysdtspackagefolders90] pf2
        ON pf.[foldername] = pf2.[foldername]
WHERE p.name = @package_name'

EXEC sp_executesql @sql_command, N'@package_name sysname, @folder_id uniqueidentifier OUTPUT', @package_name = @package_name, @folder_id = @folder_id OUTPUT

IF @@ROWCOUNT > 1
BEGIN
    RAISERROR ('This package exists in more than one location.', 16, 1)
    RETURN
END

IF @folder_id IS NULL
BEGIN
    RAISERROR ('SSIS Folder does not exist.', 16, 1)
    RETURN
END

-- Remove the current version of the package from the destination server
SELECT @sql_command = '
DELETE [' + @to_server_name + '].[msdb].[dbo].[sysdtspackages90]
WHERE name = @package_name'

EXEC sp_executesql @sql_command, N'@package_name sysname', @package_name = @package_name

-- Copy the new version from the source server into the matching folder on the destination server
SELECT @sql_command = '
INSERT [' + @to_server_name + '].[msdb].[dbo].[sysdtspackages90]
SELECT [name]
    ,[id]
    ,[description]
    ,[createdate]
    ,@folder_id AS [folderid]
    ,[ownersid]
    ,[packagedata]
    ,[packageformat]
    ,[packagetype]
    ,[vermajor]
    ,[verminor]
    ,[verbuild]
    ,[vercomments]
    ,[verid]
    ,[isencrypted]
    ,[readrolesid]
    ,[writerolesid]
FROM [' + @from_server_name + '].[msdb].[dbo].[sysdtspackages90]
WHERE name = @package_name'

EXEC sp_executesql @sql_command, N'@package_name sysname, @folder_id uniqueidentifier', @package_name = @package_name, @folder_id = @folder_id

Please let me know if you run into any issues, have any ideas that would make this stored procedure better or just want to share how you are using it. As always scripts from the internet are like Halloween candy, inspect before consumption. I offer no warranty beyond a sympathetic ear if you should run into any issues.

Great News! SSMS Tools Pack 1.9 is Coming Out!

Mladen Prajdic (Blog|Twitter) recently announced that the newest version of SSMS Tools Pack is coming out and I am excited.

Why am I excited?

Well, I am glad you asked.

I am excited because it will allow me to define my window colors in SQL Management Studio using regular expressions rather than having to define them each individually.

Why is that such a big deal?

I have 100s of servers and I am constantly adding new servers while decommissioning old ones. The sheer amount of changes that would have to be made manually has always kept me from being able to take advantage of window coloring. Rather than have 100s of rules I now have less than 10 regular expressions that cover all of my servers. Here, check it out:

[Screenshot: SSMS Tools Pack connection coloring options window. Naming convention changed to protect the employed.]

So is coloring windows all the SSMS Tools Pack does?

No, not at all. The SSMS Tools Pack is a large suite of plug-ins for SQL Management Studio available as a free download. There are several features that I cannot live without. My favorite feature is that it can be configured to keep a journal of all queries you have run. This can be especially useful if you work in an environment where a sys admin can push updates that cause your machine to reboot whenever they feel like it. If you are not familiar with all of the features currently in the product then please go check out the list here.

If you have not tried out the SSMS Tools Pack then I highly suggest you give version 1.9 a whirl. I know I will.

Stored Procedure to Get SQL Job Execution Status

Introduction

Ever need a programmatic way to figure out if a SQL Agent job is running? I did. After spending a while searching on Google I did not turn up anything I liked, so I turned to Profiler and started looking at the calls SQL Server Management Studio makes. After a bit of poking around I was able to come up with a stored procedure that I could call.

The Meat

The stored procedure is pretty straightforward: get the job and its owner from the system tables, declare a temporary table, call xp_sqlagent_enum_jobs with the results going into the temporary table, then place the job state in an output variable and optionally return a result set.
Comments at the top of the stored procedure outline the various statuses as of the time of writing. My general rule is that any status other than 4 (idle) means the job is running.

Here is the code:

USE [master]
GO

CREATE PROCEDURE [dbo].[sp_dba_GetSqlJobExecutionStatus](@job_name sysname, @select_data int = 0, @execution_status int = NULL OUTPUT)

AS

SET NOCOUNT ON

/*
Execution status values returned in job_state (as of the time of writing):
    Value   Description
    0       Returns only those jobs that are not idle or suspended.
    1       Executing.
    2       Waiting for thread.
    3       Between retries.
    4       Idle.
    5       Suspended.
    7       Performing completion actions.
*/

DECLARE @job_id uniqueidentifier,
        @is_sysadmin int,
        @job_owner sysname

-- Look up the job and its owner, which xp_sqlagent_enum_jobs needs
SELECT  @job_id = jv.job_id,
        @job_owner = sp.name
FROM    msdb.dbo.sysjobs_view jv
        INNER JOIN sys.server_principals sp
            ON jv.owner_sid = sp.sid
WHERE   jv.name = @job_name

SELECT @is_sysadmin = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)

-- Matches the result set returned by xp_sqlagent_enum_jobs
CREATE TABLE #xp_results
(
    job_id                  uniqueidentifier NOT NULL,
    last_run_date           int NOT NULL,
    last_run_time           int NOT NULL,
    next_run_date           int NOT NULL,
    next_run_time           int NOT NULL,
    next_run_schedule_id    int NOT NULL,
    requested_to_run        int NOT NULL, -- BOOL
    request_source          int NOT NULL,
    request_source_id       sysname COLLATE database_default NULL,
    running                 int NOT NULL, -- BOOL
    current_step            int NOT NULL,
    current_retry_attempt   int NOT NULL,
    job_state               int NOT NULL
)

INSERT INTO #xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @is_sysadmin, @job_owner, @job_id

SELECT  @execution_status = job_state
FROM    #xp_results

DROP TABLE #xp_results

IF @select_data = 1
    SELECT  @job_name AS job_name,
            @execution_status AS execution_status

SET NOCOUNT OFF
GO

EXEC sp_MS_marksystemobject 'sp_dba_GetSqlJobExecutionStatus'
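A hypothetical call, capturing the output variable and treating anything other than a status of 4 (idle) as running; the job name below is made up:

-- Hypothetical job name
DECLARE @execution_status int

EXEC dbo.sp_dba_GetSqlJobExecutionStatus
     @job_name = 'Nightly ETL',
     @select_data = 0,
     @execution_status = @execution_status OUTPUT

IF @execution_status != 4
    PRINT 'Job is running (or otherwise not idle)'
ELSE
    PRINT 'Job is idle'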

Conclusion

I hope you find this stored procedure as useful as I do. Please let me know if you run into any issues or have any ideas that would make it better. As always scripts from the internet are like Halloween candy, inspect before consumption. I offer no warranty beyond a sympathetic ear if you should run into any issues.