SSP | Adventures In SQL

The theme for this month’s T-SQL Tuesday is indexes so it seemed like the perfect excuse to blog about a script that I have written to see what choices for fill factors on indexes actually does to the structure of those indexes. I have to give special thanks to Brent Ozar (Blog|Twitter) for taking the time to review and offer his thoughts on the query. I have to admit that I was nervous to publish the script because I have not seen anything else like it and figured there must have been a reason for that.

For those that are unfamiliar, fill factor is an optional parameter that can be specified when adding or rebuilding an index. Specifying a fill factor tells SQL Server to leave a certain percentage of each data page open for future inserts in order to lessen the likelihood of page splits. Page splits are what happens when SQL Server tries to add another row to a data page that it does not fit on. Most page splits involve taking half the rows on the page and putting them onto a newly allocated page somewhere else on your data file, allowing sufficient room for the new row to be added to either page. If you are lucky enough that the row you are adding would be the last row on the page then the existing page is left as is and the new row is added to the newly allocated page. Regardless of how the page splits, the new page is almost never anywhere near the other pages of the index it goes with. The scattering of index pages means that the disk heads have to move around a lot more leading to poor performance.

Now that we have talked about the problems that fill factor can help us with, we should talk about the dark side. Yes, the dark side. Setting the fill factor to anything other than the default decreases the rows per page for that index, thereby increasing the number of pages that must be read. According to Books Online, the read performance penalty is twice the chosen fill factor. This means that setting the fill factor to 50% will lead to twice as many reads to get the same data. Even a more reasonable number like 90% would have a 20% performance penalty on all reads.

By now it should be clear that choosing the right fill factor for your indexes is one of the more important steps in creating an index, right behind picking the right key columns. The problem is knowing how to pick a good number and here is where it gets tough because like everything else: It Depends and It Changes. My method of setting fill factors is to calculate the rows per page of an index then use the expected change in rows between reindex operations to figure out what percentage of rows need to be left free per page. The exception to this process is if the index is on an ever increasing value, like an identity column, then the fill factor is automatically 100.

My process works very well for the “It Depends” part of setting a fill factor but completely ignores the “It Changes” part. Over time as tables get larger, the fill factor setting on a table needs to be adjusted down. I have also run into servers where the default fill factor has been set to a value other than 0 (same as 100%), creating a need to quickly identify indexes that could perform better. What I needed was a simple query that I could run that would very quickly give me an idea of where I can adjust fill factors to improve performance.

Here is that query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37

SELECT OBJECT_NAME(ips.object_id) AS table_name,
ips.index_type_desc,
ISNULL(i.name, ips.index_type_desc) AS index_name,
ISNULL(REPLACE(RTRIM(( SELECT c.name + CASE WHEN c.is_identity = 1 THEN ' (IDENTITY)' ELSE '' END + CASE WHEN ic.is_descending_key = 0 THEN ' ' ELSE ' DESC ' END
FROM sys.index_columns ic
INNER JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = ips.object_id
AND ic.index_id = ips.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH(''))), ' ', ', '), ips.index_type_desc) AS index_keys,
ips.record_count,
(ips.page_count / 128.0) AS space_used_in_MB,
ips.avg_page_space_used_in_percent,
CASE WHEN i.fill_factor = 0 THEN 100 ELSE i.fill_factor END AS fill_factor,
8096 / (ips.max_record_size_in_bytes + 2.00) AS min_rows_per_page,
8096 / (ips.avg_record_size_in_bytes + 2.00) AS avg_rows_per_page,
8096 / (ips.min_record_size_in_bytes + 2.00) AS max_rows_per_page,
8096 * ((100 - (CASE WHEN i.fill_factor = 0 THEN 100.00 ELSE i.fill_factor END)) / 100.00) / (ips.avg_record_size_in_bytes + 2.0000) AS defined_free_rows_per_page,
8096 * ((100 - ips.avg_page_space_used_in_percent) / 100.00) / (ips.avg_record_size_in_bytes + 2) AS actual_free_rows_per_page,
reads = ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0),
writes = ISNULL(ius.user_updates, 0),
1.00 * (ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0)) / ISNULL(CASE WHEN ius.user_updates > 0 THEN ius.user_updates END, 1) AS reads_per_write
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
INNER JOIN sys.indexes i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius
ON ius.database_id = DB_ID()
AND ips.object_id = ius.object_id
AND ips.index_id = ius.index_id
WHERE ips.alloc_unit_type_desc != 'LOB_DATA'
ORDER BY ips.index_type_desc,
OBJECT_NAME(ips.object_id),
(ips.page_count / 128.0)

The query should be very familiar to anyone that has looked at index fragmentation in SQL 2005 or newer. The same rules apply, the only difference is the columns that are being used. For larger databases consider limiting the scan to a single table or even a single index. It is also a good idea to ignore smaller tables here. I leave it up to the individual running the script to define a small table. For some that will be 100 pages, others 500 pages, but anything over 1000 pages should probably be looked at.

The size calculations used in the query are based on the formulas found here: http://msdn.microsoft.com/en-us/library/ms178085(SQL.90).aspx, although the math is quite simple because the DMV accounts for things like null bitmaps and row version information.

I assume that everyone will come up with slightly different ways to use the query. I like to make 2 passes over the data, the first in the morning and the second after the end of the business day. My first pass through the results is used to look for indexes that have too small of a fill factor set. They are easy to find because their free rows per page numbers are less than 1. A value of less than 1 means that the fill factor either needs to be changed to allow some free rows per page or to be more honest about the actual number of free rows per page. My second pass is used to look at the change over the course of the day. The best way to do the comparison is to paste both result sets into Excel and use formulas to look for differences. The second pass will show the indexes that have their factor set either too high or too low. The idea is to focus just as much on the indexes that show significant changes as much as those that do not show any changes at all.

So there it is, a query to tell how good the current fill factor settings are.

To make sure that all users stay as happy as possible it is best to run the query the first time during an off-peak time so that impact can be safely gauged.

Please let me know if you run into any issues, have any ideas that would make this script better or just want to share how you are using it. As always scripts from the internet are like Halloween candy, inspect before consumption. I offer no warranty beyond a sympathetic ear if you should run into any issues.

I recently finished taking down a bunch of servers that I was using to scale out my environment by forcing read only connections off of my main read-write servers. To make a long story short, hardware advances and the additional diagnostic information in SQL 2005 allowed me to consolidate to a few very powerful, reasonably well-tuned read-write servers. The consolidation of servers allowed me to save a ton of power and cooling along with some rack space and a good size chunk of SAN disk.

Taking down the servers means that I now have to update all of my environment diagrams, server configuration scripts and even a spreadsheet or two. Anyone who has ever done this before is cringing right now. One of the worst tasks is updating the replication scripts. I script my replication settings to a network share just in case I do something silly and need to revert to my last know good setup. The scripts can really save my bacon but they are incredibly tedious to create. I have to go into Management Studio, right-click on each publication, select generate script, select script to file then finally find the existing file for that database to add to or decide there is not one and start a new file. With the amount of scripts I had to create it would have easily taken 4, make that 8 hours with interruptions to get everything scripted.

Given, that the whole process would have taken hours and probably would have gotten screwed up along the way I decided to turn to PowerShell. Unfortunately, I did not have a script ready to go….WHHAAAT?…yeah I know..I don’t have a script for everything..so I threw the question out to Twitter. Aaron Nelson (Blog|Twitter) came back right away, pointing me toward SQL PowerShell Extensions (SQLPSX) and very quickly I had a working script. If you are not familiar with SQLPSX please take some time to check it out. It really makes coding PowerShell for SQL Server fast. More importantly, if you are not part of the SQL community on Twitter then get there first.

The actual script is not terribly complex. It takes a distribution server name and an output directory as parameters then works through all publications on each of the servers that connects to the distribution server, scripting them out.

I have only run this script against a dedicated distribution server but it should also work where the publisher is the distributor too.

I spent about 4 hours throwing the script together and generated all of the scripts I needed in a little over 1 minute.

With that, here is the script:

Update: Chad Miller (Blog|Twitter) showed how this script could take better advantage of the features of SQLPSX. His version of the script is available here: http://sev17.com/2010/08/quickly-script-out-replication-redux/

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55

param ([string]$sqlServer, [string]$outputDirectory, [bool]$scriptPerPublication)

if ($sqlServer -eq "")
{
$sqlserver = Read-Host -Prompt "Please provide a value for -sqlServer"
}

if ($outputDirectory -eq "")
{
$outputDirectory = Read-Host -Prompt "Please provide a value for -outputDirectory"
}

function ScriptPublications
{
param ([string]$sqlServer, [string] $outputDirectory, [bool] $scriptPerPublication)

Import-Module Repl

[string] $path = "$outputDirectory$((get-date).toString('yyyy-MMM-dd_HHmmss'))"

New-Item $path -ItemType Directory | Out-Null

foreach($publication in Get-ReplPublication $sqlServer)
{
[string] $fileName = "{0}{1}.sql" -f $path,$publication.DatabaseName.Replace(" ", "")
if($scriptPerPublication)
{
$fileName = "{0}{1}_{2}.sql" -f $path,$publication.DatabaseName.Replace(" ", ""),$publication.Name.Replace(" ", "")
}
[string] $progressText = "Scripting {0} to {1}" -f $publication.Name.Replace(" ", ""),$fileName
Write-Output $progressText
$publication.Script([Microsoft.SqlServer.Replication.scriptoptions]::Creation `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeArticles `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludePublisherSideSubscriptions `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeCreateSnapshotAgent `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeGo `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::EnableReplicationDB `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludePublicationAccesses `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeCreateLogreaderAgent `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeCreateQueuereaderAgent `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeSubscriberSideSubscriptions) | Out-File $fileName -Append
}
}

[Microsoft.SqlServer.Management.Common.ServerConnection] $serverConnection = new-object Microsoft.SqlServer.Management.Common.ServerConnection($sqlServer)
[Microsoft.SqlServer.Replication.ReplicationServer] $distributor = New-Object Microsoft.SqlServer.Replication.ReplicationServer($serverConnection);

foreach($distributionPublisher in $distributor.DistributionPublishers)
{
if($distributionPublisher.PublisherType -eq "MSSQLSERVER")
{
[string] $path = $outputDirectory + "from_" + $distributionPublisher.Name.Replace("", "_")
ScriptPublications -sqlServer $distributionPublisher.Name -outputDirectory $path -scriptPerPublication $false
}
}

As usual, I hope you find this script helpful. Please let me know if you run into any issues with it or know a better way to do the same thing. Please keep in mind that scripts from the internet are like Halloween candy, inspect before consumption. I offer no warranty beyond a sympathetic ear if you should run into any issues.

Tag: SSP

How is Fill Factor Impacting My Indexes?

How Can I Quickly Script Out Replication?