There are some really good, in-depth whitepapers and blogs (specifically by the CSS team and by Brent Ozar and his team) that address this topic. However, the information in these blogs and whitepapers is somewhat scattered, and for a novice some of it can be overwhelming. In this blog post I would like to explain the approach one would take to perform IO stress testing in order to verify that the storage can meet the expected workload of an application hosted on SQL Server.

If you are migrating to a new server, it is important to capture the IO statistics generated by your current workload. This helps you decide whether the new server will be able to withstand the current IO workload while providing the expected response time.

In order to capture the current disk IO statistics, we need to collect the following performance counters (a sample capture command is shown after the list):

Logical Disk & Physical Disk counters

Disk Read Bytes/sec, Disk Write Bytes/sec, Disk Bytes/sec: referred to as IO bandwidth
Disk Reads/sec, Disk Writes/sec, Disk Transfers/sec: equates to IOPs in the SAP world
Avg. Disk sec/Read, Avg. Disk sec/Write, Avg. Disk sec/Transfer: referred to as disk latency
Avg. Disk Bytes/Read, Avg. Disk Bytes/Write, Avg. Disk Bytes/Transfer: IO transfer size (generally ranges from 8 KB to 64 KB)
Avg. Disk Read Queue Length, Avg. Disk Write Queue Length, Avg. Disk Queue Length: calculated as (Disk Transfers/sec) * (Disk sec/Transfer)
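As a quick illustrative check of that last formula (the numbers here are invented purely for illustration): a volume sustaining 2,000 transfers/sec at an average of 8 ms (0.008 sec) per transfer carries an average queue length of 2000 * 0.008 = 16.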
If you need more information on these counters, I would recommend reading this blog.
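Here is one way these counters can be captured from the command line: a minimal sketch using logman (the collector name SQLDiskIO, the output path and the 15-second sample interval are placeholder choices of mine); perfmon can of course be used interactively instead.

rem Create a counter collection for the disk counters listed above, sampling every 15 seconds
logman create counter SQLDiskIO -o C:\PerfLogs\SQLDiskIO -si 00:00:15 -c "\LogicalDisk(*)\Disk Read Bytes/sec" "\LogicalDisk(*)\Disk Write Bytes/sec" "\LogicalDisk(*)\Disk Transfers/sec" "\LogicalDisk(*)\Avg. Disk sec/Read" "\LogicalDisk(*)\Avg. Disk sec/Write" "\LogicalDisk(*)\Avg. Disk Bytes/Transfer" "\LogicalDisk(*)\Avg. Disk Queue Length"

rem Start the collection before the peak window and stop it once the window is over
logman start SQLDiskIO
logman stop SQLDiskIO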

On my server, I captured the above-mentioned performance counters during peak workload and arrived at the following IO statistics for the current workload.

 

IO Metric                   E: (Log Drive)   F: (Data Drive)   Total (Physical Disk)
Disk MB/sec                 100              80                180
Disk IOPs                   125              202               344
Disk Bytes/Transfer         23 KB            50 KB             37 KB
Avg. Disk Queue Length      5.7              17                23.47
Disk Read Latency (ms)      23               57                51
Disk Write Latency (ms)     77               56                67

 

Ideally, disk latency is expected to be within 10 to 15 ms, so the above counters already suggest that the current disk performance is poor and is not meeting the expected standard.

As a result of poor Disk performance, we start seeing IO_COMPLETION, WRITELOG and PAGEIOLATCH_*  wait types in our environment.
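A quick way to check whether these waits are already prominent on the existing server is to query sys.dm_os_wait_stats. The sketch below runs the query through sqlcmd (the server name "." and Windows authentication are assumptions; adjust for your environment):

sqlcmd -S . -E -Q "SELECT wait_type, waiting_tasks_count, wait_time_ms FROM sys.dm_os_wait_stats WHERE wait_type IN ('IO_COMPLETION','WRITELOG') OR wait_type LIKE 'PAGEIOLATCH%' ORDER BY wait_time_ms DESC"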

Once we have established the IO statistics for our current workload, we can look at the tools available to benchmark the new storage that will host SQL Server. The new storage will have to provide throughput that exceeds our current workload while staying within the acceptable disk latency (< 10-15 ms).

Our recommended tool for SQL Server IO stress testing is SQLIO. Other third-party tools such as IOMeter and CrystalDiskMark are also available; the choice is largely a matter of personal comfort, and in this blog post I will be talking about SQLIO.

There is another tool called SQLIOSim, but as per the guidance from the Microsoft CSS blog, it should not be used for performance testing; it should only be used to test the storage from the perspective of data integrity and consistency.

Before performing the stress test, it helps to get the following information from the storage vendor:

Which RAID configuration is used?
RAID 10 is recommended for SQL Server.

How many LUNs are configured?
Generally we recommend separate LUNs for data files, log files and tempdb files.

What is the stripe unit size used for each LUN?
It should be a multiple of the file allocation unit size to avoid disk alignment issues (a quick way to check the current values is shown after this list of questions).

How much storage controller cache is available? How much of it is read cache and how much is write cache?

The larger the storage cache, the better the response time, at the expense of cost. For SQL Server, a write-to-read cache ratio of 80/20 or 90/10 is recommended, since a large write cache tends to absorb checkpoint spikes and improves write performance.
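As referenced above, the current file allocation unit size and partition starting offsets can be checked from the command line. A minimal sketch (run against the data drive F: here; substitute the drives you care about):

rem "Bytes Per Cluster" in the output is the file allocation unit size
fsutil fsinfo ntfsinfo F:

rem StartingOffset shows where each partition begins, which is what needs to line up with the stripe unit size
wmic partition get Name, Index, StartingOffset, BlockSize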

Once we have the above information available, we need to install sqlio.msi, which by default installs to:
C:\Program Files\sqlio               on 32-bit Windows
C:\Program Files (x86)\sqlio         on 64-bit Windows

More information about the SQLIO command-line options is already documented by Brent Ozar in his blog post, so I will not duplicate that effort here.

To start with, we should create a file called testfile.dat on the drive that needs to be stress tested. To do this, we make the necessary changes in param.txt to set the drive letter and the file size, as mentioned in Brent's blog.

Practically, the file size should be around 3 to 4 times the storage controller cache size, so if the cache size is 5 GB we might want to create a testfile.dat of 20 GB. In my case, I created a file of 100 GB.
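For reference, each line of param.txt takes the target file path, the number of threads that will use the file, an affinity mask, and the file size in MB. A rough sketch for a 100 GB test file on the E: drive (102400 MB; the thread count of 8 simply mirrors the -t8 used in the creation command below) would look like this:

E:\testfile.dat 8 0x0 102400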

We can fire the following command to create the file.
sqlio -kW -s10 -fsequential -t8 -o8 -b64 -LS -Fparam.txt
timeout /T 10

During this run of SQLIO, ignore the data points and just let the file be created.

The advantage of using SQLIO for IO stress testing is that it gives a lot of flexibility to generate the various IO patterns that SQL Server produces.

SQL Server generates both random and sequential IO patterns, with transfer sizes that vary from 8 KB to 64 KB and do not exceed 64 KB except for backup and restore operations.

So in our test, we have 4 variable parameters:

1. IO pattern (sequential or random)  –> 2 values
2. IO type (read or write) –> 2 values
3. IO transfer size (8 KB, 64 KB, 128 KB, 256 KB, 512 KB) –> 5 values
4. Outstanding IO requests (1, 2, 4, …, 1024) –> 11 values

As the rules of testing go, if we have 4 variable parameters to test, we need to vary one parameter at a time while keeping the rest constant.

So we will have around 2 * 2 * 5 * 11 = 220 test runs of SQLIO. I prefer to create .bat files for these test runs beforehand and also capture the above-mentioned perfmon counters while running the tests.


Random Read Test

sqlio -dE -BH -kR -frandom -t1 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t2 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o2 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o4 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o12 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o16 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o32 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o64 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o128 -s120 -b64 -LS testfile.dat

Note: Do not set the number of threads (-t) higher than the number of processors on the server.
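If you are not sure how many processors the test server has, one quick way to check from the same command prompt is the standard Windows environment variable:

echo %NUMBER_OF_PROCESSORS%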

In this test run, as mentioned earlier, we keep the other parameters constant while varying only one of them. Here the IO pattern is random read and the IO transfer size is fixed at 64 KB, while the outstanding IO count (number of threads (-t) * outstanding IOs per thread (-o)) is increased in steps from 1 to 512.

For the above test, I created a .bat file and routed the results to a text file. From the output, we need to capture MBs/sec, IOs/sec and Avg_Latency(ms) for each test run.
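A minimal sketch of such a .bat file (the file names are placeholders of mine): each run's output is appended to a single results file and a short pause is added between runs, and at the end the lines of interest are pulled out before pasting into Excel.

sqlio -dE -BH -kR -frandom -t1 -o1 -s120 -b64 -LS testfile.dat >> RandomRead_results.txt
timeout /T 10
sqlio -dE -BH -kR -frandom -t2 -o1 -s120 -b64 -LS testfile.dat >> RandomRead_results.txt
timeout /T 10
rem ...and so on for the remaining runs in the list above...

rem Extract just the throughput and latency lines from the SQLIO output
findstr /C:"MBs/sec" /C:"IOs/sec" /C:"Avg_Latency" RandomRead_results.txt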

 

We captured the results in an Excel sheet, as shown below.

 

Outstanding IO   IO MB/sec   IOPs      Disk Latency (ms)
1                11.5        184       4
2                23          369       4
4                42.9        686       5
8                65.9        1054      7
16               109.6       1754.2    8
32               152.67      2442.7    12
48               151.21      2419.4    19
64               149.09      2385.45   26
128              149.77      2396.33   52
256              147.71      2363.38   107
512              151.83      2429.4    210
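As a sanity check on these numbers, note that throughput is simply IOPs multiplied by the block size: at 32 outstanding IOs, 2442.7 IOs/sec * 64 KB is roughly 152.7 MB/sec, which matches the measured IO MB/sec, so the IOPs and MB/sec curves saturate together.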

 

Based on the above results we plotted 3 graphs as shown below

 

So we see that as the outstanding IOs increase, the IOPs and IO MB/sec (in other words, the disk throughput) also increase, up to 32 outstanding IOs. Beyond that point there is no further increase in disk throughput even as the outstanding IOs increase.

Any guesses why the graph saturates at 32 outstanding IOs?
No marks for guessing that 32 is the HBA queue depth setting, which can be increased further (up to 256) if required. For SQL Server, an HBA queue depth between 32 and 64 is generally recommended, so you might want to increase the queue depth setting if the disk throughput at 32 outstanding IOs does not meet the current workload requirement.

However, from the SQL Server perspective, the most important parameter you are interested in is the disk response time, which should be within 15 ms. As seen in the graph, up to about 48 outstanding IO requests the disk responds close to the acceptable limit, beyond which the latency begins to increase sharply.

Large Random Read Test

In this test, we keep the outstanding IO requests constant at 32 (as found from the above test) while generating random reads of various IO sizes, from 8 KB to 512 KB.

sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b8 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b128 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b256 -LS testfile.dat
sqlio -dE -BH -kR -frandom -t4 -o8 -s120 -b512 -LS testfile.dat

Following are the results from our testing

 

IO Transfer Size   IO MB/sec   IOPs      Disk Latency (ms)
8 KB               37.88       4849.78   6
64 KB              162.33      2597.38   11
128 KB             226.57      1812.63   17
256 KB             326.37      1305.48   24
512 KB             380.92      761.84    41
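The same relationship (MB/sec = IOPs * block size) explains the shape of this table: at a fixed 32 outstanding IOs, larger blocks trade IOPs for bandwidth, from 4849.78 IOs/sec * 8 KB (roughly 37.9 MB/sec) at the small end up to 761.84 IOs/sec * 512 KB (roughly 380.9 MB/sec) at the large end, with latency rising as the block size grows.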

 

 

 

In this test, if between 8 KB and 64 KB we get an acceptable response time and the IOPs meet our current workload requirement, the disk is said to meet the test requirements.

 

 

 

 

 

Similarly, we need to perform the following tests:
Random Write Test

sqlio -dE -BH -kW -frandom -t1 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t2 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o2 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o4 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o12 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o16 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o32 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o64 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o128 -s120 -b64 -LS testfile.dat

Large Random Write Test

sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b8 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b128 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b256 -LS testfile.dat
sqlio -dE -BH -kW -frandom -t4 -o8 -s120 -b512 -LS testfile.dat

Sequential Read Test

sqlio -dE -BH -kR -fsequential -t1 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t2 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o2 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o4 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o12 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o16 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o32 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o64 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o128 -s120 -b64 -LS testfile.dat

Large Sequential Read Test

sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b8 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b128 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b256 -LS testfile.dat
sqlio -dE -BH -kR -fsequential -t4 -o8 -s120 -b512 -LS testfile.dat

Sequential Write Test

sqlio -dE -BH -kW -fsequential -t1 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t2 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o1 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o2 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o4 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o12 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o16 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o32 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o64 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o128 -s120 -b64 -LS testfile.dat

Large Sequential Write Test

sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b8 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b64 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b128 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b256 -LS testfile.dat
sqlio -dE -BH -kW -fsequential -t4 -o8 -s120 -b512 -LS testfile.dat
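Since each of the test groups above ends up in its own .bat file, it can be convenient to chain them from a single script and leave it running overnight. A minimal sketch (the .bat file names here are placeholders of mine, one per test group above):

call RandomReadTest.bat
call LargeRandomReadTest.bat
call RandomWriteTest.bat
call LargeRandomWriteTest.bat
call SequentialReadTest.bat
call LargeSequentialReadTest.bat
call SequentialWriteTest.bat
call LargeSequentialWriteTest.bat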

 

 

The results can then be captured and verified to check whether the storage meets the expected workload requirement within the acceptable response time (< 15 ms).

 

 

Hope this helps!

Parikshit Savjani
Premier Field Engineer

Disclaimer: This post is provided “AS IS” with no warranties, and confers no rights

 


Comments

  1. Great article, fantastic job. However, I would argue that the latency ought to be < 15 ms. Not everybody has the cash to address that:-). If it stays around 25 ms, that is great. 30 might not be a killer either.
    And for me, the simplest way to really see how my disk subsystem is performing on prod is to run a backup to 'NUL'. The reason for this is that a SAN engineer might decide that I have way too many spindles (disks) for such a small database, and he's going to change that. It happened before:-).
    Yet again, great article.

  2. Thanks for sharing this, it's very helpful.

    I have a quick question: why do the IOPS decrease with more outstanding IO? I understand they would saturate at some point, but I noticed in some tests that I ran, the IOPS decrease after a certain number of outstanding IOs.

  3. Hi Suman,

    If you have the HBA queue depth set to a very high value, your outstanding IOs will not be throttled, and hence the IOPs will not saturate but can actually decrease. As to why they decrease:

    Your storage vendor would be the best person to answer that question, since there might be contention at the disk level, on the fibre channel, in the caching, and so on.

    Ideally, if you are observing the IOPs decreasing, you should run a storage trace in parallel, to be analyzed and commented on by the storage vendor.

  4. Hi Parikshit,

    As part of SQL Server 2012, we have a performance and scalability testing module built into SQL Server. Can you please elaborate on this module?
    How is it different from SQLIO?

    Thanks,
    Sindura.

    1. Hi Sindura,

      In SQL 2012, are you referring to Distributed Replay as the performance or scalability testing module?

      In either case, none of the tools available with SQL 2012 is used to perform IO testing for your disk. SQLIO is the utility used to generate the same IO patterns as SQL Server, and hence can be used to test the performance (throughput and latency) of the disk before installing SQL Server on the server.

      So there is no substitute for SQLIO in SQL 2012 yet.
