How to Configure Oracle Redo on the Intel PCIe SSD DC P3700
Back in 2011, I made the statement, "I have put my Oracle redo logs or SQL Server transaction log on nothing but SSDs" (Improve Database Performance: Redo and Transaction Logs on Solid State Disks (SSDs)). In fact, since the release of the Intel® SSD X25-E series in 2008, it is fair to say I have never looked back. Even though those X25-Es have long since retired, every new product has convinced me further still that, from a performance perspective, a hard drive configuration just cannot compete. This is not to say that there have not been new skills to learn, such as the configuration details explained here (How to Configure Oracle Redo on SSD (Solid State Disks) with ASM). The Intel® SSD 910 series provided a definite step up from the X25-E for Oracle workloads (Comparing Performance of Oracle Redo on Solid State Disks (SSDs)) and proved that concerns about write peaks were unfounded (Should you put Oracle Database Redo on Solid State Disks (SSDs)). Now, with the PCIe*-based Intel® SSD DC P3600/P3700 series, we have the next step in the evolutionary development of SSDs for all types of Oracle workloads.

Additionally, we have updates in operating system and driver support, and therefore a refresh to the previous posts on SSDs for Oracle is warranted to help you get the best out of the Intel SSD DC P3700 series for Oracle redo.

NVMe

One significant difference in the new SSDs is the change in interface and driver from AHCI and SATA to NVMe (Non-Volatile Memory Express). For an introduction to NVMe see this video by James Myers, and to understand the efficiency that NVMe brings read this post by Christian Black. As James noted, high-performance, consistent, low-latency Oracle redo logging also needs high endurance; therefore the P3700 is the drive to use.
With a new interface comes a new driver, which fortunately is included in the Linux kernel at the Oracle-supported Linux releases of Red Hat and Oracle Linux 6.5, 6.6 and 7. I am using Oracle Linux 7.

Booting my system with both a RAID array of Intel SSD DC S3700 series and an Intel SSD DC P3700 series shows two new disk devices.

First, the S3700 array using the previous interface:

Disk /dev/sdb1: 2394.0 GB, 2393997574144 bytes, 4675776512 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Second, the new PCIe P3700 using NVMe:

Disk /dev/nvme0n1: 800.2 GB, 800166076416 bytes, 1562824368 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Changing the Sector Size to 4KB

As Oracle introduced support for 4KB sector sizes at Oracle release 11g R2, it is important to be at a minimum of this release, or at Oracle 12c, to take full advantage of SSD for Oracle redo. However, 'out of the box', as shown, the P3700 presents a 512 byte sector size. We can use this 'as is' and set the hidden Oracle parameter '_disk_sector_size_override' to true. With this we can then specify the blocksize to be 4KB when creating a redo log file. Oracle will then use 4KB redo log blocks and performance will not be compromised.

As a second option, the P3700 offers a feature called 'Variable Sector Size'. Because we know we need 4KB sectors, we can set up the P3700 to present a 4KB sector size instead. This can then be used transparently by Oracle without the requirement for additional parameters.
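The sector sizes that fdisk reports can also be read directly from sysfs, which is convenient when checking several devices. The following is a minimal sketch, assuming the standard Linux `queue/logical_block_size` and `queue/physical_block_size` files under `/sys/block`; the function names and the `sysfs_root` parameter are illustrative and not part of any Oracle or Intel tool.

```python
from pathlib import Path
from typing import Tuple


def read_sector_sizes(device: str, sysfs_root: str = "/sys/block") -> Tuple[int, int]:
    """Return (logical, physical) sector sizes in bytes for a block device.

    `device` is the kernel name without /dev/, for example "nvme0n1".
    `sysfs_root` is a parameter only so the function can be pointed at a
    test directory; on a real system the default is used.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def needs_reformat_for_4k(logical: int, target: int = 4096) -> bool:
    """True when the device does not yet present the target logical sector size."""
    return logical != target
```

On the system described here, `read_sector_sizes("nvme0n1")` would be expected to return 512 for both values before the drive is reformatted, so `needs_reformat_for_4k` would report that either the Oracle parameter or the Variable Sector Size feature is needed.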
It is important to change the sector size before you have configured or started to use the drive for Oracle, as the operation is destructive of any existing data on the device.

To do this, first check that everything is up to date by using the Intel Solid State Drive Data Center Tool from https://downloadcenter.intel.com/download/23931/Intel-Solid-State-Drive-Data-Center-Tool. Be aware that after running the format command it will be necessary to reboot the system to pick up the new configuration and use the device.

[root@haswex1 ~]# isdct show -intelssd
- IntelSSD Index 0 -
Bootloader: 8B1B012D
DevicePath: /dev/nvme0n1
DeviceStatus: Healthy
Firmware: 8DV10130
FirmwareUpdateAvailable: Firmware is up to date as of this tool release.
Index: 0
ProductFamily: Intel SSD DC P3700 Series
ModelNumber: INTEL SSDPEDMD800G4
SerialNumber: CVFT421500GT800CGN

Then run the following command to change the sector size. The parameter LBAFormat=3 sets it to 4KB and LBAFormat=0 sets it back to 512 bytes.

[root@haswex1 ~]# isdct start -intelssd 0 Function=NVMeFormat LBAFormat=3 SecureEraseSetting=2 ProtectionInformation=0 MetaDataSetting=0
WARNING! You have selected to format the drive!
Proceed with the format? (Y|N): Y
Running NVMe Format...
NVMe Format Successful.

After it ran I rebooted. The reboot is necessary to perform an NVMe reset on the device, because I am on Oracle Linux 7 with a UEK kernel at 3.8.13-35.3.1.
At Linux kernels 3.10 and above you can also run the following command with the system online to do the reset.

echo 1 > /sys/class/misc/nvme0/device/reset

The disk should now present the 4KB sector size we want for Oracle redo.

Disk /dev/nvme0n1: 800.2 GB, 800166076416 bytes, 195353046 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Configuring the P3700 for ASM

For ASM (Automatic Storage Management) we need a disk with a single partition. After giving the disk a gpt label, I use the following commands to create an aligned partition and check its alignment.

(parted) mkpart primary 2048s 100%
(parted) print
Model: Unknown (unknown)
Disk /dev/nvme0n1: 195353046s
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt
Disk Flags:
Number  Start  End         Size        File system  Name     Flags
1       2048s  195352831s  195350784s               primary
(parted) align-check optimal 1
1 aligned
(parted)

I then use udev to set the device permissions. Note: the scsi_id command can be run independently to find the device id to put in the file, and the udevadm command can be used to apply the rules.
Rebooting the system is useful during configuration to ensure that the correct permissions are applied on boot.

[root@haswex1 ~]# cd /etc/udev/rules.d/
[root@haswex1 rules.d]# more 99-oracleasm.rules
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="3600508e000000000c52195372b1d6008", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="nvme0n1p1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="365cd2e4080864356494e000000010000", OWNER="oracle", GROUP="dba", MODE="0660"

Successfully applied, the oracle user now has ownership of the DC S3700 RAID array device and the P3700 presented by NVMe.

[root@haswex1 rules.d]# ls -l /dev/sdb1
brw-rw---- 1 oracle dba 8, 17 Mar 9 14:47 /dev/sdb1
[root@haswex1 rules.d]# ls -l /dev/nvme0n1p1
brw-rw---- 1 oracle dba 259, 1 Mar 9 14:39 /dev/nvme0n1p1

Use ASMLIB to mark both disks for ASM.

[root@haswex1 rules.d]# oracleasm createdisk VOL2 /dev/nvme0n1p1
Writing disk header: done
Instantiating disk: done
[root@haswex1 rules.d]# oracleasm listdisks
VOL1
VOL2

As the Oracle user, use the ASMCA utility to create the ASM disk groups.
I now have 2 disk groups created under ASM.
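Before looking at what ASM detected, the geometry reported earlier by fdisk and parted can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are my own, and the 1 MiB alignment boundary is an assumed common convention rather than the exact rule that parted's `align-check optimal` applies.

```python
def sector_count(capacity_bytes: int, sector_size: int) -> int:
    """Number of whole sectors in a device of the given capacity."""
    if capacity_bytes % sector_size != 0:
        raise ValueError("capacity is not a whole number of sectors")
    return capacity_bytes // sector_size


def partition_size_sectors(start_sector: int, end_sector: int) -> int:
    """Partition length in sectors; parted reports an inclusive end sector."""
    return end_sector - start_sector + 1


def is_aligned(start_sector: int, sector_size: int,
               boundary_bytes: int = 1024 * 1024) -> bool:
    """True when the partition's byte offset falls on the chosen boundary.

    The 1 MiB default is an assumption made for this example.
    """
    return (start_sector * sector_size) % boundary_bytes == 0


# Values taken from the fdisk and parted output in this post.
CAPACITY_BYTES = 800166076416

print(sector_count(CAPACITY_BYTES, 512))        # 1562824368, as fdisk showed at 512 bytes
print(sector_count(CAPACITY_BYTES, 4096))       # 195353046, as fdisk showed at 4KB
print(partition_size_sectors(2048, 195352831))  # 195350784, the size parted printed
print(is_aligned(2048, 4096))                   # True: 2048 * 4096 bytes is exactly 8 MiB
```

Starting the partition at sector 2048 on a 4KB-sector device places it 8 MiB into the disk, which is a multiple of both the 4KB sector size and the assumed 1 MiB boundary.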
Because of the way the disks were configured, Oracle has automatically detected and applied the sector size of 4KB.

[oracle@haswex1 ~]$ sqlplus sys/oracle as sysasm
SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 12 10:30:04 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Automatic Storage Management option
SQL> select name, sector_size from v$asm_diskgroup;
NAME                           SECTOR_SIZE
------------------------------ -----------
REDO                                  4096
DATA                                  4096

SPFILES in 4K DISKGROUPS

In previous posts I noted Oracle bug “16870214 : DB STARTUP FAILS WITH ORA-17510 IF SPFILE IS IN 4K SECTOR SIZE DISKGROUP”, and even with Oracle 12.1.0.2 this bug is still with us. As both of my diskgroups have a 4KB sector size, this will affect me if I try to create a database in either without having applied patch 16870214.

With this bug, upon creating a database with DBCA you will see the following error.
The database is created and the spfile does exist, so it can be extracted as follows:

ASMCMD> cd PARAMETERFILE
ASMCMD> ls
spfile.282.873892817
ASMCMD> cp spfile.282.873892817 /home/oracle/testspfile
copying +DATA/TEST/PARAMETERFILE/spfile.282.873892817 -> /home/oracle/testspfile

This spfile is corrupt and attempts to reuse it will result in errors.

ORA-17510: Attempt to do i/o beyond file size
ORA-17512: Block Verification Failed

However, you can extract the parameters by using the strings command and create an external spfile, or an spfile in a diskgroup with a 512 byte sector size. Once complete, the Oracle instance can be started.

SQL> create spfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/spfileTEST.ora' from pfile='/home/oracle/testpfile';
SQL> startup
ORACLE instance started

Creating Redo Logs under ASM

Viewing the same disks from within the Oracle instance shows that the underlying sector size has been passed right through to the database.

SQL> select name, SECTOR_SIZE BLOCK_SIZE from v$asm_diskgroup;
NAME                           BLOCK_SIZE
------------------------------ ----------
REDO                                 4096
DATA                                 4096

Now it is possible to create a redo log file with a command such as the following:

SQL> alter database add logfile '+REDO' size 32g;

…and Oracle will create a redo log automatically with an optimal blocksize of 4KB.

SQL> select v$log.group#, member, blocksize from v$log, v$logfile where v$log.group#=3 and v$logfile.group#=3;
GROUP#
----------
MEMBER
-----------
BLOCKSIZE
----------
3
+REDO/HWEXDB1/ONLINELOG/group_3.256.874146809
4096

Running an OLTP workload with Oracle Redo on Intel® SSD DC P3700 series

To put the Oracle redo on P3700 through its paces I used a HammerDB workload. The redo is set with a standard production type configuration, without setting the commit_write and commit_wait parameters.
A test shows we are running almost 100,000 transactions per second at a redo rate of over 500MB / second, and therefore we would be archiving almost 2 TB per hour.

                     Per Second      Per Transaction   Per Exec   Per Call
Redo size (bytes):   504,694,043.7   5,350.6

Log file sync, even at this level of throughput, is just above 1ms:

Event           Waits        Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                       35.4K                                  59.1
log file sync   19,927,449   23.2K                   1.16           38.7        Commit

…and the average log file parallel write shows the average disk response time to be just 0.13ms:

Event                     Waits       %Time -outs   Total Wait Time (s)   Avg wait (ms)   Waits /txn   % bg time
log file parallel write   3,359,023   0             442                   0.13            0.122237277.09

There are six log writers on this system. As with previous blog posts on SSDs, I observed the log activity to be heaviest on the first three, and therefore traced the log file parallel write activity on the first one with the following method. Note that the first attempt at setting the event fails because the comma before "level" is missing.

SQL> oradebug setospid 67810;
Oracle pid: 18, Unix process pid: 67810, image: oracle@haswex1.example.com (LG00)
SQL> oradebug event 10046 trace name context forever level 8;
ORA-49100: Failed to process event statement [10046 trace name context forever level 8]
SQL> oradebug event 10046 trace name context forever, level 8;

The trace file shows the following results for log file parallel write latency to the P3700.

Log Writer Worker   Over 1ms   Over 10ms   Over 20ms   Max Elapsed
LG00                1.04%      0.01%       0.00%       14.83ms

A scatter plot of all of the log file parallel write latencies, recorded in microseconds on the y axis, clearly illustrates that any outliers are statistically insignificant and that none exceed 15 milliseconds. Most of the writes are sub-millisecond, on a system that is processing many millions of transactions a minute while doing so.
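The latency percentages in the table above come from the elapsed times recorded in the 10046 trace. As a hedged illustration of how such a summary could be produced, the sketch below assumes the usual trace line layout of `WAIT #n: nam='log file parallel write' ela= <microseconds> ...`; the function name, the returned keys and the sample lines in the usage note are invented for the example, and this is not necessarily the method used to build the table.

```python
import re
from typing import Dict, Iterable

# Assumed 10046 wait-line layout; ela= is taken to be in microseconds.
WAIT_RE = re.compile(r"nam='log file parallel write' ela=\s*(\d+)")


def summarize_log_writes(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize log file parallel write latencies found in trace lines."""
    elapsed_us = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            elapsed_us.append(int(match.group(1)))
    if not elapsed_us:
        raise ValueError("no log file parallel write waits found")

    total = len(elapsed_us)

    def pct_over(threshold_ms: float) -> float:
        # Percentage of waits strictly longer than the threshold.
        over = sum(1 for us in elapsed_us if us > threshold_ms * 1000)
        return 100.0 * over / total

    return {
        "waits": float(total),
        "pct_over_1ms": pct_over(1),
        "pct_over_10ms": pct_over(10),
        "pct_over_20ms": pct_over(20),
        "max_ms": max(elapsed_us) / 1000.0,
    }
```

It could be run against a trace file with something like `summarize_log_writes(open(trace_path))`, where `trace_path` is wherever the LG00 trace was written on your system. Lines for other wait events are ignored.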
A subset of iostat data shows that the device is also far from full utilization.

avg-cpu:  %user   %nice   %system   %iowait   %steal   %idle
          77.30   0.00    8.07      0.24      0.00     14.39

Device:   wMB/s    avgrq-sz   avgqu-sz   await   w_await   svctm   %util
nvme0n1   589.59   24.32      1.33       0.03    0.03      0.01    27.47

Conclusion

As a confirmed believer in SSDs, I have long been convinced that most experiences of poor Oracle redo performance on SSDs have been due to an error in configuration, such as sector size, block size and/or alignment, as opposed to the performance of the underlying device itself. Following the configuration steps I have outlined here, the Intel SSD DC P3700 series shows itself to be an ideal candidate to take Oracle redo to the next level of performance without compromising endurance.