Thursday, October 4, 2012

Data Deduplication in Windows Server 2012

After a few days of testing Data Deduplication under Windows Server 2012, here are a few facts, along with my thoughts on its performance.
  • ReFS partitions cannot be deduplicated. Source: Personal experience (check this previous post)
  • Data Deduplication is not enabled by default. Source: Personal experience
  • Data Deduplication is anything but fast. It is designed as a background service that improves disk space usage, so expect the best return on investment only in the long term. Source: Personal experience
  • It follows that new files added to the volume are not optimized right away. Only files that have not been changed for a minimum amount of time are optimized. (This minimum amount of time is set by user-configurable policy.) Source: MSDN
  • Data Deduplication jobs can be started manually from Task Scheduler, under 'Task Scheduler Library', 'Microsoft', 'Windows', 'Deduplication'. Source: Deploymentresearch.com
  • Deduplication has a setting called MinimumFileAgeDays that controls how old a file should be before processing the file. The default setting is 5 days. This setting is configurable by the user and can be set to “0” to process files regardless of how old they are. Source: Technet 
  • Files are segmented into chunks with an average size of 64KB. The chunks are compressed and placed into a chunk store located in a hidden folder at the root of the volume called System Volume Information, or “SVI folder”. The original file is replaced by a small reparse point, which has a pointer to a map of all the data streams and chunks required to “rehydrate” the file and serve it up when it is requested. Source: Technet
  • Redundancy: Extra copies of critical metadata are created automatically. A very popular data chunk receives an entire duplicate copy once it has been referenced 100 times. The collection of these most popular chunks is kept in an area called “the hotspot”. Source: Technet
  • Files smaller than 32KB are not deduplicated (because their size is already smaller than the minimum chunk size). Source: Storagegaga.com
  • The first service behind Deduplication is the Data Deduplication service, which enables the deduplication and compression of data on selected volumes in order to optimize disk space used. If this service is stopped, optimization will no longer occur but access to already optimized data will continue to function. Its command line is C:\Windows\system32\svchost -k ddpsvc Source: Personal experience
  • The second service is Data Deduplication Volume Shadow Copy Service, which is used to back up volumes with deduplication. Its command line is: C:\Windows\system32\svchost -k ddpvssvc Source: Personal experience
  • Deduplication Data Evaluation Tool (ddpeval.exe) doesn't work on Windows 7 Ultimate. Source: Personal experience
  • The deduplication VSS writer reports two components for each volume that contains a deduplication chunk store: the "Chunk Store" under \System Volume Information\Dedup\ChunkStore\* and "Dedup Configuration" under \System Volume Information\Dedup\Settings\*. Source: MSDN
Let's check this last fact and fire a few PowerShell commands to see what's inside the chunk store:
PS G:\> gci ".\System Volume Information" -Recurse -hidden

    Directory: G:\System Volume Information

Mode   LastWriteTime     Length Name
----   -------------     ------ ----
-a-hs  03/10/2012 14:16  20480  tracking.log

    Directory: G:\System Volume Information\Dedup\ChunkStore

Mode   LastWriteTime     Length Name
----   -------------     ------ ----
d--hs  02/10/2012 13:32         {512528DE-2E46-4C15-A013-8AEA62DEF7A8}.ddp

    Directory: G:\System Volume Information\Dedup\ChunkStore\{512528DE-2E46-4C15-A013-8AEA62DEF7A8}.ddp

Mode   LastWriteTime     Length Name
----   -------------     ------ ----
d--hs  02/10/2012 13:32         Data
d--hs  02/10/2012 13:32         Hotspot
d--hs  02/10/2012 13:32         Stream
-a-hs  02/10/2012 13:32  28     stamp.dat

    Directory: G:\System Volume Information\Dedup\Settings

Mode   LastWriteTime     Length Name
----   -------------     ------ ----
-a-hs  02/10/2012 13:29  2280   dedupConfig.01.xml
-a-hs  02/10/2012 13:29  2280   dedupConfig.02.xml

    Directory: G:\System Volume Information\Dedup\State

Mode   LastWriteTime     Length Name
----   -------------     ------ ----
-a-hs  03/10/2012 09:30  852    analysisState.xml
-a-hs  03/10/2012 13:36  2894   chunkStoreStatistics.xml
-a-hs  03/10/2012 13:36  2442   dedupStatistics.xml
-a-hs  03/10/2012 13:34  864    gcState.xml
-a-hs  03/10/2012 13:36  2066   optimizationState.xml
-a-hs  03/10/2012 13:34  852    scrubbingState.xml
It looks like the configuration of the deduplication service is stored in two XML files, whose content I show here:

dedupConfig.01.xml
<?xml version="1.0"?>
<root version="1.0">
  <properties>
    <property value="0" type="VT_UI8" name="changeTime"/>
    <property value="0" type="VT_UI4" name="options"/>
    <property value="5" type="VT_UI4" name="fileMinimumAge"/>
    <property value="32768" type="VT_UI4" name="fileMinimumSize"/>
    <property value="" type="VT_BSTR" name="excludeFolders"/>
    <property value="" type="VT_BSTR" name="excludeFileExtensions"/>
    <property value="aac|aif|aiff|asf|asx|au|avi|flac|m3u|mid|midi|mov|mp1|mp2|mp3|mp4|mpa|mpe|mpeg|mpeg2|mpeg3|mpg|ogg|qt|qtw|ram|rm|rmi|rmvb|snd|swf|vob|wav|wax|wma|wmv|wvxaccdb|accde|accdr|accdt|docm|docx|dotm|dotx|pptm|potm|potx|ppam|ppsx|pptx|sldx|sldm|thmx|xlsx|xlsm|xltx|xltm|xlsb|xlam|xllace|arc|arj|bhx|bz2|cab|gz|gzip|hpk|hqx|jar|lha|lzh|lzx|pak|pit|rar|sea|sit|sqz|tgz|uu|uue|z|zip|zoo" type="VT_BSTR" name="noCompressionFileExtensions"/>
    <property value="100" type="VT_UI4" name="hotspotThreshold"/>
    <property value="2" type="VT_UI4" name="compressionLevel"/>
  </properties>
</root>
dedupConfig.02.xml
<?xml version="1.0"?>
<root version="1.0">
  <properties>
    <property value="0" type="VT_UI8" name="changeTime"/>
    <property value="0" type="VT_UI4" name="options"/>
    <property value="5" type="VT_UI4" name="fileMinimumAge"/>
    <property value="32768" type="VT_UI4" name="fileMinimumSize"/>
    <property value="" type="VT_BSTR" name="excludeFolders"/>
    <property value="" type="VT_BSTR" name="excludeFileExtensions"/>
    <property value="aac|aif|aiff|asf|asx|au|avi|flac|m3u|mid|midi|mov|mp1|mp2|mp3|mp4|mpa|mpe|mpeg|mpeg2|mpeg3|mpg|ogg|qt|qtw|ram|rm|rmi|rmvb|snd|swf|vob|wav|wax|wma|wmv|wvxaccdb|accde|accdr|accdt|docm|docx|dotm|dotx|pptm|potm|potx|ppam|ppsx|pptx|sldx|sldm|thmx|xlsx|xlsm|xltx|xltm|xlsb|xlam|xllace|arc|arj|bhx|bz2|cab|gz|gzip|hpk|hqx|jar|lha|lzh|lzx|pak|pit|rar|sea|sit|sqz|tgz|uu|uue|z|zip|zoo" type="VT_BSTR" name="noCompressionFileExtensions"/>
    <property value="100" type="VT_UI4" name="hotspotThreshold"/>
    <property value="2" type="VT_UI4" name="compressionLevel"/>
  </properties>
</root>
Among the settings contained in these configuration files is the list of excluded file extensions, that is, the types of files that won't be analyzed by the Deduplication Service. No file extensions are excluded by default. There is also the list of file extensions that the Deduplication Service won't try to compress. By default this second list includes MPEG files, zip files and MS Office files.
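These settings can also be read programmatically rather than by eye. Here is a minimal sketch, assuming a copy of the XML is at hand: the here-string below is a shortened stand-in for dedupConfig.01.xml (the extension list is trimmed to four entries for the example), and the variable names are mine, not part of any Microsoft tooling.

```powershell
# Shortened copy of dedupConfig.01.xml; the extension list is trimmed for the example.
$xmlText = @'
<?xml version="1.0"?>
<root version="1.0"><properties>
<property value="5" type="VT_UI4" name="fileMinimumAge"/>
<property value="32768" type="VT_UI4" name="fileMinimumSize"/>
<property value="" type="VT_BSTR" name="excludeFileExtensions"/>
<property value="aac|mp3|docx|zip" type="VT_BSTR" name="noCompressionFileExtensions"/>
<property value="100" type="VT_UI4" name="hotspotThreshold"/>
</properties></root>
'@

$config = [xml]$xmlText

# Build a name -> value lookup. GetAttribute() is used instead of $p.name
# to avoid any confusion with the element's own Name property.
$settings = @{}
foreach ($p in $config.root.properties.property) {
    $settings[$p.GetAttribute('name')] = $p.GetAttribute('value')
}

# The extension lists are pipe-separated strings
$noCompress = $settings['noCompressionFileExtensions'] -split '\|'

"Minimum file age (days): $($settings['fileMinimumAge'])"
"Minimum file size (bytes): $($settings['fileMinimumSize'])"
"Extensions stored uncompressed: $($noCompress -join ', ')"
```

Reading the real files under System Volume Information requires sufficient privileges, so working on a copy is probably the safer habit.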

As I said, Deduplication is designed to work on files in the long term. So if I try to get the deduplication state of a newly added volume, I see that no files have been optimized yet:
PS C:\> Get-DedupVolume g:

Enabled SavedSpace  SavingsRate Volume
------- ----------  ----------- ------
True    0 B         0 %         G:
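For reference, the state shown above (enabled, nothing saved yet) is what a volume looks like right after deduplication has been turned on. A rough sketch of how to get there, assuming an elevated PowerShell session on Windows Server 2012 and the same test volume G: used throughout this post:

```powershell
# Assumption: elevated session on Windows Server 2012; G: is this post's test volume.
Import-Module ServerManager
Install-WindowsFeature -Name FS-Data-Deduplication   # the feature is not installed by default

Import-Module Deduplication
Enable-DedupVolume -Volume "G:"                      # deduplication is enabled per volume

# Confirm the volume is enabled and show the policy values that decide what gets optimized
Get-DedupVolume -Volume "G:" | Format-List Enabled, MinimumFileAgeDays, MinimumFileSize
```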
Now, if you want Data Deduplication to process all files on the volume immediately, regardless of their age, first display the full settings of the volume:
PS G:\> Get-DedupVolume g: | fl *

ObjectId                 : \\?\Volume{795fedec-0bc3-11e2-93ea-005056984e73}\
Capacity                 : 10734268416
ChunkRedundancyThreshold : 100
DataAccessEnabled        : True
Enabled                  : True
ExcludeFileType          :
ExcludeFolder            :
FreeSpace                : 9539395584
MinimumFileAgeDays       : 5
MinimumFileSize          : 32768
NoCompress               : False
NoCompressionFileType    : {aac, aif, aiff, asf...}
SavedSpace               : 0
SavingsRate              : 0
UnoptimizedSize          : 1194872832
UsedSpace                : 1194872832
Verify                   : False
Volume                   : G:
VolumeId                 : \\?\Volume{795fedec-0bc3-11e2-93ea-005056984e73}\
PSComputerName           :
CimClass                 : ROOT/Microsoft/Windows/Deduplication:MSFT_DedupVolume
CimInstanceProperties    : {Capacity, ChunkRedundancyThreshold, DataAccessEnabled, Enabled...}
CimSystemProperties      : Microsoft.Management.Infrastructure.CimSystemProperties
There you can recognise the parameter we talked about a few lines above: MinimumFileAgeDays. Let's change its value to 0:
PS G:\> Set-DedupVolume g: -MinimumFileAgeDays 0
When I issue this command, dedupConfig.01.xml and dedupConfig.02.xml are both updated with the new value.
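Lowering the age threshold only makes the files eligible: they are still processed when an optimization job runs. Instead of waiting for the background schedule (or starting the task from Task Scheduler as mentioned above), a job can be queued from PowerShell. This is a sketch under the same assumptions as before (Deduplication module available, test volume G:), not something I timed:

```powershell
Import-Module Deduplication

# Queue an optimization job for G: instead of waiting for the scheduled one
Start-DedupJob -Volume "G:" -Type Optimization

# List the jobs that are currently queued or running
Get-DedupJob

# Once the job has finished, look at the resulting savings
Get-DedupStatus -Volume "G:"
```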

After a night, the SavingsRate value has gone from 0% to 75%. Amazing.
PS HKLM:\SOFTWARE> Get-DedupVolume g:

Enabled SavedSpace SavingsRate Volume
------- ---------- ----------- ------
True    856.98 MB  75 %        G:

PS HKLM:\SOFTWARE> Get-DedupMetadata

Volume                         : G:
VolumeId                       : \\?\Volume{795fedec-0bc3-11e2-93ea-005056984e73}\
StoreId                        : {512528DE-2E46-4C15-A013-8AEA62DEF7A8}
DataChunkCount                 : 3511
DataContainerCount             : 1
DataChunkAverageSize           : 24.12 KB
DataChunkMedianSize            : 0 B
DataStoreUncompactedFreespace  : 0 B
StreamMapChunkCount            : 34
StreamMapContainerCount        : 1
StreamMapAverageDataChunkCount :
StreamMapMedianDataChunkCount  :
StreamMapMaxDataChunkCount     :
HotspotChunkCount              : 1
HotspotContainerCount          : 1
HotspotMedianReferenceCount    :
CorruptionLogEntryCount        : 0
TotalChunkStoreSize            : 83.84 MB
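As a sanity check, the 75 % figure can be reproduced from the numbers shown in this post. The sketch below assumes that SavingsRate is simply SavedSpace divided by UnoptimizedSize, and that the content of the volume did not change between the two Get-DedupVolume calls; both are my assumptions, not something the cmdlet output states.

```powershell
# Figures copied from the Get-DedupVolume output in this post.
$unoptimizedBytes = 1194872832    # UnoptimizedSize, read before the optimization ran
$savedBytes       = 856.98MB      # SavedSpace reported after the overnight run

# Assumption: SavingsRate = SavedSpace / UnoptimizedSize
$savingsRate = [math]::Round(100 * $savedBytes / $unoptimizedBytes, 1)

"Computed savings rate: $savingsRate %"   # about 75.2, in line with the 75 % reported
```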
The disk space saving can be seen directly in Windows Explorer, as shown in the image, as well as in File and Storage Services, Volume view.

Used space after deduplication

Deduplication efficiency rate

The facts listed here are just a starting point for understanding this new service from Microsoft. Feel free to add your own comments and share your opinion on the results you get with deduplication!
