Today I ran into a strange problem with some of my stand-alone DFS targets. Many users on old Windows versions, such as Windows XP pre-SP2, were no longer able to browse DFS file shares after I updated some referrals to reflect an infrastructure change (new folder targets) at my company.
The funny thing is that most of the end users running Windows XP SP2 or Windows 7 had no problem at all browsing the DFS links from Windows Explorer.
This problem pushed me to dig into DFS behavior, design and architecture more than I had ever done before.
After checking that no alerts were reported server-side, I went to one old XP box and tried to browse the DFS path \\dfsserver\root\link. The error I got was a generic "\\dfsserver\root\link refers to a location that is unavailable"...
I then tried to map the DFS link using the good old "net use", hoping for a slightly more specific error code... but all I got was "System error 2 has occurred. The system cannot find the file specified."
I closed my eyes for five seconds, waiting for inspiration, and the first thing that came to my mind was a DNS problem. I was wrong: I was able to resolve and ping the DFS server as well as the real file server hosting the data. I was also able to access the file server directly, without going through DFS name resolution. The fact that I could access the target path also told me that incorrect NTFS permissions were not the cause of the problem.
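The checks described above can be sketched as a small script. This is only an illustration in Python, not what I actually ran on the XP box: the function name `triage` and the result labels are made up for the example, and it covers name resolution and path access only, not the ping.

```python
import os
import socket


def triage(dfs_server, file_server, dfs_path, target_path):
    """Run the basic checks and return a dict of True/False results.

    All four arguments are supplied by the caller; nothing here is
    specific to a real environment.
    """
    results = {}
    # Name resolution: rules DNS in or out as the culprit.
    for label, host in (("dfs_server_resolves", dfs_server),
                        ("file_server_resolves", file_server)):
        try:
            socket.gethostbyname(host)
            results[label] = True
        except OSError:
            results[label] = False
    # Path access: os.path.exists returns False when the path
    # cannot be reached, whatever the underlying reason.
    results["dfs_path_reachable"] = os.path.exists(dfs_path)
    results["target_path_reachable"] = os.path.exists(target_path)
    return results
```

In my situation the result would have been True everywhere except `dfs_path_reachable`, which points away from DNS and permissions and toward the DFS referral itself.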
As I always do when I meet weird communication problems, I fired up my good old Network Monitor on Windows Server 2008 (ah yes, the DFS server is running on Windows Server 2008). After a minute spent putting in place the required filter based on the IP address of the test PC running Windows XP, I started a network capture.
What surprised me was that I wasn't seeing the XP box coming in and asking the DFS service (DFSSVC) for the referral target, as I had expected. It looked clear to me that some sort of cache had stored the wrong referral target somewhere in the depths of Windows and that the PC was stuck with this old, wrong information.
I know that Windows uses a Multiple UNC Provider (better known as mup.sys), which keeps in memory (yes, not in the registry, nor in some file on disk) the information about the redirectors in use, be it for a DFS path, a WebDAV path (such as a SharePoint library) or an SMB path.
Inside MUP there are different caches. Concerning DFS, there is one cache for the type of remote resource (WebDAV, DFS or SMB) and one cache for referral targets. The latter, known as the PKT (Partition Knowledge Table) cache or referral cache, stores the DFS target for as long as defined in the DFS link configuration.
The standard TTL for a DFS target in the Referral Cache is 30 minutes (1800 seconds) but in my configuration the TTL is 300 seconds.
[Screenshot: DFS TTL set to 300 seconds]
So I imagined that the problem was the PKT cache not being properly flushed. But why this was happening on old Windows XP SP1 clients and not on Windows XP SP2 or Windows 7 clients was still a mystery to me.
Not knowing how to proceed, I tried to launch the only tool I knew of that could give me an insight into the MUP cache: dfsutil /pktinfo... but, unfortunately, this executable was missing from my test XP computer. At that moment my understanding was that Windows XP systems have a MUP cache for DFS targets but no built-in tool to check its content. Cool, isn't it?
I told myself that if I restarted the Workstation service the MUP cache would be flushed... but I was wrong this time too, as the PKT cache stayed untouched. I thought that restarting the Windows XP box would probably solve the problem, but I didn't want to go straight to that point and confirm all the bad jokes about Windows, like "Windows has detected that you have moved your mouse, please restart...".
So I decided to pay a visit to technet.microsoft.com, and after some wandering in their huge document library I found the following passage:
"For DFS clients that are not running Windows XP with SP2 or Windows Server 2003 with SP1, the Time to Live for a referral determines the earliest time that a client will request a new referral, but only if the existing referral expires before it is accessed again. Clients that use a cached referral will renew the Time to Live of the referral and thus use the referral indefinitely until the client’s referral cache is cleared or the client is restarted. This behavior has changed for clients running Windows XP with SP2 or Windows Server 2003 with SP1. Specifically, the Time to Live value is not reset each time a client accesses a target using a cached referral. Instead, the referral expires after the Time to Live value lapses. This change has several effects:
- Clients running Windows XP with SP2 or Windows Server 2003 with SP1 will request referrals more frequently than other DFS clients, which can cause moderately increased load on the domain-based DFS root servers and domain controllers.
- Because they request new referrals more frequently, clients running Windows XP with SP2 or Windows Server 2003 with SP1 will discover namespace updates more quickly than other DFS clients. "
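To make the difference described in that passage concrete, here is a toy simulation in Python. It is only a sketch of the two expiry policies as I understand them from the quote, not a model of the real mup.sys internals. The class, the function and the server names (`oldserver`, `newserver`) are invented for the example; the 300-second TTL is the one from my configuration.

```python
class ReferralCache:
    """Toy client-side referral cache holding a single entry.

    renew_on_access=True  models the pre-SP2 behaviour from the quote:
                          every use of a cached referral restarts its TTL.
    renew_on_access=False models XP SP2 / Server 2003 SP1 and later:
                          the referral expires once the TTL has lapsed.
    """

    def __init__(self, ttl_seconds, renew_on_access):
        self.ttl = ttl_seconds
        self.renew_on_access = renew_on_access
        self.target = None
        self.expires_at = None
        self.referral_requests = 0

    def resolve(self, now, current_server_target):
        """Return the target the client would use at time `now` (seconds)."""
        if self.target is None or now >= self.expires_at:
            # Cache miss or expired entry: ask the DFS server again.
            self.target = current_server_target
            self.expires_at = now + self.ttl
            self.referral_requests += 1
        elif self.renew_on_access:
            # Cache hit on an old client: the TTL is pushed forward.
            self.expires_at = now + self.ttl
        return self.target


def simulate(renew_on_access, ttl=300, access_every=200, duration=1200):
    """Access the link every `access_every` seconds for `duration` seconds."""
    cache = ReferralCache(ttl, renew_on_access)
    seen = []
    for now in range(0, duration + 1, access_every):
        # Hypothetical change: the link is repointed at t = 100 s.
        if now < 100:
            server_target = r"\\oldserver\share"
        else:
            server_target = r"\\newserver\share"
        seen.append(cache.resolve(now, server_target))
    return seen, cache.referral_requests
```

With the default arguments the client touches the link every 200 seconds, which is more often than the TTL. `simulate(True)` keeps returning the old target and asks the server only once, while `simulate(False)` switches to the new target as soon as the first referral has lapsed. That is the pattern I was seeing: busy pre-SP2 clients never let go of the stale referral.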
So this was a really good piece of information. The behavior of the cache had apparently changed with XP SP2, and I should be able to reset the PKT cache on older systems just by rebooting. OK, but to me this was not a solution, just a workaround.
I was a little stuck here. So I decided to fetch the Windows Support Tools for Windows Server 2003, copy dfsutil.exe to the XP workstation and view the contents of the referral cache using the famous /pktinfo switch. As I had imagined, the PKT cache was still keeping the old referral in memory. I then ran dfsutil /pktflush to forcibly flush the referral cache, and the problem was immediately solved.
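For reference, the two dfsutil calls can be wrapped in a few lines of Python. This is a hypothetical helper, not something I deployed: it assumes the Support Tools dfsutil.exe is on the PATH of the client, and it uses only the two switches mentioned above.

```python
import subprocess

# Only the two switches used in this post. dfsutil.exe from the
# Support Tools is assumed to be reachable through the PATH.
DFSUTIL_SWITCHES = {"show": "/pktinfo", "flush": "/pktflush"}


def dfsutil_command(action):
    """Build the command line for 'show' or 'flush'."""
    return ["dfsutil", DFSUTIL_SWITCHES[action]]


def show_then_flush():
    """Print the referral cache, then flush it; return the flush exit code."""
    # Output of both commands goes straight to the console.
    subprocess.call(dfsutil_command("show"))
    return subprocess.call(dfsutil_command("flush"))
```

Calling `show_then_flush()` on an affected client does the same thing I did by hand: look at the stale entry first, then clear it.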
Even though I have found a workaround, I still don't know what to do at enterprise level to flush the MUP cache of all the Windows XP clients. I will keep investigating, and if I find a real solution I will update this post. Meanwhile I will tell people to restart their Windows XP clients, or to upgrade to Windows 7 if they feel like they long for a better designed MUP cache...
If you are a DFS expert or if you are a Desperate Sysadmin and have encountered this same problem, please do not hesitate to share your hints, findings and, hopefully, solutions. Microsoft people are obviously invited to say something too.
Here are a few links I found interesting to read while investigating: