Monday, December 31, 2012

How to normalize special characters in filenames for Owncloud

This year is almost over. During the holidays I spent a few hours working with Owncloud, a new on-premise cloud solution (which I strongly suggest you try).

The idea behind it is to let people access their personal data from anywhere with an internet connection (be it a PDA, an internet café, an Android phone or a work PC) and share files, folders, pictures, movies and documents with other people (friends or colleagues) through a clean, fast web interface.
[Screenshot: Owncloud web interface]
The problem many people face is that file and folder names containing special characters (such as é, à, ç, ù, ü) are handled very badly by Owncloud when it is installed on Windows (be it Windows 7, Windows Server 2008 or even Windows Server 2012).

I think most of these problems stem from PHP's lack of native Unicode support (when PHP was created, UTF-8 was not widely supported: there were still non-Unicode operating systems such as Windows 98/Me, and other major programming languages were not Unicode-based either) and from a poor interaction between PHP and NTFS: Microsoft's file system stores file names as UTF-16, which PHP and MySQL misinterpret. I haven't dug deeper into this issue, partly because many smart people have already given up trying to solve it.
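To see how an encoding mismatch garbles an accented name, here is a small PowerShell illustration. It is a generic sketch of the effect, not a trace of what PHP or MySQL actually do internally; ISO-8859-1 is an assumption standing in for any single-byte legacy code page. The same name is encoded as UTF-16 (what NTFS stores) and as UTF-8, and the UTF-8 bytes are then read back with the wrong code page.

```powershell
# Illustration only: how reading UTF-8 bytes with the wrong code page garbles "é".
# The name is built from code points so the .ps1 file's encoding does not matter.
$name = "caf" + [char]0x00E9 + ".txt"        # café.txt

# NTFS stores names as UTF-16 (little-endian): é is the two bytes E9 00
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($name)

# The same name in UTF-8: é becomes the two bytes C3 A9
$utf8 = [System.Text.Encoding]::UTF8.GetBytes($name)

# A program that assumes a single-byte Western code page (ISO-8859-1 here)
# reads C3 and A9 as two separate characters
$garbled = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString($utf8)

"UTF-16: " + (($utf16 | ForEach-Object { $_.ToString("X2") }) -join " ")
"UTF-8 : " + (($utf8 | ForEach-Object { $_.ToString("X2") }) -join " ")
"Read as ISO-8859-1: $garbled"               # cafÃ©.txt
```

This is the familiar "Ã©" symptom: every layer that guesses the encoding differently adds another chance for the name to break.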

While I am pretty sure the Owncloud developers will work hard on this problem (not least because Owncloud users on Windows will soon be too numerous to ignore), I am not proposing a solution here, just a workaround for users whose file names contain accented European letters (such as à, á, â, ã, ä, å, æ, ç, è, é, ê, ë, ì, í, î, ï, ð, ñ, ò, ó, ô, õ, ö, ø, ù, ú, û, ü, ý).

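My workaround transliterates these letters to their unaccented base form (i.e. à becomes a and é becomes e) and renames all the files and folders inside the Owncloud data directory using plain Latin characters. It works by decomposing each character into a base letter plus combining accent marks (Unicode normalization form D) and dropping the marks; letters that have no such decomposition, such as æ, ð and ø, are left unchanged. For instance, Dmitrij_Dmitrievič_Šostakovič.mp3 is renamed to Dmitrij_Dmitrievic_Sostakovic.mp3. Cool, isn't it?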
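The core of the trick is Unicode normalization. The sketch below uses Remove-Diacritics, a hypothetical name for the same logic as the script's OC_translitterate function. It shows that é becomes e followed by a combining acute accent after FormD decomposition, then runs the whole list of letters above through the function to see which ones survive untouched. Letters are written as code points so the result does not depend on the encoding the .ps1 file is saved in.

```powershell
# Sketch: how FormD decomposition separates a letter from its accent,
# and which letters of the list above the strip-the-marks approach can simplify.
# Characters are built from code points, so the .ps1 file's encoding does not matter.

function Remove-Diacritics([string]$text) {   # hypothetical name; same logic as OC_translitterate
    $decomposed = $text.Normalize([System.Text.NormalizationForm]::FormD)
    $mark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $kept = foreach ($ch in $decomposed.ToCharArray()) {
        if ([System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch) -ne $mark) { $ch }
    }
    (-join $kept).Normalize([System.Text.NormalizationForm]::FormC)
}

# é (U+00E9) decomposes into e (U+0065) followed by a combining acute accent (U+0301)
$decomposed = ([string][char]0x00E9).Normalize([System.Text.NormalizationForm]::FormD)
($decomposed.ToCharArray() | ForEach-Object { "U+{0:X4}" -f [int]$_ }) -join " "   # U+0065 U+0301

# The letters listed above are exactly U+00E0..U+00FD minus the division sign U+00F7
$letters = 0xE0..0xFD | Where-Object { $_ -ne 0xF7 } | ForEach-Object { [string][char]$_ }
$unchanged = $letters | Where-Object { (Remove-Diacritics $_) -ceq $_ }
"Left unchanged: " + ($unchanged -join ", ")   # æ, ð, ø
```

Letters such as č and Š in the composer's name decompose the same way as é, which is why that example comes out clean.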

Here is the PowerShell code. Copy it into a .ps1 file, then set the three variables $log_folders, $log_files and $rootpath_to_translitterate to match your setup.

Also, and this is very important: make a backup of your data! I have tested this script, but I am not responsible for how you use it, and a backup lets you roll back if something goes wrong!
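If you have no backup routine in place, a plain copy of the data directory is the simplest safety net. Here is a minimal sketch, assuming there is enough free space next to the original folder. Backup-OwncloudData and the "_backup_" folder naming are hypothetical choices of mine, not something Owncloud provides.

```powershell
# Sketch: copy the data folder to a timestamped sibling folder before renaming.
# Assumes $Source is a folder (not a drive root) and that its parent has enough free space.
function Backup-OwncloudData {                        # hypothetical helper name
    param([Parameter(Mandatory = $true)][string]$Source)

    $sourceItem = Get-Item -LiteralPath $Source -ErrorAction Stop
    $stamp = Get-Date -Format "yyyyMMdd_HHmmss"
    $target = Join-Path $sourceItem.Parent.FullName ($sourceItem.Name + "_backup_" + $stamp)

    # The target does not exist yet, so Copy-Item -Recurse creates it
    # and copies the whole tree (accented names included) into it
    Copy-Item -LiteralPath $sourceItem.FullName -Destination $target -Recurse -ErrorAction Stop
    return $target
}

# Usage (example path from the script below):
# $backup = Backup-OwncloudData -Source "z:\owncloud_data"
```

A copy on the same disk will not survive a disk failure, of course, but it is enough to undo a bad rename.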
# Defining the transliteration function: decompose each character (FormD),
# drop the combining accent marks (NonSpacingMark), then recompose (FormC)

function OC_translitterate {
    param(
        [string]$inputString
    )
    [string]$formD = $inputString.Normalize(
            [System.Text.NormalizationForm]::FormD
    )
    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $stringBuilder = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $formD.Length; $i++){
        $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
        if($unicodeCategory -ne $nonSpacingMark){
            $stringBuilder.Append($formD[$i]) | Out-Null
        }
    }
    $string = $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
    return $string
}

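# Variables definition: the two log files and the Owncloud data directory to process
# (creating files in the root of C: usually requires administrator rights; any writable path works)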

$log_folders = "c:\list_folders.log"
$log_files = "c:\list_files.log"
$rootpath_to_translitterate = "z:\owncloud_data\"

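# Moving to the Owncloud data folder (stop here if it cannot be reached,
# otherwise the renaming below would run in the wrong folder)

Set-Location -LiteralPath $rootpath_to_translitterate -ErrorAction Stop

# Purging old logs (no error if they do not exist yet)

Remove-Item $log_folders -Force -ErrorAction SilentlyContinue
Remove-Item $log_files -Force -ErrorAction SilentlyContinue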

# Resetting the counter

$counter = 0

# Writing to the folders log the folders that need renaming
# (tab-separated: counter, parent folder, current name, transliterated name)

Get-ChildItem -Recurse | Where-Object {$_.PSIsContainer} | ForEach-Object {
    $corrected = OC_translitterate $_.Name
    if ($_.Name -cne $corrected) {
        $counter++
        "$counter`t$($_.Parent.FullName)`t$($_.Name)`t$corrected" | Out-File -FilePath $log_folders -Append
    }
}

# Resetting the counter

$counter = 0

# Writing to the files log the files that need renaming (same columns)

Get-ChildItem -Recurse | Where-Object {-not $_.PSIsContainer} | ForEach-Object {
    $corrected = OC_translitterate $_.Name
    if ($_.Name -cne $corrected) {
        $counter++
        "$counter`t$($_.DirectoryName)`t$($_.Name)`t$corrected" | Out-File -FilePath $log_files -Append
    }
}

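# Launching Notepad to display the two logs for checking
# (a log only exists if something needs renaming)

if (Test-Path $log_folders) { notepad $log_folders }
if (Test-Path $log_files) { notepad $log_files }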

# Asking for confirmation before proceeding

$yes = New-Object System.Management.Automation.Host.ChoiceDescription "&Yes",""
$no = New-Object System.Management.Automation.Host.ChoiceDescription "&No",""
$choices = [System.Management.Automation.Host.ChoiceDescription[]]($yes,$no)
$caption = "Warning!"
$message = "Do you want to transliterate the names listed in the logs? (MAKE A BACKUP FIRST!!!)"
$result = $Host.UI.PromptForChoice($caption,$message,$choices,0)

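# If confirmed, rename. All items are collected and sorted by path length, longest first,
# so that every file and subfolder is renamed before the folder containing it
# (renaming a parent first would break the paths of its children).

if($result -eq 0) {
    Write-Host "You answered YES. Renaming..."
    Get-ChildItem -Recurse | Sort-Object { $_.FullName.Length } -Descending | ForEach-Object {
        $newName = OC_translitterate $_.Name
        if ($_.Name -cne $newName) {
            $target = Join-Path ([System.IO.Path]::GetDirectoryName($_.FullName)) $newName
            if (Test-Path -LiteralPath $target) {
                # e.g. "é.txt" and "e.txt" in the same folder: do not overwrite, just report
                Write-Warning "Skipped '$($_.FullName)': '$newName' already exists."
            } else {
                Rename-Item -LiteralPath $_.FullName -NewName $newName
            }
        }
    }
}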

# If not confirmed, exit

if($result -eq 1){Write-Host "You answered NO. Exiting."}
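OC_translitterate only strips accents that Unicode lets it decompose. Letters with no canonical decomposition pass through unchanged: æ, ð and ø from the list above, but also ß, œ and ł. One way to handle them is an explicit replacement table applied before the accent stripping. Here is a sketch; the function name OC_translitterate_extended and the particular mappings (æ to ae, ø to o, and so on) are my own assumptions, so adjust them to taste.

```powershell
# Sketch: explicit replacements for letters that FormD cannot decompose,
# followed by the same accent stripping as OC_translitterate.
# The mapping choices are assumptions; letters are given as code points
# so the .ps1 file's encoding does not matter.

$OC_fallback = @(
    @(0x00C6, "AE"), @(0x00E6, "ae"),   # Æ æ
    @(0x00D0, "D"),  @(0x00F0, "d"),    # Ð ð
    @(0x00D8, "O"),  @(0x00F8, "o"),    # Ø ø
    @(0x00DF, "ss"),                    # ß
    @(0x0152, "OE"), @(0x0153, "oe"),   # Œ œ
    @(0x0141, "L"),  @(0x0142, "l")     # Ł ł
)

function OC_translitterate_extended {     # hypothetical name
    param([string]$inputString)

    # 1. Replace letters that have no decomposition (String.Replace is case-sensitive)
    foreach ($pair in $OC_fallback) {
        $inputString = $inputString.Replace([string][char]($pair[0]), $pair[1])
    }

    # 2. Decompose (FormD), drop the combining accent marks, recompose (FormC)
    $formD = $inputString.Normalize([System.Text.NormalizationForm]::FormD)
    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $sb = New-Object System.Text.StringBuilder
    foreach ($ch in $formD.ToCharArray()) {
        if ([System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch) -ne $nonSpacingMark) {
            [void]$sb.Append($ch)
        }
    }
    return $sb.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}
```

To use it, define it next to OC_translitterate and call OC_translitterate_extended instead in the log and rename pipelines. Note that a plain -replace would not work as well here, because PowerShell's -replace is case-insensitive by default and would turn Æ into "ae".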
I hope you will find this script useful. Do not hesitate to suggest improvements or alternative solutions! Comments are welcome, as always. Have a great New Year's Eve!
