SVN + OS X + “Umlaute”

March 23rd, 2010 by nils

The Mac OS file system, Mac OS Extended (Journaled), stores umlaut characters in decomposed form, as two separate code points (e.g. ‘a’ followed by the combining diaeresis ‘¨’, U+0308). This is referred to as NFD, or Normalization Form D (canonical decomposition); see “Unicode Standard Annex #15 – Unicode Normalization Forms”, http://unicode.org/reports/tr15/#Norm_Forms.
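To make the difference visible, here is a minimal Python sketch using the standard `unicodedata` module. It only illustrates the two normalization forms; it does not touch the file system, and the variable names are made up for this example.

```python
import unicodedata

# 'ä' as a single precomposed code point (the form NFC produces)
composed = "\u00e4"

# The same character after canonical decomposition (NFD):
# 'a' (U+0061) followed by the combining diaeresis (U+0308)
decomposed = unicodedata.normalize("NFD", composed)

print([hex(ord(c)) for c in composed])    # ['0xe4']
print([hex(ord(c)) for c in decomposed])  # ['0x61', '0x308']

# Both strings render as 'ä', but a plain comparison treats them as different
print(composed == decomposed)             # False
```

Both strings look identical on screen, which is what makes this kind of problem hard to spot.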

This behavior can have unfortunate side effects in applications. Remote applications that identify files by path and interact with different operating systems are especially likely to run into problems, because the same visible name can arrive as a different sequence of code points.

I came across this when I tried to access a Subversion repository that contained file names with German umlauts from my Mac. I am running Subversion 1.6.5, and when I check out a file with an umlaut in its name, executing “svn stat” lists the file twice: once as missing (with an ‘!’) and once as unversioned (with a ‘?’). A search in the CollabNet discussion forums finally confirmed that this is a known issue. The following links provide some documentation:
http://www.opensimwiki.net/index.php/SVN
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

However, the Subversion issue is just one specific bug. For application developers, the important point is the general concept behind it: Unicode equivalence. The Wikipedia article (http://en.wikipedia.org/wiki/Unicode_equivalence) mentions a bug in Samba that is caused by different representations of Unicode characters.
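For your own code, one common way to guard against this is to normalize names to a single form before comparing them. The following is only a sketch in Python: the helper names `normalize_name` and `same_name` and the file name are invented for illustration, and this is not how Subversion itself handles the problem.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return the name in NFC so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """Compare two file names by Unicode canonical equivalence."""
    return normalize_name(a) == normalize_name(b)


# Hypothetical example: the name as a repository might store it (composed)
repo_name = "M\u00fcller.txt"
# ... and the name as a decomposing file system might hand it back
disk_name = "Mu\u0308ller.txt"

print(repo_name == disk_name)           # False: different code point sequences
print(same_name(repo_name, disk_name))  # True: canonically equivalent
```

Choosing NFC here is an assumption; what matters is that both sides of the comparison are normalized to the same form.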

So, next time you come across an issue that involves a Mac and umlauts, Unicode equivalence might be the term to look for.
