SVN + OS X + “Umlaute”

March 23rd, 2010 by nils

The Mac OS file system, Mac OS Extended (Journaled), stores umlaut characters in file names as two separate code points (e.g. ‘a’ followed by a combining ‘¨’). This representation is called NFD, Normalization Form D (canonical decomposition); see “Unicode Standard Annex #15 – Unicode Normalization Forms”.
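To make the difference concrete, here is a small Python sketch using the standard `unicodedata` module. It does not touch the file system. It only shows that the precomposed form (NFC, a single code point U+00E4) and the decomposed form (NFD, ‘a’ plus U+0308 COMBINING DIAERESIS) are different strings that look identical, and that normalizing both to the same form makes them compare equal.

```python
import unicodedata

# "ä" as a single precomposed code point (U+00E4), i.e. the NFC form
composed = "\u00e4"
# "ä" as 'a' followed by COMBINING DIAERESIS (U+0308), i.e. the NFD form
decomposed = "a\u0308"

# Both render as "ä", but they are different code point sequences
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2

# Normalizing to a common form makes them comparable
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# The UTF-8 bytes differ as well, and bytes are what end up in paths
print(composed.encode("utf-8"))    # b'\xc3\xa4'
print(decomposed.encode("utf-8"))  # b'a\xcc\x88'
```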

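This behavior can have unfortunate side effects in applications. Applications that work with paths and exchange them with other operating systems, such as clients talking to a remote server, are especially likely to run into problems, because other systems typically use the precomposed form (NFC), in which ‘ä’ is the single code point U+00E4.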

I came across this when I tried to access a Subversion repository containing file names with German umlauts from my Mac. I am running Subversion 1.6.5, and when I check out a file with an umlaut in its name, running “svn stat” lists the file twice: once as missing (marked with ‘!’) and once as unversioned (marked with ‘?’). The name recorded in the repository uses the precomposed form, while the file system hands back the decomposed form after checkout, so Subversion sees two different file names. A search of the CollabNet discussion forums finally confirmed that this is a known issue. The following links provide some documentation:

However, the Subversion problem is just one specific bug. For application developers, the important concept to keep in mind is Unicode equivalence: the same visible text can be encoded as different sequences of code points. The Wikipedia article on Unicode equivalence mentions a bug in Samba caused by such differing representations of Unicode characters.
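The “svn stat” symptom can be modeled in a few lines. The sketch below is not how Subversion works internally. The functions `naive_status`, `nfc` and `normalized_status` and the file name are hypothetical, chosen only for illustration. It assumes the tool compares the names it has recorded against the names found on disk code point by code point. With an NFC name on record and an NFD name on disk, the same file shows up as both missing and unversioned; comparing under Unicode equivalence (normalizing both sides to NFC) makes the duplicate entry disappear.

```python
import unicodedata

def naive_status(recorded, on_disk):
    """Hypothetical path-based comparison: names must match exactly."""
    status = []
    for name in sorted(recorded - on_disk):
        status.append(("!", name))  # recorded but not found: "missing"
    for name in sorted(on_disk - recorded):
        status.append(("?", name))  # found but not recorded: "unversioned"
    return status

def nfc(names):
    """Normalize every name to NFC so equivalent spellings compare equal."""
    return {unicodedata.normalize("NFC", n) for n in names}

def normalized_status(recorded, on_disk):
    """Same comparison, but under Unicode equivalence."""
    return naive_status(nfc(recorded), nfc(on_disk))

# Hypothetical example: the repository records the precomposed name,
# while the directory listing returns the decomposed one.
recorded = {"M\u00e4rz.txt"}   # 'ä' as U+00E4 (NFC)
on_disk = {"Ma\u0308rz.txt"}   # 'a' + U+0308 (NFD)

# ascii() makes the invisible difference between the two names visible
for code, name in naive_status(recorded, on_disk):
    print(code, ascii(name))
# ! 'M\xe4rz.txt'
# ? 'Ma\u0308rz.txt'

print(normalized_status(recorded, on_disk))  # []
```

Normalizing to NFC here is just one choice; normalizing both sides to NFD would work equally well, as long as both sides use the same form.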

So, next time you come across an issue that involves a Mac and umlauts, Unicode equivalence might be the term to look for.