
0x202A in filename: Why?

I recently needed to do an ISNULL in SQL on a varbinary image. So far so (ab)normal. I quickly wrote a C# program to read in the file no_image.png from my desktop and output the bytes as a hex string.

That program started like this:

```csharp
byte[] ba = System.IO.File.ReadAllBytes(@"‪D:\UserName\Desktop\no_image.png");
Console.WriteLine(ba.Length);
// From here, change ba to hex string
```
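The snippet stops before the hex conversion. One way to finish it is `BitConverter.ToString` with the dashes removed. This is a minimal sketch that assumes the goal is a T-SQL varbinary literal (`0x...`) to use as the ISNULL fallback. The class and method names (`SqlHexLiteral`, `FromBytes`, `FromFile`) are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper: turns a file's bytes into a T-SQL varbinary literal
// such as 0x89504E47..., e.g. for ISNULL(SomeImageColumn, 0x...).
// Assumption: the file is small enough to paste into a query.
public static class SqlHexLiteral
{
    public static string FromBytes(byte[] bytes)
    {
        // BitConverter.ToString yields "89-50-4E-47" (uppercase); drop the dashes.
        return "0x" + BitConverter.ToString(bytes).Replace("-", "");
    }

    public static string FromFile(string path)
    {
        return FromBytes(File.ReadAllBytes(path));
    }
}
```

For example, calling `SqlHexLiteral.FromFile` with the path typed by hand (not pasted from the Security tab) gives a string you can paste into the query. `SomeImageColumn` above is only a placeholder name. On .NET 5 and later, `Convert.ToHexString(bytes)` performs the same conversion without the `Replace`.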

As I had used ReadAllBytes countless times before, I figured it was no big deal. To my surprise, I got a NotSupportedException on ReadAllBytes.

I found that the problem occurs when I right-click the file, go to the "Security" tab, and copy-paste the object name (starting the selection at the <strong>right</strong> and moving imprecisely to the left).

And it happens only on Windows 8.1 (and perhaps 8), but not on Windows 7.

<img src="https://i.stack.imgur.com/ulwNn.png" alt="202A">

When I output the string in question:

```csharp
public static string ToHexString(string input)
{
    string strRetVal = null;
    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    foreach (char c in input)
    {
        // "X2" pads to at least two hex digits; chars above U+00FF print four
        sb.Append(((int)c).ToString("X2"));
    }

    strRetVal = sb.ToString();
    sb.Length = 0;
    sb = null;
    return strRetVal;
} // End Function ToHexString

string str = ToHexString(@"‪D:\UserName\Desktop\cookie.png");
string strRight = " (" + ToHexString(@"D:\UserName\Desktop\cookie.png") + ")"; // Correct value, for comparison
string msg = str + Environment.NewLine + " " + strRight;
Console.WriteLine(msg);
```

I get this:

```
202A443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67
  (443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67)
```

First, when I look up 20 2A in ASCII, I get [space] + *.

Since I see neither a space nor a star, I googled 20 2A, and the first hit was section 202a of the German Penal Code: http://dejure.org/gesetze/StGB/202a.html

But I suppose that is an unfortunate coincidence, and it is actually the Unicode control character 'LEFT-TO-RIGHT EMBEDDING' (U+202A): http://www.fileformat.info/info/unicode/char/202a/index.htm
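The 20 2A misreading comes from how the hex output is built. `ToString("X2")` pads to at least two digits, so a character above U+00FF prints as four digits that run straight into its neighbours. The sketch below prints each UTF-16 code unit as a separate entry together with its Unicode category, which makes the invisible character stand out. `CharDump.Describe` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: describes every UTF-16 code unit of a string as
// "U+XXXX Category". Characters outside the BMP appear as two surrogate entries.
public static class CharDump
{
    public static string Describe(string input)
    {
        var parts = new List<string>();
        foreach (char c in input)
        {
            // X4 always gives four hex digits, one entry per code unit
            parts.Add($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
        }
        return string.Join(", ", parts);
    }
}
```

Calling `CharDump.Describe("\u202AD:")` returns `U+202A Format, U+0044 UppercaseLetter, U+003A OtherPunctuation`. The `Format` category (Cf) is the one that contains invisible formatting controls such as U+202A.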

Is that a bug or a feature? My guess: it's a buggy feature.

The issue is that the string does not begin with the letter D at all; it just looks like it does.

It appears that the string is hard-coded in your source file.

If that's the case, then you pasted the string from the Security dialog. Unbeknownst to you, the pasted string begins with the LRE character (U+202A LEFT-TO-RIGHT EMBEDDING). This invisible character takes up no space. It tells the renderer to lay out the text that follows as a left-to-right embedding, regardless of the usual bidirectional rendering.

You just need to delete the character.

To do this, place the cursor AFTER the D in the string. Press Backspace once to delete the D, once more to delete the invisible LRE character, and a third time to delete the opening ". Then retype the " and the D.

A similar problem can occur wherever the string comes from, e.g. user input, the command line, or a script file.
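When the path comes from user input instead of source code, you cannot delete the character by hand. The sketch below removes every character in Unicode category Format (Cf) before the path is used. It assumes that dropping all such characters is acceptable in your environment. `PathInput.StripFormatChars` is a hypothetical name.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: removes invisible Unicode "format" characters (category Cf),
// which include U+202A..U+202E, U+200B and U+FEFF, from a user-supplied path
// before it is passed to File.ReadAllBytes.
public static class PathInput
{
    public static string StripFormatChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Keep everything except format/control-of-layout characters
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.Format)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The filter is deliberately broad. Cf also covers the zero-width joiner and non-joiner (U+200D, U+200C), which some scripts use legitimately. If that matters, strip only the bidirectional control characters.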

<strong>Note:</strong> The Security dialog shows the filename beginning with the LRE character so that the characters are displayed in left-to-right order. This is necessary for the hierarchy to be understood correctly when the path contains RTL characters. For example, a path c:\folder\path\to\file in Arabic might be c:\folder\مسار/إلى/ملف. The "gotcha" is that the Arabic parts read in the other direction. According to Google Translate, the word "path" is مسار, and it is the rightmost word. That makes it look like the last element of the path, when in fact it is the element immediately after "c:\folder\".

Because security object paths have a hierarchy that conflicts with the RTL text layout rules, the Security dialog always displays RTL text in LTR mode. As a result, the Arabic words are mangled (letters in the wrong order) on the Security tab; imagine it as if it said "elif ot htap". The meaning is barely discernible, but the security semantics are preserved.

Malware commonly creates filenames that contain RLO/LRO override characters. For example, a file named "photo_[RLO]gnp.exe", where [RLO] is U+202E RIGHT-TO-LEFT OVERRIDE, is displayed as "photo_exe.png". You probably have an infected host, or the origin of the .png is infected.
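The sketch below checks whether such names exist in a folder by looking for bidirectional control characters in the file names. The `BidiCheck` class is hypothetical. Treating the embeddings, overrides, isolates and the LRM/RLM marks as suspicious is an assumption about what counts as suspicious.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper: flags names containing Unicode bidirectional controls,
// which can make e.g. "photo_<U+202E>gnp.exe" display as "photo_exe.png".
public static class BidiCheck
{
    // U+202A..U+202E: LRE, RLE, PDF, LRO, RLO
    // U+2066..U+2069: LRI, RLI, FSI, PDI
    // U+200E, U+200F: LRM, RLM (invisible direction marks)
    private static bool IsBidiControl(char c) =>
        (c >= '\u202A' && c <= '\u202E') ||
        (c >= '\u2066' && c <= '\u2069') ||
        c == '\u200E' || c == '\u200F';

    public static bool HasBidiControls(string name) => name.Any(IsBidiControl);

    // Returns full paths of files in 'directory' whose file name contains a bidi control
    public static string[] SuspiciousNames(string directory) =>
        Directory.GetFiles(directory)
                 .Where(p => HasBidiControls(Path.GetFileName(p)))
                 .ToArray();
}
```

`HasBidiControls` also works on a pasted string, so it returns true for the U+202A-prefixed path literal from the question.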

If you step through it in your debugger, you will see that the 'D' char in your @"‪D:\UserName\Desktop\cookie.png" (first call to ToHexString) is NOT the same char as in @"D:\UserName\Desktop\cookie.png" (second call).
It looks exactly the same, but in reality it's not even a single char (try watching the c variable in your ToHexString function).