How to test for non ASCII characters in a file name - c#

On an ASP.NET web site, a user tried to upload a file as an email attachment that contained an emdash in the file name. When sending this as an email attachment (Exchange server), the file got converted to _utf8_B_****.dat
So, on a .aspx page, I need to be able to detect if an emdash is present in the filename of a file that is uploaded as part of the Request.Files collection.
string s = "a—b-";
byte[] arr = Encoding.ASCII.GetBytes(s);
foreach (byte element in arr)
{
    Response.Write(element.ToString() + ",");
}
The string above has an emdash as the second character and a normal hyphen as the fourth character.
The code above prints 97,63,98,45 to the screen.
I assumed that as an emdash is not a valid ASCII character, either an error would be thrown or some indication shown that it was not a valid ASCII character. Yet it returns 63.
How can I detect an emdash in a file name so I can say to the user 'Your file name has an invalid character in it'? I have seen other questions on this issue, I can't get them to work.

This should probably do the trick:
foreach (char c in s) {
    if (c >= 128) {
        // Response.Write has no format-string overload, so format the message first
        Response.Write(string.Format("Non-ASCII char detected: {0}", c));
    }
}
Encoding.ASCII.GetBytes converts to ASCII first and replaces every character it cannot represent with '?' (byte 63), so you should never see non-ASCII values, or an error, when you call it.
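If you would rather have the encoder fail than silently substitute '?', one option is an ASCII encoding built with an exception fallback. This is only a minimal sketch; the class and method names are hypothetical and not part of the original answer:

```csharp
using System;
using System.Text;

public static class AsciiCheck
{
    // Hypothetical helper: true when every character is in the 7-bit ASCII range.
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127)
            {
                return false;
            }
        }
        return true;
    }

    // Alternative: ask for an ASCII encoding that throws instead of writing '?'.
    public static bool EncodesAsAscii(string value)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(value);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ASCII representation.
            return false;
        }
    }
}
```

On the upload page this could be called as AsciiCheck.IsAscii(Path.GetFileName(postedFile.FileName)), where postedFile stands for one entry of Request.Files.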

How can I detect an emdash in a file name so I can say to the user 'Your file name has an invalid character in it'?
That's the wrong way around, because tomorrow a user will upload a file with another Unicode character your filesystem or its API doesn't support. Besides, you don't need ASCII, because NTFS can handle a lot more than 7 bits per character.
The right question is: "What characters can I use to save a file"? But then again you'll be tied to the filesystem implementation. You'd best just generate a random filename and write the file to that path, and store the filename in a database so you can view the original filename.
If you do want to save the file under the user-provided path, you'll have to remove Path.GetInvalidPathChars() and Path.GetInvalidFileNameChars() from the input.
If the problem is not the filesystem but the mail system, please show relevant code and error message.
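Both suggestions above (a random stored name, or stripping invalid characters) can be sketched roughly as follows. The class and method names are hypothetical, and replacing with an underscore rather than removing is just one possible choice:

```csharp
using System;
using System.IO;
using System.Text;

public static class FileNameSanitizer
{
    // Hypothetical helper: replaces every character that
    // Path.GetInvalidFileNameChars() reports for the current platform with '_'.
    public static string Sanitize(string fileName)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        var builder = new StringBuilder(fileName.Length);
        foreach (char c in fileName)
        {
            builder.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        return builder.ToString();
    }

    // Hypothetical helper: a random on-disk name that keeps only the extension.
    // The original name would be stored elsewhere (for example in a database).
    public static string RandomStorageName(string originalFileName)
    {
        string extension = Path.GetExtension(Sanitize(originalFileName));
        return Guid.NewGuid().ToString("N") + extension;
    }
}
```

Note that the set returned by Path.GetInvalidFileNameChars() depends on the platform, and that an emdash is a valid file name character on NTFS, so this does not by itself address a mail-side restriction.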

Related

Write text to file in C# with 513 space characters

Here is code that writes the string to a file:
System.IO.File.WriteAllText("test.txt", "P ");
It's basically the character 'P' followed by a total of 513 space characters.
When I open the file in Notepad++, it appears to be fine. However, when I open in windows Notepad, all I see is garbled characters.
If, instead of 513 space characters, I add 514 or 512, it opens fine in Notepad.
What am I missing?
What you are missing is that Notepad is guessing, and it is not because your length is specifically 513 spaces ... it is because it is an even number of bytes and the file size is >= 100 total bytes. Try 511 or 515 spaces ... or 99 ... you'll see the same misinterpretation of your file contents. With an odd number of bytes, Notepad can assume that your file is not any of the double-byte encodings, because those would all result in 2 bytes per character = even number of total bytes in the file. If you give the file a few more low-order ASCII characters at the beginning (e.g., "PICKLE" + spaces), Notepad does a much better job of understanding that it should treat the content as single-byte chars.
The suggested approach of including Encoding.UTF8 is the easiest fix ... it will write a BOM to the beginning of the file which tells Notepad (and Notepad++) what the format of the data is, so that it doesn't have to resort to this guessing behavior (you can see the difference between your original approach and the BOM approach by opening both in Notepad++, then look in the bottom-right corner of the app. With the BOM, it will tell you the encoding is UTF-8-BOM ... without it, it will just say UTF-8).
I should also say that the contents of your file are not 'wrong', per se... the weird format is purely due to Notepad's "guessing" algorithm. So unless it's a requirement that people use Notepad to read your file with 1 letter and a large, odd number of spaces ... maybe just don't sweat it. If you do change to writing the file with Encoding.UTF8, then you do need to ensure that any other system that reads your file knows how to honor the BOM, because it is a real change to the contents of your file. If you cannot verify that all consumers of your file can/will handle the BOM, then it may be safer to just understand that Notepad happens to make a bad guess for your specific use case, and leave the raw contents exactly how you want them.
You can verify the physical difference in your file with the BOM by doing a binary read and then converting them to a string (you can't "see" the change with ReadAllText, because it honors & strips the BOM):
byte[] contents = System.IO.File.ReadAllBytes("test.txt");
Console.WriteLine(Encoding.ASCII.GetString(contents));
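A more direct check is to look at the first three bytes, since the UTF-8 BOM is the byte sequence EF BB BF. A small sketch, with a hypothetical class and method name:

```csharp
using System.IO;
using System.Text;

public static class BomCheck
{
    // Hypothetical helper: true when the file starts with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] contents = File.ReadAllBytes(path);
        return contents.Length >= 3
            && contents[0] == 0xEF
            && contents[1] == 0xBB
            && contents[2] == 0xBF;
    }
}
```

A file written with File.WriteAllText(path, text) should return false here, and one written with File.WriteAllText(path, text, Encoding.UTF8) should return true.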
Try passing in a different encoding:
i. System.IO.File.WriteAllText(filename , stringVariable, Encoding.UTF8);
ii. System.IO.File.WriteAllText(filename , stringVariable, Encoding.UTF32);
iii. etc.
Also, you could try another way to build your string, to make it easier to read, change and count, instead of tapping the space bar 513 times:
i. Use the string constructor (like @Tigran suggested)
var result = "P" + new String(' ', 513);
ii. Use the stringBuilder
var stringBuilder = new StringBuilder();
stringBuilder.Append("P");
for (var i = 1; i <= 513; i++) { stringBuilder.Append(" "); }
iii. Or both
public string AppendSpacesToString(string stringValue, int numberOfSpaces)
{
    var stringBuilder = new StringBuilder();
    stringBuilder.Append(stringValue);
    stringBuilder.Append(new String(' ', numberOfSpaces));
    return stringBuilder.ToString();
}

C# Unrecognized characters while reading from binary file

I have some items whose information is split into two parts: one is the contents of a binary file, and the other is a textual entry inside a .txt file. I am trying to make an app that will pack this info into one textual file (textual because I want this file to be human-readable as well), with the ability to later unpack that file by creating a new binary file and text entry.
The first problem I ran into so far: some info is lost when converting binary into string (or perhaps sooner, during reading of bytes), and I'm not sure if the file is in weird format or I'm doing something wrong. Some characters get shown as question marks.
Example of characters which are replaced with question marks:
ýÿÿ
This is the part where info is read from the binary file and gets encoded into a string (which is how I intended to store it inside a text file).
byte[] binaryFile = File.ReadAllBytes(pathBinary);
// I also tried this for some reason: byte[] binaryFile = Encoding.ASCII.GetBytes(File.ReadAllText(pathBinary));
string binaryFileText = Convert.ToBase64String(binaryFile); //this is the coded string that goes into joined file to hold binary file information, when decoded the result shows question marks instead of some characters
MessageBox.Show("binary file text: " + Encoding.ASCII.GetString(binaryFile), "debug", MessageBoxButtons.OK, MessageBoxIcon.Information); //this also shows question marks
I expect a few more caveats along the way with second functionality of the app (unpacking back into text and binary), but so far my main problem is unrecognized characters during reading of the binary file or converting it into string, which makes this data unusable in storing as text for purpose of reproducing the file. Any help would be appreciated.
There is no universal conversion of binary data to a string. A string is a series of Unicode characters and as such can hold any character of the Unicode range.
Binary data is a series of bytes and as such can be anything from video to a string in various formats.
Since there are multiple binary string representations, you need an Encoding to convert one into the other. The encoding you choose has to match the binary string format. If it doesn't you will get the wrong result.
You are using ASCII encoding for the conversion, which is obviously incorrect. ASCII can not encode the full unicode range. That means even if you use it for encoding, the result of the decoding will not always match the original text.
If you have both, encoding and decoding under control, use an Encoding that can do the full round trip, such as UTF8 or Unicode. If you don't encode the string yourself, use the correct Encoding.
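When the bytes are arbitrary binary data rather than encoded text, a byte-to-text scheme such as Base64 (which the question already uses) round-trips every byte value, whereas Encoding.ASCII.GetString replaces bytes above 127 with '?'. A minimal sketch with hypothetical names:

```csharp
using System;

public static class BinaryText
{
    // Hypothetical helper: turns arbitrary bytes into text that is safe to store in a text file.
    public static string Pack(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Hypothetical helper: restores the exact original bytes from the packed text.
    public static byte[] Unpack(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

To rebuild the file, the unpacked bytes would be written with File.WriteAllBytes rather than decoded into a string first.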

German character ß encoding in Livelink using C#

I have a folder name that contains German special characters such as äÄéöÖüß. The following screenshot displays the contents of the LiveLink server.
I want to extract the folder from the Livelink server using C#.
value is obtained from LLserver.
var bytes = new List<byte>(value.Length);
foreach (var c in value)
{
    bytes.Add((byte)c);
}
var result = Encoding.UTF8.GetString(bytes.ToArray());
Finally, the result is äÄéöÖü�x, where ß is shown as the box character '�x'. All other characters present in the folder name are decoded properly except the ß character.
I am just wondering why the same code works for all other German special characters but not for ß.
Could anybody help to fix this problem in C#?
Thanks in advance.
Go to the admin panel of the Livelink server (Livelink/livelink.exe?func=admin.sysvars),
set Character Set: UTF-8,
and change the code section as follows:
byte[] bytes = Encoding.Default.GetBytes(value);
var retValue = Encoding.UTF8.GetString(bytes);
It works fine.
You guessed your encoding to be UTF8 and it obviously is not. You will need to find out what encoding the byte stream really represents and use that instead. We cannot help you with that, you will have to ask the sender of said bytes.

How to read double quotes (") in a text file in C#?

I have to read a text file and then to parse it, in C# using VS 2010. The sample text is as follows,
[TOOL_TYPE]
; provides the name of the selected tool for programming
“Phoenix Select Advanced”;
[TOOL_SERIAL_NUMBER]
; provides the serial number for the tool
7654321;
[PRESSURE_CORRECTION]
; provides the Pressure correction information requirement
“Yes”;
[SURFACE_MOUNT]
; provides the surface mount information
“Yes”;
[SAPPHIRE_TYPE]
; provides the sapphire type information
“No”;
Now I have to parse only the string data (in double quotes) and headers (in square brackets[]), and then save it into another text file. I can successfully parse the headers but the string data in double quotes is not appearing correctly, as shown below.
[TOOL_TYPE]
�Phoenix Select Advanced�;
[TOOL_SERIAL_NUMBER]
7654321;
[PRESSURE_CORRECTION]
�Yes�;
[SURFACE_MOUNT]
�Yes�;
[SAPPHIRE_TYPE]
�No�;
[EXTENDED_TELEMETRY]
�Yes�;
[OVERRIDE_SENSE_RESISTOR]
�No�;
Please note the special character (�) that appears wherever a double quote should appear.
How can I write the double quotes(") in the destination file and avoid (�) ?
Update
I am using the following line for my parsing
temporaryconfigFileWriter.WriteLine(configFileLine, false, Encoding.Unicode);
Here is the complete code I am using:
string temporaryConfigurationFileName = System.Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\Temporary_Configuration_File.txt";
//Pointers to read from Configuration File 'configFileReader' and to write to Temporary Configuration File 'temporaryconfigFileWriter'
StreamReader configFileReader = new StreamReader(CommandLineVariables.ConfigurationFileName);
StreamWriter temporaryconfigFileWriter = new StreamWriter(temporaryConfigurationFileName);
//Check whether the 'END_OF_FILE' header is specified or not, to avoid searching for end of file indefinitely
if ((File.ReadAllText(CommandLineVariables.ConfigurationFileName)).Contains("[END_OF_FILE]"))
{
    //Read the file until it reaches the 'END_OF_FILE'
    while (!((configFileLine = configFileReader.ReadLine()).Contains("[END_OF_FILE]")))
    {
        configFileLine = configFileLine.Trim();
        if (!(configFileLine.StartsWith(";")) && !(string.IsNullOrEmpty(configFileLine)))
        {
            temporaryconfigFileWriter.WriteLine(configFileLine, false, Encoding.UTF8);
        }
    }
    // to write the last header [END_OF_FILE]
    temporaryconfigFileWriter.WriteLine(configFileLine);
    configFileReader.Close();
    temporaryconfigFileWriter.Close();
}
Your input file doesn't contain double quotes, that's a lie. It contains the typographic opening and closing double quotes (“ is U+201C, ” is U+201D), not the standard straight one.
First you must ensure that you are reading your input with the correct encoding (try multiple ones and just display the string in a textbox in C#; you'll see pretty fast whether it shows the characters correctly).
If you want such characters to appear in your output, you must write the output file as something other than ASCII. If you write it as UTF-8, for example, you should ensure that it starts with the Byte Order Mark (otherwise it will be readable, but some software like Notepad will display 2 characters, as it won't detect that the file isn't ASCII).
Another choice is to simply replace “ and ” with "
It appears that you are using proper typographic quotes (“...”) instead of the straight ASCII ones ("..."). My guess would be that you read the text file with the wrong encoding.
If you can see them properly in Notepad and neither ASCII nor one of the Unicode encodings works, then it's probably codepage 1252. You can get that encoding via
Encoding.GetEncoding(1252)
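Putting the two answers together: the encoding has to be chosen when the reader and writer are constructed. StreamWriter.WriteLine(string, object, object) is the format-string overload, so the extra arguments in the question's WriteLine call do not select an encoding. The following is only a sketch; the class and method names are hypothetical, and whether the source really is codepage 1252 is an assumption to verify against the actual file:

```csharp
using System.IO;
using System.Text;

public static class ConfigCopy
{
    // Hypothetical helper: copies every non-empty, non-comment line from sourcePath
    // to targetPath, decoding with sourceEncoding and writing UTF-8 with a BOM.
    public static void CopyWithoutComments(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        using (var reader = new StreamReader(sourcePath, sourceEncoding))
        using (var writer = new StreamWriter(targetPath, false, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0 && !line.StartsWith(";"))
                {
                    writer.WriteLine(line);
                }
            }
        }
    }
}
```

A call would look like ConfigCopy.CopyWithoutComments(source, target, Encoding.GetEncoding(1252)). On .NET Framework that encoding is available out of the box; on .NET Core and later, code page encodings have to be registered first through CodePagesEncodingProvider.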

c# equivalent to stripcslashes function?

I am working on a project that includes getting MMS messages from an MMS gateway and storing the image on disk.
This includes using a received base64encoded string and storing it as a zip to a web server. This zip is then opened, and the image is retrieved.
We have managed to store it as a zip file, but it is corrupted and cannot be opened.
The documentation from the gateway is pretty sparse, and we have only a PHP example to rely on. I think we have figured out how to "translate" most of it, except for the PHP function stripcslashes(inputvalue). Can anyone shed any light on how to do the same thing in C#?
We are thankful for any help!
stripcslashes() looks for "\x" type elements within longer strings (where 'x' could be any character, or perhaps, more than one). If the 'x' is not recognised as meaningful, it just removes the '\' but if it does recognise it as a valid C-style escape sequence (i.e. "\n" is newline; "\t" is tab, etc.), as I understand it, the recognised character is inserted instead: \t will be replaced by a tab character (0x09, I think) in your string.
I'm not aware of any simple way to get the .net framework to do the same thing without building a similar function yourself. This obviously isn't very hard, but you need to know which escape sequences to process.
If you happen to know (or find out by inspecting your base64 text) that the only thing in your input that will need processing is a particular one or two sequences (say, tab characters), it becomes very easy and the following snippet shows use of String.Replace():
string input = @"Some\thing"; // '@' means string stored without processing '\t'
Console.WriteLine(input);
string output = input.Replace(@"\t", "\t");
Console.WriteLine(output);
Of course, if you really do simply want to remove all the slashes:
string output = input.Replace(@"\", "");
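If more than one or two sequences need handling, a hand-written function along the lines described above might look like the sketch below. It is not a verified port of PHP's stripcslashes(): the class and method names are hypothetical, and mapping escaped byte values one-to-one onto chars 0-255 and leaving a lone trailing backslash in place are assumptions made for this example.

```csharp
using System;
using System.Text;

public static class PhpCompat
{
    // Handles \n \t \r \a \b \f \v, hex \xHH (1-2 digits) and octal \ooo (1-3 digits).
    // Any other escaped character is kept without its backslash.
    public static string StripCSlashes(string input)
    {
        var result = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i++];
            if (c != '\\' || i >= input.Length)
            {
                result.Append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char e = input[i++];
            switch (e)
            {
                case 'n': result.Append('\n'); break;
                case 't': result.Append('\t'); break;
                case 'r': result.Append('\r'); break;
                case 'a': result.Append('\a'); break;
                case 'b': result.Append('\b'); break;
                case 'f': result.Append('\f'); break;
                case 'v': result.Append('\v'); break;
                case 'x':
                {
                    int start = i;
                    while (i < input.Length && i - start < 2 && Uri.IsHexDigit(input[i])) i++;
                    if (i == start) result.Append('x'); // no hex digits followed
                    else result.Append((char)Convert.ToInt32(input.Substring(start, i - start), 16));
                    break;
                }
                default:
                    if (e >= '0' && e <= '7')
                    {
                        int start = i - 1; // the first octal digit was already consumed
                        while (i < input.Length && i - start < 3 && input[i] >= '0' && input[i] <= '7') i++;
                        int value = Convert.ToInt32(input.Substring(start, i - start), 8);
                        result.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        result.Append(e); // unrecognised escape: drop only the backslash
                    }
                    break;
            }
        }
        return result.ToString();
    }
}
```

Because the MMS payload is binary, the resulting chars would still need to be turned back into bytes (for example by casting each char to byte) before being written to disk, which only works under the one-char-per-byte assumption stated above.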
