I have a C# desktop application. I need to generate a text file from data that I read from a SQL database. In string values such as names I want to use characters like Ã. I changed the regional settings to standard and chose the Romanian format, but in my text files I still get the character Ă.
There are more characters that have to be replaced:
Instead of Ş I need ª
Instead of Ţ I need Þ
I don't know what I should modify to replace these characters.
Can somebody help me generate my file using these characters?
If you look at this picture, you'll find the character numbers you need. Once you have read the value from the DB into a string s, you can do this:
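int charNumToReplace = 0x0102;    // code of the character to replace, e.g. Ă
int charNumThatReplaces = 0x00C3; // code of the character that replaces it, e.g. Ã
s = s.Replace((char)charNumToReplace, (char)charNumThatReplaces); // strings are immutable, so assign the result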
I don't actually see the characters you need in that picture, but in debug mode you can find their values by adding a watch on (int)myVal.
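A minimal sketch of the replacement approach above, assuming the three substitutions listed in the question are the only ones needed. The names LegacyRomanianMap and Apply are hypothetical. The target characters are what the Windows-1250 bytes for Ă, Ş and Ţ look like when they are displayed as Windows-1252, which is presumably why the consumer of the file expects them.

```csharp
using System.Collections.Generic;
using System.Text;

public static class LegacyRomanianMap
{
    // Mapping taken from the question; extend it if more characters are needed.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { '\u0102', '\u00C3' }, // Ă -> Ã
        { '\u015E', '\u00AA' }, // Ş -> ª
        { '\u0162', '\u00DE' }, // Ţ -> Þ
    };

    // Returns a copy of s with every mapped character replaced.
    public static string Apply(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            char replacement;
            sb.Append(Map.TryGetValue(c, out replacement) ? replacement : c);
        }
        return sb.ToString();
    }
}
```

Calling LegacyRomanianMap.Apply on each string value before writing it to the file keeps all other characters untouched.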
I am trying to do some sentence processing in Turkish, and I am using a text file as the database. But I cannot read Turkish characters from the text file, so I cannot process the data correctly.
string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt");
textBox1.Text = Tempdatabase[5];
Output:
It's probably an encoding issue. Try using one of the Turkish code page identifiers.
var Tempdatabase =
    File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.GetEncoding("iso-8859-9"));
You can fiddle around using Encoding as much as you like. This might eventually yield the expected result, but bear in mind that this may not work with other files.
.NET strings are Unicode (UTF-16) in memory, and File.ReadAllLines without an encoding argument decodes the file as UTF-8 unless a byte order mark says otherwise. So unless you really need something else, you should try this instead:
Open your text file in Notepad (or any other editor) and save it as a UTF-8 file. Then you should get the expected results without any modifications to your code, because the encoding of the file now matches what .NET assumes by default. This default behavior should be preferred.
This also applies to .html files inside Visual Studio, if you notice that they are displayed incorrectly (parsed with ASCII)
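If re-saving the file by hand is not practical, the same conversion can be sketched in code. This is only an illustration: the names Utf8Converter and ConvertToUtf8 are hypothetical, and the caller has to know the encoding the file was originally written in, since it cannot be guessed reliably.

```csharp
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Re-encodes a text file as UTF-8 with a byte order mark.
    // sourceEncoding must be the encoding the file was actually saved in.
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, new UTF8Encoding(true));
    }
}
```

After the conversion, File.ReadAllLines(targetPath) should decode the file correctly without an explicit encoding argument.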
The file contains the text in a specific Turkish character set, not Unicode. If you don't specify any other behaviour, .NET will assume UTF-8 when reading text from a text file. You have two possible solutions:
Either change the text file to use Unicode (for example utf8) using an external text editor.
Or specify a specific character set to read for example:
string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.Default);
On .NET Framework this uses the ANSI code page of the Windows system. On .NET Core and .NET 5+, Encoding.Default is UTF-8, so it will not help there.
string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.GetEncoding("Windows-1254"));
This will use the Turkish character set defined by Microsoft.
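On .NET Core and .NET 5+, code pages such as Windows-1254 are not available until the code page provider is registered. The following is a sketch under that assumption (on older .NET Core versions it also assumes the System.Text.Encoding.CodePages package is referenced); the class name TurkishFileReader is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class TurkishFileReader
{
    // Reads all lines of a file that was saved in the Turkish
    // Windows-1254 code page.
    public static string[] ReadLines(string path)
    {
        // Makes legacy code pages available; registering more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllLines(path, Encoding.GetEncoding(1254));
    }
}
```

On .NET Framework the registration line is not needed, because the Windows code pages are available there out of the box.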
I have a project where I need to generate a .pdf file based on the content in an .eml file. When dealing with just english characters, I'm fine, the pdf is created flawlessly and everything works (after I strip all the needless html junk).
However, an issue arises when I try to read in an .eml file that is filled with French characters. In particular, the French characters are stored as codes like =E9, =E8, or HTML entities such as &#339; (œ), and so on.
So my issue is this. I read the .eml file in with:
string content = File.ReadAllText(filePath, Encoding.UTF8);
However it comes in as plain text and I don't know how to make the system interpret the =E9 and =E8, etc., codes as French Characters. I can always Regex.Replace everything but I'm hoping for a more elegant solution. Is there any way to take in that long string of plain text and interpret the codes embedded within properly so that the french characters appear instead of their respective codes without using like 30 Regex.Replace expressions?
Do note that I can't use any built-in iTextSharp functionality, since I also need to be able to incorporate French characters (pulled from that .eml file) into the file name of the PDF.
Thanks
You can use regexes, but two regexes should be enough:
// Requires: using System.Globalization; using System.Text.RegularExpressions;
text = Regex.Replace(text, @"=([0-9A-Fa-f]{2})", match => ((char)uint.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
text = Regex.Replace(text, @"&#(\d+);", match => ((char)uint.Parse(match.Groups[1].Value)).ToString());
A different way would be to find a MIME parsing library which exposes methods for parsing parts of MIME messages, that way you'd decode the =E9 codes. Then, you'd need to call WebUtility.HtmlDecode to parse the HTML entities.
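For illustration, here is a hand-rolled sketch of what such decoding involves, without a MIME library. It assumes the body is quoted-printable and that the caller knows the charset declared in the message headers (ISO-8859-1 would fit the =E9 example). The names QuotedPrintable, Decode and DecodeBodyText are hypothetical. Unlike the per-character regex, it collects bytes first, so multi-byte charsets such as UTF-8 (=C3=A9) also decode correctly, and it drops soft line breaks.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text;

public static class QuotedPrintable
{
    // Decodes quoted-printable text into a string using the given charset.
    public static string Decode(string input, Encoding charset)
    {
        using (var bytes = new MemoryStream())
        {
            int i = 0;
            while (i < input.Length)
            {
                char c = input[i];
                // Soft line break: "=" directly followed by a line ending is dropped.
                if (c == '=' && i + 1 < input.Length && input[i + 1] == '\n') { i += 2; continue; }
                if (c == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 3; continue; }
                // Escaped byte: "=" followed by two hex digits.
                if (c == '=' && i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                {
                    bytes.WriteByte(byte.Parse(input.Substring(i + 1, 2), NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                    i += 3;
                    continue;
                }
                // Anything else is copied through in the target charset.
                byte[] raw = charset.GetBytes(c.ToString());
                bytes.Write(raw, 0, raw.Length);
                i++;
            }
            return charset.GetString(bytes.ToArray());
        }
    }

    // Decodes quoted-printable first, then HTML entities such as &#339;.
    public static string DecodeBodyText(string input, Encoding charset)
    {
        return WebUtility.HtmlDecode(Decode(input, charset));
    }
}
```

A real .eml file also has headers, multiple parts and possibly other transfer encodings, so a MIME library remains the more robust choice.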
I've searched everywhere for this answer so hopefully it's not a duplicate. I decided I'm just finally going to ask it here.
I have a file named Program1.exe When I drag that file into Notepad or Notepad++ I get all kinds of random symbols and then some readable text. However, when I try to read this file in C#, I either get inaccurate results, or just a big MZ. I've tried all supported encodings in C#. How can notepad programs read a file like this but I simply can't? I try to convert bytes to string and it doesn't work. I try to directly read line by line and it doesn't work. I've even tried binary and it doesn't work.
Thanks for the help! :)
Reading a binary file as text is a peculiar thing to do, but it is possible. Any of the 8-bit encodings will do it just fine. For example, the code below opens and reads an executable and outputs it to the console.
const string fname = @"C:\mystuff\program.exe";
// Windows-1252 maps every byte to some character, so no byte is lost while decoding
using (var reader = new StreamReader(fname, Encoding.GetEncoding("windows-1252")))
{
    var s = reader.ReadToEnd();
    s = s.Replace('\x0', ' '); // replace NUL bytes with spaces
    Console.WriteLine(s);
}
The result is very similar to what you'll see in Notepad or Notepad++. The "funny symbols" will differ based on how your console is configured, but you get the idea.
By the way, if you examine the string in the debugger, you're going to see something quite different. Those funny symbols are encoded as C# character escapes. For example, NUL bytes (value 0) will display as \0 in the debugger, as NUL in Notepad++, and as spaces on the console or in Notepad. Carriage returns and line feeds show up as \r and \n in the debugger, etc.
As I said, reading a binary file as text is pretty peculiar. Unless you're just looking to see if there's human-readable data in the file, I can't imagine why you'd want to do this.
Update
I suspect the reason that all you see in the Windows Forms TextBox is "MZ" is that the Windows textbox control (which is what the TextBox ultimately uses) treats the NUL character as a string terminator, so it won't display anything after the first NUL. And the first thing after the "MZ" is a NUL (shown as "\0" in the debugger). You'll have to replace the NULs in the string with spaces. I edited the code example above to show how you'd do that.
The exe is a binary file and if you try to read it as a text file you'll get the effect that you are describing. Try using something like a FileStream instead that does not care about the structure of the file but treats it just as a series of bytes.
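A minimal sketch of the byte-oriented approach, using File.ReadAllBytes instead of a text reader. The class and method names (BinaryPreview, ReadAll, HexPreview, LooksLikeExe) are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;

public static class BinaryPreview
{
    // Reads the whole file as raw bytes; no text decoding is involved.
    public static byte[] ReadAll(string path)
    {
        return File.ReadAllBytes(path);
    }

    // Formats the first 'count' bytes as hex pairs separated by dashes.
    public static string HexPreview(byte[] data, int count)
    {
        int n = Math.Min(count, data.Length);
        return BitConverter.ToString(data, 0, n);
    }

    // Windows executables start with the two bytes 'M' 'Z'.
    public static bool LooksLikeExe(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x4D && data[1] == 0x5A;
    }
}
```

Working on the byte array avoids the NUL terminator and encoding problems entirely; convert to text only for the parts that are known to be text.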
I have an application which uses clipboard. Users copy data from excel and paste it into my application.
However, sometimes my application cannot handle the data correctly (in production), and I cannot reproduce the problem. Because the spreadsheet the user is using contains his own macros and I am not allowed to have his original spreadsheet, I am thinking of adding code that dumps the complete clipboard text to a log file, so that I can see whether the pasted data is in the correct format.
I am especially concerned with the control characters contained in the clipboard. Therefore I want to dump the string as literal. For example, if string in the clipboard contains '\t', I want to see "\t" in the log file, instead of a tab. Is there a way to do this?
Alternatively, I can log the text in hex format. But it seems I first need to convert the string to a char[] and use System.Buffer.BlockCopy to copy the char[] to a byte[], and then print the byte[] using BitConverter.ToString(). It works fine. But is there a better way (easier, no need to copy, no need to loop through every character) to do this? Does .NET not have a built-in function for this?
I used this as reference: byte[] to hex string
Methods suggested there look a little bit "heavy".
Simply try this:
"some text \t some text".Replace("\t", @"\t");
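The single Replace above only covers tabs. A slightly broader sketch, assuming it is acceptable to loop over the characters once: the common control characters get their C# escape, any other control character is written as \uXXXX, and backslashes are doubled so the log stays unambiguous. The names ControlCharEscaper and Escape are hypothetical.

```csharp
using System.Text;

public static class ControlCharEscaper
{
    // Returns s with control characters written as visible escape sequences.
    public static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '\t': sb.Append(@"\t"); break;
                case '\r': sb.Append(@"\r"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\0': sb.Append(@"\0"); break;
                default:
                    if (char.IsControl(c))
                        sb.Append(@"\u").Append(((int)c).ToString("X4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Logging ControlCharEscaper.Escape(clipboardText) keeps ordinary text readable while making tabs and line endings visible.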
I have to read a text file and then to parse it, in C# using VS 2010. The sample text is as follows,
[TOOL_TYPE]
; provides the name of the selected tool for programming
“Phoenix Select Advanced”;
[TOOL_SERIAL_NUMBER]
; provides the serial number for the tool
7654321;
[PRESSURE_CORRECTION]
; provides the Pressure correction information requirement
“Yes”;
[SURFACE_MOUNT]
; provides the surface mount information
“Yes”;
[SAPPHIRE_TYPE]
; provides the sapphire type information
“No”;
Now I have to parse only the string data (in double quotes) and headers (in square brackets[]), and then save it into another text file. I can successfully parse the headers but the string data in double quotes is not appearing correctly, as shown below.
[TOOL_TYPE]
�Phoenix Select Advanced�;
[TOOL_SERIAL_NUMBER]
7654321;
[PRESSURE_CORRECTION]
�Yes�;
[SURFACE_MOUNT]
�Yes�;
[SAPPHIRE_TYPE]
�No�;
[EXTENDED_TELEMETRY]
�Yes�;
[OVERRIDE_SENSE_RESISTOR]
�No�;
Please note the special character (�) that appears wherever a double quote should be.
How can I write the double quotes(") in the destination file and avoid (�) ?
Update
I am using the following line for my parsing
temporaryconfigFileWriter.WriteLine(configFileLine, false, Encoding.Unicode);
Here is the complete code I am using:
string temporaryConfigurationFileName = System.Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\Temporary_Configuration_File.txt";
//Pointers to read from Configuration File 'configFileReader' and to write to Temporary Configuration File 'temporaryconfigFileWriter'
StreamReader configFileReader = new StreamReader(CommandLineVariables.ConfigurationFileName);
StreamWriter temporaryconfigFileWriter = new StreamWriter(temporaryConfigurationFileName);
//Check whether the 'END_OF_FILE' header is specified or not, to avoid searching for end of file indefinitely
if ((File.ReadAllText(CommandLineVariables.ConfigurationFileName)).Contains("[END_OF_FILE]"))
{
//Read the file until it reaches the 'END_OF_FILE'
while (!((configFileLine = configFileReader.ReadLine()).Contains("[END_OF_FILE]")))
{
configFileLine = configFileLine.Trim();
if (!(configFileLine.StartsWith(";")) && !(string.IsNullOrEmpty(configFileLine)))
{
temporaryconfigFileWriter.WriteLine(configFileLine, false, Encoding.UTF8);
}
}
// to write the last header [END_OF_FILE]
temporaryconfigFileWriter.WriteLine(configFileLine);
configFileReader.Close();
temporaryconfigFileWriter.Close();
}
Your input file doesn't contain double quotes, that's a lie. It contains the opening double quote (“) and the closing double quote (”), not the standard straight version (").
First you must ensure that you are reading your input with the correct encoding. (Try several and display the string in a textbox in C#; you'll see pretty fast whether the characters show correctly.)
If you want such characters to appear in your output, you must write the output file as something other than ASCII. If you write it as UTF-8, for example, make sure it starts with a byte order mark. Otherwise the file is still valid, but some software, such as older versions of Notepad, may not detect that it is UTF-8 and will display each quote as several garbage characters.
Another choice is to simply replace “ and ” with ".
It appears that you are using proper typographic quotes (“...”) instead of the straight ASCII ones ("..."). My guess would be that you read the text file with the wrong encoding.
If you can see them properly in Notepad and neither ASCII nor one of the Unicode encodings works, then it's probably codepage 1252. You can get that encoding via
Encoding.GetEncoding(1252)
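Putting the two answers together, a sketch might look like the following. Note that in the question's code the arguments false and Encoding.UTF8 are passed to WriteLine, where they are treated as format arguments and have no effect on the file encoding; the encoding belongs in the StreamReader and StreamWriter constructors. The names ConfigCopy, NormalizeQuotes and CopyNonCommentLines are hypothetical, and the source encoding is left to the caller (for example Encoding.GetEncoding(1252), if that guess is right).

```csharp
using System;
using System.IO;
using System.Text;

public static class ConfigCopy
{
    // Replaces typographic double quotes with straight ASCII ones.
    public static string NormalizeQuotes(string line)
    {
        return line.Replace('\u201C', '"').Replace('\u201D', '"');
    }

    // Copies all lines except blank lines and ';' comments, reading with the
    // given encoding and writing UTF-8 with a byte order mark.
    public static void CopyNonCommentLines(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        using (var reader = new StreamReader(sourcePath, sourceEncoding))
        using (var writer = new StreamWriter(targetPath, false, new UTF8Encoding(true)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length == 0 || line.StartsWith(";", StringComparison.Ordinal))
                    continue;
                writer.WriteLine(NormalizeQuotes(line));
            }
        }
    }
}
```

If the typographic quotes should be preserved in the output, drop the NormalizeQuotes call; the UTF-8 writer can represent them as they are.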