String appended to file converted to special characters - c#

I am trying to write a string to a file.
string lines = "server.1=1.1.1.1:9999\nserver.2=2.2.2.2:8888\n";
File.AppendAllText(directory_path + "zoo.cfg", lines);
The string written to the file is "敳癲牥ㄮㄽㄮㄮㄮ㤺㤹ਹ敳癲牥㈮㈽㈮㈮㈮㠺㠸ਸ".
I tried passing Encoding.ASCII, Encoding.UTF8, and Encoding.Default to File.AppendAllText, but the output is the same.
Environment:
Visual studio 2015, Windows server 2012, .net v4.5
Please let me know what I am doing wrong.

Appending doesn't rewrite the entire file, so you have to use the same encoding as the existing content.
In general, if you don't know the encoding of a piece of text, you might as well not have that text. If someone gave it to you, send it back; when they resend it, it has to be accompanied by a mutual understanding of the encoding used. (By the way, it's very unlikely to be ASCII, and Encoding.Default varies from machine to machine and user to user, so it tells you nothing.)
Given the result, it makes complete sense that the existing file was UTF-16 (Encoding.Unicode): pairs of your ASCII bytes, read as little-endian UTF-16 code units, produce exactly those CJK characters ('s' 0x73 followed by 'e' 0x65 becomes U+6573, 敳). Appending with any other encoding just produces mojibake.
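If the file must keep whatever encoding it already has, one defensive approach is to sniff the encoding before appending. This is only a sketch: it relies on the existing file starting with a BOM (UTF-16 files written by .NET usually do), and the directory_path variable is the asker's.

```csharp
using System;
using System.IO;
using System.Text;

class AppendMatching
{
    // Let StreamReader sniff the BOM; if there is none, this simply
    // returns the fallback (UTF-8 here), so it is only a heuristic.
    public static Encoding DetectEncoding(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8,
                   detectEncodingFromByteOrderMarks: true))
        {
            reader.Peek(); // force the reader to examine the first bytes
            return reader.CurrentEncoding;
        }
    }

    static void Main()
    {
        string directory_path = Path.GetTempPath(); // stand-in for the asker's variable
        string path = directory_path + "zoo.cfg";
        string lines = "server.1=1.1.1.1:9999\nserver.2=2.2.2.2:8888\n";

        // Append with whatever encoding the file already uses:
        Encoding enc = File.Exists(path) ? DetectEncoding(path) : Encoding.UTF8;
        File.AppendAllText(path, lines, enc);
    }
}
```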

OK, now this is weird. The code given above works perfectly on my other machine, but for some reason I don't know, it's not working in my new dev environment.
I solved the issue by adding the Encoding.Unicode parameter.
File.AppendAllText(windows_directory_path + "zoo.cfg", lines, Encoding.Unicode);
No idea why Encoding.UTF8/Encoding.ASCII didn't change the format of the resulting string.
Still wondering how the same code could behave so differently.
I'm not sure if this is a fix or a workaround.


Different behaviour between a file written by code and by a text editor?

I have some XML that I would like to have pretty printed in a browser (but it is not parsable by tools like XmlDocument etc.). I currently write the XML to a file with
File.WriteAllText(filepath, xmlCode);
When I then open the .xml file, I get an error that it can't be parsed, no matter whether I open it via code or via File Explorer.
However, when I copy the exact same content into the Windows text editor and save it as .xml, it is pretty printed regardless of the browser I open it with. This applies both to opening it by code and from File Explorer.
Does C# or the editor add some hidden attribute to the file that is not visible to me (but can be manipulated) which could explain this behaviour?
A colleague of mine said it could have something to do with NTFS streams but I know too little about them.
Thanks for the responses!
It turned out to be simpler than an encoding issue: more a problem of how the XML was formatted before it got to my end.
Someone must have done:
message = message.Replace(" ", string.Empty);
which resulted in the xmlns:i namespace declaration being joined onto the element name (I believe this has a proper name, but I don't know it), as in
<AttributeNamexmlns:i="....
My solution:
It still is not parsable by XmlDocument or similar for some reason (but that is not necessary for me, as long as it is pretty printed), so my current solution is to open it in a browser (specifically a WebBrowser control in Windows Forms, but this should work with a "local" browser too).
First I get rid of the spacing mistake (yes, this should be done at an earlier stage in the process; this is just temporary):
var index = msg.IndexOf("xmlns:i");
if (index >= 0) // IndexOf returns -1 when the substring is missing
    msg = msg.Insert(index, " ");
Then write it to a file and open it in my custom browser (which is nothing more than a WebBrowser in a form - with nothing modified):
CustomBrowser cb = new CustomBrowser();
cb.Show();
cb.Navigate(filePath);
This then pretty prints the XML document and displays it. (That's all I need for my use case.)

Clipboard.GetText() always returns empty string in Mono on Mac

Is there a way to get the clipboard text on a Mac in Mono that doesn't return an empty string? This is using the latest non-beta version of Mono.
Clipboard.SetText(String) works fine and I can paste to other programs.
Clipboard.GetText(TextDataFormat.UnicodeText)
Clipboard.GetText(TextDataFormat.Text)
Clipboard.GetText(TextDataFormat.Rtf)
All return "" even though Clipboard.ContainsText(TextDataFormat.UnicodeText) returns true.
EDIT:
The solution ended up being to use NSPasteboard on the Mac: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/ApplicationKit/Classes/NSPasteboard_Class/Reference/Reference.html
It may be that Mono is interpreting TextDataFormat.UnicodeText to mean UTF-16 while the other application is placing the text on the clipboard as UTF-8.
The following is a patch that, if I remember correctly, fixed an issue similar to this:
clipboard patch
You will need to build Mono from source and apply the patch to try this out.
The solution ended up being to use NSPasteboard on the Mac: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/ApplicationKit/Classes/NSPasteboard_Class/Reference/Reference.html
I could never get Clipboard to return anything. Mono has wrappers around NSPasteboard, so it ended up being pretty easy this way.

Mono/Linux character encoding issue?

I am working on a project that runs as a .NET console application. I originally wrote it on Windows, but I just moved to GNU/Linux and installed Mono, which runs my application just fine; however, there is a problem with the output.
The output should read something like 'Loading plugin /blabla/bla/path',
but as you can see there is, well... gibberish.
I am pretty sure some sort of escape sequence is causing this, but I cannot think of what. Anyone know what could be causing this mess?
I figured this out! This is caused by changing Console.ForegroundColor or Console.BackgroundColor.
I think it's either a bug in Mono or in your Linux distro. Look over here:
https://github.com/mono/mono/blob/master/mcs/class/corlib/System/TermInfoDriver.cs#L149
Now look at what your $TERM contains: chances are it isn't 'xterm' but something like 'xterm-256color'. You will notice that it falls through. What happens then exactly, I don't know, but I don't think it falls through to the ANSI terminal driver, as that should also work; rather, it picks up a terminfo file from your distro that it chokes on, and emits invalid escape sequences for the colour markup. You will also probably notice that once you set your $TERM to something it recognizes, all the colours are shiny and work awesomely again.

Rect.ToString() formats with semicolon ("x;y;w;h")

I wonder if someone can help me.
I have an application that uses a config file to store window locations. When I store a location I get it as a Rect and do a simple ConfigSection.SetValue("Location", value.ToString());
99% of the time this string is written as comma-separated values (x,y,w,h); however, a user recently complained that our app was raising an exception when opening.
After following it through, I found that an invalid format exception was raised when parsing a window location. Looking in the config file, the location had been written as x;y;w;h, using a semicolon as the separator.
I looked at the regional settings and found the List Separator setting, but when I try changing it to a semicolon (as an attempt to replicate the issue), the Rect string is still written comma-separated. This means I am unable to reproduce the problem locally and do not really know what caused it.
Any insight as to how the separator may have changed would be much appreciated.
Thanks
Kieran
Use CultureInfo.InvariantCulture (from the System.Globalization namespace):
ConfigSection.SetValue("Location", value.ToString(CultureInfo.InvariantCulture));
Then the format of the string will use a "generic" culture that is exactly the same on all computers and does not depend on the settings of the machine.
(The semicolon most likely came from a machine whose culture uses ',' as the decimal separator, such as German: in that case WPF separates the Rect components with ';' instead, to keep the string unambiguous.)
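To see the two behaviours side by side (this needs a WPF/WindowsBase reference for Rect; the de-DE output shown in the comment is what I'd expect from a culture whose decimal separator is a comma):

```csharp
using System.Globalization;
using System.Windows; // Rect lives in WindowsBase (WPF)

class RectFormatting
{
    static void Main()
    {
        var rect = new Rect(10.5, 20, 300, 200);

        // Current culture: under e.g. de-DE this yields "10,5;20;300;200",
        // because ',' is the decimal separator there, so WPF switches the
        // component separator to ';' to keep the string unambiguous.
        string cultureDependent = rect.ToString();

        // Invariant culture: "10.5,20,300,200" on every machine.
        string stable = rect.ToString(CultureInfo.InvariantCulture);

        // Parsing back the invariant format is then safe everywhere:
        Rect roundTripped = Rect.Parse(stable);
    }
}
```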

How to guess the encoding of a file with no BOM in .NET?

I'm using the StreamReader class in .NET like this:
using( StreamReader reader = new StreamReader( @"c:\somefile.html", true ) ) {
string filetext = reader.ReadToEnd();
}
This works fine when the file has a BOM, but I ran into trouble with a file with no BOM: basically I got gibberish. When I specified Encoding.Unicode it worked fine, e.g.:
using( StreamReader reader = new StreamReader( @"c:\somefile.html", Encoding.Unicode, false ) ) {
string filetext = reader.ReadToEnd();
}
So, I need to get the file contents into a string. How do people usually handle this? I know there's no solution that will work 100% of the time, but I'd like to improve my odds. There is obviously software out there that tries to guess (e.g. Notepad, browsers, etc.). Is there a method in the .NET Framework that will guess for me? Does anyone have some code they'd like to share?
More background: This question is pretty much the same as mine, but I'm in .NET land. That question led me to a blog listing various encoding detection libraries, but none are in .NET
Library
http://www.codeproject.com/KB/recipes/DetectEncoding.aspx
And perhaps a useful thread on stackoverflow
You should read this article by Raymond Chen. He goes into detail on how programs can guess what an encoding is (and some of the fun that comes from guessing).
Some files come up strange in Notepad
I had good luck with Ude, a C# port of Mozilla Universal Charset Detector.
UTF-8 is designed in such a way that text encoded in an arbitrary 8-bit encoding like Latin-1 is very unlikely to decode as valid UTF-8 by accident.
So the minimum approach is this (Python-flavoured pseudocode, I don't talk .NET):
try:
    u = some_bytes.decode("utf-8")
except UnicodeDecodeError:
    u = some_bytes.decode("most-likely-encoding")
For the most-likely encoding one usually uses e.g. Latin-1 or cp1252 or whatever. More sophisticated approaches might try to find language-specific character pairings, but I'm not aware of a library that does that.
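In .NET terms, the same try-UTF-8-first idea can be sketched with a strict UTF8Encoding that throws on invalid byte sequences (the Windows-1252 fallback here is just one plausible guess, not part of the original answer):

```csharp
using System;
using System.Text;

class EncodingGuess
{
    public static string DecodeWithUtf8Fallback(byte[] bytes)
    {
        // Strict UTF-8: throwOnInvalidBytes = true makes invalid byte
        // sequences raise DecoderFallbackException instead of silently
        // turning into U+FFFD replacement characters.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: fall back to the most likely legacy
            // encoding for your data (Windows-1252 is a common guess).
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }
}
```

Note that on .NET Core/5+ the 1252 code page needs the System.Text.Encoding.CodePages package registered first; on .NET Framework it works out of the box.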
I used this to do something similar a while back:
http://www.conceptdevelopment.net/Localization/NCharDet/
Use Win32's IsTextUnicode.
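A minimal P/Invoke sketch of that one-liner (Windows-only, and only a heuristic; IsTextUnicode is the function behind the famous "Bush hid the facts" Notepad bug):

```csharp
using System;
using System.Runtime.InteropServices;

class UnicodeSniffer
{
    // IsTextUnicode lives in Advapi32.dll. It runs a set of statistical
    // tests (BOM check, null-byte distribution, ...) over the buffer.
    [DllImport("Advapi32.dll")]
    static extern bool IsTextUnicode(byte[] buf, int len, IntPtr lpiResult);

    public static bool LooksLikeUtf16(byte[] buffer) =>
        // Passing null for lpiResult means "run all tests".
        IsTextUnicode(buffer, buffer.Length, IntPtr.Zero);
}
```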
In the general sense, it is a difficult problem. See: http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx.
A hacky technique might be to take an MD5 hash of the bytes, then decode the text and re-encode it in various encodings, hashing each result. If one hash matches, you guess it's that encoding.
That's obviously too slow for something that handles a lot of files, but for something like a text editor I could see it working.
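The round-trip idea above can be sketched like this (a toy, not a production detector; the candidate list and its ordering are assumptions you'd supply):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class RoundTripGuess
{
    // Returns the first candidate encoding whose decode/re-encode round
    // trip reproduces the original bytes exactly, or null if none does.
    public static Encoding GuessByRoundTrip(byte[] original, params Encoding[] candidates)
    {
        using (var md5 = MD5.Create())
        {
            byte[] originalHash = md5.ComputeHash(original);
            foreach (Encoding candidate in candidates)
            {
                string decoded = candidate.GetString(original);
                byte[] reEncoded = candidate.GetBytes(decoded);
                if (md5.ComputeHash(reEncoded).SequenceEqual(originalHash))
                    return candidate;
            }
        }
        return null;
    }
}
```

Several encodings can round-trip the same bytes (pure ASCII survives almost anything), so the candidate order encodes your prior; a direct byte comparison would also work in place of the hashes.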
Other than that, it'll be a get-your-hands-dirty job of porting the Java libraries from this post that came from the Delphi SO question, or using the IE MLang feature.
See my (recent) answer to this (as far as I can tell, equivalent) question: How can I detect the encoding/codepage of a text file
It does NOT attempt to guess across a range of possible "national" encodings like MLang and NCharDet do, but rather assumes you know what kind of non-unicode files you're likely to encounter. As far as I can tell from your question, it should address your problem pretty reliably (without relying on the "black box" of MLang).
