C# '\n' saved in different bytes than expected

If I save this string to a text file:
Hello this \n is a test message
The \n character is saved as hex [5C 6E]; I would like to have it saved as [0A].
I believe this is an encoding issue?
I am using:
// 1252 is a variable in the application
Encoding codePage = Encoding.GetEncoding("1252");
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \\n is a test message");
Byte[] encodedBytes = Encoding.Convert(Encoding.UTF8, codePage , bytes);
All this is inside a FileStream scope and uses fs.Write to write the encodedBytes into the file.
I have tried to use \r\n but had the same result.
Any suggestions?
Thanks!
EDIT
The string is being read from a TSV file and placed into a string array. The string being read has the "\n" in it.
To read the string I use a StreamReader and split at \t.

At execution time, your string contains a backslash character followed by an n. They're encoded exactly as they should be. If you actually want a linefeed character, you shouldn't be escaping the backslash in your code:
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \n is a test message");
That string literal uses \n to represent U+000A, the linefeed character. At execution time, the string won't contain a backslash or an n - it will only contain the linefeed.
However, your code is already odd in that if you want to get the encoded form of a string, there's no reason to go via UTF-8:
byte[] encodedBytes = codePage.GetBytes("Hello this \n is a test message");
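Given the edit to the question (the text comes from a TSV file and really does contain a backslash followed by an n), one option is to translate that two-character sequence into a linefeed before encoding. The following is only a sketch under that assumption: it handles just the \n escape, the class and method names are invented for illustration, and Encoding.ASCII stands in for the code page 1252 encoding used in the question.

```csharp
using System;
using System.Text;

static class EscapeDemo
{
    // Hypothetical helper: replaces the two characters '\' + 'n' with U+000A,
    // then encodes the result with whatever encoding the caller supplies.
    public static byte[] EncodeWithLinefeeds(string raw, Encoding encoding)
    {
        string unescaped = raw.Replace("\\n", "\n");
        return encoding.GetBytes(unescaped);
    }

    static void Main()
    {
        // What the reader returns from the file: backslash + n, not a linefeed.
        string fromFile = "Hello this \\n is a test message";

        byte[] bytes = EncodeWithLinefeeds(fromFile, Encoding.ASCII);

        // The hex dump now contains 0A where the file text had 5C 6E.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

In the original program the codePage variable would be passed as the second argument instead of Encoding.ASCII.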

Related

Why Notepad++ shows Carriage return + Line feed for both \r and \n

In C#, define four variables as below:
string s1 = "\r";
string s2 = "\n";
string CarriageReturn = (Convert.ToChar(13)).ToString();
string LineFeed = (Convert.ToChar(10)).ToString();
Then, while watching the variables in the debugger, copy their values into Notepad++ and click on "Show all characters". Interestingly, you can see there is no difference between \r and \n: for both of them, it shows CR LF.
Is it a bug or something else? How can we explain this?
The question says: "Interestingly you can see there is no difference between \r and \n and for both of them, it shows CR LF. Is it a bug or something else?"
It is not a bug. CRLF is the default value of Environment.NewLine on Windows: a 'string containing "\r\n" for non-Unix platforms, or a string containing "\n" for Unix platforms.'
"How can we explain this?"
It probably results from the way you are outputting the string values to a file. If you use a method that adds new lines, such as WriteAllLines() does, then there will automatically be a CRLF at the end of each value you write.
For instance, we can run the following program.
string r = "\r";
string n = "\n";
string CarriageReturn = (Convert.ToChar(13)).ToString();
string LineFeed = (Convert.ToChar(10)).ToString();
var content = new string[] {
$"(r:{r})",
$"(n:{n})",
$"(13:{CarriageReturn})",
$"(10:{LineFeed})"
};
System.IO.File.WriteAllLines("output1.txt", content);
System.IO.File.WriteAllText("output2.txt", string.Join("", content));
It produces two output files. output1.txt used WriteAllLines to write four lines. output2.txt used WriteAllText() and did not write any new lines.
In both, all of the content outside parentheses is independent of your code. That is, the CRLF symbols are part of writing a line in the call to WriteAllLines.
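To check this without relying on an editor's display, the bytes on disk can be read back directly. This is a rough sketch: the class and method names are hypothetical, and it writes to a temporary file rather than the output files used above.

```csharp
using System;
using System.IO;

static class NewLineDemo
{
    // Writes one value with File.WriteAllLines and returns the bytes on disk.
    public static byte[] BytesViaWriteAllLines(string value)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { value });
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    // Writes one value with File.WriteAllText and returns the bytes on disk.
    public static byte[] BytesViaWriteAllText(string value)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, value);
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        // WriteAllLines: the value followed by Environment.NewLine.
        Console.WriteLine(BitConverter.ToString(BytesViaWriteAllLines("\r")));
        // WriteAllText: only the value itself.
        Console.WriteLine(BitConverter.ToString(BytesViaWriteAllText("\r")));
    }
}
```

The extra bytes in the first dump come from the line terminator that WriteAllLines appends, not from the "\r" string itself.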

Unicode string to binary string and binary string to unicode c#

I have a Unicode text with some Unicode characters, say, "Hello, world! this paragraph has some unicode characters."
I want to convert this paragraph to a binary string, i.e. binary digits stored in a value of type string. After converting, I also want to convert that binary string back to a Unicode string.
If you're simply looking for a way to encode a string into a byte[] and decode it again, rather than actual binary digits, then I would use System.Text.
The actual example from MSDN:
string unicodeString = "This string contains the unicode character Pi (\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Don't forget
using System;
using System.Text;
Since there are several encodings for the Unicode character set, you have to pick: UTF-8, UTF-16, UTF-32, etc. Say you picked UTF-8. You have to use the same encoding going both ways.
To convert to a binary string:
// Needs: using System; using System.Linq; using System.Text;
string binary = String.Join(
String.Empty, // running them all together makes it tricky.
Encoding.UTF8
.GetBytes("Hello, world! this paragraph has some unicode characters.")
.Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0'))); // must ensure 8 digits.
And back again (this needs using System.Linq; and using System.Text.RegularExpressions;):
string decoded = Encoding.UTF8.GetString(
Regex.Split(
"010010000110010101101100011011000110111100101100001000000111011101101111011100100110110001100100001000010010000001110100011010000110100101110011001000000111000001100001011100100110000101100111011100100110000101110000011010000010000001101000011000010111001100100000011100110110111101101101011001010010000001110101011011100110100101100011011011110110010001100101001000000110001101101000011000010111001001100001011000110111010001100101011100100111001100101110"
,"(.{8})") // this is the consequence of running them all together.
.Where(binary => !String.IsNullOrEmpty(binary)) // keeps the matches; drops empty parts
.Select(binary => Convert.ToByte(binary, 2))
.ToArray());
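For comparison, here is a sketch of the same round trip wrapped in two helper methods, slicing the digit string into fixed 8-character chunks instead of using a regular expression. The class and method names are made up for illustration, and UTF-8 is assumed in both directions.

```csharp
using System;
using System.Linq;
using System.Text;

static class BinaryText
{
    // Encodes the text as UTF-8 and writes each byte as exactly 8 binary digits.
    public static string ToBinary(string text)
    {
        return string.Concat(
            Encoding.UTF8.GetBytes(text)
                .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }

    // Reads the digits back 8 at a time and decodes the bytes as UTF-8.
    public static string FromBinary(string bits)
    {
        if (bits.Length % 8 != 0)
            throw new ArgumentException("Length must be a multiple of 8.", nameof(bits));

        byte[] bytes = Enumerable.Range(0, bits.Length / 8)
            .Select(i => Convert.ToByte(bits.Substring(i * 8, 8), 2))
            .ToArray();
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        string original = "Pi (\u03a0)";
        string bits = ToBinary(original);
        Console.WriteLine(bits);
        Console.WriteLine(FromBinary(bits) == original);
    }
}
```

Padding every byte to 8 digits is what makes the fixed-width slicing on the way back safe.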

How to read data from a text file in C#?

I want to read data from a text file.
I tried, but it is showing errors.
The error is shown at the path of the file:
string txtfile = File.ReadAllText("D:\Temp\textdata.txt");
string txtdata = File.ReadAllText("D:\Temp\textstrings.txt");
string txtpara = File.ReadAllText("D:\Temp\textlines.txt");
Console.WriteLine(txtfile);
Console.WriteLine("\n");
Console.WriteLine("\n");
Console.WriteLine(txtpara);
Console.WriteLine("\n");
Console.WriteLine("\n");
Console.WriteLine("\n");
Console.WriteLine(txtdata);
My file is saved in d:\temp\textdata.txt.
Can anyone tell me what is wrong?
The problem is the backslash in the string containing the file name. In a regular string literal the backslash starts an escape sequence: \t means a tab character, and \T is not a valid escape sequence at all, so the compiler rejects it.
You should either prefix your string with the @ sign, like
@"D:\Temp\textdata.txt"
or use double backslashes, like
"D:\\Temp\\textdata.txt"
string value = File.ReadAllText(@"D:\temp\textdata.txt");
Console.WriteLine(value);
Note the '@': it makes the string a verbatim literal, so the backslashes in your path are not treated as escape characters.
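The difference between the literal forms can be checked with a few comparisons. This is a small sketch; the class and constant names are invented for the example, and no file is actually opened.

```csharp
using System;

static class PathLiterals
{
    // Verbatim literal: backslashes are taken as they are.
    public const string Verbatim = @"D:\Temp\textdata.txt";

    // Regular literal: each backslash has to be doubled.
    public const string Escaped = "D:\\Temp\\textdata.txt";

    // Regular literal with a single backslash: "\t" becomes one tab character.
    public const string WithTab = "D:\temp";

    static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);     // True
        Console.WriteLine(WithTab.Length);          // 6
        Console.WriteLine(WithTab.Contains("\\"));  // False
    }
}
```

The third constant compiles only because \t happens to be a valid escape; the resulting string has a tab where the path separator was meant to be.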

Splitting string with double new line characters

I have a .txt file, the data of which I have stored in a long string. There are many single new line characters in the string after every line. And there are double new line characters at the end of paragraphs. What I want is to split the string into an array of paragraphs.
What I tried is the following, but it is not working:
string filePath = "C:\\Users\\Data.txt";
StreamReader readFile = new StreamReader(filePath);
string Data = readFile.ReadToEnd();
string[] paragraphss = Regex.Split(Data, "(^|[^\n])\n{2}(?!\n)");
please help
thank you
If you're OK with not using regex, Data.Split("\n\n") should do the trick.
On Windows systems the newline sequence is \r\n; on Unix systems it is \n. This may be why the lines aren't being split: you're specifically looking for \n\n instead of \r\n\r\n.
You can, however, use Environment.NewLine, which returns the correct newline string for whatever environment the software is running on.
Inspired by @LueTm's answer and @Traubenfuchs' comment, just making it look compiler friendly and complete. Here's how to split a string with double new line characters:
Data.Split(new string[] { "\r\n\r\n" }, StringSplitOptions.None);
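If the file may use either line-ending convention, both paragraph separators can be passed to the same Split overload. This is a minimal sketch, assuming paragraphs are separated by exactly one blank line; the class and method names are hypothetical. The string[] overload is used because .NET Framework has no Split overload that takes a single string.

```csharp
using System;

static class ParagraphSplitter
{
    // Windows-style separator first, then the Unix-style one.
    static readonly string[] ParagraphBreaks = { "\r\n\r\n", "\n\n" };

    // Splits on a blank line; single newlines inside a paragraph are kept.
    public static string[] SplitParagraphs(string data)
    {
        return data.Split(ParagraphBreaks, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string data = "line 1\r\nline 2\r\n\r\nsecond paragraph\n\nthird paragraph";
        foreach (string paragraph in SplitParagraphs(data))
        {
            Console.WriteLine("[" + paragraph + "]");
        }
    }
}
```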

C# UTF7Encoding for first bracket ' { '

While reading bytes from a file containing UTF7 encoded characters, the first bracket '{' is supposed to be encoded to 123 or 007B, but it is not happening. All other characters are encoded right, but not '{'. The code I am using is given below.
StreamReader _HistoryLocation = new StreamReader("abc.txt");
String _ftpInformation = _HistoryLocation.ReadLine();
UTF7Encoding utf7 = new UTF7Encoding();
Byte[] encodedBytes = utf7.GetBytes(_ftpInformation);
What might be the problem ?
As per RFC 2152, which you reference, '{' and similar characters are only optionally encoded directly; an encoder may instead emit them in the modified Base64 form.
Notice that UTF7Encoding has an overloaded constructor with an allowOptionals flag that will directly encode the RFC2152 optional characters.
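The effect of the flag can be seen by encoding '{' both ways. This is a sketch only; the class and method names are made up, and on .NET 5 and later the UTF7Encoding constructors are marked obsolete (compiler warning SYSLIB0001), although they still work.

```csharp
using System;
using System.Text;

static class Utf7Demo
{
    // Encodes text as UTF-7, with or without direct encoding of the
    // RFC 2152 optional characters such as '{'.
    public static byte[] Encode(string text, bool allowOptionals)
    {
        var utf7 = new UTF7Encoding(allowOptionals);
        return utf7.GetBytes(text);
    }

    static void Main()
    {
        // allowOptionals = true: '{' is written directly as the single byte 7B.
        Console.WriteLine(BitConverter.ToString(Encode("{", true)));

        // allowOptionals = false (the default): '{' is written as a
        // modified Base64 sequence that starts with '+'.
        Console.WriteLine(Encoding.ASCII.GetString(Encode("{", false)));
    }
}
```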
