Unicode character is written with wrong byteorder

I'm trying to add a byte order mark (BOM) to my string, but when I open the output file the mark appears reversed: 0xFF 0xFE is written instead of 0xFE 0xFF. What could be the reason for this behaviour?
if (Globals.target_encoding.WebName == "utf-16" && !isURLFrame)
{
    BOM = '\uFEFF' + BOM;
}
if (Globals.target_revision.number == 0x04)
{
    processedString = BOM + source.text.Replace(Globals.target_separator.ToString(), '\0' + BOM) + '\0';
}
else
{
    processedString = BOM + source.text + '\0';
}
if (!isURLFrame)
{
    contents = new byte[1 + Globals.target_encoding.GetByteCount(processedString)];
    contents[0] = targetEncodingValue(); // Encoding type
    Array.Copy(Globals.target_encoding.GetBytes(processedString), 0, contents, 1, Globals.target_encoding.GetByteCount(processedString));
}
else
{
    contents = new byte[Encoding.ASCII.GetByteCount(processedString)];
    Array.Copy(Encoding.ASCII.GetBytes(processedString), 0, contents, 0, Encoding.ASCII.GetByteCount(processedString));
}
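
For what it's worth, the bytes reported in the question are what .NET's little-endian UTF-16 encoding is expected to produce for U+FEFF. Here is a minimal sketch that shows how a given encoding serializes the mark. The `BomDemo` class and `BomBytes` helper are hypothetical names, not part of the original code.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Returns the bytes that the given encoding produces for U+FEFF,
    // formatted as dash-separated uppercase hex (for example "FF-FE").
    public static string BomBytes(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes("\uFEFF"));
    }
}
```

`BomDemo.BomBytes(Encoding.Unicode)` gives `FF-FE`, while `BomDemo.BomBytes(Encoding.BigEndianUnicode)` gives `FE-FF`. In .NET the name "utf-16" maps to the little-endian variant, so if the target format expects big-endian byte order, one option may be `Encoding.BigEndianUnicode` (WebName "utf-16BE"). Whether that fits `Globals.target_encoding` is an assumption about the surrounding code.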

Related

Keep the last eight bytes when reading raw image file with C#

I'm trying to read a raw image file with C# and keep the last 2 bytes and the last 8 bytes in variables. However, there is something wrong in my if statements, so the variables just keep growing, like this:
twoBytes= eightBytes=
twoBytes=00 eightBytes=00
twoBytes=0000 eightBytes=0000
twoBytes=000000 eightBytes=000000
twoBytes=00000000 eightBytes=00000000
twoBytes=0000000000 eightBytes=0000000000
twoBytes=000000000000 eightBytes=000000000000
twoBytes=00000000000000 eightBytes=00000000000000
twoBytes=0000000000000000 eightBytes=0000000000000000
twoBytes=000000000000000000 eightBytes=000000000000000000
twoBytes=00000000000000000000 eightBytes=00000000000000000000
twoBytes=0000000000000000000000 eightBytes=0000000000000000000000
twoBytes=000000000000000000000000 eightBytes=000000000000000000000000
twoBytes=00000000000000000000000000 eightBytes=00000000000000000000000000
twoBytes=0000000000000000000000000000 eightBytes=0000000000000000000000000000
twoBytes=000000000000000000000000000000 eightBytes=000000000000000000000000000000
twoBytes=00000000000000000000000000000000 eightBytes=00000000000000000000000000000000
twoBytes=0000000000000000000000000000000000 eightBytes=0000000000000000000000000000000000
twoBytes=000000000000000000000000000000000000 eightBytes=000000000000000000000000000000000000
twoBytes=00000000000000000000000000000000000000 eightBytes=00000000000000000000000000000000000000
I want something like "twoBytes=55AA", and eightBytes="55AA454649205041"
My code:
// Read file, byte at the time (example 00, 5A)
FileStream fs = new FileStream("C:\\Users\\user\\image_files\\usb_guid_exfat.001", FileMode.Open);
int hexIn;
String hex;
String twoBytes = "";
String eightBytes = "";
for (int i = 0; (hexIn = fs.ReadByte()) != -1; i++)
{
hex = string.Format("{0:X2}", hexIn);
Console.WriteLine("twoBytes=" + twoBytes + " eightBytes=" + eightBytes);
// Transfer two bytes
twoBytes = twoBytes + hex;
if (twoBytes.Length < 4)
{
if (twoBytes.Length > 6) {
twoBytes = twoBytes.Substring(2, 4);
}
}
// Transfer eight bytes
eightBytes = eightBytes + hex;
if(eightBytes.Length < 8)
{
if (twoBytes.Length > 10) {
eightBytes = eightBytes.Substring(2, 8);
}
}
}

Your if statements are wrong: a length can't be less than 4 and greater than 6 at the same time, so the inner branch never runs.
If the length is <= 4, you have 1 or 2 bytes and the value can stay as it is. You only need to trim when the length is greater than 4 (6, 8, etc.).
The code for trimming a string longer than 4:
twoBytes = twoBytes + hex;
if (twoBytes.Length > 4) {
    twoBytes = twoBytes.Substring(twoBytes.Length - 4, 4);
}
Do the same for eightBytes, keeping the last 16 hex characters (8 bytes are 16 hex digits).
Good luck!! :)
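
As an alternative to trimming strings on every iteration, the same idea can be sketched with a small sliding window of bytes. This is only an illustration: the `TailBytes` class and `LastBytesHex` method are hypothetical names, and it assumes the stream can simply be read to its end.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TailBytes
{
    // Reads the stream to the end and returns its last `count` bytes
    // (or fewer, if the stream is shorter) as an uppercase hex string.
    public static string LastBytesHex(Stream stream, int count)
    {
        if (count <= 0)
            throw new ArgumentOutOfRangeException(nameof(count));

        var window = new Queue<byte>(count);
        int value;
        while ((value = stream.ReadByte()) != -1)
        {
            // Drop the oldest byte once the window is full.
            if (window.Count == count)
                window.Dequeue();
            window.Enqueue((byte)value);
        }
        // BitConverter formats as "55-AA-..."; remove the dashes.
        return BitConverter.ToString(window.ToArray()).Replace("-", "");
    }
}
```

To get both values from one pass, call it once with a count of 8 and take the last four characters of the result for the two-byte value.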

Decode cyrillic quoted-printable content

I'm using this sample to get mail from a server. The problem is that the response contains Cyrillic characters that I cannot decode.
Here is a header:
Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable
And receive response function:
static void receiveResponse(string command)
{
try
{
if (command != "")
{
if (tcpc.Connected)
{
dummy = Encoding.ASCII.GetBytes(command);
ssl.Write(dummy, 0, dummy.Length);
}
else
{
throw new ApplicationException("TCP CONNECTION DISCONNECTED");
}
}
ssl.Flush();
byte[] bigBuffer = new byte[1024*16];
int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);
byte[] buffer = new byte[bites];
Array.Copy(bigBuffer, 0, buffer, 0, bites);
sb.Append(Encoding.ASCII.GetString(buffer));
string result = sb.ToString();
// here is an unsuccessful attempt at decoding
result = Regex.Replace(result, @"=([0-9a-fA-F]{2})",
m => m.Groups[1].Success
? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
: "");
byte[] bytes = Encoding.Default.GetBytes(result);
result = Encoding.GetEncoding("koi8r").GetString(bytes);
}
catch (Exception ex)
{
throw new ApplicationException(ex.ToString());
}
}
How do I decode the stream correctly? In the result string I get <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.

As @Max pointed out, you will need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.
In your case, it is the quoted-printable encoding.
You will need to decode the text of the message into an array of bytes, and then convert that array of bytes into a string using the appropriate System.Text.Encoding. The name of the encoding to use will typically be specified in the Content-Type header as the charset parameter (in your case, koi8-r).
Since you already have the text as bytes in the bigBuffer variable, simply perform the decoding on that:
byte[] buffer = new byte[bites];
int decodedLength = 0;
for (int i = 0; i < bites; i++) {
    if (bigBuffer[i] == (byte) '=') {
        // both following bytes must lie inside the data that was read
        if (bites > i + 2) {
            // possible hex sequence
            byte b1 = bigBuffer[i + 1];
            byte b2 = bigBuffer[i + 2];
            if (IsXDigit (b1) && IsXDigit (b2)) {
                // decode (the cast is required because << and | yield an int)
                buffer[decodedLength++] = (byte) ((ToXDigit (b1) << 4) | ToXDigit (b2));
                i += 2;
            } else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
                // folded line, drop the '=\r\n' sequence
                i += 2;
            } else {
                // error condition, just pass it through
                buffer[decodedLength++] = bigBuffer[i];
            }
        } else {
            // truncated? just pass it through
            buffer[decodedLength++] = bigBuffer[i];
        }
    } else {
        buffer[decodedLength++] = bigBuffer[i];
    }
}
string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);
Custom functions:
static byte ToXDigit (byte c)
{
    if (c >= 0x41) {
        if (c >= 0x61)
            return (byte) (c - (0x61 - 0x0a));
        return (byte) (c - (0x41 - 0x0A));
    }
    return (byte) (c - 0x30);
}
static bool IsXDigit (byte c)
{
    return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
}
Of course, instead of writing your own hodge podge IMAP library, you could just use MimeKit and MailKit ;-)
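
If the quoted-printable text is already available as a string rather than a byte buffer, the same two-step idea (quoted-printable to bytes, then bytes to text via the declared charset) could be sketched as below. The `QuotedPrintable` class and `DecodeToBytes` method are hypothetical names, and the sketch assumes the input contains only ASCII characters.

```csharp
using System;
using System.Collections.Generic;

public static class QuotedPrintable
{
    // Decodes quoted-printable text into raw bytes.
    // Assumption: the input is pure ASCII, as quoted-printable content should be.
    public static byte[] DecodeToBytes(string input)
    {
        var output = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '=' && i + 2 < input.Length)
            {
                char h1 = input[i + 1];
                char h2 = input[i + 2];
                if (Uri.IsHexDigit(h1) && Uri.IsHexDigit(h2))
                {
                    // "=XX" encodes a single byte.
                    output.Add((byte)((Uri.FromHex(h1) << 4) | Uri.FromHex(h2)));
                    i += 2;
                    continue;
                }
                if (h1 == '\r' && h2 == '\n')
                {
                    // "=\r\n" is a soft line break and is dropped.
                    i += 2;
                    continue;
                }
            }
            // Anything else is passed through unchanged.
            output.Add((byte)c);
        }
        return output.ToArray();
    }
}
```

The resulting bytes can then be passed to `Encoding.GetEncoding("koi8-r").GetString(...)`. On .NET Core and later, code page encodings such as koi8-r are typically only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; on .NET Framework they are built in.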

Reading multi language text file in c#

I have to read a text file which can contain characters from the following languages: English, Japanese, Chinese, French, Spanish, German, Italian
My task is to simply read the data and write it to a new text file (placing a newline char \n after every 100 chars).
I cannot use File.ReadAllText or File.ReadAllLines because the file size can be more than 500 MB. So I have written the following code:
using (var streamReader = new StreamReader(inputFilePath, Encoding.ASCII))
{
using (var streamWriter = new StreamWriter(outputFilePath,false))
{
char[] bytes = new char[100];
while (streamReader.Read(bytes, 0, 100) > 0)
{
var data = new string(bytes);
streamWriter.WriteLine(data);
}
MessageBox.Show("Compleated");
}
}
Other than ASCII encoding I have tried UTF-7, UTF-8, UTF-32 and IBM500. But no luck in reading and writing multi language characters.
Please help me to achieve this.

You will have to take a look at the first 4 bytes of the file you are parsing.
These bytes will give you a hint about which encoding you have to use.
Here is a helper method I have written to do the task:
// Must live in a static class (extension method); assumes bytes.Length >= 4.
public static string GetStringFromEncodedBytes(this byte[] bytes) {
    var encoding = Encoding.Default;
    var skipBytes = 0;
    if (bytes[0] == 0x2b && bytes[1] == 0x2f && bytes[2] == 0x76) {
        encoding = Encoding.UTF7;
        skipBytes = 3;
    }
    if (bytes[0] == 0xef && bytes[1] == 0xbb && bytes[2] == 0xbf) {
        encoding = Encoding.UTF8;
        skipBytes = 3;
    }
    if (bytes[0] == 0xff && bytes[1] == 0xfe) {
        encoding = Encoding.Unicode;
        skipBytes = 2;
    }
    if (bytes[0] == 0xfe && bytes[1] == 0xff) {
        encoding = Encoding.BigEndianUnicode;
        skipBytes = 2;
    }
    if (bytes[0] == 0 && bytes[1] == 0 && bytes[2] == 0xfe && bytes[3] == 0xff) {
        // 00 00 FE FF is the big-endian UTF-32 BOM; Encoding.UTF32 is little-endian
        encoding = new UTF32Encoding(true, true);
        skipBytes = 4;
    }
    return encoding.GetString(bytes.Skip(skipBytes).ToArray());
}

This is a good enough start to get to the answer. If i is not equal to 100, fewer chars were available and you need to read more to fill the line. There is no trouble with French characters like é: they are all handled by the C# char type, and StreamReader decodes the file as UTF-8 by default.
char[] soFlow = new char[100];
using (StreamReader sr = new StreamReader("a.txt"))
using (StreamWriter sw = new StreamWriter("b.txt", false))
while (sr.EndOfStream == false)
{
    try {
        int i = sr.Read(soFlow, 0, 100);
        // if i < 100, only the first i chars of the array are valid
        sw.WriteLine(new string(soFlow, 0, i));
    }
    catch (Exception e) { Console.WriteLine(e.Message); }
}
Spec: Read(Char[], Int32, Int32) Reads a specified maximum of characters from the current stream into a buffer, beginning at the specified index.
Certainly worked for me anyway :)
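
Putting the two answers together, here is a hedged sketch that lets StreamReader detect the encoding from a byte order mark and fills each 100-char line with ReadBlock. The `LineWrapper` class, the `Rewrap` method and the choice of UTF-8 as both the fallback and the output encoding are assumptions, not something stated in the question.

```csharp
using System.IO;
using System.Text;

public static class LineWrapper
{
    // Copies inputPath to outputPath, inserting '\n' after every `width` chars.
    // Assumption: files without a BOM are UTF-8; the output is UTF-8 without a BOM.
    public static void Rewrap(string inputPath, string outputPath, int width = 100)
    {
        using (var reader = new StreamReader(inputPath, Encoding.UTF8, true))
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            var buffer = new char[width];
            int filled;
            // ReadBlock keeps reading until `width` chars are available or the file ends,
            // so only the final chunk can be shorter than `width`.
            while ((filled = reader.ReadBlock(buffer, 0, width)) > 0)
            {
                writer.Write(buffer, 0, filled);
                writer.Write('\n');
            }
        }
    }
}
```

Only one 100-char buffer is held in memory, so the file size should not matter. One caveat: a character outside the Basic Multilingual Plane is stored as two chars, and this sketch does not prevent such a pair from being split across two lines.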

How to replace extended ASCII characters in C#?

I am trying to replace non-printable characters, i.e. extended ASCII characters, in a HUGE string.
foreach (string line in File.ReadLines(txtfileName.Text))
{
    MessageBox.Show(Regex.Replace(line,
        @"\p{Cc}",
        a => string.Format("[{0:X2}]", " ")));
}
This doesn't seem to be working.
Example:
AAÂAA should be converted to AA AA

Assuming the encoding is UTF-8, try this:
string strReplacedVal = Encoding.ASCII.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.GetEncoding(
            Encoding.ASCII.EncodingName,
            new EncoderReplacementFallback(" "),
            new DecoderExceptionFallback()
        ),
        Encoding.UTF8.GetBytes(line)
    )
);

Since you are opening the file as UTF-8, it must be UTF-8. Its code units are one byte each, and UTF-8 has the very nice feature of encoding characters above ␡ (U+007F) with bytes exclusively above 0x7f and characters at or below ␡ with bytes exclusively at or below 0x7f.
For efficiency, you can rewrite the file in place a few KB at a time.
Note that some characters might be replaced by more than one space, though, because a multi-byte character becomes one space per byte.
// Operates on a UTF-8 encoded text file
using (var stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
{
    const int size = 4096;
    var buffer = new byte[size];
    int count;
    while ((count = stream.Read(buffer, 0, size)) > 0)
    {
        var changed = false;
        for (int i = 0; i < count; i++)
        {
            // obliterate all bytes that are not encoded characters between ␠ and ␡
            // (note: this also overwrites control characters such as \r and \n)
            if (buffer[i] < ' ' || buffer[i] > '\x7f')
            {
                buffer[i] = (byte)' ';
                changed = true;
            }
        }
        if (changed)
        {
            // step back over the chunk just read and overwrite it in place
            stream.Seek(-count, SeekOrigin.Current);
            stream.Write(buffer, 0, count);
        }
    }
}
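
For the line-by-line loop in the question, a simpler string-level variant may be enough. The sketch below replaces every char outside the printable ASCII range (space through tilde) with a single space; the `AsciiCleaner` class and `ReplaceNonPrintable` method are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AsciiCleaner
{
    // Matches any char that is not printable ASCII (U+0020 through U+007E).
    private static readonly Regex NonPrintableAscii = new Regex(@"[^\x20-\x7E]");

    // Replaces each non-printable or non-ASCII char with one space.
    public static string ReplaceNonPrintable(string line)
    {
        return NonPrintableAscii.Replace(line, " ");
    }
}
```

Because this works on decoded chars rather than raw bytes, a character such as Â becomes exactly one space. Characters outside the Basic Multilingual Plane are stored as two chars and would become two spaces.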

C# Base64 encoding / decoding fails when using custom encrypted password

I'm currently writing a program that encrypts a password (using a custom method) and then encodes the encrypted password to Base64 using the To/FromBase64Transform classes. The problem is that when I encode my encrypted password, I am unable to decode it back to its proper encrypted state. The Base64Helper class is just a wrapper for the To/FromBase64Transform classes.
My Test Code:
static void Main(string[] args)
{
bool Worked = false;
string Password = "testing";
Console.WriteLine("Password: " + Password);
// == Encode then decode 64 test. DecPass64 should equal password == //
// Encodes to Base64 using ToBase64Transform
string EncPass64 = Base64Helper.EncodeString(Password);
// Decodes a Base64 string using FromBase64Transform
string DecPass64 = Base64Helper.DecodeString(EncPass64);
// Test if Base64 encoding / decoding works
Worked = (Password == DecPass64);
Console.WriteLine();
Console.WriteLine("Base64 Pass Encoded: " + EncPass64);
Console.WriteLine("Base64 Pass Decoded: " + DecPass64);
Console.WriteLine("Base64 Encode to Base64 Decode Worked? : " + Worked); // True
// gspassenc uses XOR to switch passwords back and forth between encrypted and decrypted
string GsEncodedPass = gspassenc(Password);
string GsDecodedPass = gspassenc(GsEncodedPass);
Worked = (Password == GsDecodedPass);
// GsDecodedPass should equal the original Password
Console.WriteLine();
Console.WriteLine("GsPass Encoded: " + GsEncodedPass);
Console.WriteLine("GsPass Decoded: " + GsDecodedPass);
Console.WriteLine("GsEncode to GsDecode Worked? : " + Worked); // True
// Base64 encode the encrypted password. Then decode the Base64. B64_GsDecodedPass should equal
// the GsEncoded Password... But it doesn't for some reason!
string B64_GsEncodedPass = Base64Helper.EncodeString(GsEncodedPass);
string B64_GsDecodedPass = Base64Helper.DecodeString(B64_GsEncodedPass);
Worked = (B64_GsDecodedPass == GsEncodedPass);
// Print results
Console.WriteLine();
Console.WriteLine("Base64 Encoded GsPass: " + B64_GsEncodedPass);
Console.WriteLine("Base64 Decoded GsPass: " + B64_GsDecodedPass);
Console.WriteLine("Decoded == GS Encoded Pass? : " + Worked); // False
// Stop console from closing till we say so
Console.Read();
}
private static int gslame(int num)
{
int c = (num >> 16) & 0xffff;
int a = num & 0xffff;
c *= 0x41a7;
a *= 0x41a7;
a += ((c & 0x7fff) << 16);
if (a < 0)
{
a &= 0x7fffffff;
a++;
}
a += (c >> 15);
if (a < 0)
{
a &= 0x7fffffff;
a++;
}
return a;
}
private static string gspassenc(string pass)
{
int a = 0;
int num = 0x79707367; // gspy
int len = pass.Length;
char[] newPass = new char[len];
for (int i = 0; i < len; ++i)
{
num = gslame(num);
a = num % 0xFF;
newPass[i] = (char)(pass[i] ^ a);
}
return new String(newPass);
}
And the result is:
Any help will be much appreciated!
UPDATE: Here is my Base64Helper Class:
class Base64Helper
{
public static string DecodeString(string encoded)
{
return Encoding.ASCII.GetString(Convert.FromBase64String(encoded));
}
public static string EncodeString(string decoded)
{
return Convert.ToBase64String(Encoding.ASCII.GetBytes(decoded));
}
}

It's because your encryption routine alters the Unicode chars of the string and then constructs a new String from them, and those chars might no longer form a valid Unicode sequence or fit the encoding you convert with. In particular, Encoding.ASCII.GetBytes replaces every char above 0x7F with '?', so any XOR result outside the ASCII range is lost before it reaches Base64.
When converting from a String to a byte array and back again, you need to decide which encoding to use, and you can't arbitrarily change the byte stream (via your encryption routine) and expect it to produce a valid string when converted back.
I've modified your code to show some string to byte[] conversion steps; you can adjust these depending on your need.
static void Main(string[] args)
{
bool Worked = false;
string Password = "testing";
Console.WriteLine("Password: " + Password);
// == Encode then decode 64 test. DecPass64 should equal password == //
// Encodes to Base64 using ToBase64Transform
string EncPass64 = Base64Helper.EncodeString(Password);
// Decodes a Base64 string using FromBase64Transform
string DecPass64 = Base64Helper.DecodeString(EncPass64);
// Test if Base64 encoding / decoding works
Worked = (Password == DecPass64);
Console.WriteLine();
Console.WriteLine("Base64 Pass Encoded: " + EncPass64);
Console.WriteLine("Base64 Pass Decoded: " + DecPass64);
Console.WriteLine("Base64 Encode to Base64 Decode Worked? : " + Worked); // True
// gspassenc uses XOR to switch passwords back and forth between encrypted and decrypted
byte [] passwordbytes = Encoding.UTF8.GetBytes(Password);
byte [] bytes_GsEncodedPass = gspassenc(passwordbytes);
string GsEncodedPass = Encoding.UTF8.GetString(bytes_GsEncodedPass);
byte[] bytes_GsDecodedPass = gspassenc(bytes_GsEncodedPass);
string GsDecodedPass = Encoding.UTF8.GetString(bytes_GsDecodedPass);
Worked = (Password == GsDecodedPass);
// GsDecodedPass should equal the original Password
Console.WriteLine();
Console.WriteLine("GsPass Encoded: " + GsEncodedPass);
Console.WriteLine("GsPass Decoded: " + GsDecodedPass);
Console.WriteLine("GsEncode to GsDecode Worked? : " + Worked); // True
// Base64 encode the encrypted bytes, then decode the Base64. B64_GsDecodedPass should equal
// the GsEncoded password, and now it does because only raw bytes pass through Base64.
string B64_GsEncodedPass = Convert.ToBase64String(bytes_GsEncodedPass);
byte[] bytes_B64_GsDecodedPass = Convert.FromBase64String(B64_GsEncodedPass);
string B64_GsDecodedPass = Encoding.UTF8.GetString(bytes_B64_GsDecodedPass);
Worked = (B64_GsDecodedPass == GsEncodedPass);
// Print results
Console.WriteLine();
Console.WriteLine("Base64 Encoded GsPass: " + B64_GsEncodedPass);
Console.WriteLine("Base64 Decoded GsPass: " + B64_GsDecodedPass);
Console.WriteLine("Decoded == GS Encoded Pass? : " + Worked); // True
// Stop console from closing till we say so
Console.Read();
}
private static int gslame(int num)
{
int c = (num >> 16) & 0xffff;
int a = num & 0xffff;
c *= 0x41a7;
a *= 0x41a7;
a += ((c & 0x7fff) << 16);
if (a < 0)
{
a &= 0x7fffffff;
a++;
}
a += (c >> 15);
if (a < 0)
{
a &= 0x7fffffff;
a++;
}
return a;
}
private static byte[] gspassenc(byte [] pass)
{
int a = 0;
int num = 0x79707367; // gspy
int len = pass.Length;
byte[] newPass = new byte[len];
for (int i = 0; i < len; ++i)
{
num = gslame(num);
a = num % 0xFF;
newPass[i] = (byte)(pass[i] ^ a);
}
return newPass;
}
}
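
To see the difference in isolation, here is a small sketch that compares a Base64 round trip on raw bytes with one that first squeezes a string through Encoding.ASCII, as the Base64Helper in the question does. The `Base64RoundTrip` class and its method names are made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Base64RoundTrip
{
    // Base64 on raw bytes: the decoded bytes always match the input.
    public static bool BytesSurvive(byte[] data)
    {
        byte[] decoded = Convert.FromBase64String(Convert.ToBase64String(data));
        return decoded.SequenceEqual(data);
    }

    // Base64 on a string that is first converted with Encoding.ASCII.
    // Chars above 0x7F become '?' during GetBytes, so they cannot be recovered.
    public static bool StringSurvivesAscii(string text)
    {
        string encoded = Convert.ToBase64String(Encoding.ASCII.GetBytes(text));
        string decoded = Encoding.ASCII.GetString(Convert.FromBase64String(encoded));
        return decoded == text;
    }
}
```

`StringSurvivesAscii("testing")` holds because every char is ASCII, but a string containing a char such as '\u00E9' does not survive, which is the situation the XOR step can create. Keeping the encrypted data as a byte[] until it is Base64-encoded avoids the lossy conversion.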
