C# Encoding.CreateTranscodingStream does not output file with desired target encoding


I have the following piece of code:
// The output goes to a separate file (test-utf16be.txt is a placeholder name): opening the
// input path for writing while it is open for reading fails, and File.Create truncates old content.
using (Stream inputFileStream = File.OpenRead("C:\\Users\\User\\Downloads\\test.txt"))
{
    using (Stream transcodingStream = Encoding.CreateTranscodingStream(
        inputFileStream,
        Encoding.GetEncoding(500),
        new UnicodeEncoding(bigEndian: true, byteOrderMark: true)))
    {
        using (Stream outputStream = File.Create("C:\\Users\\User\\Downloads\\test-utf16be.txt"))
        {
            await transcodingStream.CopyToAsync(outputStream, cancellationToken);
        }
    }
}
My file before transcoding is EBCDIC-encoded (code page 500); its first 16 bytes are:
F5 F1 F1 F0 F2 C2 D4 E6 40 40 40 40 40 40 F1 F1 = 51102BMW      11
After transcoding to UTF-16 big-endian with a byte order mark, I expect the file to begin with the big-endian BOM:
FE FF
However, I get:
00 35 00 31 00 31 00 30 00 32 00 42 00 4D 00 57 = "51102BMW" in UTF-16BE, with no BOM
Where am I going wrong with this?

The transcoding stream does not write the target encoding's BOM (its preamble); that is something you have to manage yourself.
I've implemented the following solution, writing the preamble to the output stream before copying the transcoded data:
// targetEncoding is the Encoding passed as the output encoding to CreateTranscodingStream.
byte[] preamble = targetEncoding.GetPreamble();
await outputStream.WriteAsync(preamble, 0, preamble.Length, cancellationToken);
await transcodingStream.CopyToAsync(outputStream, cancellationToken);
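A self-contained sketch of the same idea, assuming .NET 5 or later (where Encoding.CreateTranscodingStream exists) and that code page 500 comes from the code-pages provider, which must be registered first. The class and method names are hypothetical. The preamble is written first and the transcoded data is copied after it, so no bytes have to be seeked over or overwritten.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class Transcoder
{
    // Copies `input` (bytes in sourceEncoding) to `output` re-encoded as targetEncoding,
    // writing targetEncoding's preamble (BOM) first, since the transcoding stream never emits one.
    public static async Task TranscodeWithBomAsync(
        Stream input, Encoding sourceEncoding,
        Stream output, Encoding targetEncoding,
        CancellationToken cancellationToken = default)
    {
        // Empty array if the encoding was constructed without a BOM.
        byte[] preamble = targetEncoding.GetPreamble();
        await output.WriteAsync(preamble, 0, preamble.Length, cancellationToken);

        // leaveOpen: true so disposing the transcoder does not close the caller's input stream.
        using Stream transcoding = Encoding.CreateTranscodingStream(
            input, sourceEncoding, targetEncoding, leaveOpen: true);
        await transcoding.CopyToAsync(output, cancellationToken);
    }
}
```

For the question's case, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup, then pass Encoding.GetEncoding(500) as the source and new UnicodeEncoding(bigEndian: true, byteOrderMark: true) as the target; the output should then start with FE FF.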

Related

UTF8 conversion of C# is not same as Java

I have to develop an application that connects to a Java server listening on a port. The third party gave me a snippet of Java code to send and receive data over the port:
String request = "request";
try (Socket soc = new Socket("localhost", 1234)) {
    DataOutputStream output = new DataOutputStream(soc.getOutputStream());
    output.writeUTF(request);
    output.flush();
    DataInputStream input = new DataInputStream(soc.getInputStream());
    String response = input.readUTF();
}
I have to write the same thing in C# so that other users can use it, as per company needs. I wrote the following C# code, but it encodes the request differently:
// Requires: using System.Net; using System.Net.Sockets; using System.Text;
string request = "request";
byte[] data = new byte[1024]; // receive buffer (not used in this excerpt)
IPAddress ipAddress = IPAddress.Parse("127.0.0.1");
IPEndPoint ipEndpoint = new IPEndPoint(ipAddress, 1234);
Socket client = new Socket(ipAddress.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
try
{
    client.Connect(ipEndpoint);
    byte[] sendmsg = Encoding.UTF8.GetBytes(request);
    int n = client.Send(sendmsg);
}
finally
{
    client.Close();
}
The bytes written to the port by the C# code differ from those written by the Java code.
Is there any equivalent of Java's writeUTF and readUTF functions in C#?
The third party also gave me a C++ function which performs the same string-to-UTF-8 conversion, but I am not good at C++ and don't know what the c_str() function does.
C++ code:
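// Writes a 2-byte big-endian length prefix, then the string's bytes (strcpy also copies the terminating NUL).
// c_str() returns a pointer to the string's contents as a NUL-terminated char array.
void convertStringToModifiedUTF8(string content, char* outputBuffer){
    strcpy(outputBuffer+2, content.c_str());
    // High byte. The original `content.length() & 0xFF00` always stores 0 once truncated to char.
    outputBuffer[0] = (content.length() >> 8) & 0xFF;
    outputBuffer[1] = content.length() & 0x00FF; // low byte
}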
Can anyone help with either a C# equivalent of the Java functions or C# code equivalent to the C++ code above?
I am highlighting some of the bytes that differ:
Java:
0000 02 00 00 00 45 00 00 f8 4c 4e 40 00 80 06 00 00 ....E...LN@.....
0010 7f 00 00 01 7f 00 00 01 c9 c0 16 e3 2f ac ea cf ............/...
0020 8b 6a 2e d1 50 18 01 00 58 71 00 00 00 ce 7b 22 .j..P...Xq....{"
C#:
0000 02 00 00 00 45 00 00 **fa** 4c **5f** 40 00 80 06 00 00 ....E...L_@.....
0010 7f 00 00 01 7f 00 00 01 c9 dd 16 e3 81 e5 cd 90 ................
0020 87 47 fb 46 50 18 01 00 76 64 00 00 7b 20 22 6a .G.FP...vd..{ "j
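The captures point at the difference: the Java payload begins with 00 ce before 7b 22 ({"), while the C# payload starts directly with 7b. 0x00CE = 206 is a length prefix; it matches the IP total length 0xF8 (248) minus 40 bytes of IP and TCP headers minus the 2 prefix bytes. Java's writeUTF writes the encoded byte count as an unsigned 16-bit big-endian value, followed by "modified UTF-8" (U+0000 becomes C0 80, and each half of a surrogate pair is encoded separately as 3 bytes). A minimal sketch of C# equivalents, assuming the server uses exactly DataOutputStream/DataInputStream; the class name JavaModifiedUtf8 is hypothetical, and ReadUTF does not validate malformed input.

```csharp
using System;
using System.IO;
using System.Text;

public static class JavaModifiedUtf8
{
    // Mirrors DataOutputStream.writeUTF: 2-byte big-endian byte count, then modified UTF-8.
    public static void WriteUTF(Stream stream, string s)
    {
        var encoded = new MemoryStream();
        foreach (char c in s) // UTF-16 code units, so surrogate halves are encoded separately
        {
            if (c >= 0x0001 && c <= 0x007F)
            {
                encoded.WriteByte((byte)c);
            }
            else if (c <= 0x07FF) // also covers U+0000, written as C0 80
            {
                encoded.WriteByte((byte)(0xC0 | (c >> 6)));
                encoded.WriteByte((byte)(0x80 | (c & 0x3F)));
            }
            else
            {
                encoded.WriteByte((byte)(0xE0 | (c >> 12)));
                encoded.WriteByte((byte)(0x80 | ((c >> 6) & 0x3F)));
                encoded.WriteByte((byte)(0x80 | (c & 0x3F)));
            }
        }
        if (encoded.Length > 65535)
            throw new ArgumentException("Encoded string exceeds 65535 bytes.", nameof(s));
        byte[] bytes = encoded.ToArray();
        stream.WriteByte((byte)(bytes.Length >> 8)); // high byte first
        stream.WriteByte((byte)bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
    }

    // Mirrors DataInputStream.readUTF (no validation of malformed input).
    public static string ReadUTF(Stream stream)
    {
        int hi = stream.ReadByte(), lo = stream.ReadByte();
        if (hi < 0 || lo < 0) throw new EndOfStreamException();
        int length = (hi << 8) | lo;
        byte[] bytes = new byte[length];
        for (int read = 0; read < length; )
        {
            int n = stream.Read(bytes, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; )
        {
            int b = bytes[i];
            if (b < 0x80) { sb.Append((char)b); i += 1; }
            else if ((b & 0xE0) == 0xC0) { sb.Append((char)(((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F))); i += 2; }
            else { sb.Append((char)(((b & 0x0F) << 12) | ((bytes[i + 1] & 0x3F) << 6) | (bytes[i + 2] & 0x3F))); i += 3; }
        }
        return sb.ToString();
    }
}
```

With a connected Socket, wrap it in a NetworkStream (new NetworkStream(client)), call WriteUTF(ns, request), and read the reply with ReadUTF(ns).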

Programmatically get user that is locking an excel workbook

I am using C# with .NET Framework 4.5, NetOffice 1.6 and SharpDevelop 4.4.1 to manipulate an Excel workbook, located on a network share, from within Outlook.
At some point I need to change the file access of the workbook object (ewb) to read/write, like so:
ewb.ChangeFileAccess(Excel.Enums.XlFileAccess.xlReadWrite, System.Reflection.Missing.Value, true);
Before I change the file access, I check if the file is locked on the server. If the file is locked, I will notify the user to retry the action at a later point.
Now I want to include the name of the user who is locking the Excel file in the notification. I have searched MSDN, the NetOffice forum, etc., and have not found a solution.
I know that, if you open the Excel file read/write, it will store the user's name in the xlsx file. How can I access that particular piece of information through C#?
EDIT:
I ended up doing this:
// Requires: using System.IO; using System.Text; using System.Text.RegularExpressions;
public string GetExcelFileOwner(string path, NetOffice.ExcelApi.Enums.XlFileFormat ffmt) {
    string tempmark = "~$";
    if (ffmt == NetOffice.ExcelApi.Enums.XlFileFormat.xlExcel8) {
        tempmark = "";
    }
    string uspath = Path.Combine(Path.GetDirectoryName(path), tempmark + Path.GetFileName(path));
    if (!File.Exists(uspath)) return "";
    var sharing = FileShare.ReadWrite | FileShare.Delete;
    using (var fs = new FileStream(uspath, FileMode.Open, FileAccess.Read, sharing))
    using (var br = new BinaryReader(fs, Encoding.Default)) {
        if (ffmt == NetOffice.ExcelApi.Enums.XlFileFormat.xlExcel8) {
            // .xls: scan 500 bytes of the workbook itself, starting at offset 150, for the user name.
            byte[] ByteBuffer = new byte[500];
            br.BaseStream.Seek(150, SeekOrigin.Begin);
            br.Read(ByteBuffer, 0, 500);
            return matchRegex(System.Text.Encoding.UTF8.GetString(ByteBuffer), @"(?=\w\w\w)([\w, ]+)").Trim();
        }
        else {
            // .xlsx: the ~$ owner file starts with a length-prefixed user name.
            return br.ReadString();
        }
    }
}
private static string matchRegex(string txt, string rgx) {
    Regex r;
    Match m;
    try {
        r = new Regex(rgx, RegexOptions.IgnoreCase);
        m = r.Match(txt);
        if (m.Success) {
            return m.Groups[1].Value.ToString();
        }
        else {
            return "";
        }
    }
    catch {
        return "";
    }
}
We are using the Excel 2003 and Excel 2007+ file formats (.xls and .xlsx). For .xls I had to look in the .xls file itself. For .xlsx, the locking user is stored in the ~$ temp file.
I know the .xls code is dirty, but I have no clue how the .xls file format is structured. Therefore, I just read a bunch of bytes that include the ASCII user name and use a regex to extract it.

it will store the user's name in the xlsx file
No, not in the .xlsx file. Excel creates another file to store the user name. It has the Hidden file attribute turned on, so you cannot normally see it with Explorer.
It normally has the same name as the original file, but prefixed with ~$. So for a file named test.xlsx you'll get a file named ~$test.xlsx. It is a binary file and contains the user name both encoded in the default code page and in utf-16. A hex dump to show what it looks like:
0000000000: 0C 48 61 6E 73 20 50 61 │ 73 73 61 6E 74 20 20 20 ♀Hans Passant
0000000010: 20 20 20 20 20 20 20 20 │ 20 20 20 20 20 20 20 20
0000000020: 20 20 20 20 20 20 20 20 │ 20 20 20 20 20 20 20 20
0000000030: 20 20 20 20 20 20 20 0C │ 00 48 00 61 00 6E 00 73 ♀ H a n s
0000000040: 00 20 00 50 00 61 00 73 │ 00 73 00 61 00 6E 00 74 P a s s a n t
0000000050: 00 20 00 20 00 20 00 20 │ 00 20 00 20 00 20 00 20
0000000060: 00 20 00 20 00 20 00 20 │ 00 20 00 20 00 20 00 20
0000000070: 00 20 00 20 00 20 00 20 │ 00 20 00 20 00 20 00 20
0000000080: 00 20 00 20 00 20 00 20 │ 00 20 00 20 00 20 00 20
0000000090: 00 20 00 20 00 20 00 20 │ 00 20 00 20 00 20 00 20
00000000A0: 00 20 00 20 00 │
The oddish 0x0C word in the file is the string length in characters (not bytes), followed by 54 characters to store the user name, padded with spaces. Easiest way to read it is with BinaryReader.ReadString():
// Requires: using System.IO; using System.Text;
public static string GetExcelFileOwner(string path) {
    string uspath = Path.Combine(Path.GetDirectoryName(path), "~$" + Path.GetFileName(path));
    if (!File.Exists(uspath)) return "";
    var sharing = FileShare.ReadWrite | FileShare.Delete;
    using (var fs = new FileStream(uspath, FileMode.Open, FileAccess.Read, sharing))
    using (var br = new BinaryReader(fs, Encoding.Default)) {
        return br.ReadString();
    }
}
But it is not necessarily the most correct way. You might want to improve the code and locate the utf-16 string instead (not with ReadString) if 8-bit encodings don't work well in your locale; Seek() to offset 0x37 first. Be sure to use the method correctly: it has an implicit race condition, so only use it after the operation has failed, and expect an empty string return anyway. I cannot guarantee this method will work correctly on all Excel versions, including future ones; I only tested it with Office 2013 on a workstation-class machine.
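A minimal sketch of that UTF-16 variant, assuming the layout shown in the dump above: a 16-bit little-endian character count at offset 0x37, followed by the name in UTF-16LE. That layout was observed in one Office 2013 file rather than taken from a specification, so treat it as an assumption; ExcelOwnerFile and ReadUtf16Owner are hypothetical names, and the path passed in is the ~$ file itself.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExcelOwnerFile
{
    // Reads the UTF-16 copy of the user name from an Excel "~$" owner file.
    // Assumed layout: offset 0x37 holds a little-endian 16-bit character count,
    // followed by that many UTF-16LE characters.
    public static string ReadUtf16Owner(string ownerFilePath)
    {
        var sharing = FileShare.ReadWrite | FileShare.Delete; // Excel keeps the file open
        using (var fs = new FileStream(ownerFilePath, FileMode.Open, FileAccess.Read, sharing))
        using (var br = new BinaryReader(fs))
        {
            if (fs.Length < 0x37 + 2) return "";
            fs.Seek(0x37, SeekOrigin.Begin);
            int charCount = br.ReadUInt16();          // BinaryReader reads little-endian
            byte[] raw = br.ReadBytes(charCount * 2);  // may return fewer bytes if truncated
            return Encoding.Unicode.GetString(raw);     // Encoding.Unicode is UTF-16LE
        }
    }
}
```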

Which part are you having problems with?
Without knowing anything about xlsx, I can only guess: you want to open the file specifying FileAccess.Read and FileShare.ReadWrite, as in "How to read open excel file at C#"; after that, you use some kind of library to turn the XLSX into a DataTable and extract the specific row you need.

string to byte[] without encoding or changing actual bytes at string

Assume I have the following byte[]:
0C 00 21 08 01 00 00 00 86 1B 06 00 54 51 53 65 72 76 65 72
With BitConverter.ToString I can convert it to
0C-00-21-08-01-00-00-00-86-1B-06-00-54-51-53-65-72-76-65-72
How do I convert it back from the string to a byte[] to get
0C 00 21 08 01 00 00 00 86 1B 06 00 54 51 53 65 72 76 65 72
ASCII encoding and other methods always give me the encoded bytes of the string's characters, but what I really need is the hex in the string read back as a byte[] as it is. I know that if I did a reversing operation (GetBytes, then ToString) I'd end up with the same string, but what I care about is getting the exact original bytes from GetBytes.
As I said, I want to put
0C-00-21-08-01-00-00-00-86-1B-06-00-54-51-53-65-72-76-65-72
in as a string
and get
0C 00 21 08 01 00 00 00 86 1B 06 00 54 51 53 65 72 76 65 72
back as a byte[].
Thanks in advance.

You need this:
// Requires: using System; using System.Linq;
byte[] bytes = str.Split('-').Select(s => Convert.ToByte(s, 16)).ToArray();
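On .NET 5 and later, Convert.FromHexString parses hex digits directly, so the LINQ pipeline above can be replaced by stripping the dashes that BitConverter.ToString inserts. A small sketch; the HexDump class name is hypothetical.

```csharp
using System;

public static class HexDump
{
    // Inverse of BitConverter.ToString: "0C-00-21" -> { 0x0C, 0x00, 0x21 }.
    // Convert.FromHexString (.NET 5+) expects plain hex digits, so the dashes are removed first.
    public static byte[] Parse(string dashedHex) =>
        Convert.FromHexString(dashedHex.Replace("-", ""));
}
```

For example, HexDump.Parse("0C-00-21-08-01-00-00-00-86-1B-06-00-54-51-53-65-72-76-65-72") returns the original 20 bytes, and BitConverter.ToString on the result gives the same string back.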

You can use the SoapHexBinary class in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace (.NET Framework only):
string s = "0C-00-21-08-01-00-00-00-86-1B-06-00-54-51-53-65-72-76-65-72";
byte[] buf = SoapHexBinary.Parse(s.Replace("-"," ")).Value;

Remember that BitConverter.ToString returns an equivalent hexadecimal string representation, so
if you decide to stick with it, convert back as follows:
string temp = BitConverter.ToString(buf); // buf is your array.
byte[] newbuf = temp.Split('-').Select(s => Convert.ToByte(s, 16)).ToArray();
But the safest way to convert bytes to a string and back is Base64:
string str = Convert.ToBase64String(buf);
byte[] result = Convert.FromBase64String(str);

Bitmap.Save to save an icon actually saves a .png

I need to write a program that will generate 108 combinations of icons (standard Windows .ico files) based on a tileset image.
I use the class System.Drawing.Bitmap to build each combination, and I save them like this:
Bitmap IconBitmap = new Bitmap(16, 16);
// Some processing, writing different parts of the source tileset
// ...
IconBitmap.Save(Path.Combine(TargetPath, "Icon" + Counter + ".ico"),
ImageFormat.Icon);
But I found out that the saved file is actually a PNG. Neither Windows Explorer nor Visual Studio can display it correctly, but GIMP can, and if I open it in a hex viewer, here is what I see:
00000000 89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 ‰PNG........IHDR
00000010 00 00 00 10 00 00 00 10 08 06 00 00 00 1F F3 FF ..............óÿ
00000020 61 00 00 00 01 73 52 47 42 00 AE CE 1C E9 00 00 a....sRGB.®Î.é..
00000030 00 04 67 41 4D 41 00 00 B1 8F 0B FC 61 05 00 00 ..gAMA..±..üa...
00000040 00 09 70 48 59 73 00 00 0E C3 00 00 0E C3 01 C7 ..pHYs...Ã...Ã.Ç
00000050 6F A8 64 00 00 00 15 49 44 41 54 38 4F 63 60 18 o¨d....IDAT8Oc`.
00000060 05 A3 21 30 1A 02 A3 21 00 09 01 00 04 10 00 01 .£!0..£!........
00000070 72 A5 13 76 00 00 00 00 49 45 4E 44 AE 42 60 82 r¥.v....IEND®B`‚
Also if I rename the .ico to .png Windows Explorer can display it properly.
I get this result even if I do NOTHING to the bitmap (constructing it with new and saving it directly gives me a black PNG).
What am I doing wrong?
I also tried this, which gave me awful 16-color icons, and I would prefer to avoid this solution anyway (it uses handles):
// Note: GetHicon() creates an unmanaged icon handle that should later be released with the Win32 DestroyIcon function.
Icon NewIcon = Icon.FromHandle(IconBitmap.GetHicon());
using (FileStream FS = new FileStream(Path.Combine(TargetPath, "Icon" + Counter + ".ico"), FileMode.Create))
{
    NewIcon.Save(FS);
}

I made a quick-and-dirty workaround myself; I'm posting it here for the record (it might help someone who needs a quick solution, like me).
I won't accept this as the correct answer; it's not an actual icon writer.
It just writes a 32-bit ARGB bitmap into an .ico file using the PNG format (works on Vista or later).
It is based on the ICO file format article on Wikipedia, plus some trial and error.
void SaveAsIcon(Bitmap SourceBitmap, string FilePath)
{
    using (FileStream FS = new FileStream(FilePath, FileMode.Create))
    {
        // ICO header (ICONDIR): reserved, type = 1 (icon), image count = 1
        FS.WriteByte(0); FS.WriteByte(0);
        FS.WriteByte(1); FS.WriteByte(0);
        FS.WriteByte(1); FS.WriteByte(0);
        // Image size (a stored value of 0 means 256 px, which is what (byte)256 yields)
        FS.WriteByte((byte)SourceBitmap.Width);
        FS.WriteByte((byte)SourceBitmap.Height);
        // Palette (0 = no palette)
        FS.WriteByte(0);
        // Reserved
        FS.WriteByte(0);
        // Number of color planes
        FS.WriteByte(0); FS.WriteByte(0);
        // Bits per pixel
        FS.WriteByte(32); FS.WriteByte(0);
        // Data size, will be written after the data
        FS.WriteByte(0); FS.WriteByte(0); FS.WriteByte(0); FS.WriteByte(0);
        // Offset to image data, fixed at 22 (6-byte header + one 16-byte entry)
        FS.WriteByte(22); FS.WriteByte(0); FS.WriteByte(0); FS.WriteByte(0);
        // Writing actual data
        SourceBitmap.Save(FS, ImageFormat.Png);
        // Getting data length (file length minus header)
        long Len = FS.Length - 22;
        // Write all four bytes of the little-endian length at offset 14
        FS.Seek(14, SeekOrigin.Begin);
        FS.WriteByte((byte)Len);
        FS.WriteByte((byte)(Len >> 8));
        FS.WriteByte((byte)(Len >> 16));
        FS.WriteByte((byte)(Len >> 24));
    }
}

Here's a simple ICO file writer I wrote today that outputs multiple System.Drawing.Image images to a file.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
// https://en.wikipedia.org/wiki/ICO_(file_format)
public static class IconWriter
{
    public static void Write(Stream stream, IReadOnlyList<Image> images)
    {
        if (images.Any(image => image.Width > 256 || image.Height > 256))
            throw new ArgumentException("Image cannot have height or width greater than 256px.", nameof(images));
        //
        // ICONDIR structure
        //
        WriteInt16(stream, 0); // reserved
        WriteInt16(stream, 1); // image type (icon)
        WriteInt16(stream, (short) images.Count); // number of images
        var encodedImages = images.Select(image => new
        {
            image.Width,
            image.Height,
            Bytes = EncodeImagePng(image)
        }).ToList();
        //
        // ICONDIRENTRY structure
        //
        const int iconDirSize = 6;
        const int iconDirEntrySize = 16;
        var offset = iconDirSize + (images.Count * iconDirEntrySize);
        foreach (var image in encodedImages)
        {
            stream.WriteByte((byte) image.Width); // 256 wraps to 0, which ICO reads as 256
            stream.WriteByte((byte) image.Height);
            stream.WriteByte(0); // no palette
            stream.WriteByte(0); // reserved
            WriteInt16(stream, 0); // no color planes
            WriteInt16(stream, 32); // 32 bpp
            // image data length
            WriteInt32(stream, image.Bytes.Length);
            // image data offset
            WriteInt32(stream, offset);
            offset += image.Bytes.Length;
        }
        //
        // Image data
        //
        foreach (var image in encodedImages)
            stream.Write(image.Bytes, 0, image.Bytes.Length);
    }
    private static byte[] EncodeImagePng(Image image)
    {
        using (var stream = new MemoryStream())
        {
            image.Save(stream, ImageFormat.Png);
            return stream.ToArray();
        }
    }
    private static void WriteInt16(Stream stream, short s)
    {
        stream.WriteByte((byte) s);
        stream.WriteByte((byte) (s >> 8));
    }
    private static void WriteInt32(Stream stream, int i)
    {
        stream.WriteByte((byte) i);
        stream.WriteByte((byte) (i >> 8));
        stream.WriteByte((byte) (i >> 16));
        stream.WriteByte((byte) (i >> 24));
    }
}

It's true that ImageFormat.Icon does not work for writing as you'd suppose: .NET simply does not support writing .ico files and just dumps the PNG data.
There are a few projects on CodeProject and elsewhere that let you write an .ico file; it's actually not that hard. The file format is pretty straightforward and supports both BMP and PNG image data.
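Because an icon writer is easy to get subtly wrong, a small reader helps check the output, for example to confirm that a file starts with an ICONDIR header rather than the raw PNG signature 89 50 4E 47 seen in the question. This is a sketch assuming .NET 5+ / C# 9; IcoInspector and Entry are hypothetical names, and it only inspects the header and each entry's first 8 bytes, not the full image data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class IcoInspector
{
    // One ICONDIRENTRY (all integers in the file are little-endian).
    public record Entry(int Width, int Height, int BitsPerPixel, int Size, int Offset, bool IsPng);

    static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Parses the ICONDIR header and its entries; throws if the data is not an icon file.
    public static List<Entry> ReadEntries(byte[] ico)
    {
        using var br = new BinaryReader(new MemoryStream(ico));
        if (br.ReadUInt16() != 0 || br.ReadUInt16() != 1)
            throw new InvalidDataException("Not an ICO file (bad reserved/type fields).");
        int count = br.ReadUInt16();
        var entries = new List<Entry>();
        for (int i = 0; i < count; i++)
        {
            int width = br.ReadByte(), height = br.ReadByte();
            br.ReadByte(); br.ReadByte();   // palette size, reserved
            br.ReadUInt16();                // color planes
            int bpp = br.ReadUInt16();
            int size = br.ReadInt32(), offset = br.ReadInt32();
            // Payload is PNG if it starts with the 8-byte PNG signature, otherwise BMP data.
            bool isPng = offset + 8 <= ico.Length &&
                ico.AsSpan(offset, 8).SequenceEqual(PngSignature);
            entries.Add(new Entry(width == 0 ? 256 : width, height == 0 ? 256 : height,
                bpp, size, offset, isPng));
        }
        return entries;
    }
}
```

Passing File.ReadAllBytes of the question's output would throw InvalidDataException, since a bare PNG has no ICONDIR header.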

Object-to-bytes conversion

When I try to convert an object into a byte array, I get a weird array.
This is the code:
// obj is declared as: int obj = 50;
// ByteArrayToString(byte[]) just formats each byte as two hex digits separated by spaces.
using (MemoryStream ms = new MemoryStream())
{
    BinaryFormatter bf = new BinaryFormatter();
    bf.Serialize(ms, obj);
    Console.WriteLine(ByteArrayToString(ms.ToArray()));
}
The result is this:
"00 01 00 00 00 FF FF FF FF 01 00 00 00 00 00 00 00 04 01 00 00 00 0C 53 79 73 74 65 6D 2E 49 6E 74 33 32 01 00 00 00 07 6D 5F 76 61 6C 75 65 00 08 32 00 00 00 0B "
Can somebody explain to me WHY? :) The optimal result would be only "32 00 00 00".

Since the serializer needs to provide enough information to deserialize the data back, it must include some metadata about the object being serialized. Specifically, the
53 79 73 74 65 6D 2E 49 6E 74 33 32
part stands for System.Int32.
If you use BinaryWriter and its Write(Int32) method instead, you'll get the desired effect: your memory stream will contain just the four bytes from your integer. You wouldn't be able to deserialize it without knowing that you wrote an Int32 into the stream.
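A minimal sketch of that BinaryWriter approach; the Int32Bytes name is hypothetical. For 50 it produces exactly 32 00 00 00, because BinaryWriter always writes integers in little-endian order, and reading back works only because the reader assumes an Int32.

```csharp
using System;
using System.IO;

public static class Int32Bytes
{
    // BinaryWriter writes only the raw value: an Int32 becomes 4 little-endian bytes, no type metadata.
    public static byte[] Write(int value)
    {
        using var ms = new MemoryStream();
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(value);
        }
        return ms.ToArray(); // ToArray still works after the writer closed the stream
    }

    // The reader must already know an Int32 was written; nothing in the bytes says so.
    public static int Read(byte[] bytes)
    {
        using var reader = new BinaryReader(new MemoryStream(bytes));
        return reader.ReadInt32();
    }
}
```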

You're conflating BinaryFormatter serialization with an object's in-memory format. What is written to the stream is merely an implementation detail of the BinaryFormatter and should not be relied upon for any interprocess communication not using BinaryFormatter.
If you're looking for the byte representation of the built-in types, use BitConverter.GetBytes (for strings use the appropriate Encoding.GetBytes).
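A short sketch contrasting BitConverter.GetBytes, whose byte order follows the machine (see BitConverter.IsLittleEndian), with BinaryPrimitives from System.Buffers.Binary (.NET Core 2.1 and later), which fixes the byte order explicitly; that matters once the bytes leave the process. ValueBytes is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class ValueBytes
{
    // BitConverter uses the machine's byte order (little-endian on x86/x64 and most ARM).
    public static byte[] Native(int value) => BitConverter.GetBytes(value);

    // BinaryPrimitives writes a fixed byte order regardless of the machine.
    public static byte[] LittleEndian(int value)
    {
        var bytes = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(bytes, value);
        return bytes;
    }

    // Strings have no single byte representation; the encoding must be chosen explicitly.
    public static byte[] Utf8(string s) => Encoding.UTF8.GetBytes(s);
}
```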

The serialized byte array has both the data itself and the type info. That's why you get more bytes than you expect. That's necessary for deserializing later.

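The extra bytes in the result are what BinaryFormatter adds around the value: a stream header and type metadata (such as the type name System.Int32 and the field name m_value). You're not just outputting the four bytes of int obj = 50; you're outputting everything BinaryFormatter needs to rebuild the object as well.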

The serialization process uses extra bytes to store information about types; it's the only way to ensure that serialized data will be deserialized into the same objects of the same types.
If you are absolutely sure of what you are doing and want to avoid any extra bytes, you can write your own formatter and serializers, which is very complicated. Or you could use marshalling:
var size = Marshal.SizeOf(your_object);
// Both managed and unmanaged buffers required.
var bytes = new byte[size];
var ptr = Marshal.AllocHGlobal(size);
try
{
    // Copy object byte-to-byte to unmanaged memory.
    Marshal.StructureToPtr(your_object, ptr, false);
    // Copy data from unmanaged memory to managed buffer.
    Marshal.Copy(ptr, bytes, 0, size);
}
finally
{
    // Release unmanaged memory.
    Marshal.FreeHGlobal(ptr);
}
And to convert bytes to object:
// bytes: the buffer produced above; size: Marshal.SizeOf(typeof(YourType)).
YourType your_object;
var ptr = Marshal.AllocHGlobal(size);
try
{
    Marshal.Copy(bytes, 0, ptr, size);
    your_object = (YourType)Marshal.PtrToStructure(ptr, typeof(YourType));
}
finally
{
    Marshal.FreeHGlobal(ptr);
}
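This is quite slow and unsafe to use in most cases, but it's the easiest way to strictly convert an object to byte[] without implementing serialization and without the [Serializable] attribute. It only works for types with a defined layout (structs, or classes marked with [StructLayout]).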
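A generic version of the same marshalling technique, assuming the generic Marshal overloads (.NET Framework 4.5.1 and later, .NET Core); StructBytes and Sample are hypothetical names. The result depends on the struct's layout and the machine's byte order, and the bytes carry no type information, so the reader must know the target type.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StructBytes
{
    // Copies a struct with a defined layout into a byte array via unmanaged memory.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        var bytes = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);
            Marshal.Copy(ptr, bytes, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr); // always release, even if marshalling throws
        }
        return bytes;
    }

    // Reverses ToBytes: the caller must know the target type in advance.
    public static T FromBytes<T>(byte[] bytes) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        if (bytes.Length < size) throw new ArgumentException("Buffer too small.", nameof(bytes));
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.Copy(bytes, 0, ptr, size);
            return Marshal.PtrToStructure<T>(ptr);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}

// Hypothetical example type: two Int32 fields laid out in declaration order.
[StructLayout(LayoutKind.Sequential)]
public struct Sample
{
    public int A;
    public int B;
}
```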

We Keep Coding
