Is there a C# `Encoding.UTF8.GetString` equivalent in C++?

Is there an equivalent of C#'s Encoding.UTF8.GetString in C++? Or is there another fast way to parse a byte array and decode its sequence of bytes into a string?

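You can try this. Note that it only copies the bytes, so it works only if rawBytes already holds wchar_t code units (UTF-16 on Windows); it does not decode UTF-8. Here size is the length of rawBytes in bytes:
// size is in bytes, so the number of wchar_t elements is size / sizeof(wchar_t)
const size_t count = size / sizeof(wchar_t);
std::wstring unicodeStr(count, L'\0');
memcpy_s(&unicodeStr[0], count * sizeof(wchar_t), rawBytes, count * sizeof(wchar_t));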
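If the bytes really are UTF-8, as they would be for Encoding.UTF8.GetString, they have to be decoded rather than copied. The following is a minimal sketch of such a decoder, not a drop-in replacement for the .NET method: the name decode_utf8 is made up for this example, it produces UTF-32 code points in a std::u32string, and it replaces malformed sequences with U+FFFD. It does not reject overlong encodings or surrogate code points, which a strict decoder would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Malformed or truncated sequences become U+FFFD (the replacement character).
std::u32string decode_utf8(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::u32string out;
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = data[i++];
        char32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected after the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(char32_t(0xFFFD)); continue; }  // stray continuation or invalid lead

        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= size || (data[i] & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (data[i++] & 0x3F);
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

If the rest of the program can work with UTF-8 directly, constructing a std::string from the bytes and leaving them undecoded is simpler and faster.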

Related

Encode unicode string as byte array C++ and C#

I have C++ code which I want to rewrite to C#. This part
case ID_TYPE_UNICODE_STRING:
    if (items[i].GetUString().length() > 0xFFFF)
        throw dppError("error");
    // GetUString returns a std::wstring object
    DataSize = (WORD) (sizeof(WCHAR) * (items[i].GetUString().length()));
    blob.AppendData((const BYTE *) &DataSize, sizeof(WORD)); // blob is a byte array
    // GetUString returns a std::wstring object
    blob.AppendData((const BYTE *) items[i].GetUString().c_str(), DataSize);
    break;
This basically serializes the length in bytes of the Unicode string, followed by the string itself, to a byte array.
Here is my problem (this code then sends the data to a server): I don't know which encoding is used in the lines above (UTF-16, UTF-8, etc.),
so I don't know the best way to reimplement it in C#.
How can I work out which encoding is used in this C++ project?
And if I can't find the encoding used in the C++ project, and given that the endianness is the same as stated in the accepted answer to this question, do you think the two methods (GetBytes and GetString) in that accepted answer will work for me, for serializing the Unicode string as in the C++ project and retrieving it back? That is,
these two:
static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}
static string GetString(byte[] bytes)
{
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}
Or am I better off learning which encoding is used in the C++ project?
I will then need to reconstruct the string from the byte array in the same way, too. And if I am better off learning which encoding was used in C++, how do I get the length of the string in bytes in C#? Using System.Text.ASCII.WhateverEncodingWasUsedinC++.GetByteCount(string)?
PS. Do you think the C++ code works in an encoding-agnostic way? If yes, how can I do the same in C#?
UPDATE: I am guessing the encoding used is UTF-16, because I saw it mentioned in several variable names. So I will assume UTF-16 is used and, if something doesn't work out during testing, look for alternative solutions. In that case, what is the best way to get the number of bytes of the UTF-16 string? Is the following method OK: System.Text.ASCII.Unicode.GetByteCount(string)?
Feedback and comments welcome. Am I wrong somewhere in my reasoning? Thanks.
Change the method bodies like this to get the byte[] equivalent of the input string.
static byte[] GetBytes(string str)
{
    // UnicodeEncoding is UTF-16; the default constructor uses little-endian byte order
    UnicodeEncoding uEncoding = new UnicodeEncoding();
    // Encode the argument, not a hard-coded literal
    byte[] stringContentBytes = uEncoding.GetBytes(str);
    return stringContentBytes;
}
For the reverse:
static string GetString(byte[] bytes)
{
    UnicodeEncoding uEncoding = new UnicodeEncoding();
    string stringContent = uEncoding.GetString(bytes);
    return stringContent;
}
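To make the byte layout under discussion concrete, here is a sketch of the same idea on the C++ side: a 16-bit length in bytes followed by the UTF-16 code units, both written little-endian explicitly. The function names serialize_utf16 and deserialize_utf16 are invented for this example, and it uses std::u16string instead of std::wstring so that the code unit size is 2 bytes on every platform; that the original project uses UTF-16 with little-endian byte order is an assumption. The payload bytes are the same ones Encoding.Unicode.GetBytes produces in C#.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: 16-bit byte count (little-endian), then UTF-16LE code units.
std::vector<std::uint8_t> serialize_utf16(const std::u16string& text) {
    const std::size_t byteCount = text.size() * sizeof(char16_t);
    assert(byteCount <= 0xFFFF);  // the length prefix is only 16 bits wide
    std::vector<std::uint8_t> blob;
    blob.push_back(static_cast<std::uint8_t>(byteCount & 0xFF));
    blob.push_back(static_cast<std::uint8_t>((byteCount >> 8) & 0xFF));
    for (char16_t unit : text) {
        blob.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte first
        blob.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // then high byte
    }
    return blob;
}

// Hypothetical inverse: read the byte count, then rebuild the code units.
std::u16string deserialize_utf16(const std::vector<std::uint8_t>& blob) {
    assert(blob.size() >= 2);
    const std::size_t byteCount = blob[0] | (blob[1] << 8);
    assert(byteCount % 2 == 0 && blob.size() >= 2 + byteCount);
    std::u16string text;
    for (std::size_t i = 0; i < byteCount; i += 2) {
        text.push_back(static_cast<char16_t>(blob[2 + i] | (blob[2 + i + 1] << 8)));
    }
    return text;
}
```

One detail worth checking in the original C++: it compares the character count against 0xFFFF but stores the byte count in a WORD, so with a 2-byte WCHAR any string longer than 0x7FFF characters would overflow the prefix. The sketch checks the byte count instead.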

How to get back the data after Marshal.Copy(byte,0,ptr,len)?

We have used the following code to marshal the byte array, i.e. copy it to unmanaged memory:
Marshal.Copy(byte,0,ptr,len)
How do I put the data back into a byte array in another program?
Please advise whether my approach is correct:
string aString = "some text";
byte[] theBytes = System.Text.Encoding.Default.GetBytes(aString);
// Marshal the managed struct to a native block of memory.
int myByteSize = theBytes.Length;
IntPtr pmyByte = Marshal.AllocHGlobal(myByteSize ); //this is pointer
try
{
Marshal.Copy(theBytes, 0, pmyByte , myByteSize );
.............
Following this, I would like to retrieve the data in this unmanaged memory into a string variable. How do I achieve that?
In VB6 I am doing it as follows (this may be helpful for someone who wants to pass data from C#.NET to a VB6 app):
Call CopyMemory(buf(1), ByVal cds.lpData, cds.cbData)
a$ = StrConv(buf, vbUnicode)
a$ = Left$(a$, InStr(1, a$, Chr$(0)) - 1)
Form1.Print a$
How do I pick up the marshaled data in C#.NET?
This code assumes that you know the unmanaged data length (someArraySize) and the character encoding:
// create a new managed array
var array = new byte[someArraySize];
// copy data from the unmanaged memory pointed to by ptr into the managed array
Marshal.Copy(ptr, array, 0, someArraySize);
// convert the array to a string; this assumes that the array contains a string in UTF-8 encoding
var s = Encoding.UTF8.GetString(array);
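For a receiver written in C++ rather than C# or VB6, the same two steps apply: take the buffer and its known length, then cut the text at the first null byte, which is what the VB6 snippet does with InStr and Left$. A minimal sketch, where string_from_buffer is a name made up for this example and the buffer is assumed to hold single-byte or UTF-8 text:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy at most `size` bytes, stopping at the first 0x00 if there is one.
std::string string_from_buffer(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size == 0) return std::string();
    // memchr returns a pointer to the first null byte, or nullptr if none is in range.
    const void* nul = std::memchr(data, 0, size);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - data) : size;
    return std::string(data, len);
}
```

Passing the length explicitly means the function never reads past the buffer even when the sender did not append a terminator.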

PInvoke char* in C DLL handled as String in C#. Issue with null characters

The function in C DLL looks like this:
int my_Funct(char* input, char* output);
I must call this from C# app. I do this in the following way:
...DllImport stuff...
public static extern int my_Funct(string input, string output);
The input string is transmitted to the DLL perfectly (I have visible proof of that). The output that the function fills in, however, is wrong. It should contain hex data like this:
3F-D9-00-01
Unfortunately, everything after the two zeros is cut off, and only the first two bytes reach my C# app. This happens (I guess) because the zero byte is treated as a null character and taken as the end of the string.
Any idea how I could get around this? I tried to specify the parameter as out IntPtr instead of a string, but I don't know what to do with it afterwards.
Afterwards I tried this:
byte[] b1 = new byte[2];
Marshal.Copy(output,b1,0,2);
2 should normally be the length of the byte array. But I get all kinds of errors, like "Requested range extends past the end of the array." or "Attempted to read or write protected memory..."
I appreciate any help.
Your marshalling of the output string is incorrect. Using string in the p/invoke declaration is appropriate when passing data from managed to native. But you cannot use that when the data flows in the other direction. Instead you need to use StringBuilder. Like this:
[DllImport(...)]
public static extern int my_Funct(string input, StringBuilder output);
Then allocate the memory for output:
StringBuilder output = new StringBuilder(256);
//256 is the capacity in characters - only you know how large a buffer is needed
And then you can call the function.
int retval = my_Funct(inputStr, output);
string outputStr = output.ToString();
On the other hand, if these parameters have null characters in them then you cannot marshal as string. That's because the marshaller won't marshal anything past the null. Instead you need to marshal it as a byte array.
public static extern int my_Funct(
[In] byte[] input,
[Out] byte[] output
);
That matches your C declaration.
Then assuming the ANSI encoding you convert the input string to a byte array like this:
byte[] input = Encoding.Default.GetBytes(inputString);
If you want to use a different encoding, it's obvious how to do so.
And for the output you do need to allocate the array. Assuming it's the same length as the input you would do this:
byte[] output = new byte[input.Length];
And somehow your C function has got to know the length of the arrays. I'll leave that bit to you!
Then you can call the function
int retval = my_Funct(input, output);
And then to convert the output array back to a C# string you use the Encoding class again.
string outputString = Encoding.Default.GetString(output);
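On the native side, "knowing the length of the arrays" usually means extra length parameters. The sketch below shows one possible shape; my_Funct_sized and its parameters are hypothetical and not part of the DLL in the question, and export decoration (extern "C", __declspec) is left out. Because the lengths are explicit, bytes after an embedded 0x00 such as the 00-01 in 3F-D9-00-01 are preserved.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical length-aware variant of my_Funct.
// Returns the number of bytes written, or -1 if the output buffer is too small.
int my_Funct_sized(const unsigned char* input, int inputLen,
                   unsigned char* output, int outputCapacity) {
    assert(input != nullptr && output != nullptr);
    if (inputLen < 0 || outputCapacity < inputLen) return -1;
    // memcpy copies every byte, including zeros; string functions such as strcpy would stop at 0x00.
    std::memcpy(output, input, static_cast<std::size_t>(inputLen));
    return inputLen;
}
```

The C# declaration would then carry two additional int parameters alongside the byte[] arguments, and the return value tells the caller how many bytes of the output array are valid.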

C++ <--> C# modify a marshalled array of bytes

I have an unmanaged C++ function which is calling a managed C# method in a DLL. The purpose of the C# method is to take an array of bytes (allocated by the C++ caller), populate the array, and return it. I can get the array INTO the C# method, but the populated data are lost when they get back to the C++ function. Right now, this is my test code to debug the process:
C# DLL Method:
// Take an array of bytes and modify it
public ushort GetBytesFromBlaster([MarshalAs(UnmanagedType.LPArray)] byte[] dataBytes)
{
    dataBytes[0] = (byte)'a';
    dataBytes[1] = (byte)'b';
    dataBytes[2] = (byte)'c';
    return 3;
}
C++ function which calls the DLL:
// bytes[] has been already allocated by its caller
short int SimGetBytesP2P(unsigned char bytes[])
{
    unsigned short int numBytes = 0;
    bytes[0] = 'x';
    bytes[1] = 'y';
    bytes[2] = 'z';
    // bytes[] are {'x', 'y', 'z'} here
    guiPtr->GetBytesFromBlaster(bytes, &numBytes);
    // bytes[] SHOULD be {'a', 'b', 'c'} here, but they are still {'x', 'y', 'z'}
    return(numBytes);
}
I believe it has something to do with C# turning the C++ pointer into a new managed array rather than modifying the original one. I have tried several variations using the "ref" modifier, etc., but with no luck. Also, these data are NOT null-terminated strings; the data bytes are raw 1-byte values.
Can anyone please shed some light on this? Thanks!
Stuart
You could do the marshaling yourself. Have the C# function accept a parameter by value of type IntPtr. Also a second parameter indicating array length. No special marshaling attributes are needed or wanted.
Then, use Marshal.Copy and copy the array from the unmanaged pointer to a managed byte[] array that you allocated. Do your thing, and then when you're done, use Marshal.Copy to copy it back to the C++ unmanaged array.
These particular overloads should get you started:
http://msdn.microsoft.com/en-us/library/ms146625.aspx
http://msdn.microsoft.com/en-us/library/ms146631.aspx
For example:
public ushort GetBytesFromBlaster(IntPtr dataBytes, int arraySize)
{
    byte[] managed = new byte[arraySize];
    Marshal.Copy(dataBytes, managed, 0, arraySize);
    managed[0] = (byte)'a';
    managed[1] = (byte)'b';
    managed[2] = (byte)'c';
    Marshal.Copy(managed, 0, dataBytes, arraySize);
    return 3;
}
Alternatively you could implement a custom marshaller as described in http://msdn.microsoft.com/en-us/library/w22x2hw6.aspx if the default one isn't doing what you need it to. But that looks like more work.
I believe that you just need to add a SizeConst attribute:
public ushort GetBytesFromBlaster(
[MarshalAs(UnmanagedType.LPArray, SizeConst=3)]
byte[] dataBytes
)
and the default marshaller should do the rest for you.

Is unsafe normal in .net C#?

I need to create a new instance of String from an array of sbytes (sbyte[]).
For that I need to convert sbyte[] into sbyte*.
This is possible only with the unsafe keyword.
Is that okay, or is there another way to create a String from an array of sbytes?
First:
How to convert a sbyte[] to byte[] in C#?
sbyte[] signed = { -2, -1, 0, 1, 2 };
byte[] unsigned = (byte[]) (Array)signed;
Then:
string yourstring = UTF8Encoding.UTF8.GetString(unsigned);
Why are you using sbyte?
Encoding.Default.GetString() (like that of any other encoding) takes a byte[] array as its argument, so you could convert the sbyte[] array using LINQ if all values are non-negative: array.Cast<byte>().ToArray().
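For comparison with the C++ side of these questions, the signed-versus-unsigned distinction matters much less there: signed char, unsigned char and char are all character types, and a pointer to one may be reinterpreted as a pointer to another without changing the bits. A small sketch, with string_from_signed_bytes being a name chosen for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from signed bytes without altering their bit patterns.
std::string string_from_signed_bytes(const signed char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size == 0) return std::string();
    // Reinterpreting between character types is permitted; negative values keep their byte value,
    // so -61 stays 0xC3 and -87 stays 0xA9.
    return std::string(reinterpret_cast<const char*>(data), size);
}
```

The resulting std::string holds the raw bytes, so interpreting them as UTF-8 or another encoding is still a separate step.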
