Equivalent of mono_string_to_utf8() without allocation and data copy

I am trying to replace the C code below by something more efficient:
void mstr2str(MonoString *mstr)
{
    char *str = mono_string_to_utf8(mstr);
    printf("mono string: %s\n", str);
    g_free(str);
}
My goal is to avoid the memory allocation and data copy that comes with mono_string_to_utf8() because the string returned by C# to C can be very large.
I have read the suggestions about using Windows C++ COM interfaces, but since I am on Linux I am looking for a direct way to access the MonoString from C.
The reference below indicates that this is "impossible":
http://www.mono-project.com/Interop_with_Native_Libraries#marshal-step-4
"there is no way to control how the runtime allocates the marshaled memory, or how long it lasts. This is crucial. If the runtime marshals
a string (e.g. UTF-16 to Ansi conversion), the marshaled string will
only last as long as the call. The unmanaged code CANNOT keep a
reference to this memory, as it WILL be freed after the call ends.
Failure to heed this restriction can result in "strange behavior",
including memory access violations and process death".
But the same document later addresses the subject with marshalling, IntPtr, and SafeHandles (without ready-to-use examples, and the referenced blog posts date from 2004/2005).
Is there any more up-to-date documentation or are there code examples available?
Thank you for the solution, Lupus. For those who, like me, don't know C#: these calls are macros defined in a file called object.h, so they are reported as unresolved symbols if you load the runtime dynamically. Here they are:
#define mono_string_chars(s) ((gunichar2*)(s)->chars)
#define mono_string_length(s) ((s)->length)
So it seems that C# uses pointers after all.

That documentation is about marshaling on P/Invoke, which is a different topic, unrelated to the C code you're showing.
As for your question: you can access the string characters and do what you want with them by using mono_string_chars() and mono_string_length(). For example, you could use iconv(), which is available on Linux and OS X, or you can convert manually from UTF-16 to UTF-8 for maximum control.
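A minimal sketch of the manual conversion, assuming the pointer and length come from mono_string_chars() and mono_string_length() (gunichar2 is a 16-bit unsigned type, so uint16_t stands in for it here). The helper names utf8_encode and utf16_write_utf8 are hypothetical, not part of the Mono API. The idea is to stream the string through a small stack buffer, so there is no heap allocation and no full-size copy, whatever the length of the string:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write len UTF-16 code units to out as UTF-8,
   going through a fixed 256-byte stack buffer.
   Returns the number of bytes actually written. */
static size_t utf16_write_utf8(const uint16_t *chars, size_t len, FILE *out)
{
    unsigned char buf[256];
    size_t used = 0, total = 0;

    assert(chars != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = chars[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (chars[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate: emit the replacement character */
        }
        if (used + 4 > sizeof buf) { /* flush before the buffer can overflow */
            total += fwrite(buf, 1, used, out);
            used = 0;
        }
        used += utf8_encode(cp, buf + used);
    }
    total += fwrite(buf, 1, used, out);
    return total;
}
```

With the Mono headers available, the call in mstr2str() would then look something like utf16_write_utf8((const uint16_t *)mono_string_chars(mstr), (size_t)mono_string_length(mstr), stdout). The pointer refers to memory owned by the managed string, so it should only be used while the runtime keeps that string alive.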

Related

AccessViolationException in C++/CLI Wrapper for native C

I am writing an application in C# which uses a C++/CLI Wrapper .dll which again uses another C++/CLI Wrapper for native C code.
Further Explanation:
My C# application (which I refer to as Reporter) is nothing more than a Windows Form which calls the first C++/CLI wrapper (which I refer to as Control), which contains a UserControl. This UserControl is a GUI for calling the last .dll (referred to as Generator). I do this because I want to use my Control in other projects and I do not want to deal with the marshalling of my types like char *.
So here's my problem: for the first few calls I can call my Generator function just as planned. But after some calls, I get an AccessViolationException.
My Generator contains loads of C functions and also global C variables. Everything is marked properly as extern "C". As far as I could determine, the violation occurs when I try to access a global variable.
I tried to move all the global variables into my wrapper class in Generator, but I failed because I could not convert all the C types into managed types.
After calling the functions I was free(x)ing the memory of my variables. Before I commented that out, I wasn't able to call the function more than twice. Now (after commenting it out) I am able to call the function in Generator 4 times. Always.
How can I work around this? Is there a way to give my functions something like "administrative rights" so that they are allowed to do what they want with the global variables?
Thanks to all in advance. I have been stuck on this for almost a month now and have done much research on how to write wrapper classes.
Leon
EDIT:
This is how I declare the concerning global variable:
"globals.h"
extern "C"{
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h> // for va_start, va_end
#include <string.h>
#include <malloc.h>
#include <windows.h>
//#include <omp.h>
// Used to prevent redefinition, _HAUPT_ is only defined in .dll-Header
#ifdef _HAUPT_
/*#define _HAUPT_*/
#define TYPE
int extreme = 0;
#else
#define TYPE extern
TYPE int extreme;
#endif
}
However, while writing this edit I found out that the problem was on my side. I had made a mistake in a self-written LinkedList: the error occurred when I tried to assign a value through an unallocated pointer.
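The list code itself was not posted, so the following is only a sketch of that class of bug with a hypothetical Node type and hypothetical helper names. Writing through a pointer that was declared but never given storage is undefined behavior; the node has to be allocated before its fields are assigned:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list node. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* The broken pattern would be:
 *     Node *n;           // no storage behind the pointer
 *     n->value = value;  // writes to a random address
 * The version below allocates the node first.
 * Returns 0 on success, -1 if the allocation failed. */
static int list_push(Node **head, int value)
{
    Node *n;

    assert(head != NULL);
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/* Release every node exactly once. */
static void list_free(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

This kind of corruption often does not crash at the faulty statement, which would fit the symptom of the call failing only after a fixed number of invocations.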

I agree with Taus:
AccessViolationExceptions are most often caused by accessing
freed/unallocated memory or similar. Admin rights have nothing to do
with it. A minimal example would go a long way towards identifying the
problem
The marshaler will generally try to create a managed (C#) representation of that data and then attempt to deallocate/free the unmanaged data (C++). However, you're probably not allocating memory in C/C++ in the way that the marshaler is expecting, or maybe you were not expecting the marshaler to try to free the memory for you.
If the default deallocation behaviour doesn't match your use case then you can handle the deallocation (if at all) of the C++ object manually by using an IntPtr in your C# code. For example, if you're returning a string literal from unmanaged code, then the memory should not be deallocated. See this post for an example.
If you post the code snippet showing how you're allocating memory in your unmanaged application, how you're exposing the data, and finally, how you're accessing it in your C# application, we can help you pinpoint the problem.

You cannot get a useful answer to a question like this. Access violations are the standard way in which C code fails, and there are numerous ways to invoke undefined behavior in that language, in particular the kind that corrupts memory; such corruption eventually crashes your code. The exact place where the code fails is very rarely close to the statement with the bug, and it can take a while before the corruption has an effect. Spending a week or more to find the bug is not unusual at all.
Basic ways to go about it:
Write unit tests that exercise the C code; this helps to narrow down the number of code execution paths that cause the corruption.
Be sure to build the C code with all debug features turned on, enabled by default for the Debug build in the MSVC compiler. You need all the help you can get from /MDd and /RTC.
Use Application Verifier, a tool that can detect heap corruption and show you what statement caused it. Best used on a unit test that fails.
Contemplate if the C language actually is useful to you to get the job done. You can squeeze an extra ~25% perf out of native code over managed code, give or take, but the price is rather a high one if you can't make it work reliably or lose a month of your life. Hardware is a lot cheaper than your hours.

C# dllimport'ing complex datatypes across platforms?

So I'm writing a wrapper in C# for a C dll. The problem is several of the functions use complex datatypes e.g.:
ComplexType* CreateComplexType(int a, int b);
Is there a way I can declare a valid C# type such that I can use dllimport?
If I were doing a Windows-only solution I'd probably use C++/CLI as a go-between the native complex type and a managed complex type.
I do have access to the source code of the C dll, so would it be possible to instead use an opaque type (e.g. handles)?

Such a function is difficult to call reliably from a C program, it doesn't get better when you pinvoke it. The issue is memory management, that struct needs to be destroyed again. Which requires the calling program to use the exact same memory allocator as the DLL. This rarely turns out well in a C program but you might be lucky that you have the source code for the DLL so you can recompile it and ensure that everybody is using the same shared CRT version.
There is no such luck from C# of course, the pinvoke marshaller will call CoTaskMemFree() to release the struct. Few real C programs use CoTaskMemAlloc() to allocate the struct so that's a silent failure on XP, an AccessViolationException on Vista and higher. Modern Windows versions have a much stricter heap manager that doesn't ignore invalid pointers.
You can declare the return value as IntPtr, that stops the pinvoke marshaller from trying to destroy it. And then manually marshal with Marshal.PtrToStructure(). This doesn't otherwise stop the memory leak, your program will eventually crash with OOM. Usually anyway.
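The opaque-handle idea from the question would address the leak. A rough sketch of the DLL side, assuming you can add exports to it: the struct layout stays private, and the DLL exports its own destroy function, so the memory is released by the same allocator that created it. DestroyComplexType, ComplexType_GetSum and the two fields are hypothetical; only CreateComplexType appears in the question.

```c
#include <assert.h>
#include <stdlib.h>

/* The public header would expose only this forward declaration,
   so callers see an opaque handle. */
typedef struct ComplexType ComplexType;

/* Private definition inside the DLL; the fields are made up for illustration. */
struct ComplexType {
    int a;
    int b;
};

/* Allocates with the DLL's own allocator. Returns NULL on failure. */
ComplexType *CreateComplexType(int a, int b)
{
    ComplexType *ct = malloc(sizeof *ct);
    if (ct != NULL) {
        ct->a = a;
        ct->b = b;
    }
    return ct;
}

/* Hypothetical companion export: frees with the same allocator. */
void DestroyComplexType(ComplexType *ct)
{
    free(ct);
}

/* Hypothetical accessor: callers never touch the fields directly. */
int ComplexType_GetSum(const ComplexType *ct)
{
    assert(ct != NULL);
    return ct->a + ct->b;
}
```

On the C# side the handle would be declared as IntPtr in every signature and passed back to DestroyComplexType when it is no longer needed, so the marshaller never tries to free or copy the struct itself.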

Mono has a good documentation page on using P/Invoke in Windows vs. Linux. Specifically, see the section on marshaling, that discusses simple vs. complex types. If you want to get creative, you could serialize your type to some convenient string-based format like JSON or XML and use that as your marshaling mechanism.

What is unsafe code in C# and why would you use it? [duplicate]

I read this question today about safe and unsafe code, then read about it on MSDN, but I still don't understand it. Why would you want to use pointers in C#? Is this purely for speed?

There are three reasons to use unsafe code:
APIs (as noted by John)
Getting actual memory address of data (e.g. access memory-mapped hardware)
Most efficient way to access and modify data (time-critical performance requirements)

Sometimes you'll need pointers to interface your C# to the underlying operating system or other native code. You're strongly discouraged from doing so, as it is "unsafe" (natch).
There will be some very rare occasions where your performance is so CPU-bound that you need that minuscule extra bit of performance. My recommendation would be to write those CPU-intensive pieces in a separate module in assembler or C/C++, export an API, and have your .NET code call that API. A possible additional benefit is that you can put platform-specific code in the unmanaged module, and leave the .NET code platform-agnostic.

I tend to avoid it, but there are some times when it is very helpful:
for performance working with raw buffers (graphics, etc)
needed for some unmanaged APIs (also pretty rare for me)
for cheating with data
As an example of the last: I maintain some serialization code. Writing a float to a stream without having to use BitConverter.GetBytes (which creates an array each time) is painful, but I can cheat:
float f = ...;
int i = *(int*)&f;
Now I can use shift (>>) etc to write i much more easily than writing f would be (the bytes will be identical to if I had called BitConverter.GetBytes, plus I now control the endianness by how I choose to use shift).
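For comparison, here is roughly what the same cheat looks like on the native side, as a C sketch. It assumes float is a 32-bit IEEE 754 value, and write_float_le is a hypothetical name. memcpy is used instead of a pointer cast because the cast form is not well defined in C:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write the bit pattern of f into out as 4 bytes, least significant first.
   Assumes float is a 32-bit IEEE 754 value. */
static void write_float_le(float f, unsigned char out[4])
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits); /* reinterpret the bits, no conversion */
    out[0] = (unsigned char)(bits & 0xFF);
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)((bits >> 16) & 0xFF);
    out[3] = (unsigned char)((bits >> 24) & 0xFF);
}
```

Because the byte order comes from the shifts rather than from the machine, the output is the same on little-endian and big-endian hosts.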

There is at least one managed .NET API that often makes using pointers unavoidable. See SecureString and Marshal.SecureStringToGlobalAllocUnicode.
The only way to get the plain text value of a SecureString is to use one of the Marshal methods to copy it to unmanaged memory.

Marshaling – what is it and why do we need it?

What is marshalling and why do we need it?
I find it hard to believe that I cannot send an int over the wire from C# to C and have to marshall it. Why can't C# just send the 32 bits over with a starting and terminating signal, telling C code that it has received an int?
If there are any good tutorials or sites about why we need marshalling and how to use it, that would be great.

Because different languages and environments have different calling conventions, different layout conventions, different sizes of primitives (cf. char in C# and char in C), different object creation/destruction conventions, and different design guidelines. You need a way to get the stuff out of managed land and into somewhere where unmanaged land can see and understand it, and vice versa. That's what marshalling is for.

.NET code (C#, VB) is called "managed" because it is "managed" by the CLR (Common Language Runtime).
If you write code in C, C++ or assembler, it is called "unmanaged", since no CLR is involved. You are responsible for all memory allocation and deallocation.
Marshaling is the process of converting data as it crosses between managed code and unmanaged code. It is one of the most important services offered by the CLR.

Marshalling an int is ideally just what you said: copying the memory from the CLR's managed stack into someplace where the C code can see it. Marshalling strings, objects, arrays, and other types is the difficult part.
But the P/Invoke interop layer takes care of almost all of these things for you.

As Vinko says in the comments, you can pass primitive types without any special marshalling. These are called "blittable" types and include types like byte, short, int, long, etc and their unsigned counterparts.
This page contains the list of blittable and non-blittable types.
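To make the difference concrete, here is a small sketch of what the native side sees. Both function names are hypothetical. A C# int is a 32-bit signed integer, so it can be handed to a C function taking int32_t as-is. A C# char is a 16-bit UTF-16 code unit, so a C# string does not have the layout of a C char * string, and something has to convert or reinterpret it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blittable case: a C# int and a C int32_t have the same size and layout,
   so the argument needs no conversion on the way in or out. */
int32_t add_i32(int32_t a, int32_t b)
{
    return a + b;
}

/* Non-blittable case: C# string data is made of 16-bit code units.
   This counts the code units before the terminating zero, which is
   what strlen() does for 8-bit char strings. */
size_t utf16_length(const uint16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

Passing the 16-bit data to a function that expects char * would stop at the first zero byte, which for plain ASCII text is already the second byte of the first character on a little-endian machine.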

Marshalling is a "medium", for want of a better word, or a gateway: it translates data to and from the unmanaged world's data types when you use P/Invoke, and ensures the data is returned in a safe manner.

Marshalling is passing the signature of a function to a different process, possibly on a different machine. It is usually implemented by converting structured data to a dedicated format that can be transferred to other processor systems (serialization / deserialization).
