Convert single byte character string (half width) to double byte (full width) - c#

Recently I came across this code in a C# application.
cDataString = Strings.StrConv(cDataString, VbStrConv.Wide);
I understand that the StrConv is a string function of VB. You can call it by including 'using Microsoft.VisualBasic;'.
It is supposed to convert half-width Japanese characters into full-width ones.
My question is:
Is there a way to achieve the same WITHOUT using the VB functions and WITHOUT including the VB headers, using only the standard C# functions? I know there are many C# string conversion functions, and some of them can convert from Unicode to ANSI and vice versa and so on. But I am not sure whether any of those will give exactly the same result as the VB one above. So, can this be done in C#?
Thank you for your time and efforts.
Update:
I came across this question that was asked 5 years ago. The answers and discussions do show some ways in which it could be done. What I would specifically like to know is that, after 5 years and new versions and what nots, is there a simpler and better way to do this in .NET without depending on VB functions or VB libraries?

There is no equivalent function in C#.
If you follow the source code for Microsoft.VisualBasic.dll's StrConv, you'll see it actually p/invokes LCMapString internally similar to the answer you linked.
If you don't want to reference Microsoft.VisualBasic.dll, you could wrap the p/invoke into a helper class or service written in C#, something like this...
// NOTE: CODE NOT TESTED
// Adapted from John Estropia's StackOverflow answer
// https://stackoverflow.com/questions/6434377/converting-zenkaku-characters-to-hankaku-and-vice-versa-in-c-sharp
using System.Runtime.InteropServices;
using System.Text;

public static class StringWidthHelper
{
    private const uint LOCALE_SYSTEM_DEFAULT = 0x0800;
    private const uint LCMAP_HALFWIDTH = 0x00400000;
    private const uint LCMAP_FULLWIDTH = 0x00800000;

    public static string ToHalfWidth(string fullWidth)
    {
        return Map(fullWidth, LCMAP_HALFWIDTH);
    }

    public static string ToFullWidth(string halfWidth)
    {
        return Map(halfWidth, LCMAP_FULLWIDTH);
    }

    private static string Map(string source, uint flags)
    {
        // With cchDest = 0 the call only returns the required size (terminator included),
        // so the buffer is no longer capped at a fixed 256 characters.
        int size = LCMapString(LOCALE_SYSTEM_DEFAULT, flags, source, -1, null, 0);
        if (size <= 0) return source;
        StringBuilder sb = new StringBuilder(size);
        LCMapString(LOCALE_SYSTEM_DEFAULT, flags, source, -1, sb, sb.Capacity);
        return sb.ToString();
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    private static extern int LCMapString(uint Locale, uint dwMapFlags, string lpSrcStr, int cchSrc, StringBuilder lpDestStr, int cchDest);
}
Otherwise, you could build a Dictionary to act as a look-up table.
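A minimal sketch of the look-up table idea, assuming only the space and printable ASCII need widening. The names `WidthTable` and `ToFullWidth` are made up for illustration. It relies on the Unicode layout in which the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII 0x21 to 0x7E, and it leaves half-width katakana untouched, so it is not a complete replacement for StrConv.

```csharp
using System.Collections.Generic;
using System.Text;

public static class WidthTable
{
    // Hypothetical table: half-width ASCII -> full-width forms.
    private static readonly Dictionary<char, char> HalfToFull = BuildTable();

    private static Dictionary<char, char> BuildTable()
    {
        var table = new Dictionary<char, char>();
        table[' '] = '\u3000'; // the space maps to the ideographic space
        for (char c = '!'; c <= '~'; c++)
        {
            // U+0021..U+007E map to U+FF01..U+FF5E at a fixed offset.
            table[c] = (char)(c + 0xFEE0);
        }
        return table;
    }

    public static string ToFullWidth(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char wide;
            // Characters without a table entry are copied through unchanged.
            sb.Append(HalfToFull.TryGetValue(c, out wide) ? wide : c);
        }
        return sb.ToString();
    }
}
```

For example, `WidthTable.ToFullWidth("A1 b")` returns the four characters U+FF21, U+FF11, U+3000, U+FF42. Extending the table to katakana would need explicit entries, because voiced half-width kana occupy two characters.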

Not a generic solution, but in my particular case (half-width Japanese katakana ラーメン to full-width katakana ラーメン), String#Normalize with the NFKC option did the job.
Note that this method is not entirely compatible with the VB one (e.g. it converts full-width numbers 42 to half-width numbers 42), so you need to select the characters to replace, like:
// Half-width katakana to Full-width katakana
Regex halfKatakana = new Regex(@"[\uFF61-\uFF9F]+");
cDataString = halfKatakana.Replace(cDataString, (m) => m.Value.Normalize(NormalizationForm.FormKC));

Related

Marshaling issue with char** while accessing a library function

I'm porting an existing library/DLL written in C++/Visual Studio to Code::Blocks/GCC. The DLL on Windows has been tested from C#, C, C++, Python, Delphi, Java, VB.NET, LabVIEW, etc. and works fine and stably.
However, when porting it to Linux, I'm having issues while testing it from Mono/C#, while it's working fine from FreePascal and Python.
The root of the issue is a function that detects some devices and returns an integer with the number of devices detected, and a list of the paths (array of ASCII strings of chars) where the devices are located, through parameters:
int DetectDevices(char ** DevicePaths);
The way I'm copying the results in the library is:
i=0;
for (vector<string>::iterator it=lstDetected.begin(); it!=lstDetected.end(); ++it)
strcpy(DevicePaths[i++], (*it).c_str());
In C#, I declare the external function using the following code:
[DllImport(LIBRARY_PATH)]
public static extern int DetectDevices([In, Out, MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.LPStr)] string[] DevicePaths);
I would like to note, that I'm actually reserving some memory space in C# before calling the function and getting the value returned:
string[] DevicePaths = new string[50];
for (int i=0; i<DevicePaths.Length; i++)
DevicePaths[i] = new string('\0', 255);
This is working fine in Windows/VisualStudio, but not in Linux/Mono.
Replacing LPStr with LPWStr and debugging shows that the characters do seem to arrive, but the character code received is 0 for every character with LPStr and 63 with LPWStr.
I think this could be a character-encoding issue, but I might be wrong.
Does anyone have any idea on what could be wrong here?
Help will be much appreciated!
I finally managed to find a solution to the Marshaling problem.
On Windows (.NET Framework) with Visual Studio, returning a C array of strings (an array of char arrays) through a parameter declared in the following manner is allowed:
[DllImport(LIBRARY_PATH)]
public static extern int DetectDevices([In, Out, MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.LPStr)] string[] DevicePaths);
For some reason this does not work on Linux/Mono, and I had to use the following declaration:
[DllImport(LIBRARY_PATH)]
public static extern int DetectDevices(IntPtr[] pDevicePaths);
and then, in the code retrieve each string using the following method:
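const int VCOUNT = 50;
const int MAXSTRINGSIZE = 255;
string[] MyValues = new string[VCOUNT];
IntPtr[] ptr = new IntPtr[VCOUNT];
// Allocate one unmanaged buffer per path for the library to fill.
for (int i = 0; i < ptr.Length; i++) ptr[i] = Marshal.AllocCoTaskMem(MAXSTRINGSIZE);
try
{
    int n = DetectDevices(ptr);
    // Copy each returned null-terminated ANSI string into a managed string.
    for (int i = 0; i < n && i < VCOUNT; i++)
    {
        MyValues[i] = Marshal.PtrToStringAnsi(ptr[i]);
    }
}
finally
{
    // Release the unmanaged buffers so they do not leak.
    for (int i = 0; i < ptr.Length; i++) Marshal.FreeCoTaskMem(ptr[i]);
}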
This is a more C/C++ style, which adds complexity but makes sense.
So I believe that either Mono is not fully implemented or there is a bug somewhere.
In case anyone has a better solution, I'll really appreciate it.
Try with LPTStr which will convert the string to the platform’s default string encoding. For Mono this is UTF-8.
UnmanagedType.LPStr => ansi
UnmanagedType.LPWStr => unicode
UnmanagedType.LPTStr => platform default
There are other UnmanagedType that could also help... BStr perhaps...?
If this does not help then consider using Custom Marshaling or Manual Marshaling.
The documentation is pretty good.
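As a rough illustration of the manual marshaling route, here is a sketch that can be run without the native library. `FakeNativeWrite` is a stand-in invented for this example; it only imitates a C function that copies a null-terminated ASCII string into a caller-owned buffer. The part that carries over to real code is the allocate / call / read / free pattern.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class ManualMarshalDemo
{
    // Hypothetical stand-in for the native side: writes text plus a
    // terminating zero byte into the caller's unmanaged buffer.
    private static void FakeNativeWrite(IntPtr buffer, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text + "\0");
        Marshal.Copy(bytes, 0, buffer, bytes.Length);
    }

    public static string RoundTrip(string text, int bufferSize)
    {
        if (text.Length + 1 > bufferSize)
            throw new ArgumentException("buffer too small for text and terminator");

        IntPtr buffer = Marshal.AllocCoTaskMem(bufferSize);
        try
        {
            FakeNativeWrite(buffer, text);
            // Reads bytes up to the first zero and decodes them to a managed string.
            return Marshal.PtrToStringAnsi(buffer);
        }
        finally
        {
            // Always release the unmanaged memory, even if the call throws.
            Marshal.FreeCoTaskMem(buffer);
        }
    }
}
```

With a real library, the `FakeNativeWrite` call would be replaced by the P/Invoke call that receives the `IntPtr`.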

C# application on Japanese Windows OS - Present Latin as Full-Width characters

I have a C# winform application, that is installed on a Japanese windows 7.
Some of the labels are rendered in a very wide font, so they do not fit the size of the form.
After some research I was told it might be a half/full width issue.
Is there any way to force all strings to be displayed as half-width?
For example, this part is not shown correctly:
modelSizeLabel.Text = String.Format("X:{0:0.0},Y:{1:0.0},Z:{2:0.0} [{3}]",
(Model.BBox.dx),
(Model.BBox.dy),
(Model.BBox.dz - Model.Sink),
uc.To.ToString() //units enum
);
Basically there are 2 approaches I know to deal with full-width letters:
1. Using String.Normalize() method
This approach uses standard Unicode normalization forms when converting full-width (zenkaku) to half-width (hankaku):
public static string ToHalfWidth(string fullWidth)
{
return fullWidth.Normalize(System.Text.NormalizationForm.FormKC);
}
NB: This is considered the simplest approach for converting letters, numbers and punctuation covered by ANSI encoding when typed with a Japanese IME, but I still don't know how it impacts kana/kanji letters.
2. Using P/Invoke to call LCMapString method in kernel32.dll
This approach calls the LCMapString API in kernel32.dll through P/Invoke, using the flags documented for the LCMapStringEx function (note that some flags are mutually exclusive; implementation credit to rshepp/John Estropia):
// edited from /a/40836235
private const uint LOCALE_SYSTEM_DEFAULT = 0x0800;
private const uint LCMAP_HALFWIDTH = 0x00400000;
[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
private static extern int LCMapString(uint Locale, uint dwMapFlags, string lpSrcStr, int cchSrc, StringBuilder lpDestStr, int cchDest);
// size is the output buffer capacity in characters; the default lets callers omit it.
public static string ToHalfWidth(string fullWidth, int size = 256)
{
    StringBuilder sb = new StringBuilder(size);
    LCMapString(LOCALE_SYSTEM_DEFAULT, LCMAP_HALFWIDTH, fullWidth, -1, sb, sb.Capacity);
    return sb.ToString();
}
Usage example:
// By default the Japanese IME converts vowels to their kana equivalents,
// so I used consonants except "n".
Label1.Text = ToHalfWidth("0123456789bcdfghjklmpqrstvwxyz");
Label2.Text = ToHalfWidth("0123456789bcdfghjklmpqrstvwxyz", 256);
PS: You can wrap both methods above in a helper/service class for usage across the same namespace.
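On the open question in the NB under approach 1, a small sketch can make the effect of NFKC visible. The class and method names here (`NfkcDemo`, `Nfkc`, `CodeUnits`) are invented for illustration; the only framework call involved is `String.Normalize` with `NormalizationForm.FormKC`.

```csharp
using System.Linq;
using System.Text;

public static class NfkcDemo
{
    // Applies the same NFKC normalization as approach 1.
    public static string Nfkc(string s)
    {
        return s.Normalize(NormalizationForm.FormKC);
    }

    // Renders each UTF-16 code unit as U+XXXX so width changes are visible.
    public static string CodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Under NFKC, full-width Latin letters and digits such as "\uFF21\uFF11" become plain "A1", while half-width katakana go the other way: "\uFF76\uFF9E" (half-width KA plus voiced mark) becomes the single full-width character U+30AC. Full-width katakana and ordinary kanji such as "\u6F22\u5B57" come back unchanged. So NFKC is not a pure "to half-width" conversion, which matters if the text mixes Latin and kana.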
Related issues:
Converting zenkaku characters to hankaku and vice-versa in C#
Convert single byte character string (half width) to double byte (full width)

C# marshalling char* to StringBuilder get always empty string

I am developing a C++ library and a C# application that should consume it.
The library takes two numeric input arguments and one string output parameter.
My problem is that in the C# application I always get an empty string for this parameter. Here is my code.
C++ side:
typedef struct sharedItem{
unsigned int tagId;
unsigned char tagValue[256];
}sharedItem;
extern "C" {
    int getSharedMemoryVariable(char* value, unsigned int variableTagId, int foundVariables)
    {
        sharedItem *item;
        // set item properly...
        strcpy(value, (char *)item->tagValue);
        // check result and return properly...
    }
}
C# side
[DllImport("C:\\SharedMemory.dll", CallingConvention=CallingConvention.Cdecl, CharSet=CharSet.Ansi)]
public static extern int getSharedMemoryVariable(StringBuilder variableValue, UInt16 variableTagId, Int16 foundVariables);
StringBuilder value = new StringBuilder(256);
res = SharedMemory.getSharedMemoryVariable(value, 45, 14730);
My problem is that variable value is always an empty string. Please note that, in C++ side, if I replace
strcpy(value, (char *)item->tagValue);
with
strcpy(value, "test");
the application works fine.
I hope somebody can help me.
Thank you
EDIT:
[DllImport] already pins parameters; and there's no need for unsafe code
Thanks @dan
Anyway, that can be fixed by doing a memset(item->tagValue, '\0', 256*sizeof(char));

PInvoke char* in C DLL handled as String in C#. Issue with null characters

The function in C DLL looks like this:
int my_Funct(char* input, char* output);
I must call this from C# app. I do this in the following way:
...DllImport stuff...
public static extern int my_Funct(string input, string output);
The input string is transmitted to the DLL perfectly (I have visible proof of that). The output that the function fills in, however, is wrong. It contains hex data, like:
3F-D9-00-01
But unfortunately everything after the two zeros is cut off, and only the first two bytes reach my C# app. I guess this happens because the zero byte is treated as a null character and taken as the end of the string.
Any idea how I could get around this? I tried to specify it as out IntPtr instead of a string, but I don't know what to do with it afterwards.
I tried to do after:
byte[] b1 = new byte[2];
Marshal.Copy(output,b1,0,2);
2 should be normally the length of the byte array. But I get all kind of errors: like "Requested range extends past the end of the array." or "Attempted to read or write protected memory..."
I appreciate any help.
Your marshalling of the output string is incorrect. Using string in the p/invoke declaration is appropriate when passing data from managed to native. But you cannot use that when the data flows in the other direction. Instead you need to use StringBuilder. Like this:
[DllImport(...)]
public static extern int my_Funct(string input, StringBuilder output);
Then allocate the memory for output:
StringBuilder output = new StringBuilder(256);
//256 is the capacity in characters - only you know how large a buffer is needed
And then you can call the function.
int retval = my_Funct(inputStr, output);
string outputStr = output.ToString();
On the other hand, if these parameters have null characters in them then you cannot marshal as string. That's because the marshaller won't marshal anything past the null. Instead you need to marshal it as a byte array.
[DllImport(...)]
public static extern int my_Funct(
    [In] byte[] input,
    [Out] byte[] output
);
That matches your C declaration.
Then assuming the ANSI encoding you convert the input string to a byte array like this:
byte[] input = Encoding.Default.GetBytes(inputString);
If you want to use a different encoding, it's obvious how to do so.
And for the output you do need to allocate the array. Assuming it's the same length as the input you would do this:
byte[] output = new byte[input.Length];
And somehow your C function has got to know the length of the arrays. I'll leave that bit to you!
Then you can call the function
int retval = my_Funct(input, output);
And then to convert the output array back to a C# string you use the Encoding class again.
string outputString = Encoding.Default.GetString(output);
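If the output is really binary data rather than text, as the hex dump in the question suggests, it may be simpler to skip the string conversion and keep working with the byte array. A minimal sketch, assuming the number of valid bytes is known from somewhere (for example the function's return value, which is only a guess here); `BinaryOutput` and `ToHex` are names made up for this example:

```csharp
using System;

public static class BinaryOutput
{
    // Formats the first 'length' bytes of the output buffer as hex pairs.
    // Zero bytes are ordinary data here and do not terminate anything.
    public static string ToHex(byte[] buffer, int length)
    {
        return BitConverter.ToString(buffer, 0, length);
    }
}
```

For a buffer starting with the bytes 0x3F, 0xD9, 0x00, 0x01 and a length of 4, `ToHex` returns "3F-D9-00-01", so nothing after the zero byte is lost.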

Multiple function calls from C# to C++ unmanaged code causes AccessViolationException

I have declared a DLL import in my C# program that looks like this:
[DllImport("C:\\c_keycode.dll", EntryPoint = "generateKeyCode",
CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr generateKeyCode(char[] serial, char[] option, char c_type);
It references the function generateKeyCode() inside of my DLL.
Here is the code that is causing an error (used breakpoints):
const char* generateKeyCode(char serial[],
char option[],
char c_type)
{
returnBufferString = "";
SHA1_CTX context;
int optionLength = 0;
#ifdef WIN32
unsigned char buffer[16384] = {0};
#else
unsigned char buffer[256] = {0};
#endif
//char output[80];
char keycode[OPTION_KEY_LENGTH+1] = "";
int digest_array_size = 10; //default value for digest array size
unsigned char digest[20] = {0};
char optx[24] = {0};
char c_type_upper;
// Combine serial # and Option or Version number
char str1[30] = {0};
int i;
int size = 0;
int pos = 0;
...
...
}
Basically, I imported this DLL so I could pass the function parameters and it could do its algorithm and simply return me a result. I used this marshaler function...
public static string genKeyCode_marshal(string serial, string option, char type)
{
return Marshal.PtrToStringAnsi(generateKeyCode(serial.ToCharArray(),
option.ToCharArray(), type));
}
...so I could make the call properly. Inside of my C++ header file, I have defined a string, as indicated is helpful in the answer to this question (it is the returnBufferString variable present at the top of the C/C++ function).
I make this function call several times as I use a NumericUpDown control to go from 1.0 to 9.9 in increments of 0.1 (each up or down accompanies another function call), and then back down again. However, every time I try to do this, the program hitches after a seemingly set number of function calls (stops at 1.9 on the way back down if I just go straight up and down, or earlier if I alternate up and down a bit).
Please note that it works and gives me the value I want, there are no discrepancies there.
I changed the buffer size to a smaller number (5012), and when I ran the program it threw the AccessViolationException on the first function call. However, doubling the buffer size (to 32768) made no difference compared to the original: going straight up to 9.9 from 1.0 and back down again, it stops at 1.9 and throws the exception.
EDIT: Default is ANSI, so it is ANSI. No problems there. Is this a memory allocation issue??
I would suggest trying the following:
[DllImport("C:\\c_keycode.dll", EntryPoint = "generateKeyCode",
CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
static extern IntPtr generateKeyCode(string serial, string option, char c_type);
Note the new CharSet field of DllImport attribute.
The next idea is to use the MarshalAs attribute explicitly:
[DllImport("C:\\c_keycode.dll", EntryPoint = "generateKeyCode",
CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Auto)]
static extern IntPtr generateKeyCode([MarshalAs(UnmanagedType.LPTStr)] string serial, [MarshalAs(UnmanagedType.LPTStr)] string option, char c_type);
I know this may be unsatisfactory, but once I removed the output redirection I was using to debug from within my C/C++ DLL, the problem stopped. Everything works now, so I guess that's essentially equivalent to answering my own question. Thanks to everyone for the replies.
