Convert char array with zero-char element to C# string - c#

When I receive data from unmanaged code written in C (WinAPI), the API asks me to reserve a number of bytes and pass it a handle (pointer) to the buffer for the string.
Using Marshal.AllocHGlobal(150) did it.
In return, I received a number of chars terminated by '\0', C style.
When I build a string from this char array using new string(charBuff), it doesn't cut the string at the '\0'.
Well, I could use Substring + IndexOf, but is there a more elegant way to cut it using some existing method?

OK, I found it after waking up.
It's
Marshal.PtrToStringUni(IntPtr)
string MyStringFromWinAPI()
{
string result;
IntPtr strPtr = Marshal.AllocHGlobal(500);
// Any API that fills a caller-supplied buffer with a string value would go here
SendMessage(camHwnd, WM_CAP_DRIVER_GET_NAME_UNICODE, 500, strPtr);
// From here you could go two ways.
// The first one is long and boring (note: 500 bytes hold only 250 UTF-16 chars):
//   char[] charBuff = new char[250];
//   Marshal.Copy(strPtr, charBuff, 0, 250);
//   Marshal.FreeHGlobal(strPtr);
//   result = new string(charBuff);
//   result = result.Substring(0, result.IndexOf('\0'));
//   return result;
// The more elegant way:
result = Marshal.PtrToStringUni(strPtr);
Marshal.FreeHGlobal(strPtr);
return result;
}
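If the data is already in a managed char[] and there is no unmanaged pointer to hand to Marshal.PtrToStringUni, the Substring + IndexOf idea can be wrapped in a small helper. This is only a sketch; the class and method names are made up for illustration.

```csharp
using System;

public static class CStringHelper
{
    // Builds a string from a char buffer, stopping before the first '\0'.
    // If the buffer contains no terminator, the whole buffer is used.
    public static string FromNullTerminated(char[] buffer)
    {
        int end = Array.IndexOf(buffer, '\0');
        if (end < 0)
        {
            end = buffer.Length;
        }
        return new string(buffer, 0, end);
    }
}
```

Unlike Substring(0, IndexOf('\0')), this does not throw when the terminator is missing.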

Related

Convert string to ASCII without exceptions (like TryParse)

I am implementing a TryParse() method for an ASCII string class. The method takes a string and converts it to a C-style string (i.e. a null-terminated ASCII string).
I had been using only a Parse(), doing the conversion to ASCII like this:
public static bool Parse(string s, out byte[] result)
{
result = null;
if (s == null || s.Length < 1)
return false;
byte[] d = new byte[s.Length + 1]; // Add space for null-terminator
System.Text.Encoding.ASCII.GetBytes(s).CopyTo(d, 0);
// GetBytes can throw exceptions
// (so can CopyTo() but I can replace that with a loop)
result = d;
return true;
}
However, as part of the idea of a TryParse is to remove the overhead of exceptions, and GetBytes() throws exceptions, I'm looking for a different method that does not do so.
Maybe there is a TryGetbytes()-like method?
Or maybe we can reason about the expected format of a standard .NET string and perform the conversion arithmetically (I'm not overly familiar with UTF encodings)?
EDIT: I guess for non-ASCII chars in the string, the TryParse() method should return false
EDIT: I expect when I get around to implementing the ToString() method for this class I may need to do the reverse there.
Two options:
You could just ignore Encoding entirely, and write the loop yourself:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
if (c > 127)
{
return false;
}
buffer[i] = (byte) c;
}
result = buffer;
return true;
}
That's simple, but may be slightly slower than using Encoding.GetBytes.
The second option would be to use a custom EncoderFallback:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
var fallback = new CustomFallback();
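// Encoding instances are read-only unless cloned, so pass the fallback to GetEncoding instead of setting the property
var encoding = Encoding.GetEncoding("us-ascii", fallback, DecoderFallback.ExceptionFallback);
byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator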
// Use overload of Encoding.GetBytes that writes straight into the buffer
encoding.GetBytes(s, 0, s.Length, buffer, 0);
if (fallback.HadErrors)
{
return false;
}
result = buffer;
return true;
}
That would require writing CustomFallback though - it would need to basically keep track of whether it had ever been asked to handle invalid input.
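One way such a CustomFallback might look is sketched below. The answer does not spell out an implementation, so the nested TrackingBuffer class is an assumption; only the HadErrors property name comes from the code above. The fallback substitutes nothing and simply remembers that it was called.

```csharp
using System.Text;

// Hypothetical fallback: records that an unencodable character was seen.
public sealed class CustomFallback : EncoderFallback
{
    public bool HadErrors { get; private set; }

    // This fallback never emits replacement characters.
    public override int MaxCharCount => 0;

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new TrackingBuffer(this);
    }

    private sealed class TrackingBuffer : EncoderFallbackBuffer
    {
        private readonly CustomFallback owner;

        public TrackingBuffer(CustomFallback owner)
        {
            this.owner = owner;
        }

        public override int Remaining => 0;

        // Called for a single unencodable char.
        public override bool Fallback(char charUnknown, int index)
        {
            owner.HadErrors = true;
            return false; // nothing to substitute
        }

        // Called for an unencodable surrogate pair.
        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            owner.HadErrors = true;
            return false;
        }

        // '\0' signals that there are no replacement chars to read.
        public override char GetNextChar() => '\0';

        public override bool MovePrevious() => false;
    }
}
```

An instance can be attached with Encoding.GetEncoding("us-ascii", fallback, DecoderFallback.ExceptionFallback). A fresh fallback should be created for every call, since HadErrors is never reset.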
If you didn't mind an encoding processing the data twice, you could call Encoding.GetByteCount with a UTF-8-based encoding with a replacement fallback (with a non-ASCII replacement character), and check whether that returns the same number of bytes as the number of chars in the string. If it does, call Encoding.ASCII.GetBytes.
Personally I'd go for the first option unless you have reason to believe it's too slow.
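The double-pass idea mentioned above could be sketched as follows. It relies on the fact that UTF-8 uses exactly one byte for each ASCII character and at least two bytes for anything else (an unpaired surrogate is replaced by U+FFFD, which takes three bytes). The class and method names are illustrative, not from the original answer.

```csharp
using System.Text;

public static class AsciiCheck
{
    // True when every char of s is in the ASCII range (0..127).
    public static bool IsAscii(string s)
    {
        return Encoding.UTF8.GetByteCount(s) == s.Length;
    }

    // Converts s to a null-terminated ASCII byte array without throwing.
    public static bool TryParse(string s, out byte[] result)
    {
        result = null;
        if (s == null || !IsAscii(s))
        {
            return false;
        }
        byte[] buffer = new byte[s.Length + 1]; // last byte stays 0 as terminator
        Encoding.ASCII.GetBytes(s, 0, s.Length, buffer, 0);
        result = buffer;
        return true;
    }
}
```

Unlike the versions above, this sketch accepts an empty string and returns a single terminator byte for it.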
There are two possible exceptions that Encoding.GetBytes might throw according to the documentation.
ArgumentNullException is easily avoided. Do a null check on your input and you can ensure this is never thrown.
EncoderFallbackException needs a bit more investigation... Reading the documentation:
A fallback strategy determines how an encoder handles invalid characters or how a decoder handles invalid bytes.
And if we look in the documentation for ASCII encoding, we see this:
It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
That means it doesn't use the Exception Fallback and thus will never throw an EncoderFallbackException.
So in summary if you are using ASCII encoding and ensure you don't pass in a null string then you will never have an exception thrown by the call to GetBytes.
The GetBytes method is throwing an exception because your Encoding.EncoderFallback specifies that it should throw an exception.
Create an encoding object with an EncoderReplacementFallback to avoid exceptions on unencodable characters.
Encoding encodingWithFallback = Encoding.GetEncoding("us-ascii", EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);
encodingWithFallback.GetBytes("Hɘ££o wor£d!");
This way imitates the TryParse methods of the primitive .NET value types:
bool TryEncodingToASCII(string s, out byte[] result)
{
if (s == null || Regex.IsMatch(s, @"[^\x00-\x7F]")) // If a single non-ASCII character is found, return false.
{
result = null;
return false;
}
result = Encoding.ASCII.GetBytes(s); // Convert the string to ASCII bytes.
return true;
}

How do I find strings inside a memory dumped byte array converted to UTF8 encoded string?

I'm working on a video game cheat engine which uses simple memory manipulation to achieve its goal. I have successfully written a piece of code that dumps a process's memory into byte[] buffers and iterates over these arrays in search of the desired string. The code that searches is this:
public bool FindString(byte[] bytes, string pName, long offset)
{
string s = System.Text.Encoding.UTF8.GetString(bytes);
var match = Regex.Match(s, "test");
if (match.Success)
return true;
return false;
}
I then open a 32-bit version of Notepad (since that is what my dumping method is written for), type the word "test" in it, and run my program in debug mode to see whether the condition is ever hit. It is not.
Upon further inspection, I checked the contents of the string 's' on one of the iterations. It is this:
\0\0\0\0\0\0\0\0���\f\0\u0001����\u0001\0\0\0 \u0001�\0\0\0\0\0 \u0001�\0\0\0\0\0\0\0�\0\0\0\0\0\0\0�\0\0\0\0\0\u0010\0\0\0\0\0\0\0 \a�\0\0\0\0\0\0\0�\0\0\0\0\0\u000f\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\0�\0\0\0\0\0\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0����\f\0\0\0\0\0\0\0�\0\0����\0\0\0\0\0\0\u0010\0\0\0\0\0\0 \0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0�\0\0\0\0\0\0\0�����\u007f\0\0\u0002\0�\u0002\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\u001f\0\0\0\0\0\0\0��������\u0010\u0001�\0\0\0\0\0\u0010\u0001�\0\0\0\0\0\u0018\0�\0\0\0\0\0\u0018\0�\0\0\0\0\0\0\0\0\0\0\0\0\0�\u0002�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00\a�\0\0\0\0\00\a�\0\0\0\0\0�\u0002�\0\0\0\0\0�M�^\u000e\u000e_\u007f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`\a\0\0\0\0\0\0`\a\0\0\0\0\0\0\u0004\0\0\0\0\0\0\0\0�\u001f\0\0\0\0\0�\u001d\u0014)�\u007f\0\0����\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\a\0\u0002\0\0\0\0\0\0\0\0\0\0\0\0�\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0P\u0001�\0\0\0\0\0\0\u0003�\0\0\0\0\0\u0010\u0003�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�
I continued to check each pass-through of this method for the 's' variable and found that I could not see any strings in this format.
My question is simple. What am I doing wrong that I cannot find this string? The dumping is succeeding, but something to do with my method of parsing is causing me trouble.
UPDATE (code for dumping memory)
void ScanProcess(Process process)
{
// getting minimum & maximum address
var sys_info = new SYSTEM_INFO();
GetSystemInfo(out sys_info);
var proc_min_address = sys_info.minimumApplicationAddress;
var proc_max_address = sys_info.maximumApplicationAddress;
var proc_min_address_l = (long)proc_min_address;
var proc_max_address_l = (long)proc_max_address;
//Opening the process with desired access level
var processHandle = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_WM_READ, false, process.Id);
var mem_basic_info = new MEMORY_BASIC_INFORMATION();
var bytesRead = 0; // number of bytes read with ReadProcessMemory
while (proc_min_address_l < proc_max_address_l)
{
VirtualQueryEx(processHandle, proc_min_address, out mem_basic_info, 28); //28 = sizeof(MEMORY_BASIC_INFORMATION)
//If this memory chunk is accessible
if (mem_basic_info.Protect == PAGE_READWRITE && mem_basic_info.State == MEM_COMMIT)
{
//Read everything into a buffer
byte[] buffer = new byte[mem_basic_info.RegionSize];
ReadProcessMemory((int)processHandle, mem_basic_info.BaseAddress, buffer, mem_basic_info.RegionSize, ref bytesRead);
var memScanner = new MemScan();
memScanner.FindString(buffer, process.ProcessName, proc_max_address_l);
}
// move to the next memory chunk
proc_min_address_l += mem_basic_info.RegionSize;
proc_min_address = new IntPtr(proc_min_address_l);
if (mem_basic_info.RegionSize == 0)
{
break;
mem_basic_info.RegionSize = 4096;
}
}
}
For starters, you can't use Notepad (or any other tool that isn't binary-capable) to look at your bytes.
You need to use the BitConverter APIs:
https://msdn.microsoft.com/en-us/library/system.bitconverter(v=vs.110).aspx
...to walk the data and compose/search it to find what you're looking for (keeping in mind whatever encoding the data was stored in).
BTW, here's a useful hex editor: http://www.hexworkshop.com/
I don't know what MemScan.FindString() does, but I guess the problem is that you are searching a string for a string, rather than searching a byte array for a byte array.
By transforming the memory contents with System.Text.Encoding.UTF8.GetString(bytes), you assume that everything stored in memory can be interpreted as valid UTF-8.
Your FindString() should take its parameters as byte[] rather than string, and you need to figure out how the process name is stored in memory (most likely UTF-16).
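A byte-level search along those lines might look like the sketch below. It assumes the target text is stored as UTF-16LE (which is what Encoding.Unicode produces); the class and method names are made up for illustration, and the scan is a naive one rather than anything optimized.

```csharp
using System.Text;

public static class ByteSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0)
        {
            return 0;
        }
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j])
            {
                j++;
            }
            if (j == pattern.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Assumption: the text sits in memory as UTF-16LE.
    public static bool ContainsUtf16(byte[] data, string text)
    {
        return IndexOf(data, Encoding.Unicode.GetBytes(text)) >= 0;
    }
}
```

Because the dump is never decoded, arbitrary non-text bytes around the match cannot corrupt the search. A string that straddles two memory regions would still be missed when each region is scanned separately.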

Convert a string into BASE62

I'm looking for the c# code to convert a string into BASE62, like this:
http://www.molengo.com/base62/title/base62-encoder-decoder
I need those encode and decode-methods for URL-Encoding.
Background on BINARY to TEXT Encoding schemes:
https://en.wikipedia.org/wiki/Base62
https://en.wikipedia.org/wiki/Base64
Good explanation of the BASE62 encoding scheme:
https://www.codeproject.com/Articles/1076295/Base-Encode
Try one of the C# libraries listed below, which add extension methods for converting a byte array to and from BASE62 (a binary-to-text encoding scheme).
Plenty of base62 libraries on github, have a look:
https://github.com/JoyMoe/Base62.Net
https://github.com/ghost1face/base62
https://github.com/rossdempster/base62csharp
https://github.com/renmengye/base62-csharp (claims below that it doesn't work...raise any issues with them)
If your source data is contained in a string, you first need to convert that string to a suitable byte array.
Be careful to use the correct string-to-byte conversion: you may want the bytes to be the ASCII characters, the Unicode byte stream, and so on, e.g. Encoding.Unicode.GetBytes(text) or System.Text.Encoding.ASCII.GetBytes(text).
byte[] bytestoencode = .....
string encodedasBASE62 = bytestoencode.ToBase62();
.....
byte[] bytesdecoded = encodedasBASE62.FromBase62();
You can do this for any base, this way:
static string ToBase62(ulong number)
{
var alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
var n = number;
ulong basis = 62;
var ret = "";
if (n == 0) return "0"; // zero would otherwise produce an empty string
while (n > 0)
{
ulong temp = n % basis;
ret = alphabet[(int)temp] + ret;
n = (n / basis);
}
return ret;
}
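The reverse direction for the same alphabet could be sketched like this. The decoder is not part of the original answer, and the names are illustrative; it accumulates digits from left to right and rejects any character outside the alphabet.

```csharp
using System;

public static class Base62
{
    // Same digit order as the encoder above: 0-9, a-z, A-Z.
    private const string Alphabet =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Decodes a base-62 number; throws FormatException on foreign characters
    // and OverflowException if the value does not fit in a ulong.
    public static ulong FromBase62(string text)
    {
        ulong result = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0)
            {
                throw new FormatException("Not a base-62 digit: " + c);
            }
            result = checked(result * 62 + (ulong)digit);
        }
        return result;
    }
}
```

Note that this converts numbers only. Encoding an arbitrary string still means turning it into bytes first and treating those bytes as one large number or as chunks, which is what the libraries listed earlier do.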
Not the real answer, but hopefully this helps you build a C# version of it:
Javascript Base62 Encode/Decode:
http://x443.wordpress.com/2012/03/18/javascript-base62-encode-decode/

Binary data conversion to string

I have a chunk of binary data which contains structures with offsets and then strings;
in C++ it is easy:
struct foo
{
int offset;
char * s;
};
void * data;
... data is read and set
foo * header = (foo*) data;
header->s = (char*)data + header->offset;
int len = strlen(header->s);
char* ns = new char[len+1];
strcpy(ns,header->s);
simple enough...
in C# how would you do this?
The biggest problem is that I don't know the length of the string. It is null terminated.
I have the data in a byte[] and an IntPtr to the memory, but I need a pointer to that data as a string (char *), something that I can get the length of the string from.
C# is a high level language, and working with pointers is simply unnatural for this language.
To read the integer offset from the byte array, you can use the BitConverter class:
BitConverter.ToInt32(byte_array, start_index);
To convert the bytes that follow into a string, you can use the StringBuilder class:
StringBuilder str = new StringBuilder();
// i = starting index of the text
for (int i = 3; i < byte_array.Length; i++)
str.Append((char)byte_array[i]); // cast, otherwise the numeric value is appended
return str.ToString();
If there is more data after the string, add byte_array[i] != 0 to the loop's stopping condition; when the loop stops, byte_array[i] is the string terminator. Save the value of i, and you can read the data after it.
Another method of doing this is to use the ASCIIEncoding.ASCII.GetString() method:
ASCIIEncoding.ASCII.GetString(byte_array, start_index, bytes_count);
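Since the string length is not known in advance, the byte count for GetString has to be found by locating the terminator first. A minimal sketch, assuming single-byte ASCII text and a hypothetical helper name:

```csharp
using System;
using System.Text;

public static class BinaryStrings
{
    // Reads the ASCII string that starts at offset and ends before the
    // first 0 byte (or at the end of the array if no terminator exists).
    public static string ReadCString(byte[] data, int offset)
    {
        int end = Array.IndexOf(data, (byte)0, offset);
        if (end < 0)
        {
            end = data.Length;
        }
        return Encoding.ASCII.GetString(data, offset, end - offset);
    }
}
```

The offset itself would come from the header, for example via BitConverter.ToInt32(data, 0), which reads the value in the machine's byte order.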

How to display text being held in a int variable?

My variable holds some text but is currently stored as an int (the class used reads the bytes at a memory address and converts them to int). Variable.ToString just displays the decimal representation and doesn't decode it to readable text. In other words, I would now like to convert the data from int to string with ASCII encoding or something similar.
Here is a demo (based on our Q+A above).
Note: this sets up a string with a null terminator as a test, encodes it into ASCII bytes, and then, using unsafe code (you will need to allow that under Build options in the project properties), iterates through each byte and converts it until 0x0 is reached.
private void button1_Click(object sender, EventArgs e)
{
var ok = "OK" + (char)0;
var ascii = Encoding.ASCII;
var bin = ascii.GetBytes( ok );
var sb = new StringBuilder();
unsafe
{
fixed (byte* p = bin)
{
byte b = 1;
var i = 0;
while (b != 0)
{
b = p[i];
if (b != 0) sb.Append( ascii.GetString( new[] {b} ) );
i++;
}
}
}
Console.WriteLine(sb);
}
Note the fixed statement: it is required because managed strings, arrays, etc. are not guaranteed to stay at a fixed location in memory. The statement pins them for the duration of that block.
assuming an int variable
int x=10;
you can convert this into string as
string strX = x.ToString();
Try this
string s = "9quali52ty3";
byte[] ASCIIValues = Encoding.ASCII.GetBytes(s);
foreach(byte b in ASCIIValues) {
Console.WriteLine(b);
}
Int32.ToString() has an overload that takes a format string. Take a look at the available format strings and use one of those.
Judging by your previous question, the int you have is (probably) a pointer to the string. Depending on whether the data at the pointer is chars or bytes, do one of these to get your string:
var s = new string((char*)myInt);
var s = new string((sbyte*)myInt);
OK. If your variable is a pointer, then Tim is pointing you in the right direction (assuming it is an address and not an offset from an address, in which case you will need the start address to offset from).
If, on the other hand, your variable contains four encoded ASCII characters (one byte each), then you need to split it into bytes and convert each byte to a character. Something like this: Console.WriteLine(TypeDescriptor.GetConverter(myUint).ConvertTo(myUint, typeof(string))); (from the MSDN ByteConverter documentation).
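For the "four ASCII characters packed into an int" case, splitting into bytes and decoding could be sketched as below. The helper name is made up, and the character order follows the machine's byte order because BitConverter.GetBytes is used.

```csharp
using System;
using System.Text;

public static class PackedText
{
    // Interprets the four bytes of value as ASCII characters.
    // Trailing zero bytes are treated as padding and dropped.
    public static string FromInt32(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value); // machine byte order
        return Encoding.ASCII.GetString(bytes).TrimEnd('\0');
    }
}
```

If the text in memory is longer than four characters, the int only holds its first part, and reading the bytes directly from the address is the better route.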
