Difference in writing string vs. char array with System.IO.BinaryWriter

Difference in writing string vs. char array with System.IO.BinaryWriter - c#

I’m writing text to a binary file in C# and see a difference in quantity written between writing a string and a character array. I’m using System.IO.BinaryWriter and watching BinaryWriter.BaseStream.Length as the writes occur. These are my results:
using(BinaryWriter bw = new BinaryWriter(File.Open(“data.dat”), Encoding.ASCII))
{
string value = “Foo”;
// Writes 4 bytes
bw.Write(value);
// Writes 3 bytes
bw.Write(value.ToCharArray());
}
I don’t understand why the string overload writes 4 bytes when I’m writing only 3 ASCII characters. Can anyone explain this?

The documentation for BinaryWriter.Write(string) states that it writes a length-prefixed string to this stream. The overload for Write(char[]) has no such prefixing.
It would seem to me that the extra data is the length.
EDIT:
Just to be a bit more explicit, use Reflector. You will see that it has this piece of code in there as part of the Write(string) method:
this.Write7BitEncodedInt(byteCount);
It is a way to encode an integer using the least possible number of bytes. For short strings (that we would use day to day that are less than 128 characters), it can be represented using one byte. For longer strings, it starts to use more bytes.
Here is the code for that function just in case you are interested:
protected void Write7BitEncodedInt(int value)
{
uint num = (uint) value;
while (num >= 0x80)
{
this.Write((byte) (num | 0x80));
num = num >> 7;
}
this.Write((byte) num);
}
After prefixing the the length using this encoding, it writes the bytes for the characters in the desired encoding.

From the BinaryWriter.Write(string) docs:
Writes a length-prefixed string to this stream in the current encoding of the BinaryWriter, and advances the current position of the stream in accordance with the encoding used and the specific characters being written to the stream.
This behavior is probably so that when reading the file back in using a BinaryReader the string can be identified. (e.g. 3Foo3Bar6Foobar can be parsed into the string "Foo", "Bar" and "Foobar" but FooBarFoobar could not be.) In fact, BinaryReader.ReadString uses exactly this information to read a string from a binary file.
From the BinaryWriter.Write(char[]) docs:
Writes a character array to the current stream and advances the current position of the stream in accordance with the Encoding used and the specific characters being written to the stream.
It is hard to overstate how comprehensive and useful the docs on MSDN are. Always check them first.

As already stated, BinaryWriter.Write(String) writes the length of the string to the stream, before writing the string itself.
This allows the BinaryReader.ReadString() to know how long the string is.
using (BinaryReader br = new BinaryReader(File.OpenRead("data.dat")))
{
string foo1 = br.ReadString();
char[] foo2 = br.ReadChars(3);
}

Did you look at what was actually written? I'd guess a null terminator.

Related

How do I properly emit binary data from a SecureString, so that it can later be converted to a string?

I have strings of sensitive information that I need to collect from my users. I am using a WPF PasswordBox to request this information. For the uninitiated, the PasswordBox control provides a SecurePassword property which is a SecureString object rather than an insecure string object. Within my application, the data from the PasswordBox gets passed as a SecureString to an encryption method.
What I need to be able to do is marshal the data to a byte array that essentially represents a .Net string value without first converting the data to a .Net string. More specifically, given a SecureString with a value such as...
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!##$%^&*()_-+={[}]|:;"'<,>.?/ ≈篭母
...how can I convert it to a byte array that is the equivalent a .Net string that's been serialized and written to a stream with a StreamWriter?
By using Marshal.SecureStringToCoTaskMemUnicode(...) I am able to do this with more traditional, western text. However, when I created the above text string using additional, not-typical characters and a string of Japanese text (see the last few bolded characters) my method of getting a Unicode byte array assigned to the IntPtr position doesn't seem to properly work anymore.
How can I emit the data of a SecureString in a secure way such that the returned byte data is structured the same as the byte data of a standard .Net string, serialized to binary output?
NOTE
Please ignore all security concerns at the moment. I am working on making various security upgrades to my application. For now, I need to use a SecureString for getting the sensitive data to the encryptor. The decryptor (for now) will still need to decrypt this data to string values, which is why I need to some how serialize the data in the the SecureString to a binary format similar to the binary format of the string object.
I agree that this approach is a bit unfortunate, however, I'm having to make incremental improvements on an existing application, and the first phase is locking down the data in SecureString objects from the user to the encryptor.

If you need to write secure string to stream, I'd suggest to create method like this:
public static class Extensions {
public static void WriteSecure(this StreamWriter writer, SecureString sec) {
int length = sec.Length;
if (length == 0)
return;
IntPtr ptr = Marshal.SecureStringToBSTR(sec);
try {
// each char in that string is 2 bytes, not one (it's UTF-16 string)
for (int i = 0; i < length * 2; i += 2) {
// so use ReadInt16 and convert resulting "short" to char
var ch = Convert.ToChar(Marshal.ReadInt16(ptr + i));
// write
writer.Write(ch);
}
}
finally {
// don't forget to zero memory
Marshal.ZeroFreeBSTR(ptr);
}
}
}
If you really need byte array - you can reuse this method too:
byte[] result;
using (var ms = new MemoryStream()) {
using (var writer = new StreamWriter(ms)) {
writer.WriteSecure(secureString);
}
result = ms.ToArray();
}
Though method from first comment might be a bit more pefomant (not sure if that's important for you).

How do I find strings inside a memory dumped byte array converted to UTF8 encoded string?

I'm working on a video game cheat engine with utilizes simple memory manipulation to achieve its goal. I have successfully been able to write a piece of code that dumps a process' memory into a byte[] and iterates over these arrays in search of the desired string. The piece of code that searches is thus:
public bool FindString(byte[] bytes, string pName, long offset)
{
string s = System.Text.Encoding.UTF8.GetString(bytes);
var match = Regex.Match(s, "test");
if (match.Success)
return true;
return false;
}
I then open up a 32-bit version of notepad (since that is what my dumping method is conditioned for) and type the word "test" in it and run my program in debug mode to see if the condition is ever hit. It does not.
Upon further inspect I check out the 's' string's contents on one of the iterations, it is thus:
\0\0\0\0\0\0\0\0���\f\0\u0001����\u0001\0\0\0 \u0001�\0\0\0\0\0 \u0001�\0\0\0\0\0\0\0�\0\0\0\0\0\0\0�\0\0\0\0\0\u0010\0\0\0\0\0\0\0 \a�\0\0\0\0\0\0\0�\0\0\0\0\0\u000f\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\0�\0\0\0\0\0\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0����\f\0\0\0\0\0\0\0�\0\0����\0\0\0\0\0\0\u0010\0\0\0\0\0\0 \0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0�\0\0\0\0\0\0\0�����\u007f\0\0\u0002\0�\u0002\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\u001f\0\0\0\0\0\0\0��������\u0010\u0001�\0\0\0\0\0\u0010\u0001�\0\0\0\0\0\u0018\0�\0\0\0\0\0\u0018\0�\0\0\0\0\0\0\0\0\0\0\0\0\0�\u0002�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00\a�\0\0\0\0\00\a�\0\0\0\0\0�\u0002�\0\0\0\0\0�M�^\u000e\u000e_\u007f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`\a\0\0\0\0\0\0`\a\0\0\0\0\0\0\u0004\0\0\0\0\0\0\0\0�\u001f\0\0\0\0\0�\u001d\u0014)�\u007f\0\0����\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\a\0\u0002\0\0\0\0\0\0\0\0\0\0\0\0�\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0P\u0001�\0\0\0\0\0\0\u0003�\0\0\0\0\0\u0010\u0003�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�
I continued to check each pass-through of this method for the 's' variable and found that I could not see any strings in this format.
My question is simple. What am I doing wrong that I cannot find this string? The dumping is succeeding, but something to do with my method of parsing is causing me trouble.
UPDATE (code for dumping memory)
void ScanProcess(Process process)
{
// getting minimum & maximum address
var sys_info = new SYSTEM_INFO();
GetSystemInfo(out sys_info);
var proc_min_address = sys_info.minimumApplicationAddress;
var proc_max_address = sys_info.maximumApplicationAddress;
var proc_min_address_l = (long)proc_min_address;
var proc_max_address_l = (long)proc_max_address;
//Opening the process with desired access level
var processHandle = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_WM_READ, false, process.Id);
var mem_basic_info = new MEMORY_BASIC_INFORMATION();
var bytesRead = 0; // number of bytes read with ReadProcessMemory
while (proc_min_address_l < proc_max_address_l)
{
VirtualQueryEx(processHandle, proc_min_address, out mem_basic_info, 28); //28 = sizeof(MEMORY_BASIC_INFORMATION)
//If this memory chunk is accessible
if (mem_basic_info.Protect == PAGE_READWRITE && mem_basic_info.State == MEM_COMMIT)
{
//Read everything into a buffer
byte[] buffer = new byte[mem_basic_info.RegionSize];
ReadProcessMemory((int)processHandle, mem_basic_info.BaseAddress, buffer, mem_basic_info.RegionSize, ref bytesRead);
var MemScanner = new MemScan();
Memscanner.FindString(buffer, process.ProcessName, proc_max_address_l);
}
// move to the next memory chunk
proc_min_address_l += mem_basic_info.RegionSize;
proc_min_address = new IntPtr(proc_min_address_l);
if (mem_basic_info.RegionSize == 0)
{
break;
mem_basic_info.RegionSize = 4096;
}
}
}

For starters you can't use NotePad (or any non-binary capable viewing tool to look at your bytes).
You need to use the BitConverter APIs:
https://msdn.microsoft.com/en-us/library/system.bitconverter(v=vs.110).aspx
...to walk the data and compose/search the data to find what you're looking for (keeping whatever encoding you dumped the data in in mind).
BTW - Here's a useful HexEditor: http://www.hexworkshop.com/

I don´t know what MemScan.FindString() does, but I guess the problem is that you are searching a string for a string, rather than for a byte array in a byte array.
By transforming the memory contents using System.Text.Encoding.UTF8.GetString(bytes); you assume that everything stored in memory can be interpreted as valid UTF8 encoding.
Your FindString() must accept parameters as byte[] rather than string, and you need to figure out how the process name is stored in memory (most likely UTF-16).

Reading files in c# with filestream and streamreader

I have a file, which contains data, I want to read it as byte[] and divide into 3 blocks. First line might be read as string, then 2nd block, might be 1-3 length of lines and all left bytes as block 3.
I was wondering, how can I get that block 1 and block 2 would be a string made of byte[], block 3 would be kept byte[].
File:
00256000 12 // block 1 single line
a2#b2#c2#d2#e2# //
1# // block 2 readline doesn't fit, unknown length of lines
1# //
—q3л // block 3 left bytes
I was trying to do FileStream.Read(bytes, 0, file.length), but it only reads all bytes.
StreamReader.ReadLine() is suitable only for 1st line, but it reads plain string, not bytes, it skips '\n' , '\r' etc.
I don't know which way is better to read files and it would be perfect to read allbytes and somehow divide them to these 3 blocks, to have exact block size.

You can read all bytes and iterate through buffer searching for line endings. When you find line endings convert textparts with
string text = Encoding.UTF8.GetString(buffer, start_len, end_len);
p.s. be sure to use exact encoding... UTF8 is an example...

Create a class with the data members you want, mark it as serializable, then serialize the data (i.e. save it to a file) and deserialize it whenever you want the data.
[Serializable()]
public class Data1
{
public Data1()
{
}
public String[] Block { get; set; }
}
To load that data after you have saved it, use some technique like this:
public Data1 Load(string filename)
{
if (System.IO.File.Exists(filename))
{
using (var stream = System.IO.File.OpenRead(filename))
{
var deserializer = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
return deserializer.Deserialize(stream) as Data1;
}
}
return null;
}
I won't do it all for you, though! You need to look into how to Serialize an instance of Data1.

There is very convenient method for reading small files, it returns array of lines.
string[] lines = File.ReadAllLines("filename");

How to check if a byte array ends with carriage return

I want to know wether my byte array ends on carriage return and if not I want to add it.
Thats what I have tried
byte[] fileContent = File.ReadAllBytes(openFileDialog.FileName);
byte[] endCharacter = fileContent.Skip(fileContent.Length - 2).Take(2).ToArray();
if (!(endCharacter.Equals(Encoding.ASCII.GetBytes(Environment.NewLine))))
{
fileContent = fileContent.Concat(Encoding.ASCII.GetBytes(Environment.NewLine)).ToArray();
}
But I don't get it... Is this the right approach? If so, what's wrong with equals? Even if my byte array ends with {10,13}, the If statement never detects it.

In this case, Equals checks for reference equality; while endCharacter and Encoding.ASCII.GetBytes(Environment.NewLine) may have the same contents, they are not the same array, so Equals returns false.
You're interested in value equality, so you should instead individually compare the values at each position in the arrays:
newLine = Encoding.ASCII.GetBytes(Environment.NewLine);
if (endCharacter[0] != newLine[0] && endCharacter[1] != newLine[1])
{
// ...
}
In general, if you want to compare arrays for value equality, you could use something like this method, provided by Marc Gravell.
However, a much more efficient solution to your problem would be to convert the last two bytes of your file into ASCII and do a string comparison (since System.String already overloads == to check for value equality):
string endCharacter = Encoding.ASCII.GetString(fileContent, fileContent.Length - 2, 2);
if (endCharacter == Environment.NewLine)
{
// ...
}
You may also need to be careful about reading the entire file into memory if it's likely to be large. If you don't need the full contents of the file, you could do this more efficiently by just reading in the final two bytes, inspecting them, and appending directly to the file as necessary. This can be achieved by opening a System.IO.FileStream for the file (through System.IO.File.Open).

I found the solution, I must take SequenceEqual (http://www.dotnetperls.com/sequenceequal) in place of Equals. Thanks to everyone!
byte[] fileContent = File.ReadAllBytes(openFileDialog.FileName);
byte[] endCharacter = fileContent.Skip(fileContent.Length - 2).Take(2).ToArray();
if (!(endCharacter.SequenceEqual(Encoding.ASCII.GetBytes(Environment.NewLine))))
{
fileContent = fileContent.Concat(Encoding.ASCII.GetBytes(Environment.NewLine)).ToArray();
File.AppendAllText(openFileDialog.FileName, Environment.NewLine);
}

C# - RSACryptoServiceProvider Decrypt into a SecureString instead of byte array

I have a method that currently returns a string converted from a byte array:
public static readonly UnicodeEncoding ByteConverter = new UnicodeEncoding();
public static string Decrypt(string textToDecrypt, string privateKeyXml)
{
if (string.IsNullOrEmpty(textToDecrypt))
{
throw new ArgumentException(
"Cannot decrypt null or blank string"
);
}
if (string.IsNullOrEmpty(privateKeyXml))
{
throw new ArgumentException("Invalid private key XML given");
}
byte[] bytesToDecrypt = Convert.FromBase64String(textToDecrypt);
byte[] decryptedBytes;
using (var rsa = new RSACryptoServiceProvider())
{
rsa.FromXmlString(privateKeyXml);
decryptedBytes = rsa.Decrypt(bytesToDecrypt, FOAEP);
}
return ByteConverter.GetString(decryptedBytes);
}
I'm trying to update this method to instead return a SecureString, but I'm having trouble converting the return value of RSACryptoServiceProvider.Decrypt from byte[] to SecureString. I tried the following:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
char[] chars = ByteConverter.GetChars(new[] { b });
if (chars.Length != 1)
{
throw new Exception(
"Could not convert a single byte into a single char"
);
}
secStr.AppendChar(chars[0]);
}
return secStr;
However, using this SecureString equality tester, the resulting SecureString was not equal to the SecureString constructed from the original, unencrypted text. My Encrypt and Decrypt methods worked before, when I was just using string everywhere, and I've also tested the SecureString equality code, so I'm pretty sure the problem here is how I'm trying to convert byte[] into SecureString. Is there another route I should take for using RSA encryption that would allow me to get back a SecureString when I decrypt?
Edit: I didn't want to convert the byte array to a regular string and then stuff that string into a SecureString, because that seems to defeat the point of using a SecureString in the first place. However, is it also bad that Decrypt returns byte[] and I'm then trying to stuff that byte array into a SecureString? It's my guess that if Decrypt returns a byte[], then that's a safe way to pass around sensitive information, so converting one secure representation of the data to another secure representation seems okay.

A char and a byte can be used interchangeably with casting, so modify your second chunk of code as such:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
secStr.AppendChar((char)b);
}
return secStr;
This should work properly, but keep in mind that you're still bringing the unencrypted information into the "clear" in memory, so there's a point at which it could be compromised (which sort of defeats the purpose to a SecureString).
** Update **
A byte[] of your sensitive information is not secure. You can look at it in memory and see the information (especially if it's just a string). The individual bytes will be in the exact order of the string, so 'read'ing it is pretty straight-forward.
I was (actually about an hour ago) just struggling with this same issue myself, and as far as I know there is no good way to go straight from the decrypter to the SecureString unless the decryter is specifically programmed to support this strategy.

I think the problem might be your ByteConvert.GetChars method. I can't find that class or method in the MSDN docs. I'm not sure if that is a typo, or a homegrown function. Regardless, it is mostly likely not interpreting the encoding of the bytes correctly. Instead, use the UTF8Encoding's GetChars method. It will properly convert the bytes back into a .NET string, assuming they were encrypted from a .NET string object originally. (If not, you'll want to use the GetChars method on the encoding that matches the original string.)
You're right that using arrays is the most secure approach. Because the decrypted representations of your secret are stored in byte or char arrays, you can easily clear them out when done, so your plaintext secret isn't left in memory. This isn't perfectly secure, but more secure than converting to a string. Strings can't be changed and they stay in memory until they are garbage collected at some indeterminate future time.
var secStr = new SecureString();
var chars = System.Text.Encoding.UTF8.GetChars(decryptedBytes);
for( int idx = 0; idx < chars.Length; ++idx )
{
secStr.AppendChar(chars[idx]);
# Clear out the chars as you go.
chars[idx] = 0
}
# Clear the decrypted bytes from memory, too.
Array.Clear(decryptedBytes, 0, decryptedBytes.Length);
return secStr;

Based on Coding Gorilla's answer, I tried the following in my Decrypt method:
string decryptedString1 = string.Empty;
foreach (byte b in decryptedBytes)
{
decryptedString1 += (char)b;
}
string decryptedString2 = ByteConverter.GetString(decryptedBytes);
When debugging, decryptedString1 and decryptedString2 were not equal:
decryptedString1 "m\0y\0V\0e\0r\0y\0L\0o\0n\0g\0V\03\0r\0y\05\03\0c\0r\03\07\0p\04\0s\0s\0w\00\0r\0d\0!\0!\0!\0"
decryptedString2 "myVeryLongV3ry53cr37p4ssw0rd!!!"
So it looks like I can just go through the byte[] array, do a direct cast to char, and skip \0 characters. Like Coding Gorilla said, though, this does seem to again in part defeat the point of SecureString, because the sensitive data is floating about in memory in little byte-size chunks. Any suggestions for getting RSACryptoServiceProvider.Decrypt to return a SecureString directly?
Edit: yep, this works:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
var c = (char)b;
if ('\0' == c)
{
continue;
}
secStr.AppendChar(c);
}
return secStr;
Edit: correction: this works with plain old English strings. Encrypting and then attempting to decrypt the string "標準語 明治維新 english やった" doesn't work as expected because the resulting decrypted string, using this foreach (byte b in decryptedBytes) technique, does not match the original unencrypted string.
Edit: using the following works for both:
var secStr = new SecureString();
foreach (char c in ByteConverter.GetChars(decryptedBytes))
{
secStr.AppendChar(c);
}
return secStr;
This still leaves a byte array and a char array of the password in memory, which sucks. Maybe I should find another RSA class that returns a SecureString. :/

What if you stuck to UTF-16?
Internally, .NET (and therefore, SecureString) uses UTF-16 (double byte) to store string contents. You could take advantage of this and translate your protected data two bytes (i.e. 1 char) at a time...
When you encrypt, peel off a Char, and use Encoding.UTF16.GetBytes() to get your two bytes, and push those two bytes into your encryption stream. In reverse, when you are reading from your encrypted stream, read two bytes at a time, and UTF16.GetString() to get your char.
It probably sounds awful, but it keeps all the characters of your secret string from being all in one place, AND it gives you the reliability of character "size" (you won't have to guess if the next single byte is a char, or a UTF marker for a double-wide char). There's no way for an observer to know which characters go with which, nor in which order, so guessing the secret should be near impossible.
Honestly, this is just a suggested idea... I'm about to try it myself, and see how viable it is. My goal is to produce extension methods (SecureString.Encrypt and ICrypto.ToSecureString, or something like that).

Use System.Encoding.Default.GetString
GetString MSDN

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.