How to create big sized .txt file? - c#

For certain reasons, I have to create a 1024 kb .txt file.
Below is my current code:
int size = 1024000 //1024 kb..
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
bit = 0;
}
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
tobewritten += bit.ToString();
}
//newPath is local directory, where I store the created file
using (System.IO.StreamWriter sw = File.CreateText(newPath))
{
sw.WriteLine(tobewritten);
}
I have to wait at least 30 minutes to execute this piece of code, which I consider too long.
Now, I would like to ask for advice on how to actually achieve my mentioned objective effectively. Are there any alternatives to do this task? Am I writing bad code? Any help is appreciated.

There are several misunderstandings in the code you provided:
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
bit = 0;
}
You seem to think that your are initializing each byte in your array bytearray with zero. Instead you just set the loop variable bit (unfortunate naming) to zero size times. Actually this code wouldn't even compile since you cannot assign to the foreach iteration variable.
Also you didn't need initialization here in the first place: byte array elements are automatically initialized to 0.
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
tobewritten += bit.ToString();
}
You want to combine the string representation of each byte in your array to the string variable tobewritten. Since strings are immutable you create a new string for each element that has to be garbage collected along with the string you created for bit, this is relatively expensive, especially when you create 2048000 one of them - use a Stringbuilder instead.
Lastly all of that is not needed at all anyway - it seems you just want to write a bunch of "0" characters to a text file - if you are not worried about creating a single large string of zeros (it depends on the value of size whether this makes sense) you can just create the string directly to do this one go - or alternatively write a smaller string directly to the stream a bunch of times.
using (var file = File.CreateText(newpath))
{
file.WriteLine(new string('0', size));
}

Replace the string with a pre-sized StringBuilder to avoid unnecessary allocations.
Or, better yet, write each piece directly to the StreamWriter instead of pointlessly building a 100MB in-memory string first.

Related

Get base64-encoded string from ReadOnlySequence<byte>

Given ReadOnlySequence<byte>, how to get its base64-encoded representation?
There's ReadOnlySpan<Byte> overload, but sequence is... a sequence of spans, not a single span.
I know, that it's possible to make an array from sequence, but this is what I want to avoid: I use RecyclableMemoryStream and trying to avoid extra memory allocations.
Any solution?
You need something which can keep state as you pass in different blocks of bytes. That thing is a ToBase64Transform.
Annoyingly, none of the methods on it take a Memory or ROS (apart from the async ones) -- they're all array-based. I guess you could have a scratch array which you copy chunks into, then pass those chunks to ToBase64Transform, but at that point you might as well just use Convert.ToBase64String on each chunk and concatenate the results.
You could pass ToBase64Transform to a CryptoStream. You could then call Stream.Write(ReadOnlySpan<byte>), but that's implemented to copy the span into an array (and CryptoStream doesn't override this), which is what you're trying to avoid.
However, CryptoStream does override WriteByte, so that might be your best bet.
Something like:
using var writer = new StringWriter();
using (var stream = new CryptoStream(writer, new ToBase64Transform(), CryptoStreamMode.Write))
{
foreach (var memory in readOnlySequence)
{
if (MemoryMarshal.TryGetArray(memory, out var segment))
{
stream.Write(segment.Array, segment.Offset, segment.Count);
}
else
{
var span = memory.Span;
for (int i = 0; i < span.Length; i++)
{
stream.WriteByte(span[i]);
}
}
}
}
string result = writer.ToString();
At this point, though, it's looking like it might be neater to just use Base64.EncodeToUtf8 and a StringWriter yourself, manually...

How do I properly emit binary data from a SecureString, so that it can later be converted to a string?

I have strings of sensitive information that I need to collect from my users. I am using a WPF PasswordBox to request this information. For the uninitiated, the PasswordBox control provides a SecurePassword property which is a SecureString object rather than an insecure string object. Within my application, the data from the PasswordBox gets passed as a SecureString to an encryption method.
What I need to be able to do is marshal the data to a byte array that essentially represents a .Net string value without first converting the data to a .Net string. More specifically, given a SecureString with a value such as...
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!##$%^&*()_-+={[}]|:;"'<,>.?/ ≈篭母
...how can I convert it to a byte array that is the equivalent a .Net string that's been serialized and written to a stream with a StreamWriter?
By using Marshal.SecureStringToCoTaskMemUnicode(...) I am able to do this with more traditional, western text. However, when I created the above text string using additional, not-typical characters and a string of Japanese text (see the last few bolded characters) my method of getting a Unicode byte array assigned to the IntPtr position doesn't seem to properly work anymore.
How can I emit the data of a SecureString in a secure way such that the returned byte data is structured the same as the byte data of a standard .Net string, serialized to binary output?
NOTE
Please ignore all security concerns at the moment. I am working on making various security upgrades to my application. For now, I need to use a SecureString for getting the sensitive data to the encryptor. The decryptor (for now) will still need to decrypt this data to string values, which is why I need to some how serialize the data in the the SecureString to a binary format similar to the binary format of the string object.
I agree that this approach is a bit unfortunate, however, I'm having to make incremental improvements on an existing application, and the first phase is locking down the data in SecureString objects from the user to the encryptor.
If you need to write secure string to stream, I'd suggest to create method like this:
public static class Extensions {
public static void WriteSecure(this StreamWriter writer, SecureString sec) {
int length = sec.Length;
if (length == 0)
return;
IntPtr ptr = Marshal.SecureStringToBSTR(sec);
try {
// each char in that string is 2 bytes, not one (it's UTF-16 string)
for (int i = 0; i < length * 2; i += 2) {
// so use ReadInt16 and convert resulting "short" to char
var ch = Convert.ToChar(Marshal.ReadInt16(ptr + i));
// write
writer.Write(ch);
}
}
finally {
// don't forget to zero memory
Marshal.ZeroFreeBSTR(ptr);
}
}
}
If you really need byte array - you can reuse this method too:
byte[] result;
using (var ms = new MemoryStream()) {
using (var writer = new StreamWriter(ms)) {
writer.WriteSecure(secureString);
}
result = ms.ToArray();
}
Though method from first comment might be a bit more pefomant (not sure if that's important for you).

How do I find strings inside a memory dumped byte array converted to UTF8 encoded string?

I'm working on a video game cheat engine with utilizes simple memory manipulation to achieve its goal. I have successfully been able to write a piece of code that dumps a process' memory into a byte[] and iterates over these arrays in search of the desired string. The piece of code that searches is thus:
public bool FindString(byte[] bytes, string pName, long offset)
{
string s = System.Text.Encoding.UTF8.GetString(bytes);
var match = Regex.Match(s, "test");
if (match.Success)
return true;
return false;
}
I then open up a 32-bit version of notepad (since that is what my dumping method is conditioned for) and type the word "test" in it and run my program in debug mode to see if the condition is ever hit. It does not.
Upon further inspect I check out the 's' string's contents on one of the iterations, it is thus:
\0\0\0\0\0\0\0\0���\f\0\u0001����\u0001\0\0\0 \u0001�\0\0\0\0\0 \u0001�\0\0\0\0\0\0\0�\0\0\0\0\0\0\0�\0\0\0\0\0\u0010\0\0\0\0\0\0\0 \a�\0\0\0\0\0\0\0�\0\0\0\0\0\u000f\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\0�\0\0\0\0\0\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0����\f\0\0\0\0\0\0\0�\0\0����\0\0\0\0\0\0\u0010\0\0\0\0\0\0 \0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0�\0\0\0\0\0\0\0�����\u007f\0\0\u0002\0�\u0002\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\u001f\0\0\0\0\0\0\0��������\u0010\u0001�\0\0\0\0\0\u0010\u0001�\0\0\0\0\0\u0018\0�\0\0\0\0\0\u0018\0�\0\0\0\0\0\0\0\0\0\0\0\0\0�\u0002�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00\a�\0\0\0\0\00\a�\0\0\0\0\0�\u0002�\0\0\0\0\0�M�^\u000e\u000e_\u007f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`\a\0\0\0\0\0\0`\a\0\0\0\0\0\0\u0004\0\0\0\0\0\0\0\0�\u001f\0\0\0\0\0�\u001d\u0014)�\u007f\0\0����\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\a\0\u0002\0\0\0\0\0\0\0\0\0\0\0\0�\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0P\u0001�\0\0\0\0\0\0\u0003�\0\0\0\0\0\u0010\u0003�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�
I continued to check each pass-through of this method for the 's' variable and found that I could not see any strings in this format.
My question is simple. What am I doing wrong that I cannot find this string? The dumping is succeeding, but something to do with my method of parsing is causing me trouble.
UPDATE (code for dumping memory)
void ScanProcess(Process process)
{
// getting minimum & maximum address
var sys_info = new SYSTEM_INFO();
GetSystemInfo(out sys_info);
var proc_min_address = sys_info.minimumApplicationAddress;
var proc_max_address = sys_info.maximumApplicationAddress;
var proc_min_address_l = (long)proc_min_address;
var proc_max_address_l = (long)proc_max_address;
//Opening the process with desired access level
var processHandle = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_WM_READ, false, process.Id);
var mem_basic_info = new MEMORY_BASIC_INFORMATION();
var bytesRead = 0; // number of bytes read with ReadProcessMemory
while (proc_min_address_l < proc_max_address_l)
{
VirtualQueryEx(processHandle, proc_min_address, out mem_basic_info, 28); //28 = sizeof(MEMORY_BASIC_INFORMATION)
//If this memory chunk is accessible
if (mem_basic_info.Protect == PAGE_READWRITE && mem_basic_info.State == MEM_COMMIT)
{
//Read everything into a buffer
byte[] buffer = new byte[mem_basic_info.RegionSize];
ReadProcessMemory((int)processHandle, mem_basic_info.BaseAddress, buffer, mem_basic_info.RegionSize, ref bytesRead);
var MemScanner = new MemScan();
Memscanner.FindString(buffer, process.ProcessName, proc_max_address_l);
}
// move to the next memory chunk
proc_min_address_l += mem_basic_info.RegionSize;
proc_min_address = new IntPtr(proc_min_address_l);
if (mem_basic_info.RegionSize == 0)
{
break;
mem_basic_info.RegionSize = 4096;
}
}
}
For starters you can't use NotePad (or any non-binary capable viewing tool to look at your bytes).
You need to use the BitConverter APIs:
https://msdn.microsoft.com/en-us/library/system.bitconverter(v=vs.110).aspx
...to walk the data and compose/search the data to find what you're looking for (keeping whatever encoding you dumped the data in in mind).
BTW - Here's a useful HexEditor: http://www.hexworkshop.com/
I don´t know what MemScan.FindString() does, but I guess the problem is that you are searching a string for a string, rather than for a byte array in a byte array.
By transforming the memory contents using System.Text.Encoding.UTF8.GetString(bytes); you assume that everything stored in memory can be interpreted as valid UTF8 encoding.
Your FindString() must accept parameters as byte[] rather than string, and you need to figure out how the process name is stored in memory (most likely UTF-16).

How to compare files using Byte Array and Hash

Background
I am converting media files to a new format and need a way of knowing if I've previously in current runtime, converted a file.
My solution
To hash each file and store the hash in an array. Each time I go to convert a file I hash it and check the hash against the hashes stored in the array.
Problem
My logic doesn't seem able to detect when I've already seen a file and I end up converting the same file multiple times.
Code
//Byte array of already processed files
private static readonly List<byte[]> Bytelist = new List<byte[]>();
public static bool DoCheck(string file)
{
FileInfo info = new FileInfo(file);
while (FrmMain.IsFileLocked(info)) //Make sure file is finished being copied/moved
{
Thread.Sleep(500);
}
//Get byte sig of file and if seen before dont process
byte[] myFileData = File.ReadAllBytes(file);
byte[] myHash = MD5.Create().ComputeHash(myFileData);
if (Bytelist.Count != 0)
{
foreach (var item in Bytelist)
{
//If seen before ignore
if (myHash == item)
{
return true;
}
}
}
Bytelist.Add(myHash);
return false;
}
Question
Is there more efficient way of trying to acheive my end goal? What am I doing wrong?
There are multiple questions, I'm going to answer the first one:
Is there more efficient way of trying to acheive my end goal?
TL;DR yes.
You're storing hashes and comparing hashes only for the files, which is a really expensive operation. You can do other checks before calculating the hash:
Is the file size the same? If not, go to the next check.
Are the first bunch of bytes the same? If not, go to the next check.
At this point you have to check the hashes (MD5).
Of course you will have to store size/first X bytes/hash for each processed file.
In addition, same MD5 doesn't mean the files are the same so you might want to take an extra step to check if they're really the same, but this might be an overkill, depends on how heavy the cost of reprocessing the file is, might be more important not to calculate expensive hashes.
EDIT: The second question: is likely to fail as you are comparing the reference of two byte arrays that will never be the same as you create a new one every time, you need to create a sequence equal comparison between byte[]. (Or convert the hash to a string and compare strings then)
var exists = Bytelist.Any(hash => hash.SequenceEqual(myHash));
Are you sure this new file format doesn't add extra meta data into
the content? like last modified, or attributes that change ?
Also, if you are converting to a known format, then there should be a
way using a file signature to know if its already in this format or
not, if this is your format, then add some extra bytes for signature to identify it.
Don't forget that if your app gets closed and opened again it will
reporcess all files again by your approach.
Another last point regarding the code, I prefer not storing byte
arrays, but if you should, its better you create HashSet
instead of list, it has an access time of O(1).
There's a lot of room for improvement with regard to efficiency, effectiveness and style, but this isn't CodeReview.SE, so I'll try to stick the problem at hand:
You're checking if a two byte arrays are equivalent by using the == operator. But that will only perform reference equality testing - i.e. test if the two variables point to the same instance, the very same array. That, of course, won't work here.
There are many ways to do it, starting with a simple foreach loop over the arrays (with an optimization that checks the length first, probably) or using Enumerable.SequenceEquals as you can find in this answer here.
Better yet, convert your hash's byte[] to a string (any string - Convert.ToBase64String would be a good choice) and store that in your Bytelist cache (which should be a Hashset, not a List). Strings are optimized for these sort of comparisons, and you won't run into the "reference equality" problem here.
So a sample solution would be this:
private static readonly HashSet<string> _computedHashes = new HashSet<string>();
public static bool DoCheck(string file)
{
/// stuff
//Get byte sig of file and if seen before dont process
byte[] myFileData = File.ReadAllBytes(file);
byte[] myHash = MD5.Create().ComputeHash(myFileData);
string hashString = Convert.ToBase64String(myHash);
return _computedHashes.Contains(hashString);
}
Presumably, you'll add the hash to the _computedHashes set after you've done the conversion.
You have to compare the byte arrays item by item:
foreach (var item in Bytelist)
{
//If seen before ignore
if (myHash.Length == item.Length)
{
bool isequal = true;
for (int i = 0; i < myHash.Length; i++)
{
if (myHash[i] != item[i])
{
isequal = false;
}
}
if (isequal)
{
return true;
}
}
}

Which is a better way to compare 2 files?

I have the following situation in C#:
ZipFile z1 = ZipFile.Read("f1.zip");
ZipFile z2 = ZipFile.Read("f2.zip");
MemoryStream ms1 = new MemoryStream();
MemoryStream ms2 = new MemoryStream()
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry1 = zip2["f2.dll"];
zipentry1.Extract(ms1);
zipentry2.Extract(ms2);
byte[] b1 = new byte[ms1.Length];
byte[] b2 = new byte[ms2.Length];
ms1.Seek(0, SeekOrigin.Begin);
ms2.Seek(0, SeekOrigin.Begin);
what I have done here is opened 2 zip files f1.zip and f2.zip. Then I extract 2 files inside them (f1.txt and f2.txt inside f1.zip and f2.zip respectively) onto the MemoryStream objects. I now want to compare the files and find out if they are the same or not. I had 2 ways in mind:
1) Read the memory streams byte by byte and compare them.
For this I would use
ms1.BeginRead(b1, 0, (int) ms1.Length, null, null);
ms2.BeginRead(b2, 0, (int) ms2.Length, null, null);
and then run a for loop and compare each byte in b1 and b2.
2) Get the string values for both the memory streams and then do a string compare. For this I would use
string str1 = Encoding.UTF8.GetString(ms1.GetBuffer(), 0, (int)ms1.Length);
string str2 = Encoding.UTF8.GetString(ms2.GetBuffer(), 0, (int)ms2.Length);
and then do a simple string compare.
Now, I know comparing byte by byte will always give me a correct result. But the problem with it is, it will take a lot time as I have to do this for thousands of files. That is why I am thinking about the string compare method which looks to find out if the files are equal or not very quickly. But I am not sure if string compare will give me the correct result as the files are either dlls or media files etc and will contain special characters for sure.
Can anyone tell me if the string compare method will work correctly or not ?
Thanks in advance.
P.S. : I am using DotNetLibrary.
The baseline for this question is the native way to compare arrays: Enumerable.SequenceEqual. You should use that unless you have good reason to do otherwise.
If you care about speed, you could attempt to p/invoke to memcmp in msvcrt.dll and compare the byte arrays that way. I find it hard to imagine that could be beaten. Obviously you'd do a comparison of the lengths first and only call memcmp if the two byte arrays had the same length.
The p/invoke looks like this:
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] lhs, byte[] rhs, UIntPtr count);
But you should only contemplate this if you really do care about speed, and the pure managed alternatives are too slow for you. So, do some timings to make sure you are not optimising prematurely. Well, even to make sure that you are optimising at all.
I'd be very surprised if converting to string was fast. I'd expect it to be slow. And in fact I'd expect your code to fail because there's no reason for your byte arrays to be valid UTF-8. Just forget you ever had that idea!
Compare ZipEntry.Crc and ZipEntry.UncompressedSize of the two files, only if they match uncompress and do the byte comparison. If the two files are the same, their CRC and Size will be the same too. This strategy will save you a ton of CPU cycles.
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry2 = zip2["f2.dll"];
if (zipentry1.Crc == zipentry2.Crc && zipentry1.UncompressedSize == zipentry2.UncompressedSize)
{
// uncompress
zipentry1.Extract(ms1);
zipentry2.Extract(ms2);
byte[] b1 = new byte[ms1.Length];
byte[] b2 = new byte[ms2.Length];
ms1.Seek(0, SeekOrigin.Begin);
ms2.Seek(0, SeekOrigin.Begin);
ms1.BeginRead(b1, 0, (int) ms1.Length, null, null);
ms2.BeginRead(b2, 0, (int) ms2.Length, null, null);
// perform a byte comparison
if (Enumerable.SequenceEqual(b1, b2)) // or a simple for loop
{
// files are the same
}
else
{
// files are different
}
}
else
{
// files are different
}

Categories