Get base64-encoded string from ReadOnlySequence<byte> - c#

Given a ReadOnlySequence<byte>, how do I get its base64-encoded representation?
There's a Convert.ToBase64String overload that takes a ReadOnlySpan<byte>, but a sequence is... a sequence of spans, not a single span.
I know it's possible to make an array from the sequence, but that's exactly what I want to avoid: I use RecyclableMemoryStream and am trying to avoid extra memory allocations.
Any solution?

You need something which can keep state as you pass in different blocks of bytes. That thing is a ToBase64Transform.
Annoyingly, none of the methods on it take a Memory<byte> or ReadOnlySpan<byte> (apart from the async ones) -- they're all array-based. I guess you could have a scratch array which you copy chunks into, then pass those chunks to ToBase64Transform, but at that point you might as well just use Convert.ToBase64String on each chunk and concatenate the results (which only works correctly if each chunk's length is a multiple of 3, so that no padding appears mid-stream).
You could pass ToBase64Transform to a CryptoStream. You could then call Stream.Write(ReadOnlySpan<byte>), but that's implemented to copy the span into an array (and CryptoStream doesn't override this), which is what you're trying to avoid.
However, CryptoStream does override WriteByte, so that might be your best bet.
Something like:
using var output = new MemoryStream();
using (var stream = new CryptoStream(output, new ToBase64Transform(), CryptoStreamMode.Write))
{
    foreach (var memory in readOnlySequence)
    {
        if (MemoryMarshal.TryGetArray(memory, out var segment))
        {
            // Fast path: the chunk is backed by an array, so write it directly
            stream.Write(segment.Array, segment.Offset, segment.Count);
        }
        else
        {
            // CryptoStream overrides WriteByte, so this avoids the array copy
            // that the span-based Stream.Write would make
            var span = memory.Span;
            for (int i = 0; i < span.Length; i++)
            {
                stream.WriteByte(span[i]);
            }
        }
    }
}
// ToBase64Transform emits the ASCII bytes of the base64 text
string result = Encoding.ASCII.GetString(output.ToArray());
At this point, though, it's looking like it might be neater to just drive Base64.EncodeToUtf8 yourself, manually...
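For reference, here's a minimal sketch of that manual approach (untested; assumes .NET Core 2.1+ for Base64.EncodeToUtf8 and the span-based Encoding.ASCII.GetString). The fiddly part is that base64 works on 3-byte groups, so a partial group has to be carried between chunks:

using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

static string EncodeToBase64(ReadOnlySequence<byte> sequence)
{
    var sb = new StringBuilder();
    Span<byte> carry = stackalloc byte[3];  // partial 3-byte group carried between chunks
    int carried = 0;
    Span<byte> utf8 = stackalloc byte[512]; // scratch output: 384 input bytes -> 512 base64 bytes

    foreach (var memory in sequence)
    {
        var span = memory.Span;

        // Top up any partial group left over from the previous chunk
        if (carried > 0)
        {
            int take = Math.Min(3 - carried, span.Length);
            span.Slice(0, take).CopyTo(carry.Slice(carried));
            carried += take;
            span = span.Slice(take);
            if (carried < 3) continue; // chunk ended before the group filled up
            Base64.EncodeToUtf8(carry, utf8, out _, out int w, isFinalBlock: false);
            sb.Append(Encoding.ASCII.GetString(utf8.Slice(0, w)));
            carried = 0;
        }

        // Encode whole 3-byte groups straight out of the chunk
        while (span.Length >= 3)
        {
            int take = Math.Min(span.Length - span.Length % 3, 384);
            Base64.EncodeToUtf8(span.Slice(0, take), utf8, out int consumed, out int written, isFinalBlock: false);
            sb.Append(Encoding.ASCII.GetString(utf8.Slice(0, written)));
            span = span.Slice(consumed);
        }

        // Stash the remaining 0-2 bytes for the next chunk
        span.CopyTo(carry);
        carried = span.Length;
    }

    // Flush the final (possibly padded) group
    Base64.EncodeToUtf8(carry.Slice(0, carried), utf8, out _, out int fw, isFinalBlock: true);
    sb.Append(Encoding.ASCII.GetString(utf8.Slice(0, fw)));
    return sb.ToString();
}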

Related

What is the best way to read and then update a record in a binary file with C#

I'm trying to edit some records in a binary file, but I just can't seem to get the hang of it.
I can read the file, but then I can't find the position of the record I want to edit, so I can't replace it.
This is my code so far:
public MyModel Put(MyModel exMyModel)
{
    List<MyModel> list = new List<MyModel>();
    try
    {
        IFormatter formatter = new BinaryFormatter();
        using (Stream stream = new FileStream(_exMyModel, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            // Read every record in the file into memory
            while (stream.Position < stream.Length)
            {
                var obj = (MyModel)formatter.Deserialize(stream);
                list.Add(obj);
            }
            MyModel mymodel = list.FirstOrDefault(i => i.ID == exMyModel.ID);
            mymodel.FirstName = exMyModel.FirstName;
            mymodel.PhoneNumber = exMyModel.PhoneNumber;
            // Now I want to update the current record with this new object
            // ... code to update
        }
        return exMyModel;
    }
    catch (Exception ex)
    {
        Console.WriteLine("The error is " + ex.Message);
        return null;
    }
}
I'm really stuck here guys. Any help would be appreciated.
I already checked these answers:
answer 1
answer 2
Thank you in advance :)
I would recommend just writing all objects back to the stream. You could perhaps write just the changed object and every record after it, but I would not bother.
Start by resetting the stream: stream.Position = 0. You can then write a loop and serialize each object using formatter.Serialize(stream, object), as sketched below.
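A minimal sketch of that approach, reusing the question's formatter, stream and list (note the FileStream would need FileAccess.ReadWrite rather than FileAccess.Read for this to work):

// Apply the edit to the in-memory copy, then rewrite the whole file
MyModel target = list.FirstOrDefault(i => i.ID == exMyModel.ID);
target.FirstName = exMyModel.FirstName;
target.PhoneNumber = exMyModel.PhoneNumber;

stream.Position = 0;
foreach (var model in list)
{
    formatter.Serialize(stream, model);
}
// Truncate leftovers in case the rewritten data is shorter than the old file
stream.SetLength(stream.Position);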
If this is a coding task, I guess you have no choice in the matter. But you should know that BinaryFormatter has various problems. It more or less saves objects the same way they are stored in memory, which is inefficient and insecure, and changes to the classes may prevent you from deserializing stored objects. The most common serialization format today is JSON, but there are also binary alternatives like protobuf-net.
How you update the file is going to rely pretty heavily on whether or not your records serialize as fixed length.
Variable-Length Records
Since you're using strings in the record, any change in string length (as serialized bytes), or any other change that affects the length of the serialized object, will make it impossible to do an in-place update of the record.
With that in mind you're going to have to do some extra work.
First, test the objects inside the read loop. Capture the current position before you deserialize each object, test the object's ID, and save that offset when you find the record you're looking for; then deserialize the rest of the objects in the stream... or copy the rest of the stream to a MemoryStream instance for later.
Next, truncate the file with stream.SetLength(position), where position is the start of the record you're updating, and seek back to that position. Serialize the new copy of the record into the stream, then copy the MemoryStream that holds the rest of the records back into the stream... or capture and serialize the rest of the objects.
In other words (untested but showing the general structure):
public MyModel Put(MyModel exMyModel)
{
    try
    {
        IFormatter formatter = new BinaryFormatter();
        using (Stream stream = File.Open(_exMyModel, FileMode.Open, FileAccess.ReadWrite))
        using (var buffer = new MemoryStream())
        {
            long location = -1;
            while (stream.Position < stream.Length)
            {
                var position = stream.Position;
                var obj = (MyModel)formatter.Deserialize(stream);
                if (obj.ID == exMyModel.ID)
                {
                    location = position;
                    // Stash everything after this record, then truncate the
                    // file back to the start of the record being replaced
                    stream.CopyTo(buffer);
                    buffer.Position = 0;
                    stream.SetLength(position);
                    stream.Position = position;
                }
            }
            if (location < 0)
                return null; // record not found
            // Write the updated record, then restore the records that followed it
            formatter.Serialize(stream, exMyModel);
            if (buffer.Length > 0)
            {
                buffer.CopyTo(stream);
            }
        }
        return exMyModel;
    }
    catch (Exception ex)
    {
        Console.WriteLine("The error is " + ex.Message);
        return null;
    }
}
Note that in general a MemoryStream holding the serialized data will be faster and take less memory than deserializing the records and then serializing them again.
Static-Length Records
This is unlikely, but in the case that your record type is annotated in such a way that it always serializes to the same number of bytes then you can skip everything to do with the MemoryStream and truncating the binary file. In this case just read records until you find the right one, rewind the stream to that position (after the read) and write a new copy of the record.
You'll have to examine the classes yourself to see what sort of serialization modifier attributes are on the string properties, and I'd suggest testing this extensively with different string values to ensure that you're actually getting the same data length for all of them. Adding or removing a single byte will screw up the remainder of the records in the file.
Edge Case - Same Length Strings
Since replacing a record with data that's the same length only requires an overwrite, not a rewrite of the file, you might get some use out of testing the record length before grabbing the rest of the file. If you get lucky and the modified record is the same length then just seek back to the right position and write the data in-place. That way if you have a file with a ton of records in it you'll get a much faster update whenever the length is the same.
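A hedged sketch of that fast path. The names recordLength (the byte length of the old record, captured during the read loop) and position are illustrative, not from the question:

using (var scratch = new MemoryStream())
{
    formatter.Serialize(scratch, exMyModel);
    if (scratch.Length == recordLength)
    {
        // Same size: overwrite the old record in place and skip the rewrite
        stream.Position = position;
        scratch.Position = 0;
        scratch.CopyTo(stream);
        return exMyModel;
    }
    // ...otherwise fall back to the truncate-and-append path above
}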
Changing Format...
You said that this is a coding task so you probably can't take this option, but if you can alter the storage format... let's just say that BinaryFormatter is definitely not your friend. There are much better ways to do it if you have the option. SQLite is my binary format of choice :)
Actually, since this appears to be a coding test, you might want to make a point of that. Write the code they asked for, then if you have time write a better version that doesn't rely on BinaryFormatter, or throw SQLite at the problem. Using an ORM like LinqToDB makes SQLite trivial. Explain to them that the file format they're using is inherently fragile and should be replaced with something stable, supported, and efficient.
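For a sense of scale, here's a hypothetical sketch using Microsoft.Data.Sqlite directly rather than an ORM (table and column names assumed from the question's MyModel); the entire "update a record" problem collapses to one statement:

using Microsoft.Data.Sqlite;

using (var conn = new SqliteConnection("Data Source=phonebook.db"))
{
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = "UPDATE MyModel SET FirstName = $fn, PhoneNumber = $pn WHERE ID = $id";
    cmd.Parameters.AddWithValue("$fn", exMyModel.FirstName);
    cmd.Parameters.AddWithValue("$pn", exMyModel.PhoneNumber);
    cmd.Parameters.AddWithValue("$id", exMyModel.ID);
    cmd.ExecuteNonQuery();
}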

How do I find strings inside a memory dumped byte array converted to UTF8 encoded string?

I'm working on a video game cheat engine which utilizes simple memory manipulation to achieve its goal. I have successfully written a piece of code that dumps a process' memory into a byte[] and iterates over these arrays in search of the desired string. The searching code is this:
public bool FindString(byte[] bytes, string pName, long offset)
{
    string s = System.Text.Encoding.UTF8.GetString(bytes);
    var match = Regex.Match(s, "test");
    if (match.Success)
        return true;
    return false;
}
I then open up a 32-bit version of Notepad (since that is what my dumping method is conditioned for), type the word "test" in it, and run my program in debug mode to see if the condition is ever hit. It never is.
Upon further inspection, I checked the s string's contents on one of the iterations; it looks like this:
\0\0\0\0\0\0\0\0���\f\0\u0001����\u0001\0\0\0 \u0001�\0\0\0\0\0 \u0001�\0\0\0\0\0\0\0�\0\0\0\0\0\0\0�\0\0\0\0\0\u0010\0\0\0\0\0\0\0 \a�\0\0\0\0\0\0\0�\0\0\0\0\0\u000f\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\0�\0\0\0\0\0\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0����\f\0\0\0\0\0\0\0�\0\0����\0\0\0\0\0\0\u0010\0\0\0\0\0\0 \0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\0\u0010\0\0\0\0\0\0�\0\0\0\0\0\0\0�����\u007f\0\0\u0002\0�\u0002\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\u000f�\0\0\0\0\0�\u000f�\0\0\0\0\0\u001f\0\0\0\0\0\0\0��������\u0010\u0001�\0\0\0\0\0\u0010\u0001�\0\0\0\0\0\u0018\0�\0\0\0\0\0\u0018\0�\0\0\0\0\0\0\0\0\0\0\0\0\0�\u0002�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00\a�\0\0\0\0\00\a�\0\0\0\0\0�\u0002�\0\0\0\0\0�M�^\u000e\u000e_\u007f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\0\0\0\u0010\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`\a\0\0\0\0\0\0`\a\0\0\0\0\0\0\u0004\0\0\0\0\0\0\0\0�\u001f\0\0\0\0\0�\u001d\u0014)�\u007f\0\0����\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\a\0\u0002\0\0\0\0\0\0\0\0\0\0\0\0�\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\0\0\0P\u0001�\0\0\0\0\0\0\u0003�\0\0\0\0\0\u0010\u0003�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�
I continued to check the s variable on each pass through this method and found that I could not see any strings in this format.
My question is simple: what am I doing wrong that I cannot find this string? The dumping succeeds, but something about my method of parsing is causing me trouble.
UPDATE (code for dumping memory)
void ScanProcess(Process process)
{
    // Get the minimum & maximum application addresses
    var sys_info = new SYSTEM_INFO();
    GetSystemInfo(out sys_info);
    var proc_min_address = sys_info.minimumApplicationAddress;
    var proc_max_address = sys_info.maximumApplicationAddress;
    var proc_min_address_l = (long)proc_min_address;
    var proc_max_address_l = (long)proc_max_address;

    // Open the process with the desired access level
    var processHandle = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_WM_READ, false, process.Id);
    var mem_basic_info = new MEMORY_BASIC_INFORMATION();
    var bytesRead = 0; // number of bytes read with ReadProcessMemory

    while (proc_min_address_l < proc_max_address_l)
    {
        VirtualQueryEx(processHandle, proc_min_address, out mem_basic_info, 28); // 28 = sizeof(MEMORY_BASIC_INFORMATION)

        // If this memory chunk is accessible
        if (mem_basic_info.Protect == PAGE_READWRITE && mem_basic_info.State == MEM_COMMIT)
        {
            // Read everything into a buffer
            byte[] buffer = new byte[mem_basic_info.RegionSize];
            ReadProcessMemory((int)processHandle, mem_basic_info.BaseAddress, buffer, mem_basic_info.RegionSize, ref bytesRead);

            var memScanner = new MemScan();
            memScanner.FindString(buffer, process.ProcessName, proc_max_address_l);
        }

        // Move to the next memory chunk
        if (mem_basic_info.RegionSize == 0)
        {
            break; // avoid looping forever if VirtualQueryEx returned nothing
        }
        proc_min_address_l += mem_basic_info.RegionSize;
        proc_min_address = new IntPtr(proc_min_address_l);
    }
}
For starters, you can't use Notepad (or any tool that isn't binary-capable) to look at your bytes.
You need something like the BitConverter APIs:
https://msdn.microsoft.com/en-us/library/system.bitconverter(v=vs.110).aspx
...to walk the data and compose/search it to find what you're looking for (keeping in mind whatever encoding you dumped the data in).
BTW - Here's a useful HexEditor: http://www.hexworkshop.com/
I don't know what MemScan.FindString() does, but I guess the problem is that you are searching for a string within a string, rather than for a byte array within a byte array.
By transforming the memory contents with System.Text.Encoding.UTF8.GetString(bytes), you assume that everything stored in memory can be interpreted as valid UTF-8.
Your FindString() should accept a byte[] rather than a string, and you need to figure out how the text is stored in the target process's memory (most likely UTF-16).
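A minimal sketch of that byte-level search (assuming the target process stores the text as UTF-16, as Windows GUI apps typically do; a naive scan is plenty for a first test):

using System.Text;

public static bool FindBytes(byte[] haystack, string needle)
{
    byte[] pattern = Encoding.Unicode.GetBytes(needle); // UTF-16LE bytes
    for (int i = 0; i <= haystack.Length - pattern.Length; i++)
    {
        bool found = true;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (haystack[i + j] != pattern[j]) { found = false; break; }
        }
        if (found)
            return true;
    }
    return false;
}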

How to compare files using Byte Array and Hash

Background
I am converting media files to a new format and need a way of knowing whether I've already converted a given file during the current run.
My solution
To hash each file and store the hash in a collection. Each time I go to convert a file, I hash it and check the hash against the hashes stored so far.
Problem
My logic doesn't seem able to detect when I've already seen a file, and I end up converting the same file multiple times.
Code
//Byte array of already processed files
private static readonly List<byte[]> Bytelist = new List<byte[]>();

public static bool DoCheck(string file)
{
    FileInfo info = new FileInfo(file);
    while (FrmMain.IsFileLocked(info)) //Make sure file is finished being copied/moved
    {
        Thread.Sleep(500);
    }

    //Get byte sig of file and if seen before dont process
    byte[] myFileData = File.ReadAllBytes(file);
    byte[] myHash = MD5.Create().ComputeHash(myFileData);

    if (Bytelist.Count != 0)
    {
        foreach (var item in Bytelist)
        {
            //If seen before ignore
            if (myHash == item)
            {
                return true;
            }
        }
    }
    Bytelist.Add(myHash);
    return false;
}
Question
Is there a more efficient way of trying to achieve my end goal? What am I doing wrong?
There are multiple questions here; I'm going to answer the first one:
Is there a more efficient way of trying to achieve my end goal?
TL;DR: yes.
You're hashing every file and comparing only the hashes, and hashing an entire file is a really expensive operation. You can do cheaper checks before computing the hash:
Is the file size the same? If not, the files are different; otherwise, move on to the next check.
Are the first bunch of bytes the same? If not, the files are different; otherwise, move on to the next check.
Only at this point do you have to compare the hashes (MD5).
Of course you will have to store the size / first X bytes / hash for each processed file.
In addition, an identical MD5 doesn't strictly guarantee that the files are the same, so you might want an extra step to check whether they're really identical; that may well be overkill, though. It depends on how heavy the cost of reprocessing a file is; avoiding the expensive hash computations may matter more.
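A minimal sketch of that tiered check (FileSig, Seen and AlreadyProcessed are illustrative names, not from the question):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

sealed class FileSig
{
    public long Size;
    public byte[] Head; // first 64 bytes
    public byte[] Hash; // MD5 of the whole file
}

static readonly List<FileSig> Seen = new List<FileSig>();

static bool AlreadyProcessed(string file)
{
    long size = new FileInfo(file).Length;
    byte[] head = new byte[64];
    using (var fs = File.OpenRead(file))
    {
        fs.Read(head, 0, head.Length);
    }

    byte[] hash = null; // only computed if the cheap checks pass
    foreach (var sig in Seen)
    {
        if (sig.Size != size) continue;              // cheap: sizes differ
        if (!sig.Head.SequenceEqual(head)) continue; // cheap: first bytes differ
        if (hash == null)
        {
            using (var md5 = MD5.Create())
            {
                hash = md5.ComputeHash(File.ReadAllBytes(file));
            }
        }
        if (sig.Hash.SequenceEqual(hash)) return true; // expensive: full hash match
    }

    if (hash == null)
    {
        using (var md5 = MD5.Create())
        {
            hash = md5.ComputeHash(File.ReadAllBytes(file));
        }
    }
    Seen.Add(new FileSig { Size = size, Head = head, Hash = hash });
    return false;
}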
EDIT: As for the second question ("What am I doing wrong?"): the check fails because you are comparing the references of two byte arrays, which will never be equal since you create a new array every time. You need a sequence-equality comparison between the byte[]s (or convert each hash to a string and compare the strings):
var exists = Bytelist.Any(hash => hash.SequenceEqual(myHash));
Are you sure the new file format doesn't add extra metadata into the content, like a last-modified date or attributes that change?
Also, if you are converting to a known format, there should be a way to use a file signature to know whether a file is already in that format; if it's your own format, add some extra signature bytes to identify it.
Don't forget that with your approach, if your app gets closed and opened again, it will reprocess all the files.
One last point regarding the code: I prefer not to store byte arrays, but if you must, it's better to use a HashSet instead of a List, since it has O(1) access time.
There's a lot of room for improvement with regard to efficiency, effectiveness and style, but this isn't CodeReview.SE, so I'll try to stick to the problem at hand:
You're checking if two byte arrays are equivalent by using the == operator. But that only performs reference equality testing, i.e. it tests whether the two variables point to the same instance, the very same array. That, of course, won't work here.
There are many ways to do it, starting with a simple foreach loop over the arrays (probably with an optimization that checks the lengths first) or using Enumerable.SequenceEqual as you can find in this answer here.
Better yet, convert your hash's byte[] to a string (any string; Convert.ToBase64String would be a good choice) and store that in your cache, which should be a HashSet, not a List. Strings are optimized for this sort of comparison, and you won't run into the reference-equality problem here.
So a sample solution would be this:
private static readonly HashSet<string> _computedHashes = new HashSet<string>();

public static bool DoCheck(string file)
{
    // ... stuff
    //Get byte sig of file and if seen before dont process
    byte[] myFileData = File.ReadAllBytes(file);
    byte[] myHash = MD5.Create().ComputeHash(myFileData);
    string hashString = Convert.ToBase64String(myHash);
    return _computedHashes.Contains(hashString);
}
Presumably, you'll add the hash to the _computedHashes set after you've done the conversion.
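Incidentally, HashSet<T>.Add returns false when the item is already present, so the lookup and the bookkeeping can be collapsed into a single call:

// Add returns false if the hash was already in the set,
// i.e. the file has been seen before
return !_computedHashes.Add(hashString);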
You have to compare the byte arrays item by item:
foreach (var item in Bytelist)
{
    //If seen before ignore
    if (myHash.Length == item.Length)
    {
        bool isequal = true;
        for (int i = 0; i < myHash.Length; i++)
        {
            if (myHash[i] != item[i])
            {
                isequal = false;
                break; // no need to compare further
            }
        }
        if (isequal)
        {
            return true;
        }
    }
}

Which class should I use to write binary data to a buffer (say a List of bytes)

I would like to encode data into a binary format in a buffer which I will later either write to a file or transfer over a socket. What C# class or classes would be best for creating a List<byte> containing the binary data?
I will be storing integers, single-byte character strings (i.e., ASCII), floating-point numbers and other data in this buffer, using a custom encoding for the strings and the regular binary layout for the ints and floating-point types.
BinaryWriter looks like it has the methods I need, but I want it to manage a growing buffer for me and to produce a List<byte> result when I am done encoding.
Thanks
BinaryWriter, writing to a MemoryStream. If you need more than the available memory, you can easily switch to a temporary file stream.
using (var myStream = new MemoryStream())
{
    // leaveOpen: true stops the writer/reader from closing the MemoryStream
    using (var myWriter = new BinaryWriter(myStream, Encoding.UTF8, leaveOpen: true))
    {
        // write here
    }
    myStream.Position = 0; // rewind before reading back
    using (var myReader = new BinaryReader(myStream, Encoding.UTF8, leaveOpen: true))
    {
        // read here
    }
    // put the bytes into an array...
    var myBuffer = myStream.ToArray();
    // if you *really* want a List<Byte> (you probably don't - see my comment)
    var myBytesList = myStream.ToArray().ToList();
}
BinaryWriter writes to a stream. Give it a MemoryStream, and when you want your List<byte>, use new List<byte>(stream.ToArray()). (Avoid GetBuffer() here: it returns the stream's internal buffer, which is usually longer than the data actually written.)
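For instance, assuming a MemoryStream named stream that the BinaryWriter has already flushed to:

// ToArray() copies exactly the bytes written; GetBuffer() would expose the
// whole internal buffer, which is usually longer than the actual data
List<byte> bytes = new List<byte>(stream.ToArray());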

How to create big sized .txt file?

For certain reasons, I have to create a 1024 kb .txt file.
Below is my current code:
int size = 1024000; //1024 kb..
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
    bit = 0;
}

string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
    tobewritten += bit.ToString();
}

//newPath is local directory, where I store the created file
using (System.IO.StreamWriter sw = File.CreateText(newPath))
{
    sw.WriteLine(tobewritten);
}
I have to wait at least 30 minutes for this piece of code to execute, which I consider too long.
Now I would like to ask for advice on how to actually achieve my objective effectively. Are there any alternatives for this task? Am I writing bad code? Any help is appreciated.
There are several misunderstandings in the code you provided:
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
    bit = 0;
}
You seem to think that you are initializing each byte in your array bytearray with zero. Instead you just set the loop variable bit (unfortunate naming) to zero, size times. Actually this code wouldn't even compile, since you cannot assign to a foreach iteration variable.
Also, you don't need the initialization here in the first place: byte array elements are automatically initialized to 0.
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
    tobewritten += bit.ToString();
}
You want to append the string representation of each byte in your array to the string variable tobewritten. Since strings are immutable, you create a new string for each element, which has to be garbage collected along with the string you created for bit; this is relatively expensive, especially when you create 2,048,000 of them. Use a StringBuilder instead.
Lastly, none of that is needed anyway: it seems you just want to write a bunch of "0" characters to a text file. If you are not worried about creating a single large string of zeros (whether this makes sense depends on the value of size), you can just create the string directly and do this in one go, or alternatively write a smaller string directly to the stream a bunch of times.
using (var file = File.CreateText(newPath))
{
    file.WriteLine(new string('0', size));
}
Replace the string with a pre-sized StringBuilder to avoid unnecessary allocations.
Or, better yet, write each piece directly to the StreamWriter instead of pointlessly building a huge in-memory string first.
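A minimal sketch of that chunked approach (untested; newPath and size are the question's variables):

const int chunkSize = 4096;
string chunk = new string('0', chunkSize); // one reusable 4 KB block of zeros
using (var writer = File.CreateText(newPath))
{
    for (int written = 0; written < size; written += chunkSize)
    {
        // Write a full chunk, or the shorter remainder at the very end
        writer.Write(chunk.Substring(0, Math.Min(chunkSize, size - written)));
    }
    writer.WriteLine();
}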
