Fastest way to read binary representation of data [closed]


Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am trying to read a large file (>150 MB) and I need its binary representation.
The file is an .MP4.
I am trying to use this:
string.Join("-", x.Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0')));
but the problems are:
1) It is too slow
2) It uses a lot of RAM
If I read the raw bytes with
File.ReadAllBytes(path);
how can I get the binary representation without having to convert the whole file into a string (the method above)?

When working with big files like yours, it is better to view only a small part of the file at a time (you can't show the entire file at once anyway).
Some streams (like FileStream) can Seek to a given position, which you can use to set your starting point.
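if (position > _stream.Length)
    throw new ArgumentOutOfRangeException(nameof(position));
// Clamp the request so it does not run past the end of the file.
if (position + length > _stream.Length)
    length = (int)(_stream.Length - position);
_stream.Seek(position, SeekOrigin.Begin);
// Read may return fewer bytes than requested, so loop until the section is filled.
int read = 0, n;
while (read < length && (n = _stream.Read(buffer, read, length - read)) > 0)
    read += n;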
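The seek-and-read steps above can be wrapped in a small helper. This is only a sketch: the class name FileSectionReader is hypothetical, and the GetSection signature is an assumption chosen to match the call used later in this answer.

```csharp
using System;
using System.IO;

// Hypothetical helper; the class name is not from the original answer.
public sealed class FileSectionReader : IDisposable
{
    private readonly FileStream _stream;

    public FileSectionReader(string path)
    {
        _stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    }

    public long Length { get { return _stream.Length; } }

    // Copies up to 'length' bytes starting at 'position' into 'buffer'
    // and returns how many bytes were actually copied.
    public int GetSection(byte[] buffer, long position, int length)
    {
        if (position < 0 || position > _stream.Length)
            throw new ArgumentOutOfRangeException("position");
        if (length < 0 || length > buffer.Length)
            throw new ArgumentOutOfRangeException("length");

        // Clamp the request so it stops at the end of the file.
        length = (int)Math.Min(length, _stream.Length - position);

        _stream.Seek(position, SeekOrigin.Begin);
        int total = 0;
        while (total < length)
        {
            // Read may deliver fewer bytes than requested, so keep going.
            int n = _stream.Read(buffer, total, length - total);
            if (n == 0) break; // end of file reached early
            total += n;
        }
        return total;
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}
```

Returning the number of bytes copied lets the caller display only the valid part of the buffer when the section runs into the end of the file.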
The conversion to binary isn't that hard either. Depending on the bit order you want, you may have to reverse this (the version below puts the highest bit on the left, so 1 becomes 00000001). To gain some performance when building the string, use a StringBuilder instead of concatenating strings with += or +.
public string ToBinary(byte value)
{
    // Fill a fixed 8-character buffer from the lowest bit up,
    // so the highest bit ends up on the left.
    char[] bits = new char[8];
    for (int i = 7; i >= 0; i--)
    {
        bits[i] = (value & 1) == 1 ? '1' : '0';
        value >>= 1;
    }
    return new string(bits);
}
private string ToBinary(byte[] values)
{
    StringBuilder builder = new StringBuilder();
    int column = 0;
    foreach (byte value in values)
    {
        builder.Append(ToBinary(value)).Append(' ');
        column++;
        // Start a new line after every 8 bytes.
        if (column == 8)
        {
            builder.AppendLine();
            column = 0;
        }
    }
    return builder.ToString();
}
You can then either use it in a console application
https://dotnetfiddle.net/GVLm27
or put those two together with a TextBox and a ScrollBar and you have a good starting point:
long position = (long) scrollBar1.Value;
byte[] data = new byte[128];
_file.GetSection(data, position, data.Length);
textBox1.Text = ToBinary(data);
After all those comments on your question I hope the original title is still what you are after
C# Fastest way to read binary representation of data
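If the per-byte conversion still turns out to be the bottleneck, one common option is to precompute the 256 possible strings once and look them up. The sketch below assumes that approach; the class name BinaryText and the bytesPerLine parameter are made up for illustration, and no timings were measured here.

```csharp
using System;
using System.Text;

// Hypothetical helper; names are illustrative, not from the original answer.
public static class BinaryText
{
    // One precomputed 8-character string per possible byte value.
    private static readonly string[] Table = BuildTable();

    private static string[] BuildTable()
    {
        var table = new string[256];
        for (int i = 0; i < 256; i++)
            table[i] = Convert.ToString(i, 2).PadLeft(8, '0');
        return table;
    }

    // Formats the bytes as space-separated binary, bytesPerLine bytes per line.
    public static string ToBinary(byte[] values, int bytesPerLine = 8)
    {
        if (bytesPerLine <= 0)
            throw new ArgumentOutOfRangeException("bytesPerLine");

        // Roughly 9 characters per byte plus line breaks; the capacity is only a hint.
        var builder = new StringBuilder(values.Length * 10);
        for (int i = 0; i < values.Length; i++)
        {
            builder.Append(Table[values[i]]).Append(' ');
            if ((i + 1) % bytesPerLine == 0)
                builder.AppendLine();
        }
        return builder.ToString();
    }
}
```

This is meant for the small section currently on screen (for example 128 bytes), not for converting the whole 150 MB file at once.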

Related

Converting a String to hex and calculate binary result in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I got stuck with my code; I have tried everything on this site and many other things Google showed.
The problem:
I am trying to convert some code snippets from C# to Python, but I am stuck at this particular point.
public static long decode(string data, int size, int offset = 0)
{
    long value = 0;
    for (int i = 0; i < size; ++i) {
        value <<= 6;
        value |= (long)data[offset + i] - 0x30;
    }
    return value;
}
The string data could be something like 1Dh. I convert each character to its hex code, 0x31, 0x44, 0x68, and subtract 0x30, which gives 0x1, 0x14, 0x38.
In the next step I convert these to their 6-bit binary equivalents 000001, 010100, 111000 and merge them into
000001010100111000. From this I want to get the integer value, in this case 5432.
Is there a smart and easy way to do this in Python?

It's actually pretty easy, and the translation is straightforward. You can keep your bit shifting. The only changes are the syntax of the for loop and using ord() to get the integer value of a character. Note that the slice has to end at offset + size, so that a non-zero offset still reads size characters.
def decode(data, size, offset=0):
    value = 0
    for ch in data[offset:offset + size]:
        value <<= 6
        value |= ord(ch) - 0x30
    return value
Running this in the interpreter, I get 5432:
>>> decode("1Dh", 3)
5432

C# Program performance with big bytearrays [duplicate]

This question already has answers here:
byte[] to hex string [duplicate]
(19 answers)
Closed 8 years ago.
I'm trying to create a simple hex editor with C#.
For this I'm reading the file into a byte array, which works fine. But as soon as I output the bytes to a TextBox as a string, the overall performance of the program becomes pretty bad. For example, a 190 KB file takes about 40 seconds until it is displayed in the TextBox, and the program does not respond during that time.
The function:
void open()
{
    fullstring = "";
    OpenFileDialog op = new OpenFileDialog();
    op.ShowDialog();
    file = op.FileName;
    byte[] fileB = File.ReadAllBytes(file);
    long b = fileB.Length;
    for (int i = 0; i < fileB.Length; i++)
    {
        fullstring = fullstring + fileB[i].ToString("X") + " ";
    }
    textBox9.Text = fullstring;
}
Is there a way to improve performance in this function?

Take a look at this post: How do you convert Byte Array to Hexadecimal String, and vice versa?
You can use the code there to turn your byte array into text. One problem in your code is that you build the string with repeated concatenation instead of a StringBuilder. Each + copies the whole string built so far, so the cost grows quadratically with the file size; a StringBuilder avoids that.
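A minimal sketch of the StringBuilder variant follows. The class name HexText is hypothetical, and it uses the "X2" format (two digits per byte) rather than the "X" from the question, on the assumption that a hex editor wants aligned columns.

```csharp
using System;
using System.Text;

// Hypothetical helper; the name is illustrative only.
public static class HexText
{
    // Formats each byte as two upper-case hex digits followed by a space.
    public static string ToHex(byte[] bytes)
    {
        // Three characters per byte, so the builder never has to grow.
        var builder = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            builder.Append(b.ToString("X2")).Append(' ');
        }
        return builder.ToString();
    }
}
```

In the question's open() method the loop would then shrink to a single call such as textBox9.Text = HexText.ToHex(fileB);.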

Processing Huge Files In C#

I have a 4Gb file that I want to perform a byte based find and replace on. I have written a simple program to do it but it takes far too long (90 minutes+) to do just one find and replace. A few hex editors I have tried can perform the task in under 3 minutes and don't load the entire target file into memory. Does anyone know a method where I can accomplish the same thing? Here is my current code:
public int ReplaceBytes(string File, byte[] Find, byte[] Replace)
{
    var Stream = new FileStream(File, FileMode.Open, FileAccess.ReadWrite);
    int FindPoint = 0;
    int Results = 0;
    for (long i = 0; i < Stream.Length; i++)
    {
        if (Find[FindPoint] == Stream.ReadByte())
        {
            FindPoint++;
            if (FindPoint > Find.Length - 1)
            {
                Results++;
                FindPoint = 0;
                Stream.Seek(-Find.Length, SeekOrigin.Current);
                Stream.Write(Replace, 0, Replace.Length);
            }
        }
        else
        {
            FindPoint = 0;
        }
    }
    Stream.Close();
    return Results;
}
Find and Replace are relatively small compared with the 4Gb "File" by the way. I can easily see why my algorithm is slow but I am not sure how I could do it better.

Part of the problem may be that you're reading the stream one byte at a time. Try reading larger chunks and doing a replace on those. I'd start with about 8kb and then test with some larger or smaller chunks to see what gives you the best performance.
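A sketch of the chunked approach, under a few assumptions: the class and method names are hypothetical, the inner comparison is a plain byte-by-byte check, and the last pattern length minus 1 bytes of each chunk are carried over so that matches spanning a chunk boundary are not missed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class ChunkedSearch
{
    // Returns the absolute offset of every occurrence of pattern in the stream.
    public static List<long> FindAll(Stream stream, byte[] pattern, int chunkSize = 8192)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("pattern must not be empty");

        var hits = new List<long>();
        int overlap = pattern.Length - 1;
        byte[] buffer = new byte[chunkSize + overlap];
        int carried = 0;      // bytes kept from the end of the previous chunk
        long bufferStart = 0; // absolute stream offset of buffer[0]
        int read;

        while ((read = stream.Read(buffer, carried, chunkSize)) > 0)
        {
            int valid = carried + read;
            for (int i = 0; i + pattern.Length <= valid; i++)
            {
                int j = 0;
                while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                if (j == pattern.Length) hits.Add(bufferStart + i);
            }

            // Keep the tail that could still be the start of a match.
            int keep = Math.Min(overlap, valid);
            Buffer.BlockCopy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
        return hits;
    }
}
```

When Find and Replace have the same length, each reported offset can afterwards be patched by seeking to it and writing the replacement bytes.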

There are lots of better algorithms for finding a substring in a string (which is basically what you are doing).
Start here:
http://en.wikipedia.org/wiki/String_searching_algorithm
The gist of them is that you can skip a lot of bytes by analyzing your substring. Here's a simple example:
4GB file starts with: A B C D E F G H I J K L M N O P
Your substring is: N O P
You skip ahead by the length of the substring minus 1 and check against its last byte, so compare C to P
It doesn't match, so the substring is not the first 3 bytes
Also, C isn't in the substring at all, so you can skip 3 more bytes (the length of the substring)
Compare F to P: doesn't match, F isn't in the substring, skip 3
Compare I to P, and so on
If the byte matches, compare backwards through the substring. If it doesn't match but does occur in the substring, you can only skip far enough to line it up with that occurrence (read the link for details)
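The skipping idea in this example corresponds to the Boyer-Moore-Horspool variant. Below is a small in-memory sketch of it; the class name HorspoolSearch is hypothetical, and applying it to a 4 GB file would still require reading the file in chunks.

```csharp
using System;

// Hypothetical helper illustrating Boyer-Moore-Horspool on byte arrays.
public static class HorspoolSearch
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;

        // shift[b] = how far the window may move when byte b is aligned
        // with the last position of the needle.
        int[] shift = new int[256];
        for (int i = 0; i < 256; i++) shift[i] = needle.Length;
        for (int i = 0; i < needle.Length - 1; i++)
            shift[needle[i]] = needle.Length - 1 - i;

        int pos = 0;
        while (pos <= haystack.Length - needle.Length)
        {
            // Compare backwards, starting from the last byte of the needle.
            int j = needle.Length - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) j--;
            if (j < 0) return pos;

            // Skip according to the byte under the needle's last position.
            pos += shift[haystack[pos + needle.Length - 1]];
        }
        return -1;
    }
}
```

For the example above (searching N O P in A through P), the window moves 3 bytes at a time past C, F, I and L, then 1 byte when it sees O, and matches at index 13.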

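Instead of reading the file byte by byte, read it into a buffer:
byte[] buffer = new byte[bufferSize];
long currentPos = 0;
long length = Stream.Length;
int count;
// The second argument of Read is an offset into buffer, not into the file, so it stays 0.
while ((count = Stream.Read(buffer, 0, bufferSize)) > 0)
{
    // buffer[0..count) holds the bytes at file offsets currentPos..currentPos + count
    ....
    currentPos += count;
}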

Another, easier way of reading more than one byte at a time:
var Stream = new BufferedStream(new FileStream(File, FileMode.Open, FileAccess.ReadWrite));
Combining this with Saeed Amiri's example of how to read into a buffer, and one of the better binary find/replace algorithms should give you better results.

You should try using memory-mapped files. .NET supports them starting with version 4.0 (the System.IO.MemoryMappedFiles namespace).
A memory-mapped file contains the contents of a file in virtual memory.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
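As a rough illustration, an in-place find and replace over a memory-mapped file could look like the sketch below. It assumes that Find and Replace have the same length and that the process is 64-bit, so a multi-gigabyte view fits in the address space. The class name MappedReplace is hypothetical, and the search itself is a plain byte comparison.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper; assumes equal-length find and replace arrays.
public static class MappedReplace
{
    // Replaces every occurrence of find with replace in place and returns the count.
    public static int ReplaceBytes(string path, byte[] find, byte[] replace)
    {
        if (find.Length == 0 || find.Length != replace.Length)
            throw new ArgumentException("find and replace must be non-empty and equally long");

        long fileLength = new FileInfo(path).Length;
        if (fileLength < find.Length) return 0;

        int results = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, fileLength))
        {
            long last = fileLength - find.Length;
            for (long pos = 0; pos <= last; pos++)
            {
                int j = 0;
                while (j < find.Length && view.ReadByte(pos + j) == find[j]) j++;
                if (j == find.Length)
                {
                    // Overwrite the match directly in the mapped view.
                    view.WriteArray(pos, replace, 0, replace.Length);
                    results++;
                    pos += find.Length - 1; // continue after the replaced bytes
                }
            }
        }
        return results;
    }
}
```

The operating system pages the file in and out as needed, so the whole 4 GB never has to be resident at once; the changes are written back to the source file when the view and the mapping are disposed.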

Search ReadAllBytes for specific values

I am writing a program that reads '.exe' files and stores their hex values in an array of bytes for comparison with an array containing a series of values. (like a very simple virus scanner)
byte[] buffer = File.ReadAllBytes(currentDirectoryContents[j]);
I have then used BitConverter to create a single string of these values
string hex = BitConverter.ToString(buffer);
The next step is to search this string for a series of values (definitions) and return positive for a match. This is where I am running into problems. My definitions are hex values, but created and saved in Notepad as definitions.xyz
string[] definitions = File.ReadAllLines(@"C:\definitions.xyz");
I had been trying to read them into a string array and compare the definition elements of the array with string hex
bool[] test = new bool[currentDirectoryContents.Length];
test[j] = hex.Contains(definitions[i]);
This IS a section from a piece of homework, which is why I am not posting my entire code for the program. I had not used C# before last Friday so am most likely making silly mistakes at this point.
Any advice much appreciated :)

It is pretty unclear exactly what format you use for the definitions. Base64 is a good encoding for a byte[]; you can rapidly convert back and forth with Convert.ToBase64String() and Convert.FromBase64String(). But your question suggests the bytes are encoded in hex. Let's assume it looks like "01020304" for a new byte[] { 1, 2, 3, 4 }. Then this helper function converts such a string back to a byte[]:
static byte[] Hex2Bytes(string hex) {
    if (hex.Length % 2 != 0) throw new ArgumentException();
    var retval = new byte[hex.Length / 2];
    for (int ix = 0; ix < hex.Length; ix += 2) {
        retval[ix / 2] = byte.Parse(hex.Substring(ix, 2), System.Globalization.NumberStyles.HexNumber);
    }
    return retval;
}
You can now do a fast pattern search with an algorithm like Boyer-Moore.

I expect you understand that this is a very inefficient way to do it. Apart from that, you should just do something like this:
bool[] test = new bool[currentDirectoryContents.Length];
for (int i = 0; i < test.Length; i++) {
    // Read and encode file i, then check it against every definition.
    byte[] buffer = File.ReadAllBytes(currentDirectoryContents[i]);
    string hex = BitConverter.ToString(buffer);
    test[i] = ContainsAny(hex, definitions);
}
bool ContainsAny(string s, string[] values) {
    foreach (string value in values) {
        if (s.Contains(value)) {
            return true;
        }
    }
    return false;
}
If you can use LINQ, you can do it like this:
var test = currentDirectoryContents.Select(file =>
{
    // Read and encode each file once, then test every definition against it.
    string hex = BitConverter.ToString(File.ReadAllBytes(file));
    return definitions.Any(definition => hex.Contains(definition));
}).ToArray();
Also, make sure that your definitions-file is formatted in a way that matches the output of BitConverter.ToString(): upper-case with dashes separating each encoded byte:
12-AB-F0-34
54-AC-FF-01-02
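If the definitions file cannot be guaranteed to be in that format, each line could be normalized before the comparison. The sketch below is one way to do that; the class name DefinitionFormat is hypothetical, and it assumes that any character that is not a hex digit (dashes, spaces) is just a separator and can be ignored.

```csharp
using System;
using System.Text;

// Hypothetical helper; assumes non-hex characters are separators only.
public static class DefinitionFormat
{
    // Turns "12abf034" or "12 ab f0 34" into "12-AB-F0-34",
    // the layout produced by BitConverter.ToString().
    public static string Normalize(string definition)
    {
        var digits = new StringBuilder();
        foreach (char c in definition)
        {
            if (Uri.IsHexDigit(c))
                digits.Append(char.ToUpperInvariant(c));
        }
        if (digits.Length % 2 != 0)
            throw new FormatException("a definition needs an even number of hex digits");

        var result = new StringBuilder();
        for (int i = 0; i < digits.Length; i += 2)
        {
            if (i > 0) result.Append('-');
            result.Append(digits[i]).Append(digits[i + 1]);
        }
        return result.ToString();
    }
}
```

Note that a string search on the dashed text only makes sense if every definition covers whole bytes, which the even-length check enforces.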

Extracting Byte Arrays from a File

I'm trying to read a file and extract 2 blocks of data, let's call them block1 and block2, from a file that contains many blocks of data. Both blocks need to be returned as byte arrays. Block1 begins at the place in the file where a line starts with "block1:", followed by the number of bytes to read. Block2, which does not necessarily appear after block1, begins at the place in the file where a line starts with "block2:", followed by the number of bytes to read. I am limited to .NET 3.5 at the highest.

You can use File.ReadAllBytes and extract your blocks from the returned byte[] using one of the Array.Copy overloads if you know the indexes they are in.

As others have mentioned, without header information you'll need to, at the very least, stream the contents of the file through a filter of some kind looking for your "block" markers.
If you do have header information (or at least some information somewhere as to the offset of your block markers), you could use a memory mapped file:
http://www.developer.com/net/article.php/3828586/Using-Memory-Mapped-Files-in-NET-40.htm
This requires .NET 4.0, although you could also use the Win32 API if you're not using .NET 4.

Without any sort of header information in your file, you'll have to scan the entire file, searching for your block1: or block2: markers.
Update:
Here's a sample of how you'd do this (not necessarily the best implementation):
byte[] GetBlockOfData(string fileName, string blockName)
{
    var allBytes = File.ReadAllBytes(fileName);
    // Assuming block names are ASCII-encoded
    var blockMarker = Encoding.ASCII.GetBytes(blockName + ":");
    // Stop early enough that the marker and the length byte still fit
    for (var i = 0; i < allBytes.Length - blockMarker.Length; i++)
    {
        // Scan for the first byte of the marker
        if (allBytes[i] == blockMarker[0])
        {
            // See if this is the entire marker
            var isMatch = true;
            for (var j = 0; j < blockMarker.Length; j++)
            {
                if (allBytes[i + j] != blockMarker[j])
                {
                    isMatch = false;
                    break;
                }
            }
            if (isMatch)
            {
                // Assuming it's a byte...
                var blockLength = allBytes[i + blockMarker.Length];
                var result = new byte[blockLength];
                Array.Copy(
                    allBytes, i + blockMarker.Length + 1, result, 0,
                    blockLength);
                return result;
            }
        }
    }
    return null;
}
