I wrote a quick wave file normalizer in C# using NAudio.
Currently it blocks the calling thread and produces 1 KB output files. sm is the highest peak value of the file.
using (WaveFileReader reader = new WaveFileReader(aktuellerPfad))
{
using (WaveFileWriter writer = new WaveFileWriter("temp.wav", reader.WaveFormat))
{
byte[] bytesBuffer = new byte[reader.Length];
int read = reader.Read(bytesBuffer, 0, bytesBuffer.Length);
writer.WriteSample(read *32768/sm);
}
}
You need to apply a mathematical operation to the audio buffer to normalise the signal. The normalising steps would be:
a. Read the audio buffer as you are doing (although I would prefer reading in chunks).
byte[] bytesBuffer = new byte[reader.Length];
reader.Read( bytesBuffer, 0, bytesBuffer.Length );
b. Calculate the multiplier value. There are different ways to calculate it; I don't know how you are calculating yours, but it looks like the value is 32768/sm. I will denote the multiplier as "mul".
c. Now iterate through the buffer and multiply each value by the multiplier. (Note that for 16-bit audio each sample is two bytes, so strictly you should combine each byte pair into a short before scaling; a fuller 16-bit sketch follows after step d.)
for ( int i = 0; i < bytesBuffer.Length; i++ )
{
bytesBuffer[i] = (byte)( bytesBuffer[i] * mul );
}
d. Finally, write the buffer back to the file. (NAudio's WaveFileWriter takes a byte buffer through Write; WriteSamples is for float samples.)
writer.Write( bytesBuffer, 0, bytesBuffer.Length );
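Putting the steps together, here is a minimal sketch for 16-bit PCM, assuming sm is the largest absolute sample value found in an earlier pass and aktuellerPfad is the source path from the question (not the poster's exact method, just one way to apply the steps above):
using System;
using NAudio.Wave;

float mul = 32768f / sm;                                   // multiplier so the peak reaches full scale

using (WaveFileReader reader = new WaveFileReader(aktuellerPfad))
using (WaveFileWriter writer = new WaveFileWriter("temp.wav", reader.WaveFormat))
{
    byte[] buffer = new byte[reader.WaveFormat.AverageBytesPerSecond]; // roughly one second per chunk
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i + 1 < read; i += 2)              // 2 bytes per 16-bit sample
        {
            short sample = BitConverter.ToInt16(buffer, i);
            int scaled = (int)(sample * mul);
            if (scaled > short.MaxValue) scaled = short.MaxValue;   // clamp to avoid wrap-around
            if (scaled < short.MinValue) scaled = short.MinValue;
            byte[] b = BitConverter.GetBytes((short)scaled);
            buffer[i] = b[0];
            buffer[i + 1] = b[1];
        }
        writer.Write(buffer, 0, read);                     // write the scaled chunk back out
    }
}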
If I am given a .cmp file and a byte offset 0x598, how can I read the file from this offset?
I can of course read the file bytes like this:
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
But how can I read it from byte offset 0x598?
To explain a bit more: the actual data that I have to read starts at this offset, and everything before it is just header data, so basically I have to read the file from that offset to the end.
Try code like this:
using (BinaryReader reader = new BinaryReader(File.Open("upgradefile.cmp", FileMode.Open)))
{
long offset = 0x598;
if (reader.BaseStream.Length > offset)
{
reader.BaseStream.Seek(offset, SeekOrigin.Begin);
byte[] fileBytes = reader.ReadBytes((int)(reader.BaseStream.Length - offset));
}
}
If you are not familiar with Streams, LINQ, or the like, here is the simplest solution for you:
Read the entire file into memory (I hope you are dealing with small files):
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
Calculate how many bytes are present in the array after the given offset:
long startOffset = 0x598; // hexadecimal is just a convenient representation; a decimal literal works the same
long howManyBytesToRead = fileBytes.Length - startOffset;
Then just copy the data to a new array:
byte[] newArray = new byte[howManyBytesToRead];
long pos = 0;
for (long i = startOffset; i < fileBytes.Length; i++)
{
newArray[pos] = fileBytes[i];
pos = pos + 1;
}
If you understand how this works, you can look at the Array.Copy method in the Microsoft documentation, which does the same copy in a single call.
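With Array.Copy the loop above collapses to one call, for example:
byte[] newArray = new byte[howManyBytesToRead];
Array.Copy(fileBytes, startOffset, newArray, 0, howManyBytesToRead);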
By not using ReadAllBytes.
Open a stream, seek to the position, and read the rest of the file.
You are essentially complaining that a convenience method, made to allow a one-line read of a whole file, is not what you want, while ignoring that it is exactly that: a convenience method. The normal way to deal with files is to open them and use a Stream.
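A minimal sketch of that approach, using a plain FileStream and Seek (the file name is taken from the question):
using System.IO;

long offset = 0x598;
byte[] rest;
using (FileStream fs = File.OpenRead("upgradefile.cmp"))
{
    fs.Seek(offset, SeekOrigin.Begin);                 // skip the header
    rest = new byte[fs.Length - offset];
    int total = 0;
    while (total < rest.Length)                        // Read may return fewer bytes than requested
    {
        int read = fs.Read(rest, total, rest.Length - total);
        if (read == 0) break;                          // unexpected end of file
        total += read;
    }
}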
This is my first time asking, though I have been visiting for some time.
Here's the problem:
I'm currently trying to isolate the base frequency of a signal contained in a WAVE data file with these properties:
PCM audio format, i.e. linear quantization
8000 Hz Sample Rate
16 Bits Per Sample
16000 Byte Rate
One channel only, so there is no interleaving.
Getting the byte value:
System.IO.FileStream WaveFile = System.IO.File.OpenRead(@"c:\tet\fft.wav");
byte[] data = new byte[WaveFile.Length];
WaveFile.Read(data,0,Convert.ToInt32(WaveFile.Length));
Converting it to an Array of Doubles:
for (int i = 0; i < 32768; i++)//this is only for a relatively small chunk of the file
{
InReal[i] =BitConverter.ToDouble(data, (i + 1) * 8 + 44);
}
and finally passing it to a Transform function:
FFT FftObject = new FFT();
FftObject.Transform(InReal, InImg, 0, 32768, out outReal, out outImg, false);
Now the first question: as I understand it, the PCM values of the WAV file should be within the range -1 to 1, but when converting to double I get values like these:
2.65855908666825E-235
2.84104982662944E-285
-1.58613492930337E+235
-1.25617351166869E+264
1.58370933499389E-242
6.19284549187335E-245
-2.92969500042228E+254
-5.90042665390976E+226
3.11954507295188E-273
3.06831908609091E-217
NaN
2.77113146323761E-302
6.76597919848376E-306
-1.55843653898344E+291
These are the first few values of the array, and the rest of the array lies in similar ranges.
My conclusion is that I have some sort of code malfunction, but I can't seem to find it.
Any help would be appreciated.
And the second question: since I'm only providing real data to the FFT algorithm, should I expect only real-part data in the response vector too?
Thank you very much.
I was finally able to find out what was going wrong: I hadn't accounted for the pulse-code modulation of the signal in the data representation. Because I found many unanswered questions here about preparing a wave file for a Fourier transform, here is the code, as a function that prepares the wave file.
public static Double[] prepare(String wavePath, out int SampleRate)
{
Double[] data;
byte[] wave;
byte[] sR= new byte[4];
System.IO.FileStream WaveFile = System.IO.File.OpenRead(wavePath);
wave = new byte[WaveFile.Length];
data = new Double[(wave.Length - 44) / 4];//shifting the headers out of the PCM data;
WaveFile.Read(wave,0,Convert.ToInt32(WaveFile.Length));//read the wave file into the wave variable
/***********Converting and PCM accounting***************/
for (int i = 0; i < data.Length - i * 4; i++)
{
data[i] = (BitConverter.ToInt32(wave, (1 + i) * 4)) / 65536.0;
//65536.0 = 2^16, where 16 = bits per sample
}
/**************assigning sample rate**********************/
for (int i = 24; i < 28; i++)
{
sR[i-24]= wave[i];
}
SampleRate = BitConverter.ToInt32(sR,0);
return data;
}
All you need to do now is send the sample rate and the returned result to your FFT algorithm.
The code does no error handling, so add your own handling as needed.
It has been tested on phone recordings of busy tones, ringing, and speech, and it functions correctly.
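For example, usage along the lines of the question's Transform call might look like this (the FFT class and the Transform signature are the question's own; depending on the implementation, the length argument may need to be a power of two, as in the original call with 32768):
int sampleRate;
Double[] inReal = prepare(@"c:\tet\fft.wav", out sampleRate);
Double[] inImg = new Double[inReal.Length];        // imaginary input starts out as zeros
Double[] outReal, outImg;
FFT FftObject = new FFT();
FftObject.Transform(inReal, inImg, 0, inReal.Length, out outReal, out outImg, false);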
I have a 4 GB file that I want to perform a byte-based find and replace on. I have written a simple program to do it, but it takes far too long (90+ minutes) to do just one find and replace. A few hex editors I have tried can perform the task in under 3 minutes without loading the entire target file into memory. Does anyone know a method by which I can accomplish the same thing? Here is my current code:
public int ReplaceBytes(string File, byte[] Find, byte[] Replace)
{
var Stream = new FileStream(File, FileMode.Open, FileAccess.ReadWrite);
int FindPoint = 0;
int Results = 0;
for (long i = 0; i < Stream.Length; i++)
{
if (Find[FindPoint] == Stream.ReadByte())
{
FindPoint++;
if (FindPoint > Find.Length - 1)
{
Results++;
FindPoint = 0;
Stream.Seek(-Find.Length, SeekOrigin.Current);
Stream.Write(Replace, 0, Replace.Length);
}
}
else
{
FindPoint = 0;
}
}
Stream.Close();
return Results;
}
Find and Replace are relatively small compared with the 4 GB "File", by the way. I can easily see why my algorithm is slow, but I am not sure how I could do it better.
Part of the problem may be that you're reading the stream one byte at a time. Try reading larger chunks and doing the replace on those. I'd start with about 8 KB and then test with some larger or smaller chunks to see what gives you the best performance.
There are lots of better algorithms for finding a substring in a string (which is basically what you are doing).
Start here:
http://en.wikipedia.org/wiki/String_searching_algorithm
The gist of them is that you can skip a lot of bytes by analyzing your substring. Here's a simple example
4GB File starts with: A B C D E F G H I J K L M N O P
Your substring is: N O P
You skip the length of the substring-1 and check against the last byte, so compare C to P
It doesn't match, so the substring is not the first 3 bytes
Also, C isn't in the substring at all, so you can skip 3 more bytes (len of substring)
Compare F to P, doesn't match, F isn't in substring, skip 3
Compare I to P, etc, etc
If you match, work backwards through the substring. If the character doesn't match but is in the substring, then you have to do some more comparing at that point (read the link for details).
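To make the idea concrete, here is a minimal Horspool-style search over a byte buffer (a sketch, not the questioner's code; for a 4 GB file you would run it per chunk and handle matches that straddle chunk boundaries):
using System;

static int IndexOf(byte[] haystack, byte[] needle)
{
    // Build the skip table: how far we may jump when the last compared byte mismatches.
    int[] skip = new int[256];
    for (int i = 0; i < 256; i++)
        skip[i] = needle.Length;
    for (int i = 0; i < needle.Length - 1; i++)
        skip[needle[i]] = needle.Length - 1 - i;

    int pos = 0;
    while (pos <= haystack.Length - needle.Length)
    {
        int j = needle.Length - 1;
        while (j >= 0 && haystack[pos + j] == needle[j])   // compare right to left
            j--;
        if (j < 0)
            return pos;                                    // full match found
        pos += skip[haystack[pos + needle.Length - 1]];    // jump by the skip value
    }
    return -1;                                             // not found
}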
Instead of reading the file byte by byte, read it buffer by buffer (note that the second argument of Read is the offset into the buffer, not into the file):
buffer = new byte[bufferSize];
currentPos = 0;
length = Stream.Length;
while ((count = Stream.Read(buffer, 0, bufferSize)) > 0)
{
currentPos += count;
....
}
Another, easier way of reading more than one byte at a time:
var Stream = new BufferedStream(new FileStream(File, FileMode.Open, FileAccess.ReadWrite));
Combining this with Saeed Amiri's example of how to read into a buffer, and one of the better binary find/replace algorithms should give you better results.
You should try using memory-mapped files. .NET supports them (System.IO.MemoryMappedFiles) starting with version 4.0 of the Framework.
A memory-mapped file contains the contents of a file in virtual memory.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
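A minimal sketch of scanning a large file through a memory-mapped view (the file name and the search logic are placeholders, not part of the question):
using System.IO;
using System.IO.MemoryMappedFiles;

long length = new FileInfo("huge.bin").Length;
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile("huge.bin", FileMode.Open))
using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor(0, length))
{
    for (long i = 0; i < length; i++)
    {
        byte b = view.ReadByte(i);          // read straight from the mapped view
        // ... matching logic here; view.Write(i, someByte) patches the file in place
    }
}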
I have 3 *.dat files (346 KB, 725 KB, and 1762 KB) that are filled with a JSON string of "big" int arrays.
Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just where the data is currently saved. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.
You have access to functions to write the basic data structures (numbers, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it's a fixed size) is to prepend the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length anyway), and then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[])
This is how you can implement both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
writer.Write(intArr[i]);
// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
intArr[i] = reader.ReadInt32();
// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);
// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this to the test: with an array of 10,000 integers I ran the test 10,000 times.
Method 1 averaged 888,200 ns on my system (about 0.89 ms), while method 2 averaged only 568,600 ns (about 0.57 ms).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason why method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (512 MB and up). In that case you can always make a hybrid solution, for example writing away 128 MB at a time.
Note that method 1 also requires this extra space, but because it is split into one operation per item of the int[], the memory can be released a lot earlier.
Something like this, will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // 32M ints = 128 MB of bytes per chunk
int[] intArr = new int[140 * 1024 * 1024]; // 140M ints = 560 MB of data
for (int i = 0; i < intArr.Length; i++)
intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
while (dataDone < intArr.Length)
{
int dataToWrite = intArr.Length - dataDone;
if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int)); // BlockCopy offsets are in bytes
writer.Write(byteArr, 0, dataToWrite * sizeof(int)); // write only the bytes filled this pass
dataDone += dataToWrite;
}
}
Note that this is just for writing; reading works a little differently, as sketched below.
I hope this gives you some more insight into dealing with very large data files :).
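For completeness, a sketch of the matching chunked read, assuming the element count was written as a length prefix first (the write snippet above does not do that, so add writer.Write(intArr.Length) before its loop if you use both together):
using System;
using System.IO;

const int READCOUNT = 32 * 1024 * 1024;                   // ints per chunk, i.e. 128 MB of bytes
int[] intArr;
using (Stream fileStream = new FileStream("data.dat", FileMode.Open))
using (BinaryReader reader = new BinaryReader(fileStream))
{
    intArr = new int[reader.ReadInt32()];                 // length prefix
    int dataDone = 0;
    while (dataDone < intArr.Length)
    {
        int dataToRead = intArr.Length - dataDone;
        if (dataToRead > READCOUNT) dataToRead = READCOUNT;
        byte[] byteArr = reader.ReadBytes(dataToRead * sizeof(int));
        Buffer.BlockCopy(byteArr, 0, intArr, dataDone * sizeof(int), byteArr.Length);
        dataDone += dataToRead;
    }
}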
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
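A sketch of that "canonical source" idea, with made-up names and assuming the file is just raw ints, could be a lazily loaded static array that every new object copies:
using System;
using System.IO;

static class ArrayCache
{
    private static int[] canonical;                        // loaded from disk only once

    public static int[] GetCopy(string path)
    {
        if (canonical == null)
        {
            byte[] bytes = File.ReadAllBytes(path);
            canonical = new int[bytes.Length / sizeof(int)];
            Buffer.BlockCopy(bytes, 0, canonical, 0, canonical.Length * sizeof(int));
        }
        return (int[])canonical.Clone();                   // each object gets its own copy to mutate
    }
}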
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array in the first four bytes, you can use it again to deserialize (a matching write sketch follows below). Make sure everything fits into memory, but looking at your file sizes it should.
var buffer = File.ReadAllBytes(#"...");
int size = BitConverter.ToInt32(buffer,0);
var result = new int[size];
Buffer.BlockCopy(buffer, sizeof(int), result, 0, size * sizeof(int)); // skip the 4-byte length prefix
Binary is not human-readable, but it is definitely faster than JSON.
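For reference, the write side with the length prefix that the read code above assumes could look like this (a sketch reusing the array a from the first snippet):
byte[] result = new byte[sizeof(int) + a.Length * sizeof(int)];
Buffer.BlockCopy(BitConverter.GetBytes(a.Length), 0, result, 0, sizeof(int));   // 4-byte length prefix
Buffer.BlockCopy(a, 0, result, sizeof(int), a.Length * sizeof(int));            // then the data
File.WriteAllBytes(@"...", result);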
I tried using byte-by-byte comparison and also comparing calculated hashes of the files (code samples below). I have a file, I copy it, and comparing the two gives TRUE. But the problem starts when I open one of the files:
With MS Word files: after opening and closing one of the files, the result is still TRUE, but when I, for example, delete the last symbol in the file and then write it back again and compare again, the result is FALSE. The files are basically the same, but apparently byte-by-byte they are not the same anymore.
With Excel files: even opening a file causes the function to return FALSE. Should it really be like that? The only thing I can think of that has changed is the last access time. But is that taken into consideration when comparing byte-by-byte?
So I wanted to ask: should this comparison really work like this, and is there anything I could do to avoid it? In my program I will mostly compare .pdf files, where editing won't be much of an option, but I still wanted to know why it is acting like this.
Byte-by-Byte with buffer:
static bool FilesAreEqualFaster(string f1, string f2)
{
// get file length and make sure lengths are identical
long length = new FileInfo(f1).Length;
if (length != new FileInfo(f2).Length)
return false;
byte[] buf1 = new byte[4096];
byte[] buf2 = new byte[4096];
// open both for reading
using (FileStream stream1 = File.OpenRead(f1))
using (FileStream stream2 = File.OpenRead(f2))
{
// compare content for equality
int b1, b2;
while (length > 0)
{
// figure out how much to read
int toRead = buf1.Length;
if (toRead > length)
toRead = (int)length;
length -= toRead;
// read a chunk from each and compare
b1 = stream1.Read(buf1, 0, toRead);
b2 = stream2.Read(buf2, 0, toRead);
for (int i = 0; i < toRead; ++i)
if (buf1[i] != buf2[i])
return false;
}
}
return true;
}
Hash:
private static bool CompareFileHashes(string fileName1, string fileName2)
{
// Compare file sizes before continuing.
// If sizes are equal then compare bytes.
if (CompareFileSizes(fileName1, fileName2))
{
// Create an instance of System.Security.Cryptography.HashAlgorithm
HashAlgorithm hash = HashAlgorithm.Create();
// Declare byte arrays to store our file hashes
byte[] fileHash1;
byte[] fileHash2;
// Open a System.IO.FileStream for each file.
// Note: With the 'using' keyword the streams
// are closed automatically.
using (FileStream fileStream1 = new FileStream(fileName1, FileMode.Open),
fileStream2 = new FileStream(fileName2, FileMode.Open))
{
// Compute file hashes
fileHash1 = hash.ComputeHash(fileStream1);
fileHash2 = hash.ComputeHash(fileStream2);
}
return BitConverter.ToString(fileHash1) == BitConverter.ToString(fileHash2);
}
else
{
return false;
}
}
Aside from anything else, this code is wrong:
b1 = stream1.Read(buf1, 0, toRead);
b2 = stream2.Read(buf2, 0, toRead);
for (int i = 0; i < toRead; ++i)
if (buf1[i] != buf2[i])
return false;
You're ignoring the possibility of b1 and b2 being unequal to each other and to toRead. What if you only read 10 bytes from the first stream and 20 from the second, when you asked for 30? You may not have reached the end of the files, but it can still potentially return you less data than you ask for. Never ignore the return value of Stream.Read. (You're saving it in a variable but then ignoring the variable.)
Basically you'll need to have independent buffers, which are replenished when necessary - keep track of where you are within each buffer, and how much useful data is there. Read more data into each buffer when you need to.
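One way to handle that is a small helper that keeps calling Read until it has the requested number of bytes or hits the end of the stream (a sketch; the name ReadFully is made up):
using System.IO;

// Fills buffer[0..count) or returns the smaller number of bytes actually available.
static int ReadFully(Stream stream, byte[] buffer, int count)
{
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, total, count - total);
        if (read == 0)
            break;                                        // end of stream reached
        total += read;
    }
    return total;
}
In the comparison loop you would then call this for both streams, check that both return toRead, and only then compare the buffers.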
Then there's the other problem of files actually changing just by opening them, as Henk mentioned.