Need help manipulating WAV (RIFF) Files at a byte level

Need help manipulating WAV (RIFF) Files at a byte level - c#

I'm writing an an application in C# that will record audio files (*.wav) and automatically tag and name them. Wave files are RIFF files (like AVI) which can contain meta data chunks in addition to the waveform data chunks. So now I'm trying to figure out how to read and write the RIFF meta data to and from recorded wave files.
I'm using NAudio for recording the files, and asked on their forums as well on SO for way to read and write RIFF tags. While I received a number of good answers, none of the solutions allowed for reading and writing RIFF chunks as easily as I would like.
But more importantly I have very little experience dealing with files at a byte level, and think this could be a good opportunity to learn. So now I want to try writing my own class(es) that can read in a RIFF file and allow meta data to be read, and written from the file.
I've used streams in C#, but always with the entire stream at once. So now I'm little lost that I have to consider a file byte by byte. Specifically how would I go about removing or inserting bytes to and from the middle of a file? I've tried reading a file through a FileStream into a byte array (byte[]) as shown in the code below.
System.IO.FileStream waveFileStream = System.IO.File.OpenRead(#"C:\sound.wav");
byte[] waveBytes = new byte[waveFileStream.Length];
waveFileStream.Read(waveBytes, 0, waveBytes.Length);
And I could see through the Visual Studio debugger that the first four byte are the RIFF header of the file.
But arrays are a pain to deal with when performing actions that change their size like inserting or removing values. So I was thinking I could then to the byte[] into a List like this.
List<byte> list = waveBytes.ToList<byte>();
Which would make any manipulation of the file byte by byte a whole lot easier, but I'm worried I might be missing something like a class in the System.IO name-space that would make all this even easier. Am I on the right track, or is there a better way to do this? I should also mention that I'm not hugely concerned with performance, and would prefer not to deal with pointers or unsafe code blocks like this guy.
If it helps at all here is a good article on the RIFF/WAV file format.

I did not write in C#, but can point on some places which are bad from my point of view:
1) Do not read whole WAV files in memory unless the files are your own files and knowingly have small size.
2) There is no need to insert a data in memory. You can simply for example do about the following: Analyze source file, store offsets of chunks, and read metadata in memory; present the metadata for editing in a dialog; while saving write RIFF-WAV header, fmt chunk, transfer audio data from source file (by reading and writing blocks), add metadata; update RIFF-WAV header.
3) Try save metadata in the tail of file. This will results in alternating only tag will not require re-writing of whole file.
It seems some sources regarding working with RIFF files in C# are present here.

Related

How to convert stored MP3 file data (byte[]) into a float[] Unity can use as AudioSource-Data?

I did tons of research during the last days and nothing helped me out for my special problem. I am writing my own music engine for a Unity3d game and for this I created custom files containing the mp3 data and other information.
What I'm trying to do now is to take these split up mp3-byte-arrays (which can be played when I store them individually, I tested it - so the audio data seems to be fine) and convert them to Unity's AudioSource(s) somehow. I think converting the byte[] into a float[] containing the needed sample data of my audio would be enough, because audioClip.setData( ... ); should do the trick then (I hope).
But I continuously fail at decompressing and/or converting my raw mp3 buffer[] to anything like float[] - and even if I somehow succeed, the only thing I hear is nasty whitenoise-like nonsense.
Any ideas? I would love to hear from you and solve this problem!

Implementation of a decompression algorithm that will take a compressed MP3 stream and output it as uncompressed data is not a trivial task. You can either find a library that does this (I am not aware of any), or work around the issue in some way.
You can use untiy engine to decompress (aka play back) through an AudioSource, and sample it then, or execute some FFMPEG command line to get the wav. Most ways of doing it are cumbersome and ideally you should find a way of not doing it at all.
byte[] or float[] is not really a signifficant difference here - the difference is encoded vs raw

Find and replace data in file (without loading the entire thing)?

I want to replace some data in a file, however I do not know exactly where this 200MB file would contain it. Is it possible to find (and replace them with something else) these values without loading a 200mb+ file into the memory?

Searching the file is not a problem. What you need is to work with the FileStream which is available via File.Open method. You can read through the file up to the bytes you need to replace.
Problem arises when you need to insert something. The FileStream allows you to overwrite some or all of the file contents from a particular byte forth and to append new content to its end but it does not allow you to insert data in the middle of the file. In order to overcome this problem you are going to need a temporary file. If you agree to that you could do the following:
Open the FileStream on the original file.
Create a temporary file that will hold the draft version.
Search through the original file and copy all "good" data into temporary file up to the point where modifications are to be made.
Insert modified and new data into the temporary file.
Finish up the temporary file with the remaining "good" content from the original file.
Replace the original file with the temporary one.
Delete the temporary file.
You could use the Path.GetTempFileName method for convenient way of utilizing a temporary file.
P.S. If you modify an exe then you probably make replacements on text constants and you neither need to insert new bytes nor to remove any. In such a case you do not need to bother with the temporary file and the FileStream is all you need.
P.P.S. Working with the FileStream you decide on size of a buffer you read from file and write back. Keep in mind that this size is the tradeoff between memory consumption, I/O performance and complexity of your code. Choose wisely. I would make it per-byte for the first time and try to optimize increasing the buffer to say 64k when it works. You can count on the FileStream to buffer data; it is not performing disk I/O each time you request another byte from it. If you dive into buffering yourself then try not to fragment the Large Object Heap. The threshold for .NET 4.5 is 85000 bytes.

Just a thought, how about reading your file line by line or may be in chunk of bytes and see in each chunk if u have the data that needs to be replaced. Also while reading make sure get the file pointer till where you have read the file so that when u find the match then u can go back to that location and over write those exact bytes which u have targetted.

Is it possible decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?

I have a zip file that contains folder hierarchies and files.
\images\
\images\1.jpg
\images\2.jpg
\something\something\a.exe
\something\something\b.exe
1.txt
I need to decompress the contents of this zip file to a location. I also need to preserve the structure of the zip file.
I've read about .NET's GZipStream and DeflateStream but I am of the opinion that it is too "complicated" for my purpose.
I've also used DotNetZip and SharpZipLib in the past for personal projects but since this is work related and I'm working at a huge company, I would have a hard time convincing legal to use these libraries.
Question:
Is it possible decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?
PS: I've also read this but I think it's hacky because you'll need to produce another executable just to hide the progress dialog.
Thanks!

Check out if Ionic Zip helps?

DotNetZip would do what you want, but I understand your concerns about legal approval.
On a side note, It might be good for you to navigate the legal jungle associated with getting an open-source library approved for use in the company, just to understand what's involved. But I'll leave that up to you.
Getting back to rolling your own...
DotNetZip is pretty full featured, and it handles a number of scenarios you probably don't care about. Like Unicode filenames and comments, setting windows timestamps and permissions of extracted files, getting timestamps of zip files created on old unix systems, split archives, Encrypted archives, files over 2gb, or self-extracting archives, etc etc etc. Many zip files use none of those things.
Also DotNetZip does eventing and zip updates and zip creation - all the code associated with these things is probably not of interest to you, if you confine yourself just to the requirements you described in your question.
You could, though, grab the DotNetZip code and use it to help you roll your own solution. If you constrain yourself to JUST reading zip files and not dealing with all the possible special cases, the zip format is not difficult to parse.
here's how to do it:
open the zip file using new FileStream() or File.Open. You want a FileStream object.
Read 4 bytes. Verify that it is the zip-entry-header descriptor. (0x04034b50)
In the file, the order you will find these bytes is 50 4b 03 04.
if you find a match, you're in business.
at offset 14 is a 4-byte CRC. Get it. (Same byte ordering as above)
at offset 18 - the 4-byte length of the compressed blob. get it. (N)
at offset 22 - the 4-byte length of the UNcompressed blob. get it. (U)
at 26 - the 2-byte length of the filename. get it (L)
at 28 - the 2-byte length of the "extra field". get it (E)
Beyond the extra field, at offset 30, is the actual filename. read L bytes for the filename, and call System.Text.Encoding.ASCII.GetString(). The result will include a directory path, with the backslashes replaced with slashes (unix style). String.Replace() the slashes.
after the filename comes the extra field - seek E bytes to get beyond it. You can mostly ifgnore it. This is where the compressed data starts.
Open a System.IO.DeflateStream() on the zip FileStream, using CompressionMode.Decompress, and using the current offset of the FileStream as input. open a new FileStream, for output, with the file path you read in step 3. in a loop, call inflater.Read(). and output.Write(), to write the decompressed output of the DeflateStream to a filesystem file with the correct name. You will need to stop reading from the DeflateStream when you read exactly U (uncompressed) bytes.
Check the uncompressed size (U) against the data you actually wrote out from the DeflateStream (after compression). They should match.
If you are fancy, you can check the CRC of the output against what was in the header.
go to step 2, to look for the next entry in the file.
The most complicated part is step 3. Working code for that is easily found in this source module, look for the ReadHeader method.

Maybe the full features set of GZipStream it's a bit complicated, but note that the sample in the msdn page it's exactly what you need. I mean this msdn web (the 4.0 version) not the one you supply in the question.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx#Y2750

How to determine file type?

I need to know if my file is audio file: mp3, wav, etc...
How to do this?

Well, the most robust way would be to write a parser for the file types you want to detect and then just try – if there are no errors, it's obviously of the type you tried. This is an expensive approach, however, but it would ensure that you can successfully load the file as well since it will also check the rest of the file for semantic soundness.
A much less expensive variant would be to look for “magic” bytes – signatures at the start or known offsets of the file. For example, if a file starts with an ID3 tag you can be reasonably sure it's an MP3 file. If a file starts with RIFF¼↕☻ WAVEfmt, then it's a WAV file. However, such detection cannot guarantee you that the file is really of that type – it could just be the signature and following that garbage.

While you can use the extension to make a reasonable guess as to what the file is it's not guaranteed to work 100% of the time. If you are targeting Windows then it will work 99.9% of the time as that's how Windows keeps track of what file is what type.
If you are getting your files from non-Windows sources the only sure way is to open the file and look for a specific string or set of bytes which will unambiguously identify it. For example, you could look for the ID3 tags in an mp3 file:
The ID3v1 tag occupies 128 bytes, beginning with the string TAG.
or
ID3v2 tags are of variable size, and usually occur at the start of the file, to aid streaming media.
How far you go depends on how robust you want your solution to be, and does rely on there being a header or pattern that's always present.
Doing it this way can help guard against malicious content where someone posts a piece of malware as a mp3 file (say) and hopes that it will just be run by a program prone to some exploit (a buffer overrun perhaps).

You can use the file extension to figure it out:
using System.IO;
class Program
{
static void Main()
{
string filepath = #"C:\Users\Sam\Documents\Test.txt";
string extension = Path.GetExtension(filepath);
if (extension == ".mp3")
{
Console.WriteLine(extension);
}
}
}
The file extension is the first point of call for the OS to figure out what file type it's dealing with, if you really want to know the file type 100% the only way to do it is read into the file. But this comes with a catch, image files are easy as they include headers in a pretty easy to read format, however it can get a little more complex with a completely variable file type.
You could check out this post on an old post for a bit of help. Here is a post about finding just media file types.
Ultimately it depends on why your trying to do this.

Path.GetExtension(PathToFile)

See this post. You end up passing the first (up to) 256 bytes of data from the file to FindMimeFromData (part of the Urlmon.dll).

C#: Archiving a File into Parts of 100MB

In my application, the user selects a big file (>100 mb) on their drive. I wish for the program to then take the file that was selected and chop it up into archived parts that are 100 mb or less. How can this be done? What libraries and file format should I use? Could you give me some sample code? After the first 100mb archived part is created, I am going to upload it to a server, then I will upload the next 100mb part, and so on until the upload is finished. After that, from another computer, I will download all these archived parts, and then I wish to connect them into the original file. Is this possible with the 7zip libraries, for example? Thanks!
UPDATE: From the first answer, I think I'm going to use SevenZipSharp, and I believe I understand now how to split a file into 100mb archived parts, but I still have two questions:
Is it possible to create the first 100mb archived part and upload it before creating the next 100mb part?
How do you extract a file with SevenZipSharp from multiple splitted archives?
UPDATE #2: I was just playing around with the 7-zip GUI and creating multi-volume/split archives, and I found that selecting the first one and extracting from it will extract the whole file from all of the split archives. This leads me to believe that paths to the subsequent parts are included in the first one (or is it consecutive?). However, I'm not sure if this would work directly from the console, but I will try that now, and see if it solves question #2 from the first update.

Take a look at SevenZipSharp, you can use this to create your spit 7z files, do whatever you want to upload them, then extract them on the server side.
To split the archive look at the SevenZipCompressor.CustomParameters member, passing in "v100m". (you can find more parameters in the 7-zip.chm file from 7zip)

You can split the data into 100MB "packets" first, and then pass each packet into the compressor in turn, pretending that they are just separate files.
However, this sort of compression is usually stream-based. As long as the library you are using will do its I/O via a Stream-derived class, it would be pretty simple to implement your own Stream that "packetises" the data any way you like on the fly - as data is passed into your Write() method you write it to a file. When you exceed 100MB in that file, you simply close that file and open a new one, and continue writing.
Either of these approaches would allow you to easily upload one "packet" while continuing to compress the next.
edit
Just to be clear - Decompression is just the reverse sequence of the above, so once you've got the compression code working, decompression will be easy.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.