Wondering if there's someone here with some experience with the gzip format. I have a very large gzip file that I need to parse. However, I may only need a small portion of the decompressed text file. Is it possible to stream this gzip file without decompressing the entire file?
Does anyone have experience with gzip?
You do realise that you can stream using standard Java library classes, right? It's quite trivial, something like:
GZIPInputStream stream = new GZIPInputStream(new FileInputStream("some_file.gz"));
// BufferedReader wraps a Reader, so bridge the byte stream with an InputStreamReader.
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
// Now read line by line... till you hit the content you want.
The file is never decompressed to disk in full; only chunks are decompressed in memory as you read them. And you can optionally re-compress and write back out again using the corresponding output streams.
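As a rough illustration of the streaming approach, here is a minimal sketch that stops reading as soon as it finds the line it wants. The class name GzipLineSearch, the method firstMatch, and the UTF-8 encoding are assumptions made for the example, not part of the answer above. Note that gzip is a sequential format: everything before the wanted content still has to be decompressed, but nothing after it does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzipLineSearch {
    /**
     * Returns the first decompressed line containing needle, or null if none.
     * Reading stops at the first match, so the rest of the stream is never decompressed.
     * UTF-8 is an assumption; use the encoding your file actually has.
     */
    public static String firstMatch(InputStream compressed, String needle) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    return line;
                }
            }
            return null;
        }
    }

    // Usage: java GzipLineSearch some_file.gz text-to-find
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(firstMatch(in, args[1]));
        }
    }
}
```

Memory use stays bounded by the reader's buffers and the longest line, regardless of how large the compressed file is.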
Related
I have an application where I'm reading log lines line by line and compressing them together using gzip. I'd like to have the resulting archive be right around 5 KB, so I want to check its size after every Write() to the GZipStream. Is there any way to do this even though the stream doesn't have a Length property?
I am not new to C# but quite new to file handling. My current idea is to read files (of any kind, for example jpg, txt, pdf etc) to a buffer to be able to do something with it later, for example just write an exact copy to the same folder (for testing) or send it to another pc via network. I know that there is a specific method for sending files via network, but I'd like to be able to handle the file itself and understand how to open files the correct way and write them the correct way to have a working copy.
If I just open a file and use for example a StreamReader like this:
using (StreamReader sr = new StreamReader(sourcePath, GetEncoding(sourcePath)))
{
// Read the stream to a string, and write the string to the console.
String line = sr.ReadToEnd();
Console.WriteLine(line);
WriteFile(outputFile, GetEncoding(sourcePath), line);
}
it will create a bigger file (for example, of a jpg) which does not work in the end. I think it has something to do with the encoding, but since I have too little knowledge about files themselves, maybe someone can give me some helpful tips.
I'm trying to save a large amount of data to an XML file, and the file ends up very large. I've searched for compression, but all the examples I found first write the file and then read it back to compress it into another file, ending up with both the large and the compressed files. The closest I got to removing the intermediate step of writing then reading ended up with a zip containing an extension-less file (which I can open in Notepad as XML).
This is what I have now:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (FileStream outFile = File.Create(@"File.zip"))
{
using (GZipStream Compress = new GZipStream(outFile, CompressionMode.Compress))
{
using (XmlWriter writer = XmlWriter.Create(Compress, settings))
{
//write the XML
}
}
}
How do I make the file inside the zip have the XML extension?
I think this might be a little misunderstanding of fundamentals. From what I know, GZip is a compression system, but not an archiving system. When working with UNIX systems, they tend to be treated as two separate things (whereas ZIP or RAR compression does both). Archiving puts a number of files in one file, and compression makes that file smaller.
Have you ever seen Unix packages that are downloaded as "filename.tar.gz"? That's generally the naming format - they took an archive file (filename.tar) and applied GZip compression to it (filename.tar.gz)
Actually, you're technically causing a bit of confusion by naming your file ".zip" (which is a completely different, more commonly used format). If you want to follow along with UNIX traditions, just name your file "file.xml.gz". If you want to archive it, use a Tar archiving library. Other libraries such as 7-zip's may have simpler compression systems that will do both for you, for instance if you want this file to be a .zip, easily read by people on Windows computers.
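To make the archive-versus-compression distinction concrete, here is a minimal Java sketch (the question's code is C#, so treat this only as an analog; the class name XmlZipWriter and the method writeZip are made up for the example). A ZIP archive stores each file as a named entry, so the name of the inner file, extension included, is whatever you pass when creating the entry.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class XmlZipWriter {
    /**
     * Writes a ZIP archive with a single entry straight to out,
     * without creating an uncompressed intermediate file.
     */
    public static void writeZip(OutputStream out, String entryName, String xml) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // The entry name is what a user sees inside the archive.
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    // Usage: produces File.zip containing File.xml in the working directory.
    public static void main(String[] args) throws IOException {
        try (OutputStream out = Files.newOutputStream(Paths.get("File.zip"))) {
            writeZip(out, "File.xml", "<root/>");
        }
    }
}
```

For large documents you would write to the ZipOutputStream incrementally rather than building one big string; the string parameter only keeps the sketch short.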
I think you have to write to a temp file first. Take a look at DotNetPerls.
Does anyone know how to stream Ogg files over a Socket (in byte[] format) without fully downloading them first?
I am trying to create a music streaming application and I managed to do it with MP3s, but I understand there are licensing issues involved after a certain limit, which is why I want to use OGG (Vorbis). I managed to find this C# Vorbis Wrapper, but it has no documentation, and I cannot figure out how to get a byte[] stream to play.
I have tried the following
var rawData = File.ReadAllBytes(@"SoundFile.ogg");
var enc = new OggVorbisEncodedStream(rawData);
var sp = new SoundPlayer(enc);
sp.Play();
But an exception gets thrown showing that the Wav file header is incorrect. I understand SoundPlayer is used for only playing .wav files? Does anybody know how to stream an OGG file?
The wrapper is a very thin one, it directly calls the native ogg codec functions. Look for the docs of ov_read to see what you get back from OggVorbisEncodedStream.Read(). Raw PCM would be my guess. The wrapper doesn't attempt any kind of format conversion.
Yes, SoundPlayer won't work here; it requires wav and can't stream. You'll need a player that can take chunks of PCM as input. I'm not sure what does that, but the NAudio project is usually good for stuff like this.
string filePath = "License.lic";
using (var stream = new StreamReader(filePath))
I'm writing an application in C# that will record audio files (*.wav) and automatically tag and name them. Wave files are RIFF files (like AVI) which can contain metadata chunks in addition to the waveform data chunks. So now I'm trying to figure out how to read and write the RIFF metadata to and from recorded wave files.
I'm using NAudio for recording the files, and asked on their forums as well as on SO for a way to read and write RIFF tags. While I received a number of good answers, none of the solutions allowed for reading and writing RIFF chunks as easily as I would like.
But more importantly, I have very little experience dealing with files at a byte level, and think this could be a good opportunity to learn. So now I want to try writing my own class(es) that can read in a RIFF file and allow metadata to be read from and written to the file.
I've used streams in C#, but always with the entire stream at once. So now I'm a little lost that I have to consider a file byte by byte. Specifically, how would I go about removing bytes from, or inserting bytes into, the middle of a file? I've tried reading a file through a FileStream into a byte array (byte[]) as shown in the code below.
System.IO.FileStream waveFileStream = System.IO.File.OpenRead(@"C:\sound.wav");
byte[] waveBytes = new byte[waveFileStream.Length];
waveFileStream.Read(waveBytes, 0, waveBytes.Length);
And I could see through the Visual Studio debugger that the first four bytes are the RIFF header of the file.
But arrays are a pain to deal with when performing actions that change their size, like inserting or removing values. So I was thinking I could then convert the byte[] into a List like this.
List<byte> list = waveBytes.ToList<byte>();
Which would make any manipulation of the file byte by byte a whole lot easier, but I'm worried I might be missing something, like a class in the System.IO namespace, that would make all this even easier. Am I on the right track, or is there a better way to do this? I should also mention that I'm not hugely concerned with performance, and would prefer not to deal with pointers or unsafe code blocks like this guy.
If it helps at all here is a good article on the RIFF/WAV file format.
I have not written C#, but I can point out some things that look bad from my point of view:
1) Do not read whole WAV files into memory unless the files are your own and are known to be small.
2) There is no need to insert data in memory. You can, for example, do roughly the following: analyze the source file, store the offsets of its chunks, and read the metadata into memory; present the metadata for editing in a dialog; when saving, write the RIFF-WAV header and the fmt chunk, transfer the audio data from the source file (by reading and writing blocks), and add the metadata; then update the RIFF-WAV header.
3) Try to save the metadata at the tail of the file. That way, altering only a tag will not require rewriting the whole file.
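The "analyze source file, store offsets of chunks" step from point 2 can be sketched as follows. This is a Java sketch rather than C#, and the names RiffChunkScanner, Chunk, and scan are invented for the example. It relies on the standard RIFF layout: a 12-byte header ("RIFF", little-endian size, form type), then chunks that each consist of a 4-byte ID, a 4-byte little-endian size, and a payload padded to an even length. Payloads are skipped, not loaded, so audio data never sits in memory.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RiffChunkScanner {
    /** One top-level chunk: its ID, the file offset of its payload, and the payload size. */
    public static final class Chunk {
        public final String id;
        public final long offset;
        public final long size;
        Chunk(String id, long offset, long size) { this.id = id; this.offset = offset; this.size = size; }
        @Override public String toString() { return id + "@" + offset + "+" + size; }
    }

    /** Lists the top-level chunks of a RIFF/WAVE stream without reading their payloads. */
    public static List<Chunk> scan(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        String riff = readId(data);
        long riffSize = readLe32(data);
        String form = readId(data);
        if (!riff.equals("RIFF") || !form.equals("WAVE")) {
            throw new IOException("not a RIFF/WAVE stream");
        }
        List<Chunk> chunks = new ArrayList<>();
        long pos = 12;            // the first chunk header follows the 12-byte RIFF header
        long end = 8 + riffSize;  // riffSize counts everything after the first 8 bytes
        while (pos + 8 <= end) {
            String id = readId(data);
            long size = readLe32(data);
            chunks.add(new Chunk(id, pos + 8, size));
            long padded = Math.min(size + (size & 1), end - pos - 8); // payloads are word-aligned
            skipFully(data, padded);
            pos += 8 + padded;
        }
        return chunks;
    }

    private static String readId(DataInputStream data) throws IOException {
        byte[] id = new byte[4];
        data.readFully(id);
        return new String(id, StandardCharsets.US_ASCII);
    }

    private static long readLe32(DataInputStream data) throws IOException {
        // DataInputStream is big-endian; RIFF sizes are little-endian and unsigned.
        return Integer.reverseBytes(data.readInt()) & 0xFFFFFFFFL;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) throw new EOFException("truncated chunk");
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

With the offsets in hand, saving becomes a matter of copying the untouched chunks block by block and writing the edited metadata chunk, then patching the size field at offset 4.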
It seems some sources regarding working with RIFF files in C# are present here.