At one step of my algorithm I have a big array of binary data that I want to compress.
Which algorithm (or perhaps standard class) would you advise using to compress the data as efficiently as possible?
EDIT:
The data is first represented as a byte[n] of 0s and 1s. Then I join every 8 bytes into one and get a byte[n/8] array.
GZipStream and DeflateStream are the standard classes to use in such situations.
Obviously, depending on the binary data you are trying to compress, you will get a better or worse compression ratio. For example, if you try to compress a JPEG image with those algorithms
you cannot expect a very good compression ratio. If, on the other hand, the binary data represents text, it will compress nicely.
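As a minimal sketch of the round trip with GZipStream (the GzipHelper class name is illustrative, not part of the framework):

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipHelper
{
    // Compresses a byte array with GZip into a new byte array.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(data, 0, data.Length);
            // Disposing the GZipStream flushes the final compressed block.
            return output.ToArray();
        }
    }

    // Reverses Compress: inflates a GZip buffer back to the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

For highly repetitive data like packed 0/1 flags, the compressed output should come out much smaller than the input.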
I'll add DotNetZip and SharpZipLib. The .NET "base" classes (GZipStream/DeflateStream) are stream-based, so they compress a single stream of data. (A stream is not a file, but the content of a file can be read as a stream.) DotNetZip is more similar to the classic PkZip/WinZip/WinRar tools, which work with archives of multiple files.
Related
So I've delved into serializing data using BinaryFormatter, which I'm impressed with. But the problem is compatibility: I want my serialized data to be portable, and therefore readable from different platforms. XML serialization may seem like the answer, but the files produced are too large and there is no need for human readability.
So I thought about creating my own encoding/serialization system so that I can write a long[] array and a string[]/List<string> containing hexadecimal values to a file.
I thought about converting all of the arrays into byte[], but I'm not sure whether I should be concerned about character text encoding. I only intend to serialize/encode arrays containing hexadecimal and long values.
byte[] Bytes = HexArray.Select(s => Convert.ToByte(s, 16)).ToArray();
After converting all of the arrays to byte[], I could write them to a file, while noting the byte offsets of the individual arrays so that they could be recovered.
Any ideas on a better way to do this? I really don't want to resort to XML, and I wish BinaryFormatter were portable. This has to be cross-platform, so it can't be affected by endianness.
You might want to take a look at Protocol Buffers (protobuf):
a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
A couple of popular C# libraries are:
protobuf (Google) and
protobuf-net
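As a sketch of what this looks like with protobuf-net (this assumes the protobuf-net NuGet package is referenced; the Payload and PayloadCodec names and field numbers are illustrative):

```csharp
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Payload
{
    [ProtoMember(1)] public long[] Values { get; set; }
    [ProtoMember(2)] public string[] HexStrings { get; set; }
}

public static class PayloadCodec
{
    // Protobuf's wire format is byte-oriented (varints, length-prefixed
    // fields), so the output does not depend on the machine's endianness.
    public static byte[] ToBytes(Payload p)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, p);
            return ms.ToArray();
        }
    }

    public static Payload FromBytes(byte[] data)
    {
        using (var ms = new MemoryStream(data))
            return Serializer.Deserialize<Payload>(ms);
    }
}
```

Because the format is language-neutral, the same bytes can be read back by the Java, Python, or C++ protobuf runtimes given an equivalent message definition.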
I have a scenario where I need to serialize a struct to a byte array or string, encrypt it and then store it in a SQL table.
The issue is that it's possible that this table will run into millions of rows and so size is all important.
I've been using the BinaryFormatter to serialize to a byte[], converting it to ASCII, and then encrypting it with the RijndaelManaged provider.
Prior to encryption, the string is about 230 bytes. Post encryption, it is over 600 bytes.
Does anyone know if any of the other cryptographic providers would do a better job of this? Or, if indeed there's a better way that I should be going about this?
Thanks very much in advance,
Z
In general, encryption algorithms only add a fixed overhead (for IV and, in the case of asymmetric encryption algorithms, the encrypted symmetric key). If you're seeing a 2x increase in size, that indicates that you are doing something before or after encrypting that increases the size of the data.
I would remove the ASCII conversion (encryption algorithms work just fine on raw binary data) and make sure you are storing into the database as a raw binary blob, rather than Base64-encoding or similar. If you still have trouble, please post the code you're using.
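To illustrate the fixed-overhead point: encrypting raw bytes directly with AES (the modern replacement for RijndaelManaged) only rounds the size up to the next 16-byte block with PKCS7 padding, plus wherever you store the IV. The BinaryCrypto class name here is illustrative:

```csharp
using System.Security.Cryptography;

public static class BinaryCrypto
{
    // Encrypts raw bytes with AES-CBC. The output stays binary
    // (no ASCII/Base64 step), so overhead is only the block padding.
    public static byte[] Encrypt(byte[] plaintext, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())            // CBC + PKCS7 by default
        using (var enc = aes.CreateEncryptor(key, iv))
            return enc.TransformFinalBlock(plaintext, 0, plaintext.Length);
    }

    public static byte[] Decrypt(byte[] ciphertext, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var dec = aes.CreateDecryptor(key, iv))
            return dec.TransformFinalBlock(ciphertext, 0, ciphertext.Length);
    }
}
```

A 230-byte payload encrypts to 240 bytes (the next multiple of the 16-byte AES block), nowhere near the 600 bytes reported, which points at the ASCII/storage conversion as the culprit.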
I've found the issue.
I've done a few things
1/ Stopped using .NET serialization. I noticed that serialized content includes assembly information (e.g. culture, version, etc.) with every object. It even does this when you implement ISerializable and 'roll your own'. I can see why it needs that to deserialize... but I don't want to store it in every row. Instead I've just built a delimited string and converted that to a byte array; I 're-serialize' myself.
2/ I stopped trying to convert the byte[] to a string (with some sort of encoding) prior to encryption. I now just pass the byte[] into Rijndael.
3/ I've started using varbinary in the table and am writing the encrypted result as raw bytes. That seems a lot more efficient than trying to use some sort of encoding.
So - with all that, I've got the final, encrypted product down to about 150 bytes. Big improvement! ... also, I haven't tried compressing it yet either.
Thanks for everyone's help :-)
Z
I'm trying to write a simple reader for AutoCAD's DWG files in .NET. I don't actually need to access all data in the file so the complexity that would otherwise be involved in writing a reader/writer for the whole file format is not an issue.
I've managed to read in the basics, such as the version, all the header data, the section locator records, but am having problems with reading the actual sections.
The problem seems to stem from the fact that the format uses a custom method of storing some data types. I'm going by the specs here:
http://www.opendesign.com/files/guestdownloads/OpenDesign_Specification_for_.dwg_files.pdf
Specifically, the types that depend on reading individual bits are the ones I'm struggling with. A large part of the problem seems to be that C#'s BinaryReader only lets you read whole bytes at a time, when in fact I believe I need the ability to read individual bits, not simply 8 bits or a multiple thereof at a time.
It could be that I'm misunderstanding the spec and how to interpret it, but if anyone could clarify how I might go about reading individual bits from a stream, or even how to read some of the variable types in the above spec that require more complex bit manipulation than simply reading full bytes, that'd be excellent.
I do realise there are commercial libraries out there for this, but the price is simply too high on all of them to be justifiable for the task at hand.
Any help much appreciated.
You can always use the BitArray class to do bitwise manipulation: read bytes from the file, load them into a BitArray, and then access individual bits.
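A minimal sketch of that idea (the BitReader class name is illustrative). One caveat worth knowing: BitArray exposes each byte's bits least-significant-bit first, while a bit-packed format like the DWG spec is typically read most-significant-bit first, so the index math below flips the bit order within each byte:

```csharp
using System.Collections;

public class BitReader
{
    private readonly BitArray _bits;
    private int _pos; // absolute bit position, MSB-first

    public BitReader(byte[] data) { _bits = new BitArray(data); }

    // Reads one bit, most-significant-bit of each byte first.
    public bool ReadBit()
    {
        int byteIndex = _pos / 8, bitInByte = _pos % 8;
        _pos++;
        // BitArray index 0 is the LSB of byte 0, so mirror within the byte.
        return _bits[byteIndex * 8 + (7 - bitInByte)];
    }

    // Reads `count` bits into an int, first bit ending up most significant.
    public int ReadBits(int count)
    {
        int value = 0;
        for (int i = 0; i < count; i++)
            value = (value << 1) | (ReadBit() ? 1 : 0);
        return value;
    }
}
```

With this in place, the spec's bit-coded types (two-bit codes, variable-width integers, and so on) reduce to calls like ReadBits(2) at the right offsets.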
For the price of any of those libraries you definitely cannot develop something stable yourself. How much time did you spend so far?
I'm building a network application that needs to be able to switch from normal network traffic to a zlib-compressed stream, mid-stream. My thoughts on the matter involve a boolean switch that, when on, will cause the network code to pass all the data through a class that I can feed IEnumerable<byte> into, and then pull out the decompressed stream, passing that on to the already existing protocol parsing code.
Things I've looked at:
ZLib.NET - It seems a little... eclectic, and not quite what I want. Would still make a decent start to build off though. (Jon Skeet's comments here hardly inspire me either.)
SharpZipLib - This doesn't seem to support zlib at all? Can anyone confirm or deny this?
I would very much prefer an all-managed solution, but let's have at it... are there any other implementations of this library out there in .NET that might be better suited to what I want to do, or should I take ZLib.NET and build off that as a start?
PS:
Jon's asked for more detail, so here it is.
I'm trying to implement MCCP 2. This involves a signal being sent in the network stream, and everything after this signal is a zlib-compressed data stream. There are links to exactly what they mean by that in the above link. Anyway, to be clear, I'm on the receiving end of this (client, not server), and I have a bunch of data read out of the network stream already, and the toggle will be in the middle of this (in all likelihood at least), so any solution needs to be able to have some extra data fed into it before it takes over the NetworkStream (or I manually feed in the rest of the data).
SharpZipLib does support ZLib. Look in the FAQ.
Additionally, have you checked whether the System.IO.Compression namespace supports what you need?
I wouldn't use an IEnumerable<byte> though - streams are designed to be chained together.
EDIT: Okay... it sounds like you need a stream which supports buffering, but with more control than BufferedStream provides. You'd need to "rewind" the stream if you saw the decompression toggle, and then create a GZipStream on top of it. Your buffer would need to be at least as big as your biggest call to Read() so that you could always have enough buffer to rewind.
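One way to sketch the "already-read bytes plus the rest of the NetworkStream" problem is a forward-only wrapper stream that drains a leftover buffer first and then falls through to the underlying stream; the decompression stream is then layered on top of that. The PrefixedStream name and design here are illustrative, not from any library:

```csharp
using System;
using System.IO;

// Serves bytes from a leftover buffer first, then from the inner stream.
public class PrefixedStream : Stream
{
    private readonly MemoryStream _prefix;
    private readonly Stream _inner;

    public PrefixedStream(byte[] leftover, Stream inner)
    {
        _prefix = new MemoryStream(leftover);
        _inner = inner;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // A short read is legal for Stream.Read, so draining the prefix
        // before touching the inner stream is fine.
        int n = _prefix.Read(buffer, offset, count);
        return n > 0 ? n : _inner.Read(buffer, offset, count);
    }

    // Forward-only decompression needs nothing below this line.
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
    public override void SetLength(long v) => throw new NotSupportedException();
    public override void Write(byte[] b, int o, int c) => throw new NotSupportedException();
}
```

When the compression toggle arrives, any bytes already buffered past the signal become the leftover, and a decompression stream wrapped around the PrefixedStream sees a seamless compressed stream.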
Included in DotNetZip there is a ZlibStream, for compressing or decompressing zlib streams of data. You didn't ask, but there is also a GZipStream and a DeflateStream, as well as a ZlibCodec class, if that is your thing (it just inflates or deflates buffers, as opposed to streams).
DotNetZip is a fully-managed library with a liberal license. You don't need to use any of the .zip capability to get at the Zlib stuff. And the zlib stuff is packaged as a separate (smaller) DLL just for this purpose.
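A minimal sketch, assuming the DotNetZip (Ionic.Zlib) package is referenced; CompressBuffer and UncompressBuffer are its one-shot buffer helpers, and the ZlibExample wrapper name is illustrative:

```csharp
using Ionic.Zlib;

public static class ZlibExample
{
    // One-shot helpers: a zlib stream is RFC 1950, i.e. a 2-byte header,
    // deflate data, and an Adler-32 trailer.
    public static byte[] Pack(byte[] raw) => ZlibStream.CompressBuffer(raw);
    public static byte[] Unpack(byte[] z) => ZlibStream.UncompressBuffer(z);
}
```

For the mid-stream MCCP case, the stream-based Ionic.Zlib.ZlibStream class can instead be wrapped around whatever stream carries the compressed portion of the connection.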
I can recommend you Gerry Shaw's zlib wrapper for .NET:
http://www.organicbit.com/zip/
As far as I know, the zlib (gzip) format doesn't support listing files in a header. I'm assuming that matters to you; it seems a big shortcoming if so. This was from when I used SharpZipLib a while ago, so I'm willing to delete this :)
Old question, but System.IO.Compression.DeflateStream is actually the right answer if you need proper zlib support:
Starting with the .NET Framework 4.5, the DeflateStream class uses the zlib library. As a result, it provides a better compression algorithm and, in most cases, a smaller compressed file than it provides in earlier versions of the .NET Framework.
Doesn't get better than that.
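One caveat: "uses the zlib library" refers to the implementation underneath; DeflateStream itself still reads and writes raw DEFLATE (RFC 1951), while a zlib stream (RFC 1950, what MCCP sends) adds a 2-byte header and an Adler-32 trailer. On .NET 6+ there is a System.IO.Compression.ZLibStream that handles the wrapper natively; on older runtimes you can skip the header yourself, as in this sketch (ZlibCompat is an illustrative name, and note it never validates the trailer):

```csharp
using System.IO;
using System.IO.Compression;

public static class ZlibCompat
{
    // Inflates an RFC 1950 zlib buffer by skipping its 2-byte header and
    // letting DeflateStream consume the raw DEFLATE payload; the Adler-32
    // trailer is simply left unread.
    public static byte[] InflateZlib(byte[] zlibData)
    {
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```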
I want to compress some files (into the ZIP format) and encrypt them if possible using C#. Is there some way to do this?
Can encryption be done as a part of the compression itself?
For compression, look at the System.IO.Compression namespace and for encryption look at System.Security.Cryptography.
For Zip Compression, have you seen http://www.icsharpcode.net/OpenSource/SharpZipLib/
I know the question is already old, but I must add my two cents.
First, some definitions:
Zip: Archive format for regrouping files and folders into a single file, and optionally encrypting data.
Deflate: One of the compression algorithms used within a Zip file to compress the data. The most popular one.
GZip: A single file compressed with deflate, with a small header and footer.
Now, System.IO.Compression does not do Zip archiving. It does deflate and gzip compression, thus will compress a single blob of data into another single blob of data.
So, if you're looking for an archive format that can group many files and folders, you need Zip libraries like:
Xceed Zip (it does support strong encryption)
SharpZipLib
If you only need to compress and encrypt a single blob of data, then look under System.IO.Compression and System.Security.Cryptography.
The GZipStream class is a native way to handle compression.
As for encryption, there are many ways to do it, most of them in the System.Security namespace. They can be chained (encrypt a compressed stream, or compress an encrypted stream).
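A minimal sketch of that chaining, compressing first and then encrypting (the SecureBlob name is illustrative; key and IV handling is out of scope here):

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class SecureBlob
{
    // GZip-compresses `data` and pipes the result straight into an
    // AES-CBC CryptoStream; compression happens before encryption.
    public static byte[] CompressThenEncrypt(byte[] data, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var output = new MemoryStream())
        {
            using (var crypto = new CryptoStream(output, aes.CreateEncryptor(key, iv), CryptoStreamMode.Write))
            using (var gzip = new GZipStream(crypto, CompressionMode.Compress))
                gzip.Write(data, 0, data.Length);
            // MemoryStream.ToArray is documented to work after disposal.
            return output.ToArray();
        }
    }

    public static byte[] DecryptThenDecompress(byte[] blob, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var input = new MemoryStream(blob))
        using (var crypto = new CryptoStream(input, aes.CreateDecryptor(key, iv), CryptoStreamMode.Read))
        using (var gzip = new GZipStream(crypto, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The order matters: ciphertext looks like random noise, so compressing after encrypting gains essentially nothing.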
Chilkat provides .NET libraries for compression and encryption.
I'm not sure if the steps can be combined, but .NET has good support for basic crypto. Here's an article on it.
If they cannot be combined, do compression first and then encryption. Compressing an already-encrypted file leads to poor compression ratios, because encryption removes the redundancy that compression relies on.
Here is a useful topic:
Help in creating Zip files from .Net and reading them from Java
The System.IO.Packaging namespace gives you useful classes to compress data in ZIP format, and it supports rights management.
There isn't anything you can use directly in C#; however, you can use some libraries from J# to do it for you:
http://msdn.microsoft.com/en-us/magazine/cc164129.aspx
It should do just what you want.
With regards to the encryption, have a look at these links:
http://www.codeproject.com/KB/security/fileencryptdecrypt.aspx
http://www.obviex.com/samples/EncryptionWithSalt.aspx