Redis compressing string values in .net MVC - c#

In my app I want to compress the data that gets stored in Redis string keys.
I don't want to compress all of it, though, because small values don't compress well and I want to avoid the CPU overhead for them.
My question is: how do I detect that a value is compressed when I read the string key, so that I know whether to decompress it?
I tried appending a custom header to the zip stream, but I didn't have any luck.

A common pattern is to use a payload prefix combined with a delimiter.
For example, you could use a format like this:
[key];[encoding];[metatype];[version]\t[payload]
I use the delimiters ; and \t here. Choose other delimiters if you like them better. Of course, you must prevent these delimiters from occurring in the prefix tags themselves. [payload] can contain binary data, string data, or anything else. [encoding] can be, for example, zip, msgpack, utf8, base64, or json (just some ideas).
The benefit of a payload prefix is that you don't have to deserialize or decompress the payload itself to work with it as an entity. In Redis-Lua, for example, you can't unzip, but you can do a simple read of the payload prefix and respond to client requests. Even when you can deserialize inside Redis-Lua, as with JSON or MsgPack, you might not want to for performance reasons.
There are other options, of course. If you don't like prefixes with delimiters, you could put the payload and encoding tag in an array and serialize that as MsgPack. Or use JSON for the prefix, then a null character, then the payload. Or even (a bit more memory efficient): use 4 or 8 bytes for the prefix size, MsgPack for the prefix, and use the prefix size to determine where the payload starts (the payload might be MsgPack as well).
Final word of advice: don't mess with the payload itself (like altering the zip header), that's bound to get you in a whole lot of unnecessary trouble.
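To make the prefix idea concrete, here is a minimal C# sketch that compresses only values above a size threshold and records the choice in a short tag in front of the payload. The class name RedisValueCodec, the tags "gzip" and "utf8", and the 256-byte threshold are assumptions made for illustration; the format is a reduced version of the one above (encoding tag, tab, payload), not a standard.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class RedisValueCodec
{
    // Hypothetical tags and threshold, chosen for illustration only.
    private const string GzipTag = "gzip";
    private const string PlainTag = "utf8";
    private const byte Delimiter = (byte)'\t';
    private const int CompressionThreshold = 256;

    // Produces [tag]\t[payload]; the payload is gzip-compressed only when large enough.
    public static byte[] Encode(string value)
    {
        byte[] raw = Encoding.UTF8.GetBytes(value);
        bool compress = raw.Length >= CompressionThreshold;
        byte[] payload = compress ? Gzip(raw) : raw;
        byte[] tag = Encoding.ASCII.GetBytes(compress ? GzipTag : PlainTag);

        byte[] result = new byte[tag.Length + 1 + payload.Length];
        Buffer.BlockCopy(tag, 0, result, 0, tag.Length);
        result[tag.Length] = Delimiter;
        Buffer.BlockCopy(payload, 0, result, tag.Length + 1, payload.Length);
        return result;
    }

    // Reads the tag up to the first delimiter and decodes the rest accordingly.
    public static string Decode(byte[] stored)
    {
        int split = Array.IndexOf(stored, Delimiter);
        if (split < 0) throw new FormatException("Missing prefix delimiter.");
        string tag = Encoding.ASCII.GetString(stored, 0, split);
        int offset = split + 1;
        int count = stored.Length - offset;
        switch (tag)
        {
            case PlainTag: return Encoding.UTF8.GetString(stored, offset, count);
            case GzipTag: return Encoding.UTF8.GetString(Gunzip(stored, offset, count));
            default: throw new FormatException("Unknown encoding tag: " + tag);
        }
    }

    private static byte[] Gzip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    private static byte[] Gunzip(byte[] data, int offset, int count)
    {
        using (var input = new MemoryStream(data, offset, count))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The byte array returned by Encode can be stored as the string value with whichever Redis client you use, and Decode is applied to whatever comes back. The compressed payload may contain tab bytes, which is fine because only the first tab is treated as the delimiter.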
Hope this helps, TW

C# Parse HTML Post Data

I have MemoryStream data (HTML POST data) which I need to parse.
Converting it to a string gives a result like this:
key1=value+1&key2=val++2
The problem is that every + was a space in the HTML form. I'm not sure why the spaces are being converted to +.
This is how I am converting the MemoryStream to a string:
Encoding.UTF8.GetString(request.PostData.ToArray())
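If the request uses a Content-Type of application/x-www-form-urlencoded, the body is URL-encoded, and in that encoding a space is sent as +.
Use System.Web.HttpUtility.UrlDecode() to get the original text back:
using System.Web;
var data = HttpUtility.UrlDecode(Encoding.UTF8.GetString(request.PostData.ToArray()));
If a value can itself contain an encoded & or =, split the body on & and = first and decode each part separately.
See more in MSDN.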
You can also use JSON format for POST.
I suppose that the data you are retrieving is encoded according to the URL encoding rules.
You can find out why data is encoded in this format by reading this simple article from W3Schools.
To encode or decode your POST string you can use this pair of methods:
System.Web.HttpUtility.UrlEncode(yourString); // Encode
System.Web.HttpUtility.UrlDecode(yourString); // Decode
You can find more information about URL manipulation functions here.
Note: If you need to encode or decode an array of strings, enumerate the collection with a for or foreach statement. Remember that in a foreach loop you cannot assign to the loop variable during enumeration, so you will probably need a temporary variable or a separate result collection.
Finally, to parse strings efficiently, I suggest using the System.Text.RegularExpressions.Regex class and learning the regex "language".
You can find some examples of how to use Regex here; the Regex101 site also has a C# code generator that shows you how to translate your regex into code.
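As a sketch of the decoding step for the form body in the question, HttpUtility.ParseQueryString splits the pairs and URL-decodes each key and value in one call, including turning + back into a space. The wrapper class FormBodyParser is a hypothetical name; it assumes the body is UTF-8 and that System.Web (or the System.Web.HttpUtility assembly on newer .NET) is referenced.

```csharp
using System;
using System.Collections.Specialized;
using System.Text;
using System.Web;

public static class FormBodyParser
{
    // Turns a raw application/x-www-form-urlencoded body into decoded key/value pairs.
    public static NameValueCollection Parse(byte[] postData)
    {
        string body = Encoding.UTF8.GetString(postData);
        // Splits on '&' and '=', then URL-decodes each key and value.
        return HttpUtility.ParseQueryString(body);
    }
}
```

With the body from the question, FormBodyParser.Parse(request.PostData.ToArray())["key1"] gives "value 1", with the + decoded to a space.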

case sensitive to URL case insensitive

I'm generating an encoded value to pass in my URL. The problem is that our SEO manager has configured the application to lowercase all URLs, and he says he won't change that configuration. So I need to encode my value in a way that survives lowercasing, for example by encoding the uppercase characters (or the whole string) by character code, so that I can pass it without ruining the original value.
For example, my resulting Base64 string is as follows:
aHR0cDovL2xvY2FsaG9zdDoxMzUwL2hvdGVscy9nMy8xMzk1LTA1LTEwLzEvOTI3MjIyZmY
but it turns into this when it is passed to the controller:
ahr0cdovl2xvy2fsag9zddoxmzuwl2hvdgvscy9nmy8xmzk1lta1ltewlzevoti3mjiyzmy
which can't be read: the changed case breaks decoding.
You cannot use Base64 if the value will be lowercased outside your control, because Base64 relies on the distinction between uppercase and lowercase characters.
If the configuration your manager insists on lowercases incoming or outgoing query string parameters, you should tell him that this violates the URI specification, specifically the query string section. Of course it is ultimately your company's choice whether to use only lowercase in its internal URIs, but you should not assume that other applications handling URIs will behave this way.
As @sachin stated above, if you can make this a POST request (instead of the GET I assume it is now), and provided that your manager is not lowercasing the POST data as well, you can send this data via POST.
Alternatively, you could use Base32 to get around this. It relies on uppercase characters only, so you can simply convert the received value to uppercase before decoding the (now Base32) string. This is a pretty ridiculous solution though...
Just to be clear: "lol" would encode in Base32 to "NRXWY===" which would then be lower cased to "nrxwy===" which you could then uppercase back to "NRXWY===" prior to decoding.
These are two NuGet packages that do Base32 encoding:
Base32 as per RFC4648 here and the author claims it's tested and working correctly.
Another package, which looks appealing because it supports zBase32, here. The advantage of zBase32 is that it already uses lowercase characters only, so you won't have to worry about changing the case. The porter/author has included instructions on how to get zBase32 encoding.
Both of these (Base32 and zBase32) use a subset of the Base64 characters, so they'll both work fine with URIs; all of the characters used are valid in URIs. (The UTF-8 content is irrelevant since you're just encoding bytes, so you'll get the same bytes back when you decode from Base32.)
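If you would rather not take a dependency, the RFC 4648 Base32 alphabet is small enough to sketch by hand. The class below (SimpleBase32 is a hypothetical name, not one of the packages mentioned above) encodes bytes to uppercase Base32 and decodes case-insensitively, which is the property that makes it survive a lowercased URL. Treat it as an illustration of the idea, not a hardened implementation.

```csharp
using System;
using System.Text;

public static class SimpleBase32
{
    // RFC 4648 Base32 alphabet: 32 symbols, 5 bits each.
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder((data.Length + 4) / 5 * 8);
        int buffer = 0, bitsLeft = 0;
        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bitsLeft += 8;
            while (bitsLeft >= 5)
            {
                sb.Append(Alphabet[(buffer >> (bitsLeft - 5)) & 31]);
                bitsLeft -= 5;
                buffer &= (1 << bitsLeft) - 1; // keep only the bits not yet emitted
            }
        }
        if (bitsLeft > 0)
            sb.Append(Alphabet[(buffer << (5 - bitsLeft)) & 31]); // zero-pad the last group
        while (sb.Length % 8 != 0) sb.Append('=');
        return sb.ToString();
    }

    // Accepts upper- or lowercase input, so a lowercased URL value still decodes.
    public static byte[] Decode(string text)
    {
        string s = text.TrimEnd('=').ToUpperInvariant();
        var output = new byte[s.Length * 5 / 8];
        int buffer = 0, bitsLeft = 0, index = 0;
        foreach (char c in s)
        {
            int value = Alphabet.IndexOf(c);
            if (value < 0) throw new FormatException("Invalid Base32 character: " + c);
            buffer = (buffer << 5) | value;
            bitsLeft += 5;
            if (bitsLeft >= 8)
            {
                output[index++] = (byte)((buffer >> (bitsLeft - 8)) & 0xFF);
                bitsLeft -= 8;
                buffer &= (1 << bitsLeft) - 1;
            }
        }
        return output;
    }
}
```

SimpleBase32.Encode(Encoding.UTF8.GetBytes("lol")) returns "NRXWY===", and SimpleBase32.Decode("nrxwy===") returns the original three bytes, matching the example above.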

Is there cross-platform method to encode a string into another string without any whitespaces and then decode it back?

I am trying to pass a block of text to a system I do not own, which will pass the data to a system I do own.
Unfortunately, when the first system talks to the second system, it uses a TSV format. Thus, I wonder if there's a convenient way to take my block of text and encode it in an ASCII format without any kind of whitespace (mostly newlines and tabs, of course), and then later decode it.
When I'm doing the encoding, I'm working in C#. When I'm doing the decoding, I'm working in Javascript.
I realize that I can write my own code to essentially "manually" perform the encoding and decoding by creating my own scheme, but I wonder if there already exists one for this purpose.
One option which would blow up the size of your data but is really simple to implement: UTF-8 encode all the text, then Base64-encode that:
byte[] utf8 = Encoding.UTF8.GetBytes(text);
string base64 = Convert.ToBase64String(utf8);
That won't contain any whitespace, and can be converted back. It'll be significantly larger than the original string, and unreadable... but it'll work.
You could try Uri.EscapeDataString(string), which percent-encodes any whitespace in the text passed in (as well as other special characters, which means the encoded text may be much larger than the original). HttpUtility.UrlEncode(string) also removes whitespace, but it encodes a space as +, which decodeURIComponent does not turn back into a space.
On the JavaScript side you can then use decodeURIComponent(string) to decode it back to the original text.
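Both suggestions can be sketched on the C# side as a small helper with its round trip. The class name WhitespaceFreeCodec is hypothetical; the decoding methods are included only to show that the transformation is lossless, since in the scenario above the decoding happens in JavaScript (decodeURIComponent for the percent-encoded form, or atob followed by UTF-8 decoding for the Base64 form).

```csharp
using System;
using System.Text;

public static class WhitespaceFreeCodec
{
    // Base64 over UTF-8 bytes: output uses only A-Z, a-z, 0-9, '+', '/', '='.
    public static string ToBase64(string text) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(text));

    public static string FromBase64(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded));

    // Percent-encoding: tabs become %09, newlines %0A, spaces %20.
    public static string ToPercentEncoded(string text) =>
        Uri.EscapeDataString(text);

    public static string FromPercentEncoded(string encoded) =>
        Uri.UnescapeDataString(encoded);
}
```

For mostly-ASCII text the percent-encoded form stays readable and is usually shorter; for text with many special or non-ASCII characters, Base64 tends to be the more compact of the two.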

byte[] buffer handling on c-sharp

I'm writing a class that works against a byte[] buffer. It contains methods like char Peek() and string ReadRestOfLine().
The problem is that I would like to add support for Unicode, and I don't really know how I should change those methods (they only support ASCII now).
How do I detect that the next bytes in the buffer are a Unicode sequence (UTF-8 or UTF-16)? And how do I convert them to a char?
Update
Yes, the class is a bit similar to StreamReader, with the difference that it avoids creating objects (string, char[], etc.) until the entire wanted string has been found. It's used in a high-performance socket framework.
For instance: let's say that I want to write a proxy that only checks the URI in an HTTP request. If I were to use StreamReader, I would have to build a temporary char array each time a receive completes, just to see whether a newline character has been received.
By using a class that works directly against the byte[] buffer that socket.ReceiveAsync uses, I just have to traverse the buffer in my parser to know whether the next step can be completed. No temporary objects are created.
For most protocols ASCII is used in the header area, so UTF-8 will not be a problem (the request body can be parsed using StreamReader). I'm just interested in how it can be solved without creating unnecessary objects.
I don't think you want to go there. There is a lot that can go wrong. First of all: what encoding are you using? Then, does the buffer contain the entire encoded string? Or does it start at some random position, possibly in the middle of a multi-byte sequence?
Your class sounds a bit like a StreamReader over a MemoryStream. Maybe you can use those?
From the documentation:
Implements a TextReader that reads characters from a byte stream in a particular encoding.
If the point of your exercise is to figure out how to do this yourself... take a peek into how the library did it. I think you'll find the method StreamReader.Read() interesting:
Reads the next character from the input stream and advances the character position by one character.
There is a one-to-one correspondence between bytes and ASCII characters, which makes it easy to treat bytes as characters. Modifying your code to handle the various Unicode encodings may not be easy. However, to answer part of your question:
How do I detect that the next bytes in the buffer is a unicode sequence (utf8 or utf16)? And how do I convert them to a char?
You can use the System.Text.Encoding class. You can use the predefined encoding objects Encoding.Unicode and Encoding.UTF8 and use methods like GetCharCount, GetChars and GetString.
I've created a BufferSlice class which wraps the byte[] buffer and makes sure that only the assigned slice is used. I've also created a custom reader to parse the buffer.
UTF-8 turned out not to be a problem, since I only scan the buffer for characters that are single-byte (space, minus, semicolon, etc.). I then use Encoding.GetString on the bytes from the last delimiter to the current one to get a proper string back.
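The approach described here works because in UTF-8 every byte below 0x80 is a complete ASCII character and never part of a multi-byte sequence, so scanning for a delimiter byte is safe. This does not hold for UTF-16. A minimal sketch of the idea, assuming UTF-8 input and using a hypothetical BufferLineReader class (not the BufferSlice class mentioned above):

```csharp
using System;
using System.Text;

public static class BufferLineReader
{
    // Scans buffer[offset..end) for '\n' without allocating. Only when a complete
    // line is present is it decoded into a string; offset then moves past the line.
    public static bool TryReadLine(byte[] buffer, ref int offset, int end, out string line)
    {
        int newline = Array.IndexOf(buffer, (byte)'\n', offset, end - offset);
        if (newline < 0)
        {
            line = null; // incomplete line: wait for more data, offset is unchanged
            return false;
        }
        int length = newline - offset;
        if (length > 0 && buffer[newline - 1] == (byte)'\r')
            length--; // drop the '\r' of a CRLF line ending
        line = Encoding.UTF8.GetString(buffer, offset, length);
        offset = newline + 1;
        return true;
    }
}
```

The caller keeps the offset between receives and simply calls TryReadLine again once more bytes have arrived; the only allocation is the final string for each complete line.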

C# parse elements of an HTTP POST response object from a string

I have a set of files, each of which contains the full text of a series of HTTP POST responses. A number of these contain binary objects (e.g. images or PDFs). I've been trying to use regexes to extract the binary objects, but I can't seem to get it right. The HttpListener class (and associated classes) all seem to require an active connection, i.e. parsing a real-time request/response pair, which I don't have. Is there a good library out there which can parse a file (or a string) as an HTTP response? If not, can anyone think of a better method for doing this than regex?
Thanks,
Rik
You can easily write your own parser which does the following:
Reads the response file line by line
Continues until it finds the Content-Length header, which specifies the number of bytes in the payload
Skips to the blank line that ends the headers
Reads the payload as binary, using that byte count
The Image class can create an image from a Stream (Image.FromStream). This way you can verify whether your extracted image matches the original image.
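The steps above can be sketched roughly as follows. This assumes each response carries a Content-Length header and is not chunked or compressed; the type names ParsedHttpResponse and SavedResponseParser are made up for the example. Because the parser reads exactly Content-Length bytes, calling Parse repeatedly on the same stream walks through a series of responses stored back to back.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public sealed class ParsedHttpResponse
{
    public string StatusLine;
    public Dictionary<string, string> Headers =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public byte[] Body;
}

public static class SavedResponseParser
{
    public static ParsedHttpResponse Parse(Stream input)
    {
        var response = new ParsedHttpResponse();
        response.StatusLine = ReadLine(input);

        // Header lines run until the first empty line.
        string line;
        while (!string.IsNullOrEmpty(line = ReadLine(input)))
        {
            int colon = line.IndexOf(':');
            if (colon > 0)
                response.Headers[line.Substring(0, colon).Trim()] = line.Substring(colon + 1).Trim();
        }

        // Assumption: the body length comes from Content-Length (no chunked encoding).
        string lengthText;
        int length = response.Headers.TryGetValue("Content-Length", out lengthText)
            ? int.Parse(lengthText)
            : 0;

        response.Body = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = input.Read(response.Body, read, length - read);
            if (n == 0) throw new EndOfStreamException("Body shorter than Content-Length.");
            read += n;
        }
        return response;
    }

    // Reads bytes up to '\n' and drops '\r'; reading byte by byte avoids consuming
    // part of the binary body the way a buffered text reader would.
    private static string ReadLine(Stream input)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = input.ReadByte()) != -1 && b != '\n')
        {
            if (b != '\r') bytes.Add((byte)b);
        }
        return Encoding.ASCII.GetString(bytes.ToArray());
    }
}
```

Opening the file with File.OpenRead and passing the stream to Parse yields the body bytes, which can then be wrapped in a MemoryStream for Image.FromStream or written straight to disk.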
Regards
