HTTP POST - Can it contain complex objects directly? - c#

From everything I've read, it seems the body is always of the form string=string&string=string... (with all the strings encoded so they contain no & or =). However, searching for it (e.g. Wikipedia, SO, ...) I haven't found this stated as an explicit restriction.
(Of course a base64 string of a binary serialization of complex objects can be sent; that's not the question.) But:
Can POST contain complex objects directly or is it all sent as a string?

There is nothing in HTTP that prevents the posting of binary data. You do not have to convert binary data to base64 or other text encodings. Though the common "key1=val1&key2=val2" usage is a widespread and convenient convention, it is not required. It depends only on what the sender and receiver agree upon. See these threads, or google "http post binary data" or the like:
Sending binary data over http
How to correctly send binary data over HTTPS POST?
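For illustration, here is a minimal sketch of POSTing a raw binary body with .NET's HttpClient; the URL and content type are placeholders:

using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class BinaryPostDemo
{
    static async Task Main()
    {
        using var client = new HttpClient();

        // No base64, no key=value encoding - the body goes over the wire
        // exactly as these bytes.
        var content = new ByteArrayContent(new byte[] { 0x01, 0x02, 0xFF });
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        await client.PostAsync("https://example.com/upload", content);
    }
}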

It is just a string, like any binary stream. There are various ways to encode complex objects to fit into a string though: base64 is an option, and so is JSON (the latter probably being more desirable).
PHP has a specific way to deal with this. This:
a[]=1&a[]=2
will result in an array containing 1 and 2.
This:
a[foo]=bar&a[gir]=zim
also creates an array, this time with two keys.
I've also seen this format in some frameworks:
a.foo=bar&b.gir=zim
So while urlencoding does not have a specific, standard syntax for this, that does not mean you can't add meaning and do your own post-processing.
If you're building an API, you are probably best off not using urlencoding at all; there are much more capable and better formats. You can use whatever Content-Type you'd like.
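To make that concrete, here is a sketch of how such PHP-style keys look to a parser that doesn't share the convention; at the urlencoding level, "a[]" is just a literal key name. (System.Web.HttpUtility is also available in-box on modern .NET.)

using System;
using System.Web;

class QueryDemo
{
    static void Main()
    {
        // "a[]" has no special meaning in the urlencoded format itself -
        // any array semantics are post-processing applied by the receiver.
        var parsed = HttpUtility.ParseQueryString("a[]=1&a[]=2");
        Console.WriteLine(parsed["a[]"]);                             // "1,2"
        Console.WriteLine(string.Join("|", parsed.GetValues("a[]"))); // "1|2"
    }
}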

HTTP itself is just based on strings. There's no notion of "objects", only text. The definition of "object" is dependent on whatever data format you transport over HTTP (XML, JSON, binary files, ...).
So, POST can contain "complex objects" if they are appropriately encoded into text.

Related

Redis messages contain unicode?

I am currently reading from a Redis server, and in the redis-cli monitor I am seeing published messages whose data is interspersed with escape sequences that look like this:
\x93\xa6Xz05FH\x83\xa4type\x02\xa4data\x92\xad
Is this generated by something Redis-related, or does it come from the original publisher? If I publish messages myself there are no such sequences attached, so I find it hard to believe this is Redis' doing, but I want to be sure.
Redis doesn't care about the format of things unless you are using numeric operators such as incr, hincrby, etc; other than that, strings, hash fields, pub/sub payloads etc are all binary-safe opaque values defined purely by what the client specified. Since you mention StackExchange.Redis, if you use a .NET string to populate a RedisValue, then that string will be encoded with UTF-8. However, there are other ways of populating a RedisValue for specifying an arbitrary binary payload - the most common of which would be a byte[].
It is possible that redis-cli - since it is a text-based tool - is switching to this representation by a simple heuristic of "does something look non-ASCII / non-printable" - that \x02 is looking particularly suspicious (is that a STX?). But the actual data flowing through redis on a publish is raw binary.
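A minimal sketch of the distinction with StackExchange.Redis (the channel name and payload bytes are made up for the example):

using StackExchange.Redis;

class PublishDemo
{
    static void Main()
    {
        var mux = ConnectionMultiplexer.Connect("localhost");
        var sub = mux.GetSubscriber();

        // A .NET string used as a RedisValue goes over the wire as UTF-8.
        sub.Publish("demo-channel", "hello");

        // A byte[] is sent verbatim; redis-cli renders non-printable bytes
        // from a payload like this as \xNN escapes.
        byte[] raw = { 0x93, 0xA6, 0x02 };
        sub.Publish("demo-channel", raw);
    }
}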

Programming practices when receiving and manipulating received TCP/HTTP data?

Should data received over TCP or HTTP be handled as byte arrays, or is it OK practice to receive it as a string? I've been trying to find some professional projects on GitHub to get my answer, but have had no luck. Some HttpClient examples from Microsoft on MSDN use the GetByteArrayAsync(website) method instead of GetStringAsync(website). Is there any reason why they would use GetByteArrayAsync instead of GetStringAsync, which would make data manipulation much easier right off the bat? Are there any advantages to using GetByteArrayAsync first?
What moves "through the wire" are bytes, not strings.
They might be text, but can be pictures, or a zip file.
At TCP/HTTP level this is unknown, and it does not matter.
That decision belongs with a higher level.
HTTP has a bit more info than TCP, so you might have a mimetype to help you decide what those bytes are.
Even if you know it is some kind of text, you will need to know the character set. You might get that info in the HTTP header, or in the document itself, or there might be a standard saying what the encoding is.
Only then will you be able to convert it to a string.
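A sketch of that order of operations with HttpClient (the URL is a placeholder, and the UTF-8 fallback is an assumption for the example):

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class FetchDemo
{
    static async Task Main()
    {
        using var client = new HttpClient();
        using var response = await client.GetAsync("https://example.com/");

        // What arrives are bytes; they only become a string once we know
        // the character set.
        byte[] body = await response.Content.ReadAsByteArrayAsync();

        // The charset (if any) comes from the Content-Type header.
        string charset = response.Content.Headers.ContentType?.CharSet ?? "utf-8";
        string text = Encoding.GetEncoding(charset).GetString(body);

        Console.WriteLine(text.Length);
    }
}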

Why no byte strings in .net / c#?

Is there a good reason that .NET provides string functions (like search, substring extraction, splitting, etc.) only for UTF-16 strings and not for byte arrays? I see many cases where it would be easier and much more efficient to work with 8-bit chars instead of 16-bit.
Let's take the MIME (.EML) format as an example. It's basically an 8-bit text file. You cannot read it properly using ANY single encoding (because the encoding info is contained within the file; moreover, different parts can have different encodings).
So you are better off reading a MIME file as bytes, determining its structure (ideally using 8-bit string parsing tools), and, after finding the encodings for all encoding-dependent data blocks, applying encoding.GetString(data) to get a normal UTF-16 representation of them.
Another case is base64 data blocks (base64 is just an example; there are also UUE and others). Currently .NET expects you to have base64 in a 16-bit string, but it is not efficient to read data of double the size and do all the conversions from bytes to string just to decode it. When dealing with megabytes of data, this becomes important.
The lack of byte-string manipulation functions means writing them manually, and such implementations are obviously less efficient than the native implementations of the string functions.
I'm not saying they need to be called 8-bit chars; let's keep calling them bytes. Just have a set of native methods mirroring most string manipulation routines, but taking byte arrays. Is this needed only by me, or am I missing something important about the overall .NET architecture?
Let's take the MIME (.EML) format as an example. It's basically an 8-bit text file. You cannot read it properly using ANY single encoding (because the encoding info is contained within the file; moreover, different parts can have different encodings).
So, you're talking about a case where general-purpose byte-string methods aren't very useful, and you'd need to specialise.
And then for other cases, you'd need to specialise again.
And again.
I actually think byte-string methods would be more useful than your example suggests, but it remains that a lot of cases for them have specialised needs that differ from other uses in incompatible ways.
Which suggests it may not be well-suited for the base library. It's not like you can't make your own that do fit those specialised needs.
Code that deals with mixed-encoding string manipulation is unnecessarily hard and much harder to explain/get right. In the scheme you suggest for handling mixed encodings, every "string" would need to carry its encoding information with it, and the framework would have to provide implementations for all possible combinations of encodings.
The standard solution to such a problem is to provide a well-defined way to convert all types to/from a single "canonical" representation, and to perform most operations on that canonical type. You see this most easily in image/video processing, where arbitrary incoming formats are converted into the one format the tool knows about, processed, and converted back to the original (or any other) format.
.NET strings are almost there, with a "canonical" way to represent Unicode strings. There are still many ways to represent what is, from the user's point of view, the same string but composed from different char elements. Even regular string comparison is a huge problem (since, in addition to encoding, there are frequently locale differences).
Notes
there are already plenty of APIs dealing with byte arrays for comparing/slicing, both in the Array/List classes and as LINQ helpers. The only real missing part is regex-like matching.
even dealing with a single encoding for strings (UTF-16 in .NET, UTF-8 in many other systems) is hard enough; even getting the "string length" is a problem (do you count surrogate pairs only, include all combining characters, or is .Length enough?).
it is a good idea to try writing the code yourself to see where the complexity comes from and whether a particular framework decision makes sense. Try implementing 10-15 common string functions supporting several encodings (e.g. UTF-8, UTF-16, and one 8-bit encoding).
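As a small aside on the first note: modern .NET's spans already cover a fair amount of byte-level searching and slicing without decoding to UTF-16. A sketch, assuming .NET Core 2.1 or later:

using System;
using System.Text;

class ByteSearchDemo
{
    static void Main()
    {
        byte[] message = Encoding.ASCII.GetBytes("Content-Type: text/plain\r\n\r\nbody");
        ReadOnlySpan<byte> haystack = message;
        ReadOnlySpan<byte> needle = Encoding.ASCII.GetBytes("\r\n\r\n");

        // MemoryExtensions.IndexOf works on spans of bytes directly.
        int i = haystack.IndexOf(needle);
        if (i >= 0)
        {
            // Slice off the header block and decode only the part we need.
            string headers = Encoding.ASCII.GetString(haystack[..i]);
            Console.WriteLine(headers);
        }
    }
}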

Is there a preferred manner for sending data over a web socket connection?

Is there a 'correct' or preferred manner for sending data over a web socket connection?
In my case, I am sending the information from a C# application to a python (tornado) web server, and I am simply sending a string consisting of several elements separated by commas. In python, I use rudimentary techniques to split the string and then structure the elements into an object.
e.g:
'foo,0,bar,1'
becomes:
object = {
    'foo': 0,
    'bar': 1
}
In the other direction, I am sending the information as a JSON string which I then deserialise using Json.NET
I imagine there is no strictly right or wrong way of doing this, but are there significant advantages and disadvantages that I should be thinking of? And, somewhat related, is there a consensus for using string vs. binary formats?
Writing a custom encoding (e.g. "k,v,..") is different from 'using binary'.
It is still text, just a rigid, under-defined, one-off, hand-rolled format that must be manually replicated. (What happens if a key or value contains a comma? What happens if the data needs to contain nested objects? How can null be interpreted differently from '' or 'null'?)
While JSON is definitely the most ubiquitous format for WebSockets, one shouldn't (for interchange purposes) write JSON by hand; one uses an existing serialization library on both ends. (There are many reasons why JSON is ubiquitous, which are covered in other answers; this doesn't mean it is always the 'best' format, however.)
To this end a binary serializer can also be used (BSON being a trivial example as it is effectively JSON-like in structure and operation). Just replace JSON.parse with FORMATX.parse as appropriate.
The only requirements are then:
There is a suitable serializer/deserializer for all the clients and servers. JSON works well here because it is so popular and there is no shortage of implementations.
There are various binary serialization formats with both Python and C# libraries, but it will require finding a 'happy intersection'.
The serialization format can represent the data. JSON usually works sufficiently and it has a very nice 1-1 correspondence with basic object graphs and simple values. It is also inherently schema-less.
Some formats are better at certain tasks and have different characteristics, features, or tool-chains. However, most concepts (and arguably most DTOs) can be mapped onto JSON easily, which makes it a good 'default' choice.
The remaining differences between the various binary and text serializations are mostly dressing - unless you'd like to start talking about schema vs. schema-less, extensibility, external tooling, metadata, non-compressed encoded sizes (or size after transport compression), compliance with a specific existing protocol, etc.
But the point to take away is: don't create a 'new' one-off format. Unless, of course, you just like making wheels, or there is a very specific use-case to fit.
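As an example of the "just replace the parser" point above, here is a sketch of the same object graph going through Json.NET's BSON writer instead of text JSON (assumes the Newtonsoft.Json.Bson package):

using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

static class BsonDemo
{
    // Serializes any object graph to a binary, JSON-like payload. Swapping
    // this for plain JSON (or back) doesn't touch application logic.
    public static byte[] Serialize(object value)
    {
        using var ms = new MemoryStream();
        using (var writer = new BsonDataWriter(ms))
        {
            new JsonSerializer().Serialize(writer, value);
        }
        return ms.ToArray();
    }
}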
My first advice would be to use the same format in both directions, not plain text one way and JSON the other.
I personally think {'foo':0,'bar':1} is better than foo,0,bar,1 because everybody understands JSON, whereas your custom format they might not, at least not without some explanation. The idea is that you are inventing a data interchange format when JSON already is one, and @jfriend00 is right: pretty much every language now understands JSON, Python included.
Regarding text vs. binary, there isn't any consensus. As @user2864740 mentions in the comments to my answer, as long as the two sides understand each other, it doesn't really matter. This only becomes relevant if one of the sides has a preference for a format (consider, for example, opening the connection from the browser using JavaScript; there, people might prefer JSON over binary).
My advice is to go with something simple like JSON and design your app so that you can change the wire format by swapping in another implementation without affecting the logic of your application.
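A sketch of what the same-format-both-ways advice looks like on the C# side with Json.NET (the Payload type is made up for the example; the actual WebSocket send/receive calls are left out):

using Newtonsoft.Json;

class Payload
{
    public int Foo { get; set; }
    public int Bar { get; set; }
}

class JsonWireDemo
{
    static void Main()
    {
        // Outgoing: serialize instead of hand-building "foo,0,bar,1".
        string outgoing = JsonConvert.SerializeObject(new Payload { Foo = 0, Bar = 1 });
        // outgoing == {"Foo":0,"Bar":1} - send this over the socket.

        // Incoming: deserialize with the same library.
        var incoming = JsonConvert.DeserializeObject<Payload>("{\"Foo\":0,\"Bar\":1}");
        System.Console.WriteLine(incoming.Bar);   // 1
    }
}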

Proto Buffers not storing data in readable format

I am saving my protobuf messages to a file and the format is all messed up. I have seen it done before where the protobuf messages would be saved to disk in nearly the same format as the .proto file. I am doing it like:
using (Stream output = File.OpenWrite(@"logs\listings.txt"))
{
    listingBook.AddClisting(_listing);
    listingBook.Build().WriteTo(output);
}
But what I get is a mangled file that seems newline-separated with strange tags. What I want it to look like when it is saved to disk is this example:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
Pay more attention to the comment:
This is not the binary format used on the wire.
Protobuf messages are not designed to be human-readable. Storing them in a text file makes no sense; they are not text.
The primary protobuf encoding format is binary. There is a secondary text format exposed by some implementations, but it kinda loses a lot of the advantages of protobuf, and library support for it is patchy (if it is even formally defined). I would say: if you want human readable, use XML or json. Not protocol buffers.
Using PrintTo instead of WriteTo keeps the data in a readable format. Finally found it.
As protobuf is intended to be fast, binary-compatible, and compact, storing messages in human-readable form is mostly out of the question. There is the JsonFormatter utility, however: its primary purpose is what you asked for, but be aware that it probably makes everything significantly slower, while adding some overhead because of the conversion.
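A sketch of that utility in use, assuming the newer Google.Protobuf library (where generated message types implement IMessage):

using System.IO;
using Google.Protobuf;

static class ReadableDump
{
    // Writes a human-readable JSON rendering of any generated protobuf
    // message alongside (not instead of) the binary wire format.
    public static void Dump(IMessage message, string path)
    {
        string json = JsonFormatter.Default.Format(message);
        File.WriteAllText(path, json);
    }
}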
