Redis messages contain unicode?


I am currently reading from a Redis server, and in the redis-cli monitor output I see published messages whose data is interspersed with what look like Unicode escapes, like this:
\x93\xa6Xz05FH\x83\xa4type\x02\xa4data\x92\xad
Is this generated by something in Redis, or does it come from the original publisher?
When I publish messages myself there is no Unicode attached, so I find it hard to believe this is Redis' doing, but I want to be sure.

Redis doesn't care about the format of things unless you are using numeric operators such as incr, hincrby, etc; other than that, strings, hash fields, pub/sub payloads etc are all binary-safe opaque values defined purely by what the client specified. Since you mention StackExchange.Redis, if you use a .NET string to populate a RedisValue, then that string will be encoded with UTF-8. However, there are other ways of populating a RedisValue for specifying an arbitrary binary payload - the most common of which would be a byte[].
It is possible that redis-cli - since it is a text-based tool - is switching to this representation by a simple heuristic of "does something look non-ASCII / non-printable" - that \x02 is looking particularly suspicious (is that a STX?). But the actual data flowing through redis on a publish is raw binary.
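To illustrate the point about representation, here is a minimal Python sketch (Python is used purely for illustration; `escape_like_cli` is a hypothetical helper, not part of redis-cli or StackExchange.Redis). It assumes a text tool prints printable ASCII as-is and everything else as \xNN escapes, and under that assumption plain raw bytes reproduce the output from the question:

```python
def escape_like_cli(payload: bytes) -> str:
    """Render bytes the way a text tool might: printable ASCII stays,
    every other byte (and the backslash itself) becomes a \\xNN escape."""
    parts = []
    for byte in payload:
        if 0x20 <= byte < 0x7F and byte != 0x5C:
            parts.append(chr(byte))
        else:
            parts.append(f"\\x{byte:02x}")
    return "".join(parts)


# The raw payload as it might travel through Redis: arbitrary bytes, not text.
raw = b"\x93\xa6Xz05FH\x83\xa4type\x02\xa4data\x92\xad"
print(escape_like_cli(raw))

# A string encoded as UTF-8 only shows escapes for its non-ASCII characters.
print(escape_like_cli("h\u00e9llo".encode("utf-8")))
```

So the escapes are a display artifact of binary data chosen by the publisher. As an aside, these particular bytes happen to be consistent with MessagePack framing (for example, 0xa4 followed by a 4-character string), but only the publisher can confirm which format is actually in use.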

Related

How can i evaluate a stream (string)

First of all, what I do at the moment:
I sniff an asynchronous serial bus that uses a 9-bit protocol and send the data to the PC. On the PC side I receive the data as an endless string that looks like this: .12_80E886.02_80E894.13. The PC software is a WinForms application written in C#. The problem is that the stream has no clear start, as you can see in the example, because I begin sniffing somewhere in the middle of the protocol.
What I want to do:
I think I can use startindex = IndexOf("_") and treat that position as the new start. I then have to evaluate the characters in the stream, which is built like this: _(timestamp in milliseconds).(addressbyte databyte). The only thing I want to display in my RichTextBox is the data byte, but I also need some way to manage the timestamps, because the GUI has a function that shows the time between two or more data bytes; for that I am thinking of using an SQL database. I need the address byte so that a byte whose address byte is one can be shown in a special color.
Question:
How can I split the stream so that I get the timestamp, address byte, and data byte in turn, each as a separate substring?
I want them this way because I think I can then use a simple if / else if / else block to do everything I need.
If anyone has a better suggestion for my project, please leave a comment.
Best regards, sniffi

I think you're trying to solve two problems at the same time. It would be better to separate them and solve them individually.
There is the issue of transporting the data, for this you are using streams. That is a valid solution. There is sending and receiving the data (bits) over the stream.
Then there is the problem of transforming these bits (after receiving them) into actual objects (dates, strings, etc.). For that you can use a simple parser, a tokenizer, or a local script that picks the correct parts out of the data and converts them, or you can use a serialization framework (like DataContracts).
If you have simple data, I would opt for using a single method that can parse the data. For more complex scenarios I would look into serialization.
Also beware that you will need to validate your inputs, since you cannot assume that the bits are always sent by trusted (uncompromised) software.
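As a rough sketch of the "single method that can parse the data" idea, here is a small Python example (the same approach carries over to C# with Regex). The layout is an assumption read off the sample in the question: "_", a hexadecimal timestamp, ".", one address digit (0 or 1), then the data digits. `Record` and `parse_records` are hypothetical names.

```python
import re
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    timestamp_ms: int  # timestamp, assumed to be hexadecimal in the stream
    address_bit: int   # assumed: first digit after the dot is the 9th (address) bit
    data: str          # remaining hex digits, kept as text


# Assumed record layout: _<hex timestamp>.<0|1><hex data>
RECORD_RE = re.compile(r"_([0-9A-Fa-f]+)\.([01])([0-9A-Fa-f]+)")


def parse_records(stream: str) -> Iterator[Record]:
    """Yield every complete record; anything before the first "_" is skipped,
    and text that does not match the assumed layout is ignored (basic validation)."""
    for match in RECORD_RE.finditer(stream):
        ts_hex, addr, data = match.groups()
        yield Record(int(ts_hex, 16), int(addr), data)


records = list(parse_records(".12_80E886.02_80E894.13"))
delta_ms = records[1].timestamp_ms - records[0].timestamp_ms
print(records, delta_ms)
```

The leading ".12" is dropped because it belongs to a record whose start was missed. On a live stream the last record in the buffer may still be incomplete, so it is safer to keep the tail of the buffer until the next "_" arrives.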

I think a string is a bad choice. The data is probably sent as bytes, so sniff bytes rather than a string. You also need a description of the protocol to understand the data.
You need to read the bytes from the bus and interpret them.

Is there a preferred manner for sending data over a web socket connection?

Is there a 'correct' or preferred manner for sending data over a web socket connection?
In my case, I am sending the information from a C# application to a python (tornado) web server, and I am simply sending a string consisting of several elements separated by commas. In python, I use rudimentary techniques to split the string and then structure the elements into an object.
e.g:
'foo,0,bar,1'
becomes:
object = {
'foo': 0,
'bar': 1
}
In the other direction, I am sending the information as a JSON string which I then deserialise using Json.NET
I imagine there is no strictly right or wrong way of doing this, but are there significant advantages and disadvantages that I should be thinking of? And, somewhat related, is there a consensus for using string vs. binary formats?

Writing a custom encoding (eg, as "k,v,..") is different than 'using binary'.
It is still text, just a rigid under-defined one-off hand-rolled format that must be manually replicated. (What happens if a key or value contains a comma? What happens if the data needs to contain nested objects? How can null be interpreted differently than '' or 'null'?)
While JSON is definitely the most ubiquitous format for WebSockets one shouldn't (for interchange purposes) write JSON by hand - one uses an existing serialization library on both ends. (There are many reasons why JSON is ubiquitous which are covered in other answers - this doesn't mean it is always the 'best' format, however.)
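A small Python sketch of the difference, under stated assumptions: `parse_pairs` is a hypothetical stand-in for the rudimentary splitting described in the question, and the sample values are invented to trigger the comma problem.

```python
import json


def parse_pairs(text: str) -> dict:
    """Naive 'k,v,k,v' parser in the spirit of the hand-rolled format."""
    parts = text.split(",")
    return dict(zip(parts[0::2], parts[1::2]))


# Hand-rolled encoding: a comma inside a value silently corrupts the pairs.
original = {"name": "Doe, Jane", "id": 7}
flat = ",".join(f"{key},{value}" for key, value in original.items())
broken = parse_pairs(flat)
print(broken)  # the pairs no longer line up with the original keys

# A serialization library round-trips commas, nesting, and null without extra rules.
payload = {"name": "Doe, Jane", "nested": {"x": None}}
restored = json.loads(json.dumps(payload))
print(restored == payload)
```

The hand-rolled format would need escaping rules, a nesting convention, and a null marker before it could carry the same data, at which point it is a serialization format that has to be reimplemented on both ends.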
To this end a binary serializer can also be used (BSON being a trivial example as it is effectively JSON-like in structure and operation). Just replace JSON.parse with FORMATX.parse as appropriate.
The only requirements are then:
There is a suitable serializer/deserializer for all the clients and servers. JSON works well here because it is so popular and there is no shortage of implementations.
There are various binary serialization libraries with both Python and C# libraries, but it will require finding a 'happy intersection'.
The serialization format can represent the data. JSON usually works sufficiently and it has a very nice 1-1 correspondence with basic object graphs and simple values. It is also inherently schema-less.
Some formats are better at certain tasks and have different characteristics, features, or tool-chains. However most concepts (and arguably most DTOs) can be mapped onto JSON easily which makes it a good 'default' choice.
The other differences between different kinds of binary and text serializations are mostly dressing - but if you'd like to start talking about schema vs. schema-less, extensibility, external tooling, metadata, non-compressed encoded sizes (or size after transport compression), compliance with a specific existing protocol, etc..
.. but the point to take away is don't create a 'new' one-off format. Unless of course, you just like making wheels or there is a very specific use-case to fit.

First advice would be to use the same format for both ways, not plain text in one direction and JSON in the other.
I personally think {'foo':0,'bar':1} is better than foo,0,bar,1 because everybody understands JSON but for your custom format they might not without some explanations. The idea is you are inventing a data interchange format when JSON is already one and @jfriend00 is right, pretty much every language now understands JSON, Python included.
Regarding text vs binary, there isn't any consensus. As @user2864740 mentions in the comments to my answer as long as the two sides understand each other, it doesn't really matter. This only becomes relevant if one of the sides has a preference for a format (consider for example opening the connection from the browser, using JavaScript - for that people might prefer JSON instead of binary).
My advice is to go with something simple as JSON and design your app so that you can change the wire format by swapping in another implementation without affecting the logic of your application.
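One way to keep the wire format swappable could look like the following Python sketch. All names here (`Codec`, `JsonCodec`, `EchoService`) are hypothetical; the idea is only that application logic talks to a small encode/decode interface rather than to JSON directly.

```python
import json
from typing import Any, Protocol


class Codec(Protocol):
    """Anything that can turn messages into frames and back."""

    def encode(self, message: Any) -> bytes: ...

    def decode(self, frame: bytes) -> Any: ...


class JsonCodec:
    def encode(self, message: Any) -> bytes:
        return json.dumps(message).encode("utf-8")

    def decode(self, frame: bytes) -> Any:
        return json.loads(frame.decode("utf-8"))


class EchoService:
    """Application logic; it only ever sees decoded Python objects."""

    def __init__(self, codec: Codec) -> None:
        self.codec = codec

    def handle(self, frame: bytes) -> bytes:
        message = self.codec.decode(frame)
        reply = {"keys": sorted(message)}
        return self.codec.encode(reply)


service = EchoService(JsonCodec())
reply_frame = service.handle(b'{"foo": 0, "bar": 1}')
print(reply_frame)
```

Switching to a binary format later would then mean writing another class with the same two methods and passing it to the service, without touching `handle`.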

HTTP POST - Can it contain complex objects directly?

From all I've read it seems that it's always of the form string=string&string=string... (all the strings being encoded to exclude & and =) however, searching for it (e.g. Wikipedia, SO, ...) I haven't found that mentioned as an explicit restriction.
(Of course a base64 string of a binary of complex objects can be sent. That's not the question.) But:
Can POST contain complex objects directly or is it all sent as a string?

There is nothing in HTTP that prevents the posting of binary data. You do not have to convert binary data to base64 or other text encodings. Though the common "key1=val1&key2=val2" usage is very widely conventional and convenient, it is not required. It only depends upon what the sender and receiver agree upon. See these threads or google "http post binary data" or the like.
Sending binary data over http
How to correctly send binary data over HTTPS POST?
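As a minimal sketch of a binary POST, here is a request built with Python's standard urllib. The URL is a hypothetical placeholder and the payload bytes are arbitrary; the request is only constructed, not sent, so the example runs offline.

```python
import urllib.request

# Arbitrary binary payload, including a null byte and non-ASCII values.
body = bytes([0x00, 0xFF, 0x10, 0x80])

request = urllib.request.Request(
    "http://example.invalid/upload",  # hypothetical endpoint
    data=body,                        # raw bytes, no base64 or form encoding
    headers={"Content-Type": "application/octet-stream"},
    method="POST",
)

# urllib.request.urlopen(request) would send it; omitted so the sketch needs no network.
print(request.get_method(), len(request.data))
```

The Content-Type header is what tells the receiver how to interpret the body; application/octet-stream is the generic choice for opaque bytes.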

It is just a string, just like any binary stream. There's various ways to encode complex objects to fit into a string though. base64 is an option, and so is json (the latter probably being more desirable).
PHP has a specific way to deal with this.. This:
a[]=1&a[]=2
Will result in an array with 1, 2.
This:
a[foo]=bar&a[gir]=zim
Creates also an array with 2 keys.
I've also seen this format in some frameworks:
a.foo=bar&b.gir=zim
So while urlencoding does not have a specific, standard syntax to do this.. that does not mean you can't add meaning and do your own post-processing.
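To show what such post-processing might look like, here is a small Python sketch. `nest` is a hypothetical helper that mimics only the two bracket conventions shown above (one level deep); it is not how PHP itself is implemented.

```python
from urllib.parse import parse_qsl


def nest(query: str) -> dict:
    """Give bracketed keys a meaning: 'a[]' collects a list, 'a[k]' fills a dict."""
    result: dict = {}
    for key, value in parse_qsl(query):
        if key.endswith("[]"):
            result.setdefault(key[:-2], []).append(value)
        elif key.endswith("]") and "[" in key:
            outer, inner = key[:-1].split("[", 1)
            result.setdefault(outer, {})[inner] = value
        else:
            result[key] = value
    return result


print(nest("a[]=1&a[]=2"))
print(nest("a[foo]=bar&a[gir]=zim"))
```

On the wire these are still flat key=value pairs; the structure exists only because the receiver chooses to interpret the key names this way.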
If you're building an API, you are probably best off not using urlencoding at all... There are much more capable and better-suited formats. You can use whatever Content-Type you'd like.

HTTP itself is just based on strings. There's no notion of "objects", only text. The definition of "object" is dependent on whatever data format you transport over HTTP (XML, JSON, binary files, ...).
So, POST can contain "complex objects" if they are appropriately encoded into text.

read sniffing data over tcp

I'm developing an application that listens to the data arriving at the PC and stores it in a database.
When I use any sniffing software, it decodes the data and I can read it,
but in my code I can't read it at all.
It arrives in a format like this:
1822262151622341817118815518211616121520941131921572041519912321413018224510453482062312258624219217426213385792952422362282081777270129716688629114817282188771708157542505055171418651781981425595109572128317191993018793431541418175198551682143218916536118562071014546919618158204181231187237183188160147127165111798312311810419822146114761993113815821216617541542372062129733198212250147199288115346102031191275215728146245198190171121209115149107193226253199151253205183146112072202559697791491441131572351381412278441552554817712614110121823714822712523618924690185291182071331471286244143181469018522814822821118012620321315924832238219115405615512392145202385512115735771691111055935782371281492476567165158924021493139815144225143762294713291762001113814720516216041120169912317914878167571392103510118386589521910621319622274158971538465206168139190127867123282255271781242497522124211517622131122113236255230254211206911242051832545515823012124925217318223920523316923122925514321122343602492471242........
Can anyone tell me what kind of data this is, and is there any code to decode it?

To see what a real packet sniffer looks like, check out Wireshark. There are many different protocols over TCP, and many of them are binary. Those that aren't may be using Unicode text encoded as UTF-16, where most characters take two bytes, so an ASCII display of them would be meaningless.
Anyway, the data you're displaying is pretty meaningless. It looks like decimal data; are you concatenating a bunch of decimal representations of the binary stream interpreted as byte or integer values? That would explain it. You should start by running the bytes through System.Text.Encoding.ASCII.GetString. You'll probably see some recognizable strings. Then try System.Text.Encoding.Unicode.GetString, etc.

No, we cannot. And the reason is simple, we don't know what application you are sniffing.
That stream of data could mean anything.
But, I suggest you print the data in hexadecimal. Maybe the data would make more sense.
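A short Python sketch of why concatenated decimals are unreadable while a hex dump is not. The byte values are invented for illustration; they are not taken from the capture in the question.

```python
def as_concatenated_decimal(data: bytes) -> str:
    """What the question's output looks like: decimal values glued together."""
    return "".join(str(byte) for byte in data)


def as_hex_dump(data: bytes) -> str:
    """Fixed two hex digits per byte, separated by spaces."""
    return " ".join(f"{byte:02x}" for byte in data)


first = bytes([182, 226])     # hypothetical bytes
second = bytes([18, 222, 6])  # different bytes, same decimal digits

print(as_concatenated_decimal(first), as_concatenated_decimal(second))
print(as_hex_dump(first), "|", as_hex_dump(second))
```

Both inputs produce the same digit string in decimal, so the byte boundaries cannot be recovered, whereas the hex dump keeps every byte distinct.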

FileHelpers-like data import/export utility for binary data?

I use the excellent FileHelpers library when I work with text data. It allows me to very easily dump text fields from a file or in-memory string into a class that represents the data.
In working with a big-endian microcontroller-based system I need to read a serial data stream. In order to save space on the very limited microcontroller platform I need to write raw binary data which contains fields of various multi-byte types (essentially just dumping a struct variable out the serial port).
I like the architecture of FileHelpers. I create a class that represents the data and tag it with attributes that tell the engine how to put data into the class. I can feed the engine a string representing a single record and get a deserialized representation of the data. However, this is different from object serialization in that the raw data is not delimited in any way, it's a simple binary fixed record format.
FileHelpers is probably not suitable for reading such binary data as it cannot handle the nulls that show up and* I suspect that there might be unicode issues (the engine takes input as a string, so I have to read bytes from the serial port and translate them into a unicode string before they go to my data converter classes). As an experiment I have set it up to read the binary stream and as long as I'm careful to not send nulls it works quite well so far. It is easy to set up new converters that read the raw data and account for endian formatting issues and such. It currently fails on nulls and cannot process multiple records (it expects a CRLF between records).
What I want to know is if anyone knows of an open-source library that works similarly to FileHelpers but that is designed to handle binary data.
I'm considering deriving something from FileHelpers to handle this task, but it seems like there ought to be something already available to do this.
*It turns out that it does not complain about nulls in the input stream. I had an unrelated bug in my test program that came up where I expected a problem with the nulls. Should have investigated a little deeper first!

I haven't used filehelpers, so I can't do a direct comparison; however, if you have an object-model that represents your objects, you could try protobuf-net; it is a binary serialization engine for .NET using Google's compact "protocol buffers" wire format. Much more efficient than things like xml, but without the need to write all your own serialization code.
Note that "protocol buffers" does include some very terse markers between fields (typically one byte); this adds a little padding, but greatly improves version tolerance. For "packed" data (i.e. blocks of ints, say, from an array) this can be omitted if desired.
So: if you just want a compact output, it might be good. If you need a specific output, probably less so.
Disclosure: I'm the author, so I'm biased; but it is free.

When I am fiddling with GPS data in the SIRFstarIII binary mode, I use the Python interactive prompt with the serial module to fetch the stream from the USB/serial port and the struct module to convert the bytes as needed (per some format defined by SIRF). Using the interactive prompt is very flexible because I can read the string to a variable, process it, view the results and try again if needed. After the prototyping stage is finished, I have the data format strings that I need to put into the final program.
Your question doesn't mention anything about why you have a C# tag. I understand FileHelpers is a C# library, but that doesn't tell me what environment you are working in. There is an implementation of Python for .NET called IronPython.
I realize this answer might mean you have to learn a new language, but having an interactive prompt is a very powerful tool for any programmer.
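For a flavor of the struct-based approach, here is a minimal Python sketch for a big-endian fixed-record stream. The record layout (one unsigned byte, one unsigned 16-bit value, one signed 32-bit value) is a made-up assumption for illustration, not the SiRF format or the asker's actual struct.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical layout: ">" = big-endian, no padding; B = uint8, H = uint16, i = int32.
RECORD = struct.Struct(">BHi")


def iter_records(stream: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield each complete fixed-size record; a trailing partial record is ignored."""
    for offset in range(0, len(stream) - RECORD.size + 1, RECORD.size):
        yield RECORD.unpack_from(stream, offset)


# Two 7-byte records back to back, with no delimiter and with null bytes inside.
stream = bytes.fromhex("01000200000003" "020100ffffffff")
for record in iter_records(stream):
    print(record)
```

Because the data stays as bytes the whole way, null bytes and the missing CRLF between records are not a problem, and the format string is the only thing that has to change when the struct on the microcontroller changes.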
