Reading Protobuf TCP Packets with existing C# classes - c#

This problem seems simple and do-able enough, but I cannot for the life of me get it to work.
I have:
A PCAP file with a few packets I know are some type of ProtoBuf data (probably created with protobuf-csharp-port)
All the possible C# classes from an assembly decorated with:
[DebuggerNonUserCode, CompilerGenerated, GeneratedCode("ProtoGen", "2.4.1.473")]
public sealed class thing : GeneratedMessageLite<thing, thing.Builder>
All I want to do is parse those packets using the types I know from the assembly. Simple? Probably, but no matter what I try, nothing actually gets parsed.
Here's an example of one of the many possible classes:
[DebuggerNonUserCode, CompilerGenerated, GeneratedCode("ProtoGen", "2.4.1.473")]
public sealed class Thing : GeneratedMessageLite<Thing, Thing.Builder>
{
    // Fields
    private static readonly string[] _thingFieldNames = new string[] { "list" };
    private static readonly uint[] _thingFieldTags = new uint[] { 10 };
    ...
    public static Builder CreateBuilder()
    {
        return new Builder();
    }
    ...
    public static Thing ParseFrom(ByteString data)
    {
        return CreateBuilder().MergeFrom(data).BuildParsed();
    }
    ...
    public override void WriteTo(ICodedOutputStream output)
    {
        int serializedSize = this.SerializedSize;
        string[] strArray = _thingFieldNames;
        if (this.list_.Count > 0)
        {
            output.WriteMessageArray<thingData>(1, strArray[0], this.list_);
        }
    }
    ...
    [DebuggerNonUserCode, GeneratedCode("ProtoGen", "2.4.1.473"), CompilerGenerated]
    public static class Types
    {
        // Nested Types
        [CompilerGenerated, GeneratedCode("ProtoGen", "2.4.1.473")]
        public enum PacketID
        {
            ID = 19
        }
    }
}
There are many others like that. I've tried doing something like this with each packet (using protobuf-csharp-port):
Console.WriteLine(Thing.ParseFrom(packet.Buffer).ToString());
I'm expecting to see the actual text data. But I either get nothing, an error about invalid packet tags, or an error about it being a "0".
I've also tried using protobuf-net, but it just gives me random errors about incompatibility, unexpected types, etc.:
Console.WriteLine(ProtoBuf.Serializer.Deserialize<Thing>(ms));
What on Earth am I doing wrong here? Is there a better way to, using all the known types in an assembly, simply decode the Protobuf message and see what's inside? Ideally without having to know beforehand what type of message it is?
Thank you so much if you can figure this out!

Guessing from the failed attempts outlined in the question, I believe that you have some misconceptions about the content of your pcap file.
This line in particular
Console.WriteLine(Thing.ParseFrom(packet.Buffer).ToString());
makes me think that you are working under the wrong assumption that a single pcap packet contains the serialized bytes of one single object. Unfortunately, this is not the case.
As you might know, TCP/IP networks use a layered protocol stack, where each layer adds functionality and isolates upper layer protocols from the details of lower layer protocols (and vice versa). This is done by encapsulating the data sent from the upper layers down to the network and de-encapsulating the data as it travels up the stack on the receiving side.
Now, your pcap file contains the raw data as seen by your network interface, i.e. the serialized payload plus all the data added by the application, transport, internet, and link layer.
Now, if you want to de-serialize the objects contained in your dump, you will need to write some code that removes all the headers of the link layer and internet protocols, (un-)does the work of the transport protocol and reassembles the stream of bytes that was sent over the network.*
Next, you will need to analyze the resulting byte dump and make some sophisticated guesses about the design of the application level protocol. Does it implement a handshake when it starts communicating? Does it send a checksum together with the actual payload? Was the data compressed before it was sent over the network? Does the application encrypt the data prior to sending it? If TCP was used as the transport protocol, how is message framing implemented etc. Of course, if you have access to the source code of the application that generated the data (or at least the application binaries), then you can just read the code (or reverse engineer the binaries) to figure this part out.
Once you are at this point you are in a position to interpret the raw data. All that is left is to write some code that extracts the relevant bytes, feeds it to the protocol-buffer deserializer and voilà, you have your objects back!
(* And there are other minor issues like fragmented IP packets, TCP segments that arrived out of order, and TCP retransmissions, of course.)
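To make the de-encapsulation step concrete, here is a minimal sketch in pure C# (no pcap library) that strips the Ethernet and IPv4 headers from one raw frame and returns the TCP payload. It handles only the simplest case (untagged Ethernet II, IPv4, a single unfragmented segment); a real tool would use a capture library such as SharpPcap plus a proper TCP reassembly layer, for exactly the reasons above.

```csharp
using System;
using System.Linq;

static class PcapPayload
{
    // Strips the link and internet layer headers from one raw frame.
    // Sketch only: assumes untagged Ethernet II + IPv4 + TCP and a
    // single, unfragmented segment.
    public static byte[] ExtractTcpPayload(byte[] frame)
    {
        const int ethHeaderLen = 14;                    // dst(6) + src(6) + type(2)
        if (frame[12] != 0x08 || frame[13] != 0x00)     // EtherType 0x0800 = IPv4
            throw new NotSupportedException("not IPv4");

        int ipStart = ethHeaderLen;
        int ipHeaderLen = (frame[ipStart] & 0x0F) * 4;  // IHL field, in 32-bit words
        if (frame[ipStart + 9] != 6)                    // IP protocol 6 = TCP
            throw new NotSupportedException("not TCP");

        int tcpStart = ipStart + ipHeaderLen;
        int tcpHeaderLen = (frame[tcpStart + 12] >> 4) * 4; // TCP data offset field

        int ipTotalLen = (frame[ipStart + 2] << 8) | frame[ipStart + 3];
        int payloadStart = tcpStart + tcpHeaderLen;
        int payloadLen = ipStart + ipTotalLen - payloadStart;
        return frame.Skip(payloadStart).Take(payloadLen).ToArray();
    }
}
```

Only after this step (and stream reassembly across segments) are the remaining bytes even candidates for the protobuf deserializer.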
To sum it all up:
It is theoretically possible to write a tool that de-serializes objects that were serialized using protocol-buffers from a pcap dump, provided that the dump contains the full communication between both peers, i.e. packets were not truncated by the tool that generated the dump etc.
In practice however, there are multiple hurdles to overcome that are anything but trivial even for a seasoned practitioner of the art, as such a tool must:
be able to deal with all the complex issues of the lower level protocols of TCP/IP to rebuild the data flow between the peers.
be able to understand the application level protocol that was used to transmit the serialized objects.
Note that point 1 above alone results in the requirement to implement the functionality of a TCP/IP stack, at least in part. The easiest way to accomplish this would probably be to reuse code from an open-source TCP/IP implementation such as the one found in the Linux or *BSD kernel. Many tools that do similar things, like reconstructing HTTP traffic from capture files, do exactly this. (See e.g. Justsniffer.)

Related

How to deal with network data packets in a client/server application code?

I'm creating a client/server application using C# and I would like to know the best approach to deal with data packets in C# code.
From what I read I could A) use the Marshal class to construct a safe class from a data packet raw buffer into or B) use raw pointers in unsafe context to directly cast the data packet buffer into a struct and use it on my code.
I can see big problems in both approaches. For instance using Marshal seems very cumbersome and hard to maintain (compared to the pointer solution) and on the other hand the use of pointers in C# is very limited (e.g. I can't even declare a fixed size array of a non-primitive type which is a deal breaker in such application).
That said I would like to know which of these two approaches is better to deal with network data packets and if there is any other better approach to this. I'm open to any other possible solution.
PS.: I'm actually creating a custom client to an already existing client/server application so the communication protocol between client and server is already created and can not be modified. So my C# client need to adhere to this already existing protocol to its lower level details (eg. binary offsets of each information in the data packets are fixed and need to be respected).
The best approach for dealing with network packets is length-prefix framing: attach a header containing the payload length to the front of each packet. The receiving client always reads the header first, converts it to the actual length, and then sets its receive buffer to the length announced in the header. This works without losing packets and without hard-coding buffer sizes: you don't need a fixed-size array for the incoming bytes, because the header tells you dynamically how many bytes to expect.
Like this:
public void SendPackettoNetwork(AgentTransportBO ObjCommscollection, Socket socket)
{
    try
    {
        byte[] ActualBufferMessage = PeersSerialization(ObjCommscollection);
        byte[] ActualBufferMessagepayload = BitConverter.GetBytes(ActualBufferMessage.Length);
        byte[] actualbuffer = new byte[ActualBufferMessage.Length + 4];
        Buffer.BlockCopy(ActualBufferMessagepayload, 0, actualbuffer, 0, ActualBufferMessagepayload.Length);
        Buffer.BlockCopy(ActualBufferMessage, 0, actualbuffer, ActualBufferMessagepayload.Length, ActualBufferMessage.Length);
        Logger.WriteLog("Byte to Send :" + actualbuffer.Length, LogLevel.GENERALLOG);
        socket.Send(actualbuffer);
    }
    catch (Exception ex)
    {
        Logger.WriteException(ex);
    }
}
Just pass in your transport class object and use whatever serialization approach you prefer; here I used BinaryFormatter serialization of the class object.
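The receiving side of the length-prefix scheme described above can be sketched like this: read the 4-byte header first, then exactly that many payload bytes. The loop matters because Stream.Read may legally return fewer bytes than requested.

```csharp
using System;
using System.IO;

static class Framing
{
    // Reads one length-prefixed message: a 4-byte length header
    // (little-endian, as produced by BitConverter.GetBytes above)
    // followed by exactly that many payload bytes.
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] header = ReadExactly(stream, 4);
        int length = BitConverter.ToInt32(header, 0);
        return ReadExactly(stream, length);
    }

    // Stream.Read may return fewer bytes than requested (especially
    // on a NetworkStream), so keep reading until the buffer is full.
    static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("connection closed mid-message");
            offset += read;
        }
        return buffer;
    }
}
```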

Protobuf Exception When Deserializing Large File

I'm using protobuf to serialize large objects to binary files to be deserialized and used again at a later date. However, I'm having issues when I'm deserializing some of the larger files. The files are roughly ~2.3 GB in size and when I try to deserialize them I get several exceptions thrown (in the following order):
Sub-message not read correctly
Invalid wire-type; this usually means you have over-written a file without truncating or setting the length; see Using Protobuf-net, I suddenly got an exception about an unknown wire-type
Unexpected end-group in source data; this usually means the source data is corrupt
I've looked at the question referenced in the second exception, but that doesn't seem to cover the problem I'm having.
I'm using Microsoft's HPC pack to generate these files (they take a while) so the serialization looks like this:
using (var consoleStream = Console.OpenStandardOutput())
{
Serializer.Serialize(consoleStream, dto);
}
And I'm reading the files in as follows:
private static T Deserialize<T>(string file)
{
using (var fs = File.OpenRead(file))
{
return Serializer.Deserialize<T>(fs);
}
}
The files are two different types. One is about 1GB in size, the other about 2.3GB. The smaller files all work, the larger files do not. Any ideas what could be going wrong here? I realise I've not given a lot of detail, can give more as requested.
Here I need to refer to a recent discussion on the protobuf list:
Protobuf uses int to represent sizes so the largest size it can possibly support is <2G. We don't have any plan to change int to size_t in the code. Users should avoid using overly large messages.
I'm guessing that the cause of the failure inside protobuf-net is basically the same. I can probably change protobuf-net to support larger files, but I have to advise that this is not recommended, because it looks like no other implementation is going to work well with such huge data.
The fix is probably just a case of changing a lot of int to long in the reader/writer layer. But: what is the layout of your data? If there is an outer object that is basically a list of the actual objects, there is probably a sneaky way of doing this using an incremental reader (basically, spoofing the repeated support directly).
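For protobuf-net specifically, one form of that "sneaky" incremental reader already exists: Serializer.DeserializeItems. A sketch under the assumption that the outer object is essentially a list in field 1 (Outer and Item are illustrative types, not from the question):

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
class Item
{
    [ProtoMember(1)]
    public int Value { get; set; }
}

// Illustrative outer object: "basically a list of the actual objects".
[ProtoContract]
class Outer
{
    [ProtoMember(1)]
    public List<Item> Items { get; } = new List<Item>();
}

static class IncrementalRead
{
    // A repeated message field and a sequence of field-1 length-prefixed
    // messages have the same wire format, so the children of Outer can
    // be streamed one element at a time instead of materializing the
    // whole multi-GB graph in memory.
    public static IEnumerable<Item> ReadItems(Stream source)
    {
        return Serializer.DeserializeItems<Item>(source, PrefixStyle.Base128, 1);
    }
}
```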

Message Queues with different message types

I'm investigating Microsoft Message Queues for doing inter-process cross-network messaging. But when I receive a message, I don't know a priori what type of object I'm getting, so the code
queue.Formatter = new XmlMessageFormatter(new Type[] { typeof(Wibble) });
can't be applied before I get the message because I don't know if it's a Wibble. So how do I receive different message types?
You're already using the constructor overload for XmlMessageFormatter that accepts an array of types. So just add all of the types that you're expecting to receive into that array, rather than just one type.
queue.Formatter = new XmlMessageFormatter(new Type[] {
typeof(Wibble),
typeof(Fleem),
typeof(Boo)
});
From TargetTypes:
The instance serialized in the message body must comply with one of the schemas represented in the type array. When you read the message using the Receive method, the method creates an object of the type that corresponds to the schema identified and reads the message body into it.
(Emphasis added)
You might consider not storing your object in the MSMQ message, but instead putting a reference to its persistent location there, if you can. MSMQ has finite space on the message queues, so smaller messages are best.
If you can't do that, you can serialize your object to the message's BodyStream directly, using whatever serializer you like, and store the type name as well, probably best in the message Label.
Something very similar to this (scratched it out here, no IDE on this computer) to put it in, and the analogous action on the way out:
public void FormatObject(object toFormat, Message message)
{
    var serializer = new XmlSerializer(toFormat.GetType());
    var stream = new MemoryStream();
    serializer.Serialize(stream, toFormat);
    stream.Position = 0; // rewind so the receiver reads from the start
    // don't dispose the stream
    message.BodyStream = stream;
    message.Label = toFormat.GetType().AssemblyQualifiedName;
}
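The "analogous action on the way out" could look like the sketch below. It assumes, as above, that the Label holds an assembly-qualified type name; Wibble is just a stand-in payload type for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Stand-in payload type for illustration (the question's "Wibble").
public class Wibble
{
    public int N;
}

static class BodyFormatter
{
    // Rebuilds the object from a body stream, using the
    // assembly-qualified type name that was stored in the Label.
    public static object ReadObject(Stream bodyStream, string label)
    {
        var type = Type.GetType(label, throwOnError: true);
        var serializer = new XmlSerializer(type);
        bodyStream.Position = 0; // rewind before reading
        return serializer.Deserialize(bodyStream);
    }
}
```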
There is a great amount of misinformation running around on MSMQ, primarily because the Microsoft documentation is frighteningly sparse on how to design a message send/receive properly. I have both of the MSMQ books published on this subject and I'm still searching for sensible designs on the internet.
So, neither of these references says that there is a one-message-type-per-queue requirement. And that would make PeekMessage and its variants unnecessary and even stupid. Microsoft is vague and difficult in its documentation, but I've worked there and they are never stupid.
There is a constant irritating suggestion to use a CLSID as an identifier, a practice that is annoyingly short-sighted. How about embedding the message type in the LABEL? Then use PeekMessage to run up the queue until you find a message meant expressly for your particular consumer, with a message type you can use to set up the formatter and receive the message properly on the first try.
I know this makes for a complex code set, but would you rather do without? Or would you actually try to implement the suggestion of the responder above, implying that a system of 200 users with 200 message types should create 80,000 queues to manage all the one-to-one requirements? Some people just don't think these things through.
As joocer notes in a comment: use a different queue for different message types.
Alternatively you could agree with the message senders that all messages will be XML (anything that doesn't parse as XML is rejected). Then also agree to some basics of the XML schema: a header element with a message type (and version).
Then process (either yourself of via a serialiser) into the internal type.
Of course in many cases – where there is no real benefit to a deserialisation – just read the content of the XML as required.

C# - Serializing Packets Over a Network

I am developing a networked application that sends a lot of packets. Currently, my method of serialization is just a hack where it takes a list of objects and converts them into a string delimited by a pipe character '|' and flushes it down the network stream (or just sends it out through UDP).
I am looking for a cleaner solution to this in C# while minimizing
packet size (so no huge XML serialization).
My experience with BinaryFormatter is that it is SLOW. I am also considering compressing my packets by encoding them as base64 strings and then decoding them on the client side. I would like some input on how this will affect the performance of my application.
Also, another quick question:
My setup creates 2 sockets (one TCP and one UDP) and the client connects individually to these two sockets. Data is flushed down either one based on need (TCP for important stuff, UDP for unimportant stuff). This is my first time using TCP and UDP simultaneously and I was wondering if there is a more unified method, although it does not seem so.
Thanks as always for the awesome support.
I would use a binary protocol similar to Google's Protocol Buffers. Using Jon Skeet's protobuf-csharp-port, one can use the WriteDelimitedTo and MergeDelimitedFrom methods on IMessage and IBuilder respectively. These prefix the message with the number of bytes so that it can be consumed on the other end. Defining messages is really easy:
message MyMessage {
    required int32 number = 1;
}
Then you build the C# classes with ProtoGen.exe and just go to town. One of the big benefits to protobuffers (specifically protobuf-csharp-port) is that not every endpoint needs to be upgraded at the same time. New fields can be added and consumed by previous versions without error. This version independence can be very powerful, but can also bite you if you're not planning for it ;)
You could look into using ProtoBuf for the serialization.
I personally have used following system:
Have an abstract Packet class that all packets are derived from. The Packet class defines two virtual methods:
void SerializeToStream(BinaryWriter serializationStream)
void BuildFromStream(BinaryReader serializationStream)
This manual serialization makes it possible to create small-sized packets.
Before sending to the socket, packets are length-prefixed and prefixed with a unique packet type id number. The receiving end can then use Activator.CreateInstance to build the appropriate packet and call BuildFromStream to reconstruct it.
Example packet:
class LocationUpdatePacket : Packet
{
    public int X;
    public int Y;
    public int Z;

    public override void SerializeToStream(BinaryWriter serializationStream)
    {
        serializationStream.Write(X);
        serializationStream.Write(Y);
        serializationStream.Write(Z);
    }

    public override void BuildFromStream(BinaryReader serializationStream)
    {
        X = serializationStream.ReadInt32();
        Y = serializationStream.ReadInt32();
        Z = serializationStream.ReadInt32();
    }
}
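The receiving-end dispatch described above (type id, then Activator.CreateInstance, then BuildFromStream) might be sketched like this. PacketFactory and PingPacket are hypothetical names invented for the example, and the abstract base mirrors the Packet class described earlier:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

abstract class Packet
{
    public abstract void SerializeToStream(BinaryWriter serializationStream);
    public abstract void BuildFromStream(BinaryReader serializationStream);
}

static class PacketFactory
{
    // Hypothetical registry mapping the unique packet type id
    // (sent before each packet) to its concrete CLR type.
    static readonly Dictionary<int, Type> registry = new Dictionary<int, Type>();

    public static void Register(int id, Type packetType) => registry[id] = packetType;

    // Reads the type id, instantiates the matching packet via
    // Activator.CreateInstance, and lets it rebuild itself.
    public static Packet Read(BinaryReader reader)
    {
        int id = reader.ReadInt32();
        var packet = (Packet)Activator.CreateInstance(registry[id]);
        packet.BuildFromStream(reader);
        return packet;
    }
}

// Minimal concrete packet used to illustrate the round trip.
class PingPacket : Packet
{
    public int Sequence;
    public override void SerializeToStream(BinaryWriter w) => w.Write(Sequence);
    public override void BuildFromStream(BinaryReader r) => Sequence = r.ReadInt32();
}
```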
I am developing a networked application that sends a lot of packets
Check out networkComms.net, an open source network communication library; it might save you a fair bit of time. It incorporates protobuf for serialisation, an example of which is here, line 408.

Google Protocol Buffers Serialization hangs writing 1GB+ data

I am serializing a large data set using protocol buffer serialization. When my data set contains 400000 custom objects of combined size around 1 GB, serialization returns in 3~4 seconds. But when my data set contains 450000 objects of combined size around 1.2 GB, serialization call never returns and CPU is constantly consumed.
I am using .NET port of Protocol Buffers.
Looking at the new comments, this appears to be (as the OP notes) MemoryStream capacity limited. A slight annoyance in the protobuf spec is that since sub-message lengths are variable and must prefix the sub-message, it is often necessary to buffer portions until the length is known. This is fine for most reasonable graphs, but if there is an exceptionally large graph (except for the "root object has millions of direct children" scenario, which doesn't suffer) it can end up doing quite a bit in-memory.
If you aren't tied to a particular layout (perhaps due to .proto interop with an existing client), then a simple fix is as follows: on child (sub-object) properties (including lists / arrays of sub-objects), tell it to use "group" serialization. This is not the default layout, but it says "instead of using a length-prefix, use a start/end pair of tokens". The downside of this is that if your deserialization code doesn't know about a particular object, it takes longer to skip the field, as it can't just say "seek forwards 231413 bytes" - it instead has to walk the tokens to know when the object is finished. In most cases this isn't an issue at all, since your deserialization code fully expects that data.
To do this:
[ProtoMember(1, DataFormat = DataFormat.Group)]
public SomeType SomeChild { get; set; }
....
[ProtoMember(4, DataFormat = DataFormat.Group)]
public List<SomeOtherType> SomeChildren { get { return someChildren; } }
The deserialization in protobuf-net is very forgiving (by default there is an optional strict mode), and it will happily deserialize groups in place of length-prefix, and length-prefix in place of groups (meaning: any data you have already stored somewhere should work fine).
1.2 GB of memory is dangerously close to the managed memory limit for 32-bit .NET processes. My guess is the serialization triggers an OutOfMemoryException and all hell breaks loose.
You should try to use several smaller serializations rather than a gigantic one, or move to a 64bit process.
Cheers,
Florian
