Protobuf-net lazy streaming deserialization of fields - C#

Overall aim: to skip a very long field when deserializing, and, when that field is later accessed, to read elements from it directly from the stream without loading the whole field.
Example classes: the object being serialized/deserialized is FatPropertyClass.
[ProtoContract]
public class FatPropertyClass
{
    [ProtoMember(1)]
    private int smallProperty;

    [ProtoMember(2)]
    private FatArray2<int> fatProperty;

    [ProtoMember(3)]
    private int[] array;

    public FatPropertyClass()
    {
    }

    public FatPropertyClass(int sp, int[] fp)
    {
        smallProperty = sp;
        fatProperty = new FatArray2<int>(fp);
    }

    public int SmallProperty
    {
        get { return smallProperty; }
        set { smallProperty = value; }
    }

    public FatArray2<int> FatProperty
    {
        get { return fatProperty; }
        set { fatProperty = value; }
    }

    public int[] Array
    {
        get { return array; }
        set { array = value; }
    }
}
[ProtoContract]
public class FatArray2<T>
{
    [ProtoMember(1, DataFormat = DataFormat.FixedSize)]
    private T[] array;

    private Stream sourceStream;
    private long position;

    public FatArray2()
    {
    }

    public FatArray2(T[] array)
    {
        this.array = new T[array.Length];
        Array.Copy(array, this.array, array.Length);
    }

    [ProtoBeforeDeserialization]
    private void BeforeDeserialize(SerializationContext context)
    {
        position = ((Stream)context.Context).Position;
    }

    public T this[int index]
    {
        get
        {
            // logic to get the relevant index from the stream.
            return default(T);
        }
        set
        {
            // only relevant when the full array is available, for example.
        }
    }
}
I can deserialize like so:
FatPropertyClass d = model.Deserialize(fileStream, null, typeof(FatPropertyClass), new SerializationContext() { Context = fileStream }) as FatPropertyClass;
where the model can be, for example:
RuntimeTypeModel model = RuntimeTypeModel.Create();
MetaType mt = model.Add(typeof(FatPropertyClass), false);
mt.AddField(1, "smallProperty");
mt.AddField(2, "fatProperty");
mt.AddField(3, "array");
MetaType mtFat = model.Add(typeof(FatArray2<int>), false);
This will skip the deserialization of array in FatArray2<T>. However, I then need to read random elements from that array at a later time. One thing I tried is to remember the stream position before deserialization, in the BeforeDeserialize(SerializationContext context) method of FatArray2<T>, as in the above code: position = ((Stream)context.Context).Position;. However, this always seems to be the end of the stream.
How can I remember the stream position where the fatProperty data begins, and how can I read from it at a random index?
Note: the parameter T in FatArray2<T> can be of other types marked with [ProtoContract], not just primitives. Also, there could be multiple properties of type FatArray2<T> at various depths in the object graph.
Method 2: serialize the FatArray2<T> fields after the serialization of the containing object. So, serialize FatPropertyClass with a length prefix, then serialize with a length prefix each fat array it contains. Mark all of these fat-array properties with an attribute, and at deserialization we can remember the stream position for each of them.
Then the question is: how do we read primitives out of it? This works OK for classes, using T item = Serializer.DeserializeItems<T>(sourceStream, PrefixStyle.Base128, Serializer.ListItemTag).Skip(index).First(); to get the item at index index. But how does this work for primitives? An array of primitives does not seem to be deserializable using DeserializeItems.
Is DeserializeItems with LINQ used like that even OK? Does it do what I assume it does (internally skip through the stream to the correct element - at worst reading each length prefix and skipping over it)?
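To make Method 2 concrete for the class-element case, here is a minimal, untested sketch. It assumes each element of the fat array is appended to the stream as its own length-prefixed item; the helper names and the startOffset bookkeeping are mine, not part of protobuf-net:
using System.IO;
using System.Linq;
using ProtoBuf;
static class FatArrayStreaming
{
    // Write each element as a separate length-prefixed item so it can be
    // enumerated lazily later. Serializer.ListItemTag is protobuf-net's
    // standard field number for list items.
    public static void WriteItems<T>(Stream destination, T[] items)
    {
        foreach (T item in items)
        {
            Serializer.SerializeWithLengthPrefix(destination, item, PrefixStyle.Base128, Serializer.ListItemTag);
        }
    }
    // Read the element at 'index', starting from a remembered offset.
    // Note: Skip still enumerates (and therefore deserializes) each
    // preceding item; it does not merely hop over the length prefixes.
    public static T ReadItem<T>(Stream source, long startOffset, int index)
    {
        source.Position = startOffset;
        return Serializer.DeserializeItems<T>(source, PrefixStyle.Base128, Serializer.ListItemTag)
                         .Skip(index)
                         .First();
    }
}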
Regards,
Iulian

This question depends an awful lot on the actual model - it isn't a scenario that the library specifically targets to make convenient. I suspect that your best bet here would be to write the reader manually using ProtoReader. Note that there are some tricks when it comes to reading selected items if the outermost object is a List<SomeType> or similar, but internal objects are typically either simply read or skipped.
By starting again from the root of the document via ProtoReader, you could seek fairly efficiently to the nth item. I can do a concrete example later if you like (I haven't leapt in yet, in case it wouldn't actually be useful). For reference, the reason the stream's position isn't useful here is: the library aggressively over-reads and buffers data, unless you specifically tell it to limit its length. This is because data like "varint" is hard to read efficiently without lots of buffering, as it would otherwise end up being a lot of individual calls to ReadByte(), rather than just working with a local buffer.
This is a completely untested version of reading the n-th array item of the sub-property directly from a reader; note that it would be inefficient to call this lots of times one after the other, but it should be obvious how to change it to read a range of consecutive values, etc:
static int? ReadNthArrayItem(Stream source, int index, int maxLen)
{
    using (var reader = new ProtoReader(source, null, null, maxLen))
    {
        int field, count = 0;
        while ((field = reader.ReadFieldHeader()) > 0)
        {
            switch (field)
            {
                case 2: // fat property; a sub-object
                    var tok = ProtoReader.StartSubItem(reader);
                    while ((field = reader.ReadFieldHeader()) > 0)
                    {
                        switch (field)
                        {
                            case 1: // the array field
                                if (count++ == index)
                                    return reader.ReadInt32();
                                reader.SkipField();
                                break;
                            default:
                                reader.SkipField();
                                break;
                        }
                    }
                    ProtoReader.EndSubItem(tok, reader);
                    break;
                default:
                    reader.SkipField();
                    break;
            }
        }
    }
    return null;
}
Finally, note that if this is a large array, you might want to use "packed" arrays (see the protobuf documentation; basically, this stores them without the per-item header). This would be a lot more efficient, but note that it requires slightly different reading code. You enable packed arrays by adding IsPacked = true to the [ProtoMember(...)] for that array.
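For illustration only, enabling packed encoding looks like this; PackedExample is a made-up type, and packed encoding only applies to arrays/lists of primitive types:
[ProtoContract]
public class PackedExample
{
    // Packed encoding writes the whole array as a single length-prefixed field
    // instead of one field header per element.
    [ProtoMember(1, IsPacked = true)]
    public int[] Values { get; set; }
}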

Related

How to manipulate structure (array inside)?

This is the structure:
public struct ProfilePoint
{
    public double x;
    public double z;
    byte intensity;
}
It is used inside a callback function (I deleted most of it, so it won't make much sense on its own; there is a for loop that cycles through every point (arrayIndex) that was scanned on a surface and processes them; the result is stored inside profileBuffer):
public static void onData(KObject data)
{
    if (points[arrayIndex].x != -32768)
    {
        profileBuffer[arrayIndex].x = 34334;
        profileBuffer[arrayIndex].z = 34343;
        validPointCount++;
    }
    else
    {
        profileBuffer[arrayIndex].x = 32768;
        profileBuffer[arrayIndex].z = 32768;
    }
}
I would like to process the data inside profileBuffer (both arrays, x & z).
So far I was "able" to create a function that gets one value from profileBuffer with no error from Visual Studio:
public static int ProcessProfile(double dataProfile)
{
    int test = 1;
    return test;
}
Putting this line:
ProcessProfile(profileBuffer[1].x);
into onData() results in no error, but that's just one value. Ideally, I would like to have the whole array. What confuses me is that every value stored inside profileBuffer is a double (forget intensity), but they are stored in an array. Yet I can't pass the data like ProcessProfile(profileBuffer.x); I have to specify an index... Is it possible to manipulate a vector (line) of data? That would be ideal for me.
Sorry for the poor explanation / long post... I am quite a newb.
You need:
public static int ProcessProfile(ProfilePoint[] points)
{
    var x = points[4].x;
    // ... process the rest of the array here ...
    return 0;
}
and do
ProcessProfile(profileBuffer);
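If it helps, here is a minimal sketch of a ProcessProfile that works over the whole buffer; the averaging logic and the -32768 "invalid point" check are only examples of what the processing might look like:
public static double ProcessProfile(ProfilePoint[] points)
{
    // Example: average the z values of the valid points.
    double sum = 0;
    int count = 0;
    foreach (ProfilePoint p in points)
    {
        if (p.x != -32768)   // same sentinel the callback already checks
        {
            sum += p.z;
            count++;
        }
    }
    return count > 0 ? sum / count : 0;
}
You would then call it as double averageZ = ProcessProfile(profileBuffer);.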

Which data structure allows adding from both sides, but enforces a capacity?

I require a data structure that has a capacity, but that also allows adding an item from either the front or the back. Each time an item is added, one item must be removed from the opposite end. My first thought was that this sounds very similar to a deque.
Is there an existing data structure that provides this functionality, or do I have to create it myself? If it does exist, does the .Net library have an implementation?
Thanks
I would suggest that you use a LinkedList, which gives you all the functionality you need. There are AddFirst and AddLast methods that let you add items at the front or back, and RemoveFirst and RemoveLast methods that let you remove from the front and back.
And, of course, there's a Count property that tells you how many items are in the list, so you can enforce your capacity requirement.
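As a sketch only (BoundedDeque and its members are made-up names, not a framework type), the capacity rule could be layered on top of LinkedList like this:
using System.Collections.Generic;
public class BoundedDeque<T>
{
    private readonly LinkedList<T> items = new LinkedList<T>();
    private readonly int capacity;
    public BoundedDeque(int capacity) { this.capacity = capacity; }
    public void AddFront(T item)
    {
        items.AddFirst(item);
        if (items.Count > capacity) items.RemoveLast();   // drop from the opposite end
    }
    public void AddBack(T item)
    {
        items.AddLast(item);
        if (items.Count > capacity) items.RemoveFirst();   // drop from the opposite end
    }
    public int Count { get { return items.Count; } }
}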
Not tested, but I think something like this would work:
public class Stack<T>
{
    private T[] arr;
    readonly int m_Size;
    int m_StackPointer = 0;

    public T this[int i]
    {
        get
        {
            if (i >= m_Size)
                throw new IndexOutOfRangeException();
            int pointer = i + m_StackPointer;
            if (pointer >= m_Size) pointer -= m_Size;
            return arr[pointer];
        }
    }

    public void AddStart(T addItem)
    {
        m_StackPointer--;
        if (m_StackPointer < 0) m_StackPointer = m_Size - 1;
        arr[m_StackPointer] = addItem;
    }

    public void AddEnd(T addItem)
    {
        arr[m_StackPointer] = addItem;
        m_StackPointer++;
        if (m_StackPointer >= m_Size) m_StackPointer = 0;
    }

    public Stack()
        : this(100)
    { }

    public Stack(int size)
    {
        m_Size = size;
        arr = new T[size];
    }
}
I have decided that the best option is to use an array of T for the backing structure, and to keep a Front reference and a Back reference to represent the virtual start and end of the structure. I will also store a direction enum that effectively indicates which way the structure is facing (whether the last add operation was at the Front or the Back, or a default if no add operations have been performed). This way, I can also implement an indexer with O(1) complexity, rather than iterating the collection.
Thanks for all of the responses. For some reason, I thought that I would need to move the data around in the backing structure; I didn't realize that this option is possible in C#.

What collection object is appropriate for fixed ordering of values?

Scenario: I am tracking several performance counters and have a CounterDescription[] that correlates to a DataSnapshot[], where CounterDescription[n] describes the data loaded within DataSnapshot[n].
I want to expose an easy to use API within C# that will allow for the easy and efficient expansion of the arrays.
Simplified example (it gets more complex)
CounterDescription[0] = Humidity;
DataSnapshot[0] = .9;
CounterDescription[1] = Temp;
DataSnapshot[1] = 63;
Note how my intent is to correlate many DataSnapshots with a DateTime reference, and to use the offset of the data to refer to its meaning. This was determined to be the most efficient way to store the data on the back end, and it is now reflected in the following structure:
public class myDataObject
{
    [DataMember]
    public SortedDictionary<DateTime, float[]> Pages { get; set; }

    /// <summary>
    /// An array that identifies what each position in the array is supposed to be
    /// </summary>
    [DataMember]
    public CounterDescription[] Counters { get; set; }
}
How will myDataObject be used?
I will frequently search for a counter by string name, and use its position to determine at what offset a particular value will be saved. I can use a homegrown extension method to enumerate the object, or leverage the framework if ordering is guaranteed.
Also, I will need to expand each of these arrays (float[] and CounterDescription[]) as new sensors are added, but whatever data already exists must stay at that relative offset. I don't want the serialized version of this object to confuse Temp (offset 1) with Humidity (offset 0).
Which .NET objects support this fixed ordering, expansion, and enumeration (and optional searching by string)? My guess is to use one of these objects:
Array[], LinkedList<T>, and List<T>
Use a Dictionary<string, double> so that each name (string) maps to a value (double):
var counters = new Dictionary<string, double>();
counters["Humidity"] = 0.9;
counters["Temp"] = 63;
And use a service that gets and sets the counter values:
[OperationContract]
public double GetCounter(string name)
{
    return Counters[name];
}

[OperationContract]
public void SetCounter(string name, double value)
{
    Counters[name] = value;
}
You can use your CounterDescription and/or DataSnapshot classes in the same way, but make sure that the class you use as the key (probably CounterDescription) overrides Object.Equals() and Object.GetHashCode() with a proper implementation.
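For illustration, a minimal value-equality implementation for the key class might look like this; the Name property is an assumption about what uniquely identifies a counter:
public class CounterDescription
{
    public string Name { get; set; }
    public override bool Equals(object obj)
    {
        var other = obj as CounterDescription;
        return other != null && other.Name == Name;
    }
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}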
If CounterDescription is a string or an enum, then Dictionary seems like a good fit:
Dictionary<string, double> Counters = new Dictionary<string, double>();
// Then initialize it ...
Counters.Add("Humidity", 0);
Counters.Add("Temp", 0);
// To update:
Counters["Humidity"] = 0.9;
// To query
double humidity = Counters["Humidity"];
You can do the same sort of thing with an enum for the key, rather than a string.
If your CounterDescription type is a complex object, you can still use it as a key, but you'll need to implement IComparable or provide a comparison function.
Any list (IList) has ordered values. I always assume that mere IEnumerables have no strict order underneath; although they usually do, you can't guarantee it. I agree with the others that a Dictionary (or some other IDictionary) is a good fit.
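If you want to keep the fixed offsets (so existing float[] snapshots stay aligned) and still get lookup by name, one possible sketch, with made-up member names, is an append-only List plus a name-to-index Dictionary:
using System.Collections.Generic;
public class CounterIndex
{
    // Offsets are fixed: a counter keeps its index for the lifetime of the
    // index, so previously serialized float[] snapshots stay aligned.
    private readonly List<string> names = new List<string>();
    private readonly Dictionary<string, int> indexByName = new Dictionary<string, int>();
    public int Add(string name)
    {
        int index = names.Count;
        names.Add(name);
        indexByName[name] = index;
        return index;
    }
    public int IndexOf(string name)
    {
        return indexByName[name];
    }
    public int Count { get { return names.Count; } }
}
Usage would be along the lines of snapshot[counters.IndexOf("Humidity")] = 0.9f;.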

Protobuf-net r282 having problems deserializing object serialized with r249

I've just updated from r249 to r282. Other than replacing the dll I've made no changes. Unfortunately, now deserializing the objects created before the update takes significantly longer. What used to take two seconds now takes five minutes.
Were there syntax changes between versions? Is there anything it no longer supports?
My classes are all using ProtoContract, ProtoMember, and ProtoInclude. I am running VS2010. As far as I was concerned there were no problems with my protocol buffer code. I'm only trying to upgrade because I figured it's good to have the most recent version.
Edit - 2010.09.09
One of the properties of my object is an array of ushorts. I've just noticed that this property did not serialize/deserialize properly with r282. The resulting values of the array are all zeros. The array had values before being serialized (r282) but not after deserialization (r282).
It turns out that despite my efforts, yes there was a breaking change in data format in one of the earlier builds. This only impacts ushort data, which was omitted from the handling at one point. This is regrettable, but the good news is that no data is lost - it is simply a bit inconvenient to access (it is essentially written via a string at the moment).
Here's my suggested workaround; for a member like:
[ProtoBuf.ProtoMember(1)]
public ushort[] Data { get; set; }
Replace that with:
[ProtoBuf.ProtoMember(1)]
private string[] LegacyData { get; set; }
private bool LegacyDataSpecified { get { return false; } set { } }

/* where 42 is just an unused new field number */
[ProtoBuf.ProtoMember(42, Options = MemberSerializationOptions.Packed)]
public ushort[] Data { get; set; }

[ProtoBuf.ProtoAfterDeserialization]
private void SerializationCallback()
{
    if (LegacyData != null && LegacyData.Length > 0)
    {
        ushort[] parsed = Array.ConvertAll<string, ushort>(
            LegacyData, ushort.Parse);
        if (Data != null && Data.Length > 0)
        {
            int oldLen = parsed.Length;
            Array.Resize(ref parsed, parsed.Length + Data.Length);
            Array.Copy(Data, 0, parsed, oldLen, Data.Length);
        }
        Data = parsed;
    }
    LegacyData = null;
}
This imports old-style data into LegacyData and merges it after deserialization, while new-style data is written from (and read into) Data. Faster, smaller, and it supports both old and new data.

Reading in a binary file containing an unknown quantity of structures (C#)

Ok, so I currently have a binary file containing an unknown number of structs like this:
private struct sTestStruct
{
    public int numberOne;
    public int numberTwo;
    public int[] numbers; // This is ALWAYS 128 ints long.
    public bool trueFalse;
}
So far, I use the following to read all the structs into a List<>:
List<sTestStruct> structList = new List<sTestStruct>();
while (binReader.BaseStream.Position < binReader.BaseStream.Length)
{
    sTestStruct temp = new sTestStruct();
    temp.numberOne = binReader.ReadInt32();
    temp.numberTwo = binReader.ReadInt32();
    temp.numbers = new int[128];
    for (int i = 0; i < temp.numbers.Length; i++)
    {
        temp.numbers[i] = binReader.ReadInt32();
    }
    temp.trueFalse = binReader.ReadBoolean();
    // Add to List<>
    structList.Add(temp);
}
I don't really want to do this, as only one of the structs can be displayed to the user at once, so there is no point reading in more than one record at a time. So I thought that I could read in a specific record using something like:
fileStream.Seek(sizeof(sTestStruct) * index, SeekOrigin.Begin);
But it won't let me, as it doesn't know the size of sTestStruct; the structure won't let me predefine the array size, so how do I go about this?
The sTestStruct is not stored in one consecutive area of memory, and sizeof(sTestStruct) is not directly related to the size of the records in the file. The numbers member is a reference to an array that you allocate yourself in your reading code.
But you can easily specify the record size in code, since it is a constant value. This code will seek to the record at index; you can then read one record using the body of your loop.
const Int32 RecordSize = (2 + 128)*sizeof(Int32) + sizeof(Boolean);
fileStream.Seek(RecordSize * index, SeekOrigin.Begin);
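Putting the seek together with the existing loop body, a helper along these lines (the name ReadRecordAt is just illustrative, not from the question) reads a single record on demand:
static sTestStruct ReadRecordAt(FileStream fileStream, int index)
{
    const Int32 RecordSize = (2 + 128) * sizeof(Int32) + sizeof(Boolean);
    fileStream.Seek((long)RecordSize * index, SeekOrigin.Begin);
    // The reader is deliberately not disposed here, so the caller keeps
    // ownership of the underlying stream.
    var binReader = new BinaryReader(fileStream);
    sTestStruct temp = new sTestStruct();
    temp.numberOne = binReader.ReadInt32();
    temp.numberTwo = binReader.ReadInt32();
    temp.numbers = new int[128];
    for (int i = 0; i < temp.numbers.Length; i++)
    {
        temp.numbers[i] = binReader.ReadInt32();
    }
    temp.trueFalse = binReader.ReadBoolean();
    return temp;
}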
If you have many different fixed-size records and you are afraid that manually entering the record size for each record is error-prone, you could devise a scheme based on reflection and custom attributes.
Create an attribute to define the size of arrays:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
sealed class ArraySizeAttribute : Attribute {
    public ArraySizeAttribute(Int32 length) {
        Length = length;
    }
    public Int32 Length { get; private set; }
}
Use the attribute on your record type:
private struct sTestStruct {
    public int numberOne;
    public int numberTwo;
    [ArraySize(128)]
    public int[] numbers; // This is ALWAYS 128 ints long.
    public bool trueFalse;
}
You can then compute the size of the record using this sample code:
Int32 GetRecordSize(Type recordType) {
    return recordType.GetFields().Select(fieldInfo => GetFieldSize(fieldInfo)).Sum();
}

Int32 GetFieldSize(FieldInfo fieldInfo) {
    if (fieldInfo.FieldType.IsArray) {
        // The size of an array is the size of the array elements multiplied by the
        // length of the array.
        var arraySizeAttribute = (ArraySizeAttribute) Attribute.GetCustomAttribute(fieldInfo, typeof(ArraySizeAttribute));
        if (arraySizeAttribute == null)
            throw new InvalidOperationException("Missing ArraySizeAttribute on array.");
        return GetTypeSize(fieldInfo.FieldType.GetElementType())*arraySizeAttribute.Length;
    }
    else
        return GetTypeSize(fieldInfo.FieldType);
}

Int32 GetTypeSize(Type type) {
    if (type == typeof(Int32))
        return 4;
    else if (type == typeof(Boolean))
        return 1;
    else
        throw new InvalidOperationException("Unexpected type.");
}
Use it like this:
var recordSize = GetRecordSize(typeof(sTestStruct));
fileStream.Seek(recordSize * index, SeekOrigin.Begin);
You will probably have to expand a little on this code to use it in production.
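Expanding on it a little, a reflection-driven reader in the same spirit (a sketch only, assuming the fields are returned in declaration order and are limited to Int32, Boolean, and Int32[] with ArraySize, as in the GetRecordSize code above) could look like:
object ReadRecord(BinaryReader reader, Type recordType) {
    object record = Activator.CreateInstance(recordType);
    foreach (FieldInfo fieldInfo in recordType.GetFields()) {
        if (fieldInfo.FieldType == typeof(Int32)) {
            fieldInfo.SetValue(record, reader.ReadInt32());
        } else if (fieldInfo.FieldType == typeof(Boolean)) {
            fieldInfo.SetValue(record, reader.ReadBoolean());
        } else if (fieldInfo.FieldType == typeof(Int32[])) {
            var sizeAttribute = (ArraySizeAttribute)Attribute.GetCustomAttribute(fieldInfo, typeof(ArraySizeAttribute));
            var values = new Int32[sizeAttribute.Length];
            for (int i = 0; i < values.Length; i++) values[i] = reader.ReadInt32();
            fieldInfo.SetValue(record, values);
        } else {
            throw new InvalidOperationException("Unexpected field type.");
        }
    }
    return record;   // cast back to sTestStruct at the call site
}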
From everything I have read, the way you are doing it is the best method to read in binary data as it has the fewest gotchas where things can go wrong.
Define your struct like this:
struct sTestStruct
{
    public int numberOne;
    public int numberTwo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
    public int[] numbers; // This is ALWAYS 128 ints long.
    public bool trueFalse;
}
And use Marshal.SizeOf(typeof(sTestStruct)).
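For completeness, a sketch of how the marshalling approach could be used to read the nth record; note that matching the layout written by BinaryWriter requires Pack = 1 and a one-byte bool, which are assumptions about the file format:
using System.IO;
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct sTestStruct
{
    public int numberOne;
    public int numberTwo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
    public int[] numbers;
    [MarshalAs(UnmanagedType.I1)]   // BinaryWriter writes a bool as a single byte
    public bool trueFalse;
}
static sTestStruct ReadRecord(Stream stream, int index)
{
    int size = Marshal.SizeOf(typeof(sTestStruct));   // 521 with the layout above
    stream.Seek((long)size * index, SeekOrigin.Begin);
    byte[] buffer = new byte[size];
    stream.Read(buffer, 0, size);
    // Pin the buffer and marshal it back into the struct in one go.
    GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        return (sTestStruct)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(sTestStruct));
    }
    finally
    {
        handle.Free();
    }
}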
