XML deserialization - throwing custom errors - c#

So I have the following property:
private int? myIntField;
[System.Xml.Serialization.XmlElementAttribute(Form = System.Xml.Schema.XmlSchemaForm.Unqualified)]
public int? IntField{
get {
return this.myIntField;
}
set {
this.myIntField = value;
}
}
Now, I am deserializing XML from a POST. If for whatever reason I get a string such as "here is the int field: 55444" instead of 55444, the error I get in response is "Input string was not in a correct format.", which isn't very specific, especially since I will have more than one int field to verify.
Originally, I was planning something like this:
private string myIntField;
[System.Xml.Serialization.XmlElementAttribute(Form = System.Xml.Schema.XmlSchemaForm.Unqualified)]
public int? IntField{
get {
return this.myIntField.CheckValue();
}
set {
this.myIntField = value?.ToString();
}
}
CheckValue performs a TryParse to Int32; if it fails, it returns null and adds an error to a list. However, I can't seem to get this setup working for the generated classes.
Is there a way I can throw a specific error if I am getting strings in place of ints, DateTimes, etc.?

It's easy if you have schema(s) for your XML and validate the XML against them before deserializing. Assuming you have schema(s) for your XML, you can initialize an XmlSchemaSet, add your schema(s) to it, and then:
var document = new XmlDocument();
document.LoadXml(xml); // xml is a string holding the XML
document.Schemas.XmlResolver = null; // if you don't need to resolve external references
document.Schemas.Add(SchemaSet); // SchemaSet: a System.Xml.Schema.XmlSchemaSet instance filled with schemas
document.Validate((sender, args) => { /* args is a ValidationEventArgs; args.Severity and args.Message describe the problem, if there is one */ });
Personally I think this is the better approach, because you can validate the XML and be sure it is correct before deserializing. Otherwise the deserializer will most likely throw an exception when something is wrong, and you will rarely be able to show meaningful feedback to the user.
P.S. I recommend creating schema(s) that describe the XML.
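A self-contained sketch of the schema-first approach, assuming a hypothetical schema in which <Root> holds an optional xs:int element <IntField> (neither name comes from the original). Instead of stopping at the first problem, the handler collects every validation message, and each message refers to the offending element, which is the kind of specific feedback the question asks for:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidationDemo
{
    // Hypothetical schema for illustration: <Root> with an optional xs:int <IntField>.
    private const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Root'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='IntField' type='xs:int' minOccurs='0'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns every validation problem found; an empty list means the XML is valid.
    public static List<string> Validate(string xml)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        var document = new XmlDocument();
        document.LoadXml(xml);
        document.Schemas = schemas;

        var errors = new List<string>();
        // Because a handler is supplied, problems are reported here instead of thrown.
        document.Validate((sender, args) => errors.Add($"{args.Severity}: {args.Message}"));
        return errors;
    }
}
```

With the question's bad input the list is non-empty, while <Root><IntField>55444</IntField></Root> yields an empty list. The message text itself comes from the framework, so it is better shown to the user than parsed.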

The "Input string was not in a correct format" message comes from a standard System.FormatException thrown by a call to int.Parse inside the automatically generated assembly that performs the deserialization. I don't think you can add custom logic to that.
One solution is to do something like this:
[XmlElement("IntField")]
[Browsable(false)] // not displayed in grids
[EditorBrowsable(EditorBrowsableState.Never)] // not displayed by intellisense
public string IntFieldString
{
get
{
return DoSomeConvert(IntField);
}
set
{
IntField = DoSomeOtherConvert(value);
}
}
[XmlIgnore]
public int? IntField { get; set; }
It's not perfect, because the public IntFieldString is still accessible, but at least the "real" IntField property is used only programmatically and not by the XmlSerializer (XmlIgnore), while the property that carries the value back and forth is hidden from programmers (EditorBrowsable) and grids (Browsable), but not from the XmlSerializer.
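DoSomeConvert and DoSomeOtherConvert above are placeholders. Below is one way to fill them in, as a sketch that assumes a root class named Payload and a helper named PayloadReader (both hypothetical). The setter throws a FormatException that names the field. XmlSerializer wraps exceptions thrown by member setters in an InvalidOperationException ("There is an error in XML document..."), so the helper walks the inner exceptions and rethrows the specific message:

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

public class Payload
{
    // Bound by XmlSerializer; converts the raw text and reports which field was bad.
    [XmlElement("IntField")]
    [Browsable(false)]
    [EditorBrowsable(EditorBrowsableState.Never)]
    public string IntFieldString
    {
        get => IntField?.ToString(CultureInfo.InvariantCulture);
        set
        {
            if (string.IsNullOrWhiteSpace(value)) { IntField = null; return; }
            if (!int.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out var parsed))
                throw new FormatException($"IntField: '{value}' is not a valid integer.");
            IntField = parsed;
        }
    }

    // The typed value used by your own code; ignored by XmlSerializer.
    [XmlIgnore]
    public int? IntField { get; set; }
}

public static class PayloadReader
{
    public static Payload Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Payload));
        try
        {
            using var reader = new StringReader(xml);
            return (Payload)serializer.Deserialize(reader);
        }
        catch (InvalidOperationException ex)
        {
            // Surface our specific FormatException instead of the generic wrapper.
            for (Exception inner = ex.InnerException; inner != null; inner = inner.InnerException)
                if (inner is FormatException fe) throw new FormatException(fe.Message, ex);
            throw;
        }
    }
}
```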

I have three approaches for you.
1. Assuming your data is being entered by a user in a user interface, use input validation to ensure the data is valid. It seems odd to allow arbitrary strings to be entered where an integer is expected.
2. Use exactly the approach you suggest above. Here's an example using LINQPad:
void Main()
{
using(var stream = new StringReader(
"<Items><Item><IntValue>1</IntValue></Item></Items>"))
{
var serializer = new XmlSerializer(typeof(Container));
var items = (Container)serializer.Deserialize(stream);
items.Dump();
}
}
[XmlRoot("Items")]
public class Container
{
[XmlElement("Item")]
public List<Item> Items { get; set; }
}
public class Item
{
[XmlElement("IntValue")]
public string _IntValue{get;set;}
[XmlIgnore]
public int IntValue
{
get
{
// TODO: check and throw appropriate exception
return Int32.Parse(_IntValue);
}
}
}
3. Take control of serialization using IXmlSerializable; here's another example:
void Main()
{
using(var stream = new StringReader(
"<Items><Item><IntValue>1</IntValue></Item></Items>"))
{
var serializer = new XmlSerializer(typeof(Container));
var items = (Container)serializer.Deserialize(stream);
items.Dump();
}
}
[XmlRoot("Items")]
public class Container
{
[XmlElement("Item")]
public List<Item> Items { get; set; }
}
public class Item : IXmlSerializable
{
public int IntValue{get;set;}
public void WriteXml (XmlWriter writer)
{
writer.WriteElementString("IntValue", IntValue.ToString());
}
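public void ReadXml (XmlReader reader)
{
reader.ReadStartElement(); // consume the wrapping <Item> element
var v = reader.ReadElementString("IntValue");
// TODO: check and throw appropriate exception
IntValue = int.Parse(v);
reader.ReadEndElement(); // consume </Item>
}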
public XmlSchema GetSchema()
{
return(null);
}
}

Related

Is it possible to optimize large switch statements in C#?

I am working on a WebSocket client application. The server sends messages in JSON format and I want to deserialize them. One string in the JSON data indicates the message type (there are about 50 types today, and there may be more in the future).
So I have written a large switch statement like this:
switch(type){
case "type1":
DoSth<T1>(DeserializeFunction<T1>(message));
break;
case "type2":
DoSth<T2>(DeserializeFunction<T2>(message));
break;
//...
}
Is it possible to optimize this statement?
This is the model:
public record EventMessage<T> where T : IEventExtraBody
{
// this will always be 0
[JsonPropertyName("s")]
public int EventType { get; set; }
[JsonPropertyName("sn")]
public long SerialNumber { get; set; }
[JsonPropertyName("d")]
public EventMessageData<T> Data { get; set; }
public override string ToString()
{
return JsonSerializer.Serialize(this);
}
}
public record EventMessageData<T> where T : IEventExtraBody
{
// Some other properties
[JsonPropertyName("extra")]
public EventMessageExtra<T> Extra { get; set; }
}
public record EventMessageExtra<T> where T : IEventExtraBody
{
[JsonPropertyName("type")]
public string Type { get; set; } // this string indicates the type of message
[JsonPropertyName("body")]
public T Body { get; set; }
}
Body (an example):
public record ExitedGuildEvent : IEventExtraBody
{
[JsonPropertyName("user_id")]
public string UserId { get; set; }
[JsonPropertyName("exited_at")]
public long ExitedAt { get; set; }
}
When message arrived, I use JsonDocument to get the type string.
var typeString = JsonDocument.Parse(message.Text).RootElement.GetProperty("d").GetProperty("extra").GetProperty("type").GetString();
Then, I want to deserialize the message and publish it to MessageHub.
Deserializing the json string and publish:
_messageHub.Publish(JsonSerializer.Deserialize<EventMessage<BodyType>>(message.Text));
And because there are lots of body types, and EventMessage<Type.GetType("TypeClassPath")>(message.Text) is illegal, I wrote a large switch statement.
Maybe I have built a very bad model for this situation. I hope you can give me some advice.
You could replace the switch-case with a hashmap (a Dictionary). To do that, move every case into a separate function. Because the cases are so similar, you can write a factory method that creates these functions and use it to fill out the hashmap:
public class YourHub
{
private IMessageHub _messageHub = new MessageHub();
private Dictionary<string, Action<string, IMessageHub>> _methods;
public YourHub()
{
//fill out the hashmap for all types that you have
//make sure this hashmap is shared between operations
_methods = new Dictionary<string, Action<string, IMessageHub>>()
{
{"key1", CreateAction<EventMessage<ExitedGuildEvent>>() }
};
}
//factory method for the actions
private Action<string, IMessageHub> CreateAction<T>()
{
return (json, hub) => hub.Publish(JsonSerializer.Deserialize<T>(json));
}
public void ProcessMessage(string json)
{
using var document = JsonDocument.Parse(json);
var typeString = document
.RootElement.GetProperty("d")
.GetProperty("extra")
.GetProperty("type")
.GetString();
if (typeString == null || !_methods.TryGetValue(typeString, out var method))
throw new NotSupportedException($"Unsupported message type '{typeString}'.");
method(json, _messageHub);
}
}
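This approach won't give you a huge performance boost with 50 elements, but it looks cleaner. A dictionary lookup is O(1) on average and takes O(n) additional space. The switch is not really O(n), though: the C# compiler already turns a large switch on strings into a hash-based lookup, so the main gain here is readability.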
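The question also notes that EventMessage<Type.GetType(...)> is illegal. A different way around that, sketched here rather than taken from either answer, is to close the generic type at runtime with MakeGenericType and call the non-generic JsonSerializer.Deserialize(string, Type) overload, so the dictionary only needs to map each type string to a body type. The "exited_guild" key, the EventDispatcher class, and the trimmed-down records (EventType and SerialNumber omitted) are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public interface IEventExtraBody { }

public record ExitedGuildEvent : IEventExtraBody
{
    [JsonPropertyName("user_id")] public string UserId { get; set; }
    [JsonPropertyName("exited_at")] public long ExitedAt { get; set; }
}

// Cut-down versions of the question's generic envelope.
public record EventMessage<T> where T : IEventExtraBody
{
    [JsonPropertyName("d")] public EventMessageData<T> Data { get; set; }
}

public record EventMessageData<T> where T : IEventExtraBody
{
    [JsonPropertyName("extra")] public EventMessageExtra<T> Extra { get; set; }
}

public record EventMessageExtra<T> where T : IEventExtraBody
{
    [JsonPropertyName("type")] public string Type { get; set; }
    [JsonPropertyName("body")] public T Body { get; set; }
}

public static class EventDispatcher
{
    // Hypothetical registry: the "type" string in the JSON -> body CLR type.
    private static readonly Dictionary<string, Type> BodyTypes = new()
    {
        ["exited_guild"] = typeof(ExitedGuildEvent),
    };

    public static object Deserialize(string json)
    {
        using var doc = JsonDocument.Parse(json);
        string type = doc.RootElement.GetProperty("d").GetProperty("extra").GetProperty("type").GetString();
        if (type is null || !BodyTypes.TryGetValue(type, out var bodyType))
            throw new NotSupportedException($"Unknown event type '{type}'.");

        // Build EventMessage<bodyType> at runtime instead of one switch case per type.
        Type messageType = typeof(EventMessage<>).MakeGenericType(bodyType);
        return JsonSerializer.Deserialize(json, messageType);
    }
}
```

The result is typed as object, so if your hub's Publish method is generic it would see object rather than the concrete message type; in that case the factory-method dictionary above, which keeps T known at compile time, is the better fit.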
A better solution than a big switch would probably be to refactor DeserializeFunction into an interface and classes.
Register them by type and then resolve them, either with a DI container or with a dictionary that maps each type string to its deserializer:
interface IMessageDeserializer {
object Deserialize(Message message);
}
class Type1Deserializer : IMessageDeserializer {
public object Deserialize(Message message){
// Implementation that returns a Type1
return new Type1(){
};
}
}
// Register your deserializers in a dictionary, preferably reused (a DI container also works, but this is simpler for illustration)
Dictionary<string, IMessageDeserializer> serializers = new Dictionary<string, IMessageDeserializer>();
serializers.Add("type1", new Type1Deserializer());
serializers.Add("type2", new Type2Deserializer());
serializers.Add("type3", new Type3Deserializer());
// When you need it, use it like this:
string type = "type1"; // This is from your other code
var message = GetMessage(); // This is from your other code
IMessageDeserializer serializer = serializers[type];
object deserializedMessage = serializer.Deserialize(message);
// To create your event message, either add a generic type parameter (with a constraint) to IMessageDeserializer so you can pass the result into another function that creates the event message, or simply return the message hub message as JSON directly from your IMessageDeserializer implementation
(I wrote this from memory so I apologise for any mistakes)

How to add a known type to a List<T>?

The overall goal here is this: we have a lot of CSV files of various names and formats stored in Azure Blob Storage, and we need to convert them to lists.
I have an interface:
public interface IGpasData
{
List<T> ConvertToList<T>(StreamReader reader);
}
And then here's an example of a class that implements it:
public class GpasTableOfContent : IGpasData
{
public string TocProp0 { get; set; }
public string TocProp1 { get; set; }
public string TocProp2 { get; set; }
public List<T> ConvertToList<T>(StreamReader reader)
{
List<T> dataList = new List<T>();
while (!reader.EndOfStream)
{
var lineItem = reader.ReadLine();
GpasTableOfContent dataItem = new GpasTableOfContent
{
TocProp0 = lineItem.Split(',')[0],
TocProp1 = lineItem.Split(',')[1],
TocProp2 = lineItem.Split(',')[2]
};
dataList.Add(dataItem);
}
return dataList;
}
}
To continue with the example class above, there is a file called ToC.csv. In a class designed to convert THAT file into a list, I make this call:
List<GpasTableOfContent> gpasToCList = ConvertCloudFileToList<GpasTableOfContent>("ToC.csv", "MyModel");
Some other possible examples:
List<GpasFoo> gpasFooList = ConvertCloudFileToList<GpasFoo>("foo.csv", "MyModel");
List<GpasBar> gpasBarList = ConvertCloudFileToList<GpasBar>("bar.csv", "MyModel");
Here's ConvertCloudFileToList:
private List<T> ConvertCloudFileToList<T>(string fileName, string modelName)
{
// Get the .csv file from the InProgress Directory
string filePath = $"{modelName}/{fileName}";
CloudFile cloudFile = _inProgressDir.GetFileReference(filePath);
List<T> dataList = new List<T>();
// Does the file exist?
if (!cloudFile.Exists())
return dataList;
using (StreamReader reader = new StreamReader(cloudFile.OpenRead()))
{
IGpasData gpasData = (IGpasData)Activator.CreateInstance<T>();
dataList = gpasData.ConvertToList<T>(reader);
}
return dataList;
}
And that brings us back to ConvertToList. The problem is here:
dataList.Add(dataItem);
Can not convert 'GpasFoo' to 'T'
Not sure how to work around this.
Any object that is an IGpasData is expected to be able to produce a list of any given type when provided with a StreamReader. GpasTableOfContent does not fulfill this requirement; it can only produce a list of its own type.
However, it doesn't seem reasonable to make one type of GpasData responsible for converting everything, so I'd suggest moving the type argument from the ConvertToList method to the interface. This way each implementation is only responsible for converting lists of one particular type.
public interface IGpasData<T>
{
List<T> ConvertToList(StreamReader reader);
}
public class GpasTableOfContent : IGpasData<GpasTableOfContent>
{
//...
public List<GpasTableOfContent> ConvertToList(StreamReader reader)
{
//...
}
}
On a side note, creating an empty table of contents and then using it to read from a stream and produce a list of the real table of contents seems very clunky to me. In my opinion, the behaviour of creating these content objects should be moved into its own class.
You can't do what you want here without providing some additional logic. The problem is that you have a string from reading the CSV file, and you want to convert it to a T, but there is no rule for converting a string into any arbitrary type.
One approach would be to change the method to also take a delegate Func that is used to convert each line into a T. Then if, for example, your data is guaranteed to consist of doubles, you could pass t => Double.Parse(t) for that argument. Of course, this approach requires that you change the signature of the interface method you are implementing.
If you are not able to change the signature of the interface method, then all I can suggest is trying to handle a pre-defined set of types and throwing an exception for other types.
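A minimal sketch of the delegate-based variant, assuming a hypothetical static helper (LineConverter.ConvertToList) rather than the original interface method. It accepts any TextReader, so a StreamReader over the cloud file works just as well as a StringReader in a test:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineConverter
{
    // The caller supplies the rule for turning one line of the file into a T.
    public static List<T> ConvertToList<T>(TextReader reader, Func<string, T> parseLine)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (parseLine == null) throw new ArgumentNullException(nameof(parseLine));

        var results = new List<T>();
        string line;
        while ((line = reader.ReadLine()) != null)
            results.Add(parseLine(line));
        return results;
    }
}
```

For a file of doubles, one per line, you would call LineConverter.ConvertToList(reader, line => double.Parse(line, CultureInfo.InvariantCulture)); the type T is inferred from the lambda's return type.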
As others have pointed out, this design is flawed:
public interface IGpasData
{
List<T> ConvertToList<T>(StreamReader reader);
}
This contract says that an IGpasData should know how to deserialize anything, which doesn't make sense.
An IGpasData should know how to deserialize itself, and for this we would need a self-referencing interface:
public interface IGpasData<T> where T : IGpasData<T>
{
List<T> ConvertToList(StreamReader reader);
}
public class GpasBar: IGpasData<GpasBar>
{
public string MyPropertyA { get; set; }
public int MyPropertyB { get; set; }
public List<GpasBar> ConvertToList(StreamReader reader)
{
var results = new List<GpasBar>();
while (!reader.EndOfStream)
{
var values = reader.ReadLine().Split(',');
results.Add(new GpasBar()
{
MyPropertyA = values[0],
MyPropertyB = int.Parse(values[1]),
});
}
return results;
}
}
Or, an IGpasData should know how to populate itself from an array of values:
public interface IGpasData
{
void Populate(string[] values);
}
public class GpasBar : IGpasData
{
public string MyPropertyA { get; set; }
public int MyPropertyB { get; set; }
public void Populate(string[] values)
{
MyPropertyA = values[0];
MyPropertyB = int.Parse(values[1]);
}
}
public static List<T> ConvertCloudFileToList<T>(string fileName, string modelName)
where T : IGpasData, new()
{
// ...
using (StreamReader reader = new StreamReader(cloudFile.OpenRead()))
{
var results = new List<T>();
while (!reader.EndOfStream)
{
var item = new T();
item.Populate(reader.ReadLine().Split(','));
results.Add(item);
}
return results;
}
}
With this second approach, you avoid duplicating the StreamReader and line-reading code in every class.
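The second approach can be exercised without Azure by reading from any TextReader. This sketch keeps the answer's IGpasData/Populate design; GpasReader.ReadAll is a hypothetical name standing in for the line-reading part of ConvertCloudFileToList:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public interface IGpasData
{
    void Populate(string[] values);
}

public class GpasBar : IGpasData
{
    public string MyPropertyA { get; set; }
    public int MyPropertyB { get; set; }

    // Each type knows how to fill itself from one CSV row.
    public void Populate(string[] values)
    {
        MyPropertyA = values[0];
        MyPropertyB = int.Parse(values[1], CultureInfo.InvariantCulture);
    }
}

public static class GpasReader
{
    // Shared line-reading loop; the new() constraint lets it create each T.
    public static List<T> ReadAll<T>(TextReader reader) where T : IGpasData, new()
    {
        var results = new List<T>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var item = new T();
            item.Populate(line.Split(','));
            results.Add(item);
        }
        return results;
    }
}
```

Inside ConvertCloudFileToList you would pass the StreamReader opened over cloudFile.OpenRead() to ReadAll<T>.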

Xml Empty Tag Deserialization

Could you please help me to find the solution to deserialize xml file which contains an empty tag?
Example is here:
<Report>
<ItemsCount></ItemsCount>
</Report>
And I want to deserialize it into object of class like:
public class Report{
public int? ItemsCount { get;set;}
}
The class I'm using for deserialization is:
[XmlRoot]
public partial class Report
{
private int? itemsCount;
[XmlElement(IsNullable = true)]
public int? ItemsCount {
get
{
return itemsCount;
}
set
{
itemsCount = value;
}
}
}
It works well if the ItemsCount tag is missing entirely, but if it exists and is empty, an exception is thrown that refers to the line where this tag is located in the XML.
I found a lot of links here while searching for a solution, but without success.
Also, I don't want to just ignore the tag in all cases; I want to get a null value when it is empty.
XmlSerializer is trying to convert the tag's string.Empty value to an integer and failing. Change your property's data type to string, as below:
private string itemsCount;
[XmlElement]
public string ItemsCount {
get
{
return itemsCount;
}
set
{
itemsCount = value;
}
}
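This will set the ItemsCount property to an empty string in the case above.
For the property to be null (with ItemsCount declared as int? and [XmlElement(IsNullable = true)], as in the question), the XML should look like this:
<ItemsCount xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>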
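If you'd rather keep ItemsCount typed as int? in your own code, a common variation (a sketch, not part of this answer) is to let XmlSerializer bind a string proxy property and convert in its accessors, treating an empty or whitespace-only element as null. ItemsCountText and ReportReader are hypothetical names:

```csharp
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

public class Report
{
    // Typed value for your code; XmlSerializer ignores it.
    [XmlIgnore]
    public int? ItemsCount { get; set; }

    // Bound to the <ItemsCount> element; an empty element becomes null.
    [XmlElement("ItemsCount")]
    public string ItemsCountText
    {
        get => ItemsCount?.ToString(CultureInfo.InvariantCulture);
        set => ItemsCount = string.IsNullOrWhiteSpace(value)
            ? (int?)null
            : int.Parse(value, CultureInfo.InvariantCulture);
    }
}

public static class ReportReader
{
    public static Report Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Report));
        using var reader = new StringReader(xml);
        return (Report)serializer.Deserialize(reader);
    }
}
```

A missing tag never calls the setter, so it also leaves ItemsCount as null.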
How about this approach?
Define the class as follows:
public class Report
{
[XmlIgnore]
public int? ItemsCount { get; set; }
}
Due to the XmlIgnore attribute, this tag will be treated as unknown.
When creating the serializer add the event handler:
var xs = new XmlSerializer(typeof(Report));
xs.UnknownElement += Xs_UnknownElement;
In the event handler interpret an empty string as null:
private void Xs_UnknownElement(object sender, XmlElementEventArgs e)
{
var report = (Report)e.ObjectBeingDeserialized;
if (e.Element.InnerText == string.Empty)
report.ItemsCount = null;
else
report.ItemsCount = int.Parse(e.Element.InnerText);
}
Use the serializer as usual:
Report report;
using (var fs = new FileStream("test.xml", FileMode.Open))
{
report = (Report)xs.Deserialize(fs);
}
To my understanding, the described behaviour is correct: if the ItemsCount tag is missing, its value is null; if it is empty, its value cannot be converted from "" to int?. That said, it would be possible to implement custom parsing in the accessors of ItemsCount, which would then have to be of type string. However, this seems more like a workaround to me; if possible, the document itself should be changed.

Incorrect XML deserialization

I have the following class:
public class FtpDefinition
{
public FtpDefinition()
{
Id = Guid.NewGuid();
FtpServerAddress = string.Empty;
FtpPortSpecified = false;
FtpPort = "21";
}
[System.Xml.Serialization.XmlElement("Id")]
public System.Guid Id { get; set; }
[System.Xml.Serialization.XmlElement("FtpServerAddress")]
public string FtpServerAddress { get; set; }
[System.Xml.Serialization.XmlElement("FtpPortSpecified")]
public bool FtpPortSpecified { get; set; }
[System.Xml.Serialization.XmlElement("FtpPort")]
public string FtpPort { get; set; }
}
I have a method that gets the following XML string and, using the .NET XML deserialization capability, deserializes it into an object of type FtpDefinition.
<FTPDefinition>
<Id>a0a940a7-6785-41be-ac3a-75ba5d4c13ee</Id>
<FtpServerAddress>ftp.noname.com</FtpServerAddress>
<FtpPortSpecified>false</FtpPortSpecified>
<FtpPort>21</FtpPort>
</FTPDefinition>
The problem is that although the Id and FtpServerAddress fields get populated properly, FtpPort gets populated with an empty string, and, even weirder, FtpPortSpecified gets populated with the bool value TRUE instead of FALSE.
I replaced the automatic properties in the above code with old-style getters and setters backed by fields (return ...; / ... = value;), so that I could catch the setter being hit. I suspected some user code was setting the value, but this is not the case. The call stack clearly shows that the .NET deserialization code is calling the setter with the value TRUE, yet the XML string passed to the deserializing method has the correct value (FALSE).
The deserialization code is simple:
XmlSerializer xs = ...(objectType);
using (StringReader stringReader = new StringReader(xml))
{
return xs.Deserialize(stringReader);
}
Please help me figure out what's going on.
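The Specified suffix has special behavior in XML serialization: for a property Xxx, XmlSerializer treats a public bool member named XxxSpecified as a flag indicating whether Xxx is present. On deserialization it sets FtpPortSpecified to true because the <FtpPort> element is present, and on serialization it writes FtpPort only when FtpPortSpecified is true. Simply rename FtpPortSpecified to something else.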
http://msdn.microsoft.com/en-us/library/office/bb402199(v=exchg.140).aspx
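A small sketch of the convention in action, using a hypothetical PortConfig class. The PortSpecified flag is marked [XmlIgnore] so it never appears as an element of its own (the same pattern xsd.exe generates), yet XmlSerializer still sets it on deserialization and consults it on serialization:

```csharp
using System.IO;
using System.Xml.Serialization;

public class PortConfig
{
    public string Port { get; set; }

    // Recognized by naming convention as the "is Port present?" flag.
    [XmlIgnore]
    public bool PortSpecified { get; set; }
}

public static class SpecifiedDemo
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(PortConfig));

    public static PortConfig Read(string xml)
    {
        using var reader = new StringReader(xml);
        return (PortConfig)Serializer.Deserialize(reader);
    }

    public static string Write(PortConfig config)
    {
        using var writer = new StringWriter();
        Serializer.Serialize(writer, config);
        return writer.ToString();
    }
}
```

Reading <PortConfig><Port>21</Port></PortConfig> sets PortSpecified to true, and writing an object whose PortSpecified is false omits <Port> entirely even though Port has a value, which is why a property that merely happens to end in "Specified" behaves unexpectedly.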

Differences with BinaryFormatter

I am trying to change the serializer in an existing WCF net.tcp project that uses shared entities on client and server. I am having a hard time figuring out protobuf-net (V2480).
The chart here says I can serialize private members, but I cannot find documentation on how to do that. Is it possible without attributes? How do I enable graph mode (AsReference), as explained here?
Will that solve the issue of protobuf triggering my changed-items flag? For example, I have a class:
public enum FirstEnum
{
First = 0,
Second,
Third
}
public enum AnotherEnum
{
AE1 = 0,
AE2,
AE3
}
[Serializable()]
public class SomeClass
{
public int SomeClassId { get; set; }
public FirstEnum FEnum { get; set; }
public AnotherEnum AEnum { get; set; }
string thing;
public string Thing
{
get{return thing;}
set
{
if (string.IsNullOrEmpty(value))
throw new ArgumentNullException("Thing");
thing = value;
}
}
private decimal firstAmount;
public decimal FirstAmount
{
get{return firstAmount;}
set
{
if (value != firstAmount)
{
firstAmount = value;
changedItems.Add("FirstAmount changed");
}
}
}
private decimal secondAmount;
public decimal SecondAmount
{
get { return secondAmount; }
set
{
if (value != secondAmount)
{
secondAmount = value;
changedItems.Add("SecondAmount changed");
}
}
}
public decimal ThirdAmount { get { return SecondAmount - FirstAmount; } }
public DateTime? SomeDate { get; set; }
private List<string> changedItems = new List<string>();
public List<string> ChangedItems
{
get { return changedItems; }
}
public int PrivateSet { get; private set; }
public SomeClass() { }
public SomeClass(decimal first, decimal second)
{
FirstAmount = first;
SecondAmount = second;
}
public void ClearChangedItems()
{
changedItems.Clear();
}
}
When I deserialize it (1000 items) with:
var model = CreateModel();
items = (List<SomeClass>)model.Deserialize(returnStream, null, typeof(List<SomeClass>));
2012-04-06 09:14:28.1222|DEBUG|ProtobufTEsts.Form1|ProtoBuf Number of changed items : 1000
With BinaryFormatter:
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter binaryFormatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
items = (List<SomeClass>)binaryFormatter.Deserialize(returnStream);
2012-04-06 09:14:28.1662|DEBUG|ProtobufTEsts.Form1|BinaryFormatter Number of changed items : 0
Is there a way to get protobuf to behave like the binaryFormatter but preserve the performance of protobuf?
How do I allow private serialization? This fails:
public static TypeModel CreateModel()
{
RuntimeTypeModel model = TypeModel.Create();
///var metaType = RuntimeTypeModel.Default.Add(typeof(SomeClass), false);
model.Add(typeof(SomeClass), false)
.Add(1, "SomeClassId")
.Add(2, "FEnum")
.Add(3, "AEnum")
.Add(4, "Thing")
.Add(5, "FirstAmount")
.Add(6, "SecondAmount")
.Add(7, "SomeDate")
.Add(8, "PrivateSet");
TypeModel compiled = model.Compile();
return compiled;
}
Ah, I understand the issue now; this line is problematic:
TypeModel compiled = model.Compile();
return compiled;
If you use Compile(), it creates a formal assembly (in memory) that has to obey the usual rules of assemblies, in particular member accessibility. This means it can't access your private setter.
Instead, use:
model.CompileInPlace();
return model;
This performs a partial compilation, but continues using DynamicMethod. This cheeky little critter has options to spoof its way past accessibility rules (much like reflection can), so it can continue to use the private setter.
Note that the model is also compiled in place (at a more granular level) on an as-needed basis, so this call to CompileInPlace is not strictly necessary, but it helps by doing everything up front, in advance.
For completeness, there is an additional Compile(string,string) overload that can be used to produce a separate serialization dll on disk, that can be referenced and used without any meta-programming at runtime.
Yes protobuf-net can serialize private fields, and do so without attributes. I'm not at a PC, so this may need tweaking:
var metaType = RuntimeTypeModel.Default.Add(typeof(SomeClass), false);
// for each field in a known order
metaType.Add(someUniqueTag, fieldName);
In attribute-driven usage, there is also ImplicitFields.AllFields which would automatically configure it for the usage you intend, but I haven't yet added an ImplicitFields helper method to MetaType. I will add that to my list!
Note: tag (=field) numbers are important to protobuf and it must be possible to reproduce the same number mappings when you deserialize.
Another option you might want to consider is (de)serialization callbacks, which let you know that serialization or deserialization is in progress (via methods invoked before and after). This is another way to disable side effects for an interval such as deserialization.
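protobuf-net offers its own callback attributes for this (such as ProtoAfterDeserialization), but to keep the example runnable without extra packages, here is the same idea sketched with DataContractSerializer and the standard [OnDeserializing]/[OnDeserialized] attributes from System.Runtime.Serialization. TrackedAmounts is a cut-down, hypothetical version of SomeClass: change tracking is switched off while the serializer calls the property setters and switched back on afterwards:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class TrackedAmounts
{
    private List<string> changedItems = new List<string>();
    private bool suppressTracking;
    private decimal firstAmount;

    [DataMember]
    public decimal FirstAmount
    {
        get => firstAmount;
        set
        {
            if (value == firstAmount) return;
            firstAmount = value;
            if (!suppressTracking) changedItems.Add("FirstAmount changed");
        }
    }

    public IReadOnlyList<string> ChangedItems => changedItems;

    // DataContractSerializer skips constructors and field initializers,
    // so recreate the list here and turn tracking off for the duration.
    [OnDeserializing]
    private void OnDeserializing(StreamingContext context)
    {
        changedItems = new List<string>();
        suppressTracking = true;
    }

    [OnDeserialized]
    private void OnDeserialized(StreamingContext context) => suppressTracking = false;
}

public static class RoundTrip
{
    public static TrackedAmounts Clone(TrackedAmounts source)
    {
        var serializer = new DataContractSerializer(typeof(TrackedAmounts));
        using var stream = new MemoryStream();
        serializer.WriteObject(stream, source);
        stream.Position = 0;
        return (TrackedAmounts)serializer.ReadObject(stream);
    }
}
```

A cloned object starts with an empty ChangedItems list, and later edits are tracked normally. With protobuf-net the same before/after flag toggling would go in methods marked with its callback attributes.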
