Evaluate XML Find Bad Character?

I have some XML coming from a remote (Java) web service into my C# console app; it is written to a Microsoft SQL Server XML column via a stored procedure. Sometimes the XML has a bad character somewhere, and SQL Server does not give enough information about exactly where the problem is.
I would like to evaluate the XML before the database-write happens, and of course I have no XSD.
What is a good way to evaluate every part of the XML for "regular conformance" before writing to the database? I am using .NET 4.0, C#.
Thanks.

If you have the possibility, I would recommend doing XML Schema validation on all XML data that you retrieve from 3rd party services.
XML Schema validation will ensure that every element of an XML document is valid against its defined contract.
You should consider making the XML Schema validation optional, as it introduces overhead that you might want to avoid in production environments. But in development and testing environments it can be quite beneficial to get detailed validation error information from all your 3rd party services.
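A minimal sketch of optional schema validation, assuming you can obtain or write an XSD for the incoming data. The class name SchemaCheck, the method name Validate and the validationEnabled flag are made up for illustration; the reader and schema types are the standard System.Xml ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Returns the validation messages; an empty list means nothing was reported.
    // validationEnabled is a hypothetical switch, e.g. read from configuration.
    public static List<string> Validate(string xml, string xsd, bool validationEnabled)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings();

        if (validationEnabled)
        {
            var schemas = new XmlSchemaSet();
            using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
            {
                // null means "use the target namespace declared in the schema itself"
                schemas.Add(null, xsdReader);
            }
            settings.Schemas = schemas;
            settings.ValidationType = ValidationType.Schema;
            // Collect problems instead of throwing on the first one
            settings.ValidationEventHandler += (sender, e) =>
                errors.Add(string.Format("Line {0}, position {1}: {2}",
                    e.Exception.LineNumber, e.Exception.LinePosition, e.Message));
        }

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the whole document drives validation
        }
        return errors;
    }
}
```

With the flag off, the document is still parsed, so well-formedness errors still surface as an XmlException; only the schema checks are skipped.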

You can try sanitizing your XML, which might help a little: http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
That link only helps filter out invalid characters; most of the time that will be neither sufficient nor helpful (though I still recommend filtering unknown characters for security).
To check whether the tags are valid, you can parse the XML inside a try/catch. If the exception reports a problem on line 1, the document may be missing a root element, or its encoding may be wrong. The two cases should produce different errors.
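The try/catch idea can be sketched as follows, with no XSD needed. XmlException carries LineNumber and LinePosition, which is the location information the question says SQL Server does not provide. The class and method names (WellFormedCheck, FindFirstError) are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns null when the XML parses cleanly, otherwise a message
    // that includes the line and position of the first problem.
    public static string FindFirstError(string xml)
    {
        var settings = new XmlReaderSettings
        {
            CheckCharacters = true,                      // the default: reject characters XML 1.0 does not allow
            ConformanceLevel = ConformanceLevel.Document // require a single root element
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { } // walk every node; the parser throws at the first error
            }
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

This only reports the first error, since the parser stops there. Running it just before the stored procedure call lets you log the offending location alongside the payload.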

Related

Is CDATA required to validate/deserialize against a schema if a string element contains valid XML

I am hosting a C# WCF SOAP service that has a call containing the following element:
<element name="SomeXmlElement" type="xsd:string" minOccurs="0"/>
The WSDL in question is provided by the client.
The content of this element is valid XML which in general will conform to a different XSD, but for our purposes it is arbitrary valid XML.
If the data is passed "raw", which is the way the client prefers to send it, SomeXmlElement is null after being deserialized:
<SomeXmlElement><SomeArbitraryXml/></SomeXmlElement>
If I have them wrap it in a CDATA section it works correctly, but the customer/client complains that they don't have to do that for other implementations, and that it causes compatibility issues:
<SomeXmlElement><![CDATA[<SomeArbitraryXml/>]]></SomeXmlElement>
My understanding is that there are only a few choices to have this deserialize correctly:
Wrap in CDATA (nested CDATA, ugh)
Change the schema to use a complex type instead of string, where the complex type references the other XSD schema
Use xs:any in the schema (what would this deserialize as?)
The customer insists that this is just a deficiency in my code/.Net and that this should deserialize/process fine in the raw format.
Rolling my own deserializer would be possible, or just loading into a DOM and accessing the InnerXml property or whatnot, but that's a lot of work to override default expected behavior, in my opinion.
Thoughts? Suggestions? Am I interpreting the XML specs correctly? Are there any choices that don't require schema changes or rewriting lots of WCF default behavior?

Your client has no right to complain. They're publishing an interface and then telling you out-of-band to ignore parts of the interface specification.
If they want to allow arbitrary XML under SomeXmlElement, then they should use xsd:any.
If they want to restrict the XML under SomeXmlElement to that given by another XSD, then they should import or include the other XSD and explicitly reference the allowed elements.
But they should not specify that SomeXmlElement contains an xsd:string and then expect its content model to really be XML. You're the one who has the right to complain.
That their implementations are 10 years old or Java based is irrelevant. XML and XSD specifications go back that far and work well in Java.
So, besides looking for validation here, you probably want advice beyond telling your client to fix their broken interface definition...
Consider rewriting their XSD to be what they really mean, and hold yourself and your code to a higher standard (an actual standard, that is). Anything else would be a hack upon a hack and make you an accessory to their crime.
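As for what xsd:any would deserialize as: with XmlSerializer, the usual mapping is an XmlElement (or XmlElement[]) member marked with [XmlAnyElement]. The sketch below shows that outside of WCF; the Request wrapper and the AnyXmlContent class are hypothetical stand-ins for the generated contract types. In a WCF service this would presumably also require opting in to XmlSerializer (for example via [XmlSerializerFormat]), which is an assumption about your setup.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("Request")] // hypothetical wrapper standing in for the real message type
public class Request
{
    [XmlElement("SomeXmlElement")]
    public AnyXmlContent SomeXmlElement { get; set; }
}

// Corresponds to a complex type whose content model is xsd:any
public class AnyXmlContent
{
    [XmlAnyElement]
    public XmlElement[] Any { get; set; }
}

public static class AnyElementDemo
{
    public static Request Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Request));
        using (var reader = new StringReader(xml))
        {
            return (Request)serializer.Deserialize(reader);
        }
    }
}
```

Parsing `<Request><SomeXmlElement><SomeArbitraryXml/></SomeXmlElement></Request>` this way leaves the nested element available as a DOM node, so its OuterXml can be handed to whatever processes the inner document.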

Mapping XML to Unrelated Objects

I'm designing a process to get XML files from our client and load them to our database, creating an order on our side.
The snag (and isn't there always one?) is that the client's XML really doesn't resemble the business objects we use to load data to our database.
So I have to design a way to get the format they specify into our custom objects.
I'm considering creating "on the fly" custom objects FROM their XML and then coming up with a "map" to translate their objects into ours. That's where my head is at right now.
Essentially I don't want to write another data-load process that supports their data, I just want to get their data into our format.
I know this is basically a design question so I'm just throwing out my idea to see if it rings true with anyone else. Or if someone has done this and has a suggestion, I'm very open to hearing it. Thanks!

From your tags, C# and XML, I would generate an event upon file reception (OS level) that triggers the small app you will have to write. Structure-wise, I would go with CompanyName.Object1.
Read up on XDocument for parsing and whatnot, as well as XElement and its attributes.
Bottom line: it looks like a CRM kind of implementation, and in my experience the parsing of incoming data is the longest part of the process. You'll have to be thorough with your clients and have them write specific names, for example:
<Nodes name="SpecificName">
Nodes = LocalName
name = Attribute("name")
Good luck.
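A rough sketch of the XDocument approach, mapping the client's XML straight onto your own objects without an intermediate class model. Everything named here is invented for illustration: the OrderLine class, the ClientOrderMapper helper, and the Item/code/Qty names in the client format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical internal business object
public class OrderLine
{
    public string Sku { get; set; }
    public int Quantity { get; set; }
}

public static class ClientOrderMapper
{
    // The "map" lives in one place: each line below ties a client name to one of our properties.
    public static List<OrderLine> Map(string clientXml)
    {
        XDocument doc = XDocument.Parse(clientXml);
        return doc.Descendants("Item")
            .Select(item => new OrderLine
            {
                Sku = (string)item.Attribute("code"),  // explicit cast returns null if the attribute is missing
                Quantity = (int)item.Element("Qty")    // explicit cast throws if the element is missing or not a number
            })
            .ToList();
    }
}
```

Keeping the translation in a single query like this means a change in the client's format touches only the mapper, not the existing data-load process.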

Sending XML format messages through TCP in C#

I have a C# TCP chat program. Currently, I have formatted the messages sent using strings i.e, a "login" message starts with a "3" then followed by a "U:" then the username etc.
I think this method is very crude, in that it's not really readable and not standardized. In early research, I have read that I can format my messages using XML, but I don't know where to start exactly. Do I just make a string builder and append tags to it, like .append("<Login>"+message)?

The most common approach for dealing with a problem like this is to use serialization. Serialization is the process of converting an in-memory object into a format that can be easily streamed "over the wire," and de-serialization is the reverse process of converting the serialized format back into an object. .NET has good support for XML and binary serialization out-of-the-box, but there are other ways to implement this. Here's a link to get you started:
http://msdn.microsoft.com/en-us/library/7ay27kt9(VS.71).aspx

You can send whatever you like over the connection; as long as it's just for your program, it doesn't really matter what you choose. XML might give you some benefits, as it lends itself to more structured messages, and there are many classes, tools and resources around on the net for XML. JSON might be another option: it would potentially make it easier to create a JavaScript client in case you want to go web based.

Unless there is a requirement that 3rd parties be able to read these messages, I would probably favour binary serialisation, as it has a more compact format.
That said, I'd probably just use WCF rather than using TCP directly.
If you want to know more about XML serialisation, the most commonly used methods are:
Generating a strongly typed C# object, decorated with attributes to control XML serialisation, using XSD.exe, and then using XmlSerializer to serialise and deserialise XML (recommended).
Using the XmlDocument class.
You can write out XML yourself as a string, but it's better to use the serialisation methods made available in the .NET Framework, as this makes things considerably easier and reduces the chance that you will make a mistake and inadvertently start working with invalid XML.
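A small sketch of the XmlSerializer route for the chat "login" message, written by hand rather than generated with XSD.exe. The LoginMessage class, its Username property and the MessageCodec helper are assumptions made up for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Login")]
public class LoginMessage
{
    [XmlElement("Username")]
    public string Username { get; set; }
}

public static class MessageCodec
{
    // Object -> XML text; special characters such as < and & are escaped by the serializer.
    public static string ToXml(LoginMessage message)
    {
        var serializer = new XmlSerializer(typeof(LoginMessage));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }

    // XML text -> object
    public static LoginMessage FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(LoginMessage));
        using (var reader = new StringReader(xml))
        {
            return (LoginMessage)serializer.Deserialize(reader);
        }
    }
}
```

Compared with appending tags in a string builder, a username containing < or & cannot break the message, because escaping happens inside the serializer.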

What is the best way to read and write cXML documents in C#?

I know this is a vague open ended question. I'm hoping to get some general direction.
I need to add cXML punchout to an ASP.NET C# site / application. This is replacing something that I wrote years ago in ColdFusion.
I'm a reasonably experienced C# developer but I haven't done much with XML. There seems to be lots of different options for processing XML in .NET.
Here's the open-ended question: assuming that I have an XML document in some form, e.g. a file or a string, what is the best way to read it into my code? I want to get the data and then query databases etc. The cXML document size and our traffic volumes are easily small enough that loading a cXML document into memory is not a problem.
Should I:
1) Manually build classes based on the dtd and use the XML Serializer?
2) Use a tool to generate classes. There are sample cXML files downloadable from Ariba.com.
I tried xsd.exe to generate an XSD and then xsd.exe /c to generate classes. When I try to deserialize I get errors because there seems to be "confusion" around whether some elements should be single values or arrays.
I tried the CodeXS online tool but that gives errors in its log and errors if I try to deserialize a sample document.
3) Create a dataset and ReadXml()?
4) Create a typed dataset and ReadXml()?
5) Use Linq to XML. I often use Linq to Objects so I'm familiar with Linq in general but I'm struggling to see what it gives me in this situation.
6) Some other means.
I guess I need to improve my understanding of XML in general but even so ... am I missing some obvious way of doing this? In the old ColdFusion site I found a free component ("tag") which basically ignored any schema and read the XML into a "structure" which is essentially a series of nested hash tables which was then easy to read in code. That was probably quite sloppy but it worked.
I also need to generate XML files from my C# objects. Maybe Linq to XML will be good for that. I could start with a default "template" document and manipulate it before saving.
Thanks for any pointers ...

If you need to generate arbitrary XML in an exact format, you should generate it manually using LINQ-to-XML.
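For illustration, generating a document with LINQ-to-XML looks roughly like this. The element and attribute names (Order, Item, sku, Quantity) are placeholders, not the real cXML vocabulary, and OrderXmlBuilder is a hypothetical helper.

```csharp
using System;
using System.Xml.Linq;

public static class OrderXmlBuilder
{
    // The nesting of the constructor calls mirrors the nesting of the output document.
    public static XDocument Build(string orderId, string sku, int quantity)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("Order",
                new XAttribute("id", orderId),
                new XElement("Item",
                    new XAttribute("sku", sku),          // values are escaped automatically
                    new XElement("Quantity", quantity))));
    }
}
```

Because the code has the same shape as the XML it produces, matching an exact target format is mostly a matter of transcribing a sample document into constructor calls. The same XDocument API also reads documents, via XDocument.Parse or XDocument.Load.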

Accept Data With A Webservice C# .NET 3.5

I was curious as to how I would accomplish the following with webservices:
Authenticate a user.
Accept a CSV or XML file.
Process the file and put it into an SQL database.
Someone mentioned in a previous post that I should use a webservice. I can't seem to find any resources that explain how to begin something like this. All the simple examples seem to just show how you can serve XML given a query.
I want to know how to accept stuff and also, how this would differ from an upload control on an authenticated webpage. I don't think I really understand webservices and their benefits.
How would the user sending the XML file interface with my webservice?

If you want to do large file uploads, then a web service may cause some issues, because some web service platforms (including .NET) have default settings limiting the size of the data.
The advantage of a web service is that it does all the mapping of the request to/from XML, so you can return a .NET type, and don't need to muck around with processing request parameters.
However, you may have to put more effort into maintaining state, etc.
For logins, what you can do is have a login function that returns some kind of identifier which can be used to verify the user as valid for that session. One way of doing this is to have columns in your user table for lastActive and sessionGUID. When they log in, you generate a new sessionGUID and return it; on that and any other valid request they make, you update lastActive; and if a request arrives too long after the lastActive time, you refuse it. There's any number of similar ways of doing that, but hopefully you get the general idea: you don't want to require the login details each time, but you can generate a temporary identifier and use that.
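The session scheme described above can be sketched in memory as follows. A dictionary stands in for the user table columns (sessionGUID as the key, lastActive as the value); the SessionStore class and its method names are hypothetical, and the current time is passed in so the behaviour is easy to test.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the sessionGUID / lastActive columns. Not thread-safe.
public class SessionStore
{
    private readonly TimeSpan _timeout;
    private readonly Dictionary<Guid, DateTime> _lastActive = new Dictionary<Guid, DateTime>();

    public SessionStore(TimeSpan timeout)
    {
        _timeout = timeout;
    }

    // Call only after the user's credentials have been verified.
    public Guid Login(DateTime now)
    {
        Guid sessionGuid = Guid.NewGuid();
        _lastActive[sessionGuid] = now;
        return sessionGuid;
    }

    // Returns true and refreshes lastActive if the session is known and not expired.
    public bool TryTouch(Guid sessionGuid, DateTime now)
    {
        DateTime last;
        if (!_lastActive.TryGetValue(sessionGuid, out last))
        {
            return false; // unknown identifier
        }
        if (now - last > _timeout)
        {
            _lastActive.Remove(sessionGuid); // expired: force a new login
            return false;
        }
        _lastActive[sessionGuid] = now;
        return true;
    }
}
```

Each web method other than the login would then take the identifier as a parameter and call TryTouch before doing any work.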
For accepting an XML file, you'd want to use something like XDocument or XmlReader to read the data that you receive. Assuming you're not talking about the parsing of the XML format that the web service itself uses, you're most likely to be receiving a string, pushing that into an XDocument, and then using the standard XDocument functions to process the data. If the document would be large, then XmlReader should be more efficient.
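For the large-document case, a forward-only XmlReader pass might look like the sketch below. It pulls out the text of every Name element without building the whole tree in memory; the element name and the StreamingRead helper are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingRead
{
    public static List<string> ReadNames(TextReader input)
    {
        var names = new List<string>();
        using (var reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Name")
                {
                    // Reads the text and moves past the end tag, so we must not call Read() again here
                    names.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return names;
    }
}
```

The loop is written this way because ReadElementContentAsString already advances the reader; an unconditional Read() afterwards would skip an element that immediately follows.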
For reading a CSV file, there are some (free and non-free) CSV readers which help avoid some of the issues you can have, giving you a nice API for processing a string or strings of CSV data. If you know that the source data doesn't have non-structural commas, though, you can just take the string and split it by commas, and then strip any quotes around the values. That tends to get flaky quite fast if there might be addresses or other data that could have commas in, though.
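The split-and-strip approach, and the way it breaks, can be seen in a few lines. NaiveCsv is a hypothetical helper and is only suitable when no field can contain a comma.

```csharp
using System;
using System.Linq;

public static class NaiveCsv
{
    // Splits on every comma and strips surrounding whitespace and quotes.
    // It has no notion of quoted fields, so embedded commas split a value in two.
    public static string[] SplitLine(string line)
    {
        return line.Split(',')
                   .Select(field => field.Trim().Trim('"'))
                   .ToArray();
    }
}
```

A line such as 1,"Test",2010-01-24 yields three clean fields, but 1,"12 Main St, Apt 4" also yields three fields instead of two, which is the flakiness mentioned above and the reason to prefer a real CSV reader for address-like data.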
The XML should be able to be passed via the web service just fine - it should be encoded and decoded, so it's then compliant strings being passed out.
As for storing it in a database, there's any number of ways to do that - you can use ADO.NET to store things in a database without further libraries, you can create a database structure in Visual Studio or SQL Server Management Studio and then use SQLMetal or Linq to SQL to generate classes for saving the data, you can use a 3rd party database mapping tool (such as Castle ActiveRecord), or whatever. It depends what you know and how much you're willing to learn. That's really separate to the web service. When you define a web service in .NET you effectively define standard functions with attributes marking them as web services, so the database side is standard .NET database stuff that's not necessarily any different to what you'd do for an ASP.NET website, or even a desktop program.

A web service is not really appropriate for sending an arbitrary file. It can be done, but if that's your only reason for creating the web service, you might as well just stick to HTTP.
If the file has a specific format or specific contents then you might want to create a web service for that. The purpose of an ASMX or WCF web service is to provide discoverability and strong typing to the data (among other things, but I'm sticking to the basics for the moment). From the perspective of the client, instead of trying to create some ugly XML or CSV blob and chuck it over HTTP, you use an actual service proxy with POCO classes:
MyService service = new MyService();
MyData data = new MyData() { ID = 3, Name = "Test", Date = DateTime.Now };
service.Save(data);
Visual Studio (and equivalent tools in Java and some other platforms) will take care of generating the proxy for you, so really all you have to do is write the above code.
But if you're just trying to send any data, this won't get you anywhere, because you can't generate a proxy for raw XML. Well, you can, but it would just be an XmlDocument and that accomplishes nothing in terms of usability, type safety or discoverability.
Don't get confused by the "XML" in "XML Web Service". It's not a tool for sending around vanilla XML. Rather, XML refers to the format of the message, as it is transmitted over the wire, as opposed to a POST string (id=3&name=Test&date=2010-01-24) or a binary RPC call as used in .NET Remoting.
In terms of authentication, if you do decide to use WCF, you just have to use the right binding. A WCF proxy is normally configured by default to use wsHttpBinding, which uses integrated Windows authentication to secure the messages. Again, assuming you use Visual Studio, this is all done pretty much automatically for you unless you decide to change the defaults.
