Manipulating a file that is like XML - c#

I have a file that I want to read and manipulate. It is XML-like but is not an actual XML file. It does reference a DTD, however. What part of the .NET Framework can I use to do this? Will the XML APIs work somehow with this file?

Based on your reply to my comments, it sounds like the only difference from a standard XML document is the missing header. If so, the tools should work perfectly fine with that data. I would give System.Xml a try, and if that doesn't work, try prepending the header.
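A minimal sketch of that approach, assuming the file is well-formed apart from the missing XML declaration. The helper name LoadLenient and the sample content are made up for illustration. DtdProcessing.Ignore skips the DOCTYPE, so the referenced DTD is never fetched; that setting would need to change if the file relies on entities defined in the DTD.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: loads XML-like text that has no <?xml ...?> declaration.
// Assumption: the DTD is only referenced, not needed to resolve entities.
static XDocument LoadLenient(string text)
{
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE instead of throwing
        XmlResolver = null                    // never fetch external resources
    };
    using var reader = XmlReader.Create(new StringReader(text), settings);
    return XDocument.Load(reader);
}

// Made-up sample: a DOCTYPE reference but no XML declaration.
string sample = "<!DOCTYPE note SYSTEM \"note.dtd\">\n<note><to>Alice</to></note>";
XDocument doc = LoadLenient(sample);

// Manipulate the tree with LINQ to XML, then print it.
doc.Root?.Element("to")?.SetValue("Bob");
Console.WriteLine(doc.Root);
```

If the parser rejects the file for another reason, the XmlException message and line number are usually the quickest way to find out what makes it "XML-like" rather than XML.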

Related

Send Output direct to HTML or through XML first?

I am writing a parser to parse incoming text files. I have it to where it will parse everything accurately.
I have an option for it to output to text - this was done to check the accuracy of the parsing. I am currently implementing an option to write to a spreadsheet but it doesn't output everything yet.
I have a request to output as static HTML. Is it worth outputting to XML and then generating HTML from that?
I see C# has the XMLTransform class which looks like it would do what I need. Is using the XML designer in VS and writing the XSLT file easier than hand-coding all of the HTML output? I know Excel will import XML files, but it is a little messy and I don't get the formatting options I can get if I generate the .xls file directly.
I would give you a qualified No.
It is generally not worth building XML then running it through an XSLT transformation to build HTML.
That said, I might consider such an option if I wanted to easily swap out transformations, such as if this is an app used by multiple clients and the generated HTML would be client dependent. Even then I'd investigate using a simple tokenized HTML template in which I just plugged in the data I wanted. However, if the transformation was sufficiently complex then, yes, I'd go the XSLT route.
The reason for the No is that the conversion adds a level of complexity that is usually not worth the time involved.
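As an illustration of the tokenized-template idea mentioned above, here is a rough sketch. The {{name}} token syntax, the FillTemplate helper, and the sample data are all assumptions made for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces {{name}} tokens with HTML-encoded values in one pass.
// Unknown tokens are left untouched so they are easy to spot in the output.
static string FillTemplate(string template, IReadOnlyDictionary<string, string> values)
{
    return Regex.Replace(template, @"\{\{(\w+)\}\}", match =>
        values.TryGetValue(match.Groups[1].Value, out var value)
            ? WebUtility.HtmlEncode(value)
            : match.Value);
}

// Made-up template and data standing in for the parser's output.
string template = "<html><body><h1>{{title}}</h1><p>{{body}}</p></body></html>";
var data = new Dictionary<string, string>
{
    ["title"] = "Parse report",
    ["body"] = "Rows with value < 10 & status OK"
};

Console.WriteLine(FillTemplate(template, data));
```

Encoding each value on the way in keeps characters such as < and & from breaking the generated page, which is the main thing a hand-rolled template has to get right.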

How to convert a System.IO.Packaging.Package to HTML?

Microsoft Word interoperability classes will let you get at a property called WordOpenXML. This represents a package that will be stored - zipped up - in a .docx file and can be opened by Microsoft Word. However, is there a way to convert this Package to other formats, notably HTML?
I read in an answer to an old question that "Word 2007 has an API that you can use to convert to HTML. [...] You can find documentation around the API, but I remember that there is a convert to HTML function in the API." I'm not 100% sure which API that guy is talking about but perhaps it's System.IO.Packaging.Package or something similar. I can't seem to find any "convert to HTML function"; does anyone know how you can convert a Package format Word document into HTML?
The API in question is probably the Save method on the document; when a file type of HTML is chosen, Word transforms the document into HTML, and applies the appropriate styling.
Chances are, given that the docx format is XML, there is an XSLT transformation of some sort going on; this is just speculation, but it's not far-fetched, as XSLT is commonly used to create HTML from XML.
That said, what you are looking for probably does not reside in the Package class, nor should it. The Package class is used for creating packages of content, not for transforming that content.
However, there's nothing stopping you from providing the transformation of that content; you can get the XML that is the basis of the Word document and then apply your own XSLT which would produce the HTML that you want.
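A sketch of applying your own XSLT with XslCompiledTransform. The input XML and stylesheet below are deliberately tiny, made-up stand-ins; real WordprocessingML is namespaced and far more complex, so a usable stylesheet would be much larger. The Transform helper name is also just for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: runs an XSLT stylesheet over an XML string and returns the result.
static string Transform(string xml, string xslt)
{
    var transform = new XslCompiledTransform();
    using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
    {
        transform.Load(xsltReader);
    }

    using var xmlReader = XmlReader.Create(new StringReader(xml));
    using var output = new StringWriter();
    transform.Transform(xmlReader, null, output); // null: no XSLT arguments
    return output.ToString();
}

// Made-up simplified document; element names are assumptions, not Word's schema.
string xml = "<doc><p>Hello</p><p>World</p></doc>";
string xslt = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" indent=""no""/>
  <xsl:template match=""/doc"">
    <html><body>
      <xsl:for-each select=""p""><p><xsl:value-of select="".""/></p></xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>";

Console.WriteLine(Transform(xml, xslt));
```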

How to validate an XML document?

My C#/.NET application reads XML files that are manually edited by the users. The allowed elements and tags are described in the application's documentation. I'm using LINQ to extract data from the XML file.
Before extracting data from the XML file, I'd like to validate it to see if it has the expected structure. If not, it would be nice to have information about what is wrong so that I can give some feedback to the user.
What's the simplest way to do this in C#?
You can validate XML files against an XSD.
First, create an XML Schema Definition (XSD) file. The XML Schema Definition Tool (xsd.exe) can generate an XSD from an existing XML file.
Then validate the input XML against the corresponding XSD.
Hope this will help...
EDIT
This article explains the possible ways to validate XML using C#:
How To Validate an XML Document by Using DTD, XDR, or XSD in Visual C# .NET
IMO the best option is to use XSD.
Validating Input Xml Data Files
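Since the question uses LINQ, one possible shape for the XSD check is the Validate extension method on XDocument, collecting messages to show the user. This is a sketch under assumptions: the schema, the element names (config, name), and the Validate helper are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper: returns validation messages; an empty list means the XML is valid.
static List<string> Validate(string xml, string xsd)
{
    var errors = new List<string>();
    var schemas = new XmlSchemaSet();
    using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
    {
        schemas.Add(null, schemaReader); // null: use the schema's own target namespace
    }

    try
    {
        XDocument doc = XDocument.Parse(xml);
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
    }
    catch (XmlException ex)
    {
        // Not well-formed at all (for example, an unclosed tag).
        errors.Add(ex.Message);
    }
    return errors;
}

// Made-up schema: a <config> element that must contain exactly one <name>.
string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""config"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""name"" type=""xs:string""/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

foreach (string message in Validate("<config><nme>typo</nme></config>", xsd))
{
    Console.WriteLine(message);
}
```

The collected messages name the offending element, which is usually enough to give a user who edited the file by hand something actionable.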

Generating text file from database

I have a requirement to hand-code a text file from data residing in a SQL table. Just wondering if there are any best practices here. Should I write it as an XmlDocument first and transform using XSL, or just use StreamWriter and skip transformation altogether? The generated text file will be in EDIFACT format, so layout is very specific.
The normal thing to do is just write the EDIFACT data directly.
Creating it as an XmlDocument and transforming it to EDIFACT might be useful if there's a library already available to do the transformation. I say this because there's a lot of language support for XML output.
I can't see how XSL will help you here, but I've never had to output EDIFACT data.
http://www.stylusstudio.com/edi/XML_to_EDIFACT.html
This URL has an example XSLT for translating XML to EDIFACT which might solve your problem.
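For comparison, writing the segments directly could look roughly like the sketch below. It assumes the default EDIFACT service characters (+ between data elements, : between components, ' as segment terminator, ? as release character) and treats every value as a simple, non-composite element. The helper names, the sample rows, and the NAD layout shown are simplified illustrations, not a complete message.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: escapes the default EDIFACT service characters with '?'.
// The release character itself is escaped first so later escapes are not doubled.
static string Escape(string value) =>
    value.Replace("?", "??").Replace("+", "?+").Replace(":", "?:").Replace("'", "?'");

// Hypothetical helper: builds one segment from a tag and simple data elements.
static string BuildSegment(string tag, params string[] elements) =>
    tag + string.Concat(elements.Select(e => "+" + Escape(e))) + "'";

// Made-up rows standing in for data read from the SQL table.
var rows = new[] { ("BY", "O'Brien Ltd"), ("SU", "Acme + Co") };

// StringWriter keeps the sketch self-contained; a StreamWriter would target a file.
using var writer = new StringWriter();
foreach (var (role, name) in rows)
{
    writer.WriteLine(BuildSegment("NAD", role, name));
}
Console.Write(writer.ToString());
```

Keeping the escaping in one place is the main design point: the rest of the code can then treat values from the database as plain strings.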

XML data storage database? free and opensource

I need a way to store data inside XML files and write to different parts of the file, as well as add elements and structure to the XML document.
I need full control over the file names and XML documents, and it would be much easier if I could use some kind of SQL layer to read and write from the XML.
Just due to project constraints I am tied into using XML, but if possible would like a trusted and tested open source solution for this.
Or should I be using out of box .net functionality for this?
You should be using the out-of-the-box .NET functionality.
The System.Xml namespaces and LINQ to XML will do this for you.
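A small sketch of what that can look like with LINQ to XML, where queries over the document play the role of the "SQL layer". The library/book structure and the helper names are invented for the example; a real application would load and save with XDocument.Load and Save using its own file names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: appends a <book> element under the root.
static void AddBook(XDocument d, int id, string title) =>
    d.Root?.Add(new XElement("book",
        new XAttribute("id", id),
        new XElement("title", title)));

// Hypothetical helper: a LINQ query, roughly "SELECT title WHERE id >= minId".
static string[] TitlesWithIdAtLeast(XDocument d, int minId) =>
    d.Descendants("book")
     .Where(b => (int?)b.Attribute("id") >= minId) // missing id: comparison is false
     .Select(b => b.Element("title")?.Value ?? "")
     .ToArray();

// Made-up starting document; XDocument.Load(path) would read it from disk instead.
var library = XDocument.Parse("<library><book id=\"1\"><title>Dune</title></book></library>");

AddBook(library, 2, "Emma");
Console.WriteLine(string.Join(", ", TitlesWithIdAtLeast(library, 2)));
// library.Save(path) would write the updated document back to a file of your choosing.
```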
