I want to write a program that reads values from one XML file and writes them out in a different XML format. For example, from this input I want the program to read all the element values:
<quest>
<id>1</id>
<reward_exp1>1848</reward_exp1>
<reward_gold1>560</reward_gold1>
</quest>
And write something else, like this:
<quest id="1"><reward gold="560" exp="1848" /></quest>
Can I find a tutorial or something?
One way to do this would be to use LINQ to XML.
Here are some links to get you started.
http://msdn.microsoft.com/en-us/library/bb387044.aspx
http://www.dreamincode.net/forums/topic/218979-linq-to-xml/
There are other options too, e.g. an XSLT transform or the XML DOM.
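As a minimal sketch of the LINQ to XML approach, assuming the quest elements sit under a quests root (the root element isn't shown in the question):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Parse the source document (inline here for brevity; XElement.Load("quests.xml") works the same way).
var input = XElement.Parse(@"
<quests>
  <quest>
    <id>1</id>
    <reward_exp1>1848</reward_exp1>
    <reward_gold1>560</reward_gold1>
  </quest>
</quests>");

// Project each quest element into the attribute-based target shape.
var output = new XElement("quests",
    from q in input.Elements("quest")
    select new XElement("quest",
        new XAttribute("id", (string)q.Element("id")),
        new XElement("reward",
            new XAttribute("gold", (string)q.Element("reward_gold1")),
            new XAttribute("exp", (string)q.Element("reward_exp1")))));

Console.WriteLine(output.ToString(SaveOptions.DisableFormatting));
```

Calling output.Save("quests-out.xml") would write the result to disk.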
What you're looking to do is called XML transformation; it's a common problem with many different ways to approach a solution.
If you're new to coding, you may want to look at XSLT. Although the XSLT 'language' can be a bit tricky for complex problems, I suspect it can handle yours with minimal effort: a few lines of XSLT 'code' and a few lines of whichever language you want to use to run the XSLT (e.g. Java, C#, VB, etc.).
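Running an XSLT from C# really does take only a few lines via XslCompiledTransform. A self-contained sketch (the stylesheet and filenames are illustrative, written to disk only so the sample runs as-is):

```csharp
using System;
using System.IO;
using System.Xml.Xsl;

// A small stylesheet written to disk so the sample is self-contained (illustrative, not canonical).
File.WriteAllText("quest.xslt", @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/quest'>
    <quest id='{id}'>
      <reward gold='{reward_gold1}' exp='{reward_exp1}'/>
    </quest>
  </xsl:template>
</xsl:stylesheet>");

File.WriteAllText("input.xml",
    "<quest><id>1</id><reward_exp1>1848</reward_exp1><reward_gold1>560</reward_gold1></quest>");

// Running the stylesheet is just a few lines of C#.
var xslt = new XslCompiledTransform();
xslt.Load("quest.xslt");                   // compile the stylesheet once
xslt.Transform("input.xml", "output.xml"); // then apply it to a document

Console.WriteLine(File.ReadAllText("output.xml"));
```

The curly braces in the stylesheet are attribute value templates; they pull the child element values into the output attributes.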
Related
I've not done much with LINQ to XML, but all the examples I've seen load the entire XML document into memory.
What if the XML file is, say, 8GB, and you really don't have the option?
My first thought is to use the XElement.Load Method (TextReader) in combination with an instance of the FileStream Class.
QUESTION: will this work, and is this the right way to approach the problem of searching a very large XML file?
Note: high performance isn't required. I'm basically trying to get LINQ to XML to do the work of a program I could write that loops through every line of my big file and gathers things up, and since LINQ is "loop centric" I'd expect this to be possible...
Using XElement.Load will load the whole file into memory. Instead, use XmlReader with the XNode.ReadFrom method, which lets you selectively load nodes found by XmlReader as XElements for further processing, if you need to. MSDN has a very good example doing just that: http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
If you just need to search the XML document, XmlReader alone will suffice and will not load the whole document into memory.
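A sketch of that XmlReader plus XNode.ReadFrom pattern, reusing the quest elements from the first question (the file here is tiny and written inline so the sample runs, but an 8GB file would be streamed the same way):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Write a small sample file so the sketch runs as-is; a huge file would be read identically.
File.WriteAllText("big.xml",
    "<quests><quest><id>1</id><reward_gold1>560</reward_gold1></quest>" +
    "<quest><id>2</id><reward_gold1>75</reward_gold1></quest></quests>");

var total = 0;
using (var reader = XmlReader.Create("big.xml"))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "quest")
        {
            // Materialize only this one subtree; the rest of the file is never held in memory.
            var quest = (XElement)XNode.ReadFrom(reader);
            total += (int)quest.Element("reward_gold1");
        }
        else
        {
            reader.Read();
        }
    }
}
Console.WriteLine(total); // 635
```

Note the manual Read loop rather than ReadToFollowing: after XNode.ReadFrom the reader is already positioned on the next node, and ReadToFollowing would skip past it.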
Gabriel,
Dude, this isn't exactly answering your ACTUAL question (how to read big XML docs using LINQ), but you might want to check out my old question What's the best way to parse big XML documents in C-Sharp. The last "answer" (timewise) was a "note to self" on what ACTUALLY WORKED. It turns out that a hybrid document-XmlReader & doclet-XmlSerializer approach is fast (enough) AND flexible.
BUT note that I was dealing with docs of up to only 150MB. If you REALLY have to handle docs as big as 8GB, then I guess you're likely to encounter all sorts of problems, including issues with the O/S's LARGE_FILE (>2GB) handling... in which case I strongly suggest you keep things as-primitive-as-possible... and XmlReader is the most primitive (and, according to my testing, THE fastest) XML parser available in the Microsoft namespace.
Also: I've just noticed a belated comment in my old thread suggesting that I check out VTD-XML... I had a quick look at it just now... It "looks promising", even if the author seems to have contracted a terminal case of FIGJAM. He claims it'll handle docs of up to 256GB; to which I reply "Yeah, have you TESTED it? In WHAT environment?" It sounds like it should work, though... I've used this same technique to implement "hyperlinks" in a textual help system, back before HTML.
Anyway good luck with this, and your overall project. Cheers. Keith.
I realize that this answer might be considered non-responsive and possibly annoying, but I would say that if you have an XML file which is 8GB, then at least some of what you are trying to do in XML should be done by the file system or database.
If you have huge chunks of text in that file, you could store them as individual files and store the metadata and the filenames separately. If you don't, you must have many levels of structured data, probably with a lot of repetition of the structures. If you can decide what is considered an individual 'record' which can be stored as a smaller XML file or in a column of a database, then you can structure your database based on the levels of nesting above that. XML is great for small and dirty, it's also good for quite unstructured data since it is self-structuring. But if you have 8GB of data which you are going to do something meaningful with, you must (usually) be able to count on some predictable structure somewhere in it.
Storing XML (or JSON) in a database, and querying and searching both for XML records and within the XML, is well supported nowadays, both by SQL databases and by the NoSQL paradigm.
Of course you might not have a choice about using XML files this big, or you might have some situation where they really are the best solution. But for some people reading this it could be helpful to look at this alternative.
I know this is a vague open ended question. I'm hoping to get some general direction.
I need to add cXML punchout to an ASP.NET C# site / application. This is replacing something that I wrote years ago in ColdFusion.
I'm a reasonably experienced C# developer but I haven't done much with XML. There seems to be lots of different options for processing XML in .NET.
Here's the open-ended question: assuming that I have an XML document in some form, e.g. a file or a string, what is the best way to read it into my code? I want to get the data and then query databases etc. The cXML document size and our traffic volumes are easily small enough that loading a cXML document into memory is not a problem.
Should I:
1) Manually build classes based on the dtd and use the XML Serializer?
2) Use a tool to generate classes. There are sample cXML files downloadable from Ariba.com.
I tried xsd.exe to generate an xsd and then xsd.exe /c to generate classes. When I try to deserialize I get errors because there seems to be "confusion" around whether some elements should be single values or arrays.
I tried the CodeXS online tool, but that gives errors in its log, and errors if I try to deserialize a sample document.
3) Create a dataset and ReadXml()?
4) Create a typed dataset and ReadXml()?
5) Use LINQ to XML. I often use LINQ to Objects, so I'm familiar with LINQ in general, but I'm struggling to see what it gives me in this situation.
6) Some other means.
I guess I need to improve my understanding of XML in general, but even so... am I missing some obvious way of doing this? In the old ColdFusion site I found a free component ("tag") which basically ignored any schema and read the XML into a "structure", essentially a series of nested hash tables, which was then easy to read in code. That was probably quite sloppy, but it worked.
I also need to generate XML files from my C# objects. Maybe LINQ to XML will be good for that. I could start with a default "template" document and manipulate it before saving.
Thanks for any pointers ...
If you need to generate arbitrary XML in an exact format, you should generate it manually using LINQ-to-XML.
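A hedged sketch of functional construction with LINQ to XML (the element names here are illustrative, not taken from the actual cXML DTD):

```csharp
using System;
using System.Xml.Linq;

// Functional construction: the nesting of the code mirrors the nesting of the output document.
var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("Order",
        new XAttribute("orderID", "12345"),
        new XElement("Item",
            new XAttribute("quantity", 2),
            new XElement("Description", "Widget"))));

Console.WriteLine(doc.ToString());
```

doc.Save("order.xml") writes it out with the declaration included; every element and attribute is under your control, which is what "exact format" usually requires.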
I was researching alternatives to Microsoft's XslCompiledTransform, and everything seemed to point primarily towards Saxon and secondarily towards XQSharp. As I started to look at the documentation for Saxon, I saw that XQuery could do the equivalent of my XSLTs, which are nowhere near as terse as XQuery's syntax.
What advantages does XSLT offer over XQuery to justify its much more verbose syntax?
Would it be the templating functionality that can be created?
In general, there's a lot of overlap; both are rooted in an underlying XPath implementation. As for whether to use XSLT or XQuery, the proof is in the pudding: XSLT is better at transforms, and XQuery is better at queries.
So, use XQuery for:
smaller, less complicated things
highly structured data (XQuery is very strongly typed and prone to complaining)
data from a database
extracting a small piece of information
Conversely, use XSLT for:
copying a document with only incremental changes
larger, more complicated things
loosely (or badly) structured data
XSLT is designed to take one xml document and transform it into something else, e.g. CSV, HTML, or a different xml format such as XHTML.
XQuery is designed to extract information from one or more xml documents, and combine the result into a new xml document.
Both XQuery and XSLT rely heavily on XPath. If your transformation takes one input xml document to one output xml document, the two can pretty much be interchanged.
The FLWOR syntax of XQuery is quite intuitive if you have an SQL background. IMO, XSLT is the more powerful language when dealing with one-input/one-output situations, especially if the output will not be xml.
Personally I find the xml based syntax and the declarative nature of XSLT slightly difficult to read and maintain.
It really boils down to choice, although using XQuery for "simple" formatting is slightly unusual. If your input is based on more than one xml document, you are pretty much stuck with XQuery; if your output is not xml based, you are pretty much stuck with XSLT.
The biggest reason to move away from XslCompiledTransform is that it is merely an XSLT 1.0 processor.
The majority of the functionality of XSLT 2.0 and XQuery 1.0 overlaps, and for the most part they are similar languages with different syntax (a little like C# and VB).
XSLT is a lot more verbose, but its templating features add a lot of functionality that can be fairly cumbersome to replicate in XQuery, particularly making small changes to node trees. The main features that are particularly cumbersome in XQuery are things like <xsl:template match="..." /> and <xsl:copy>...</xsl:copy>.
XQuery has a much cleaner syntax (IMHO) as long as the templating features are not needed, and I find it is a lot better for more advanced computations, and retrieving data from large documents.
XQuery is often viewed as a database language. Whilst a lot of databases use it this way, it is not the only use for it. Some implementations of the language in databases are very restricted. Another commenter claims that XQuery is "very strongly typed". Unless you are using the static typing feature, XQuery is no more strongly typed than XSLT. Whilst some database implementations force you to use the static typing features, most other implementations are moving away from this.
He also claims that XQuery is not very good for "larger, more complicated things". I would have argued exactly the opposite. The conciseness and flavour of the syntax makes it far easier to write complicated functions and computations in XQuery. I have written a raytracer in XQuery, which feels really quite natural; I think it would be a lot harder (certainly more verbose) to write something this computationally complex in XSLT.
In summary:
XSLT is more natural for transformation. It is better if you have a document with roughly the right structure and you want to transform the components, for example rendering an HTML version of an XML file.
XQuery is more natural for building a new XML document from various sources, or for changing the structure of a document.
Note that the overlap is rather large and there is often no "right" choice, but if you make the wrong choice then you tend to find yourself working against the grain.
XSLT and XQuery do two different things. XSLT, as the name suggests, is used to transform data from one form into another (e.g. from XML to HTML). On the other hand, XQuery is a language used to find and extract certain XML nodes from an XML document, which can then be used for any number of purposes.
XSLT actually relies on the functionality of XPath; take a look at the tutorials on www.w3schools.com. When used correctly, they are both very powerful technologies.
I'm going to start developing my own document format (like PDF, XPS, DOC, RTF...), but I want to know where I can read some tutorials, how-tos, etc. I don't want code; this is a project where I want to learn how to build it myself, not reuse someone else's work.
PS: I want to make it like a XML file:
[Command Argument="Define it" Argument2="Something"]
It's like PDF, but this syntax will be interpreted by a program that I will build using C#, just like HTML and your browser ;)
Remember that my question is about the program that will interpret this code, but it's good to start with a tutorial on parsing XML ;)
I assume you're doing this for the sake of learning how to do it. If that's the case, it is a worthwhile venture and I understand.
You'll want to start out by learning LL parsers and grammars. That will help you interpret the document that has been read from a file into a document object model (DOM). From there you can create routines to manipulate or render that document tree.
Good luck!
I'm confused as to what you're asking, but if you need your own format like an XML file, why not just use XML to describe the format?
Edit: Okay, I think I understand now. If you're doing this for fun and for learning (which is great), then there are lots of approaches to take. In fact, it may even be better to not do any research, try to come up with a solution on your own and see if it works, what you need to do to make it better, etc.
Sounds like a good learning project and you've got some good pointers here already. I would just add that you should remember that there is a difference between a document file language and a document format.
Consider OOXML, it is a document format that is built on top of XML (what I'd describe as the file language). If your purpose is to learn about building your own document format then I'd highly recommend starting with XML so that you don't have to reinvent a language parser. This will let you focus on the concerns around building the format.
That said, good on you if you want to play around with creating your own language; just wanted to make sure you realized that they are different beasts.
Here are some links that will help you get started using XML in C#:
Xml Tutorial (video)
XML Document overview
Reading Xml data with an XmlReader
Writing Xml data with an XmlWriter
Far be it from me to forbid you from re-inventing the wheel for the sake of learning something new. Good for you for trying this out. However, if you are going to ask questions about how to do it, you will need to make your questions a little more specific.
Are you looking for help on:
Designing your framework / format
Planning your time / Estimating deadlines
Working with XML
Working with C#
Building a web-based C# application
Building a PC-based C# application
Other aspects of development entirely
There are many people here who want to help -- but the best answers are given to focused questions (not necessarily specific, but always focused.)
There are a couple of ways to approach this. One way would be to define the format of the file first, then use a parser generator to create C# code that can read that format. Doing a Google search on "c# parser generator" will get you links to a number of different libraries you can use.
Alternatively, you could code your own parser, from scratch. This will be more work than using a parser generation tool, but might be more educational in the end.
The define-a-grammar approach may be total overkill for a simple format. Another way to approach the problem is to design the object tree that you'll use in-app first, then write serialization and de-serialization routines to save and load the contents from a file. The serialization interface in C# is pretty flexible, and you can serialize to binary or XML files easily.
I think it should be relatively straightforward to create your own serializer to create a file formatted however you like, but MSDN is not being my friend today, so I can't find the relevant documentation.
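A sketch of the serialize/deserialize round trip with XmlSerializer (the Document class here is a made-up example model, not part of any real format):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

var serializer = new XmlSerializer(typeof(Document));

// Serialize the in-memory object tree to XML text.
string xml;
using (var writer = new StringWriter())
{
    serializer.Serialize(writer, new Document { Title = "Hello", Body = "World" });
    xml = writer.ToString();
}

// ...and load it back into objects.
Document roundTripped;
using (var reader = new StringReader(xml))
{
    roundTripped = (Document)serializer.Deserialize(reader);
}
Console.WriteLine(roundTripped.Title + " " + roundTripped.Body);

// A made-up example model; XmlSerializer maps public members to elements and attributes.
public class Document
{
    [XmlAttribute] public string Title { get; set; }
    public string Body { get; set; }
}
```

The attributes on the class control the file layout, so you design the object tree first and let the serializer handle the reading and writing.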
Given:
Two similar and complex schemas; let's call them XmlA and XmlB.
We want to convert from XmlA to XmlB
Not all the information required to produce XmlB is contained within XmlA (a database lookup will be required).
Can I use XSLT for this given that I'll need to reference additional data in the database? If so what are the arguments in favour of using XSLT rather than plain old object mapping and conversion? I'm thinking that the following criteria might influence this decision:
Performance/speed
Memory usage
Code reuse/complexity
The project will be C# based.
Thanks.
With C# you can always provide extension objects to XSLT transforms, so that's a non-issue.
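A self-contained sketch of the extension-object mechanism (the urn:db-lookup namespace and the DbLookup class are invented for illustration; a real one would query the database):

```csharp
using System;
using System.IO;
using System.Xml.Xsl;

// The stylesheet declares a namespace for the extension object and calls a method on it.
File.WriteAllText("lookup.xslt", @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:db='urn:db-lookup' exclude-result-prefixes='db'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/item'>
    <item name='{db:LookupName(string(id))}'/>
  </xsl:template>
</xsl:stylesheet>");

File.WriteAllText("item.xml", "<item><id>42</id></item>");

var xslt = new XslCompiledTransform();
xslt.Load("lookup.xslt");

// Register the object under the same namespace URI the stylesheet declares.
var args = new XsltArgumentList();
args.AddExtensionObject("urn:db-lookup", new DbLookup());

var sw = new StringWriter();
xslt.Transform("item.xml", args, sw);
Console.WriteLine(sw.ToString());

// Stands in for a real database lookup.
public class DbLookup
{
    public string LookupName(string id) => id == "42" ? "widget" : "unknown";
}
```

Any public method on the registered object is callable from the stylesheet, so the missing data can be fetched mid-transform.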
It's hard to say qualitatively without having the schemas and XML to hand, but I imagine a compiled transform will be faster than object mapping, since with the latter you'll have to do a fair amount of wheel reinventing.
Further, one of the huge benefits of XSLT is its maintainability and portability. You'll be able to adapt the XSLT doc really quickly with schema changes, and on the fly without having to do any rebuilds or takedowns if you're monitoring the file.
Could go either way based on what you've given us though.
My question is how likely are the set-of-transformations to change?
If they won't change much, I favor doing it all in one body of source code -- here that would be C#. I would use XSD.exe (.NET XSD tool) generated serialization classes in conjunction with data layers for this kind of thing.
On the other hand, if the set-of-transformations are likely to change -- or perhaps need to be 'corrected' post installation -- then I would favor a combination of XSLT and C# extensions to XSLT. The Extension mechanism is straightforward, and if you use the XslCompiledTransform type the performance is quite good.
If the data isn't in the xml, then xslt will be a pain. You can pull in additional documents with the document() function, or you can use xslt extension methods (but those are not well supported between vendors). So unless you are dead set on xslt, it doesn't sound like a good option on this occasion (although I'm a big fan of xslt when used correctly).
So: I would probably use regular imperative code - streaming (IEnumerable<T>) if possible. Of course, unless you have a lot of data, such nuances are moot.