I have an XML file and I want to read the data and assign it to string arrays, so Operative would be assigned to one array and JobLocation to another:
<Demo>
<JOBOperatives>
<Operative>
<Clock>aaaa</Clock>
<Name>aaaaa</Name>
<MobileNumber>00000000010</MobileNumber>
<OperativeTrade>3</OperativeTrade>
<OperativeTicket>1</OperativeTicket>
</Operative>
</JOBOperatives>
<JobLocation>
<UPRN>aaa</UPRN>
<Address1>aaaa</Address1>
<Address2>aaaa</Address2>
<Address3>aaaa</Address3>
<Address4>aaa</Address4>
<Address5>aa</Address5>
<PostCode>JR4 4ED</PostCode>
</JobLocation>
</Demo>
I take it you mean where each property from the XML is its own element in the array?
That doesn't seem like a very good data structure, especially as XML schema definitions allow for the items to arrive in any order; your expected indexes could get all screwed up. A strongly-typed object seems more appropriate and is well supported in .NET. At the very least you should use a dictionary, so the keys are preserved.
In this case the number of items in each tree is very small and you could end up with many of them, so a dictionary is probably not the best choice. You could do objects, but that would be a lot of extra code just to set up, and I get the impression the XML may come from different sources and differ based on the source (or the structure could change regularly, hence the initial desire for loose typing).
Ultimately your destination is a database, so in this case I'll show you an example using a DataSet:
string xml = GetXmlString(); // <Demo><JOBOperatives><Operative><Clock>aaaa</Clock>...
StringReader sr = new StringReader(xml); // requires using System.IO;
DataSet ds = new DataSet();              // requires using System.Data;
ds.ReadXml(sr); // infers one DataTable per repeating element
Play around with that: look in the DataSet's .Tables collection, at each table's .TableName property, .Columns collection, and .Rows collection, and at each column's .ColumnName and .DataType properties.
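Back to the original question: once ReadXml has run, each repeating element becomes its own DataTable, so one way to flatten the first Operative and JobLocation rows into string arrays might look like this (a sketch, assuming the inferred table names match the element names in the sample):
// Assumes using System.Linq; table names as inferred from the sample XML.
string[] operative = ds.Tables["Operative"].Rows[0]
    .ItemArray.Select(v => v.ToString()).ToArray();
string[] jobLocation = ds.Tables["JobLocation"].Rows[0]
    .ItemArray.Select(v => v.ToString()).ToArray();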
The OuterXml property of the XmlNode class might help you here.
I'm testing stored SQL procedures in C#. Executing the procs returns a SqlDataReader, and I want to write the whole thing to an XML file to compare later. Nothing I've read has provided a very simple solution. Is there a way to do this without looping through all the data in the stream? I don't know much about SQL, so I'm not sure exactly what I'm working with here.
The XML produced by DataSet, DataTable and its ilk leaves something to be desired from the point of view of humans reading it. I'd roll my own.
A SqlDataReader (and it doesn't matter whether it's returning data from a stored procedure or a plain-text SQL query) returns zero to many result sets. Each such result set has
a schema that describes the columns being returned in each row, and
the result set itself, consisting of zero or more rows.
Each row is essentially an array of 1 or more columns, with each cell containing the value for the column with that ordinal position in the row.
Each such column has certain properties, some from the schema, such as name, ordinal, type, nullability, etc.
Finally, the column value within a row is an object of the type corresponding to the SQL Server data type of the column in the result...or DBNull.Value if the column is null.
The basic loop is pretty straightforward (there are lots of examples on MSDN of how to do it). And while it might be a bit of work to write it in the first place, once written, it's usable across the board, so it's a one-time hit. I would suggest doing something like this:
Determine what you want the XML to look like. Assuming your intent is to be able to diff the results from time to time, I'd probably go with something that looks like this (since I like to keep things terse and avoid redundancy):
<stored-procedure-results>
<name> dbo.some-stored-procedure-name </name>
<result-sets>
<result-set>
<column-schema column-count="N">
<column ordinal="0...N-1" name="column-name-or-null-if-column-is-unnamed-or-not-unique" data-type=".net-data-type" nullable="true|false" />
...
</column-schema>
<rows>
<row>
<column ordinal="0..N-1" value="..." />
...
</row>
...
</rows>
</result-set>
...
</result-sets>
</stored-procedure-results>
Build POCO model classes to contain the data. Attribute them with XML serialization attributes to get the markup you want. From the above XML sample, these classes won't be all that complex. You'll probably want to represent column values as strings rather than native data types.
Build a mapper that will run the data reader and construct your model (sketched below).
Then it's a couple of dozen lines of code to construct the XML serializer of choice and spit out nicely formatted XML.
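To illustrate steps 2 and 3, here is a rough sketch of the mapper loop; the ResultSet, ColumnInfo, and Row classes stand in for the POCO model from step 2 and are assumptions of this sketch, not part of the original answer:
// Sketch of the mapper: walks every result set the reader returns.
static List<ResultSet> Map(SqlDataReader reader)
{
    var sets = new List<ResultSet>();
    do
    {
        var rs = new ResultSet();
        // Capture the schema: one ColumnInfo per column.
        for (int i = 0; i < reader.FieldCount; i++)
            rs.Columns.Add(new ColumnInfo
            {
                Ordinal = i,
                Name = reader.GetName(i),
                DataType = reader.GetFieldType(i).Name
            });
        // Capture the rows, keeping values as strings (null for DBNull).
        while (reader.Read())
        {
            var row = new Row();
            for (int i = 0; i < reader.FieldCount; i++)
                row.Values.Add(reader.IsDBNull(i) ? null : Convert.ToString(reader.GetValue(i)));
            rs.Rows.Add(row);
        }
        sets.Add(rs);
    } while (reader.NextResult()); // advance to the next result set, if any
    return sets;
}
From there, an XmlSerializer over the top-level model spits out the markup sketched in step 1.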
Notes:
For QA purposes, you might want to capture the parameters, if any, that were passed to the query, along with the query itself and, possibly, the date/time of the run.
There are a few oddball cases where the results set model I describe can get...wonky. For example, a select statement using compute by has to get handled somewhat differently. In my experience, it's pretty safe to ignore that sort of edge case, since you're unlikely to encounter queries like that in the wild.
Think about how you represent null in the XML: null strings are not the same as empty strings.
Try this
using System;
using System.Data;
using System.Data.SqlClient;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"C:\temp\test.xml";

        static void Main(string[] args)
        {
            string connstr = "Enter your connection string here";
            string SQL = "Enter your SQL here";

            SqlDataAdapter adapter = new SqlDataAdapter(SQL, connstr);
            SqlCommand cmd = adapter.SelectCommand;
            cmd.Parameters.Add("abc", SqlDbType.VarChar).Value = "abc"; // placeholder; match your query's parameters

            DataSet ds = new DataSet();
            adapter.Fill(ds); // Fill opens and closes the connection itself
            ds.WriteXml(FILENAME, XmlWriteMode.WriteSchema);
        }
    }
}
I see the main issue as how to test complicated stored procedures before releases, not writing an XML from a SqlDataAdapter, which can be very simple: row by row, column by column.
You have a test database which does not contain static data, and you somehow store different versions of the stored procedure.
A simple setup would be to run the (let's say 5) versions of a stored procedure you have against the same database content, store the XMLs to a folder, and compare them. I would use a different folder for each run, with a timestamp to distinguish between runs. I would not spend too much on how the XMLs are written; to detect whether they differ, you could even end up using String.Compare(fileStream1.ReadToEnd(), fileStream2.ReadToEnd()). If the result is too large, then something more elaborate.
If there are differences between two XMLs, then you can look at them with a text-compare tool. For more complicated stored procedures with multiple joins, the most common difference will likely be the size of the XMLs / the number of rows returned, not the value of a field.
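A minimal sketch of that quick comparison (the folder layout and file names are made up for illustration):
// Needs using System; using System.IO;
string a = File.ReadAllText(@"C:\runs\20240101-1200\usp_GetOrders.xml"); // illustrative paths
string b = File.ReadAllText(@"C:\runs\20240101-1230\usp_GetOrders.xml");
bool identical = string.Equals(a, b, StringComparison.Ordinal);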
In production, the content of the database is not static, so doing this type of test would not make sense.
When serializing a SqlDataReader using the built-in WriteXml methods of DataTable or DataSet, as described in the accepted answer, and the data contains geography data, the geography data are lost and can't be restored later.
For more details read Datatable with SqlGeography column can't be serialized to xml correctly with loss of Lat,Long and other elements
There is a workaround solution provided by @dbc to save to XML without loss of data, using the same built-in WriteXml methods. Try it online.
I'm a PHP programmer, and I'm trying to understand some code which I think is ASP.NET. This is also my first foray into XML. I don't have access to a Windows box to test on.
I need to produce XML output that third-party code can use. The third party wants to use our data instead of the data source they are currently using. I don't want to replicate the current XML structure exactly because it doesn't map well to our data.
The structure of the current XML is very flat. There are only a few nested elements and the third party doesn't make use of any of them. The third party does have a sub-contracted programmer, but he is very busy. Also, I want to understand, for myself, how this works.
This is an excerpt from a plugin for a custom CMS:
Dim obj_set As New Data.DataSet()
Using obj_reader As New System.Xml.XmlTextReader("http://www.example.com/xml_output.php")
obj_set.ReadXml(obj_reader)
End Using
Dim obj_view As Data.DataView = obj_set.Tables("profile").DefaultView
obj_view.Sort = "cname"
Dim obj_data As Data.DataTable = obj_view.ToTable()
So from what I have gathered so far, this code
reads the XML file into a DataSet
sorts the profile table by cname
creates a new DataTable from the sorted view
There is other code that stores the new table to, and retrieves it from, cache. Then there is code that loops through the table rows and maps the column names to template variables.
Sample excerpt of current XML structure:
<profiles>
<profile>
<cname>ABC Corporation</cname>
<fname>John</fname>
<lname>Smith</lname>
<sector>Widgets</sector>
<subsectors>
<subsector>Basic Widgets</subsector>
<subsector>Fancy Widgets</subsector>
</subsectors>
</profile>
</profiles>
So what happens to the subsectors data? Does the reader create a separate table for it? If so, how are the tables related?
Our data includes multiple contacts per company. I could just create multiple elements at the top level (fname1, fname2, fname3) to keep the flat structure, but I was thinking a nested structure makes sense for this kind of data. The problem is that I don't understand whether such a structural change is compatible with the plugin code.
What kinds of changes would need to be made to the plugin code to make use of nested elements?
I was stumped on this myself, and I don't know if you still are, but for reference to others here's what I found.
You are right in assuming that the reader creates a separate table for it. Being that a DataSet can hold multiple tables, each "level" of elements gets its own table. However, any nested elements that have nested elements of their own will get their own table. Essentially, it keeps creating tables until it reaches the bottom of the xml tree. If an element has no children, it gets added as a cell in the data table.
In your case, dataSet.Tables[0] will hold the top-level nodes (all the <profiles>). But since the nested element <profile> has elements of its own, Tables[0] will likely only have one row. The next level deeper, dataSet.Tables[1], will hold all <profile> nodes. Although since <subsectors> has the sub-element <subsector>, it will not be in Tables[1], but rather in Tables[2], which goes yet a level deeper.
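If you want to see exactly which tables were created and how ReadXml wired them together, you can dump the DataSet's tables and relations. A short C# sketch (ReadXml typically generates hidden key columns, e.g. something like profile_Id, to link parent and child tables, but the exact names depend on the inference):
// List the inferred tables and the relations between them.
foreach (DataTable t in dataSet.Tables)
    Console.WriteLine(t.TableName);
foreach (DataRelation rel in dataSet.Relations)
    Console.WriteLine("{0}: {1} -> {2}",
        rel.RelationName, rel.ParentTable.TableName, rel.ChildTable.TableName);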
I know it has been a while since this was asked, but hopefully this will be helpful.
The question:
Do you guys know if there is any way that I can put an object in the header of a DataTable column, instead of an integer or a string?
Further explanation:
I'm writing a library that, at some point, will read data from different meteorological stations. The data I'll read will be, for example, temperature, wind speed, atmospheric pressure, etc. These values can be read in different units (km/h, mph, Celsius, Fahrenheit) and the information about these units will be in a separate source, not together with the data itself. I'll be reading an XML file that will contain all the information about this data file, and what I wanted to do is create an object with different attributes and use this object as the header of each column of the DataTable. A bit of a complicated explanation, but I think I was clear enough.
Do you think that it is possible using native .NET types or, if I wanted to do exactly this way I'd have to create my own table class?
Thank you all!
There is a DataColumn.ExtendedProperties collection, which works like a dictionary and can hold any objects.
So every DataTable column can have an object associated with it, which can hold a description of the type, units, and any other info.
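A minimal sketch, with made-up keys and values:
// Attach unit metadata to a column via ExtendedProperties.
DataTable readings = new DataTable("Readings");
DataColumn temp = readings.Columns.Add("Temperature", typeof(double));
temp.ExtendedProperties["Unit"] = "Celsius"; // illustrative key/value
temp.ExtendedProperties["Station"] = "WS-01";

// Later, when rendering or converting:
string unit = (string)temp.ExtendedProperties["Unit"];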
This code
XmlDataDocument xmlDataDocument = new XmlDataDocument(ds);
does not work for me, because the node names are derived from the columns' encoded ColumnName property and will look like "last_x20_name", for instance. This I cannot use in the resulting Excel spreadsheet. In order to treat the column names to make them something more friendly, I need to generate the XML myself.
I like LINQ to XML, and one of the responses to this question contained the following snippets:
XDocument doc = new XDocument(new XDeclaration("1.0","UTF-8","yes"),
new XElement("products", from p in collection
select new XElement("product",
new XAttribute("guid", p.ProductId),
new XAttribute("title", p.Title),
new XAttribute("version", p.Version))));
The entire goal is to dynamically derive the column names from the dataset, so hardcoding them is not an option. Can this be done with Linq and without making the code much longer?
It ought to be possible.
In order to use your DataSet as a source you need LINQ to DataSet.
Then you would need a nested query:
// untested; AsEnumerable() needs a reference to System.Data.DataSetExtensions
var data = new XElement("products",
    from row in ds.Tables["ProductsTable"].AsEnumerable()
    select new XElement("product",
        from DataColumn column in ds.Tables["ProductsTable"].Columns
        select new XElement(column.ColumnName, row[column])
    ));
I appreciate the answers, but I had to abandon this approach altogether. I did manage to produce the XML that I wanted (albeit not with LINQ), but of course there is a reason why the default implementation of the XmlDataDocument constructor uses the encoded ColumnName: special characters are not allowed in element names in XML. But since I wanted to use the XML to convert what used to be a simple CSV file to the XML Spreadsheet format using XSLT (the customer complains about losing leading 0's in ZIP codes etc. when loading the original CSV into Excel), I had to look into ways that preserve the data in Excel.
But the ultimate goal of this is to produce a CSV file for upload to the payroll processor, and they mandate the column names to be something that is not XML-compliant (e.g. "File #"). The data is reviewed by humans before the upload, and they use Excel.
I resorted to hard-coding the column names in the XSLT after all.
I want to create a simple class that is similar to a DataTable, but without the overhead.
So I'd load the object with a SqlDataReader, and then return this custom DataTable-like object that will give me access to the rows and columns like:
myObject[rowID]["columnname"]
How would you go about creating such an object?
I don't want any built in methods/behavior for this object except for accessing the rows and columns of the data.
Update:
I don't want a DataTable; I want something much leaner (plus I want to learn how to create such an object).
This type of structure can be easily created with a type signature of:
List<Dictionary<string, object>>
This will allow access as you specify and should be pretty easy to populate.
You can always create an object that inherits from List<Dictionary<string, object>> and implements a constructor that takes a SqlDataReader. This constructor should create a new dictionary for each row, and insert an entry into the dictionary for each column, using the column name as the key.
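A sketch of what that could look like (the class name DataRows is made up):
// Inherits the List<Dictionary<string, object>> shape suggested above.
class DataRows : List<Dictionary<string, object>>
{
    public DataRows(SqlDataReader reader)
    {
        while (reader.Read())
        {
            var row = new Dictionary<string, object>();
            for (int i = 0; i < reader.FieldCount; i++)
                row[reader.GetName(i)] = reader.GetValue(i); // column name -> value
            Add(row);
        }
    }
}
// Usage: var table = new DataRows(reader); object name = table[0]["Name"];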
I think you're missing something about how .NET works. The extra overhead involved in a DataTable is not significant. Can you point to a specific performance problem in existing code that you believe is caused by a DataTable? Perhaps we can help correct that in a more elegant way.
Perhaps the specific thing you're asking about is how to use the convenient ["whatever"] indexing syntax in your own table object.
If so, I suggest you refer to this MSDN page on indexers.
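For reference, a bare-bones indexer looks something like this (class and field names are made up):
class Row
{
    private readonly Dictionary<string, object> cells = new Dictionary<string, object>();

    // Enables row["ColumnName"] access.
    public object this[string columnName]
    {
        get { return cells[columnName]; }
        set { cells[columnName] = value; }
    }
}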
Dictionary<int, object[]> would be better than List<Dictionary<string, object>>. You don't really need a dictionary for each row, since the column names are the same for all rows. And if you want to keep it lightweight, you should use column indexes instead of names.
So if you have a column "Name" that is the 3rd column, getting its value for the row with ID 10 would look like:
object val = table[10][2];
Another option is SortedList<int,object[]>... depending on the way you access the data (forward only or random access).
You could also use MultiDictionary<int,object> from PowerCollections.
From the memory-usage perspective, I think the best option would be to use a single-dimension array with some slack capacity. So after each, say, 100 rows, you would create a new array, copy the old contents to it, and leave 100 empty rows at the end. But you would have to keep some sort of an index when you delete a row, so that it is marked as deleted without resizing the array.
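Roughly, the growth step described above could look like this (the chunk size and names are illustrative):
// Keeps rows in one array, growing it in chunks of 100 to leave slack.
class SlackTable
{
    private object[][] rows = new object[100][];
    private int count;

    public void AddRow(object[] row)
    {
        if (count == rows.Length)
            Array.Resize(ref rows, rows.Length + 100); // copy old contents + 100 empty slots
        rows[count++] = row;
    }

    public object[] this[int rowId] { get { return rows[rowId]; } }
}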
Isn't this a DataSet/DataTable? Maybe I didn't get the question.
Also, what is the programming language?