Combine Data from multiple DataSets - c#

I'm loading data from multiple xml files with different schemas into DataSets. I do have foreign key style relationships between the tables in each xml file but to date they're only enforced by code. I need to access data coming from multiple files and display it in a DataGridView.
Is there a way to merge the data from multiple files into a single DataSet?
Alternatively, can I write LINQ to DataSet queries across multiple DataSets?

Perhaps the DataSet.Merge() method will help you out? You can simply load the files as you're currently doing, and merge them together.
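As a rough sketch of what that could look like: the snippet below loads two XML documents into separate DataSets, merges them, and then adds a DataRelation so the foreign-key style link is enforced by the DataSet rather than by hand-written code. The element names (Owner, Tool, Id, OwnerId, Name) are invented for illustration, and schema inference treats every column as a string, so adjust to your real files. On .NET Framework the LINQ part needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

public static class DataSetMergeSketch
{
    // Loads one XML document into its own DataSet, inferring the schema from the data.
    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml), XmlReadMode.InferSchema);
        return ds;
    }

    // Merges the second document into the first and links the two tables.
    // Table and column names here are hypothetical.
    public static DataSet Combine(string ownersXml, string toolsXml)
    {
        DataSet combined = Load(ownersXml);

        // Merge(DataSet) adds tables and columns that the target does not have yet.
        combined.Merge(Load(toolsXml));

        // The relation also creates a unique constraint on Owner.Id and a
        // foreign key constraint on Tool.OwnerId.
        combined.Relations.Add("OwnerTools",
            combined.Tables["Owner"].Columns["Id"],
            combined.Tables["Tool"].Columns["OwnerId"]);
        return combined;
    }

    // LINQ to DataSet query across what used to be two DataSets.
    public static List<string> ToolsWithOwners(DataSet combined)
    {
        return combined.Tables["Tool"].AsEnumerable()
            .Select(tool => tool.Field<string>("Name") + " (" +
                            tool.GetParentRow("OwnerTools").Field<string>("Name") + ")")
            .ToList();
    }
}
```

The list returned by ToolsWithOwners (or the merged tables themselves) can then be used as the DataSource of the DataGridView.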

Related

How to map elements in XML files to columns in SQL table?

I have multiple XML files whose data needs to be stored in one table in database. All the files should be mapped to one table.
But the element names in the XML files are different from the column names in the table. How do I map the XML elements (from different files) to columns in the SQL table? How do I perform this mapping and store the data in the required columns? What is the optimal way of doing this in C# when you have many XML files?
Also note that every XML file has different element names and a different format.

Handling multiple relational tables in data access

I have multiple relational tables in the following format.
I'm trying to query that data in an efficient way in .NET so that I can perform transforms on the data (array to object) and insert it into DocumentDb. Essentially we are doing some ETL work, but because the data has to be transformed a certain way and is going into DocumentDb, we are using .NET.
We are inserting data from all the relational tables into one document collection, so there will be lots of logic like this:
// if still in same profile record, insert more relational data for each relational table.
We are trying to avoid a Cartesian product of the data, so that one profile doesn't come back as 100 records or more. We were thinking about using some of the Oracle methods to convert child records to a JSON array, but we can't upgrade the Oracle system to the release that supports that feature. Another thought was to create an XML document, but that feels pretty wrong.
Any ideas on the best practice for handling ETL work within .NET? Most of the web sites I've worked on involve only pulling from a few tables at best, and a lot are 1:1 relationships.
You could use Entity Framework for this. Simply create a DbContext and the POCO classes that represent your tables and their relationships. Then execute the necessary query against your DbSet and you'll have all the data mapped to your objects, which you can then serialize in any way you want.

How to use CSV files hierarchically as a database?

I use CSV files as a database in separate processes. I only store all the data or read all the data into my datagrid, in a singular relationship. Every field in every txt file is one and only number starting from zero.
//While reaching countries, I read allcountries.txt,
//while reaching cities, I read allcities.txt
//while reaching places I read allplaces.txt.
But one country has many cities and one city has many places. Yet I don't use any relationships. I want to, and I know there is a need for them. How can I access the data for reading and writing by adding one extra column to each text file?
And is it possible to access the data without SQL queries?
Text files don't have any mechanism for SELECTs or JOINs. You'll be at a pretty steep disadvantage.
However, LINQ gives you the ability to search through object collections in SQL-like ways, and you can certainly create entities that have relationships. The application (whatever you're building) will have to load everything from the text files at application start. Since your code will be doing the searching, it has to have the entities loaded (from text files) and in-memory.
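To give an idea of the LINQ approach, here is a minimal sketch. It assumes the "one extra column" is a parent ID, so each line of allcities.txt would look like `id,countryId,name`, and each line of allcountries.txt like `id,name`. That layout and the class names are assumptions made for the example, not something from your files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Country
{
    public int Id { get; }
    public string Name { get; }
    public Country(int id, string name) { Id = id; Name = name; }
}

public sealed class City
{
    public int Id { get; }
    public int CountryId { get; }   // the extra column pointing at the parent country
    public string Name { get; }
    public City(int id, int countryId, string name) { Id = id; CountryId = countryId; Name = name; }
}

public static class TextFileStore
{
    // Parses lines of the assumed form "id,name".
    public static List<Country> ParseCountries(IEnumerable<string> lines)
    {
        return lines.Select(l => l.Split(','))
                    .Select(p => new Country(int.Parse(p[0]), p[1]))
                    .ToList();
    }

    // Parses lines of the assumed form "id,countryId,name".
    public static List<City> ParseCities(IEnumerable<string> lines)
    {
        return lines.Select(l => l.Split(','))
                    .Select(p => new City(int.Parse(p[0]), int.Parse(p[1]), p[2]))
                    .ToList();
    }

    // The in-memory equivalent of a SQL JOIN with a WHERE and ORDER BY.
    public static List<string> CitiesOf(string countryName, List<Country> countries, List<City> cities)
    {
        return (from country in countries
                join city in cities on country.Id equals city.CountryId
                where country.Name == countryName
                orderby city.Name
                select city.Name).ToList();
    }
}
```

In a real application the lines would come from something like `File.ReadLines("allcities.txt")`, loaded once at startup. A third class for places, with a cityId column, would follow the same pattern.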
That may or may not be a pleasant or effective experience, but if you're set on replacing SQL with text files, it's not impossible.
CSV files are good for mass input and mass output. They are not good for point manipulations or maintaining relationships. Consider using a database. SQLite might be something useful in your application.
Based on your comments, it would make more sense to use XML instead of CSV. This meets your requirements for being human and machine readable, and XML has nice built in libraries for searching, manipulating, serializing etc.
You can use SQL queries against CSV files (see "How to use SQL against a CSV file"). I have done it for reading but never for writing, so I don't know if this will work for you.

SQL Server export to Excel

How can I increase performance when exporting a database whose tables have one-to-many relationships (in some cases many-to-many relationships) into a single Excel file?
Right now, I get all the data from the database and process it into a table using a few for loops, then I change the header of the HTML file to download it as an Excel file. But it takes a while for the number of records I have (about 300 records).
I was just wondering if there is a faster way to do this.
Thanks
It sounds like you're loading each table into memory with your C# code, and then building a flat table by looping through the data. A vastly simpler and faster way to do that would be to use a SQL query with a few JOINs in it:
http://www.w3schools.com/sql/sql_join.asp
http://en.wikipedia.org/wiki/Join_(SQL)
Also, I get the impression that you're rendering the resulting flat table to HTML, and then saving that as an Excel file. There are several ways that you can create that Excel (or CSV) file directly, without having to turn it into an HTML table first.
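For the CSV route, something along these lines might be enough. It is only a sketch: the class and method names are made up, and it assumes the flat rows (for example, the result of the JOIN query) are already available as strings. Fields are quoted only when they contain a comma, a double quote, or a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvExport
{
    // Quotes a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Builds the whole file content: one header line followed by one line per row.
    public static string ToCsv(IEnumerable<string> header, IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        sb.Append(string.Join(",", header.Select(f => Escape(f)))).Append("\r\n");
        foreach (IEnumerable<string> row in rows)
        {
            sb.Append(string.Join(",", row.Select(f => Escape(f)))).Append("\r\n");
        }
        return sb.ToString();
    }
}
```

The resulting string can be written to the response with a text/csv content type, and Excel will open it directly.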

Application aware data import

I'm building an application to import data into a SQL Server 2008 Express database.
This database is being used by an application that is currently in production.
The data that needs to be imported comes from various sources, mostly Excel sheets and XML files.
The database has the following tables:
tools
powertools
strikingtools
owners
Each row or XML tag in the source files has information about one tool:
name, tooltype, weight, wattage, owner, material, etc...
Each of these rows has the name of the tool's owner. This name has to be inserted into the owners table, but only if the name isn't already in there.
For each of these rows a new row needs to be inserted in the tools table.
The tools table has a field owner_id with a foreign key to the owners table, where the primary key of the corresponding row in the owners table needs to be set.
Depending on the tooltype a new row must be created in either the powertools table or the strikingtools table. These 2 tables also have a tool_id field with a foreign key to the tools table that must be filled in.
The tools table has a tool_owner_id field with a foreign key to the owners table that must be filled in.
If any of the rows in the import file fails to import for some reason, the entire import needs to be rolled back.
Currently I'm using a DataSet to do this, but for some large files (over 200.000 tools) this requires quite a lot of memory. Can anybody think of a better approach for this?
There are two main issues to be solved:
Parsing a large XML document efficiently.
Adding a large amount of records to the database.
XML Parsing
Although the DataSet approach works, the whole XML document is loaded into memory. To improve the efficiency of working with large XML documents you might want to look at the XmlReader class. The API is slightly more difficult to use than what DataSet provides, but you get the benefit of not loading the whole document into memory at once.
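A minimal sketch of that streaming approach is below. It assumes each tool is an element named `tool` with child elements such as `name` and `owner`; those names are guesses for illustration, so substitute whatever your files actually use. Only one tool element is materialised at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ToolStream
{
    // Lazily yields each <tool> element without building the whole document.
    // The element name "tool" is an assumption about the file layout.
    public static IEnumerable<XElement> ReadTools(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "tool")
                {
                    // ReadFrom consumes the element and leaves the reader on the
                    // node that follows it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Usage would be along the lines of `foreach (XElement tool in ToolStream.ReadTools(File.OpenText(path)))`, reading values with `(string)tool.Element("name")` and handing each one to the insert logic.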
Inserting records to the DB
To satisfy your atomicity requirement you can use a single database transaction, but a single transaction covering this many records is not ideal. You will most likely run into issues like:
Database having to deal with a large number of locks
Database locks that might escalate from row locks to page locks and even table locks.
Concurrent use of the database will be severely affected during the import.
I would recommend the following instead of a single DB transaction:
See if it is possible to create smaller transaction batches. Maybe 100 records at a time. Perhaps it is possible to logically load sections of the XML file together, where it would be acceptable to load a subset of the data as a unit into the system.
Validate as much of your data upfront. E.g. Check that required fields are filled or that FK's are correct.
Make the upload repeatable. Skip over existing data.
Provide a manual undo strategy. I know this is easier said than done, but might even be required as an additional business rule. For example the upload was successful but someone realises a couple of hours later that the wrong file was uploaded.
It might be useful to upload your data to an initial staging area in your DB to perform validations and to mark which records have been processed.
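The batching and "skip over existing data" ideas can be sketched roughly as follows. This is not a full import: `alreadyImported` and `commitBatch` are hypothetical callbacks standing in for your own lookup and for "open a transaction, insert the batch, commit". Note that this deliberately trades the all-or-nothing rollback for smaller transactions, which is why the upfront validation and undo strategy matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchImport
{
    // Splits a sequence into consecutive batches of at most batchSize items.
    public static IEnumerable<List<T>> Batches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Skips records that are already present (which makes the upload repeatable)
    // and hands the rest to commitBatch one batch at a time.
    // Returns the number of records passed to commitBatch.
    public static int Import<T>(
        IEnumerable<T> records,
        int batchSize,
        Func<T, bool> alreadyImported,          // hypothetical existence check
        Action<IReadOnlyList<T>> commitBatch)   // hypothetical: one transaction per call
    {
        int committed = 0;
        foreach (List<T> batch in Batches(records.Where(r => !alreadyImported(r)), batchSize))
        {
            commitBatch(batch);
            committed += batch.Count;
        }
        return committed;
    }
}
```

Because the source is consumed lazily, this combines naturally with a streaming reader, so neither the whole file nor the whole import has to sit in memory at once.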
Use SSIS, and create an ETL package.
Use transactions for the rollback feature, and stored procedures that handle creating/checking the foreign keys.
