I have created a tool that currently uses Excel (VBA), but I would like to make it independent of Excel, since Excel update packages and similar issues can cause various problems. Basically, I want to eliminate one more factor that can go wrong.
My current Excel application:
Imports two database tables (Customers) into two different worksheets, sheet1 and sheet2
Merges these two tables into a third sheet, sheet3, by comparing column 2 (Name) and column 6 (Postanumber)
If there are duplicates (some Customers appear in both tables), the VBA code writes only one value to sheet3
Once sheet3 is ready, a foreach loop runs over sheet3 to export all the Customers to another system
I have started with C# code that can connect to the databases and get all the Customers from both tables, merging them.
Started with a DataGrid to get an idea:
private Task<DataView> GetDataAsync()
{
    return Task.Run(() =>
    {
        string connectionStringDE = "Driver={Pervasive ODBC Client Interface};ServerName=DE875;dbq=#DEDBFS;Uid=DEUsername;Pwd=DEPassword;";
        string queryStringDE = "select NRO,NAME,NAMEA,NAMEB,ADDRESS,POSTA,POSTN,POSTADR,COMPANYN,COUNTRY,ID,ACTIVE from COMPANY";
        string connectionStringFR = "Driver={Pervasive ODBC Client Interface};ServerName=FR875;dbq=#FRDBFS;Uid=FRUsername;Pwd=FRPassword;";
        string queryStringFR = "select NRO,NAME,NAMEA,NAMEB,ADDRESS,POSTA,POSTN,POSTADR,COMPANYN,COUNTRY,ID,ACTIVE from COMPANY";

        DataTable dataTable = new DataTable("COMPANY");

        // using-statement will cleanly close and dispose unmanaged resources i.e. IDisposable instances
        using (OdbcConnection dbConnectionDE = new OdbcConnection(connectionStringDE))
        {
            dbConnectionDE.Open();
            OdbcDataAdapter dadapterDE = new OdbcDataAdapter();
            dadapterDE.SelectCommand = new OdbcCommand(queryStringDE, dbConnectionDE);
            dadapterDE.Fill(dataTable);
        }

        using (OdbcConnection dbConnectionFR = new OdbcConnection(connectionStringFR))
        {
            dbConnectionFR.Open();
            OdbcDataAdapter dadapterFR = new OdbcDataAdapter();
            dadapterFR.SelectCommand = new OdbcCommand(queryStringFR, dbConnectionFR);
            var newTable = new DataTable("COMPANY");
            dadapterFR.Fill(newTable);
            dataTable.Merge(newTable);
        }

        return dataTable.DefaultView;
    });
}
However, because of my lack of knowledge I am not sure which method I should use in C# to store all the data (the sheet3 equivalent; there are more than 1000 records) before exporting it to another system with a foreach loop. Should I create a local database, or can I use a list? I guess I shouldn't import all the values from the two tables into two separate lists/database tables and compare/merge them into a third one afterwards (as I do in my current Excel VBA setup), but instead perform the merge while reading from the connections and assign the result directly to a list/database table?
However, because of my lack of knowledge I am not sure which method I should use in C# to store all the data (the sheet3 equivalent; there are more than 1000 records) before exporting it to another system with a foreach loop.
"What should I use" is subjective/opinion-based and can't really be answered here. It would be fair to say that anything you can find a reasonable tutorial on, such as generic collections or DataSets/DataTables, will easily handle a data set this small.
Should I create a local database, or can I use a list?
You can do either, but I would only use a database if I were persisting the information. It sounds to me like your need for data storage is temporary, so I would just use an in-memory collection of data.
I guess I shouldn't import all the values from the two tables into two separate lists/database tables and compare/merge them into a third one afterwards
I can't see any reason why you shouldn't.
but instead perform the merge while reading from the connections and assign the result directly to a list/database table?
It is certainly possible to merge two sets of data by uploading one of them into the database where the other one lives and having the database do the merge.
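If you go the in-memory route, a rough sketch of the "sheet3 in memory" idea could look like the following. The Customer class, the choice of POSTN as the Postanumber column, and the helper method are assumptions for illustration; adapt them to your real columns.
// Requires: using System.Collections.Generic; using System.Data.Odbc; using System.Linq;
public class Customer
{
    public string Name { get; set; }
    public string PostaNumber { get; set; }
    // ... any other COMPANY columns needed for the export
}

private static List<Customer> ReadCustomers(string connectionString, string queryString)
{
    var customers = new List<Customer>();
    using (var connection = new OdbcConnection(connectionString))
    using (var command = new OdbcCommand(queryString, connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                customers.Add(new Customer
                {
                    Name = reader["NAME"].ToString(),
                    PostaNumber = reader["POSTN"].ToString()
                });
            }
        }
    }
    return customers;
}

// Merge both sources and keep one record per (Name, Postanumber) pair:
List<Customer> merged = ReadCustomers(connectionStringDE, queryStringDE)
    .Concat(ReadCustomers(connectionStringFR, queryStringFR))
    .GroupBy(c => new { c.Name, c.PostaNumber })
    .Select(g => g.First())
    .ToList();

foreach (Customer customer in merged)
{
    // export the customer to the other system here
}
A List<Customer> of a few thousand items is trivial to hold in memory, and the GroupBy replaces the duplicate check you currently do in VBA.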
I'm attempting to access a basic table from a local database in my Windows Forms project. I seem to have created the database correctly and imported the dataset as expected, since it displays in my Solution Explorer.
However, I am stuck on how to actually access the data within my database. I've attempted many different solutions but I cannot seem to get anywhere.
Here is what I've accomplished so far:
(Solution Explorer screenshot)
But I cannot figure out how to make queries to the dataset, whether it's selecting, updating, or deleting rows.
The closest I've come to getting the data is from the code here:
InventoryDatabaseDataSet context = new InventoryDatabaseDataSet();
var inv = context.Tables["Inventory"];
var all = inv.Select();
But it doesn't return any seemingly valid data.
How do I go about making queries to my dataset? I understand Linq is the common method of making queries, but I don't understand how to get to the point of being able to do such a thing.
Any help would be greatly appreciated, thanks!
The DataSet item in the Solution Explorer represents a number of types. Just as String, Int32 and Form are types, so InventoryDatabaseDataSet is a type. Just like other types, you create an instance in code and then use it.
You will also find that those types that are components have been added to your Toolbox and can be added to forms in the designer, just like other controls and components.
You can also drag items from the Data Sources window and have the designer generate appropriate objects and code. For instance, if you drag a table from the Data Sources window to a form, it will generate a DataSet to store the data, a table adapter to retrieve data from the database and save changes back, a DataGridView to display the data, a BindingSource to link between the DataTable and the DataGridView and a BindingNavigator to navigate the data.
If you do use the designer then you'll see code generated to retrieve data by calling Fill on the table adapter and save changes by calling Update. If you want to do it in code yourself then you can do something like this to retrieve:
var data = new InventoryDatabaseDataSet();

using (var adapter = new InventoryTableAdapter())
{
    // Generated table adapters fill the typed table rather than the whole DataSet
    adapter.Fill(data.Inventory);
}

this.InventoryBindingSource.DataSource = data.Inventory;
this.InventoryDataGridView.DataSource = this.InventoryBindingSource;
and this to save:
this.InventoryBindingSource.EndEdit();

var data = (InventoryDatabaseDataSet.InventoryDataTable) this.InventoryBindingSource.DataSource;

using (var adapter = new InventoryTableAdapter())
{
    adapter.Update(data);
}
After the data has been stored in a DataSet, the DataSet is read through a DataTable object.
Similarly, a DataRow object is used to read a row of a table.
The following is sample code:
// da is a SqlDataAdapter and conn is a SqlConnection to your database
InventoryDatabaseDataSet ds = new InventoryDatabaseDataSet();
SqlDataAdapter da = new SqlDataAdapter();
da.SelectCommand = new SqlCommand("SELECT * FROM FooTable", conn);
da.Fill(ds, "FooTable");

DataTable dt = ds.Tables["FooTable"];
foreach (DataRow dr in dt.Rows)
{
    MessageBox.Show(dr["Column1"].ToString());
}
I am using this code to insert 1 million records into an empty table in the database. OK, so without much code, I will start from the point where I have already interacted with the data and read the schema into a DataTable:
So:
DataTable returnedDtViaLocalDbV11 = DtSqlLocalDb.GetDtViaConName(strConnName, queryStr, strReturnedDtName);
And now that we have returnedDtViaLocalDbV11, let's create a new DataTable to be a clone of the source database table:
DataTable NewDtForBlkInsert = returnedDtViaLocalDbV11.Clone();

Stopwatch SwSqlMdfLocalDb11 = Stopwatch.StartNew();

NewDtForBlkInsert.BeginLoadData();
for (int i = 0; i < 1000000; i++)
{
    NewDtForBlkInsert.LoadDataRow(new object[] { null, "NewShipperCompanyName" + i.ToString(), "NewShipperPhone" }, false);
}
NewDtForBlkInsert.EndLoadData();

DBRCL_SET.UpdateDBWithNewDtUsingSQLBulkCopy(NewDtForBlkInsert, tblClients._TblName, strConnName);

SwSqlMdfLocalDb11.Stop();
var ResSqlMdfLocalDbv11_0 = SwSqlMdfLocalDb11.ElapsedMilliseconds;
This code populates 1 million records into an embedded SQL database (LocalDB) in 5200 ms. The rest of the code just implements the bulk copy, but I will post it anyway.
public string UpdateDBWithNewDtUsingSQLBulkCopy(DataTable TheLocalDtToPush, string TheOnlineSQLTableName, string WebConfigConName)
{
    //Open a connection to the database.
    using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings[WebConfigConName].ConnectionString))
    {
        connection.Open();

        // Perform an initial count on the destination table.
        SqlCommand commandRowCount = new SqlCommand("SELECT COUNT(*) FROM " + TheOnlineSQLTableName + ";", connection);
        long countStart = System.Convert.ToInt32(commandRowCount.ExecuteScalar());

        var nl = "\r\n";
        string retStrReport = "";
        retStrReport = string.Concat(string.Format("Starting row count = {0}", countStart), nl);
        retStrReport += string.Concat("==================================================", nl);

        // Create a table with some rows.
        //DataTable newCustomers = TheLocalDtToPush;

        // Create the SqlBulkCopy object.
        // Note that the column positions in the source DataTable
        // match the column positions in the destination table so
        // there is no need to map columns.
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = TheOnlineSQLTableName;
            try
            {
                // Write from the source to the destination.
                for (int colIndex = 0; colIndex < TheLocalDtToPush.Columns.Count; colIndex++)
                {
                    bulkCopy.ColumnMappings.Add(colIndex, colIndex);
                }
                bulkCopy.WriteToServer(TheLocalDtToPush);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        // Perform a final count on the destination
        // table to see how many rows were added.
        long countEnd = System.Convert.ToInt32(commandRowCount.ExecuteScalar());

        retStrReport += string.Concat("Ending row count = ", countEnd, nl);
        retStrReport += string.Concat("==================================================", nl);
        retStrReport += string.Concat((countEnd - countStart), " rows were added.", nl);
        retStrReport += string.Concat("New Customers Was updated successfully", nl, "END OF PROCESS !");

        //Console.ReadLine();
        return retStrReport;
    }
}
Trying it via a connection to SQL Server took around 7000 ms at best and ~7700 ms on average. A random key-value NoSQL database took around 40 seconds (really, I did not even keep records of it, as it was more than twice the SQL variants). So... is there a faster way than what I was testing in my code?
Edit
I am using Win7 x64 with 8 GB RAM, and most importantly, I should think, the CPU (an i5 at 3 GHz) is not so great by now.
The 3x 500 GB WD drives in RAID-0 do the job even better.
I am just saying: if you check it on your own PC, compare it to any other method in your configuration.
Have you tried SSIS? I have never written an SSIS package with a LocalDB connection, but this is the sort of activity SSIS should be well suited for.
If your data source is a SQL Server, another idea would be setting up a linked server. I'm not sure whether this would work with LocalDB. If you can set up a linked server, you could bypass the C# altogether and load your data with an INSERT ... SELECT ... FROM ... SQL statement.
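For what it's worth, if you did want to kick such a set-based copy off from C#, a minimal sketch might look like this; the linked server name, the database/table names, and destConnectionString are all placeholders, not part of the original answer.
// Requires: using System.Data.SqlClient;
using (var connection = new SqlConnection(destConnectionString))
using (var command = new SqlCommand(
    @"INSERT INTO dbo.Shippers (CompanyName, Phone)
      SELECT CompanyName, Phone
      FROM [LINKEDSRV].[SourceDb].[dbo].[Shippers];", connection))
{
    command.CommandTimeout = 600;   // a set-based copy of a million rows can take a while
    connection.Open();
    int rowsCopied = command.ExecuteNonQuery();
}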
You can use Dapper.NET.
Dapper is a micro-ORM; it executes a query and maps the results to a strongly typed List.
Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a “virtual object database” that can be used from within the programming language
For more info:
check out https://code.google.com/p/dapper-dot-net/
GitHub Repository: https://github.com/SamSaffron/dapper-dot-net
Hope it helps.
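As a rough, hedged sketch of what that looks like (the Shipper class, table name, and connection string below are assumptions, not from the original answer):
// Requires: using System.Collections.Generic; using System.Data.SqlClient; using System.Linq; using Dapper;
public class Shipper
{
    public int ShipperId { get; set; }
    public string CompanyName { get; set; }
    public string Phone { get; set; }
}

public static List<Shipper> LoadShippers(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        // Dapper's Query<T> runs the SQL and maps each row to a Shipper by column name.
        return connection.Query<Shipper>(
            "SELECT ShipperId, CompanyName, Phone FROM dbo.Shippers").ToList();
    }
}
Note that Dapper mainly buys you mapping and readability; for inserting a million rows, SqlBulkCopy will still be the faster path.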
Remove the looping... In SQL, try to make a table with 1 million rows directly and LEFT JOIN it; use this for an INSERT/SELECT of the data.
Try sending it without storing it in a DataTable.
See the example at the end of this post, which allows you to do it with an enumerator: http://www.developerfusion.com/article/122498/using-sqlbulkcopy-for-high-performance-inserts/
If you are just creating nonsense data, create a stored procedure and just call that through .NET.
If you are passing real data, again, passing it to a stored proc would be quicker, but you would be best off dropping the table and recreating it with the data.
If you insert one row at a time, it will take longer than inserting it all at once. It will take even longer if you have indexes to write.
Create a single XML document for all the rows you want to save into the database. Pass this XML to a SQL stored procedure and save all the records in one call.
Your stored procedure must be written so that it can read all the rows and then insert them into the table.
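A hedged sketch of the C# side of that idea, reusing the questioner's NewDtForBlkInsert DataTable; the stored procedure name (dbo.ImportShippersXml), its @xml parameter, and connectionString are assumptions, and the procedure itself would shred the XML (for example with nodes()/value()) and insert the rows.
// Requires: using System.Data; using System.Data.SqlClient; using System.IO;
string xml;
using (var writer = new StringWriter())
{
    NewDtForBlkInsert.WriteXml(writer);   // one XML document containing every row of the DataTable
    xml = writer.ToString();
}

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportShippersXml", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@xml", SqlDbType.Xml).Value = xml;
    connection.Open();
    command.ExecuteNonQuery();            // one round trip for all rows
}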
If this is a new project, I recommend you use Entity Framework. In this case you can create a List<> of an object with all the data you need and then simply add it entirely to the corresponding table.
This way you quickly get the needed data and then send it to the database all at once.
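A minimal sketch of that approach, assuming EF6 and made-up ShippersContext/Shipper types that map to the destination table:
// Requires: using System.Collections.Generic; using System.Data.Entity;
public class Shipper
{
    public int Id { get; set; }
    public string CompanyName { get; set; }
    public string Phone { get; set; }
}

public class ShippersContext : DbContext
{
    public DbSet<Shipper> Shippers { get; set; }
}

public static void SaveShippers(List<Shipper> shippers)
{
    using (var context = new ShippersContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false; // speeds up large Add operations
        context.Shippers.AddRange(shippers);                    // EF6; on older versions, Add in a loop
        context.SaveChanges();                                  // one call pushes the whole list
    }
}
Keep in mind that EF6 still issues one INSERT per entity under the hood, so for a million rows SqlBulkCopy will remain considerably faster.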
I agree with Mike on SSIS, but it may not suit your environment; however, for ETL processes that involve cross-server calls and general data flow processes, it is a great built-in tool and highly integrated.
With 1 million rows you will likely have to do a bulk insert. Depending on the row size, you would not really be able to use a stored procedure unless you did this in batches. A DataTable will fill memory pretty quickly, again depending on the row size. You could make a stored procedure that takes a table type and call it every X number of rows, but why would we do this when you already have a better, more scalable solution? That million rows could be 50 million next year.
I have used SSIS a bit, and if it is an organizational fit I would suggest looking at it; but it wouldn't be a one-time answer, and it wouldn't be worth the dependencies.
C# with .NET 2.0 and a SQL Server 2005 DB backend.
I've got a bunch of XML files which contain data along the lines of the following; the structure varies a little but is more or less as follows:
<TankAdvisory>
  <WarningType name="Tank Overflow">
    <ValidIn>All current tanks</ValidIn>
    <Warning>Tank is close to capacity</Warning>
    <IssueTime Issue-time="2011-02-11T10:00:00" />
    <ValidFrom ValidFrom-time="2011-01-11T13:00:00" />
    <ValidTo ValidTo-time="2011-01-11T14:00:00" />
  </WarningType>
</TankAdvisory>
I have a single DB table that has all the above fields ready to be filled.
When I use the following method of reading the data from the XML file:
DataSet reportData = new DataSet();
reportData.ReadXml("../File.xml");
It successfully populates the Dataset but with multiple tables. So when I come to use SQLBulkCopy I can either save just one table this way:
sbc.WriteToServer(reportData.Tables[0]);
Or, if I loop through all the tables in the DataSet and add each of them, it adds a new row in the database for each table, when in actuality they should all be stored in a single row.
Then of course there's also the issue of column mappings. I'm thinking that maybe SqlBulkCopy is the wrong way of doing this.
What I need to do is find a quick way of getting the data from that XML file into the Database under the relevant columns in the DB.
OK, so the original question is a little old, but I have just come across a way to resolve this issue.
All you need to do is loop through all the DataTables in your DataSet and add their columns to the one DataTable that has all the columns of the table in your DB, like so...
DataTable dataTable = reportData.Tables[0];

//Second DataTable
DataTable dtSecond = reportData.Tables[1];

foreach (DataColumn myCol in dtSecond.Columns)
{
    sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
    dataTable.Columns.Add(myCol.ColumnName);
    dataTable.Rows[0][myCol.ColumnName] = dtSecond.Rows[0][myCol];
}

//Finally Perform the BulkCopy
sbc.WriteToServer(dataTable);
// If the second DataTable contains more than one row, copy every row across instead:
foreach (DataColumn myCol in dtSecond.Columns)
{
    dataTable.Columns.Add(myCol.ColumnName);
    for (int intRowcnt = 0; intRowcnt <= dtSecond.Rows.Count - 1; intRowcnt++)
    {
        dataTable.Rows[intRowcnt][myCol.ColumnName] = dtSecond.Rows[intRowcnt][myCol];
    }
}
SqlBulkCopy is for many inserts. It's perfect for those cases when you would otherwise generate a lot of INSERT statements and juggle the limit on the total number of parameters per batch. The thing about the SqlBulkCopy class, though, is that it's cranky: unless you fully specify all column mappings for the data set, it will throw an exception.
I'm assuming that your data is quite manageable, since you're reading it into a DataSet. If you were to have even larger data sets, you could lift chunks into memory and then flush them to the database piece by piece. But if everything fits in one go, it's as simple as that.
SqlBulkCopy is the fastest way to put data into the database. Just set up column mappings for all the columns, otherwise it won't work.
Why reinvent the wheel? Use SSIS. Read with an XML Source, transform with one of the many Transformations, then load it with an OLE DB Destination into the SQL Server table. You will never beat SSIS in terms of runtime, speed to deploy the solution, maintenance, error handling, etc.
I'm building an offline C# application that will import data from spreadsheets and store it in a SQL database that I have created (inside the project). Through some research I have been able to use some code that can import a static table into a database whose layout exactly matches the columns in the worksheet.
What I'm looking to do is have specific columns go to their correct tables based on name. This way I have the database designed correctly rather than just one giant table that stores everything.
Below is the code I'm using to import a few static fields into one table; I want to be able to split the imported data into more than one table.
What is the best way to do this?
public partial class Form1 : Form
{
    string strConnection = ConfigurationManager.ConnectionStrings
        ["Test3.Properties.Settings.Test3ConnectionString"].ConnectionString;

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        //Create connection string to Excel work book
        string excelConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;
            Data Source=C:\Test.xls;
            Extended Properties=""Excel 8.0;HDR=YES;""";

        //Create Connection to Excel work book
        OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);

        //Create OleDbCommand to fetch data from Excel
        OleDbCommand cmd = new OleDbCommand
            ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

        excelConnection.Open();
        OleDbDataReader dReader;
        dReader = cmd.ExecuteReader();

        SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
        sqlBulk.DestinationTableName = "Failures";
        sqlBulk.WriteToServer(dReader);
}
You can try an ETL (extract-transform-load) architecture:
Extract: One class will open the file and get all the data in chunks you know how to work with (usually you take a single row from the file and parse its data into a POCO object containing fields that hold pertinent data), and put those into a Queue that other work processes can take from. In this case, maybe the first thing you do is have Excel open the file and re-save it as a CSV, so you can reopen it as basic text in your process and chop it up efficiently. You can also read the column names and build a "mapping dictionary"; this column is named that, so it goes to this property of the data object. This process should happen as fast as possible, and the only reason it should fail is because the format of a row doesn't match what you're looking for given the structure of the file.
Transform: Once the file's contents have been extracted into an instance of a basic row, perform any validation, calculations or other business rules necessary to turn a row from the file into a set of domain objects that conform to your domain model. This process can be as complex as you need it to be, but again it should be as straightforward as you can make it while obeying all the business rules given in your requirements.
Load: Now you've got an object graph in your own domain objects, you can use the same persistence framework you'd call to handle domain objects created any other way. This could be basic ADO, an ORM like NHibernate or MSEF, or an Active Record pattern where objects know how to persist themselves. It's no bulk load, but it saves you having to implement a completely different persistence model just to get file-based data into the DB.
An ETL workflow can help you separate the repetitive tasks into simple units of work, and from there you can identify the tasks that take a lot of time and consider parallel processes.
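To make the three stages concrete, here is a rough, hedged sketch; the Failure class, the CSV path, and SaveFailure are made-up names for illustration, not part of the answer above.
// Requires: using System.Collections.Generic; using System.IO;
public class Failure                       // Transform output: a domain object ready to persist
{
    public int FailureId { get; set; }
    public string FailureName { get; set; }
}

// Extract: parse each CSV line (the sheet re-saved as text) into raw fields and queue them.
var queue = new Queue<string[]>();
string[] lines = File.ReadAllLines(@"C:\Failures.csv");
for (int i = 1; i < lines.Length; i++)     // start at 1 to skip the header row
{
    queue.Enqueue(lines[i].Split(','));
}

// Transform + Load: apply validation/business rules, build the domain object, persist it.
while (queue.Count > 0)
{
    string[] fields = queue.Dequeue();
    if (fields.Length < 2)                 // example validation rule
    {
        continue;
    }
    var failure = new Failure
    {
        FailureId = int.Parse(fields[0]),
        FailureName = fields[1]
    };
    SaveFailure(failure);                  // whatever persistence code you already have
}
In a real ETL setup each stage would live in its own class, and the queue would be a thread-safe collection (for example BlockingCollection<T>) so the stages can run in parallel.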
Alternately, you can take the file and massage its format by detecting columns you want to work with, and arranging them into a format that matches your bulk input spec, before calling a bulk insert routine to process the data. This file processor routine can do anything you want it to, including separating data into several files. However, it's one big process that works on a whole file at a time and has limited opportunities for optimization or parallel processing. However, if your loading mechanism is slow, or you've got a LOT of data that is simple to digest, it may end up faster than even a well-designed ETL.
In any case, I would get away from an Office format and into a plain-text (or XML) format as soon as I possibly could, and I would DEFINITELY avoid having to install Office on a server. If there is ANY way you can require the files be in some easily-parseable format like CSV BEFORE they're loaded, so much the better. Having an Office installation on a server is a Really Bad Thing in general, and OLE operations in a server app is not much better. The app will be very brittle, and anything Office wants to tell you will cause the app to hang until you log onto the server and clear the dialog box.
If you were looking for a more code-related answer, you could use the following to modify your code to work with different column names / different tables:
private void button1_Click(object sender, EventArgs e)
{
    //Create connection string to Excel work book
    string excelConnectionString =
        @"Provider=Microsoft.Jet.OLEDB.4.0;
        Data Source=C:\Test.xls;
        Extended Properties=""Excel 8.0;HDR=YES;""";

    //Create Connection to Excel work book
    OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);

    //Create OleDbCommand to fetch data from Excel
    OleDbCommand cmd = new OleDbCommand
        ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

    excelConnection.Open();

    DataTable dataTable = new DataTable();
    dataTable.Columns.Add("Id", typeof(System.Int32));
    dataTable.Columns.Add("Name", typeof(System.String));
    // TODO: Complete other table columns

    using (OleDbDataReader dReader = cmd.ExecuteReader())
    {
        // Copy every row returned from Excel into the DataTable
        while (dReader.Read())
        {
            DataRow dataRow = dataTable.NewRow();
            dataRow["Id"] = dReader.GetInt32(0);
            dataRow["Name"] = dReader.GetString(1);
            // TODO: Complete other table columns
            dataTable.Rows.Add(dataRow);
        }
    }

    SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
    sqlBulk.DestinationTableName = "Failures";
    sqlBulk.WriteToServer(dataTable);
}
Now you can control the names of the columns and which tables the data gets imported into. SqlBulkCopy is good for inserting large amounts of data. If you only have a small number of rows, you might be better off creating a standard data access layer to insert your records.
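For the small-volume case, a minimal sketch of such a data access call might look like this; it reuses the strConnection and dataTable names from the code above, and the Failures columns (Id, Name) are assumptions.
// Requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(strConnection))
using (var command = new SqlCommand(
    "INSERT INTO Failures (Id, Name) VALUES (@Id, @Name)", connection))
{
    command.Parameters.Add("@Id", SqlDbType.Int);
    command.Parameters.Add("@Name", SqlDbType.NVarChar, 100);
    connection.Open();
    foreach (DataRow row in dataTable.Rows)
    {
        command.Parameters["@Id"].Value = row["Id"];
        command.Parameters["@Name"].Value = row["Name"];
        command.ExecuteNonQuery();   // one round trip per row; fine for small volumes
    }
}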
If you are only interested in the text (not the formatting etc.), you can alternatively save the Excel file as a CSV file and parse the CSV file instead; it's simple.
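A minimal parsing sketch, assuming the sheet was saved as C:\Test.csv with a header row and no quoted or embedded commas (use a proper CSV parser otherwise):
// Requires: using System.IO;
string[] lines = File.ReadAllLines(@"C:\Test.csv");
for (int i = 1; i < lines.Length; i++)              // start at 1 to skip the header row
{
    string[] fields = lines[i].Split(',');
    string failureId = fields[0];
    string failureName = fields[1];
    // ... route each field to whichever destination table it belongs in
}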
Depending on the lifetime of the program, I would recommend one of two options.
If the program is to be short lived in use, or generally a "throw away" project, I would recommend a series of routines which parse and input data into another set of tables using standard SQL with some string processing as needed.
If the program will stick around longer and/or find more use on a day-to-day basis, I would recommend implementing a solution similar to the one recommended by @KeithS. With a set of well-defined steps for working with the data, much flexibility is gained. More specifically, the .NET Entity Framework would probably be a great fit.
As a bonus, if you're not already well versed in this area, you might find you learn a great deal about working with data between boundaries (xls -> sql -> etc.) during your first stint with an ORM such as EF.
I'm working on a Silverlight project trying to access a database using LINQ To DataSet and then sending data over to Silverlight via .ASMX web service.
I've defined my DataSet using the Server Explorer tool (dragging and dropping all the different tables that I'm interested in). The DataSet is able to access the server and database with no issues.
Below is code from one of my Web Methods:
public List<ClassSpecification> getSpecifications()
{
    DataSet2TableAdapters.SpecificationTableAdapter Sta = new DataSet2TableAdapters.SpecificationTableAdapter();

    return (from Spec in Sta.GetData().AsEnumerable()
            select new ClassSpecification()
            {
                Specification = Spec.Field<String>("Specification"),
                SpecificationType = Spec.Field<string>("SpecificationType"),
                StatusChange = Spec.Field<DateTime>("StatusChange"),
                Spec = Spec.Field<int>("Spec")
            }).ToList<ClassSpecification>();
}
I created a "ClassSpecification" data class which is going to contain my data, and it has all the table fields as properties.
My question is: is there a quicker way of doing the assignment than what is shown here? There are actually about 10 more fields, and I would imagine that since my DataSet knows my table definition, there would be a quicker way than going field by field. I tried just "select new ClassSpecification()).ToList
Any help would be greatly appreciated.
First, I'll recommend testing out different possibilities using LINQPad, which is free and awesome.
I can't quite remember what you can do from the table adapter, but you should be able to use the DataSet to get at the data you want, e.g.
string spec = myDataSet.MyTable[0]    // or FindBy... or however you are choosing a row
    .Specification;
So you might be able to do
foreach (var row in myDataSet.MyTable)   // typed tables enumerate their typed rows
{
    string spec = row.Specification;
    ...
}
Or
return (from row in myDataSet.Specification
        select new ClassSpecification()
        {
            Specification = row.Specification,
            SpecificationType = row.SpecificationType,
            StatusChange = row.StatusChange,
            Spec = row.Spec,
        }).ToList<ClassSpecification>();
Or even
return myDataSet.Specification.Cast<ClassSpecification>()
Not sure if the last one will work, but you can see that there are several ways to get what you want. Also, in my tests the row is strongly typed, so you shouldn't need to create a new class in which to put the data - you should just be able to use the existing "SpecificationRow" class. (In fact, I believe that this is the anemic domain model anti-pattern.)
So are you using a DataSet for lack of a LINQ provider to the database? That is the only reason I would consider this move. For instance, with LINQ to SQL you can drag the table out and drop it; then you have instant objects with the shape you want.
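A hedged sketch of what that buys you; SpecificationsDataContext and the Specification entity are the sort of names the LINQ to SQL designer might generate after dragging the Specification table onto a .dbml file, so treat them as assumptions:
// Requires: using System.Collections.Generic; using System.Linq;
public List<Specification> getSpecifications()
{
    using (var db = new SpecificationsDataContext())
    {
        // The generated Specification entity already has one property per column,
        // so there is no field-by-field copy into a hand-written class.
        return db.Specifications.ToList();
    }
}
Whether the generated entity serializes cleanly over your .asmx service is worth checking; if not, you are back to projecting into a DTO, but the query itself stays this short.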