I have a DataSet that looks like the image. I'm trying to filter by table, get all the columns next to it, and compare them against other DataSets.
This DataSet has tables named table 1 and table 2, and when they're selected they look like the picture below. It shows the columns, and I need to compare those columns against the rows from the matching table in the first DataSet.
I've looked at DataView, but that would be a lot of work and I'm very inexperienced. I'm trying to find a way to implement a foreach loop that gets the name of each table in the first DataSet and then compares its rows against the columns inside the DataTable in the second DataSet that matches the table name from the first DataSet.
Without knowing more about these DataSets (like whether they have primary keys, the data types of the columns, the number of rows in each table, etc.), I can only provide limited help. The following example tries to be as general as possible and avoid some basic problems:
DataSet ds1 = <<fetch dataset1>>;
DataSet ds2 = <<fetch dataset2>>;

foreach (DataTable tbl1 in ds1.Tables)
{
    if (ds2.Tables.Contains(tbl1.TableName))
    {
        DataTable tbl2 = ds2.Tables[tbl1.TableName];

        // Compare only the columns the two tables have in common
        List<string> commonColumnNames = tbl1.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Intersect(tbl2.Columns.Cast<DataColumn>().Select(c => c.ColumnName))
            .ToList();

        // Compare only as many rows as the smaller table has
        int maxRows = Math.Min(tbl1.Rows.Count, tbl2.Rows.Count);
        for (int r = 0; r < maxRows; r++)
        {
            foreach (string colName in commonColumnNames)
            {
                // Use Equals for value comparison; != on object compares references
                if (!Equals(tbl1.Rows[r][colName], tbl2.Rows[r][colName]))
                {
                    // Different value
                }
            }
        }
    }
}
Update 1: I've added comments to the following example to explain step by step what this code is doing. As I tried to say above, since I didn't know much about your data, I had to put in extra code. This extra code handles things like: 'Does the table ABC exist in both DataSets?', 'Do the two tables have the same columns in them?', 'Do the tables have the same number of rows in them?'. Your original question did not have this information, so I made this code a little more robust to handle those unknowns.
DataSet ds1 = <<fetch dataset1>>;
DataSet ds2 = <<fetch dataset2>>;

// Loop through all of the tables in the 1st DataSet
foreach (DataTable tbl1 in ds1.Tables)
{
    // If the 2nd DataSet has a table with the same name as the one from the 1st DataSet
    if (ds2.Tables.Contains(tbl1.TableName))
    {
        DataTable tbl2 = ds2.Tables[tbl1.TableName];

        // Create a list of column names that the two tables have in common.
        // We will only compare the values in this set of matching column names.
        List<string> commonColumnNames = tbl1.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Intersect(tbl2.Columns.Cast<DataColumn>().Select(c => c.ColumnName))
            .ToList();

        // Before we start comparing the rows in the two tables, find out which one has fewer rows.
        int maxRows = Math.Min(tbl1.Rows.Count, tbl2.Rows.Count);

        // If the tables have a different number of rows, we only compare rows 0 to MinRowCount - 1
        for (int r = 0; r < maxRows; r++)
        {
            // For each row, compare the values of the common columns.
            // Equals is used instead of != because the indexer returns object,
            // and != on object compares references, not values.
            foreach (string colName in commonColumnNames)
            {
                if (!Equals(tbl1.Rows[r][colName], tbl2.Rows[r][colName]))
                {
                    // Different value
                }
            }
        }
    }
}
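What goes in the // Different value branch depends on what you need; as one hedged possibility (the tuple list is my own invention, not part of the question), you could record each mismatch for later review:

// Declared before the loops; a hypothetical container for mismatches
var differences = new List<(string Table, int Row, string Column, object Value1, object Value2)>();

// ...then, in place of "// Different value":
differences.Add((tbl1.TableName, r, colName,
    tbl1.Rows[r][colName], tbl2.Rows[r][colName]));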
I'm trying to prevent inserting duplicate data into an MS Access table, like below.
The MS Access table (Record) has columns Situation and Check_Item, and starts with no data in it.
A DataTable from a DataSet is filled with the query "SELECT * FROM Record WHERE Situation = 'A'".
Then I try to do this:
DataRow[] rows = dataTable.Select("Check_Item = '" + InputTextBox.Text + "'");
if (rows.Length == 0)
{
    // Use OleDbCommand to insert the InputTextBox.Text string into Check_Item of the Record table.
}
Result:
The first time I key in a value (e.g. 123456), since there is no data in the table, 123456 is inserted into the Record table.
But the second time I key in 123456, it still gets inserted into the Record table.
What is happening in this process?
Assuming your DataTable variable is named table, and that you created it and linked it correctly to your MS Access database, then:
DataRow[] rows = table.Select("Check_Item = '" + InputTextBox.Text + "'"); // Selects all rows where the condition is met.
if (rows.Length == 0)
{
    // No row met the condition, so create a new one and add it to the table.
    DataRow newRow = table.NewRow(); // Creates a new row with the same schema (columns) as the table.
    newRow["Check_Item"] = InputTextBox.Text; // Assigns the value.
    table.Rows.Add(newRow); // Adds the new row to the row collection of the table.
    table.AcceptChanges(); // Commits the changes within the DataTable. Note this by itself does not write to the database.
}
Be sure to check this and also the official docs for DataTable.
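The insert itself isn't shown in the question; as a rough sketch (the connection variable and SQL text are assumptions, not from the original post), a parameterized OleDbCommand avoids both the string concatenation and SQL injection issues:

// Sketch only: "connection" is an already-open OleDbConnection (an assumption).
using (var cmd = new OleDbCommand(
    "INSERT INTO Record (Situation, Check_Item) VALUES ('A', ?)", connection))
{
    // OleDb uses positional parameters, marked with ?
    cmd.Parameters.AddWithValue("?", InputTextBox.Text);
    cmd.ExecuteNonQuery();
}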
I have an existing datatable called _longDataTable containing data. Now I want to duplicate each row, and in each duplicate set only the value in the SheetCode column, according to a value from a different datatable called values. For example, if the values datatable contains 1, 2 and 3, then I want each row of _longDataTable to be duplicated three times, and in each of the duplicated rows I want the SheetCode column to have the values 1, 2 and 3 respectively. My code now looks like this:
foreach (DataRow sheets in _longDataTable.Rows)
{
    for (int k = 0; k < number_of_sheets; k++)
    {
        var newRowSheets = _longDataTable.NewRow();
        newRowSheets.ItemArray = sheets.ItemArray;
        newRowSheets["SheetCode"] = values.Rows[k]["Sheet Code"];
        // add edited row to long datatable
        _longDataTable.Rows.Add(newRowSheets);
    }
}
However, I get the following error:
Collection was modified; enumeration operation might not execute.
Does anyone know where this error comes from and how to solve my problem?
You get the enumeration error because you are iterating over a collection that is changing inside the loop (new rows are added to it).
As you said in the comment, you then get an out-of-memory exception because you are iterating over _longDataTable while adding rows to it, so the iteration never reaches the end.
I assume this can help you:
// assume _longDataTable has two columns: Column1 and SheetCode
var duplicatedData = new DataTable();
duplicatedData.Columns.Add("Column1");
duplicatedData.Columns.Add("SheetCode");

foreach (DataRow sheets in _longDataTable.Rows)
{
    for (int k = 0; k < number_of_sheets; k++)
    {
        var newRowSheets = duplicatedData.NewRow();
        newRowSheets.ItemArray = sheets.ItemArray;
        newRowSheets["SheetCode"] = values.Rows[k]["Sheet Code"];
        newRowSheets["Column1"] = "anything";
        // add the edited row to the temp table, not to _longDataTable
        duplicatedData.Rows.Add(newRowSheets);
    }
}
_longDataTable.Merge(duplicatedData);
Do not modify _longDataTable while iterating over it; add rows to a temp table (with the same schema) and merge the two data tables after the iteration.
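An alternative worth noting (my addition, not part of the original answer) is to iterate over a snapshot of the rows, which makes it safe to add to _longDataTable directly:

// DataTable.Select() with no arguments returns a DataRow[] snapshot,
// so adding rows to _longDataTable inside the loop does not disturb the iteration.
foreach (DataRow sheets in _longDataTable.Select())
{
    for (int k = 0; k < number_of_sheets; k++)
    {
        DataRow newRowSheets = _longDataTable.NewRow();
        newRowSheets.ItemArray = sheets.ItemArray;
        newRowSheets["SheetCode"] = values.Rows[k]["Sheet Code"];
        _longDataTable.Rows.Add(newRowSheets);
    }
}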
I have a .csv file with around 200 columns and the order of columns changes all the time. I want to read each row from the file, identify the corresponding column names in the database, and write data to the table accordingly.
For this I can use a simple switch case checking for the name of the column. Since there are 200 columns, I'm wondering if there is any other way to do it.
Example:
public void ColName(string str, object a)
{
    SampleTableName obj = new SampleTableName();
    obj."str" = a; // pseudocode: set the property whose name is held in str
    connection.AddSampleTableName(obj);
    connection.SaveChanges();
}

/* SampleTableName has columns: [Name, Age] */
ColName("Name", "XYZ");
Output:
Name Age
XYZ NULL
Any ideas please? Thanks.
If the column names are the same you can use SqlBulkCopy and add a list of column mappings. The order doesn't matter, as long as the destination table name is set.
DataTable table = CreateTable(rows);

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    // Map each source column to the destination column with the same name
    foreach (var col in table.Columns.OfType<DataColumn>())
    {
        bulkCopy.ColumnMappings.Add(
            new SqlBulkCopyColumnMapping(col.ColumnName, col.ColumnName));
    }

    bulkCopy.BulkCopyTimeout = 600; // in seconds
    bulkCopy.DestinationTableName = "<tableName>";
    bulkCopy.WriteToServer(table);
}
If the column names are not the same, a dictionary to look up the different names could be used.
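As a sketch of that dictionary idea (the mapping entries are made-up examples, not from the answer):

// Hypothetical CSV-name -> database-column-name lookup; fill in your real mappings.
var columnNameMap = new Dictionary<string, string>
{
    { "csv_name", "Name" },
    { "csv_age",  "Age"  },
};

foreach (var col in table.Columns.OfType<DataColumn>())
{
    // Fall back to the same name when no mapping exists
    string destName = columnNameMap.TryGetValue(col.ColumnName, out var mapped)
        ? mapped
        : col.ColumnName;
    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(col.ColumnName, destName));
}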
To keep it simple for maintenance purposes, I went with a switch case, sigh. However, I wrote a small script to add all those field values to the table object.
I have two datatables in my ASP.NET application that are filled from CSV files, and I am trying to combine the two into one.
Here's what the interface looks like:
When I click the 'Merge Data' button it should merge test1.csv and test2.csv, which kind of works, but the result looks like this:
So my question is how do I align these two datatables so that all the data is on the same row?
Below is the code for the Merge Data Button:
List<string> filepaths = new List<string>();
List<DataTable> allTables = new List<DataTable>();
DataTable mergedTables = new DataTable();

int rowCount = grdFiles.Rows.Count;
for (int i = 0; i < rowCount; i++)
{
    string filename = grdFiles.Rows[i].Cells[0].Text;
    filepaths.Add(Server.MapPath("~/Uploads/") + filename);
}

foreach (string path in filepaths)
{
    // converts csv into datatable
    DataTable dt = GetDataTableFromCsv(path, true);
    // add table to list of tables
    allTables.Add(dt);
}

foreach (DataTable datatable in allTables)
{
    // Merge each table in the list into the mergedTables datatable
    mergedTables.Merge(datatable);
}

csvUploadResults.DataSource = mergedTables;
csvUploadResults.DataBind();
Thanks in advance for any help :)
If your objective is just to merge the data without considering the relationship between the two sets, you can add two more columns to the first datatable, then loop through the second table and assign its data to the new columns of the first. The data will be saved in the first datatable in the order it is received.
public DataTable MergeData(DataTable dtFirst, DataTable dtSecond)
{
    dtFirst.Columns.Add("LocalAuthority");
    dtFirst.Columns.Add("AverageSpeed");

    // Note: this assumes dtSecond has at least as many rows as dtFirst,
    // and that row i in one table corresponds to row i in the other.
    for (int i = 0; i < dtFirst.Rows.Count; i++)
    {
        dtFirst.Rows[i]["LocalAuthority"] = dtSecond.Rows[i]["LocalAuthority"];
        dtFirst.Rows[i]["AverageSpeed"] = dtSecond.Rows[i]["AverageSpeed"];
    }
    return dtFirst;
}
Now you need to pass the datatables as parameters to this method:
MergeData(allTables.ElementAt(0), allTables.ElementAt(1));
You're going to need a unique key on both datatables and merge them together. You could add the SchoolName to your second datatable and merge the two tables on the postcode. Or, preferably, add an id to both datatables and merge them on the id.
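As a sketch of that suggestion (the "Postcode" column name is an assumption based on the question), setting a primary key on both tables lets DataTable.Merge line rows up by value instead of by position:

// Merge on a shared key instead of relying on row order.
dtFirst.PrimaryKey = new[] { dtFirst.Columns["Postcode"] };
dtSecond.PrimaryKey = new[] { dtSecond.Columns["Postcode"] };

// Rows with matching keys are combined; columns missing from dtFirst are added.
dtFirst.Merge(dtSecond, false, MissingSchemaAction.Add);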
I have created a solution which reads a large CSV file, currently 20-30 MB in size. I have tried to delete the duplicate rows based on column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it's so slow that it seems the program is not working at all.
What other technique can be applied to remove duplicate records from a CSV file?
Here's the code; I am definitely doing something wrong:
DataTable dtCSV = ReadCsv(file, columns);
// columns is a List<string> of the column names chosen by the user
DataTable dt = RemoveDuplicateRecords(dtCSV, columns);

private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
{
    DataView dv = dtCSV.DefaultView;
    string RowFilter = string.Empty;
    DataTable dt = dv.ToTable().Clone();

    foreach (DataRow row in dtCSV.Rows)
    {
        try
        {
            RowFilter = string.Empty;
            foreach (string column in columns)
            {
                string col = column;
                RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'", "''") + "' and ";
            }
            RowFilter = RowFilter.Substring(0, RowFilter.Length - 4);
            dv.RowFilter = RowFilter;

            DataRow dr = dt.NewRow();
            bool result = RowExists(dt, RowFilter);
            if (!result)
            {
                dr.ItemArray = dv.ToTable().Rows[0].ItemArray;
                dt.Rows.Add(dr);
            }
        }
        catch (Exception ex)
        {
            // swallowing exceptions hides errors; at minimum, log ex
        }
    }
    return dt;
}
One way to do this would be to go through the table, building a HashSet<string> that contains the combined column values you're interested in. If you try to add a string that's already there, then you have a duplicate row. Something like:
HashSet<string> ScannedRecords = new HashSet<string>();

foreach (DataRow row in dtCSV.Rows)
{
    // Build a string that contains the combined column values
    StringBuilder sb = new StringBuilder();
    foreach (string col in columns)
    {
        sb.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }

    // Try to add the string to the HashSet.
    // If Add returns false, then there is a prior record with the same values.
    if (!ScannedRecords.Add(sb.ToString()))
    {
        // This record is a duplicate.
    }
}
That should be very fast.
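To actually drop the duplicates rather than just detect them (my extension of this idea, not from the original answer), collect the duplicate rows first and remove them after the loop, since rows can't be removed mid-enumeration:

var scanned = new HashSet<string>();
var duplicates = new List<DataRow>();

foreach (DataRow row in dtCSV.Rows)
{
    var sb = new StringBuilder();
    foreach (string col in columns)
    {
        sb.AppendFormat("[{0}={1}]", col, row[col]);
    }
    // Add returns false for a combined value we've already seen
    if (!scanned.Add(sb.ToString()))
    {
        duplicates.Add(row); // don't remove while enumerating
    }
}

// Safe to remove once the enumeration has finished
foreach (DataRow dup in duplicates)
{
    dtCSV.Rows.Remove(dup);
}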
If you've implemented your routine as a couple of nested for or foreach loops, you could optimise it by sorting the data by the columns you wish to de-duplicate against, and simply comparing each row to the last row you looked at.
Posting some code is a sure-fire way to get better answers, though; without an idea of how you've implemented it, anything you get will just be conjecture.
Have you tried wrapping the rows in a class and using LINQ?
LINQ will give you options to get distinct values etc.
You're currently creating a string-defined filter condition for each and every row and then running that against the entire table - that is going to be slow.
Much better to take a Linq2Objects approach where you read each row in turn into an instance of a class and then use the Linq Distinct operator to select only unique objects (non-uniques will be thrown away).
The code would look something like:
var uniqueRows = inputCSV.rows.Distinct();
If you don't know the fields your CSV file is going to have, then you may have to modify this slightly - possibly using an object which reads the CSV cells into a List or Dictionary for each row.
For reading objects from file using Linq, this article by someone-or-other might help - http://www.developerfusion.com/article/84468/linq-to-log-files/
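As a minimal sketch of that Linq2Objects approach (the CsvRow class and its fields are illustrative assumptions, not from the answer), value equality on the row class is what lets Distinct discard duplicates:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative row type; the real fields depend on your CSV
class CsvRow : IEquatable<CsvRow>
{
    public string Name;
    public string Age;

    public bool Equals(CsvRow other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as CsvRow);

    public override int GetHashCode() => (Name, Age).GetHashCode();
}

// Usage: rows is an IEnumerable<CsvRow> parsed from the file
// var uniqueRows = rows.Distinct().ToList();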
Based on the new code you've included in your question, I'll provide this second answer - I still prefer the first answer, but if you have to use DataTable and DataRows, then this second answer might help:
class DataRowEqualityComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // Cell-by-cell comparison: ItemArray holds every cell value in the row
        return x.ItemArray.SequenceEqual(y.ItemArray);
    }

    public int GetHashCode(DataRow obj)
    {
        // A constant hash forces Distinct to rely entirely on Equals;
        // correct, but slower than hashing the cell values.
        return 0;
    }
}

// ...

var comparer = new DataRowEqualityComparer();
var filteredRows = dtCSV.Rows.Cast<DataRow>().Distinct(comparer);
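Note that with a constant hash code, Distinct falls back to pairwise Equals comparisons; if that turns out to be slow, computing the hash from the row's cell values (for example, by combining the hash codes of the items in ItemArray) restores the usual hash-based speed.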