I have a lot of data that I need to insert into a SQL table and I'd like to use SqlBulkCopy. I have millions of rows that need to be added to a DataTable and I seem to be bottle-necked because the only way I've found to add rows is to use DataTable.Rows.Add(DataRow); This is an example of the code I've got so far:
var table = new DataTable("MyTableName");
table.Columns.Add("Id");
table.Columns.Add("SomeOtherColumn");
IEnumerable<MyObject> myData; // this is a method parameter which has millions of values
var rows = myData.Select(data =>
{
return data.SomeInnerEnumerable.Select(e =>
{
var row = table.NewRow();
row["Id"] = e.Id;
row["SomeOtherColumn"] = e.SomeOtherColumn;
return row;
});
}).SelectMany(r => r); // flattens IEnumerable<IEnumerable<DataRow>> into IEnumerable<DataRow>
Console.WriteLine("DataRows created");
foreach (var row in rows)
{
table.Rows.Add(row);
}
The code prints out "DataRows created" very quickly, but hangs up on the loop for 25+ minutes. Is there a more efficient way of adding this many rows to a DataTable?
I have more than 9000 rows and 18 columns in my DataGridView, and I have to read the columns from an external DataTable. If I find a match between column names, I have to copy all values from the DataTable column into the DataGridView column. My problem is that iterating over more than 9000 rows once for each of the 18 columns, and writing every value into a DataGridView cell, is too slow. Is there a valid alternative?
I've added some code below so that you can better understand my question. Here I'm iterating over the columns first, then the rows. Sorry for the indentation; I'm having problems pasting code into StackOverflow.
dgvMappatura is my dataGridView, dtExcel is my DataTable
foreach (DataColumn col in dtExcel.Columns)
{
if (col.ColumnName.Equals(nome_colonna_origine))
{
foreach (DataRow drExcel in dtExcel.Rows)
{
// some code to add values to datagridview from the datatable column
}
}
}
See if the following is faster:
DataTable dt = new DataTable();
foreach (DataRow row in dt.AsEnumerable())
{
var matches = row.ItemArray.Select((x, i) => new { name = x.ToString(), index = i }).GroupBy(x => x.name).Where(x => x.Count() > 1).ToArray();
}
I have a DataTable that I am trying to update.
My DataTable is the data source of a DataGridView (Forms application).
I want to update all rows whose code appears in a textbox.
The textbox contains comma separated values such as
A1,A11,B4,B38,C44
I have this code but I'm stuck on how to make it work:
DataTable dt = new DataTable();
dt = (DataTable)grd1.DataSource;
DataRow[] dr = dt.Select("'," + TextBox1.Text + ",' LIKE '%,Code,%'");
foreach (DataRow row in dr)
{
row["Price"] = 1000;
}
The problem is in this code
"'," + TextBox1.Text + ",' LIKE '%,Code,%'"
It does not return any rows, so I think I did not write it the right way.
How do I fix my Select line?
Note : I added a comma before and after so I do not get "T37" when I am looking for "T3"
Your question wasn't easy for me to understand, but you seem to be saying that you will type a list of values into the textbox and these values are to be looked up in the [Code] column of the DataTable. I'm not clear on whether the Code column itself is a single value or a comma separated list of codes, so I'll answer for both. Assuming the Code column is a CSV, and you want any row where any one of the values in Code is one of the values in the textbox to have its price updated to 1000:
So for a textbox of "A1,B1" and data rows like:
Code Price
A1,C3 200
B4,C7 400
The 200 row shall be updated and the 400 row shall not.
I'd use LINQ for this rather than DataTable.Select:
var codes = TextBox1.Text.Split(',');
var rows = dt.AsEnumerable().Where(r => codes.Any(c => (r["Code"] as string).Split(',').Contains(c)));
foreach (var r in rows)
    r["Price"] = 1000;
If you're doing this a lot, I wouldn't keep the codes in the row as a CSV string. A row field is allowed to be an array of strings, and storing the codes as an array in the row saves having to split them every time you want to query them.
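As a rough sketch of the array idea, the column can be declared as typeof(string[]) so each row carries its codes already split. The column name Codes, the sample values and the helper UpdatePrices are assumptions made up for illustration, not something from the question.

```csharp
using System;
using System.Data;
using System.Linq;

static class ArrayColumnSketch
{
    // Hypothetical sample table: "Codes" holds a string[] instead of a CSV string.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Codes", typeof(string[]));
        dt.Columns.Add("Price", typeof(int));
        dt.Rows.Add(new string[] { "A1", "C3" }, 200);
        dt.Rows.Add(new string[] { "B4", "C7" }, 400);
        return dt;
    }

    // Hypothetical helper: sets Price on every row whose Codes array shares
    // at least one value with the wanted codes. Returns the number of rows changed.
    public static int UpdatePrices(DataTable dt, string[] wanted, int price)
    {
        int updated = 0;
        foreach (DataRow r in dt.Rows)
        {
            string[] codes = r.Field<string[]>("Codes"); // null when the cell is DBNull
            if (codes != null && codes.Intersect(wanted).Any())
            {
                r["Price"] = price;
                updated++;
            }
        }
        return updated;
    }

    static void Main()
    {
        DataTable dt = BuildSample();
        int updated = UpdatePrices(dt, "A1,B1".Split(','), 1000);
        Console.WriteLine("Rows updated: " + updated);
        foreach (DataRow r in dt.Rows)
            Console.WriteLine("{0}: {1}", string.Join(",", r.Field<string[]>("Codes")), r["Price"]);
    }
}
```

One thing to keep in mind: an array-typed column can't be used in DataTable.Select filter expressions or shown sensibly in a grid, so this only pays off if the lookups are done in code.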
If I've got this wrong and the row contains just a single Code value, the logic is the same; it just doesn't need the split (the code above would still work, but it's not optimal):
var rows = dt.AsEnumerable().Where(r => codes.Any(c => (r["Code"] as string) == c));
And actually if you're going to be doing this a lot, I would index the datatable:
//if it's a csv in the datatable
var index = dt.AsEnumerable()
.SelectMany(r => r["Code"].ToString().Split(','), (row, code) => new { R=row, C=code})
.ToLookup(o => o.C, o => o.R);
This will give something like a dictionary where a code maps to a list of rows where the code appears. For a row set like
Code Price
A1,C3 200
B4,C3 400
You get a "dictionary" like:
A1: { "A1,C3", 200 }
C3: { "A1,C3", 200 },{ "B4,C3", 400 }
B4: { "B4,C3", 400 }
so you could:
foreach (var c in TextBox1.Text.Split(','))
    foreach (var row in index[c])
        row["Price"] = 1000;
If the Code column doesn't contain a CSV, the SelectMany version should still work fine, but to optimize it:
var index = dt.AsEnumerable().ToLookup(r => (string)r["Code"]);
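For anyone who wants to try the lookup idea end to end, here is a small self-contained sketch. The sample rows, the class name LookupSketch and the textbox contents "C3,Z0" are assumptions for illustration only.

```csharp
using System;
using System.Data;
using System.Linq;

static class LookupSketch
{
    // Hypothetical sample data where Code is a comma separated list.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Code", typeof(string));
        dt.Columns.Add("Price", typeof(int));
        dt.Rows.Add("A1,C3", 200);
        dt.Rows.Add("B4,C3", 400);
        dt.Rows.Add("D9", 600);
        return dt;
    }

    // Builds a code -> rows index; a row appears once under each of its codes.
    public static ILookup<string, DataRow> BuildIndex(DataTable dt)
    {
        return dt.AsEnumerable()
            .SelectMany(r => r["Code"].ToString().Split(','),
                        (row, code) => new { R = row, C = code })
            .ToLookup(o => o.C, o => o.R);
    }

    static void Main()
    {
        DataTable dt = BuildSample();
        ILookup<string, DataRow> index = BuildIndex(dt);

        // "C3,Z0" stands in for the textbox contents.
        foreach (string c in "C3,Z0".Split(','))
            foreach (DataRow row in index[c]) // a missing key yields an empty sequence
                row["Price"] = 1000;

        foreach (DataRow r in dt.Rows)
            Console.WriteLine("{0}: {1}", r["Code"], r["Price"]);
    }
}
```

The index is a snapshot: if codes in the table change afterwards, it has to be rebuilt.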
I have a DataTable with data in this format:
........ IVR........
.........IVR........
.........IVR........
.........City1......
.........City1......
.........City1......
.........City2......
.........City2......
.........City2......
I want to take the last row of each value, in other words, the rows that are bold now.
The challenge is that I want these three rows in a DataTable. I tried to search the internet but I didn't know the name of this feature. Could you help me, please?
You can GroupBy() and then select last row with the help of the Last() method.
var result = from b in myDataTable.AsEnumerable()
group b by b.Field<string>("Your_Column_Name") into g
select g.Last();
DataTable filtered = myDataTable.Clone();
foreach(DataRow row in result)
{
filtered.ImportRow(row);
}
Clone clones the structure of the DataTable, including all DataTable schemas and constraints.
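Put together as something you can run, the GroupBy/Last approach might look like the sketch below. The column names Name and Seq, the sample rows and the helper name LastRowPerKey are assumptions for illustration.

```csharp
using System;
using System.Data;
using System.Linq;

static class LastPerGroupSketch
{
    // Hypothetical sample: three rows each for IVR, City1 and City2.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Seq", typeof(int));
        string[] names = { "IVR", "IVR", "IVR", "City1", "City1", "City1", "City2", "City2", "City2" };
        for (int i = 0; i < names.Length; i++)
            dt.Rows.Add(names[i], i + 1);
        return dt;
    }

    // Returns a new table holding the last row seen for each distinct key value.
    public static DataTable LastRowPerKey(DataTable source, string keyColumn)
    {
        DataTable filtered = source.Clone(); // same schema, no rows
        var lastRows = source.AsEnumerable()
            .GroupBy(r => r.Field<string>(keyColumn))
            .Select(g => g.Last());
        foreach (DataRow row in lastRows)
            filtered.ImportRow(row); // copies the row into the new table
        return filtered;
    }

    static void Main()
    {
        DataTable result = LastRowPerKey(BuildSample(), "Name");
        foreach (DataRow r in result.Rows)
            Console.WriteLine("{0} {1}", r["Name"], r["Seq"]);
    }
}
```

With this sample the result keeps rows 3, 6 and 9, in the order in which each name first appears in the source table.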
This can be implemented in a simple loop using a Dictionary to hold found rows:
var cRows = new Dictionary<string, DataRow>(StringComparer.InvariantCultureIgnoreCase);
foreach (DataRow oRow in oTable.Rows)
{
var sKey = oRow["KeyValue"].ToString();
if (!cRows.ContainsKey(sKey))
{
cRows.Add(sKey, oRow);
}
else
{
cRows[sKey] = oRow;
}
}
This approach will store the last row for each unique value in the column that you nominate.
To move the selected rows into a new DataTable:
var oNewTable = oTable.Clone();
foreach (var oRow in cRows.Values)
{
    // ImportRow copies the row; Rows.Add would throw because the row still belongs to oTable
    oNewTable.ImportRow(oRow);
}
Clone just clones the structure of the current table, not the rows.
I have a DataTable in which, if the addresses of two rows match, I want to move one of the rows to the top of the table. I am using the following code but it doesn't work. Any idea how to achieve this? The data in the DataTable is imported from an Excel file. I have tried the same if statement in the GridView to highlight the duplicates and that works, but I also want to move them to the top because the data has more than 1000 rows and it's hard to scroll up and down again and again to check for the highlighted rows.
for (int row = 1; row < dtf1.Rows.Count; row++)
{
for (int rowinner = 1; rowinner < dtf1.Rows.Count; rowinner++)
{
if (rowinner != row)
{
if (dtf1.Rows[row][addresscolno] == dtf1.Rows[rowinner][addresscolno].ToString())
{
DataRow newrow = dtf1.Rows[row];
dtf1.Rows.RemoveAt(row);
dtf1.AcceptChanges();
dtf1.Rows.InsertAt(newrow, 1);
dtf1.AcceptChanges();
GridView1.DataSource = dtf1;
GridView1.DataBind();
}
}
}
}
So you want to reorder the DataTable. The row with a given address should be on the top. You can use Linq-To-DataSet to order the rows and CopyToDataTable to create a new DataTable with the new order:
// assuming the address is a string in a variable address, to simplify matters
// addresscolno is the column name or index variable from the question
DataTable tblOrdered = dtf1.AsEnumerable()
    .OrderByDescending(r => r.Field<string>(addresscolno) == address)
    .ThenBy(r => r.Field<string>(addresscolno))
    .CopyToDataTable();
Then you can use that as DataSource for your GridView.
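Edit: Also give DataTable.Rows.InsertAt a try. It rejects a row that is still attached to a table, so copy the values into a new row and remove the original first:
DataRow copy = dtf1.NewRow();
copy.ItemArray = dtf1.Rows[rowinner].ItemArray;
dtf1.Rows.RemoveAt(rowinner);
dtf1.Rows.InsertAt(copy, 0);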
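The OrderByDescending approach from this answer can be tried on a small in-memory table. This is only a sketch: the column name Address, the sample addresses and the helper name MatchFirst are assumptions for illustration.

```csharp
using System;
using System.Data;
using System.Linq;

static class ReorderSketch
{
    // Hypothetical sample table with one address column.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Address", typeof(string));
        dt.Rows.Add("5 Elm St");
        dt.Rows.Add("1 Oak Ave");
        dt.Rows.Add("9 Pine Rd");
        dt.Rows.Add("1 Oak Ave");
        return dt;
    }

    // Returns a new table in which rows matching the address come first;
    // the remaining rows are sorted by address.
    // Note: CopyToDataTable throws InvalidOperationException when there are no rows.
    public static DataTable MatchFirst(DataTable source, string column, string address)
    {
        return source.AsEnumerable()
            .OrderByDescending(r => r.Field<string>(column) == address) // true sorts before false
            .ThenBy(r => r.Field<string>(column))
            .CopyToDataTable();
    }

    static void Main()
    {
        DataTable ordered = MatchFirst(BuildSample(), "Address", "9 Pine Rd");
        foreach (DataRow r in ordered.Rows)
            Console.WriteLine(r["Address"]);
    }
}
```

The original table is left untouched, so the new table is what has to be bound to the GridView.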
I have created a solution that reads a large CSV file, currently 20-30 MB in size. I have tried to delete the duplicate rows based on certain column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it's so slow that the program seems not to be working at all.
What other technique can be applied to remove duplicate records from a CSV file?
Here's the code; I am definitely doing something wrong:
DataTable dtCSV = ReadCsv(file, columns);
// columns is a List<string> of column names
DataTable dt=RemoveDuplicateRecords(dtCSV, columns);
private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
{
DataView dv = dtCSV.DefaultView;
string RowFilter=string.Empty;
if(dt==null)
dt = dv.ToTable().Clone();
DataRow row = dtCSV.Rows[0];
foreach (DataRow row in dtCSV.Rows)
{
try
{
RowFilter = string.Empty;
foreach (string column in columns)
{
string col = column;
RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'","''") + "' and ";
}
RowFilter = RowFilter.Substring(0, RowFilter.Length - 4);
dv.RowFilter = RowFilter;
DataRow dr = dt.NewRow();
bool result = RowExists(dt, RowFilter);
if (!result)
{
dr.ItemArray = dv.ToTable().Rows[0].ItemArray;
dt.Rows.Add(dr);
}
}
catch (Exception ex)
{
}
}
return dt;
}
One way to do this would be to go through the table, building a HashSet<string> that contains the combined column values you're interested in. If you try to add a string that's already there, then you have a duplicate row. Something like:
HashSet<string> ScannedRecords = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
    // Build a string that contains the combined column values
    StringBuilder sb = new StringBuilder();
    foreach (string col in columns)
    {
        sb.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }
    // Try to add the string to the HashSet.
    // If Add returns false, then there is a prior record with the same values
    if (!ScannedRecords.Add(sb.ToString()))
    {
        // This record is a duplicate.
    }
}
That should be very fast.
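To turn the HashSet idea into a table of unique rows, something along these lines should work. It is a sketch under assumptions: the helper name RemoveDuplicates, the sample columns A/B/C and the separator character are made up for illustration, and the separator is assumed never to occur in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text;

static class HashSetDedupSketch
{
    // Hypothetical sample: rows 1 and 3 agree on columns A and B.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("A", typeof(string));
        dt.Columns.Add("B", typeof(string));
        dt.Columns.Add("C", typeof(string));
        dt.Rows.Add("x", "1", "first");
        dt.Rows.Add("y", "2", "second");
        dt.Rows.Add("x", "1", "third");
        return dt;
    }

    // Keeps the first row for each distinct combination of the chosen columns.
    public static DataTable RemoveDuplicates(DataTable source, IList<string> columns)
    {
        var seen = new HashSet<string>();
        DataTable result = source.Clone(); // same schema, no rows
        var sb = new StringBuilder();
        foreach (DataRow row in source.Rows)
        {
            sb.Clear();
            foreach (string col in columns)
                sb.Append(row[col]).Append('\u001F'); // unit separator, assumed absent from the data
            if (seen.Add(sb.ToString())) // Add returns false for a key seen before
                result.ImportRow(row);
        }
        return result;
    }

    static void Main()
    {
        DataTable unique = RemoveDuplicates(BuildSample(), new List<string> { "A", "B" });
        foreach (DataRow r in unique.Rows)
            Console.WriteLine("{0} {1} {2}", r["A"], r["B"], r["C"]);
    }
}
```

This makes a single pass over the table, so the cost grows roughly linearly with the number of rows instead of running a filter against the whole table for every row.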
If you've implemented your de-duplication routine as a couple of nested for or foreach loops, you could optimise it by sorting the data by the columns you wish to de-duplicate against and simply comparing each row to the previous row you looked at.
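A minimal sketch of that sort-then-compare idea, assuming the data sits in a DataTable. The helper name RemoveDuplicates, the sample columns and the key separator are assumptions for illustration; the key is built from the string form of each chosen cell.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class SortedDedupSketch
{
    // Hypothetical sample: (a,1) appears twice.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("A", typeof(string));
        dt.Columns.Add("B", typeof(string));
        dt.Rows.Add("a", "1");
        dt.Rows.Add("b", "2");
        dt.Rows.Add("a", "1");
        dt.Rows.Add("a", "3");
        return dt;
    }

    // Sorts rows by a combined key, then keeps a row only when its key
    // differs from the key of the row immediately before it.
    public static DataTable RemoveDuplicates(DataTable source, IList<string> columns)
    {
        var sorted = source.AsEnumerable()
            .Select(r => new
            {
                Row = r,
                // '\u001F' separates cell values; assumed absent from the data
                Key = string.Join("\u001F", columns.Select(c => r[c].ToString()))
            })
            .OrderBy(x => x.Key, StringComparer.Ordinal);

        DataTable result = source.Clone();
        string previousKey = null;
        foreach (var item in sorted)
        {
            if (item.Key != previousKey)
                result.ImportRow(item.Row);
            previousKey = item.Key;
        }
        return result;
    }

    static void Main()
    {
        DataTable unique = RemoveDuplicates(BuildSample(), new List<string> { "A", "B" });
        foreach (DataRow r in unique.Rows)
            Console.WriteLine("{0} {1}", r["A"], r["B"]);
    }
}
```

Unlike a hash-based approach, the output comes back in sorted order rather than in the original file order, which may or may not matter for the CSV you write out.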
Posting some code is a sure-fire way to get better answers, though; without an idea of how you've implemented it, anything you get will just be conjecture.
Have you tried wrapping the rows in a class and using Linq?
Linq will give you options to get distinct values etc.
You're currently creating a string-defined filter condition for each and every row and then running that against the entire table - that is going to be slow.
Much better to take a Linq2Objects approach where you read each row in turn into an instance of a class and then use the Linq Distinct operator to select only unique objects (non-uniques will be thrown away).
The code would look something like:
var distinctRows = (from row in inputCSV.rows
                    select row).Distinct();
If you don't know the fields your CSV file is going to have, then you may have to modify this slightly, possibly using an object which reads the CSV cells into a List or Dictionary for each row.
For reading objects from file using Linq, this article by someone-or-other might help - http://www.developerfusion.com/article/84468/linq-to-log-files/
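To make the class-plus-Distinct idea concrete, here is a sketch with a made-up two-field record. The class CsvRecord, its fields Name and City, and the naive Split(',') parsing (no quoted commas) are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; equality is defined over the fields that matter.
sealed class CsvRecord : IEquatable<CsvRecord>
{
    public string Name { get; }
    public string City { get; }

    public CsvRecord(string name, string city)
    {
        Name = name;
        City = city;
    }

    public bool Equals(CsvRecord other) =>
        other != null && Name == other.Name && City == other.City;

    public override bool Equals(object obj) => Equals(obj as CsvRecord);

    // Equal records must produce equal hash codes for Distinct to work.
    public override int GetHashCode() => (Name, City).GetHashCode();
}

static class DistinctSketch
{
    // Parses simple "name,city" lines and drops repeated records.
    public static List<CsvRecord> ReadDistinct(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(','))
            .Where(cells => cells.Length >= 2)
            .Select(cells => new CsvRecord(cells[0].Trim(), cells[1].Trim()))
            .Distinct() // uses CsvRecord.Equals and GetHashCode
            .ToList();
    }

    static void Main()
    {
        var lines = new[] { "Ann,Oslo", "Bob,Rome", "Ann,Oslo", "Ann,Rome" };
        foreach (CsvRecord r in ReadDistinct(lines))
            Console.WriteLine("{0} / {1}", r.Name, r.City);
    }
}
```

In a real program the lines would come from something like File.ReadLines, and the parsing would need to handle quoted fields.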
Based on the new code you've included in your question, I'll provide this second answer - I still prefer the first answer, but if you have to use DataTable and DataRows, then this second answer might help:
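class DataRowEqualityComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // cell-by-cell comparison of the two rows
        return x.ItemArray.SequenceEqual(y.ItemArray);
    }
    public int GetHashCode(DataRow obj)
    {
        // combine the cell hashes so that equal rows get equal hash codes
        int hash = 17;
        foreach (object cell in obj.ItemArray)
            hash = unchecked(hash * 31 + (cell == null ? 0 : cell.GetHashCode()));
        return hash;
    }
}
// ...
var comparer = new DataRowEqualityComparer();
var filteredRows = dtCSV.AsEnumerable().Distinct(comparer);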
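If whole rows are being compared rather than a subset of columns, the framework already ships a value-based row comparer, DataRowComparer.Default, so a hand-written one may not be needed. A small sketch, with the helper name DistinctRows and the sample data as assumptions:

```csharp
using System;
using System.Data;
using System.Linq;

static class BuiltInComparerSketch
{
    // Hypothetical sample: the first and third rows are identical in every column.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Qty", typeof(int));
        dt.Rows.Add("apple", 1);
        dt.Rows.Add("pear", 2);
        dt.Rows.Add("apple", 1);
        return dt;
    }

    // Returns a new table without rows that repeat an earlier row in all columns.
    public static DataTable DistinctRows(DataTable source)
    {
        DataTable result = source.Clone();
        // DataRowComparer.Default compares rows by their column values
        foreach (DataRow row in source.AsEnumerable().Distinct(DataRowComparer.Default))
            result.ImportRow(row);
        return result;
    }

    static void Main()
    {
        DataTable unique = DistinctRows(BuildSample());
        foreach (DataRow r in unique.Rows)
            Console.WriteLine("{0} {1}", r["Name"], r["Qty"]);
    }
}
```

Since the question is about columns chosen at run time, this only fits when the chosen set happens to be all columns; otherwise a custom comparer restricted to those columns is still needed.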