c# datatable select last row on a specific condition - c#

I have a DataTable that has data in this format:
........ IVR........
.........IVR........
.........IVR........
.........City1......
.........City1......
.........City1......
.........City2......
.........City2......
.........City2......
I want to take the last row for each value; in other words, the last IVR row, the last City1 row, and the last City2 row (highlighted in bold in the original post).
The challenge is that I want these three rows in a DataTable. I tried searching the internet, but I didn't know what this feature is called. Could you help me, please?

You can GroupBy() and then select last row with the help of the Last() method.
var result = from b in myDataTable.AsEnumerable()
group b by b.Field<string>("Your_Column_Name") into g
select g.Last();
DataTable filtered = myDataTable.Clone();
foreach(DataRow row in result)
{
filtered.ImportRow(row);
}
Clone clones the structure of the DataTable, including all DataTable schemas and constraints.
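For reference, the same idea can be wrapped in a small helper. This is only a sketch: the class name LastRowPerGroup, the method name GetLastRows and the keyColumn parameter are hypothetical names chosen for illustration, and it assumes the key column holds strings. It relies on LINQ to Objects' GroupBy keeping the source order inside each group, so Last() is the last row as it appears in the table.

```csharp
using System.Data;
using System.Linq;

public static class LastRowPerGroup
{
    // Returns a new table holding the last row for each distinct value of keyColumn.
    public static DataTable GetLastRows(DataTable source, string keyColumn)
    {
        // Clone copies the schema only, so the result starts empty.
        DataTable filtered = source.Clone();

        var lastRows = source.AsEnumerable()
                             .GroupBy(r => r.Field<string>(keyColumn))
                             .Select(g => g.Last());

        foreach (DataRow row in lastRows)
        {
            // ImportRow copies the row; a DataRow cannot belong to two tables at once.
            filtered.ImportRow(row);
        }
        return filtered;
    }
}
```

Calling LastRowPerGroup.GetLastRows(myDataTable, "Your_Column_Name") would then return one row per distinct value, in the order in which each value first appears.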

This can be implemented in a simple loop using a Dictionary to hold found rows:
var cRows = new Dictionary<string, DataRow>(StringComparer.InvariantCultureIgnoreCase);
foreach (DataRow oRow in oTable.Rows)
{
var sKey = oRow["KeyValue"].ToString();
if (!cRows.ContainsKey(sKey))
{
cRows.Add(sKey, oRow);
}
else
{
cRows[sKey] = oRow;
}
}
This approach will store the last row for each unique value in the column that you nominate.
To move the selected rows into a new DataTable:
var oNewTable = oTable.Clone();
foreach (var oRow in cRows.Values)
{
    // ImportRow copies the row; Rows.Add would throw because the row already belongs to oTable
    oNewTable.ImportRow(oRow);
}
Clone just clones the structure of the current table, not the rows.

Related

Is there a way to bulk add rows into a DataTable?

I have a lot of data that I need to insert into a SQL table and I'd like to use SqlBulkCopy. I have millions of rows that need to be added to a DataTable and I seem to be bottle-necked because the only way I've found to add rows is to use DataTable.Rows.Add(DataRow); This is an example of the code I've got so far:
var table = new DataTable("MyTableName");
table.Columns.Add("Id");
table.Columns.Add("SomeOtherColumn");
IEnumerable<MyObject> myData; // this is a method parameter which has millions of values
var rows = myData.Select(data =>
{
return data.SomeInnerEnumerable.Select(e =>
{
var row = table.NewRow();
row["Id"] = e.Id;
row["SomeOtherColumn"] = e.SomeOtherColumn;
return row;
});
}).SelectMany(r => r); // flattens IEnumerable<IEnumerable<DataRow>> into IEnumerable<DataRow>
Console.WriteLine("DataRows created");
foreach (var row in rows)
{
table.Rows.Add(row);
}
The code prints out "DataRows created" very quickly, but hangs up on the loop for 25+ minutes. Is there a more efficient way of adding this many rows to a DataTable?

Using LINQ Query DataTable using partial column names

I have a DataTable with a variable number of columns.
I want to create a LINQ query that returns the data from the columns that start with a 'B_'.
I have a query that returns the columns names that start with 'B_'. It is below:
var arrayNames = (from DataColumn x in stationTable.Columns
where x.ColumnName.Contains("B_")
select x.ColumnName).ToArray();
Now that I have the column names, how do I create a query using this array to return the data in those columns?
Thanks
You can create a DataView that hides the columns not in your list - that way you get to keep any type information:
var arrayNames = (from DataColumn x in stationTable.Columns
where !x.ColumnName.Contains("B_") // note the reversal
select x.ColumnName).ToArray();
DataView dv = new DataView(stationTable);
foreach (string colName in arrayNames)
    dv.Table.Columns[colName].ColumnMapping = MappingType.Hidden;
There are a couple of ways to approach this.
If you don't care about grouping the items by column type, this query accomplishes that:
var query = from DataColumn col in stationTable.Columns
from DataRow row in stationTable.Rows
where col.ColumnName.StartsWith("B_")
select row[col.ColumnName];
However, to maintain the grouping, you could use a lookup as follows:
var query = (from DataColumn col in stationTable.Columns
from DataRow row in stationTable.Rows
where col.ColumnName.StartsWith("B_")
select new { Row = row[col.ColumnName], col.ColumnName })
.ToLookup(o => o.ColumnName, o => o.Row);
foreach (var group in query)
{
Console.WriteLine("ColumnName: {0}", group.Key);
foreach (var item in group)
{
Console.WriteLine(item);
}
}
The downside of either approach is you're ending up with an object. Retaining the results in a strongly typed manner would require some extra work given the dynamic nature of the question.
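One way to keep the column types, sketched here as a possible alternative rather than as part of the answers above, is to let DataView.ToTable copy only the wanted columns into a new DataTable. The class name ColumnPrefixFilter and the method name SelectColumns are hypothetical. The explicit check for an empty name list is there because, as far as I know, ToTable treats an empty list as "copy every column".

```csharp
using System;
using System.Data;
using System.Linq;

public static class ColumnPrefixFilter
{
    // Returns a copy of the table that contains only the columns
    // whose names start with the given prefix (for example "B_").
    public static DataTable SelectColumns(DataTable source, string prefix)
    {
        string[] names = source.Columns.Cast<DataColumn>()
                               .Select(c => c.ColumnName)
                               .Where(n => n.StartsWith(prefix, StringComparison.Ordinal))
                               .ToArray();

        if (names.Length == 0)
        {
            // Nothing matched: return an empty table instead of passing an empty list on.
            return new DataTable(source.TableName);
        }

        // distinct: false keeps duplicate rows; the columns keep their original data types.
        return new DataView(source).ToTable(false, names);
    }
}
```

With the table from the question, ColumnPrefixFilter.SelectColumns(stationTable, "B_") would give a typed table that can be queried or bound like any other DataTable.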

How to copy all the rows in a datatable to a datarow array?

I have two tables:
tbl_ClassFac:
ClassFacNo (Primary Key)
,FacultyID
,ClassID
tbl_EmpClassFac:
EmpID, (Primary Key)
DateImplement, (Primary Key)
ClassFacNo
I want to know all the Employees who are on a specific ClassFacNo. ie. All EmpID with a specific ClassFacNo... What I do is that I first search tbl_EmpClassFac with the EmpID supplied by the user. I store these datarows. Then use the ClassFacNo from these datarows to search through tbl_ClassFac.
The following is my code.
empRowsCF = ClassFacDS.Tables["EmpClassFac"].Select("EmpID='" + txt_SearchValueCF.Text + "'");
int maxempRowsCF = empRowsCF.Length;
if (maxempRowsCF > 0)
{
foundempDT = ClassFacDS.Tables["ClassFac"].Clone();
foreach (DataRow dRow in empRowsCF)
{
returnedRowsCF = ClassFacDS.Tables["ClassFac"].Select("ClassFacNo='" + dRow[2].ToString() + "'");
foundempDT.ImportRow(returnedRowsCF[0]);
}
}
dataGrid_CF.DataSource = null;
dataGrid_CF.DataSource = foundempDT.DefaultView;
returnedRowsCF = foundempDT.Rows; // this is the line in question; needed so NavigateRecordsCF can be used
NavigateRecordsCF("F"); // function to display data in textboxes (no importance here)
I know the code is not very good, but it is all I can think of. If anyone has any suggestions, please tell me. If not, how do I copy all the rows in a DataTable to a DataRow array?
"How to copy all the rows in a datatable to a datarow array?"
If that helps, use the overload of Select without a parameter
DataRow[] rows = table.Select();
DataTable.Select()
Gets an array of all DataRow objects.
As for the rest of your question, it's not actually clear what you are asking.
But I assume you want to filter the first table by the value of a field in the second (related) table. You can use this concise LINQ to DataSet query:
var rows = from cfrow in tbl_ClassFac.AsEnumerable()
join ecfRow in tbl_EmpClassFac.AsEnumerable()
on cfrow.Field<int>("ClassFacNo") equals ecfRow.Field<int>("ClassFacNo")
where ecfRow.Field<int>("EmpId") == EmpId
select cfrow;
// if you want a new DataTable from the filtered tbl_ClassFac-DataRows:
var tblResult = rows.CopyToDataTable();
Note that you can get an exception at CopyToDataTable if the sequence of datarows is empty, so the filter didn't return any rows. You can avoid it in this way:
var tblResult = rows.Any() ? rows.CopyToDataTable() : tbl_ClassFac.Clone(); // empty table with same columns as source table

Sorting the objects of a DataTable collection

I fetch all tables from a database using
tables = Utility.DBConnection.GetSchema("Tables", restrictions);
How do I sort the tables alphabetically afterwards?
I checked the GetSchema, there is no property to give any sort order.
I want to do later:
foreach (DataRow row in tables.Rows) {}
But I want to have the tables sorted before.
If tables is a DataTable, you can use the DataTable.DefaultView Property to provide a sorted view into the data:
DataView view = tables.DefaultView;
view.Sort = "TABLE_NAME"; // the column name depends on the provider; SqlClient's "Tables" collection uses TABLE_NAME
foreach (DataRowView row in view)
{
}
Just make an array or list of the tables copied from the collection in the dataset, and sort it yourself.
Not sure what Utility.DBConnection.GetSchema returns, but this is probably very close to what you want:
var sortedTables = from table in tables
orderby table.TableName ascending
select table;
You can use a sorted dictionary --
DataTable dtTables = conn.GetSchema("Tables");
SortedDictionary<string, DataRow> dcSortedTables = new SortedDictionary<string, DataRow>();
foreach (DataRow table in dtTables.Rows) {
string tableName = (string)table[2];
dcSortedTables.Add(tableName, table);
}
// Loop through tables
foreach (DataRow table in dcSortedTables.Values) {
// your stuff here
}

Delete Duplicate records from large csv file C# .Net

I have created a solution that reads a large CSV file, currently 20-30 MB in size. I have tried to delete the duplicate rows, based on certain column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it is so slow that the program seems not to be working at all.
What other technique can be applied to remove duplicate records from a CSV file?
Here's the code; I am definitely doing something wrong:
DataTable dtCSV = ReadCsv(file, columns);
// columns is a List<string> of column names
DataTable dt=RemoveDuplicateRecords(dtCSV, columns);
private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
{
DataView dv = dtCSV.DefaultView;
string RowFilter=string.Empty;
if(dt==null)
dt = dv.ToTable().Clone();
DataRow row = dtCSV.Rows[0];
foreach (DataRow row in dtCSV.Rows)
{
try
{
RowFilter = string.Empty;
foreach (string column in columns)
{
string col = column;
RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'","''") + "' and ";
}
RowFilter = RowFilter.Substring(0, RowFilter.Length - 4);
dv.RowFilter = RowFilter;
DataRow dr = dt.NewRow();
bool result = RowExists(dt, RowFilter);
if (!result)
{
dr.ItemArray = dv.ToTable().Rows[0].ItemArray;
dt.Rows.Add(dr);
}
}
catch (Exception ex)
{
}
}
return dt;
}
One way to do this would be to go through the table, building a HashSet<string> that contains the combined column values you're interested in. If you try to add a string that's already there, then you have a duplicate row. Something like:
HashSet<string> ScannedRecords = new HashSet<string>();
// DataRowCollection is not generic, so the loop variable must be typed as DataRow
foreach (DataRow row in dtCSV.Rows)
{
    // Build a string that contains the combined column values
    StringBuilder sb = new StringBuilder();
    foreach (string col in columns)
    {
        sb.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }
    // Try to add the string to the HashSet.
    // If Add returns false, then there is a prior record with the same values
    if (!ScannedRecords.Add(sb.ToString()))
    {
        // This record is a duplicate.
    }
}
That should be very fast.
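A fuller sketch of that approach, returning a de-duplicated copy of the table, might look like the following. The class name CsvDeduplicator and the method name RemoveDuplicates are hypothetical. It keeps the first occurrence of each key, and it prefixes every value with its length so that different column values cannot accidentally combine into the same key.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Text;

public static class CsvDeduplicator
{
    // Returns a copy of source that keeps only the first row for each
    // distinct combination of values in the given columns.
    public static DataTable RemoveDuplicates(DataTable source, IList<string> columns)
    {
        var seen = new HashSet<string>();
        DataTable result = source.Clone(); // schema only, no rows

        foreach (DataRow row in source.Rows)
        {
            var key = new StringBuilder();
            foreach (string col in columns)
            {
                // Length prefix keeps ("ab", "c") and ("a", "bc") from producing the same key.
                string value = row[col].ToString();
                key.Append(value.Length).Append(':').Append(value).Append('|');
            }

            // Add returns true only the first time a key is seen.
            if (seen.Add(key.ToString()))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Each row is visited once and each lookup is a hash probe, so the work grows roughly linearly with the number of rows instead of running a filter over the whole table for every row.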
If you've implemented your sorting routine as a couple of nested for or foreach loops, you could optimise it by sorting the data by the columns you wish to de-duplicate against, and simply compare each row to the last row you looked at.
Posting some code is a sure-fire way to get better answers though, without an idea of how you've implemented it anything you get will just be conjecture.
Have you tried wrapping the rows in a class and using LINQ?
Linq will give you options to get distinct values etc.
You're currently creating a string-defined filter condition for each and every row and then running that against the entire table - that is going to be slow.
Much better to take a Linq2Objects approach where you read each row in turn into an instance of a class and then use the Linq Distinct operator to select only unique objects (non-uniques will be thrown away).
The code would look something like:
(from row in inputCSV.rows
select row).Distinct()
If you don't know the fields your CSV file is going to have then you may have to modify this slightly - possibly using an object which reads the CSV cells into a List or Dictionary for each row.
For reading objects from file using Linq, this article by someone-or-other might help - http://www.developerfusion.com/article/84468/linq-to-log-files/
Based on the new code you've included in your question, I'll provide this second answer - I still prefer the first answer, but if you have to use DataTable and DataRows, then this second answer might help:
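class DataRowEqualityComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // cell-by-cell comparison (SequenceEqual needs "using System.Linq;")
        return x.ItemArray.SequenceEqual(y.ItemArray);
    }
    public int GetHashCode(DataRow obj)
    {
        // rows that compare equal must produce the same hash code,
        // so derive it from the cell values rather than from the comparer itself
        int hash = 17;
        foreach (object cell in obj.ItemArray)
        {
            hash = unchecked(hash * 31 + (cell == null ? 0 : cell.GetHashCode()));
        }
        return hash;
    }
}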
// ...
var comparer = new DataRowEqualityComparer();
// Distinct applies to the whole sequence of rows, not to each row
var filteredRows = dtCSV.AsEnumerable().Distinct(comparer);
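If comparing every column is acceptable, the framework's built-in DataRowComparer.Default can stand in for a hand-written comparer. The sketch below is one possible shape, with the hypothetical names DistinctRows and Get; note that it compares all columns, not a user-chosen subset.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DistinctRows
{
    // Returns a new table without rows that are identical in every column.
    public static DataTable Get(DataTable source)
    {
        // DataRowComparer.Default compares rows value by value across all columns.
        List<DataRow> rows = source.AsEnumerable()
                                   .Distinct(DataRowComparer.Default)
                                   .ToList();

        // CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        return rows.Count > 0 ? rows.CopyToDataTable() : source.Clone();
    }
}
```

For de-duplication on selected columns only, a key-based approach such as the HashSet one is still needed.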
