Select Distinct from DataTable using Linq and C#

I need to select distinct records from a DataTable using LINQ and C#, but I can't seem to get the syntax right. The following code returns all the rows in the DataTable. How do I return only the DISTINCT rows?
DataTable dt = ds.Tables[0];
var q = from dr in dt.AsEnumerable() select dr;

You'll need to use DataRowComparer. By default, Distinct() compares DataRow objects by reference, not by their column values:
IEnumerable<DataRow> distinctRows =
dt.AsEnumerable().Distinct(DataRowComparer.Default);
More info on comparing data rows in LINQ to DataSet can be found in the LINQ to DataSet documentation on comparing DataRows.
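The difference the comparer makes can be seen in a minimal sketch. It assumes a throwaway table with hypothetical columns Id and Name that contains one duplicated row. Without a comparer, both copies survive because they are different DataRow objects. DataRowComparer.Default compares the values of every column instead.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table with one duplicated row.
var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(1, "Alice");
dt.Rows.Add(1, "Alice"); // same values, but a different DataRow object
dt.Rows.Add(2, "Bob");

// DataRow does not override Equals, so this uses reference equality: 3 "distinct" rows.
int byReference = dt.AsEnumerable().Distinct().Count();

// DataRowComparer.Default compares column values: 2 distinct rows.
int byValue = dt.AsEnumerable().Distinct(DataRowComparer.Default).Count();

Console.WriteLine($"{byReference} {byValue}"); // prints "3 2"
```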

We could have:
var q = (from dr in dt.AsEnumerable() select dr).Distinct(DataRowComparer.Default);
But the from x in ... select x part is redundant, so we can write:
var q = dt.AsEnumerable().Distinct(DataRowComparer.Default);
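Note, however, that the AsEnumerable() call cannot be dropped. A plain DataTable does not implement IEnumerable<DataRow>, so dt.Distinct(DataRowComparer.Default) does not compile. Strongly typed DataSet tables, which derive from TypedTableBase<T>, are the exception because they implement IEnumerable<T>.

In general LINQ code, AsEnumerable() does nothing on a source that is already an IEnumerable<T>. On an IQueryable<T>, it forces the rest of the query, including Distinct, to run in memory instead of being translated for the database, and that can be slower.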

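(from dr in dt.AsEnumerable() select dr).Distinct(DataRowComparer.Default);
Without the DataRowComparer.Default argument, Distinct() compares DataRow references. Every row then counts as distinct, and nothing is removed.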
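You may want the distinct rows back as a DataTable, for example for data binding. In that case, the sequence can be passed to CopyToDataTable(), which is defined for sequences of DataRow. The sketch below uses a hypothetical single-column table named Code. CopyToDataTable() throws an InvalidOperationException on an empty sequence, so the code guards with Any() and falls back to an empty clone of the schema.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table with repeated codes.
var dt = new DataTable();
dt.Columns.Add("Code", typeof(string));
foreach (var code in new[] { "A", "B", "A", "C", "B" })
    dt.Rows.Add(code);

var distinctRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);

// CopyToDataTable throws on an empty sequence, so fall back to an empty copy of the schema.
DataTable distinctTable = distinctRows.Any()
    ? distinctRows.CopyToDataTable()
    : dt.Clone();

Console.WriteLine(distinctTable.Rows.Count); // prints 3
```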

Related

Using from DataRow dRow in dt.AsEnumerable() to create a distinct ordered list in C# .NET

I have the following code, but I'm having trouble maintaining the order:
var cars = (from DataRow dRow in dt.AsEnumerable()
select new
{
Car = dRow["Car"],
CarId = dRow["CarId"],
CarOrder = dRow["CarOrder"]
}).Distinct();
The Distinct works well, but I need to preserve the CarOrder, which goes from 1 to X (ascending).
The DataTable dt has them all in the correct order, but after this Distinct code runs, the order is not preserved.
I am trying to figure out how to use the OrderBy clause.
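According to the documentation, Distinct() "returns an unordered sequence of the unique items in source".
That means the original order is not guaranteed to be preserved. In practice, the LINQ to Objects implementation yields elements in the order it first sees them, but the documentation does not promise this. So call OrderBy() after Distinct(), as follows: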
var cars = (from DataRow dRow in dt.AsEnumerable()
select new
{
Car = dRow["Car"],
CarId = dRow["CarId"],
CarOrder = dRow["CarOrder"]
}).Distinct().OrderBy(x => x.CarOrder);
See: Does Distinct() method keep original ordering of sequence intact?
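Below is a self-contained sketch of Distinct() followed by OrderBy(). The question does not state the column types, so this assumes Car is a string column and CarId and CarOrder are int columns. With Field<T>, the anonymous type gets typed properties, so OrderBy compares ints that are checked at compile time rather than boxed objects. A boxed comparison would also sort strings like "10" before "2" if the column were stored as text.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data; the column types are assumptions.
var dt = new DataTable();
dt.Columns.Add("Car", typeof(string));
dt.Columns.Add("CarId", typeof(int));
dt.Columns.Add("CarOrder", typeof(int));
dt.Rows.Add("Volvo", 10, 2);
dt.Rows.Add("Audi", 20, 1);
dt.Rows.Add("Volvo", 10, 2); // duplicate
dt.Rows.Add("Fiat", 30, 3);

var cars = dt.AsEnumerable()
    .Select(r => new
    {
        Car = r.Field<string>("Car"),
        CarId = r.Field<int>("CarId"),
        CarOrder = r.Field<int>("CarOrder")
    })
    .Distinct()               // C# anonymous types compare by property values
    .OrderBy(c => c.CarOrder) // sort explicitly instead of relying on Distinct's order
    .ToList();

foreach (var c in cars)
    Console.WriteLine($"{c.CarOrder}: {c.Car}");
// 1: Audi
// 2: Volvo
// 3: Fiat
```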

Using LINQ, how can I get a joined table returned as IEnumerable<DataRow>

I use SQL a lot, and I'm trying to transfer the same joining logic to LINQ queries against a DataSet. The DataSet is a bunch of tables which are pulled from SQL queries further up the line.
I managed to get this join working:
IEnumerable<DataRow> query =
from a in ds.Tables["Names"].AsEnumerable()
join b in ds.Tables["NameHasAffiliate"].AsEnumerable()
on a.Field<int>("PK") equals b.Field<int>("fk_MainTable_PK")
select a;
This doesn't compile:
select a.Field<int>("PK"), a.Field<string>("Name")
and neither does
select new { PK = a.Field<int>("PK"), Name = a.Field<string>("Name") }
It's definitely querying correctly (I see the expected number of rows, with table a's data duplicated), but it only returns table a's columns, obviously because of select a.
I've tried changing to select new { a, b }. I've also tried wrapping the query in () to add a .ToList() at the end. Neither compiles to give me an IEnumerable<DataRow> version of the query that I can simply convert to a table afterwards using
DataTable boundTable = query.CopyToDataTable<DataRow>();
How can I select ALL the columns like this?
Or rather, if I'm joining a lot of tables, how can I specify which columns from each table?
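Those projections fail because the query is declared as IEnumerable<DataRow>, so it cannot return anything other than a DataRow. Separately, select a.Field<int>("PK"), a.Field<string>("Name") is invalid syntax: a select clause takes a single expression, so several fields have to be wrapped in an anonymous type. If you return a field or a set of fields, change the declared type, for example to var.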
I tried the following and it works just fine.
using System;
using System.Data;
using System.Linq;

var ds = new DataSet();
ds.Tables.Add("Names");
ds.Tables["Names"].Columns.Add("PK", typeof(Int32));
ds.Tables["Names"].Columns.Add("Name", typeof(String));
ds.Tables["Names"].Rows.Add(1, "NameValue1");
ds.Tables.Add("NameHasAffiliate");
ds.Tables["NameHasAffiliate"].Columns.Add("fk_MainTable_PK", typeof(Int32));
ds.Tables["NameHasAffiliate"].Columns.Add("AffiliateValue", typeof(String));
ds.Tables["NameHasAffiliate"].Rows.Add(1, "AffiliateValue1");
// Declared with var (not IEnumerable<DataRow>) so the query can return an anonymous type.
var query =
ds.Tables["Names"].AsEnumerable()
.Join(
ds.Tables["NameHasAffiliate"].AsEnumerable(),
n => n.Field<int>("PK"), // key from Names
a => a.Field<int>("fk_MainTable_PK"), // key from NameHasAffiliate
// The result selector picks columns from both tables.
(n, a) => new { Key = n.Field<Int32>("PK"), Name = n.Field<String>("Name"), Affiliate = a.Field<String>("AffiliateValue") })
.ToList();
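CopyToDataTable() is only defined for sequences of DataRow, so it cannot be called on a list of anonymous objects. One way to get a table for binding is to create a DataTable with the columns you picked from each side of the join, then add one row per joined pair. The sketch below does this. The result table name "Joined" and its column layout are hypothetical, and the sample rows are made up.

```csharp
using System;
using System.Data;
using System.Linq;

var ds = new DataSet();
var names = ds.Tables.Add("Names");
names.Columns.Add("PK", typeof(int));
names.Columns.Add("Name", typeof(string));
names.Rows.Add(1, "NameValue1");
names.Rows.Add(2, "NameValue2"); // no affiliate, so the inner join drops it

var affiliates = ds.Tables.Add("NameHasAffiliate");
affiliates.Columns.Add("fk_MainTable_PK", typeof(int));
affiliates.Columns.Add("AffiliateValue", typeof(string));
affiliates.Rows.Add(1, "AffiliateValue1");
affiliates.Rows.Add(1, "AffiliateValue2");

// Hypothetical result table: one column per field picked from either side of the join.
var boundTable = new DataTable("Joined");
boundTable.Columns.Add("PK", typeof(int));
boundTable.Columns.Add("Name", typeof(string));
boundTable.Columns.Add("AffiliateValue", typeof(string));

var joined =
    from n in names.AsEnumerable()
    join a in affiliates.AsEnumerable()
        on n.Field<int>("PK") equals a.Field<int>("fk_MainTable_PK")
    select new
    {
        PK = n.Field<int>("PK"),
        Name = n.Field<string>("Name"),
        AffiliateValue = a.Field<string>("AffiliateValue")
    };

// Copy each projected result into a real DataRow.
foreach (var row in joined)
    boundTable.Rows.Add(row.PK, row.Name, row.AffiliateValue);

Console.WriteLine(boundTable.Rows.Count); // prints 2
```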

Find All Rows in DataTable Where Column Value is NOT Unique Using Linq Query

I am attempting to write a LINQ query that returns all rows in a DataTable where a column's value (TxnNumber) is not unique. So far I have the query below, where dt is the DataTable that contains the field TxnNumber. I think I am pretty close, but IntelliSense is complaining about the Contains clause. I have tried specifying that I only want to return the TxnNumber field in the sub-select, but it will not compile. Can anyone see what I am doing wrong?
dt.AsEnumerable().Where(u => u.TxnNumber.Contains (dt.AsEnumerable().GroupBy(t => t.TxnNumber).Count() > 1));
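Try this. A plain DataTable row has no r.TxnNumber property (that only exists on a strongly typed DataSet), so the column is read with Field<T>. This assumes TxnNumber is a string column:
(from r in dt.AsEnumerable()
group r by r.Field<string>("TxnNumber") into grp
where grp.Count() > 1 // keep only values that occur more than once
select grp).SelectMany(x => x).ToList(); // flatten the groups back into rows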
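The same grouping can be written in method syntax. The runnable sketch below uses made-up data and assumes TxnNumber is a string column. The Amount column is hypothetical and only shows that whole rows, with all their columns, come back.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical data: T1 and T2 occur twice, T3 once.
var dt = new DataTable();
dt.Columns.Add("TxnNumber", typeof(string));
dt.Columns.Add("Amount", typeof(decimal));
dt.Rows.Add("T1", 10m);
dt.Rows.Add("T2", 20m);
dt.Rows.Add("T1", 30m);
dt.Rows.Add("T3", 40m);
dt.Rows.Add("T2", 50m);

// Group by TxnNumber, keep groups with more than one row, then flatten back to rows.
var duplicates = dt.AsEnumerable()
    .GroupBy(r => r.Field<string>("TxnNumber"))
    .Where(g => g.Count() > 1)
    .SelectMany(g => g)
    .ToList();

foreach (var r in duplicates)
    Console.WriteLine(r.Field<string>("TxnNumber") + " " + r.Field<decimal>("Amount"));
```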

Select multiple rows & columns in datatable where value occurs more than once

I have a table with a "Product Code" column in which some codes appear more than once.
I am trying to return all rows (with all their columns) whose product code occurs more than once.
I haven't really used LINQ before. I have been playing around with some group by clauses but haven't really got anywhere, except returning each individual part code:
var GetProductsRows = from DataRow dr in table.Rows
group dr by dr.Field<string>("Product Code") into g
select g;
Somehow I think I am treading water a little out of my depth
A nested linq query should do the trick:
var GetProductsRows = from DataRow dr in table.Rows
group dr by dr.Field<string>("Product Code") into gp
from rows in gp where gp.Count() > 1 select rows;
Basically this will select all rows that belong to groups whose count is greater than one.
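Here is the nested query running against a small table of made-up products. The original table was only shown as a picture, so every column other than "Product Code" is a hypothetical stand-in.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table: only "Product Code" comes from the question.
var table = new DataTable();
table.Columns.Add("Product Code", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Rows.Add("P100", "Bolt");
table.Rows.Add("P200", "Nut");
table.Rows.Add("P100", "Bolt, long");
table.Rows.Add("P300", "Washer");

// Same shape as the answer: group, then flatten only the groups with 2+ rows.
var GetProductsRows = from DataRow dr in table.Rows
                      group dr by dr.Field<string>("Product Code") into gp
                      from rows in gp
                      where gp.Count() > 1
                      select rows;

foreach (DataRow r in GetProductsRows)
    Console.WriteLine(r["Product Code"] + ": " + r["Description"]);
```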

finding distinct rows in dataset using linq

I am using the query below to find the distinct rows in a DataSet, but it isn't giving me distinct results. It doesn't remove the duplicates, so I can't get the distinct count.
var distinctRows = (from DataRow dRow in _dsMechanic.Tables[0].Rows
select new { col1 = dRow["colName"] }).Distinct();
This should work:
var distinctRows = (from DataRow dRow in _dsMechanic.Tables[0].Rows
select dRow["colName"]).Distinct();
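Selecting the column value directly avoids wrapping it in an anonymous type. C# anonymous types do compare by property values, though. If duplicates still appear, check whether the values really are identical, for example whether they differ in trailing spaces or in case.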
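Here is a minimal sketch of the direct approach, including the distinct count the question asks for. A hypothetical table stands in for _dsMechanic.Tables[0]. The selected values are typed as object, but Distinct() calls each value's own Equals and GetHashCode, so equal strings collapse to one entry.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical stand-in for _dsMechanic.Tables[0].
var dt = new DataTable();
dt.Columns.Add("colName", typeof(string));
foreach (var v in new[] { "x", "y", "x", "z", "y" })
    dt.Rows.Add(v);

// Raw column values; string's Equals override makes duplicates compare equal.
var distinctValues = (from DataRow dRow in dt.Rows
                      select dRow["colName"]).Distinct().ToList();

Console.WriteLine(distinctValues.Count); // prints 3
```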
