Get Distinct String List From DataTable - c#

To reduce the amount of code, is it possible to combine this into one line? I convert a DataTable column into a string list, but I only want the distinct items in that list (the table has multiple columns, and some columns contain repeated values while others don't):
List<string> column1List = returnDataTable.AsEnumerable().Select(x => x["Column1"].ToString()).ToList();
var distinctColumn1 = (from distinct1 in column1List select distinct1).Distinct();
The above works, but is an extra line. Since the distinct is an option on the list, I did try:
List<string> column1List = (returnDataTable.AsEnumerable().Select(x => x["Column1"].ToString()).ToList()).Distinct();
However, that errors, so it appears that distinct can't be called on a list being converted from a DataTable (?).
Just curious if it's possible to convert a DataTable into a string list and only get the distinct values in one line. May not be possible.

Distinct returns IEnumerable<TSource>; in your case that is IEnumerable<string>, but you are assigning the result to a List<string>.
Call Distinct() before ToList(). Change the code from
List<string> column1List = (returnDataTable.AsEnumerable().Select(x => x["Column1"].ToString()).ToList()).Distinct();
to
List<string> column1List = returnDataTable.AsEnumerable().Select(x => x["Column1"].ToString()).Distinct().ToList();
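As a minimal sketch of the one-line form, assuming a small hypothetical table in place of returnDataTable (the sample rows are made up for illustration, and the code is written as top-level statements, which requires C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table standing in for returnDataTable.
var returnDataTable = new DataTable();
returnDataTable.Columns.Add("Column1", typeof(string));
returnDataTable.Rows.Add("apple");
returnDataTable.Rows.Add("pear");
returnDataTable.Rows.Add("apple");

// Distinct() runs on the IEnumerable<string>; ToList() comes last,
// so the result can be assigned to List<string>.
List<string> column1List = returnDataTable.AsEnumerable()
    .Select(row => row["Column1"].ToString())
    .Distinct()
    .ToList();

Console.WriteLine(column1List.Count); // number of distinct values
```

The only change from the failing attempt is the order of the last two calls.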

Using System.Linq, you can use something like this:
my_enumerable.GroupBy(x => x.Column1).Select(x => x.First()).ToList()

Related

LINQ query to find items in a list containing substring elements from a second list

I am trying to list all elements from the first list that contain a substring equal to any element of the second list.
First list:
C:\Folder\Files_01026666.pdf
C:\Folder\Files_01027777.pdf
C:\Folder\Files_01028888.pdf
C:\Folder\Files_01029999.pdf
Second list:
01027777
01028888
List result should be:
C:\Folder\Files_01027777.pdf
C:\Folder\Files_01028888.pdf
The closest I got was with .Intersect(), but that requires both string elements to be equal:
List<string> resultList = firstList.Select(i => i.ToString()).Intersect(secondList).ToList();
List<string> resultList = firstList.Where(x => x.Contains(secondList.Select(i=>i).ToString()));
List<string> resultList = firstList.Where(x => x == secondList.Select(i=>i).ToString());
I know I can do this another way but I'd like to do it with LINQ.
I have looked at other questions but I can't find a close match for this with LINQ. Any ideas, or anywhere you can point me to, would be a great help.
We can use EndsWith() with Path.GetFileNameWithoutExtension(), as strings in the secondList are not entire file names.
var result = firstList
.Where(path => secondList.Any(fileName => Path.GetFileNameWithoutExtension(path).EndsWith(fileName)));
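A self-contained sketch of the EndsWith approach, using the sample paths from the question. The explicit StringComparison.Ordinal argument and the final ToList() are additions made here, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var firstList = new List<string>
{
    @"C:\Folder\Files_01026666.pdf",
    @"C:\Folder\Files_01027777.pdf",
    @"C:\Folder\Files_01028888.pdf",
    @"C:\Folder\Files_01029999.pdf"
};
var secondList = new List<string> { "01027777", "01028888" };

// Keep a path when its name (extension removed) ends with any entry
// of secondList.
List<string> result = firstList
    .Where(path => secondList.Any(suffix =>
        Path.GetFileNameWithoutExtension(path)
            .EndsWith(suffix, StringComparison.Ordinal)))
    .ToList();

foreach (var path in result)
    Console.WriteLine(path);
```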
var q = list1.Where(t => Regex.IsMatch(t, String.Join("|", list2.ToArray())));
This seems to work for lists of strings. Note that Regex can be a problem with LINQ providers that translate the query; it won't work in LINQ to SQL, for example.
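If the regex route is used in memory, one possible refinement is to escape each entry and build the pattern once rather than per element. This is a sketch under those assumptions; list1 and list2 are filled with the sample data from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var list1 = new List<string>
{
    @"C:\Folder\Files_01026666.pdf",
    @"C:\Folder\Files_01027777.pdf",
    @"C:\Folder\Files_01028888.pdf"
};
var list2 = new List<string> { "01027777", "01028888" };

// Regex.Escape guards against entries that contain regex metacharacters.
// Caveat: an empty list2 yields an empty pattern, which matches everything.
var regex = new Regex(string.Join("|", list2.Select(Regex.Escape)));

List<string> q = list1.Where(t => regex.IsMatch(t)).ToList();

Console.WriteLine(q.Count);
```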

List Array to DataTable

I have a List of String[], which I am trying to convert to a dataset/datatable using LINQ.
I parsed the text file to list, in which the first row has 4 columns and others have data associated with columns.
Everything comes up as an array in the list.
List[10] where List [0] has string[4] items.
List<string[]> list = File.ReadLines(s)
.Select(r => r.TrimEnd('#'))
.Select(line => line.Split(';'))
.ToList();
DataTable table = new DataTable();
table.Columns.AddRange(list.First().Select(r => new DataColumn(r.Value)).ToArray());
list = list.Skip(1).ToArray().ToList();
list.ForEach(r => table.Rows.Add(r.Select(c => c.Value).Cast<object>().ToArray()));
The LINQ doesn't accept the Value property.
Can someone suggest a simple and efficient way to implement this?
System.String doesn't have a property named Value.
If you want to create a column for each item in the first row, just give it the strings:
table.Columns.AddRange(list.First().Select(r => new DataColumn(r)).ToArray());
// You don't need ToArray() here.
list = list.Skip(1).ToList();
// Get rid of Value in this line too, and you don't need
// .Select(c => c) either -- that's a no-op so leave it out.
list.ForEach(row => table.Rows.Add(row.Cast<object>().ToArray()));
There's no Dictionary here. list.First() is an array of strings. When you call Select on an array, it just passes each item in the array to the lambda in turn.
Dictionary<TKey,TValue>.Select() passes the lambda a series of KeyValuePair<TKey, TValue>. Different class, different behavior.
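Putting the corrected lines together, a runnable sketch might look like the following. The in-memory lines array replaces File.ReadLines(s) and its contents are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical file contents: a header row followed by data rows.
var lines = new[]
{
    "Id;Name;City;Zip#",
    "1;Ann;Oslo;0150#",
    "2;Bob;Rome;00100#"
};

List<string[]> list = lines
    .Select(r => r.TrimEnd('#'))
    .Select(line => line.Split(';'))
    .ToList();

var table = new DataTable();

// The first row supplies the column names.
table.Columns.AddRange(list.First().Select(name => new DataColumn(name)).ToArray());

// Every remaining row becomes a DataRow.
foreach (string[] row in list.Skip(1))
    table.Rows.Add(row.Cast<object>().ToArray());

Console.WriteLine($"{table.Columns.Count} columns, {table.Rows.Count} rows");
```

A plain foreach is used instead of List.ForEach only as a matter of taste; both add the same rows.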

Finding the list of common objects between two lists

I have list of objects of a class for example:
class MyClass
{
    public string id;
    public string name;
    public string lastname;
}
so for example: List<MyClass> myClassList;
and also I have list of string of some ids, so for example:
List<string> myIdList;
Now I am looking for a method that accepts these two as parameters and returns a List<MyClass> of the objects whose id appears in myIdList.
NOTE: Always the bigger list is myClassList and always myIdList is a smaller subset of that.
How can we find this intersection?
So you're looking to find all the elements in myClassList where myIdList contains the ID? That suggests:
var query = myClassList.Where(c => myIdList.Contains(c.id));
Note that if you could use a HashSet<string> instead of a List<string>, each Contains test will potentially be more efficient - certainly if your list of IDs grows large. (If the list of IDs is tiny, there may well be very little difference at all.)
It's important to consider the difference between a join and the above approach in the face of duplicate elements in either myClassList or myIdList. A join will yield every matching pair - the above will yield either 0 or 1 element per item in myClassList.
Which of those you want is up to you.
EDIT: If you're talking to a database, it would be best if you didn't use a List<T> for the entities in the first place - unless you need them for something else, it would be much more sensible to do the query in the database than fetching all the data and then performing the query locally.
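The difference in duplicate handling between Contains and a join can be sketched as follows. Value tuples stand in for MyClass to keep the example short, and the sample data (with the id "2" deliberately repeated) is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for MyClass (an assumption for brevity).
var myClassList = new List<(string id, string name)>
{
    ("1", "Ann"), ("2", "Bob"), ("3", "Cid")
};

// "2" is listed twice on purpose.
var myIdList = new List<string> { "2", "2", "3" };

// Contains: each item of myClassList appears at most once.
var idSet = new HashSet<string>(myIdList);
var viaContains = myClassList.Where(c => idSet.Contains(c.id)).ToList();

// Join: one result per matching pair, so Bob appears twice.
var viaJoin = myClassList
    .Join(myIdList, c => c.id, id => id, (c, id) => c)
    .ToList();

Console.WriteLine(viaContains.Count); // 2
Console.WriteLine(viaJoin.Count);     // 3
```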
That isn't strictly an intersection (unless the ids are unique), but you can simply use Contains, i.e.
var sublist = myClassList.Where(x => myIdList.Contains(x.id));
You will, however, get significantly better performance if you create a HashSet<T> first:
var hash = new HashSet<string>(myIdList);
var sublist = myClassList.Where(x => hash.Contains(x.id));
You can use a join between the two lists:
return myClassList.Join(
    myIdList,
    item => item.id,
    id => id,
    (item, id) => item)
    .ToList();
This is a kind of intersection between the two lists: read it as "I want the items from one list that are present in the second list". The ToList() call executes the query immediately.
var lst = myClassList.Where(x => myIdList.Contains(x.id)).ToList();
You can use the code below:
var samedata = myClassList.Where(p => myIdList.Any(q => q == p.id));
myClassList.Where(x => myIdList.Contains(x.id));
Try
List<MyClass> GetMatchingObjects(List<MyClass> classList, List<string> idList)
{
return classList.Where(myClass => idList.Any(x => myClass.id == x)).ToList();
}
var q = myClassList.Where(x => myIdList.Contains(x.id));

Check array for duplicates, return only items which appear more than once

I have a text document of emails such as
Google12#gmail.com,
MyUSERNAME#me.com,
ME#you.com,
ratonabat#co.co,
iamcool#asd.com,
ratonabat#co.co,
I need to check said document for duplicates and create a unique array from that (so if "ratonabat#co.co" appears 500 times in the new array he'll only appear once.)
Edit:
For an example:
username1#hotmail.com
username2#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1#hotmail.com
You can simply use Linq's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little Linq:
var output = input.GroupBy(x => x)
.Where(g => g.Skip(1).Any())
.Select(g => g.Key)
.ToArray();
Explanation:
.GroupBy group identical strings together
.Where filter the groups by the following criteria
.Skip(1).Any() return true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
.Select return a set consisting only of a single string (rather than the group)
.ToArray convert the result set to an array.
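Both queries can be tried on the sample data from the question with a short sketch like this (the input array is the example list from the question's edit):

```csharp
using System;
using System.Linq;

var input = new[]
{
    "username1#hotmail.com",
    "username2#hotmail.com",
    "username1#hotmail.com",
    "username1#hotmail.com"
};

// Every value once.
string[] unique = input.Distinct().ToArray();

// Only the values that occur at least twice, each reported once.
string[] duplicates = input
    .GroupBy(x => x)
    .Where(g => g.Skip(1).Any())
    .Select(g => g.Key)
    .ToArray();

Console.WriteLine(string.Join(", ", duplicates));
```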
Here's another solution using a custom extension method:
public static class MyExtensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
var a = new HashSet<T>();
var b = new HashSet<T>();
foreach(var x in input)
{
if (!a.Add(x) && b.Add(x))
yield return x;
}
}
}
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
You can use the built-in .Distinct() method. By default the comparison is case sensitive; to make it case insensitive, use the overload that takes a comparer and pass a case-insensitive string comparer.
List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
EDIT: After your clarification, I see that you only want to list the duplicates.
string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
(key, rows) => new {Key = key, Count = rows.Count()},
StringComparer.OrdinalIgnoreCase)
.Where(row => row.Count > 1)
.Select(row => row.Key)
.ToArray();
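To see the effect of the comparer, here is a small sketch with invented addresses that differ only in case. It uses the simpler GroupBy(keySelector, comparer) overload instead of the result-selector form above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: the first two entries differ only in case.
var emailAddresses = new List<string> { "ME#you.com", "me#you.com", "iamcool#asd.com" };

// The comparer makes "ME#you.com" and "me#you.com" count as the same value.
string[] uniqueEmailAddresses = emailAddresses
    .Distinct(StringComparer.OrdinalIgnoreCase)
    .ToArray();

string[] duplicateAddresses = emailAddresses
    .GroupBy(address => address, StringComparer.OrdinalIgnoreCase)
    .Where(group => group.Count() > 1)
    .Select(group => group.Key)
    .ToArray();

Console.WriteLine(uniqueEmailAddresses.Length);
Console.WriteLine(duplicateAddresses.Length);
```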
To select emails which occur more than once:
var dupEmails=from emails in File.ReadAllText(path).Split(',').GroupBy(x=>x)
where emails.Count()>1
select emails.Key;

How to select distinct values from DB separated by comma?

I have a front end with two fields, Keywords1 and keywords2; in the database they go into a single field called keywords (separated by commas). I also have a search screen with a Keywords autocomplete text box, and to populate it I need to get the individual values from the DB. The data looks something like this:
Keywords
A
A
A,B
B,C
C,E
D,K
Now in order to populate them as a single listItem I need something like.
Keywords
A
B
C
D
k
That way the front end doesn't contain any duplicates. I am not much of an expert in SQL. One way I know is to get the distinct values from the DB with LIKE %entered keywords%, then use LINQ to split them by comma and get the distinct values. But that would be a lengthy path.
Any suggestion would be highly appreciated.
Thanks in advance.
Maybe a bit late, but an alternative answer that ends up with distinct keywords:
List<string> yourKeywords= new List<string>(new string[] { "A,B,C", "C","B","B,C" });
var splitted = yourKeywords
.SelectMany(item => item.Split(','))
.Distinct();
This will not work directly against the DB, though. You would have to read the DB contents into memory before doing the SelectMany, since Split has no equivalent in SQL. It would then look like this:
var splitted = db.Keywords
.AsEnumerable()
.SelectMany(item => item.Split(','))
.Distinct();
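As an in-memory sketch using the sample rows from the question: the Trim, the empty-entry filter, and the final ordering are additions made here on the assumption that stored values may contain stray spaces and that a sorted list suits an autocomplete box.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rows as they might come back from the keywords column.
var rows = new List<string> { "A", "A", "A,B", "B,C", "C,E", "D,K" };

List<string> keywords = rows
    .SelectMany(row => row.Split(','))   // one entry per keyword
    .Select(keyword => keyword.Trim())   // drop stray whitespace
    .Where(keyword => keyword.Length > 0)
    .Distinct()
    .OrderBy(keyword => keyword, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(", ", keywords)); // A, B, C, D, E, K
```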
Getting them by using string split and Linq group by
List<string> yourKeywords= new List<string>(new string[] { "A,B,C", "C","B","B,C" });
List<string> splitted = new List<string>();
yourKeywords.ForEach(x => splitted.AddRange(x.Split(',')));
var t = splitted.GroupBy(x => x).Select(g => g.Key);
