C# List<string> contains partial match from another List<string> - c#

I have these lists:
var list1 = new List<string>
{
"BOM_Add",
"BOM_Edit",
"BOM_Delete",
"Paper_Add",
"Paper_Edit",
"Paper_Delete"
};
var list2 = new List<string> {"BOM", "Paper_Add"};
I want to create a third list of the common items based on a partial match. So, the third list should contain:
"BOM_Add",
"BOM_Edit",
"BOM_Delete",
"Paper_Add"
because the second list contains "BOM".
If the second list contained "_Edit", then I would expect the third list to have
"BOM_Edit",
"Paper_Edit"
I know how to do this with .Intersect() if I spell out each item (e.g. "BOM_Add") in the second list, but I need it to be more flexible than that.
Can this be done without iterating through each item on the first list? These lists may get very long and I would prefer to avoid that if I can.

You can use LINQ:
var result = list1.Where(r => list2.Any(t => r.Contains(t)))
.ToList();
For output:
foreach (var item in result)
{
Console.WriteLine(item);
}
Output would be:
BOM_Add
BOM_Edit
BOM_Delete
Paper_Add
Can this be done without iterating through each item on the first list?
You have to iterate, either through a loop or using LINQ (which iterates internally as well).
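The iteration can be made explicit. The sketch below shows what the Where/Any query above does, written as plain loops; PartialMatch and FindPartialMatches are hypothetical names. For each item it checks the fragments until one matches, so the worst case is list1.Count times list2.Count Contains calls.

```csharp
using System.Collections.Generic;

public static class PartialMatch
{
    // Hypothetical helper: returns every item of source that contains at least
    // one of the fragments, preserving the order of source.
    public static List<string> FindPartialMatches(List<string> source, List<string> fragments)
    {
        var result = new List<string>();
        foreach (var item in source)
        {
            foreach (var fragment in fragments)
            {
                // string.Contains(string) is an ordinal, case-sensitive check.
                if (item.Contains(fragment))
                {
                    result.Add(item);
                    break; // one matching fragment is enough for this item
                }
            }
        }
        return result;
    }
}
```

With list1 from the question, passing { "BOM", "Paper_Add" } gives the four items listed above, and passing { "_Edit" } gives BOM_Edit and Paper_Edit.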

Can this be done without iterating through each item on the first list?
No, not if you want to find all of the items that contain one of the strings you have. There is no built-in index, or any simple structure, that can rule out large sections of items without checking each one. The only option is to compare every item in the first list with every item in the other list, doing your Contains check.
If you only needed a StartsWith check instead of Contains, you could sort your list (using an ordinal comparison) and do a BinarySearch to find the position nearest to the string you're searching for. All items that start with that string are then adjacent, so you only need to check O(log(n) + m) items, where n is the size of the list and m is the number of matches. You could do the same with EndsWith if you sorted the items by their reversed strings, but there is no way to sort items so that a Contains check benefits in the same way.
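A minimal sketch of the StartsWith approach described above, assuming the list has been sorted with StringComparer.Ordinal. Under ordinal ordering, every string that starts with a given prefix falls into one contiguous run. PrefixSearch and StartingWith are hypothetical names; List<T>.BinarySearch returns the bitwise complement of the insertion point when the value is not found.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Hypothetical helper: returns all items of an ordinally sorted list that
    // start with prefix. BinarySearch is O(log n); the scan touches only the
    // matches plus the first non-matching item.
    public static List<string> StartingWith(List<string> sorted, string prefix)
    {
        int index = sorted.BinarySearch(prefix, StringComparer.Ordinal);
        // Not found: ~index is the first element greater than prefix.
        if (index < 0) index = ~index;

        // If prefix itself occurs several times, BinarySearch may land on any
        // copy; step back over earlier copies (only exact duplicates can be here).
        while (index > 0 && sorted[index - 1].StartsWith(prefix, StringComparison.Ordinal))
            index--;

        var result = new List<string>();
        for (int i = index; i < sorted.Count && sorted[i].StartsWith(prefix, StringComparison.Ordinal); i++)
            result.Add(sorted[i]);
        return result;
    }
}
```

Sort once with list1.Sort(StringComparer.Ordinal), then run as many prefix queries as needed against the sorted list. A culture-sensitive sort does not guarantee the contiguous-run property, which is why the sketch uses ordinal comparison throughout.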

Related

LINQ to find collection items from second collection very slow

Please let me know if this is already answered somewhere; I can't find it.
In memory, I have a collection of objects in firstList and related objects in secondList. I want to find all items from firstList whose Id matches a secondList item's RelatedId. So it's fairly straightforward:
var items = firstList.Where(item => secondList.Any(secondItem => item.Id == secondItem.RelatedId));
But when both first and second lists get even moderately larger, this call is extremely expensive and takes many seconds to complete. Is there a way to break this up, or redesign it into multiple queries to make it more efficient?
The reason this code is so inefficient is that for every element of the first list, it has to iterate over (in the worst case) the entirety of the second list to look for an item with matching id.
A more efficient way to do this using LINQ would be using the Join method as follows:
var items = firstList.Join(secondList, item => item.Id, secondItem => secondItem.RelatedId, (item, _) => item);
If the second collection may contain duplicate IDs, you will additionally have to run Distinct() (possibly with some changes depending on the equality semantics for the members of the first list) on the result to maintain the semantics of your original code.
This code resulted in a roughly 100x speedup for me with a test using two lists of 10000 elements each.
If you're running this operation often and one of the collections does not change, you could consider caching the Ids or RelatedIds in a HashSet instead.
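A sketch of the HashSet variant, assuming hypothetical FirstItem and SecondItem classes with int Id and RelatedId properties (the question's real types aren't shown). Unlike the Join version, it returns each firstList item at most once, so no Distinct() is needed to keep the semantics of the original Any query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's element types.
public class FirstItem { public int Id { get; set; } }
public class SecondItem { public int RelatedId { get; set; } }

public static class RelatedLookup
{
    public static List<FirstItem> WithRelated(IEnumerable<FirstItem> firstList, IEnumerable<SecondItem> secondList)
    {
        // Built once in O(m); duplicate RelatedIds collapse into one entry.
        var relatedIds = new HashSet<int>(secondList.Select(s => s.RelatedId));

        // HashSet.Contains is O(1) on average, so the filter is O(n) overall
        // instead of O(n * m) for the nested Any. firstList order is preserved.
        return firstList.Where(item => relatedIds.Contains(item.Id)).ToList();
    }
}
```

If secondList rarely changes, the relatedIds set can be kept between calls instead of being rebuilt each time.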

Multiple List Iteration

I have two lists, which we'll call List A and List B, and they may have different lengths. Each item has a property that serves as its identifier; in this case, call it Comment. I need to iterate:
foreach (var a in A)
{
    foreach (var b in B)
    {
        if (b.Comment == a.Comment)
        {
            Process(b, a); // hypothetical method that processes the matching pair
        }
    }
}
Essentially, I need to process each item from one list together with the items from the other list that have the same identifier. Would Zip help in this case? Logically, I want to loop over each item in List A, check each item in List B for the same Comment, and, if it matches, send the current a and b values to a method for processing, then continue with the rest of List A.
Rethink the collection type you are using.
If you had one of your lists as a dictionary, where the key is the comment, you would only have to iterate one of the lists, and then check if your dictionary contains that.
If Comment is a string and item is your element type:
Dictionary<string, item>
Using LINQ it's easy to generate a dictionary: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.todictionary
Of course, you can't have multiple items with the same key in a dictionary. If Comment is not unique, your dictionary could look like this:
Dictionary<string, List<item>>
I won't go into more detail with the code without full context, but I think you get the point.
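A sketch of the Dictionary<string, List<item>> idea, assuming a hypothetical Entry class with a non-null string Comment, and a caller-supplied Action<Entry, Entry> standing in for the question's processing method. B is indexed once, then A is walked once; pairs are visited in the same order as the nested loops in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; only Comment is used for matching.
public class Entry
{
    public string Comment { get; set; }
    public string Name { get; set; }
}

public static class CommentMatcher
{
    public static void ProcessMatches(List<Entry> listA, List<Entry> listB, Action<Entry, Entry> process)
    {
        // Index B by Comment in one pass; a list per key allows repeated comments.
        var byComment = new Dictionary<string, List<Entry>>();
        foreach (var b in listB)
        {
            if (!byComment.TryGetValue(b.Comment, out var bucket))
            {
                bucket = new List<Entry>();
                byComment.Add(b.Comment, bucket);
            }
            bucket.Add(b);
        }

        // One pass over A; each dictionary lookup is O(1) on average.
        foreach (var a in listA)
        {
            if (byComment.TryGetValue(a.Comment, out var matches))
            {
                foreach (var b in matches)
                    process(b, a); // same argument order as the question: (b, a)
            }
        }
    }
}
```

LINQ's ToLookup(b => b.Comment) builds an equivalent grouped index in one call; indexing a lookup with a missing key returns an empty sequence instead of throwing.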
After some research I went the Cartesian product route, taking elements [0] and [1] of each pair:
var fruits =
    from apple in ListA
    from banana in ListB
    where apple.Comment == banana.Comment // keep only pairs with the same identifier
    select new[] { apple, banana };
and then passed each pair to the processing method:
foreach (var pair in fruits)
{
    Smoothie(pair[0], pair[1]);
}

How to filter a list by another list

I have two lists :
List<myObject> mainList;
And
List<myObject> blackList;
I'm trying to build a new list from mainList using one condition plus a second condition: the elements must not be in blackList.
Here is my attempt :
List<myObject> newList = mainList.Where(x => x.Id == 5 && !blackList.Contains(x)).ToList();
This newList is generated inside a loop. In the first iteration blackList is empty and it works; in the second iteration blackList contains about 200k elements, and when execution reaches the line above it doesn't move on, it stays there for minutes. How can I filter more efficiently so that I don't get elements that are in blackList? Thanks.
The problem you're facing is due to the way List<T> implements Contains: it searches linearly through the list, so each call is O(n), which is slow for long lists and is repeated for every element of mainList.
To get better performance, use a structure with fast lookups for the blacklist, such as a HashSet<T>, whose Contains is O(1) on average:
var blackList = new HashSet<myObject>(theBlackList);
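A sketch of the whole filter with a HashSet, assuming a hypothetical MyObject class standing in for the question's myObject. HashSet<T> relies on Equals and GetHashCode; without overrides it uses reference equality, which is also what List<T>.Contains uses. If you do override Equals, GetHashCode must be overridden consistently. The set can be created once before the loop and grown with Add as items are blacklisted, rather than rebuilt in each iteration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's myObject.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BlacklistFilter
{
    // Each HashSet.Contains is O(1) on average, so the filter is O(n) for
    // n = mainList.Count, instead of O(n * k) with a List<T> blacklist of size k.
    public static List<MyObject> Filter(List<MyObject> mainList, HashSet<MyObject> blackList) =>
        mainList.Where(x => x.Id == 5 && !blackList.Contains(x)).ToList();
}
```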

Need help joining two lists of objects in C#

With my two lists of objects, I want to keep the total set of unique items based on a string key, where any collisions come from the first list, and any misses come from the second. Stated differently, I want to ignore any items in the first list that are not in the second list, but I want to keep all items that do exist in the second list as well as any remaining items from the second list.
What's the best way to do this?
Edit: This problem is more subtle than a simple union. A union will join the distinct items from two lists. In the case of a collision it takes the item from the outer list.
In my case, I have some items in List1 that I don't want to keep because they don't exist in List2, while I do want to keep all items from list 2.
Is there a cleaner / shorter way to do the below?
var remaining = allowedItems.Except(recentItems);
var allowedRecentItems = recentItems.Intersect(allowedItems);
var result = allowedRecentItems.Concat(remaining);
Try this:
var resultlist = list1.Union(list2);
var list1 = new List<string>{"A", "B", "C"};
var list2 = new List<string>{"B", "C", "D"};
var list = list1.Union(list2); // "A", "B", "C", "D"
If I understood correctly, you should keep only the second list. That satisfies both of your conditions:
I want to ignore any items in the first list that are not in the second list
Of course, if you keep only the second list, all the items in the first list that are not present in the second list are automatically ignored.
I want to keep all items that do exist in the second list as well as any remaining items from the second list
On the other hand, the items present in both lists are automatically selected, as are those present only in the second list.
If that's not what you want, look at the Enumerable.Distinct(), Enumerable.Except() and Enumerable.Union() extension methods.
Using the second list alone does the trick. Directly translating your requirements produces:
list1.Intersect(list2).Union(list2), which results in list2.
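The two approaches differ only when items with the same key carry different data. Below is a sketch of the question's requirement with an explicit key comparer, assuming a hypothetical KeyedItem class with a string Key. Intersect keeps list1's instance for each collision, and Union then appends list2's items whose key is still missing. This computes the same thing as the question's Except/Intersect/Concat version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type: Key identifies an item, Source records which list it came from.
public class KeyedItem
{
    public string Key { get; set; }
    public string Source { get; set; }
}

// Treats two items as equal when their Keys match (ordinal comparison).
public class KeyComparer : IEqualityComparer<KeyedItem>
{
    public bool Equals(KeyedItem x, KeyedItem y) => string.Equals(x?.Key, y?.Key, StringComparison.Ordinal);
    public int GetHashCode(KeyedItem obj) => obj.Key == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Key);
}

public static class KeyedMerge
{
    // list1's items whose Key also occurs in list2 (collisions come from list1),
    // followed by list2's items whose Key has not been yielded yet.
    public static List<KeyedItem> Merge(List<KeyedItem> list1, List<KeyedItem> list2)
    {
        var comparer = new KeyComparer();
        return list1.Intersect(list2, comparer).Union(list2, comparer).ToList();
    }
}
```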

Select an element by index from a .NET HashSet

At the moment I am using a custom class derived from HashSet. There's a point in the code when I select items under certain condition:
var c = clusters.Where(x => x.Label != null && x.Label.Equals(someLabel));
It works fine and I get those elements. But is there a way that I could receive an index of that element within the collection to use with ElementAt method, instead of whole objects?
It would look more or less like this:
var c = select element index in collection under certain condition;
int index = c.ElementAt(0); //get first index
clusters.ElementAt(index).RunObjectMthod();
Is manually iterating over the whole collection a better way? I need to add that it's in a bigger loop, so this Where clause is performed multiple times for different someLabel strings.
Edit
What do I need this for? clusters is a set of clusters built from a document collection. Documents are grouped into clusters by topic similarity, and one of the last steps of the algorithm is to find a label for each cluster. But the algorithm is not perfect, and sometimes it produces two or more clusters with the same label. What I want to do is simply merge those clusters into one big cluster.
Sets don't generally have indexes. If position is important to you, you should be using a List<T> instead of (or possibly as well as) a set.
Now SortedSet<T> in .NET 4 is slightly different, in that it maintains a sorted value order. However, it still doesn't implement IList<T>, so access by index with ElementAt is going to be slow.
If you could give more details about why you want this functionality, it would help. Your use case isn't really clear at the moment.
If you hold elements in a HashSet and sometimes need to get elements by index, consider using the ToList() extension method in those situations. That way you keep the features of the HashSet and can still take advantage of indexes.
HashSet<T> hashset = new HashSet<T>();
//the special situation where we need index way of getting elements
List<T> list = hashset.ToList();
//doing our special job, for example mapping the elements to EF entities collection (that was my case)
//we can still operate on hashset for example when we still want to keep uniqueness through the elements
There's no such thing as an index in a hash set. One of the ways hash sets gain efficiency in some cases is by not having to maintain one.
I also don't see what the advantage is here. If you were to obtain the index and then use it, this would be less efficient than just obtaining the element: obtaining the index costs the same, and then you have an extra operation (ElementAt on a HashSet<T> enumerates the set from the start).
If you want to do several operations on the same object, just hold onto that object.
If you want to do something on several objects, do so on the basis of iterating through them (normal foreach or doing foreach on the results of a Where() etc.). If you want to do something on several objects, and then do something else on those several same objects, and you have to do it in such batches, rather than doing all the operations in the same foreach then store the results of the Where() in a List<T>.
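The merge described in the question's edit doesn't need indexes at all. Below is a sketch using GroupBy, assuming a hypothetical Cluster class with a string Label and a List<string> of documents (the question's real class isn't shown). Clusters with a null Label are skipped, mirroring the x.Label != null check in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical cluster type standing in for the question's class.
public class Cluster
{
    public string Label { get; set; }
    public List<string> Documents { get; set; } = new List<string>();
}

public static class ClusterMerger
{
    // Groups clusters by Label in one pass and merges each group's documents
    // into a single new cluster, so no per-label Where or index lookup is needed.
    public static List<Cluster> MergeByLabel(IEnumerable<Cluster> clusters) =>
        clusters
            .Where(c => c.Label != null)
            .GroupBy(c => c.Label)
            .Select(g => new Cluster
            {
                Label = g.Key,
                Documents = g.SelectMany(c => c.Documents).ToList()
            })
            .ToList();
}
```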
Why not use a dictionary?
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < 10; i++)
{
dic.Add("value " + i, dic.Count + 1);
}
string find = "value 3";
int position = dic[find];
Console.WriteLine("the position of " + find + " is " + position);
