Get certain item from each Tuple from List - c#

What is the correct way to go about creating a list of, say, the first item of each Tuple in a List of Tuples?
If I have a List<Tuple<string,string>>, how would I get a List<string> of the first string in each Tuple?

A little Linq will do the trick:
var myStringList = myTupleList.Select(t=>t.Item1).ToList();
As an explanation, since Tim posted pretty much the same answer, Select() creates a 1:1 "projection"; it takes each input element of the Enumerable, and for each of them it evaluates the lambda expression and returns the result as an element of a new Enumerable having the same number of elements. ToList() will then spin through the Enumerable produced by Select(), and add each element one at a time to a new List<T> instance.
Tim has a good point on memory efficiency: ToList() creates a list and adds the elements one at a time, so the List may have to resize its underlying array repeatedly, doubling it each time to make room. For a big list, that could contribute to an OutOfMemoryException, and it leaves the List holding more memory than necessary unless the number of elements happens to be a power of 2.

List<string> list = tuples.Select(t => t.Item1).ToList();
or, potentially less memory expensive:
List<string> list = new List<String>(tuples.Count);
list.AddRange(tuples.Select(t => t.Item1));
because it avoids the doubling algorithm of List.Add in ToList.
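As a minimal sketch, the two variants can be placed side by side. The class and method names here (TupleProjection, FirstItems, FirstItemsPresized) are hypothetical and only serve the illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleProjection
{
    // Projects Item1 of every tuple with Select and materialises the result.
    public static List<string> FirstItems(List<Tuple<string, string>> tuples)
    {
        return tuples.Select(t => t.Item1).ToList();
    }

    // Same elements, but the target list is given its final capacity up front,
    // so it should not need to grow while the items are added.
    public static List<string> FirstItemsPresized(List<Tuple<string, string>> tuples)
    {
        var list = new List<string>(tuples.Count);
        list.AddRange(tuples.Select(t => t.Item1));
        return list;
    }
}
```

Both methods return the same elements in the same order. Whether the pre-sized version is measurably cheaper depends on the runtime and the list size, so it is worth profiling before preferring it.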

If you have a List<Tuple<string, string>> listoftuples, you can use LINQ's Select extension method (available on any IEnumerable<T>, including List<T>) to take the first string from each Tuple.
It would look like this:
List<string> firstelements = listoftuples.Select(t => t.Item1).ToList();

Generalised variant:
For selecting a particular item when the length of the tuples in the collection is not known in advance (2, 3, 4, ...):
// Requires: using System; using System.Collections; using System.Collections.Generic; using System.Linq;
static IEnumerable<object> TupleListSelectQuery<T>(IEnumerable<T> lst, int index) where T : IStructuralEquatable, IStructuralComparable, IComparable
{
    // Look up the "ItemN" property once, then read it from every tuple.
    var property = typeof(T).GetProperty("Item" + index);
    return lst.Select(t => property.GetValue(t)).ToList();
}
where index corresponds to the way tuple items are numbered, i.e. 1, 2, 3, ... (not 0, 1, 2, ...).
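A reflection-free alternative may be possible on newer runtimes. The sketch below assumes that the tuple type implements System.Runtime.CompilerServices.ITuple, which System.Tuple does on .NET Core 2.0+ and .NET Framework 4.7.1+. The class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

public static class TupleItemSelector
{
    // Returns the item at the given 1-based position from every tuple.
    // ITuple's indexer is 0-based, hence "position - 1".
    public static List<object> SelectItem<T>(IEnumerable<T> tuples, int position)
        where T : ITuple
    {
        return tuples.Select(t => t[position - 1]).ToList();
    }
}
```

For example, calling SelectItem(tuples, 2) on a list of Tuple<string, int, bool> yields the int from each tuple, boxed as object.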

Related

C# best approach to check whether a value exists in a List several times

I have a list of integers and I want to check, several times, whether a value exists in my list.
What is the best approach to do this?
Caching the data using a Lookup, a Dictionary, a HashMap, or ...?
Example:
List<int> samples = new List<int> { 5, 4, 6, 2, 1 };
// if(2 exist in samples) do something ...
// if(3 exist in samples) do something ...
// if(5 exist in samples) do something ...
// if(8 exist in samples) do something ...
// if(13 exist in samples) do something ...
// if ....
You can store them in HashSet and check whether value exists with O(1):
var unique = new HashSet<int>(){ 5,4,6,2,1};
var hasValue = unique.Contains(1);
and then just check:
if (unique.Contains(2))
// do something ...
In addition, HashSet<T> does not store duplicates, and its hash-based lookup is what makes it fast.
UPDATE:
List<T> searches in O(N). Why? Because Big O notation here describes the worst case of time complexity. Let's imagine we have the following list:
var numbers = new List<int> { 5, 4, 6, 2, 1 };
and we want to find the number 1. The Contains() method of List<T> has to iterate through the whole array until it finds 1, so we have O(N).
LinkedList<T> also searches in O(N), for the same reason as List<T>. However, LinkedList<T> does not have an array under the hood; it is made of nodes, each holding a reference to the next node, and so on. We have to walk the nodes one by one to find an item.
HashSet<T> searches in O(1) on average. Why? Under the hood, HashSet<T> does not iterate through the array. It calls the internal method InternalGetHashCode and uses the resulting hash code to work out the bucket in which the number is stored. You can see the source code here.
In addition, there is a very nice answer about How can hashset.contains be O(1) with this implementation?
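To make the HashSet approach concrete, here is a small sketch that builds the set once and then answers many lookups against it. The helper name FindPresent is an assumption for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class MembershipCheck
{
    // Builds the set once (O(N)), then answers each lookup in O(1) on average.
    public static List<int> FindPresent(IEnumerable<int> samples, IEnumerable<int> candidates)
    {
        var lookup = new HashSet<int>(samples);
        var present = new List<int>();
        foreach (int candidate in candidates)
        {
            if (lookup.Contains(candidate))
            {
                present.Add(candidate);
            }
        }
        return present;
    }
}
```

With the samples 5, 4, 6, 2, 1 and the candidates 2, 3, 5, 8, 13 from the question, only 2 and 5 are reported as present. The one-off cost of building the set pays for itself only when several lookups follow.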
It depends on what you mean by best. If you are using a list, you could use the Exists() method.
if(samples.Exists(value => value == valueToCheck)){
//Do work
}

Removing list item from another list

I have a list with some elements and I want to remove elements from another list. An item should be removed if its value Contains (not equals) the value from another list.
One of the ways is to do this:
var MyList = new List<string> { ... };
var ToRemove = new List<string> { ... };
MyList.RemoveAll(_ => ToRemove.Any(_.Contains));
It works...
but, I have a LOT of lists (>1 million) and since the ToRemove can be sorted, it would make sense to use that in order to speed the process.
It's easy to make a loop that does it, but is there a way to do this with the sorted collections?
Update:
On 20k iterations on a text with our forbidden list, I get this:
Forbidden list as List -> 00:00:07.1993364
Forbidden list as HashSet -> 00:00:07.9749997
It's consistent after multiple runs, so the hashset is slower
Since this removes strings that contain strings from another list, a HashSet wouldn't be much help. Actually, not much would be, unless you were looking for exact full matches or maintained an index of all substrings (expensive, and AFAIK only SQL Server does this semi-efficiently outside the Big Data realm).
If all you cared about was whether an item starts with one of the entries in 'ToRemove', sorting could help. Sort 'MyList' and, for each string in 'ToRemove', use a custom binary search to find any string that starts with it. From that index, remove items going forwards until one no longer starts with the string, then step backwards from the index, removing until one no longer starts with it.
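The prefix-only idea can be sketched as follows. This is a variation that assumes ordinal (byte-wise) comparison, so that all strings sharing a prefix sit next to each other after sorting; it searches for the first such string (a lower bound) and removes the whole run forwards, instead of scanning in both directions. The names PrefixRemoval, RemoveByPrefix and LowerBound are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class PrefixRemoval
{
    // Sorts items in place (ordinal), then removes every item that starts
    // with one of the prefixes. Note that the original order is lost.
    public static void RemoveByPrefix(List<string> items, IEnumerable<string> prefixes)
    {
        items.Sort(StringComparer.Ordinal);
        foreach (string prefix in prefixes)
        {
            // First index whose item is >= prefix; matches, if any, start here.
            int start = LowerBound(items, prefix);
            int end = start;
            while (end < items.Count && items[end].StartsWith(prefix, StringComparison.Ordinal))
            {
                end++;
            }
            items.RemoveRange(start, end - start);
        }
    }

    // Classic binary search for the first element not less than key.
    private static int LowerBound(List<string> items, string key)
    {
        int lo = 0, hi = items.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(items[mid], key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

This does not cover the general "contains" case from the question, where a match can appear anywhere inside the string.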
Well, sorting ToRemove may be beneficial because of binary search O(log n) complexity (you will need to rewrite _ => ToRemove.Any(_.Contains)).
Alternatively, using a HashSet<string> instead of a List<string> for ToRemove makes finding an element (using Contains) an O(1) operation, but that lookup tests whole-string equality, so it only speeds things up if exact matches are what you need; the substring test in ToRemove.Any(_.Contains) still visits every entry.
Also, using a LinkedList<string> for MyList can potentially be beneficial, since removing an item from a linked list is generally faster than removing from an array-based list, which has to shift the following elements.

Accessing the item at a specified index in a 'SortedSet'

How can I access the item at a specified index (position) in a SortedSet?
Unlike SortedList, SortedSet does not offer an Item property.
(Also, unlike SortedList, SortedSet enforces each of its members to be unique. That is, a SortedSet is guaranteed not to contain duplicates.)
That's because a SortedSet has the semantics of a set and is not a List-like construct. Consequently, it does not implement IList (which gives you the ability to address items by index via the Item property).
As noted by @DavidRR, you could use the Linq extension method Enumerable.ElementAt(). However, since the backing store of a SortedSet is a red-black tree (a height-balanced binary tree), accessing an element by index via ElementAt() involves a tree walk: O(N) in the worst case and N/2 steps on average to get to the desired item. Pretty much the same as traversing a singly-linked list to access the Nth item.
So...for large sets, performance is likely to be poor.
If what you want is a unique collection that offers array-like semantics, why not roll your own IList<T> implementation that would enforce uniqueness, just as SortedSet<T> does (ignoring adds of elements that already exist in the collection)? Use a List<T> as the backing store. Maintain it in sorted sequence so you can use a binary search to determine if the element being added already exists. Or, simply derive from Collection<T> and override the appropriate methods (List<T>'s own methods are not virtual) to get the semantics you want.
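A rough sketch of that idea follows. It is deliberately not a full IList<T> implementation: it only shows the sorted List<T> backing store, the binary search on insert, and the indexer. The class name SortedUniqueList is hypothetical:

```csharp
using System;
using System.Collections.Generic;

public sealed class SortedUniqueList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly IComparer<T> _comparer;

    public SortedUniqueList(IComparer<T> comparer = null)
    {
        _comparer = comparer ?? Comparer<T>.Default;
    }

    public int Count => _items.Count;

    // O(1) positional access, which SortedSet<T> does not offer.
    public T this[int index] => _items[index];

    // Returns false (and changes nothing) if an equal element is already present.
    public bool Add(T item)
    {
        int position = _items.BinarySearch(item, _comparer);
        if (position >= 0)
        {
            return false;
        }
        // A negative result is the bitwise complement of the insertion point.
        _items.Insert(~position, item);
        return true;
    }

    public bool Contains(T item) => _items.BinarySearch(item, _comparer) >= 0;
}
```

The trade-off is that each insert is O(N) because List<T>.Insert shifts the elements after the insertion point, whereas lookups by index are O(1) and membership tests are O(log N).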
EDIT: An ordinary (unordered) set such as HashSet<T> manages its elements in no particular order. So, the index of a particular element in an unordered set does not carry any particular meaning.
In contrast however, it makes semantic sense to request an element by its position (index) in a SortedSet<T>. Why bother with the overhead of an ordered collection, otherwise?
That said, for a small SortedSet<T> where performance is not a concern (see example below), the Linq extension method Enumerable.ElementAt() provides a convenient means of retrieving an item by its index. However, for a large SortedSet<T> where the runtime performance of retrieving an element is paramount, consider implementing a custom collection as @Nicholas Carey outlines in his answer.
Original Answer:
You can access an item of interest by its index (position) from your SortedSet via the Enumerable.ElementAt<TSource> method:
var item = mySortedSet.ElementAt(index);
Demonstration:
using System;
using System.Collections.Generic;
using System.Linq;
class SortedSetDemo
{
    static void Main(string[] args)
    {
        var words = new string[]
            {"the", "quick", "brown", "fox", "jumps",
             "over", "the", "lazy", "dog"};
        // Create a sorted set.
        var wordSet = new SortedSet<string>();
        foreach (string word in words)
        {
            wordSet.Add(word);
        }
        // List the members of the sorted set.
        Console.WriteLine("Set items in sorted order:");
        int i = 0;
        foreach (string word in wordSet)
        {
            Console.WriteLine("{0}. {1}", i++, word);
        }
        // Access an item at a specified index (position).
        int index = 6;
        var member = wordSet.ElementAt(index);
        Console.WriteLine("\nThe item at index {0} is '{1}'!", index,
            member);
    }
}
Expected Output:
Set items in sorted order:
0. brown
1. dog
2. fox
3. jumps
4. lazy
5. over
6. quick
7. the
The item at index 6 is 'quick'!
If you intend to load the data into a set, then access the set, use HashSet and ImmutableSortedSet instead of SortedSet.
Load your data into the HashSet, then call ToImmutableSortedSet() to convert to an immutable sorted set that can be indexed.
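A minimal sketch of that load-then-freeze pattern, assuming the System.Collections.Immutable package is available (it ships with .NET Core and later; on .NET Framework it is a NuGet package). The helper name IndexableSet.Build is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class IndexableSet
{
    // Loads the items into a HashSet (dropping duplicates), then freezes them
    // into an ImmutableSortedSet, which exposes an indexer.
    public static ImmutableSortedSet<string> Build(IEnumerable<string> items)
    {
        var loaded = new HashSet<string>(items);
        return loaded.ToImmutableSortedSet(StringComparer.Ordinal);
    }
}
```

The result can then be indexed directly, for example Build(words)[6], without walking the set from the start as ElementAt() does on a SortedSet.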
You can use MCollections, which performs insert, edit, remove, search and index lookup in O(log N) time. It uses a BST and stores the count of sub-nodes in each node, which lets it access the item at a specified index.

finding all lines in a list that contain x or y?

Can I do this without looping through the whole list?
List<string> responseLines = new List<string>();
The list is then filled with around 300 lines of text.
Next, I want to search the list and create a second list of all lines that either start with "abc" or contain "xyz".
I know I can do a foreach, but is there a better / faster way?
You could use LINQ. This is no different performance-wise to using foreach -- that's pretty much what it does behind the scenes -- but you might prefer the syntax:
var query = responseLines.Where(s => s.StartsWith("abc") || s.Contains("xyz"))
.ToList();
(If you're happy dealing with an IEnumerable<string> rather than List<string> then you can omit the final ToList call.)
var newList = (from line in responseLines
where line.StartsWith("abc") || line.Contains("xyz")
select line).ToList();
Try this:
List<string> responseLines = new List<string>();
List<string> myLines = responseLines.Where(line => line.StartsWith("abc", StringComparison.InvariantCultureIgnoreCase) || line.Contains("xyz")).ToList();
StartsWith and Contains short-circuit: Contains is only evaluated if StartsWith is not satisfied. This still iterates the whole list, and there is no way to avoid that if you want to check every line, but it saves you from typing a foreach.
Use LINQ:
List<string> list = responseLines.Where(x => x.StartsWith("abc") || x.Contains("xyz")).ToList();
Unless you need all the text for some reason, it would be quicker to inspect each line at the time when you were generating the List and discard the ones that don't match without ever adding them.
This depends on how the List is loaded as well - that code is not shown. This would be effective if you were reading from a text file since then you could just use your LINQ query to operate directly on the input data using File.ReadLines as the source instead of the final List<string>.
var query = File.ReadLines("input.txt").
Where(s => s.StartsWith("abc") || s.Contains("xyz"))
.ToList();
LINQ works well as far as offering you improved syntax for this sort of thing (See LukeH's answer for a good example), but it isn't any faster than iterating over it by hand.
If you need to do this operation often, you might want to come up with some kind of indexed data structure that watches for all "abc" or "xyz" strings as they come into the list, and can thereby use a faster algorithm for serving them up when asked, rather than iterating through the whole list.
If you don't have to do it often, it's probably a "premature optimization."
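One very simple form of such an indexed structure is a wrapper that tests each line once, at the moment it is added, and keeps the matches in a second list. This is only a sketch under that assumption; the class name TrackedLines and its members are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public sealed class TrackedLines
{
    private readonly List<string> _all = new List<string>();
    private readonly List<string> _matches = new List<string>();

    // Every line that was added, in insertion order.
    public IReadOnlyList<string> All => _all;

    // Only the lines that start with "abc" or contain "xyz".
    public IReadOnlyList<string> Matches => _matches;

    public void Add(string line)
    {
        _all.Add(line);
        // The test runs once per line here, so reading Matches later
        // does not require another pass over the full list.
        if (line.StartsWith("abc", StringComparison.Ordinal) || line.Contains("xyz"))
        {
            _matches.Add(line);
        }
    }
}
```

The total work is the same as a single filtering pass; the benefit only appears when the matching lines are requested repeatedly.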
Quite simply, there is no possible algorithm that can guarantee you will never have to iterate through every item in the list. However, it is possible to reduce the average number of items you need to examine by sorting the list before you begin your search. Note that sorting only helps the "starts with abc" test, since those lines end up next to each other; a line that merely contains "xyz" can sit anywhere in the sorted order, so that test still has to look at every line.
Assuming that it's not practical for you to have a pre-sorted list by the time you need to search through it, then the only way to improve the speed of your search would be to use a different data structure than a list - for example, a binary search tree.

Please explain System.Linq.Enumerable.Where(Func<T, int, bool> predicate)

I can't make any sense of the MSDN documentation for this overload of the Where method that accepts a predicate that has two arguments where the int, supposedly, represents the index of the source element, whatever that means (I thought an enumerable was a sequence and you couldn't see further than the next item, much less do any indexing on it).
Can someone please explain how to use this overload and specifically what that int in the Func is for and how it is used?
The int parameter represents the index of the current item within the current iteration. Each time you call one of the LINQ extension methods, you aren't in theory guaranteed to get the items returned in the same order, but you know they'll all be returned once each and can therefore be assigned indices. (Well, you are guaranteed the order if you know the query object is a List<T> or similar, but not in general.)
Example:
var result1 = myEnumerable.Where((item, index) => index < 4);
var result2 = myEnumerable.Take(4);
// result1 and result2 are equivalent.
You can't index an IEnumerable<T> in the same way you can an array, but you might be able to use the index to filter the list in some way, or possibly to index some data in another collection which will be used in the condition.
EDIT: As an example, to skip every other element you could use:
var results = sequence.Where((item, idx) => idx % 2 == 0);
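The other use mentioned above, indexing into a second collection from inside the condition, might look like the following sketch. The helper name KeepFlagged and the idea of a parallel list of flags are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedWhere
{
    // Keeps an item only when the flag at the same position is true.
    // Items beyond the end of the flags list are dropped.
    public static List<string> KeepFlagged(IEnumerable<string> items, IReadOnlyList<bool> flags)
    {
        return items
            .Where((item, index) => index < flags.Count && flags[index])
            .ToList();
    }
}
```

Here the index is simply a counter that Where increments as it pulls items from the source, so it works on any IEnumerable<T>, not just on collections that support random access.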
