I was studying LINQ and wondering whether it can be applied in the scenario below.
Suppose we split a string on spaces and want to add every item from the result of the split to a list, but only if the item is not already in the list:
string text = "This is just a test!";
List<string> uniqueList = new List<string>();
foreach (string item in text.Split(' '))
{
if (!uniqueList.Contains(item))
{
uniqueList.Add(item);
}
}
Using LINQ I can write (as far as I know):
var items = from item in text.Split(' ')
where !uniqueList.Contains(item)
select item;
items is now a collection, and I have to iterate over it a second time to add the items to uniqueList.
Is there a way in LINQ to combine the second and third steps (removing the need for the second iteration), or can I not do better than the first solution?
Please note that this is just an example; consider it broadly. Maybe next time I want to show a dialog box for every matched item rather than adding it to a list.
You can use:
string text = "This is just a test! is This aa";
var uniqueList = text.Split(' ').Distinct().ToList();
If you use method syntax, you can do your Select using a lambda expression with scope, where you can execute more than one operation:
string text = "This is just a test ! test";
var uniqueList = new List<string>();
var items = text.Split(' ').Where(s => !uniqueList.Contains(s))
.Select(s => {
uniqueList.Add(s);
return s;
})
.ToList();
Yes, this can be accomplished elegantly via LINQ (and more efficiently too, because Contains was causing it to be O(n^2); the Distinct method exists for exactly this purpose):
var uniqueList = text.Split(' ').Distinct().ToList();
Does it matter what order the elements are in the list? If not, you could use a collection that implements ISet (like HashSet):
ISet<string> uniqueList = new HashSet<string>();
foreach (string item in text.Split(' '))
{
uniqueList.Add(item);
}
This lets the collection decide if it needs to add the item or not (.Add will return true if it did). It just doesn't guarantee enumerating in the same order in which they were added. I use these a lot for "is it there?" kind of tests. Kind of like a dictionary without a value.
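For illustration, a minimal sketch of that return value in action (the seen variable is just for this example):
var seen = new HashSet<string>();
Console.WriteLine(seen.Add("test"));      // True  - "test" was not in the set yet
Console.WriteLine(seen.Add("test"));      // False - already present, nothing added
Console.WriteLine(seen.Contains("test")); // True  - the "is it there?" test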
uniqueList.AddRange(text.Split(' ').Where(s => !uniqueList.Contains(s)));
Edit (since OP was edited indicating that adding items to a list is not the actual intent)
Linq executes queries and provides result sets. It's not about executing code using the results as parameters.
For what it's worth, if you have your results in a List<T> you can do this:
myList.ForEach(itemInList => {
// Execute multiple statements using each item in the list
});
or
myList.ForEach(itemInList => DoSomethingWithItem(itemInList));
or even shorter,
myList.ForEach(DoSomethingWithItem);
But it's just for convenience. It's really no different from a for...each loop.
Related
ReSharper is suggesting the top example over the bottom example. However, I am under the impression that a new list of items will be created first, and thus all of the _executeFunc calls will run before RunStoredProcedure is called.
This would normally not be an issue, but exceptions are prone to occur, and if my hypothesis is correct my database will not be updated despite the functions having been run.
foreach (var result in rows.Select(row => _executeFunc(row)))
{
RunStoredProcedure(result);
}
Or
foreach(var row in rows)
{
var result = _executeFunc(row);
RunStoredProcedure(result);
}
The statements are, in this case, semantically the same, because Select (and LINQ in general) uses deferred execution of delegates. It won't run any declared query until the result is materialised, and, depending on how you write the query, it will do so in the proper sequence.
A very simple example to show that:
var list = new List<string>{"hello", "world", "example"};
Func<string, string> func = (s) => {
Console.WriteLine(s);
return s.ToUpper();
};
foreach(var item in list.Select(i => func(i)))
{
Console.WriteLine(item);
}
results in
hello
HELLO
world
WORLD
example
EXAMPLE
In your first example, _executeFunc(row) will NOT be called first for each item in rows before your foreach loop begins. LINQ will defer execution. See this answer for more details.
The order of events will be:
Evaluate the first item in rows
Call _executeFunc(row) on that item
Call RunStoredProcedure(result)
Repeat with the next item in rows
Now, if your code were something like this:
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
{
RunStoredProcedure(result);
}
Then it WOULD run the LINQ .Select first for every item in rows because the .ToList() causes the collection to be enumerated.
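A quick self-contained sketch (the list and func names are just illustrative) shows the effect; with .ToList(), every call to func happens before the loop body runs:
var list = new List<string> { "hello", "world", "example" };
Func<string, string> func = s => { Console.WriteLine(s); return s.ToUpper(); };
foreach (var item in list.Select(i => func(i)).ToList())
{
    Console.WriteLine(item);
}
// Prints: hello, world, example, HELLO, WORLD, EXAMPLE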
In the top example, Select projects the rows by yielding them one by one.
So
foreach (var result in rows.Select(row => _executeFunc(row)))
is basically the same as
foreach(var row in rows)
Thus Select is doing something like this
for each row in source
result = _executeFunc(row)
yield result
That yield is passing each row back one by one (it's a bit more complicated than that, but this explanation should suffice for now).
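If it helps, here is a rough C# sketch of that idea using an iterator (an illustration only, not the real framework code; MySelect is a made-up name):
static class MyEnumerable
{
    // Sketch of a Select-style projection implemented with yield return.
    public static IEnumerable<TResult> MySelect<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (var item in source)
        {
            // Nothing runs until the caller asks for the next element;
            // each projected value is handed back one at a time.
            yield return selector(item);
        }
    }
}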
If you did this instead
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
Calling ToList() will build the entire result list immediately, which means _executeFunc() will indeed be called for every row before you've had a chance to call RunStoredProcedure().
Thus what ReSharper is suggesting is valid. To be fair, I'm sure the JetBrains devs know what they are doing :)
Select uses deferred execution. This means that it will, in order:
take an item from rows
call _executeFunc on it
call RunStoredProcedure on the result of _executeFunc
And then it will do the same for the next item, until all the list has been processed.
The execution will be deferred, meaning both versions will have the same execution behaviour.
I am reading some data from a database table. One of the fields in the database, "VendorList", returns a comma-separated list of vendor IDs or just a single ID.
Ex: "1256,553,674" or "346"
There are a couple things I need to do:
Convert this string to an int[]
Perform a "Contains" against an IEnumerable collection.
Return that collection and assign it to a property.
This code is being called inside of a .Select when creating a new object, and "Vendors" is a property on that new object.
Here is my code that I am currently using:
Vendors = (m.VendorList.Contains(","))
? (from v in vendors
where m.VendorList.Split(',')
.Select(n => Convert.ToInt32(n))
.ToArray()
.Contains(v.VendorID)
select v).ToList()
: (string.IsNullOrEmpty(m.VendorList))
? null
: (from s in vendors
where s.VendorID == int.Parse(m.VendorList)
select s).ToList()
The code works but it looks very messy and it will be hard to maintain if another developer were to try and refactor this.
I am sort of new to LINQ; can you provide any tips to clean up this mess?
As you can see, I am using two ternary operators. The first one detects whether it's a comma-separated list; the second detects whether the comma-separated list even has values.
Try this. I believe it's equivalent to what you're trying to do; correct me if I'm wrong.
You could do the following in a single line of code, but I think it's more readable (maintainable) this way.
// Note: assumes the elements of "vendors" have a VendorID property; the element type is called Vendor here for illustration.
var Vendors = new List<Vendor>();
if (!string.IsNullOrEmpty(m.VendorList))
    Vendors.AddRange(vendors.Where(v => m.VendorList
                                            .Split(',')
                                            .Select(y => Convert.ToInt32(y))
                                            .Contains(v.VendorID)));
Vendors = (from v in vendors
           let vendorList = from idString in m.VendorList.Split(',')
                            select int.Parse(idString)
           where vendorList.Contains(v.VendorID)
           select v).ToList();
There is no need to check for the presence of ",".
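That's because Split returns a single-element array when the delimiter never occurs, so the single-ID case falls out naturally. A quick sketch:
var single   = "346".Split(',').Select(int.Parse);          // one element: 346
var multiple = "1256,553,674".Split(',').Select(int.Parse); // three elements: 1256, 553, 674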
This is a case where I'd suggest pulling part of this out of your LINQ statement:
var vendorIds = m.VendorList
.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries)
.Select(n => Convert.ToInt32(n))
.ToArray();
someObj.Vendors = vendors.Where(v => vendorIds.Contains(v.VendorID));
This is more readable. By assigning the parsed IDs to a named variable, vendorIds, you indicate to future programmers what it means. They don't have to fully grok all your LINQ code before they can understand the general intent.
This will perform better. In your original code, you are re-parsing the entire vendor list twice for each value in vendors. This code parses it once, and reuses the data structure for all of your ID checks. (If you have large lists of vendor IDs, you can further improve performance by making vendorIds a HashSet<>.)
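That last suggestion is a one-line tweak; here is a sketch reusing the code above (same illustrative vendorIds and someObj names):
var vendorIds = new HashSet<int>(
    m.VendorList
        .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(n => Convert.ToInt32(n)));

// HashSet lookups are O(1), so the Where clause no longer scans an array per vendor.
someObj.Vendors = vendors.Where(v => vendorIds.Contains(v.VendorID));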
If your input is an empty string, the RemoveEmptyEntries part will ensure you end up with an empty list of vendor IDs, and hence no matching Vendors. If your input has only one value without commas, you'll end up with a single ID in the list.
Note that this will not behave exactly like your original code, in that it won't set the value to null if given a null or empty m.VendorList. I'm guessing that if you take time to think about it, having a null m.VendorList is not actually something you expect to happen, and it'd be better to "fail fast" if it ever did happen, rather than be left wondering why your .Vendors property ended up null. I'm also guessing that if you have an empty .Vendors property, it will be easier for consuming code to deal with correctly than if they have to check for null values.
You can try this:
string str = "356"; //"1256,553,674";
string[] arr = str.Split(',');
List<int> lst = new List<int>();
foreach (string s in arr)
{
lst.Add(Convert.ToInt32(s));
}
The list will contain all the numbers in your string. Or, using LINQ:
string str = "1256,553,674";
IEnumerable<int> array = str.Split(',').Select(n => Convert.ToInt32(n)).ToArray();
I've got a collection of items (ADO.NET Entity Framework), and need to return a subset as search results based on a couple different criteria. Unfortunately, the criteria overlap in such a way that I can't just take the collection Where the criteria are met (or drop Where the criteria are not met), since this would leave out or duplicate valid items that should be returned.
I decided I would do each check individually, and combine the results. I considered using AddRange, but that would result in duplicates in the results list (and my understanding is it would enumerate the collection every time - am I correct/mistaken here?). I realized Union does not insert duplicates, and defers enumeration until necessary (again, is this understanding correct?).
The search is written as follows:
IEnumerable<MyClass> Results = Enumerable.Empty<MyClass>();
IEnumerable<MyClass> Potential = db.MyClasses.Where(x => x.Y); //Precondition
int parsed_id;
//For each searchable value
foreach(var selected in SelectedValues1)
{
IEnumerable<MyClass> matched = Potential.Where(x => x.Value1 == selected);
Results = Results.Union(matched); //This is where the problem is
}
//Ellipsed....
foreach(var selected in SelectedValuesN) //Happens to be integer
{
if(!int.TryParse(selected, out parsed_id))
continue;
IEnumerable<MyClass> matched = Potential.Where(x => x.ValueN == parsed_id);
Results = Results.Union(matched); //This is where the problem is
}
It seems, however, that Results = Results.Union(matched) is working more like Results = matched. I've stepped through with some test data and a test search. The search asks for results where the first field is -1, 0, 1, or 3. This should return 4 results (two 0s, a 1 and a 3). The first iteration of the loops works as expected, with Results still being empty. The second iteration also works as expected, with Results containing two items. After the third iteration, however, Results contains only one item.
Have I just misunderstood how .Union works, or is there something else going on here?
Because of deferred execution, by the time you eventually consume Results, it is the union of many Where queries all of which are based on the last value of selected.
So you have
Results = Potential.Where(selected)
.Union(Potential.Where(selected))
.Union(Potential.Where(selected))...
and all the selected values are the same.
You need to create a var currentSelected = selected inside your loop and pass that to the query. That way each value of selected will be captured individually and you won't have this problem.
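For example, here is a sketch of the first loop from your question with that fix applied:
foreach (var selected in SelectedValues1)
{
    var currentSelected = selected; // fresh variable per iteration, captured by the lambda
    Results = Results.Union(Potential.Where(x => x.Value1 == currentSelected));
}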
You can do this much more simply:
Results = SelectedValues.SelectMany(s => Potential.Where(x => x.Value == s));
(this may return duplicates)
Or
Results = Potential.Where(x => SelectedValues.Contains(x.Value));
As pointed out by others, your LINQ expression is a closure. This means your variable selected is captured by the LINQ expression in each iteration of your foreach-loop. The same variable is used in each iteration of the foreach, so it will end up having whatever the last value was. To get around this, you will need to declare a local variable within the foreach-loop, like so:
//For each searchable value
foreach(var selected in SelectedValues1)
{
var localSelected = selected;
Results = Results.Union(Potential.Where(x => x.Value1 == localSelected));
}
It is much shorter to just use .Contains():
Results = Results.Union(Potential.Where(x => SelectedValues1.Contains(x.Value1)));
Since you need to query multiple SelectedValues collections, you could put them all inside their own collection and iterate over that as well, although you'd need some way of matching the correct field/property on your objects.
You could possibly do this by storing your lists of selected values in a Dictionary with the name of the field/property as the key. You would use Reflection to look up the correct field and perform your check. You could then shorten the code to the following:
// Store each of your searchable lists here
Dictionary<string, IEnumerable<object>> DictionaryOfSelectedValues = ...;
Type t = typeof(MyClass);
// For each list of searchable values
foreach(var selectedValues in DictionaryOfSelectedValues) // Returns KeyValuePair<TKey, TValue>
{
// Try to get a property for this key
PropertyInfo prop = t.GetProperty(selectedValues.Key);
IEnumerable<object> localSelected = selectedValues.Value;
if( prop != null )
{
Results = Results.Union(Potential.Where(x =>
localSelected.Contains(prop.GetValue(x, null))));
}
else // If it's not a property, check if the entry is for a field
{
FieldInfo field = t.GetField(selectedValues.Key);
if( field != null )
{
Results = Results.Union(Potential.Where(x =>
localSelected.Contains(field.GetValue(x))));
}
}
}
No, your use of Union is absolutely correct.
The only thing to keep in mind is that it excludes duplicates based on the default equality comparer (see the quick sketch below). Do you have sample data?
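A quick sketch (the left/right names are just for illustration):
var left  = new[] { 1, 2, 3 };
var right = new[] { 3, 4 };
var union = left.Union(right); // 1, 2, 3, 4 - the duplicate 3 appears only once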
Okay, I think you are having a problem because Union uses deferred execution.
What happens if you do,
var unionResults = Results.Union(matched).ToList();
Results = unionResults;
Somehow I can't seem to get string replacement within a foreach loop in C# to work. My code is as follows :
foreach (string s in names)
{
s.Replace("pdf", "txt");
}
Am still quite new to LINQ so pardon me if this sounds amateurish ;)
You say you're after a LINQ solution... that's easy:
var replacedNames = names.Select(x => x.Replace("pdf", "txt"));
We don't know the type of names, but if you want to assign back to it you could potentially use ToArray or ToList:
// If names is a List<T>
names = names.Select(x => x.Replace("pdf", "txt")).ToList();
// If names is an array
names = names.Select(x => x.Replace("pdf", "txt")).ToArray();
You should be aware that the code that you've posted isn't using LINQ at all at the moment though...
Strings in C# are immutable (they do not change), so s.Replace will return a new string. Unfortunately this means you cannot use foreach to do the update. If names is an array, this should work:
for(int i = 0; i < names.Length; i++)
{
names[i] = names[i].Replace("pdf", "txt");
}
As others have mentioned you'd need to use a for loop to do this in-place. However, if you don't need the operation to be done in-place (i.e. the results can be a different collection), then you could also do it as a linq query, e.g.
var results = from name in names select name.Replace("pdf", "txt");
One thing though - it looks like you are trying to change the extension of some file names. If that's what you are trying to do then I'd recommend Path.ChangeExtension which is specifically designed for this purpose.
var results = from name in names select Path.ChangeExtension(name, "txt");
s.Replace is a method that returns a new string, so you would need s = s.Replace(...), although it can be better to use a StringBuilder if you are doing many replacements (see the answers above).
Why use replace? It will make the application slow. Use regex instead:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx
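For what it's worth, a minimal sketch of that approach applied to the names collection from the question (needs using System.Text.RegularExpressions; the replaced variable is just for illustration):
// Regex equivalent of name.Replace("pdf", "txt");
// a pattern like @"\.pdf$" would restrict it to a trailing ".pdf" extension.
var replaced = names.Select(n => Regex.Replace(n, "pdf", "txt")).ToList();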
I have a method that performs a simplistic 'grep' across files, using an enumerable of "search strings". (Effectively, I'm doing a very naive "Find All References")
IEnumerable<string> searchStrings = GetSearchStrings();
IEnumerable<string> filesToLookIn = GetFiles();
MultiMap<string, string> references = new MultiMap<string, string>();
foreach( string fileName in filesToLookIn )
{
foreach( string line in File.ReadAllLines( fileName ) )
{
foreach( string searchString in searchStrings )
{
if( line.Contains( searchString ) )
{
references.AddIfNew( searchString, fileName );
}
}
}
}
Note: MultiMap<TKey,TValue> is roughly the same as Dictionary<TKey,List<TValue>>, just avoiding the NullReferenceExceptions you'd normally encounter.
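A rough sketch of what AddIfNew does (simplified for this question; not the exact implementation):
class MultiMap<TKey, TValue>
{
    private readonly Dictionary<TKey, List<TValue>> _map = new Dictionary<TKey, List<TValue>>();

    // Adds value under key, creating the inner list on first use and skipping duplicates.
    public void AddIfNew(TKey key, TValue value)
    {
        List<TValue> values;
        if (!_map.TryGetValue(key, out values))
        {
            values = new List<TValue>();
            _map[key] = values;
        }
        if (!values.Contains(value))
            values.Add(value);
    }
}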
I have been trying to put this into a more "functional" style, using chained LINQ extension methods but haven't figured it out.
One dead-end attempt:
// I get lost on how to do a loop within a loop here...
// plus, I lose track of the file name
var lines = filesToLookIn.Select( f => File.ReadAllLines( f ) ).Where( // ???
And another (hopefully preserving the file name this time):
var filesWithLines =
filesToLookIn
.Select(f => new { FileName = f, Lines = File.ReadAllLines(f) });
var matchingSearchStrings =
searchStrings
.Where(ss => filesWithLines.Any(
fwl => fwl.Lines.Any(l => l.Contains(ss))));
But I still seem to lose the information I need.
Maybe I'm just approaching this from the wrong angle? From a performance standpoint, the loops ought to perform in roughly the same order as the original example.
Any ideas of how to do this in a more compact functional representation?
How about:
var matches =
from fileName in filesToLookIn
from line in File.ReadAllLines(fileName)
from searchString in searchStrings
where line.Contains(searchString)
select new
{
FileName = fileName,
SearchString = searchString
};
foreach(var match in matches)
{
references.AddIfNew(match.SearchString, match.FileName);
}
Edit:
Conceptually, the query turns each file name into a set of lines, then cross-joins that set of lines to the set of search strings (meaning each line is paired with each search string). That set is filtered to matching lines, and the relevant information for each line is selected.
The multiple from clauses are similar to nested foreach statements. Each indicates a new iteration in the scope of the previous one. Multiple from clauses translate into the SelectMany method, which selects a sequence from each element and flattens the resulting sequences into one sequence.
All of C#'s query syntax translates to extension methods. However, the compiler does employ some tricks. One is the use of anonymous types. Whenever 2+ range variables are in the same scope, they are probably part of an anonymous type behind the scenes. This allows arbitrary amounts of scoped data to flow through extension methods like Select and Where, which have fixed numbers of arguments. See this post for further details.
Here is the extension method translation of the above query:
var matches = filesToLookIn
.SelectMany(
fileName => File.ReadAllLines(fileName),
(fileName, line) => new { fileName, line })
.SelectMany(
anon1 => searchStrings,
(anon1, searchString) => new { anon1, searchString })
.Where(anon2 => anon2.anon1.line.Contains(anon2.searchString))
.Select(anon2 => new
{
FileName = anon2.anon1.fileName,
SearchString = anon2.searchString
});
I would use the FindFile (FindFirstFileEx, FindNextFile, etc, etc) API calls to look in the file for the term that you are searching on. It will probably do it faster than you reading line-by-line.
However, if that won't work for you, you should consider creating an IEnumerable<String> implementation which will read the lines from the file and yield them as they are read (instead of reading them all into an array). Then, you can query on each string, and only get the next one if it is needed.
This should save you a lot of time.
Note that in .NET 4.0, a lot of the IO APIs that return lines from files (or search for files) return IEnumerable implementations which do exactly what is mentioned above: they walk directories/files and yield results as they go instead of front-loading all the results.
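For illustration, here is a minimal sketch of such a lazily-evaluated line reader (using System.IO; essentially what File.ReadLines provides in .NET 4 and later):
static IEnumerable<string> ReadLinesLazily(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Each line is yielded as it is read; nothing is buffered up front.
            yield return line;
        }
    }
}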