I have a method that performs a simplistic 'grep' across files, using an enumerable of "search strings". (Effectively, I'm doing a very naive "Find All References")
IEnumerable<string> searchStrings = GetSearchStrings();
IEnumerable<string> filesToLookIn = GetFiles();
MultiMap<string, string> references = new MultiMap<string, string>();
foreach( string fileName in filesToLookIn )
{
    foreach( string line in File.ReadAllLines( fileName ) )
    {
        foreach( string searchString in searchStrings )
        {
            if( line.Contains( searchString ) )
            {
                references.AddIfNew( searchString, fileName );
            }
        }
    }
}
Note: MultiMap<TKey,TValue> is roughly the same as Dictionary<TKey,List<TValue>>, just avoiding the NullReferenceExceptions you'd normally encounter.
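MultiMap isn't a BCL type, so for reference, here is a minimal sketch of the shape assumed above (everything beyond the AddIfNew signature is illustrative, not the poster's actual class):

```csharp
using System.Collections.Generic;

// Hypothetical minimal MultiMap: a Dictionary<TKey, List<TValue>> that
// creates the value list on first use and skips duplicate values,
// so callers never hit a NullReferenceException.
public class MultiMap<TKey, TValue>
{
    private readonly Dictionary<TKey, List<TValue>> _map =
        new Dictionary<TKey, List<TValue>>();

    public void AddIfNew(TKey key, TValue value)
    {
        List<TValue> values;
        if (!_map.TryGetValue(key, out values))
        {
            values = new List<TValue>();
            _map[key] = values;
        }
        if (!values.Contains(value))
            values.Add(value);
    }

    // Returns an empty list (rather than throwing) for unknown keys.
    public IList<TValue> this[TKey key]
    {
        get
        {
            List<TValue> values;
            return _map.TryGetValue(key, out values) ? values : new List<TValue>();
        }
    }
}
```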
I have been trying to put this into a more "functional" style, using chained LINQ extension methods but haven't figured it out.
One dead-end attempt:
// I get lost on how to do a loop within a loop here...
// plus, I lose track of the file name
var lines = filesToLookIn.Select( f => File.ReadAllLines( f ) ).Where( // ???
And another (hopefully preserving the file name this time):
var filesWithLines =
filesToLookIn
.Select(f => new { FileName = f, Lines = File.ReadAllLines(f) });
var matchingSearchStrings =
searchStrings
.Where(ss => filesWithLines.Any(
fwl => fwl.Lines.Any(l => l.Contains(ss))));
But I still seem to lose the information I need.
Maybe I'm just approaching this from the wrong angle? From a performance standpoint, the loops ought to perform in roughly the same order as the original example.
Any ideas of how to do this in a more compact functional representation?
How about:
var matches =
    from fileName in filesToLookIn
    from line in File.ReadAllLines(fileName)
    from searchString in searchStrings
    where line.Contains(searchString)
    select new
    {
        FileName = fileName,
        SearchString = searchString
    };
foreach(var match in matches)
{
    references.AddIfNew(match.SearchString, match.FileName);
}
Edit:
Conceptually, the query turns each file name into a set of lines, then cross-joins that set of lines to the set of search strings (meaning each line is paired with each search string). That set is filtered to matching lines, and the relevant information for each line is selected.
The multiple from clauses are similar to nested foreach statements. Each indicates a new iteration in the scope of the previous one. Multiple from clauses translate into the SelectMany method, which selects a sequence from each element and flattens the resulting sequences into one sequence.
All of C#'s query syntax translates to extension methods. However, the compiler does employ some tricks. One is the use of anonymous types. Whenever 2+ range variables are in the same scope, they are probably part of an anonymous type behind the scenes. This allows arbitrary amounts of scoped data to flow through extension methods like Select and Where, which have fixed numbers of arguments. See this post for further details.
Here is the extension method translation of the above query:
var matches = filesToLookIn
    .SelectMany(
        fileName => File.ReadAllLines(fileName),
        (fileName, line) => new { fileName, line })
    .SelectMany(
        anon1 => searchStrings,
        (anon1, searchString) => new { anon1, searchString })
    .Where(anon2 => anon2.anon1.line.Contains(anon2.searchString))
    .Select(anon2 => new
    {
        FileName = anon2.anon1.fileName,
        SearchString = anon2.searchString
    });
I would use the FindFile API calls (FindFirstFileEx, FindNextFile, etc.) to look in the file for the term you are searching for. It will probably be faster than reading line-by-line.
However, if that won't work for you, you should consider creating an IEnumerable<String> implementation which will read the lines from the file and yield them as they are read (instead of reading them all into an array). Then, you can query on each string, and only get the next one if it is needed.
This should save you a lot of time.
Note that in .NET 4.0, many of the I/O APIs that return lines from files (or enumerate files) return IEnumerable implementations which do exactly what is described above: they walk directories/files and yield results as they go instead of front-loading all the results.
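A minimal sketch of that lazy-reading idea (the type and method names here are made up for illustration; from .NET 4.0 onward, File.ReadLines gives you this behaviour out of the box):

```csharp
using System.Collections.Generic;
using System.IO;

static class LazyLines
{
    // Yields lines one at a time instead of buffering the whole file into
    // an array, so a LINQ query over the result can stop reading as soon
    // as it has what it needs (e.g. after the first match).
    public static IEnumerable<string> ReadLinesLazily(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```

Because the iterator is lazy, a query such as `ReadLinesLazily(path).Any(l => l.Contains(term))` closes the file as soon as a matching line is found.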
Excuse me, a quick question:
I have a list of strings; each string is the full path of some file. I would like to get only the filename, without the path or the extension, for each string (and to understand lambdas better).
Based on the lambda expression in How to bind a List to a DataGridView control? I am trying something like the below:
FilesName = Directory.GetFiles(fbd.SelectedPath).ToList(); // full path
List<string> FilesNameWithoutPath = AllVideosFileNames.ForEach(x => Path.GetFileNameWithoutExtension(x)); // I want only the filename
AllVideosGrid.DataSource = FilesNameWithoutPath.ConvertAll(x => new { Value = x }); // to then bind it with the grid
The error is:
Can not convert void() to List of string
So I want to apply Path.GetFileNameWithoutExtension() for each string in FilesName. And would appreciate any extra description on how Lamba works in this case.
ForEach will execute some code on each item in your list, but will not return anything (see: List<T>.ForEach Method). What you want to do is Select the result of the method (see: Enumerable.Select<TSource, TResult> Method), which would look something like:
List<string> FilesNameWithoutPath = AllVideosFileNames
.Select(x => Path.GetFileNameWithoutExtension(x))
.ToList();
You are using the List<T>.ForEach method, which applies the given action to each element in the list but doesn't return anything. So what you are doing, basically, is computing each file name and then throwing it away.
What you need is a Select instead of ForEach:
var fileNamesWithoutPath = AllVideosFileNames
.Select(x => Path.GetFileNameWithoutExtension(x))
.ToList();
AllVideosGrid.DataSource = fileNamesWithoutPath;
This will project each item, apply Path.GetFileNameWithoutExtension to them and return the result, then you put that result into a list by ToList.
Note that you can also shorten the Select using a method group without declaring a lambda variable:
.Select(Path.GetFileNameWithoutExtension)
I was studying LINQ and wondering whether it can be applied in the scenario below.
Suppose we split a string with a space as the delimiter and want to add every item from the result of the split into a list, if the item is not already in the list:
string text = "This is just a test!";
List<string> uniqueList = new List<string>();
foreach (string item in text.Split(' '))
{
if (!uniqueList.Contains(item))
{
uniqueList.Add(item);
}
}
using linq I can write (as far as I know):
var items = from item in text.Split(' ')
            where !uniqueList.Contains(item)
            select item;
items is now a collection, and I have to iterate over it another time to add the items into uniqueList.
Is there a capability in LINQ to combine the second and third computations (removing the need for the second iteration), or can I not do better than the first solution?
Please note that this is just an example, consider it broadly, maybe next time I want to show a dialog box for every matched item rather than adding into a list.
You can use :
string text = "This is just a test! is This aa";
var uniqueList = text.Split(' ').Distinct().ToList();
If you use method syntax, you can write your Select as a statement lambda with a body, in which you can execute more than one operation:
string text = "This is just a test ! test";
var uniqueList = new List<string>();
var items = text.Split(' ')
    .Where(s => !uniqueList.Contains(s))
    .Select(s =>
    {
        uniqueList.Add(s);
        return s;
    })
    .ToList();
Yes, this can be accomplished elegantly via LINQ (and more efficiently too, because Contains was making it O(n²); the Distinct LINQ method exists for exactly this purpose):
var uniqueList = text.Split(' ').Distinct().ToList();
Does it matter what order the elements are in the list? If not, you could use a collection that implements ISet (like HashSet):
ISet<string> uniqueList = new HashSet<string>();
foreach (string item in text.Split(' '))
{
uniqueList.Add(item);
}
This lets the collection decide if it needs to add the item or not (.Add will return true if it did). It just doesn't guarantee enumerating in the same order in which they were added. I use these a lot for "is it there?" kind of tests. Kind of like a dictionary without a value.
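For instance, a quick sketch of that "did it add?" check:

```csharp
using System.Collections.Generic;

// HashSet<T>.Add returns whether the item was newly added,
// so no separate Contains check is needed.
var seen = new HashSet<string>();
bool addedFirst = seen.Add("dog");  // true: "dog" was not present yet
bool addedAgain = seen.Add("dog");  // false: duplicate, the set is unchanged
```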
uniqueList.AddRange(text.Split(' ').Where(s => !uniqueList.Contains(s)));
Edit (since OP was edited indicating that adding items to a list is not the actual intent)
Linq executes queries and provides result sets. It's not about executing code using the results as parameters.
For what it's worth, if you have your results in a List<T> you can do this:
myList.ForEach(itemInList => {
// Execute multiple statements using each item in the list
});
or
myList.ForEach(itemInList => DoSomethingWithItem(itemInList));
or even shorter,
myList.ForEach(DoSomethingWithItem);
But it's just for convenience. It's really no different from a for...each loop.
I'm reconstructing the following statement.
IEnumerable<String>
    input = ...,
    filter = ...,
    output = input.Where(element => filter.Contains(element));
For now, it works as intended, but the words matched this way need to be exact. In the language of my customer there are a lot of conjugations, and a requirement has been posted to use wildcard ("joker") characters ("dog" should match "dog", "doggy" and "dogmatic").
I've suggested the following change. I'm not sure, though, whether it can be regarded as smooth on the eyes. Can someone suggest an improvement, or is this as good as it gets?
IEnumerable<String>
    input = ...,
    filter = ...,
    output = input.Where(word => filter.Any(head => word.StartsWith(head)));
I was considering an IEqualityComparer implementation, but that's only for objects of the same type, while my condition compares a String against an IEnumerable.
Generally, what you already have as your LINQ statement is fine, and I don't see a big issue with it being "smooth on the eyes" (LINQ calls can often get even more out of hand than this).
If you want, you could move the filter.Any(head => word.StartsWith(head)) into a separate Func<string, bool> delegate and pass that in:
Func<string, bool> myConstraint = word => filter.Any(head => word.StartsWith(head));
output = input.Where(myConstraint);
You can also move the constraint construction to a separate method which may open the door to some flexibility with your client if matching rules change or have to cover even more complicated cases:
private Func<string, bool> BuildConstraints()
{
    var filter = ...;
    if (CheckEqualityOnly)
        return word => filter.Contains(word);
    else
        return word => filter.Any(head => word.StartsWith(head));
}

output = input.Where(BuildConstraints());
I can't quite get my head around this one for some reason.
Say we have a class Foo
public class Foo
{
public string Name {get;set;}
}
And we have a generic list of them. I want to search through the generic list and pick out those that have a Name that contains any from a list of strings.
So something like
var source = GetListOfFoos();//assume a collection of Foo objects
var keywords = GetListOfKeyWords();//assume list/array of strings
var temp = new List<Foo>();
foreach(var keyword in keywords)
{
    temp.AddRange(source.Where(x => x.Name.Contains(keyword)));
}
The issues here being: a) the loop (doesn't feel optimal to me), and b) each object might appear more than once (if the name was 'Rob StackOverflow' and there were keywords 'Rob' and 'Stackoverflow').
I guess I could call Distinct() but again, it just doesn't feel optimal.
I think I'm approaching this incorrectly - what am I doing wrong?
I want to search through the generic list and pick out those that have
a Name that contains any from a list of strings.
Sounds rather easy:
var query = source.Where(e => keywords.Any(k => e.Name.Contains(k)));
Add ToList() to get results as a List<Foo>:
var temp = query.ToList();
Put the keywords into a HashSet for fast lookup, so that you're not doing an O(N²) loop.
HashSet<string> keywords = new HashSet<string>(GetListOfKeyWords(), StringComparer.InvariantCultureIgnoreCase);
var query = source.Where(x => keywords.Contains(x.Name));
EDIT: Actually, I re-read the question, and was wrong. This will only match the entire keyword, not see if the Name contains the keyword. Working on a better fix.
I like MarcinJuraszek's answer, but I would also assume you want case-insensitive matching of the keywords, so I'd try something like this:
var query = source.Where(f => keywords.Any(k => f.Name.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0));
I'm stuck with using a web service I have no control over and am trying to parse the XML returned by that service into a standard object.
A portion of the XML structure looks like this
<NO>
<L>Some text here </L>
<L>Some additional text here </L>
<L>Still more text here </L>
</NO>
In the end, I want to end up with one String property that will look like "Some text here Some additional text here Still more text here "
What I have for an initial pass is what follows. I think I'm on the right track, but not quite there yet:
XElement source = ...; // Output from the web service
List<IndexEntry> result;
result = (from indexentry in source.Elements(entryLevel)
          select new IndexEntry()
          {
              EtiologyCode = indexentry.Element("IE") == null ? null : indexentry.Element("IE").Value,
              //some code to set other properties in this object
              Note = (from l in indexentry.Elements("NO").Descendants()
                      select l.Value) //This is where I stop
                                      // and don't know where to go
          }
I know that I could add a ToList() operator at the end of that query to return the collection. Is there an operator or technique that would allow me to inline the concatenation of that collection into a single string?
Feel free to ask for more info if this isn't clear.
Thanks.
LINQ to XML is indeed the way here:
// Note: in earlier versions of .NET, string.Join only accepts
// arrays. In more modern versions, it accepts sequences.
var text = string.Join(" ", topElement.Elements("L").Select(x => x.Value));
EDIT: Based on the comment, it looks like you just need a single-expression way of representing this. That's easy, if somewhat ugly:
result = (from indexentry in source.Elements(entryLevel)
          select new IndexEntry
          {
              EtiologyCode = indexentry.Element("IE") == null
                  ? null
                  : indexentry.Element("IE").Value,
              //some code to set other properties in this object
              Note = string.Join(" ", indexentry.Elements("NO")
                                                .Descendants()
                                                .Select(x => x.Value))
          }).ToList();
Another alternative is to extract it into a separate extension method (it has to be in a top-level static class):
public static string ConcatenateTextNodes(this IEnumerable<XElement> elements) =>
string.Join(" ", elements.Select(x => x.Value));
then change your code to:
result = (from indexentry in source.Elements(entryLevel)
          select new IndexEntry
          {
              EtiologyCode = indexentry.Element("IE") == null
                  ? null
                  : indexentry.Element("IE").Value,
              //some code to set other properties in this object
              Note = indexentry.Elements("NO")
                               .Descendants()
                               .ConcatenateTextNodes()
          }).ToList();
EDIT: A note about efficiency
Other answers have suggested using StringBuilder in the name of efficiency. I would check for evidence of this being the right way to go before using it. If you think about it, StringBuilder and ToArray do similar things - they create a buffer bigger than they need to, add data to it, resize it when necessary, and come out with a result at the end. The hope is that you won't need to resize too often.
The difference between StringBuilder and ToArray here is what's being buffered - in StringBuilder it's the entire contents of the string you've built up so far. With ToArray it's just references. In other words, resizing the internal buffer used for ToArray is likely to be cheaper than resizing the one for StringBuilder, particularly if the individual strings are long.
After doing the buffering in ToArray, string.Join is hugely efficient: it can look through all the strings to start with, work out exactly how much space to allocate, and then concatenate it without ever having to copy the actual character data.
This is in sharp contrast to a previous answer I've given - but unfortunately I don't think I ever wrote up the benchmark.
I certainly wouldn't expect ToArray to be significantly slower, and I think it makes the code simpler here: no need to use side effects, aggregation, etc.
I don't have experience with it myself, but it strikes me that LINQ to XML could vastly simplify your code. Do a select over the XML document, then loop through the results and use a StringBuilder to append each L element's text to a string.
The other option is to use Aggregate()
var q = topelement.Elements("L")
                  .Select(x => x.Value)
                  .Aggregate(new StringBuilder(),
                             (sb, x) => sb.Append(x).Append(" "),
                             sb => sb.ToString().Trim());
edit: The first lambda in Aggregate is the accumulator. This is taking all of your values and creating one value from them. In this case, it is creating a StringBuilder with your desired text. The second lambda is the result selector. This allows you to translate your accumulated value into the result you want. In this case, changing the StringBuilder to a String.
I like LINQ as much as the next guy, but you're reinventing the wheel here. The XmlElement.InnerText property does exactly what's being asked for.
Try this:
using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument d = new XmlDocument();
        string xml =
            @"<NO>
                <L>Some text here </L>
                <L>Some additional text here </L>
                <L>Still more text here </L>
              </NO>";
        d.LoadXml(xml);
        Console.WriteLine(d.DocumentElement.InnerText);
        Console.ReadLine();
    }
}