LINQ/Dictionary Connecting two methods into a single one - c#

I've tried almost everything. I asked a similar question earlier and got some guidelines, but I can't get them to work in a single method. It works when there are two methods, but it hurts to look at all that duplicated code, so I need help combining the two into one.
private Dictionary<string, int> SplitterMP(string[] file, bool distinct, bool pairs)
{
var query = file
.SelectMany(i => File.ReadLines(i)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit)));
if (pairs)
{
var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));
return query
.Concat(pairWise)
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
private Dictionary<string, int> SplitterS(string[] file, bool distinct, bool pairs)
{
List<string> allFilesWords = new List<string>();
foreach (var filename in file)
{
var query = File.ReadLines(filename)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit));
if (distinct)
{
allFilesWords.AddRange(query.Distinct());
}
}
return allFilesWords
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
So the first method works for pairs = true and distinct = false, and the second one works for pairs = false and distinct = true. I want a single Splitter method, so that I can even pass true for both flags instead of relying on the workarounds I use now.

I'm not 100% sure what you mean, but can you do this?
private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
{
var query = file
.SelectMany(i => File.ReadLines(i)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit)));
if (pairs)
query = query.Concat(query.Pairwise((first, second) => string.Format("{0} {1}", first, second)));
if(distinct)
query = query.Distinct();
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
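One caveat with the version above: Distinct() runs on the merged sequence of all files, so every surviving word ends up with a count of 1. Below is a minimal sketch of a single method that applies Distinct per document instead. It rests on a few assumptions that are not in the original post: the names WordCounter, Tokenize and Count are hypothetical, pairs are built with Zip as a stand-in for the Pairwise extension, and pairs are formed within one document rather than across file boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WordCounter
{
    // Tokenizes one document: lower-cased words, purely numeric tokens dropped.
    private static List<string> Tokenize(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(word => word.ToLower())
            .Where(word => !word.All(char.IsDigit))
            .ToList();
    }

    // Core logic on in-memory documents (each document is a sequence of lines).
    public static Dictionary<string, int> Count(
        IEnumerable<IEnumerable<string>> documents, bool distinct, bool pairs)
    {
        return documents
            .SelectMany(lines =>
            {
                List<string> words = Tokenize(lines);
                IEnumerable<string> tokens = words;
                if (pairs)
                {
                    // Adjacent pairs via Zip (assumed equivalent to Pairwise here).
                    tokens = tokens.Concat(
                        words.Zip(words.Skip(1), (first, second) => first + " " + second));
                }
                // Distinct is applied per document, before the documents are merged.
                return distinct ? tokens.Distinct() : tokens;
            })
            .GroupBy(token => token)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Thin wrapper with the signature used in the question.
    public static Dictionary<string, int> Splitter(string[] files, bool distinct, bool pairs)
    {
        return Count(files.Select(f => File.ReadLines(f)), distinct, pairs);
    }
}
```

With distinct = true and pairs = false, three documents containing "Smith Smith Smith Smith", "Smith Alan Alan" and "John John" give smith = 2, alan = 1 and john = 1.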

Related

How to convert IEnumerable<IEnumerable<IGrouping<int,string>>> to IEnumerable<IEnumerable<string>>

I'm trying to partition some comma-separated lines into groups of at most 2.
How can I convert the collection of groups to a list of lists, as below?
I expect 3 partitions after the first grouping and 4 after the second.
List<string> chunk = new List<string>()
{
"a,b,c",
"a,d,e",
"b,c,d",
"b,e,d",
"b,f,g",
"e"
};
var partitons = chunk.GroupBy(c => c.Split(',')[0], (key, g) => g);
var groups = partitons.Select(x => x.Select((i, index) => new { i, index }).GroupBy(g => g.index / 2, e => e.i));
IEnumerable<IEnumerable<string>> parts = groups.Select(???)
This is what I wanted
var parts = groups.SelectMany(x => x).Select(y => y.Select(z => z));
Try this:
partitons = groups.Select(x => x.SelectMany(y => y));
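For reference, here is a self-contained sketch of the whole pipeline, assuming the goal is one inner list per chunk (the SelectMany variant). The names Chunker and Partition and the size parameter are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Chunker
{
    // Groups lines by their first comma-separated field, then splits each
    // group into chunks of at most `size` lines.
    public static List<List<string>> Partition(IEnumerable<string> lines, int size)
    {
        return lines
            .GroupBy(line => line.Split(',')[0])
            .SelectMany(group => group
                .Select((line, index) => new { line, index })
                .GroupBy(x => x.index / size, x => x.line))
            .Select(chunk => chunk.ToList())
            .ToList();
    }
}
```

For the six sample lines and size 2 this yields four lists: the two "a" lines, the first two "b" lines, the remaining "b" line, and "e".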

C# Distinct in LINQ query

I've run into a problem after changing some code. My idea is this: I am counting the words in a set of documents, but each word should be counted only once per document. For example:
Document 1 = Smith Smith Smith Smith => Smith x1
Document 2 = Smith Alan Alan => Smith x1, Alan x1
Document 3 = John John => John x1
but the total counts should be:
Smith x2 (in 2 documents out of 3), Alan x1 (1 out of 3 documents), John x1 (1 out of 3 documents)
I think it was working before, when I had a separate method for distinct (one that also counted all the words if distinct = false). Now it produces just 1 for every word.
The code before:
private Dictionary<string, int> tempDict = new Dictionary<string, int>();
private void Splitter(string[] file)
{
tempDict = file
.SelectMany(i => File.ReadAllLines(i)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Distinct())
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
It needed to be changed so that it returns a dictionary, but in the process of building the app I changed it to this code:
private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
{
var query = file
.SelectMany(i => File.ReadLines(i)
.SelectMany(line => line.Split(new[] { ' '}, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit)));
if (distinct)
{
query = query.Distinct();
}
if (pairs)
{
var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));
return query
.Concat(pairWise)
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
Also note that query = file.Distinct(); returns just the names of the documents, so it has to be something different.
Edit:
This is how I am calling this method:
private void EnterDocument(object sender, RoutedEventArgs e)
{
List<string> myFile= new List<string>();
OpenFileDialog openFileDialog = new OpenFileDialog();
openFileDialog.Multiselect = true;
openFileDialog.Filter = "All files (*.*)|*.*|Text files (*.txt)|*.txt";
if (openFileDialog.ShowDialog() == true)
{
foreach (string filename in openFileDialog.FileNames)
{
myFile.Add(filename);
}
}
string[] myFiles= myFile.ToArray();
myDatabase = Splitter(myFiles, true, false);
}
Distinct() will remove duplicates from your IEnumerable so calling it before the following...
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
...will result in a list of all the unique words but with a count of 1.
Edit:
To solve the issue of all the lines being merged before Distinct() is applied, you could process each file separately, something like this:
List<string> allFilesWords = new List<string>();
foreach (var filename in file)
{
var fileQuery = File.ReadLines(filename)
.SelectMany(line => line.Split(new[] { ' '}, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit));
allFilesWords.AddRange(fileQuery.Distinct());
}
return allFilesWords
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
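The difference between the two placements of Distinct() can be checked on in-memory data. The sketch below is only an illustration: the class and method names are hypothetical, and each document is assumed to be already split into lower-cased words.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctScope
{
    // Distinct after merging: each word survives once overall, so every count is 1.
    public static Dictionary<string, int> Global(IEnumerable<IEnumerable<string>> documents)
    {
        return documents
            .SelectMany(words => words)
            .Distinct()
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Distinct inside each document: the count is the number of documents
    // that contain the word.
    public static Dictionary<string, int> PerDocument(IEnumerable<IEnumerable<string>> documents)
    {
        return documents
            .SelectMany(words => words.Distinct())
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the three example documents, Global gives 1 for every word, while PerDocument gives smith = 2, alan = 1 and john = 1.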

How to count number of occurrence of each word in string?

I use the following code to extract words from string input, how can I get the occurrence of each words too?
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.Select(g => g.Key);
Instead of Regex.Split you can use string.Split and get the count for each word like:
string str = "Some string with Some string repeated";
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
If you want to keep only the words that are repeated at least 10 times, you can add a condition before Select, like Where(grp => grp.Count() >= 10).
For output:
foreach (var item in result)
{
Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}
Output:
Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1
For case insensitive grouping you can replace the current GroupBy with:
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
So your query would be:
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
.Where(grp => grp.Count() >= 10)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
Try this:
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Select(g => new {key = g.Key, count = g.Count()});
Remove the Select statement to keep the IGrouping which you can use to view both the keys and take a count of values.
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10);
foreach (var wordGrouping in words)
{
var word = wordGrouping.Key;
var count = wordGrouping.Count();
}
You could produce a dictionary like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.ToDictionary(g => g.Key, g => g.Count());
Or if you'd like to avoid having to compute the count twice, like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Select(g => new { g.Key, Count = g.Count() })
.Where(g => g.Count > 10)
.ToDictionary(g => g.Key, g => g.Count);
And now you can get the count of words like this (assuming the word "foo" appears more than 10 times in input):
var fooCount = words["foo"];
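One detail worth keeping in mind: Regex.Split with \W+ can return empty strings when the input starts or ends with a non-word character, and those would be counted as a word. A minimal sketch that filters them out and counts case-insensitively follows. The names WordFrequency, Count and minCount are hypothetical, and OrdinalIgnoreCase is just one possible comparer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // Counts words case-insensitively and keeps those seen at least minCount times.
    public static Dictionary<string, int> Count(string input, int minCount)
    {
        return Regex.Split(input, @"\W+")
            .Where(word => word.Length > 0) // drop empty entries from leading/trailing separators
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .Select(g => new { g.Key, Count = g.Count() })
            .Where(x => x.Count >= minCount)
            .ToDictionary(x => x.Key, x => x.Count, StringComparer.OrdinalIgnoreCase);
    }
}
```

Passing the same comparer to ToDictionary means a lookup such as counts["foo"] also matches "Foo".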

Linq DataTable reuse concatenated variable

Here is what I am doing in my LINQ on a datatable.
var result = resTable.Rows.Where(r => Map.ContainsKey(string.Concat(r[HeaderCol].ToString().Trim(),dot,r[FooterCol].ToString().Trim(),dot,r[TypeCol].ToString().Trim())))
.GroupBy(r => string.Concat(r[HeaderCol].ToString().Trim(), dot, r[FooterCol].ToString().Trim(), dot, r[TypeCol].ToString().Trim()))
.ToDictionary(g => g.Key,
g => g.GroupBy(r => DateTime.FromOADate((double)r[DateCol]))
.ToDictionary(c => c.Key,
c => c.Select(r => new ResultObj(DateTime.FromOADate((double)r[ResultDateCol]), new Decimal((double)r[PriceCol])))
.ToList()));
I am creating a key from column values and need to use it in group by as well.
string.Concat(r[HeaderCol].ToString().Trim(), dot, r[FooterCol].ToString().Trim(), dot, r[TypeCol].ToString().Trim())
Any way I can do string concat only once and use it twice in LINQ ?
I don't know why it is necessary, but here is what you want.
var result = resTable.Rows.Select(r => new {r, res = string.Concat(r[HeaderCol].ToString().Trim(),dot,r[FooterCol].ToString().Trim(),dot,r[TypeCol].ToString().Trim())})
.Where(r => Map.ContainsKey(r.res))
.GroupBy(r => r.res)
.ToDictionary(g => g.Key,
g => g.GroupBy(r => DateTime.FromOADate((double)r.r[DateCol]))
.ToDictionary(c => c.Key,
c => c.Select(r => new ResultObj(DateTime.FromOADate((double)r.r[ResultDateCol]), new Decimal((double)r.r[PriceCol])))
.ToList()));
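The same idea can be written in query syntax, where a let clause computes the key once and makes it available to both where and group by. The sketch below uses plain string arrays instead of data rows to stay self-contained. The names KeyOnce and GroupByKey, the three-column layout and the known set are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class KeyOnce
{
    // Builds the dotted key once per row with `let`, then reuses it for
    // filtering and grouping.
    public static Dictionary<string, List<string[]>> GroupByKey(
        IEnumerable<string[]> rows, ISet<string> known)
    {
        var query =
            from r in rows
            let key = string.Join(".", r[0].Trim(), r[1].Trim(), r[2].Trim())
            where known.Contains(key)
            group r by key;

        return query.ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The compiler translates let into a projection to an anonymous type, so this is essentially the Select(r => new { r, res = ... }) form shown above.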

How to compare 2 list by characters content and its correspondents double values?

I have 2 lists: a string list and a double list of the same length, where entries at the same index correspond to each other. I need to compare all the strings, find the indexes of those that contain the same characters regardless of order, and delete the entry with the highest corresponding double value.
Example:
List<string> str = new List<string>();
str.Add("efc");
str.Add("abc");
str.Add("cde");
str.Add("cab");
str.Add("fbc");
List<double> vlr = new List<double>();
vlr.Add(0.1);
vlr.Add(0.5);
vlr.Add(0.4);
vlr.Add(0.2);
vlr.Add(0.3);
In this case, "abc" => (0.5) must be deleted, because "cab" has the same characters AND a lower corresponding value => (0.2).
Is there a lambda expression for these 2 lists?
What I've tried:
var distinct = list.Select((str, idx) => new { Str = str, Idx = idx })
.GroupBy(pair => new HashSet<char>(pair.Str), HashSet<char>.CreateSetComparer())
.Select(grp => grp.OrderBy(p => p.Idx).First())
.ToList();
Here's one way to solve it:
// Pair the strings with their correspondence values
var pairs = str.Zip(vlr, (s, d) => new {s, d});
// Group using a sorted string, eliminating differences due to character order
var groups = pairs.GroupBy(x => new string(x.s.ToCharArray().OrderBy(c => c).ToArray()));
// For each group, retain the item with the lowest correspondence value
var filtered = groups.Select(x => x.OrderBy(y => y.d).First().s);
var newDict = str.Zip(vlr, (s, d) => new { s, d })
.GroupBy(x => String.Join("", x.s.OrderBy(y => y)))
.Select(g => g.OrderBy(x => x.d).First())
.ToDictionary(x => x.s, x => x.d);
Here is the code:
var group = str.GroupBy(s => string.Join("", s.ToCharArray().OrderBy(c => c)));
var _vlr = group.Select(g => g.Min(s => vlr[str.IndexOf(s)]));
var _str = group.Select(g => g.OrderBy(s => vlr[str.IndexOf(s)]).First());
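Note that str.IndexOf(s) scans the list on every call and always returns the first match, so it can pick the wrong value if the same string appears twice. A sketch that carries the value and the original index along instead is shown below. The names AnagramFilter and KeepLowest are hypothetical, and keeping the original order of the survivors is an added assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnagramFilter
{
    // For each set of strings made of the same characters, keeps only the
    // entry with the lowest value; survivors stay in their original order.
    public static List<KeyValuePair<string, double>> KeepLowest(
        IEnumerable<string> str, IEnumerable<double> vlr)
    {
        return str
            .Zip(vlr, (s, d) => new KeyValuePair<string, double>(s, d))
            .Select((pair, index) => new { pair, index })
            .GroupBy(x => new string(x.pair.Key.OrderBy(c => c).ToArray()))
            .Select(g => g.OrderBy(x => x.pair.Value).First())
            .OrderBy(x => x.index)
            .Select(x => x.pair)
            .ToList();
    }
}
```

For the sample lists this keeps efc (0.1), cde (0.4), cab (0.2) and fbc (0.3), and drops abc (0.5).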
