I use the following code to extract words from a string input. How can I get the number of occurrences of each word too?
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.Select(g => g.Key);
Instead of Regex.Split you can use string.Split and get the count for each word like:
string str = "Some string with Some string repeated";
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
If you want to keep only the words that are repeated at least 10 times, add a condition before the Select, such as Where(grp => grp.Count() >= 10)
For output:
foreach (var item in result)
{
Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}
Output:
Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1
For case insensitive grouping you can replace the current GroupBy with:
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
So your query would be:
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
.Where(grp => grp.Count() >= 10)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
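As a minimal sketch of the case-insensitive query above, the following can be run as a small console program written with top-level statements (C# 9 or later). The sample sentence is hypothetical, and the threshold is lowered from 10 to 2 only so that the filter has a visible effect on such a short input.

```csharp
using System;
using System.Linq;

// Hypothetical sample input with mixed casing.
string str = "Some string with some String repeated";

var counts = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
    // Words that differ only by case land in the same group.
    .GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
    // Threshold lowered to 2 for this short sample (the answer above uses 10).
    .Where(grp => grp.Count() >= 2)
    .Select(grp => new { Word = grp.Key, Count = grp.Count() })
    .ToList();

foreach (var item in counts)
{
    Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}
// Prints:
// Word: Some, Count:2
// Word: string, Count:2
```

The key of each group is the first spelling encountered, which is why the output shows "Some" and "string" rather than "some" and "String".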
Try this:
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Select(g => new {key = g.Key, count = g.Count()});
Remove the Select call to keep the IGrouping, which gives you both the key and the count of the values in each group.
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10);
foreach (var wordGrouping in words)
{
var word = wordGrouping.Key;
var count = wordGrouping.Count();
}
You could produce a dictionary like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.ToDictionary(g => g.Key, g => g.Count());
Or if you'd like to avoid having to compute the count twice, like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Select(g => new { g.Key, Count = g.Count() })
.Where(g => g.Count > 10)
.ToDictionary(g => g.Key, g => g.Count);
And now you can get the count of words like this (assuming the word "foo" appears more than 10 times in input):
var fooCount = words["foo"];
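The indexer throws a KeyNotFoundException for a word that did not pass the filter, so a lookup that tolerates missing words may be safer. The sketch below assumes a made-up input in which "foo" appears 11 times and "bar" once, and uses TryGetValue to fall back to 0. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: "foo" eleven times, then "bar" once.
string input = string.Join(" ", Enumerable.Repeat("foo", 11)) + " bar";

Dictionary<string, int> words = Regex.Split(input, @"\W+")
    // Regex.Split can yield empty strings at the edges of the input; skip them.
    .Where(w => w.Length > 0)
    .GroupBy(w => w)
    .Select(g => new { g.Key, Count = g.Count() })
    .Where(g => g.Count > 10)
    .ToDictionary(g => g.Key, g => g.Count);

// TryGetValue avoids the exception for words that were filtered out.
int fooCount = words.TryGetValue("foo", out var f) ? f : 0;
int barCount = words.TryGetValue("bar", out var b) ? b : 0;

Console.WriteLine("foo: {0}, bar: {1}", fooCount, barCount); // foo: 11, bar: 0
```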
List<string> testList = new List<string>();
testList.Add("A");
testList.Add("A");
testList.Add("C");
testList.Add("d");
testList.Add("D");
This query is case sensitive:
// Result: "A"
List<String> duplicates = testList.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
How would it look case-insensitive? (Result: "A", "d")
Use the GroupBy overload that accepts a comparer, e.g. StringComparer.OrdinalIgnoreCase:
var result = testList
.GroupBy(item => item, StringComparer.OrdinalIgnoreCase)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
By replacing
.GroupBy(x => x)
with
.GroupBy(x => x.ToLower())
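the keys are converted to lower case before grouping, so the grouping becomes case-insensitive. Note that the resulting keys are the lower-cased strings, not the original elements.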
var result = testList.GroupBy(x => x.ToLower())
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
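The two approaches above group the same elements but return different keys. A small sketch on the question's list, written with top-level statements (C# 9 or later), shows the difference; the variable names withComparer and withToLower are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var testList = new List<string> { "A", "A", "C", "d", "D" };

// Comparer overload: the key is the first element seen in each group.
List<string> withComparer = testList
    .GroupBy(x => x, StringComparer.OrdinalIgnoreCase)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();

// ToLower(): the key is the lower-cased string itself.
List<string> withToLower = testList
    .GroupBy(x => x.ToLower())
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", withComparer)); // A, d
Console.WriteLine(string.Join(", ", withToLower));  // a, d
```

If the original casing matters, as in the expected result "A", "d", the comparer overload is probably the better fit.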
I'm trying to partition some comma-separated lines into groups of at most 2 items.
How can I convert the collection of groups to a list of lists, as below?
I expect 3 partitions after the first grouping and 4 after the second.
List<string> chunk = new List<string>()
{
"a,b,c",
"a,d,e",
"b,c,d",
"b,e,d",
"b,f,g",
"e"
};
var partitons = chunk.GroupBy(c => c.Split(',')[0], (key, g) => g);
var groups = partitons.Select(x => x.Select((i, index) => new { i, index }).GroupBy(g => g.index / 2, e => e.i));
IEnumerable<IEnumerable<string>> parts = groups.Select(???)
This is what I wanted
var parts = groups.SelectMany(x => x).Select(y => y.Select(z => z));
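To make the shapes concrete, here is a sketch that runs the question's data through both grouping steps and then flattens with SelectMany. It materializes each chunk with ToList() purely so the result can be indexed and printed; the identifier partitions is spelled out here for readability. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var chunk = new List<string> { "a,b,c", "a,d,e", "b,c,d", "b,e,d", "b,f,g", "e" };

// Step 1: partition by the first comma-separated field (3 partitions: a, b, e).
var partitions = chunk.GroupBy(c => c.Split(',')[0], (key, g) => g);

// Step 2: inside each partition, group consecutive items into chunks of 2.
var groups = partitions.Select(x => x
    .Select((i, index) => new { i, index })
    .GroupBy(g => g.index / 2, e => e.i));

// Step 3: flatten the partitions so that every chunk becomes its own list.
List<List<string>> parts = groups
    .SelectMany(x => x)
    .Select(y => y.ToList())
    .ToList();

foreach (var part in parts)
{
    Console.WriteLine(string.Join(" | ", part));
}
// Prints:
// a,b,c | a,d,e
// b,c,d | b,e,d
// b,f,g
// e
```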
Try this:
partitons = groups.Select(x => x.SelectMany(y => y));
I ran into a problem after changing some code. The idea is this: I count the words in the documents, but each word counts at most once per document. For example:
Document 1 = Smith Smith Smith Smith => Smith x1
Document 2 = Smith Alan Alan => Smith x1, Alan x1
Document 3 = John John => John x1
but the total counts should be:
Smith x2 (in 2 documents out of 3), Alan x1 (1 out of 3 documents), John x1 (1 out of 3 documents)
I think it worked before, when I had a separate method for the distinct case (which also counted all the words if distinct = false); now every word gets a count of just 1.
The code before:
private Dictionary<string, int> tempDict = new Dictionary<string, int>();
private void Splitter(string[] file)
{
tempDict = file
.SelectMany(i => File.ReadAllLines(i)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Distinct())
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
It had to be changed so that it returns a dictionary, and in the process of building the app I changed it to this code:
private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
{
var query = file
.SelectMany(i => File.ReadLines(i)
.SelectMany(line => line.Split(new[] { ' '}, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit)));
if (distinct)
{
query = query.Distinct();
}
if (pairs)
{
var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));
return query
.Concat(pairWise)
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
Also note that query = file.Distinct(); returns just the names of the documents, so it has to be something different.
#edit
This is how I am calling this method:
private void EnterDocument(object sender, RoutedEventArgs e)
{
List<string> myFile= new List<string>();
OpenFileDialog openFileDialog = new OpenFileDialog();
openFileDialog.Multiselect = true;
openFileDialog.Filter = "All files (*.*)|*.*|Text files (*.txt)|*.txt";
if (openFileDialog.ShowDialog() == true)
{
foreach (string filename in openFileDialog.FileNames)
{
myFile.Add(filename);
}
}
string[] myFiles= myFile.ToArray();
myDatabase = Splitter(myFiles, true, false);
}
Distinct() will remove duplicates from your IEnumerable so calling it before the following...
return query
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
...will result in a dictionary of all the unique words, each with a count of 1.
Edit:
To avoid merging the words of all files before removing duplicates, apply Distinct() per file, for example:
List<string> allFilesWords = new List<string>();
foreach (var filename in file)
{
var fileQuery = File.ReadLines(filename)
.SelectMany(line => line.Split(new[] { ' '}, StringSplitOptions.RemoveEmptyEntries))
.AsParallel()
.Select(word => word.ToLower())
.Where(word => !word.All(char.IsDigit));
allFilesWords.AddRange(fileQuery.Distinct());
}
return allFilesWords
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
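The same per-document idea can be checked without touching the file system. The sketch below assumes the three documents from the question are available as in-memory strings (the documents array is a hypothetical stand-in for the file contents) and applies Distinct() inside the per-document projection. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the contents of three files.
var documents = new[]
{
    "Smith Smith Smith Smith",
    "Smith Alan Alan",
    "John John"
};

Dictionary<string, int> documentFrequency = documents
    .SelectMany(doc => doc
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(word => word.ToLower())
        // Distinct() runs per document, so a word counts once per document.
        .Distinct())
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in documentFrequency)
{
    Console.WriteLine("{0} x{1}", pair.Key, pair.Value);
}
```

With this input the dictionary holds smith = 2, alan = 1 and john = 1, which matches the totals the question expects (the keys are lower-cased by ToLower()).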
I know that we can find duplicate items like this:
var dublicateItems = itemStrings.GroupBy(x => x)
.Where(x => x.Count() > 1)
.ToDictionary(g => g.Key, g => g.Count());
And distinct items like this:
var distinctItems = itemStrings.Distinct();
But how can I combine the two to produce the following output from a list of strings?
input: a, b, b, c, d, d, d, d
output: a, b (2 times), c, d (4 times)
You're almost there:
var duplicateItems =
itemStrings
.GroupBy(i => i)
.Select(i => new { Key = i.Key, Count = i.Count() })
.Select(i => i.Key + (i.Count > 1 ? " (" + i.Count + " times)" : string.Empty));
If you want the result as a comma-separated string, you can then do this:
var result = string.Join(", ", duplicateItems);
You already have the solution with the first approach; just remove the Where:
var itemCounts = itemStrings.GroupBy(x => x)
.ToDictionary(g => g.Key, g => g.Count());
string result = String.Join(", ",
itemCounts.Select(kv => kv.Value > 1
? string.Format("{0} ({1} times)", kv.Key, kv.Value)
: kv.Key));
Another approach is using Enumerable.ToLookup instead of GroupBy:
var itemLookup = itemStrings.ToLookup(x => x);
string result = String.Join(", ",
itemLookup.Select(grp => grp.Count() > 1
? string.Format("{0} ({1} times)", grp.Key, grp.Count())
: grp.Key));
With something like:
string[] itemStrings = new[] { "a", "b", "b", "c", "d", "d", "d", "d" };
string[] duplicateItems = (from x in itemStrings.OrderBy(x => x).GroupBy(x => x)
let cnt = x.Count()
select cnt == 1 ?
x.Key :
string.Format("{0} ({1} times)", x.Key, cnt)
).ToArray();
I've added an OrderBy() because your list seems to be ordered, and I've overcomplicated it a little just to cache the result of x.Count() (the let cnt = x.Count()).
If you then want a single big string, you can join the items:
string joined = string.Join(",", duplicateItems);
I have 2 lists: a string list and a double list with the same length, where items at the same index correspond to each other. I need to compare all the strings, find the indexes of the strings that contain the same characters regardless of their order, and delete the entry with the highest corresponding double value.
Example:
List<string> str= new List<string>();
str.Add("efc");
str.Add("abc");
str.Add("cde");
str.Add("cab");
str.Add("fbc");
List<double> vlr= new List<double>();
vlr.Add(0.1);
vlr.Add(0.5);
vlr.Add(0.4);
vlr.Add(0.2);
vlr.Add(0.3);
In this case, "abc" => (0.5) must be deleted because "cab" has the same characters AND a lower corresponding value => (0.2).
Is there a lambda expression for these 2 lists?
What I've tried:
var distinct = list.Select((str, idx) => new { Str = str, Idx = idx })
.GroupBy(pair => new HashSet<char>(pair.Str), HashSet<char>.CreateSetComparer())
.Select(grp => grp.OrderBy(p => p.Idx).First())
.ToList();
Here's one way to solve it:
// Pair the strings with their correspondence values
var pairs = str.Zip(vlr, (s, d) => new {s, d});
// Group using a sorted string, eliminating differences due to character order
var groups = pairs.GroupBy(x => new string(x.s.ToCharArray().OrderBy(c => c).ToArray()));
// For each group, retain the item with the lowest correspondence value
var filtered = groups.Select(x => x.OrderBy(y => y.d).First().s);
var newDict = str.Zip(vlr, (s, d) => new { s, d })
.GroupBy(x => String.Join("", x.s.OrderBy(y => y)))
.Select(g => g.OrderBy(x => x.d).First())
.ToDictionary(x => x.s, x => x.d);
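Applied to the sample lists from the question, the Zip and GroupBy approach can be sketched as follows. The variable name kept is illustrative, and string.Concat is used here in place of String.Join("", ...) to build the sorted-character key, which is just one of several equivalent choices. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var str = new List<string> { "efc", "abc", "cde", "cab", "fbc" };
var vlr = new List<double> { 0.1, 0.5, 0.4, 0.2, 0.3 };

var kept = str.Zip(vlr, (s, d) => new { s, d })
    // Sorting the characters gives anagrams such as "abc" and "cab" the same key.
    .GroupBy(x => string.Concat(x.s.OrderBy(c => c)))
    // Keep the pair with the lowest value in each group.
    .Select(g => g.OrderBy(x => x.d).First())
    .ToList();

foreach (var pair in kept)
{
    Console.WriteLine("{0} => {1}", pair.s, pair.d);
}
```

For this input the surviving strings are "efc", "cab", "cde" and "fbc"; "abc" (0.5) is dropped in favour of "cab" (0.2).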
Here is the code:
var group = str.GroupBy(s => string.Join("", s.ToCharArray().OrderBy(c => c)));
var _vlr = group.Select(g => g.Min(s => vlr[str.IndexOf(s)]));
var _str = group.Select(g => g.OrderBy(s => vlr[str.IndexOf(s)]).First());