How to remove duplicate data in a LINQ group query (C#)

I'm trying to find a distinct list of filenames related to each bugid. I used LINQ to group all the filenames related to each bug id, but I don't know how to remove the duplicate filenames for each bugid. In the output file I have multiple rows like this:
bugid filename1 filename2 filename3 filename4 .............
There are multiple rows with the same bugid, and there are also duplicate filenames for each bug id.
This is my code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
namespace finalgroupquery
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            List<bug> list2 = new List<bug>();
            using (System.IO.StreamReader reader1 = new System.IO.StreamReader(@"/home/output"))
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"/home/output1"))
            {
                string line1;
                while ((line1 = reader1.ReadLine()) != null)
                {
                    string[] items1 = line1.Split('\t');
                    bug bg = new bug();
                    bg.bugid = items1[0];
                    for (int i = 1; i <= items1.Length - 1; i++)
                    {
                        bg.list1.Add(items1[i]);
                    }
                    list2.Add(bg);
                }
                var bugquery = from c in list2
                               group c by c.bugid into x
                               select new Container { BugID = x.Key, Grouped = x };
                foreach (Container con in bugquery)
                {
                    StringBuilder files = new StringBuilder();
                    files.Append(con.BugID);
                    files.Append("\t");
                    foreach (var x in con.Grouped)
                    {
                        files.Append(string.Join("\t", x.list1.ToArray()));
                    }
                    file.WriteLine(files.ToString());
                }
            }
        }
    }
    public class Container
    {
        public string BugID { get; set; }
        public IGrouping<string, bug> Grouped { get; set; }
    }
    public class bug
    {
        public List<string> list1 { get; set; }
        public string bugid { get; set; }
        public bug()
        {
            list1 = new List<string>();
        }
    }
}

From your description it sounds like you want to do this:
List<bug> bugs = new List<bug>();
var lines = System.IO.File.ReadLines(@"/home/bugs");
foreach (var line in lines)
{
    string[] items = line.Split('\t');
    bug bg = new bug();
    bg.bugid = items[0];
    bg.list1 = items.Skip(1).OrderBy(f => f).Distinct().ToList();
    bugs.Add(bg);
}
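This will produce a list of bug objects, one per input line, where each object holds a sorted list of filenames with the duplicates removed. Note that rows sharing the same bugid are not merged by this code.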

Try this code:
var bugquery = from c in list2
               group c by c.bugid into x
               select new bug { bugid = x.Key, list1 = x.SelectMany(l => l.list1).Distinct().ToList() };
foreach (bug bug in bugquery)
{
    StringBuilder files = new StringBuilder();
    files.Append(bug.bugid);
    files.Append("\t");
    files.Append(string.Join("\t", bug.list1.ToArray()));
    file.WriteLine(files.ToString());
}
Thanks to the combination of the SelectMany and Distinct LINQ operators, you can flatten the filename lists and remove duplicates in a single line.
SelectMany (from MSDN):
Projects each element of a sequence to an IEnumerable<T> and flattens the resulting sequences into one sequence.
Distinct (from MSDN):
Returns distinct elements from a sequence.
It also means that your Container class is no longer needed, as there's no need to iterate through the IGrouping<string, bug> collection anymore (here list1 contains all the filenames related to the bug, without duplicates).
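For reference, here is a self-contained sketch of the same group + SelectMany + Distinct idea. It assumes tab-separated lines in the question's `bugid<TAB>file1<TAB>file2...` layout and, for brevity, returns a `Dictionary` keyed by bug id instead of `bug` objects; `BugFileMerger` and `Merge` are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BugFileMerger
{
    // Hypothetical helper: merges every line that shares a bug id and
    // removes duplicate filenames. Each line is "bugid\tfile1\tfile2...".
    public static Dictionary<string, List<string>> Merge(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('\t'))
            .GroupBy(parts => parts[0])                      // one group per bug id
            .ToDictionary(
                g => g.Key,
                g => g.SelectMany(parts => parts.Skip(1))    // flatten the filenames of all rows in the group
                      .Distinct()                            // drop repeated filenames
                      .ToList());
    }
}
```

Each entry can then be written out with `string.Join("\t", ...)` just as in the loop above.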
Edit
As you may have some blank lines and/or empty strings after reading and parsing your file, you could use this code to get rid of them:
using (System.IO.StreamReader reader1 = new System.IO.StreamReader(@"/home/sunshine40270/mine/projects/interaction2/fasil-data/common history/outputpure"))
{
    string line1;
    while ((line1 = reader1.ReadLine()) != null)
    {
        if (!string.IsNullOrWhiteSpace(line1))
        {
            string[] items1 = line1.Split(new[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
            bug bg = new bug();
            bg.bugid = items1[0];
            for (int i = 1; i <= items1.Length - 1; i++)
            {
                bg.list1.Add(items1[i]);
            }
            list2.Add(bg);
        }
    }
}
You'll notice that:
Each line read into line1 is checked for emptiness as soon as it is retrieved from your stream (with !string.IsNullOrWhiteSpace(line1)).
To omit empty substrings from the return value of the string.Split method, you can use the StringSplitOptions.RemoveEmptyEntries parameter.
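A small sketch of the same filtering as a reusable method, assuming the lines have already been read (for example with `File.ReadLines`). `BugLineParser.Parse` is a hypothetical name, and returning `(BugId, Files)` tuples rather than `bug` objects is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BugLineParser
{
    // Hypothetical helper: returns (bugId, files) pairs, skipping blank lines
    // and dropping the empty fields produced by doubled or trailing tabs.
    public static List<(string BugId, List<string> Files)> Parse(IEnumerable<string> lines)
    {
        var result = new List<(string BugId, List<string> Files)>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // blank or whitespace-only line (tabs count as whitespace)

            string[] items = line.Split(new[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
            result.Add((items[0], items.Skip(1).ToList()));
        }
        return result;
    }
}
```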
Hope this helps.

Related

Why is the return type List<char>?

I am trying to pull the file names that match a substring using the Contains method. However, the return value seems to be List<char>, but I expect List<string>.
private void readAllAttribues()
{
    using (var reader = new StreamReader(attribute_file))
    {
        //List<string> AllLines = new List<string>();
        List<FileNameAttributeList> AllAttributes = new List<FileNameAttributeList>();
        while (!reader.EndOfStream)
        {
            FileNameAttributeList Attributes = new FileNameAttributeList();
            Attributes ImageAttributes = new Attributes();
            Point XY = new Point();
            string lineItem = reader.ReadLine();
            //AllLines.Add(lineItem);
            var values = lineItem.Split(',');
            Attributes.ImageFileName = values[1];
            XY.X = Convert.ToInt16(values[3]);
            XY.Y = Convert.ToInt16(values[4]);
            ImageAttributes.Location = XY;
            ImageAttributes.Radius = Convert.ToInt16(values[5]);
            ImageAttributes.Area = Convert.ToInt16(values[6]);
            AllAttributes.Add(Attributes);
        }
        List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).FirstOrDefault().ImageFileName.ToList();
        List<string> unique_reference_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"ref")).FirstOrDefault().ImageFileName.ToList();
        foreach (var unique_raw_filename in unique_raw_filenames)
        {
            var raw_attributes = AllAttributes.Where(x => x.ImageFileName == unique_raw_filename).ToList();
        }
    }
}
Data type class:
public class FileNameAttributeList
{   // Do not change the order
    public string ImageFileName { get; set; }
    public List<Attributes> Attributes { get; set; }
    public FileNameAttributeList()
    {
        Attributes = new List<Attributes>();
    }
}
Why doesn't FirstOrDefault() work? It returns List<char>, but I am expecting List<string>, so it fails.
The ToList() method converts collections that implement IEnumerable<SomeType> into lists.
Looking at the definition of String, you can see that it implements IEnumerable<Char>, and so ImageFileName.ToList() in the following code will return a List<char>.
AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).FirstOrDefault().ImageFileName.ToList();
Although I'm guessing at what you want, it seems like you want to filter AllAttributes based on the ImageFileName, and then get a list of those file names. If that's the case, you can use something like this:
var unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).Select(y => y.ImageFileName).ToList();
In your code:
List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).FirstOrDefault().ImageFileName.ToList();
FirstOrDefault() returns the first FileNameAttributeList in AllAttributes whose ImageFileName contains the text non, or the default value (null) if there is none, in which case accessing .ImageFileName throws a NullReferenceException.
Calling ToList() on the ImageFileName then converts the string value into a list of chars because string is a collection of char.
I think that what you are intending can be achieved by switching out FirstOrDefault to Select. Select allows you to map one value onto another.
So your code could look like this instead.
List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).Select(x => x.ImageFileName).ToList();
This then gives you a List<string>.
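A self-contained sketch contrasting the two calls. It assumes a trimmed-down `FileNameAttributeList` with only the `ImageFileName` property (the `Attributes` list is omitted), and `FileNameQueries` is a hypothetical helper. Because the question's variable is named `unique_raw_filenames`, the sketch also adds `Distinct()`; that is an assumption about the intent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-in for the question's class (assumption: only the name matters here).
public class FileNameAttributeList
{
    public string ImageFileName { get; set; }
}

public static class FileNameQueries
{
    // What the original code does: FirstOrDefault() picks ONE object, and
    // ToList() on its string property produces the characters of that name.
    public static List<char> FirstNameAsChars(List<FileNameAttributeList> all, string part) =>
        all.Where(x => x.ImageFileName.Contains(part)).FirstOrDefault().ImageFileName.ToList();

    // Project every matching object to its file name, then drop repeats.
    public static List<string> UniqueNames(List<FileNameAttributeList> all, string part) =>
        all.Where(x => x.ImageFileName.Contains(part))
           .Select(x => x.ImageFileName)
           .Distinct()
           .ToList();
}
```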

How to Get Groups of Files from GetFiles()

I have to process files every day. The files are named like so:
fg1a.mmddyyyy
fg1b.mmddyyyy
fg1c.mmddyyyy
fg2a.mmddyyyy
fg2b.mmddyyyy
fg2c.mmddyyyy
fg2d.mmddyyyy
If the entire file group is there for a particular date, I can process it. If it isn't there, I should not process it. I may have several partial file groups that run over several days. So when I have fg1a.12062017, fg1b.12062017 and fg1c.12062017, I can process that group (fg1) only.
Here is my code so far. It doesn't work because I can't figure out how to get only the full groups to add to the processing file list.
fileList = Directory.GetFiles(@"c:\temp\");
string[] fileGroup1 = { "FG1A", "FG1B", "FG1C" }; // THIS IS A FULL GROUP
string[] fileGroup2 = { "FG2A", "FG2B", "FG2C", "FG2D" };
List<string> fileDates = new List<string>();
List<string> procFileList;
// get a list of file dates
foreach (string fn in fileList)
{
    string dateString = fn.Substring(fn.IndexOf('.'), 9);
    if (!fileDates.Contains(dateString))
    {
        fileDates.Add(dateString);
    }
}
bool allFiles = true;
foreach (string fg in fileGroup1)
{
    foreach (string fd in fileDates)
    {
        string finder = fg + fd;
        bool foundIt = false;
        foreach (string fn in fileList)
        {
            if (fn.ToUpper().Contains(finder))
            {
                foundIt = true;
            }
        }
        if (!foundIt)
        {
            allFiles = false;
        }
        else
        {
            foreach (string fn in fileList)
            {
                procFileList.Add(fn);
            }
        }
    }
}
foreach (string fg in fileGroup2)
{
    foreach (string fd in fileDates)
    {
        string finder = fg + fd;
        bool foundIt = false;
        foreach (string fn in fileList)
        {
            if (fn.ToUpper().Contains(finder))
            {
                foundIt = true;
            }
        }
        if (!foundIt)
        {
            allFiles = false;
        }
        else
        {
            foreach (string fn in fileList)
            {
                procFileList.Add(fn);
            }
        }
    }
}
Any help or advice would be greatly appreciated.
Because it can sometimes get messy dealing with multiple lists, groupings, and parsing file names, I would start by creating a class that represents a FileGroupItem. This class would have a Parse method that takes in a file path, and then has properties that represent the group part and date part of the file name, as well as the full path to the file:
using System.IO;
public class FileGroupItem
{
    public string DatePart { get; set; }
    public string GroupName { get; set; }
    public string FilePath { get; set; }
    public static FileGroupItem Parse(string filePath)
    {
        if (string.IsNullOrWhiteSpace(filePath)) return null;
        // Split the file name on the '.' character to get the group and date parts
        var fileParts = Path.GetFileName(filePath).Split('.');
        if (fileParts.Length != 2) return null;
        return new FileGroupItem
        {
            GroupName = fileParts[0],
            DatePart = fileParts[1],
            FilePath = filePath
        };
    }
}
Then, in my main code, I would create a list of the file group definitions, and then populate a list of FileGroupItems from the directory we're scanning. After that, we can determine whether any file group definition is complete by comparing its items (in a case-insensitive way) to the actual FileGroupItems we found in the directory (after first grouping the FileGroupItems by their DatePart). If the intersection of these two lists has the same number of items as the file group definition, then it's complete and we can process that group.
Maybe it will make more sense in code:
private static void Main()
{
    var scanDirectory = @"f:\public\temp\";
    var processedDirectory = @"f:\public\temp2\";
    // The lists that define a complete group
    var fileGroupDefinitions = new List<List<string>>
    {
        new List<string> {"FG1A", "FG1B", "FG1C"},
        new List<string> {"FG2A", "FG2B", "FG2C", "FG2D"}
    };
    // Populate a list of FileGroupItems from the files
    // in our directory, and group them on the DatePart
    var fileGroups = Directory.EnumerateFiles(scanDirectory)
        .Select(FileGroupItem.Parse)
        .Where(f => f != null) // Parse returns null for names that aren't "group.date"
        .GroupBy(f => f.DatePart);
    // Now go through each group and compare the items
    // for that date with our file group definitions
    foreach (var fileGroup in fileGroups)
    {
        foreach (var fileGroupDefinition in fileGroupDefinitions)
        {
            // Get the intersection of the group definition and this file group
            var intersection = fileGroup
                .Where(f => fileGroupDefinition.Contains(
                    f.GroupName, StringComparer.OrdinalIgnoreCase))
                .ToList();
            // If all the items in the definition are there, then process the files
            if (intersection.Count == fileGroupDefinition.Count)
            {
                foreach (var fileGroupItem in intersection)
                {
                    Console.WriteLine($"Processing file: {fileGroupItem.FilePath}");
                    // Move the file to the processed directory
                    File.Move(fileGroupItem.FilePath,
                        Path.Combine(processedDirectory,
                            Path.GetFileName(fileGroupItem.FilePath)));
                }
            }
        }
    }
    Console.WriteLine("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}
I think you could simplify your algorithm so that you just have each file group as a prefix and a number of files to expect (e.g. fg1 is 3 files for a given date).
I think your code to find the distinct dates present is a good idea, though you should use a HashSet rather than a list if you occasionally expect a large number of dates. ("Valentine's Day?" - Ed)
Then you just need to work on the other loop that does the checking. An algorithm like this:
//make a new Dictionary<string,int> for the filegroup prefixes and their counts
//eg myDict["fg1"] = 3; myDict["fg2"] = 4;
//list the files in the directory, into an array of FileInfo objects
//see the DirectoryInfo.GetFiles method
//foreach string d in the list of dates
    //foreach string fgKey in myDict.Keys - the list of group prefixes
        //use a bit of Linq to get all the fileinfos with a
        //name starting with the group and ending with the date
        var grplist = myfileinfos.Where(fi => fi.Name.StartsWith(fgKey) && fi.Name.EndsWith(d));
        //if grplist.Count() == the filegroup count ( myDict[fgKey] )
        //then send every file in grplist for processing
        //remember that grplist is a collection of FileInfo objects,
        //if your processing method takes a string filename, use fileInfo.FullName
Putting your file groupings into one dictionary will make things a lot easier than having them as x separate arrays.
I haven't written all the code for you, but I've sketched the algorithm in comments, and I've put in some of the more awkward bits, like the LINQ query, the dictionary declaration and how to fill it. Have a go at fleshing it out with code, and ask any questions in a comment on this post.
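One way the commented algorithm might be fleshed out, offered as an illustration rather than the answerer's code. It works on plain file-name strings instead of FileInfo objects so it can run without a directory; `FileGroupChecker.FindCompleteGroups` is a hypothetical name, matching is case-insensitive, and the date is assumed to be the text after the first '.'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FileGroupChecker
{
    // groupCounts maps a group prefix (e.g. "fg1") to the number of files
    // that make up a complete group for one date.
    public static List<string> FindCompleteGroups(
        IEnumerable<string> fileNames, Dictionary<string, int> groupCounts)
    {
        var names = fileNames.ToList();
        var toProcess = new List<string>();

        // Distinct dates present, collected in a HashSet as suggested above.
        var dates = new HashSet<string>(
            names.Where(n => n.IndexOf('.') >= 0)
                 .Select(n => n.Substring(n.IndexOf('.') + 1)));

        foreach (string d in dates)
        {
            foreach (string fgKey in groupCounts.Keys)
            {
                // All files with this prefix and this date.
                var grpList = names
                    .Where(n => n.StartsWith(fgKey, StringComparison.OrdinalIgnoreCase)
                             && n.EndsWith("." + d, StringComparison.OrdinalIgnoreCase))
                    .ToList();

                // Only a complete group is handed on for processing.
                if (grpList.Count == groupCounts[fgKey])
                    toProcess.AddRange(grpList);
            }
        }
        return toProcess;
    }
}
```

A simple prefix test like this would also treat a hypothetical "fg10a" as part of "fg1", so longer prefixes would need a stricter match.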
First, create an array of the groups to make processing easier:
var fileGroups = new[] {
    new[] { "FG1A", "FG1B", "FG1C" },
    new[] { "FG2A", "FG2B", "FG2C", "FG2D" }
};
Then you can convert the array into a Dictionary to map each name back to its group:
var fileGroupMap = fileGroups.SelectMany(g => g.Select(f => new { key = f, group = g }))
                             .ToDictionary(g => g.key, g => g.group);
Then, preprocess the files you get from the directory:
var fileList = from fname in Directory.GetFiles(...)
               select new {
                   fname,
                   fdate = Path.GetExtension(fname),
                   ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
               };
Now you can take your fileList and group by date and group, and then filter to just completed groups:
var procFileList = (from file in fileList
                    group file by new { file.fdate, fgroup = fileGroupMap[file.ffilename] } into fng
                    where fng.Key.fgroup.All(f => fng.Select(fn => fn.ffilename).Contains(f))
                    from fn in fng
                    select fn.fname).ToList();
Since you didn't preserve the groups, I flattened the groups at the end of the query into just a list of files to be processed. If you needed, you could keep them in groups and process the groups instead.
Note: If a file exists that belongs to no group, you will get an error from the lookup in fileGroupMap. If that is a possibility, you can filter the fileList to just the known names as follows:
var fileList = from fname in Directory.GetFiles(...)
               let ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
               where fileGroupMap.ContainsKey(ffilename)
               select new {
                   fname,
                   fdate = Path.GetExtension(fname),
                   ffilename
               };
Also note that having a name in multiple groups will cause an error in the creation of fileGroupMap. If that is a possibility, the queries would become more complex and have to be written differently.
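A runnable sketch of the same approach on an in-memory list of names, assuming file names like `fg1a.12062017`. The `Directory.GetFiles(...)` call is replaced by a parameter, and `CompleteGroupFinder` is a hypothetical name. It applies the known-name filter from the note above and, like the answer, assumes no name belongs to two groups.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CompleteGroupFinder
{
    public static List<string> Find(IEnumerable<string> paths, string[][] fileGroups)
    {
        // Map each member name back to the array that defines its group.
        var fileGroupMap = fileGroups
            .SelectMany(g => g.Select(f => new { key = f, group = g }))
            .ToDictionary(x => x.key, x => x.group);

        var fileList = from fname in paths
                       let ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
                       where fileGroupMap.ContainsKey(ffilename)   // skip names in no group
                       select new { fname, fdate = Path.GetExtension(fname), ffilename };

        // Group by (date, group definition) and keep groups where every member is present.
        return (from file in fileList
                group file by new { file.fdate, fgroup = fileGroupMap[file.ffilename] } into fng
                where fng.Key.fgroup.All(f => fng.Select(fn => fn.ffilename).Contains(f))
                from fn in fng
                select fn.fname).ToList();
    }
}
```

Grouping on the array itself works here because every file in a group maps to the same array instance, so the anonymous-type keys compare equal.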
Here is a simple class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] filenames = { "fg1a.12012017", "fg1b.12012017", "fg1c.12012017", "fg2a.12012017", "fg2b.12012017", "fg2c.12012017", "fg2d.12012017" };
            new SplitFileName(filenames);
            List<List<SplitFileName>> results = SplitFileName.GetGroups();
        }
    }
    public class SplitFileName
    {
        public static List<SplitFileName> names = new List<SplitFileName>();
        string filename { get; set; }
        string prefix { get; set; }
        string letter { get; set; }
        DateTime date { get; set; }
        public SplitFileName() { }
        public SplitFileName(string[] splitNames)
        {
            foreach (string name in splitNames)
            {
                SplitFileName splitName = new SplitFileName();
                names.Add(splitName);
                splitName.filename = name;
                string[] splitArray = name.Split(new char[] { '.' });
                splitName.date = DateTime.ParseExact(splitArray[1], "MMddyyyy", System.Globalization.CultureInfo.InvariantCulture);
                splitName.prefix = splitArray[0].Substring(0, splitArray[0].Length - 1);
                splitName.letter = splitArray[0].Substring(splitArray[0].Length - 1, 1);
            }
        }
        public static List<List<SplitFileName>> GetGroups()
        {
            return names.OrderBy(x => x.letter).GroupBy(x => new { date = x.date, prefix = x.prefix })
                .Where(x => string.Join(",", x.Select(y => y.letter)) == "a,b,c,d")
                .Select(x => x.ToList())
                .ToList();
        }
    }
}
With everyone's help, I solved it too. This is what I'm going with because it's the most maintainable for me but the solutions were so smart!!! Thanks everyone for your help.
private void CheckFiles()
{
    var fileGroups = new[] {
        new[] { "FG1A", "FG1B", "FG1C", "FG1D" },
        new[] { "FG2A", "FG2B", "FG2C", "FG2D", "FG2E" } };
    List<string> fileDates = new List<string>();
    List<string> pfiles = new List<string>();
    // get a list of file dates
    foreach (string fn in fileList)
    {
        string dateString = fn.Substring(fn.IndexOf('.'), 9);
        if (!fileDates.Contains(dateString))
        {
            fileDates.Add(dateString);
        }
    }
    // check if a date has all the files
    foreach (string fd in fileDates)
    {
        // for each file group
        foreach (string[] masterfg in fileGroups)
        {
            // number of files this group needs for the date
            int fgCount = 0;
            foreach (string fg in masterfg)
            {
                // collect the files that are there for this member
                string finder = fg + fd;
                foreach (string fn in fileList)
                {
                    if (fn.ToUpper().Contains(finder))
                    {
                        pfiles.Add(fn);
                    }
                }
                fgCount++;
            }
            // only a complete group is queued for processing
            if (fgCount == pfiles.Count)
            {
                procFileList.AddRange(pfiles);
            }
            pfiles.Clear();
        }
    }
}

Add an element/item to a multidimensional list

I am trying to create a multidimensional list filled with an employee and their information.
Ex: "Jane Smith" "Manager" "75,000" "Dallas"
The code I have right now is giving me an out-of-range exception.
These lines give me errors: bigROW[i].Add(ownName); and bigROW[i][j+1] = newElement;
//Begin making rows
for (int i = 0; i < fileRowCount; i++)
{
    string findOwners = "";
    findOwners = file5Data.Rows[i][0].ToString();
    if (DISTINCTOppOwners.Contains(findOwners))
    {
        //Find index of where owner is
        int useIndex = 0;
        useIndex = DISTINCTOppOwners.IndexOf(findOwners);
        //Add their name to Multidimensional list
        string ownName = DISTINCTOppOwners[useIndex].ToString();
        //This line gives me the ERROR
        bigROW[i].Add(ownName);
        for (int j = 0; j < fileColCount; j++)
        {
            //Add Employee information to Multidimensional list
            string newElement = file5Data.Rows[i][j].ToString();
            if (ownName != newElement)
            {
                if (j == 0)
                {
                    //Avoid adding their names to the list twice
                    bigROW[i][j + 1] = newElement;
                }
                bigROW[i][j] = newElement;
            }
        }
    }
}
I tried adding the info to a list called "sublist" then adding it to the BigRow(multidimensional list),but when I cleared the sublist to add a new row it deleted the values from the BigRow.
When you add an object to a list what is stored is a reference, not the contents of the object. Instead of clearing sublist you should create a new List each time. Otherwise you have an outer list that contains multiple copies of the same list inside.
Refer to jdweng's answer for an example of this. In his code, the ToList call creates a new numbers list for each line, so that each row has its own List of numbers.
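A small sketch of that reference behavior using made-up employee rows; `RowBuilder` and its methods are hypothetical names. The first method reuses and clears one `sublist`, so every row in the outer list ends up being the same, emptied list. The second creates a new list per row and appends it with `Add`, which also avoids indexing a row (`bigROW[i]`) before it exists.

```csharp
using System.Collections.Generic;

public static class RowBuilder
{
    // Broken pattern: the outer list stores the SAME list object every time.
    public static List<List<string>> ReuseAndClear(string[][] data)
    {
        var rows = new List<List<string>>();
        var sublist = new List<string>();
        foreach (var record in data)
        {
            sublist.AddRange(record);
            rows.Add(sublist);   // adds a reference, not a copy
            sublist.Clear();     // also empties every row already added
        }
        return rows;
    }

    // Working pattern: a fresh list object for each row.
    public static List<List<string>> NewListPerRow(string[][] data)
    {
        var rows = new List<List<string>>();
        foreach (var record in data)
        {
            var row = new List<string>(record);  // copies this record into a new list
            rows.Add(row);                       // Add grows the outer list, so no index is out of range
        }
        return rows;
    }
}
```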
Here is a simple example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication64
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
                "0,1,2,3,4,5,6,7,8,9\n" +
                "10,11,12,13,14,15,16,17,18,19\n" +
                "20,21,22,23,24,25,26,27,28,29\n" +
                "30,31,32,33,34,35,36,37,38,39\n" +
                "40,41,42,43,44,45,46,47,48,49\n";
            List<List<int>> output = new List<List<int>>();
            StringReader reader = new StringReader(input);
            string inputline = "";
            while ((inputline = reader.ReadLine()) != null)
            {
                List<int> numbers = inputline.Split(new char[] { ',' }).Select(x => int.Parse(x)).ToList();
                output.Add(numbers);
            }
        }
    }
}

Create dynamic string array in C# and add strings (outcome of a split method) into two separate arrays through a loop

I have a list of strings in the format xx#yy, where:
xx = feature name
yy = project name
Basically, I want to split these strings at # and store the xx part in one string array and the yy part in another to do further operations.
string[] featureNames = all xx here
string[] projectNames = all yy here
I am able to split the strings using the Split method (string.Split('#')) in a foreach or for loop in C#, but I can't store the two parts separately in two different string arrays (not necessarily arrays; a list would also work, as that can be converted to an array later on).
The main problem is to determine the two parts of a string after the split and then append them to the string arrays separately.
This is one simple approach:
var xx = new List<string>();
var yy = new List<string>();
foreach (var line in listOfStrings)
{
    var split = line.Split('#');
    xx.Add(split[0]);
    yy.Add(split[1]);
}
The above instantiates a list for xx and a list for yy, loops through the list of strings, and splits each one. It then adds the results of the split to the previously instantiated lists.
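A self-contained version of that loop which ends with the two string arrays the question asks for. `FeatureProjectSplitter.Split` is a hypothetical name; the sketch assumes each entry contains exactly one '#' and, as an extra assumption, silently skips entries that don't.

```csharp
using System;
using System.Collections.Generic;

public static class FeatureProjectSplitter
{
    public static (string[] FeatureNames, string[] ProjectNames) Split(IEnumerable<string> entries)
    {
        var features = new List<string>();
        var projects = new List<string>();
        foreach (var entry in entries)
        {
            var parts = entry.Split('#');
            if (parts.Length != 2)
                continue;            // assumption: ignore malformed entries
            features.Add(parts[0]);  // xx = feature name
            projects.Add(parts[1]);  // yy = project name
        }
        return (features.ToArray(), projects.ToArray());
    }
}
```

Usage: `var (featureNames, projectNames) = FeatureProjectSplitter.Split(listOfStrings);`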
How about the following:
List<String> xx = new List<String>();
List<String> yy = new List<String>();
var strings = yourstring.Split('#');
xx.Add(strings.First());
yy.Add(strings.Last());
var featureNames = new List<string>();
var productNames = new List<string>();
foreach (var productFeature in productFeatures)
{
    var parts = productFeature.Split('#');
    featureNames.Add(parts[0]);
    productNames.Add(parts[1]);
}
How about:
List<string> lst = ... // your list containing xx#yy
List<string> _featureNames = new List<string>();
List<string> _projectNames = new List<string>();
lst.ForEach(x =>
{
    string[] str = x.Split('#');
    _featureNames.Add(str[0]);
    _projectNames.Add(str[1]);
});
string[] featureNames = _featureNames.ToArray();
string[] projectNames = _projectNames.ToArray();
You can do something like this:
var splits = input.Select(v => v.Split('#'));
var features = splits.Select(s => s[0]).ToList();
var projects = splits.Select(s => s[1]).ToList();
If you don't mind slightly more code in exchange for better performance and less pressure on the garbage collector, then:
var features = new List<string>();
var projects = new List<string>();
foreach (var split in input.Select(v => v.Split('#')))
{
    features.Add(split[0]);
    projects.Add(split[1]);
}
But overall I'd suggest creating a class and parsing your input (a more C#-style approach):
public class ProjectFeature
{
    public readonly string Project;
    public readonly string Feature;
    public ProjectFeature(string project, string feature)
    {
        this.Project = project;
        this.Feature = feature;
    }
    public static IEnumerable<ProjectFeature> ParseList(IEnumerable<string> input)
    {
        return input.Select(v =>
        {
            var split = v.Split('#');
            return new ProjectFeature(split[1], split[0]);
        });
    }
}
and use it later (just an example of possible usage):
var projectFeatures = ProjectFeature.ParseList(File.ReadAllLines(@"c:\features.txt")).ToList();
var features = projectFeatures.Select(f => f.Feature).ToList();
var projects = projectFeatures.Select(f => f.Project).ToList();
// ??? etc.
var all_XX = yourArrayOfStrings.Select(str => str.Split('#')[0]); // this will be an IEnumerable<string>
var all_YY = yourArrayOfStrings.Select(str => str.Split('#')[1]); // the same for YY, but here make sure that the element at [1] exists
Why the different arrays? Wouldn't a dictionary be more fitting?
List<String> input = File.ReadAllLines().ToList<String>(); // or whatever
var output = new Dictionary<String, String>();
foreach (String line in input)
{
    var parts = line.Split('#');
    output.Add(parts[0], parts[1]);
}
foreach (var feature in output)
{
    Console.WriteLine("{0}: {1}", feature.Key, feature.Value);
}
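One caveat with the dictionary: `Dictionary.Add` throws an `ArgumentException` when the same feature name appears twice (for example `123#project` and `123#project1` in the next answer's sample data). If repeats are possible, a lookup that keeps every project per feature is one alternative. The sketch below uses LINQ's `ToLookup`; `FeatureIndex.Build` is a hypothetical name, and skipping lines without exactly one '#' is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FeatureIndex
{
    // Each key (feature) maps to all of its projects; repeated keys are allowed.
    public static ILookup<string, string> Build(IEnumerable<string> lines) =>
        lines.Select(line => line.Split('#'))
             .Where(parts => parts.Length == 2)
             .ToLookup(parts => parts[0], parts => parts[1]);
}
```

Indexing a lookup with a missing key returns an empty sequence instead of throwing.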
Try this.
var ls = new List<string>();
ls.Add("123#project");
ls.Add("123#project1");
var f = from c in ls
        select new
        {
            XX = c.Split('#')[0],
            YY = c.Split('#')[1]
        };
string [] xx = f.Select (x => x.XX).ToArray();
string [] yy = f.Select (x => x.YY).ToArray();

WPF list filtering

I am new to WPF, so this is probably an easy question. I have an app that reads some words from a CSV file and stores them in a list of strings. What I am trying to do is parameterize this list to show the most popular words in it. In my UI I want a text box where, when I enter a number (e.g. 5), the original list is filtered, leaving only the 5 most popular (frequent) words in the new list. Can anyone assist with this final step? Thanks
public class VM
{
    public VM()
    {
        Words = LoadWords(fileList);
    }
    public IEnumerable<string> Words { get; private set; }
    string[] fileList = Directory.GetFiles(@"Z:\My Documents\", "*.csv");
    private static IEnumerable<string> LoadWords(String[] fileList)
    {
        List<String> words = new List<String>();
        //
        if (fileList.Length == 1)
        {
            try
            {
                foreach (String line in File.ReadAllLines(fileList[0]))
                {
                    string[] rows = line.Split(',');
                    words.AddRange(rows);
                }
            }
            catch (Exception ex)
            {
                System.Windows.MessageBox.Show(ex.Message, "Problem!");
            }
        }
        else
        {
            System.Windows.MessageBox.Show("Please ensure that you have ONE read file in the source folder.", "Problem!");
        }
        return words;
    }
}
A LINQ query that groups by the word and orders by the count of that word descending should do it. Try this
private static IEnumerable<string> GetTopWords(IEnumerable<string> words, int count)
{
    var popularWords = (from w in words
                        group w by w into grp
                        orderby grp.Count() descending
                        select grp.Key).Take(count).ToList();
    return popularWords;
}
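A self-contained sketch of the same group-and-count query, written with method syntax and taking the words as a parameter; `WordStats.TopWords` is a hypothetical name. As an added assumption, ties are broken alphabetically so the result is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordStats
{
    public static List<string> TopWords(IEnumerable<string> words, int count) =>
        words.GroupBy(w => w)                                  // one group per distinct word
             .OrderByDescending(g => g.Count())                // most frequent first
             .ThenBy(g => g.Key, StringComparer.Ordinal)       // deterministic tie-break
             .Take(count)                                      // Take copes with count > number of words
             .Select(g => g.Key)
             .ToList();
}
```

In the view model, the text box value would be parsed to an int and passed as `count`, with `Words` as the first argument.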
You could use CollectionViewSource.GetDefaultView(viewModel.Words), which returns an ICollectionView.
ICollectionView exposes a Filter property of type Predicate<object> that you can use for filtering.
So the common scenario looks like this:
The ViewModel exposes a PopularCount property that is bound to a TextBox in the View.
The ViewModel listens for changes to the PopularCount property.
When a change notification occurs, the view model obtains the ICollectionView for the viewModel.Words collection and sets up its Filter property.
You could find a working sample of Filter property usage here. If you get stuck with the code, let me know.
Instead of reading all the words into the list and then sorting it based on the frequency, a cleaner approach would be to create a custom class MyWord that stores the word and the frequency. While reading the file, the frequency of the word can be incremented. The class can implement IComparable<T> to compare the words based on the frequency.
public class MyWord : IComparable<MyWord>
{
    public MyWord(string word)
    {
        this.Word = word;
        this.Frequency = 0;
    }
    public MyWord(string word, int frequency)
    {
        this.Word = word;
        this.Frequency = frequency;
    }
    public string Word { get; private set; }
    public int Frequency { get; private set; }
    public void IncrementFrequency()
    {
        this.Frequency++;
    }
    public void DecrementFrequency()
    {
        this.Frequency--;
    }
    public int CompareTo(MyWord secondWord)
    {
        return this.Frequency.CompareTo(secondWord.Frequency);
    }
}
The main class VM would have these members:
public IEnumerable<MyWord> Words { get; private set; }
private void ShowMostPopularWords(int numberOfWords)
{
    SortMyWordsDescending();
    listBox1.Items.Clear();
    // don't read past the end of Words if the user asks for more words than exist
    int shown = Math.Min(numberOfWords, this.Words.Count());
    for (int i = 0; i < shown; i++)
    {
        listBox1.Items.Add(this.Words.ElementAt(i).Word + "|" + this.Words.ElementAt(i).Frequency);
    }
}
And the call to ShowMostPopularWords():
private void Button_Click(object sender, RoutedEventArgs e)
{
    int numberOfWords;
    if (Int32.TryParse(textBox1.Text, NumberStyles.Integer, CultureInfo.CurrentUICulture, out numberOfWords))
    {
        ShowMostPopularWords(numberOfWords);
    }
}
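The answer's idea of incrementing a frequency while reading can also be done without a custom class. The sketch below swaps in a plain `Dictionary<string, int>` for `MyWord`; `WordCounter` is a hypothetical name, and the comma-separated lines mirror the question's LoadWords.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Count each comma-separated word as the lines are read.
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            foreach (var word in line.Split(','))
            {
                counts.TryGetValue(word, out int current); // current is 0 when the word is new
                counts[word] = current + 1;
            }
        }
        return counts;
    }

    // The n most frequent words, highest count first (ties broken alphabetically).
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int n) =>
        counts.OrderByDescending(kv => kv.Value)
              .ThenBy(kv => kv.Key, StringComparer.Ordinal)
              .Take(n)
              .ToList();
}
```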
I'm not sure if grouping and ordering the 'words' list is what you want, but if so, this could be a way of doing it:
int topN = 3;
List<string> topNWords = new List<string>();
string[] words = new string[] {
    "word5",
    "word1",
    "word1",
    "word1",
    "word2",
    "word2",
    "word2",
    "word3",
    "word3",
    "word4",
    "word5",
    "word6",
};
// LINQ query
var wordGroups = from s in words
                 group s by s into g
                 select new { Count = g.Count(), Word = g.Key };
for (int i = 0; i < Math.Min(topN, wordGroups.Count()); i++)
{
    // (g) => g.Count is a lambda expression
    // .OrderBy and Reverse are IEnumerable extension methods
    var element = wordGroups.OrderBy((g) => g.Count).Reverse().ElementAt(i);
    topNWords.Add(element.Count + " - " + element.Word);
}
This could be made much shorter by ordering inside the LINQ query, but I wanted to introduce you to inline lambdas and IEnumerable extension methods too.
The short version could be:
topNWords = (from s in words
             group s by s into g
             orderby g.Count() descending
             select g.Key).Take(topN).ToList();
