C# Linq question - c#

I have a text file in which I am storing entries for an address book.
The layout is like so:
Name:
Contact:
Product:
Quantity:
I have written some linq code to grab the name plus the next four lines, for a search by name feature.
I also want to be able to search by contact.
The challenge is to match the contact info, grab the next 3 lines, and also grab the line prior to the match.
That way if Search By Contact is used, the full list of info will be returned.
private void buttonSearch_Click(object sender, EventArgs e)
{
string[] lines = File.ReadAllLines("C:/AddressBook/Customers.txt");
string name = textBoxSearchName.Text;
string contact = textBoxContact.Text;
if (name == "" && contact == "")
{
return;
}
var byName = from line in lines
where line.Contains(name)
select lines.SkipWhile(f => f != line).Take(4);
//var byContact = from line in lines
// where line.Contains(name)
// select lines.SkipWhile(f => f != name).Take(4);
if (name != "")
{
foreach (var item in byName)
foreach (var line in item) { listBox2.Items.Add(line); }
listBox2.Items.Add("");
}
//if (contact != "")
//{
// foreach (var item in byContact)
// foreach (var line in item) { listBox2.Items.Add(line); }
//listBox2.Items.Add("");
}
}

Firstly i would recommend changing your data storage approach if you can.
Secondly i would recommend reading the file into an object, something like this:
public class Contact
{
public string Name {get; set;}
public string Contact {get; set;}
public string Product {get; set;}
public int Quantity {get; set;}
}
...
public IEnumerable<Contact> GetContacts()
{
//make this read line by line if it is big!
string[] lines = File.ReadAllLines("C:/AddressBook/Customers.txt");
for (int i=0;i<lines.length;i += 4)
{
//add error handling/validation!
yield return new Contact()
{
Name = lines[i],
Contact = lines[i+1],
Product = lines[i+2],
Quantity = int.Parse(lines[i+3]
};
}
}
private void buttonSearch_Click(object sender, EventArgs e)
{
...
var results = from c in GetContacts()
where c.Name == name ||
c.Contact == contact
select c;
...
}

See if this will work
var contactLinesList = lines.Where(l => l.Contains(name))
.Select((l, i) => lines.Skip(i - 1).Take(4)).ToList();
contactLinesList.ForEach(cl => listBox2.Items.Add(cl));

This is not the smallest code in earth but it shows how to do a couple of things. Although I don't recommend using it, because it is quite complex to understand. This is to be considered as a hobbyist, just learning code!!! I suggest you load the file in a well known structure, and do Linq on that... anyway... this is a C# Console Application that does what you proposed using Linq syntax, and one extension method:
using System;
using System.Collections.Generic;
using System.Linq;
namespace stackoverflow.com_questions_5826306_c_linq_question
{
public class Program
{
public static void Main()
{
string fileData = #"
Name: Name-1
Contact: Xpto
Product: Abc
Quantity: 12
Name: Name-2
Product: Xyz
Contact: Acme
Quantity: 16
Name: Name-3
Product: aammndh
Contact: YKAHHYTE
Quantity: 2
";
string[] lines = fileData.Replace("\r\n", "\n").Split('\n');
var result = Find(lines, "contact", "acme");
foreach (var item in result)
Console.WriteLine(item);
Console.WriteLine("");
Console.WriteLine("Press any key");
Console.ReadKey();
}
private static string[] Find(string[] lines, string searchField, string searchValue)
{
var result = from h4 in
from g4 in
from i in (0).To(lines.Length)
select ((from l in lines select l).Skip(i).Take(4))
where !g4.Contains("")
select g4
where h4.Any(
x => x.Split(new char[] { ':' }, 2)[0].Equals(searchField, StringComparison.OrdinalIgnoreCase)
&& x.Split(new char[] { ':' }, 2)[1].Trim().Equals(searchValue, StringComparison.OrdinalIgnoreCase))
select h4;
var list = result.FirstOrDefault();
return list.ToArray();
}
}
public static class NumberExtensions
{
public static IEnumerable<int> To(this int start, int end)
{
for (int it = start; it < end; it++)
yield return it;
}
}
}

If your text file is small enough, I'd recommend using regular expressions instead. This is exactly the sort of thing it's designed to do. Off the top of my head, the expression will look something like this:
(?im)^Name:(.*?)$ ^Contact:search_term$^Product:(.*?)$^Quantity:(.*?)$

Related

How to Get Groups of Files from GetFiles()

I have to process files everyday. The files are named like so:
fg1a.mmddyyyy
fg1b.mmddyyyy
fg1c.mmddyyyy
fg2a.mmddyyyy
fg2b.mmddyyyy
fg2c.mmddyyyy
fg2d.mmddyyyy
If the entire file group is there for a particular date, I can process it. If it isn't there, I should not process it. I may have several partial file groups that run over several days. So when I have fg1a.12062017, fg1b.12062017 and fg1c.12062017, I can process that group (fg1) only.
Here is my code so far. It doesn't work because I can't figure out how to get only the full groups to add to the the processing file list.
fileList = Directory.GetFiles(#"c:\temp\");
string[] fileGroup1 = { "FG1A", "FG1B", "FG1C" }; // THIS IS A FULL GROUP
string[] fileGroup2 = { "FG2A", "FG2B", "FG2C", "FG2D" };
List<string> fileDates = new List<string>();
List<string> procFileList;
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
bool allFiles = true;
foreach (string fg in fileGroup1)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
foreach (string fg in fileGroup2)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
Any help or advice would be greatly appreciated.
Because it can sometimes get messy dealing with multiple lists, groupings, and parsing file names, I would start by creating a class that represents a FileGroupItem. This class would have a Parse method that takes in a file path, and then has properties that represent the group part and date part of the file name, as well as the full path to the file:
public class FileGroupItem
{
public string DatePart { get; set; }
public string GroupName { get; set; }
public string FilePath { get; set; }
public static FileGroupItem Parse(string filePath)
{
if (string.IsNullOrWhiteSpace(filePath)) return null;
// Split the file name on the '.' character to get the group and date parts
var fileParts = Path.GetFileName(filePath).Split('.');
if (fileParts.Length != 2) return null;
return new FileGroupItem
{
GroupName = fileParts[0],
DatePart = fileParts[1],
FilePath = filePath
};
}
}
Then, in my main code, I would create a list of the file group definitions, and then populate a list of FileGroupItems from the directory we're scanning. After that, we can determine if any file group definition is complete by comparing it's items (in a case-insensitive way) to the actual FileGroupItems we found in the directory (after first grouping the FileGroupItems by it's DatePart). If the intersection of these two lists has the same number of items as the file group definition, then it's complete and we can process that group.
Maybe it will make more sense in code:
private static void Main()
{
var scanDirectory = #"f:\public\temp\";
var processedDirectory = #"f:\public\temp2\";
// The lists that define a complete group
var fileGroupDefinitions = new List<List<string>>
{
new List<string> {"FG1A", "FG1B", "FG1C"},
new List<string> {"FG2A", "FG2B", "FG2C", "FG2D"}
};
// Populate a list of FileGroupItems from the files
// in our directory, and group them on the DatePart
var fileGroups = Directory.EnumerateFiles(scanDirectory)
.Select(FileGroupItem.Parse)
.GroupBy(f => f.DatePart);
// Now go through each group and compare the items
// for that date with our file group definitions
foreach (var fileGroup in fileGroups)
{
foreach (var fileGroupDefinition in fileGroupDefinitions)
{
// Get the intersection of the group definition and this file group
var intersection = fileGroup
.Where(f => fileGroupDefinition.Contains(
f.GroupName, StringComparer.OrdinalIgnoreCase))
.ToList();
// If all the items in the definition are there, then process the files
if (intersection.Count == fileGroupDefinition.Count)
{
foreach (var fileGroupItem in intersection)
{
Console.WriteLine($"Processing file: {fileGroupItem.FilePath}");
// Move the file to the processed directory
File.Move(fileGroupItem.FilePath,
Path.Combine(processedDirectory,
Path.GetFileName(fileGroupItem.FilePath)));
}
}
}
}
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
I think you could simplify your algorithm so you just have file groups as a prefix and a number of files to expect, fg1 is 3 files for a given date
I think your code to find the distinct dates present is a good idea, though you should use a hash set rather than a list, if you occasionally expect a large number of dates.. ("Valentine's Day?" - Ed)
Then you just need to work on the other loop that does the checking. An algorithm like this
//make a new Dictionary<string,int> for the filegroup prefixes and their counts3
//eg myDict["fg1"] = 3; myDict["fg2"] = 4;
//list the files in the directory, into an array of fileinfo objects
//see the DirectoryInfo.GetFiles method
//foreach string d in the list of dates
//foreach string fgKey in myDict.Keys - the list of group prefixes
//use a bit of Linq to get all the fileinfos with a
//name starting with the group and ending with the date
var grplist = myfileinfos.Where(fi => fi.Name.StartsWith(fg) && fi.Name.EndsWith(d));
//if the grplist.Count == the filegroup count ( myDict[fgKey] )
//then send every file in grplist for processing
//remember that grplist is a collection of fileinfo objects,
//if your processing method takes a string filename, use fileinfo.Fullname
Putting your file groupings into one dictionary will make things a lot easier than having them as x separate arrays
I haven't written all the code for you, but I've comment sketched the algorithm, and I've put in some of the more awkward bits like the link, dictionary declaration and how to fill it.. have a go at fleshing it out with code, ask any questions in a comment on this post
First, create an array of the groups to make processing easier:
var fileGroups = new[] {
new[] { "FG1A", "FG1B", "FG1C" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D" }
};
Then you can convert the array into a Dictionary to map each name back to its group:
var fileGroupMap = fileGroups.SelectMany(g => g.Select(f => new { key = f, group = g })).ToDictionary(g => g.key, g => g.group);
Then, preprocess the files you get from the directory:
var fileList = from fname in Directory.GetFiles(...)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
};
Now you can take your fileList and group by date and group, and then filter to just completed groups:
var profFileList = (from file in fileList
group file by new { file.fdate, fgroup = fileGroupMap[file.ffilename] } into fng
where fng.Key.fgroup.All(f => fng.Select(fn => fn.ffilename).Contains(f))
from fn in fng
select fn.fname).ToList();
Since you didn't preserve the groups, I flattened the groups at the end of the query into just a list of files to be processed. If you needed, you could keep them in groups and process the groups instead.
Note: If a file exists that belongs to no group, you will get an error from the lookup in fileGroupMap. If that is a possiblity you can filter the fileList to just known names as follows:
var fileList = from fname in GetFiles
let ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
where fileGroupMap.Keys.Contains(ffilename)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename
};
Also note that having a name in multiple groups will cause an error in the creation of fileGroupMap. If that is a possibility, the queries would become more complex and have to be written differently.
Here is a simple class
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] filenames = { "fg1a.12012017", "fg1b.12012017", "fg1c.12012017", "fg2a.12012017", "fg2b.12012017", "fg2c.12012017", "fg2d.12012017" };
new SplitFileName(filenames);
List<List<SplitFileName>> results = SplitFileName.GetGroups();
}
}
public class SplitFileName
{
public static List<SplitFileName> names = new List<SplitFileName>();
string filename { get; set; }
string prefix { get; set; }
string letter { get; set; }
DateTime date { get; set; }
public SplitFileName() { }
public SplitFileName(string[] splitNames)
{
foreach(string name in splitNames)
{
SplitFileName splitName = new SplitFileName();
names.Add(splitName);
splitName.filename = name;
string[] splitArray = name.Split(new char[] { '.' });
splitName.date = DateTime.ParseExact(splitArray[1],"MMddyyyy", System.Globalization.CultureInfo.InvariantCulture);
splitName.prefix = splitArray[0].Substring(0, splitArray[0].Length - 1);
splitName.letter = splitArray[0].Substring(splitArray[0].Length - 1,1);
}
}
public static List<List<SplitFileName>> GetGroups()
{
return names.OrderBy(x => x.letter).GroupBy(x => new { date = x.date, prefix = x.prefix })
.Where(x => string.Join(",",x.Select(y => y.letter)) == "a,b,c,d")
.Select(x => x.ToList())
.ToList();
}
}
}
With everyone's help, I solved it too. This is what I'm going with because it's the most maintainable for me but the solutions were so smart!!! Thanks everyone for your help.
private void CheckFiles()
{
var fileGroups = new[] {
new [] { "FG1A", "FG1B", "FG1C", "FG1D" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D", "FG2E" } };
List<string> fileDates = new List<string>();
List<string> pfiles = new List<string>();
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
// check if a date has all the files
foreach (string fd in fileDates)
{
int fgCount = 0;
// for each file group
foreach (Array masterfg in fileGroups)
{
foreach (string fg in masterfg)
{
// see if all the files are there
bool foundIt = false;
string finder = fg + fd;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
pfiles.Add(fn);
}
}
fgCount++;
}
if (fgCount == pfiles.Count())
{
foreach (string fn in pfiles)
{
procFileList.Add(fn);
}
pfiles.Clear();
}
else
{
pfiles.Clear();
}
}
}
return;
}

fastest starts with search algorithm

I need to implement a search algorithm which only searches from the start of the string rather than anywhere within the string.
I am new to algorithms but from what I can see it seems as though they go through the string and find any occurrence.
I have a collection of strings (over 1 million) which need to be searched everytime the user types a keystroke.
EDIT:
This will be an incremental search. I currently have it implemented with the following code and my searches are coming back ranging between 300-700ms from over 1 million possible strings. The collection isnt ordered but there is no reason it couldnt be.
private ICollection<string> SearchCities(string searchString) {
return _cityDataSource.AsParallel().Where(x => x.ToLower().StartsWith(searchString)).ToArray();
}
I've adapted the code from this article from Visual Studio Magazine that implements a Trie.
The following program demonstrates how to use a Trie to do fast prefix searching.
In order to run this program, you will need a text file called "words.txt" with a large list of words. You can download one from Github here.
After you compile the program, copy the "words.txt" file into the same folder as the executable.
When you run the program, type a prefix (such as prefix ;)) and press return, and it will list all the words beginning with that prefix.
This should be a very fast lookup - see the Visual Studio Magazine article for more details!
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace ConsoleApp1
{
class Program
{
static void Main()
{
var trie = new Trie();
trie.InsertRange(File.ReadLines("words.txt"));
Console.WriteLine("Type a prefix and press return.");
while (true)
{
string prefix = Console.ReadLine();
if (string.IsNullOrEmpty(prefix))
continue;
var node = trie.Prefix(prefix);
if (node.Depth == prefix.Length)
{
foreach (var suffix in suffixes(node))
Console.WriteLine(prefix + suffix);
}
else
{
Console.WriteLine("Prefix not found.");
}
Console.WriteLine();
}
}
static IEnumerable<string> suffixes(Node parent)
{
var sb = new StringBuilder();
return suffixes(parent, sb).Select(suffix => suffix.TrimEnd('$'));
}
static IEnumerable<string> suffixes(Node parent, StringBuilder current)
{
if (parent.IsLeaf())
{
yield return current.ToString();
}
else
{
foreach (var child in parent.Children)
{
current.Append(child.Value);
foreach (var value in suffixes(child, current))
yield return value;
--current.Length;
}
}
}
}
public class Node
{
public char Value { get; set; }
public List<Node> Children { get; set; }
public Node Parent { get; set; }
public int Depth { get; set; }
public Node(char value, int depth, Node parent)
{
Value = value;
Children = new List<Node>();
Depth = depth;
Parent = parent;
}
public bool IsLeaf()
{
return Children.Count == 0;
}
public Node FindChildNode(char c)
{
return Children.FirstOrDefault(child => child.Value == c);
}
public void DeleteChildNode(char c)
{
for (var i = 0; i < Children.Count; i++)
if (Children[i].Value == c)
Children.RemoveAt(i);
}
}
public class Trie
{
readonly Node _root;
public Trie()
{
_root = new Node('^', 0, null);
}
public Node Prefix(string s)
{
var currentNode = _root;
var result = currentNode;
foreach (var c in s)
{
currentNode = currentNode.FindChildNode(c);
if (currentNode == null)
break;
result = currentNode;
}
return result;
}
public bool Search(string s)
{
var prefix = Prefix(s);
return prefix.Depth == s.Length && prefix.FindChildNode('$') != null;
}
public void InsertRange(IEnumerable<string> items)
{
foreach (string item in items)
Insert(item);
}
public void Insert(string s)
{
var commonPrefix = Prefix(s);
var current = commonPrefix;
for (var i = current.Depth; i < s.Length; i++)
{
var newNode = new Node(s[i], current.Depth + 1, current);
current.Children.Add(newNode);
current = newNode;
}
current.Children.Add(new Node('$', current.Depth + 1, current));
}
public void Delete(string s)
{
if (!Search(s))
return;
var node = Prefix(s).FindChildNode('$');
while (node.IsLeaf())
{
var parent = node.Parent;
parent.DeleteChildNode(node.Value);
node = parent;
}
}
}
}
A couple of thoughts:
First, your million strings need to be ordered, so that you can "seek" to the first matching string and return strings until you no longer have a match...in order (seek via C# List<string>.BinarySearch, perhaps). That's how you touch the least number of strings possible.
Second, you should probably not try to hit the string list until there's a pause in input of at least 500 ms (give or take).
Third, your queries into the vastness should be async and cancelable, because it's certainly going to be the case that one effort will be superseded by the next keystroke.
Finally, any subsequent query should first check that the new search string is an append of the most recent search string...so that you can begin your subsequent seek from the last seek (saving lots of time).
I suggest using linq.
string x = "searchterm";
List<string> y = new List<string>();
List<string> Matches = y.Where(xo => xo.StartsWith(x)).ToList();
Where x is your keystroke search text term, y is your collection of strings to search, and Matches is the matches from your collection.
I tested this with the first 1 million prime numbers, here is the code adapted from above:
Stopwatch SW = new Stopwatch();
SW.Start();
string x = "2";
List<string> y = System.IO.File.ReadAllText("primes1.txt").Split(' ').ToList();
y.RemoveAll(xo => xo == " " || xo == "" || xo == "\r\r\n");
List <string> Matches = y.Where(xo => xo.StartsWith(x)).ToList();
SW.Stop();
Console.WriteLine("matches: " + Matches.Count);
Console.WriteLine("time taken: " + SW.Elapsed.TotalSeconds);
Console.Read();
Result is:
matches: 77025
time taken: 0.4240604
Of course this is testing against numbers and I don't know whether linq converts the values before, or if numbers make any difference.

Reading respective lines of multiple text files to be combined

Firstly, I'm new to C# and I'm having a hard time figuring out how I'd go about doing this.
Basically, I have multiple text files all with their own types of data. My aim is to read the first line of each of these files and combine them into one string so that I can sort them later by their respective days.
For example, in the first line of each file there could be the values...
File 1: 16/02/15
File 2: Monday
File 3: 75.730
File 4: 0.470
File 5: 75.260
File 6: 68182943
So I'd like to combine them in a string like so "16/02/15 Monday 75.730 0.470 75.260 68182943"
I'd also want to do this for the second, third, fourth line etc. There are a total of 144 entries or lines.
Here is the code I have so far. I'm unsure if I'm on the right track.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace BankAlgorithms
{
class Algorithms
{
static void Main(string[] args)
{
//Saves each individual text file into their own string arrays.
string[] Day = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\Day.txt");
string[] Date = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\Date.txt");
string[] Close = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\SH1_Close.txt");
string[] Diff = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\SH1_Diff.txt");
string[] Open = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\SH1_Open.txt");
string[] Volume = File.ReadAllLines(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files\SH1_Volume.txt");
//Lists all files currently stored within the directory
string[] bankFiles = Directory.GetFiles(#"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files");
Console.WriteLine("Bank Files currently saved within directory:\n");
foreach (string name in bankFiles)
{
Console.WriteLine(name);
}
Console.WriteLine("\nSelect the day you wish to view the data of (Monday-Friday). To view a grouped \nlist of all days, enter \"Day\"\n");
string selectedArray = Console.ReadLine();
if (selectedArray == "Day")
{
Console.WriteLine("Opening Day File...");
Console.WriteLine("\nDays grouped up in alphabetical order\n");
var sort = from s in Day
orderby s
select s;
foreach (string c in sort)
{
Console.WriteLine(c);
}
}
Console.ReadLine();
}
}
}
So this might be a little more than you strictly need, but I think it'll be robust, quite flexible and be able to handle huge files if need be.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication2
{
class Program
{
private static void Main(string[] args)
{
// const string folder = #"C:\Users\computing\Desktop\algorithms\CMP1124M_Assigment_Files";
const string folder = #"C:\Temp\SO";
var filenames = new[] { #"Date.txt", #"Day.txt", #"SH1_Close.txt", #"SH1_Diff.txt", #"SH1_Open.txt", #"SH1_Volume.txt" };
var dataCombiner = new DataCombiner(folder, filenames);
var stockParser = new StockParser();
foreach (var stock in dataCombiner.GetCombinedData(stockParser.Parse)) //can also use where clause here
{
if (ShowRow(stock))
{
var outputText = stock.ToString();
Console.WriteLine(outputText);
}
}
Console.ReadLine();
}
private static bool ShowRow(Stock stock)
{
//use input from user etc...
return (stock.DayOfWeek == "Tuesday" || stock.DayOfWeek == "Monday")
&& stock.Volume > 1000
&& stock.Diff < 10; // etc
}
}
internal class DataCombiner
{
private readonly string _folder;
private readonly string[] _filenames;
public DataCombiner(string folder, string[] filenames)
{
_folder = folder;
_filenames = filenames;
}
private static IEnumerable<string> GetFilePaths(string folder, params string[] filenames)
{
return filenames.Select(filename => Path.Combine(folder, filename));
}
public IEnumerable<T> GetCombinedData<T>(Func<string[], T> parserMethod) where T : class
{
var filePaths = GetFilePaths(_folder, _filenames).ToArray();
var files = filePaths.Select(filePath => new StreamReader(filePath)).ToList();
var lineCounterFile = new StreamReader(filePaths.First());
while (lineCounterFile.ReadLine() != null)// This can be replaced with a simple for loop if the files will always have a fixed number of rows
{
var rawData = files.Select(file => file.ReadLine()).ToArray();
yield return parserMethod(rawData);
}
}
}
internal class Stock
{
public DateTime Date { get; set; }
public string DayOfWeek { get; set; }
public double Open { get; set; }
public double Close { get; set; }
public double Diff { get; set; }
public int Volume { get; set; }
public override string ToString()
{
//Whatever format you want
return string.Format("{0:d} {1} {2} {3} {4} {5}", Date, DayOfWeek, Close, Diff, Open, Volume);
}
}
internal class StockParser
{
public Stock Parse(string[] rawData)
{
//TODO: Error handling required here
var stock = new Stock();
stock.Date = DateTime.Parse(rawData[0]);
stock.DayOfWeek = rawData[1];
stock.Close = double.Parse(rawData[2]);
stock.Diff = double.Parse(rawData[3]);
stock.Open = double.Parse(rawData[4]);
stock.Volume = int.Parse(rawData[5]);
return stock;
}
public string ParseToRawText(string[] rawData)
{
return string.Join(" ", rawData);
}
}
}
PS:
Instead of reading it from the file, I'd rather also calculate the DayOfWeek from the Date.
Also be careful when parsing dates from a different locale (eg. USA vs UK).
If you have an option I'd only use the ISO 8601 datetime format.
Access your file strings from a collection, use this code to read from each file and use a StringBuilder to build your string.
Read only the first few lines of text from a file
var builder = new StringBuilder();
foreach(var file in fileList)
{
using (StreamReader reader = new StreamReader(file))
{
builder.Append(reader.ReadLine());
}
}
return builder.ToString();
You could use following approach: put all in an string[][] first, then it's easier:
string[][] all = { Day, Date, Close, Diff, Open, Volume };
To get the minimum length of all:
int commonRange = all.Min(arr => arr.Length);
Now this is all you need:
string[] merged = Enumerable.Range(0, commonRange)
.Select(i => string.Join(" ", all.Select(arr => arr[i])))
.ToArray();
This is similar to a for-loop from 0 to commonRange where you access all arrays with the same index and use String.Join to get a single string from all files' lines.
Since you have commented that you want to merge only the lines of a specific day:
var lineIndexes = Day.Take(commonRange)
.Select((line, index) => new { line, index })
.Where(x => x.line.TrimStart().StartsWith("Monday", StringComparison.InvariantCultureIgnoreCase))
.Select(x => x.index);
string[] merged = lineIndexes
.Select(i => string.Join(" ", all.Select(arr => arr[i])))
.ToArray();

WPF list filtering

I am new to WPF so this is probably an easy question. I have an app that reads some words from a csv file and stores them in a list of strings. What I am trying to do is parametise this list to show the most popular words in my list. So in my UI I want to have a text box which when I enter a number e.g. 5 would filter the original list leaving only the 5 most popular (frequent) words in the new list. Can anyone assist with this final step? Thanks -
public class VM
{
public VM()
{
Words = LoadWords(fileList);
}
public IEnumerable<string> Words { get; private set; }
string[] fileList = Directory.GetFiles(#"Z:\My Documents\", "*.csv");
private static IEnumerable<string> LoadWords(String[] fileList)
{
List<String> words = new List<String>();
//
if (fileList.Length == 1)
{
try
{
foreach (String line in File.ReadAllLines(fileList[0]))
{
string[] rows = line.Split(',');
words.AddRange(rows);
}
}
catch (Exception ex)
{
System.Windows.MessageBox.Show(ex.Message, "Problem!");
}
}
else
{
System.Windows.MessageBox.Show("Please ensure that you have ONE read file in the source folder.", "Problem!");
}
return words;
}
}
A LINQ query that groups by the word and orders by the count of that word descending should do it. Try this
private static IEnumerable<string> GetTopWords(int Count)
{
var popularWords = (from w in words
group w by w
into grp
orderby grp.Count() descending
select grp.Key).Take(Count).ToList();
return popularWords;
}
You could use CollectionViewSource.GetDefaultView(viewModel.Words), which returns ICollectionView.
ICollectionView exposes Filter property of type Predicate<object>, that you could involve for filtering.
So the common scenario looks like:
ViewModel exposes property PopularCount, that is binded to some textbox in View.
ViewModel listens for PopularCount property's changing.
When notification occured, model obtains ICollectionView for viewModel.Words collection and set up Filter property.
You could find working sample of Filter property usage here. If you get stuck with code, let me know.
Instead of reading all the words into the list and then sorting it based on the frequency, a cleaner approach would be to create a custom class MyWord that stores the word and the frequency. While reading the file, the frequency of the word can be incremented. The class can implement IComparable<T> to compare the words based on the frequency.
public class MyWord : IComparable<MyWord>
{
public MyWord(string word)
{
this.Word = word;
this.Frequency = 0;
}
public MyWord(string word, int frequency)
{
this.Word = word;
this.Frequency = frequency;
}
public string Word { get; private set;}
public int Frequency { get; private set;}
public void IncrementFrequency()
{
this.Frequency++;
}
public void DecrementFrequency()
{
this.Frequency--;
}
public int CompareTo(MyWord secondWord)
{
return this.Frequency.CompareTo(secondWord.Frequency);
}
}
The main class VM would have these members,
public IEnumerable<MyWord> Words { get; private set; }
private void ShowMostPopularWords(int numberOfWords)
{
SortMyWordsDescending();
listBox1.Items.Clear();
for (int i = 0; i < numberOfWords; i++ )
{
listBox1.Items.Add(this.Words.ElementAt(i).Word + "|" + this.Words.ElementAt(i).Frequency);
}
}
And the call to ShowMostPopularWords()
private void Button_Click(object sender, RoutedEventArgs e)
{
int numberOfWords;
if(Int32.TryParse(textBox1.Text, NumberStyles.Integer, CultureInfo.CurrentUICulture, out numberOfWords))
{
ShowMostPopularWords(numberOfWords);
}
}
I'm not sure if grouping and ordering of the 'words' list is what you want but if yes this could be a way of doing it:
int topN = 3;
List<string> topNWords = new List<string>();
string[] words = new string[] {
"word5",
"word1",
"word1",
"word1",
"word2",
"word2",
"word2",
"word3",
"word3",
"word4",
"word5",
"word6",
};
// [linq query][1]
var wordGroups = from s in words
group s by s into g
select new { Count = g.Count(), Word = g.Key };
for (int i = 0; i < Math.Min(topN, wordGroups.Count()); i++)
{
// (g) => g.Count is a [lambda expression][2]
// .OrderBy and Reverse are IEnumerable extension methods
var element = wordGroups.OrderBy((g) => g.Count).Reverse().ElementAt(i);
topNWords.Add(element.Count + " - " + element.Word);
}
Thsi could be made much shorter by using ordering in the linq select clause but I wished to introduce you to inline lambdas and ienumerable extensions too.
The short version could be:
topNWords = (from s in words
group s by s
into g
orderby g.Count() descending
select g.Key).Take(Math.Min(topN, g.Count()).ToList();

Please critique my class

I've taken a few school classes along time ago on and to be honest i never really understood the concept of classes. I recently "got back on the horse" and have been trying to find some real world application for creating a class.
you may have seen that I'm trying to parse a lot of family tree data that is in an very old and antiquated format called gedcom
I created a Gedcom Reader class to read in the file , process it and make it available as two lists that contain the data that i found necessary to use
More importantly to me is i created a class to do it so I would very much like to get the experts here to tell me what i did right and what i could have done better ( I wont say wrong because the thing works and that's good enough for me)
Class:
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
namespace GedcomReader
{
class Gedcom
{
private string GedcomText = "";
public struct INDI
{
public string ID;
public string Name;
public string Sex;
public string BDay;
public bool Dead;
}
public struct FAM
{
public string FamID;
public string Type;
public string IndiID;
}
public List<INDI> Individuals = new List<INDI>();
public List<FAM> Families = new List<FAM>();
public Gedcom(string fileName)
{
using (StreamReader SR = new StreamReader(fileName))
{
GedcomText = SR.ReadToEnd();
}
ReadGedcom();
}
private void ReadGedcom()
{
string[] Nodes = GedcomText.Replace("0 #", "\u0646").Split('\u0646');
foreach (string Node in Nodes)
{
string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
if (SubNode[0].Contains("INDI"))
{
Individuals.Add(ExtractINDI(SubNode));
}
else if (SubNode[0].Contains("FAM"))
{
Families.Add(ExtractFAM(SubNode));
}
}
}
private FAM ExtractFAM(string[] Node)
{
string sFID = Node[0].Replace("# FAM", "");
string sID = "";
string sType = "";
foreach (string Line in Node)
{
// If node is HUSB
if (Line.Contains("1 HUSB "))
{
sType = "PAR";
sID = Line.Replace("1 HUSB ", "").Replace("#", "").Trim();
}
//If node for Wife
else if (Line.Contains("1 WIFE "))
{
sType = "PAR";
sID = Line.Replace("1 WIFE ", "").Replace("#", "").Trim();
}
//if node for multi children
else if (Line.Contains("1 CHIL "))
{
sType = "CHIL";
sID = Line.Replace("1 CHIL ", "").Replace("#", "");
}
}
FAM Fam = new FAM();
Fam.FamID = sFID;
Fam.Type = sType;
Fam.IndiID = sID;
return Fam;
}
private INDI ExtractINDI(string[] Node)
{
//If a individual is found
INDI I = new INDI();
if (Node[0].Contains("INDI"))
{
//Create new Structure
//Add the ID number and remove extra formating
I.ID = Node[0].Replace("#", "").Replace(" INDI", "").Trim();
//Find the name remove extra formating for last name
I.Name = Node[FindIndexinArray(Node, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
//Find Sex and remove extra formating
I.Sex = Node[FindIndexinArray(Node, "SEX")].Replace("1 SEX ", "").Trim();
//Deterine if there is a brithday -1 means no
if (FindIndexinArray(Node, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BDay = Node[FindIndexinArray(Node, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
// deterimin if there is a death tag will return -1 if not found
if (FindIndexinArray(Node, "1 DEAT ") != -1)
{
//convert Y or N to true or false ( defaults to False so no need to change unless Y is found.
if (Node[FindIndexinArray(Node, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
{
//set death
I.Dead = true;
}
}
}
return I;
}
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
}
}
Implementation:
using System;
using System.Windows.Forms;
using GedcomReader;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
string path = #"C:\mostrecent.ged";
string outpath = #"C:\gedcom.txt";
Gedcom GD = new Gedcom(path);
GraphvizWriter GVW = new GraphvizWriter("Family Tree");
foreach(Gedcom.INDI I in GD.Individuals)
{
string color = "pink";
if (I.Sex == "M")
{
color = "blue";
}
GVW.ListNode(I.ID, I.Name, "filled", color, "circle");
if (I.ID == "ind23800")
{MessageBox.Show("stop");}
//"ind23800" [ label="Sarah Mandley",shape="circle",style="filled",color="pink" ];
}
foreach (Gedcom.FAM F in GD.Families)
{
if (F.Type == "par")
{
GVW.ConnNode(F.FamID, F.IndiID);
}
else if (F.Type =="chil")
{
GVW.ConnNode(F.IndiID, F.FamID);
}
}
string x = GVW.SB.ToString();
GVW.SaveFile(outpath);
MessageBox.Show("done");
}
}
I am particularly interested in if anything could be done about the structures i don't know if how i use them in the implementation is the greatest but again it works
Thanks alot
Quick thoughts:
Nested types should not be visible.
ValueTypes (structs) should be immutable.
Fields (class variables) should not be public. Expose them via properties instead.
Check passed arguments for invalid values, like null.
It might be more readable. It's hard to read and understand.
You may study SOLID principles (http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod)
Robert C. Martin gave good presentation on Oredev 2008 about clean code (http://www.oredev.org/topmenu/video/agile/robertcmartincleancodeiiifunctions.4.5a2d30d411ee6ffd2888000779.html)
Some recomended books to read about code readability:
Kent Beck "Implemetation patterns"
Robert C Martin "Clean Code" Robert C
Martin "Agile Principles, Patterns
and Practices in C#"
I suggest you check this place out: http://refactormycode.com/.
For some quick things, your naming is the biggest thing I would start to change.
No need to use ALL-CAPS or abbreviated terms.
Also, FxCop will help with a lot of suggested changes. For example, FindIndexinArray would be named FindIndexInArray.
EDIT:
I don't know if this is a bug in your code or by-design, but in FindIndexinArray, you don't break from your loop once you find a match. Do you want the first (break) or last (no break) match in the array?

Categories