Counting names in CSV files - c#

I am trying to write a program for a school project that will read a CSV file containing a name on each line and output each name and the number of times it occurs in a list box. I would prefer for it not to be preset for a specific name, but I guess that would work also. So far I have this, but now I'm stuck. The CSV file will have a name on each line and also have a comma after each name. Any help would be great, thanks.
This is what I have so far:
string[] csvArray;
string line;
StreamReader reader;
OpenFileDialog openFileDialog = new OpenFileDialog();
//set filter for dialog control
const string FILTER = "CSV Files|*.csv|All Files|*.*";
openFileDialog.Filter = FILTER;
//if user opens file and clicks ok
if (openFileDialog.ShowDialog() == DialogResult.OK)
{
//open input file
reader = File.OpenText(openFileDialog.FileName);
//while not end of stream
while (!reader.EndOfStream)
{
//read line from file
line = reader.ReadLine().ToLower();
//split values
csvArray = line.Split(',');

Using LINQ, we can do the following:
static IEnumerable<Tuple<int,string>> CountOccurences(IEnumerable<string> data)
{
return data.GroupBy(t => t).Select(t => Tuple.Create(t.Count(),t.Key));
}
Test:
var strings = new List<string>();
strings.Add("John");
strings.Add("John");
strings.Add("John");
strings.Add("Peter");
strings.Add("Doe");
strings.Add("Doe");
foreach (var item in CountOccurences(strings)) {
Console.WriteLine (String.Format("{0} = {1}", item.Item2, item.Item1));
}
John = 3
Peter = 1
Doe = 2
To use in your case:
string filePath = @"c:\myfile.txt";
foreach (var item in CountOccurences(File.ReadAllLines(filePath).Select(t => t.Split(',').First())))
Console.WriteLine (String.Format("{0} = {1}", item.Item2, item.Item1));

You can use a dictionary to store the number of occurrences of each name:
Dictionary<string,int> NameOcur=new Dictionary<string,int>();
...
while (!reader.EndOfStream)
{
//read line from file
line = reader.ReadLine().ToLower();
//split values
csvArray = line.Split(',');
if (NameOcur.ContainsKey(csvArray[0]))
{
///Name exists in Dictionary increase count
NameOcur[csvArray[0]]++;
}
else
{
//Does not exist add with value 1
NameOcur.Add(csvArray[0],1);
}
}
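The dictionary approach above can also be pulled out of the form code into a small helper, which makes it easier to try on its own. This is only a sketch: the class and method names (NameCounter, CountNames) are made up for illustration, and it assumes the name is whatever comes before the first comma and that "John" and "john" should count as the same name.

```csharp
using System;
using System.Collections.Generic;

static class NameCounter
{
    // Counts how often each name occurs; the name is taken to be the text before the first comma.
    public static Dictionary<string, int> CountNames(IEnumerable<string> lines)
    {
        // Assumption: names are compared case-insensitively, so no ToLower() call is needed.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string name = line.Split(',')[0].Trim();
            if (name.Length == 0) continue; // skip blank lines

            int current;
            counts.TryGetValue(name, out current); // current stays 0 when the name is new
            counts[name] = current + 1;
        }
        return counts;
    }
}
```

In the form, the result could then be shown with something like foreach (var pair in NameCounter.CountNames(File.ReadLines(openFileDialog.FileName))) listBox1.Items.Add(pair.Key + " = " + pair.Value); where listBox1 is a placeholder for whatever the list box is actually called.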

Related

write a comma delimited list to a data grid view

I'm trying to get a CSV file into a data table, but there are some things in the CSV file that I want to omit from the data table, so I wrote it to a list first.
The CSV files that I will be using the software for have different sections in them, so I then split the whole list into separate lists for those sections.
After all that was achieved, I needed to skip some lines in each list, and I wrote the final form I was happy with to a second set of lists, one for each list in the first set.
Now I've hit a wall: I need to write each of the lists to its respective data grid.
public partial class Form1 : Form
{
String filePath = "";
//list set 1 = list box
List<String> lines = new List<String>();
List<String> accountList = new List<String>();
List<String> statementList = new List<String>();
List<String> summaryList = new List<String>();
List<String> transactionList = new List<String>();
//list set 2 = dgv
List<String> accountList2 = new List<String>();
List<String> statementList2 = new List<String>();
List<String> summaryList2 = new List<String>();
List<String> transactionList2 = new List<String>();
public Form1()
{
InitializeComponent();
}
private void btn_find_Click(object sender, EventArgs e)
{
try
{
using (OpenFileDialog fileDialog = new OpenFileDialog()
{ Filter = "CSV|*.csv", ValidateNames = true, Multiselect = false })
if (fileDialog.ShowDialog() == DialogResult.OK)
{
String fileName = fileDialog.FileName;
filePath = fileName;
}
try
{
if (File.Exists(filePath))
{
lines = File.ReadAllLines(filePath).ToList();
foreach (String line in lines)
{
String addLine = line.Replace("'", "");
String addLine2 = addLine.Replace("\"", "");
String str = line.Substring(0, 1);
int num = int.Parse(str);
if (addLine2.Length > 1)
{
String addLine3 = addLine2.Substring(2);
switch (num)
{
case 2:
accountList.Add(addLine3);
break;
case 3:
statementList.Add(addLine3);
break;
case 4:
summaryList.Add(addLine3);
break;
case 5:
transactionList.Add(addLine3);
break;
}
}
}
}
else
{
MessageBox.Show("Invalid file chosen, choose an appropriate CSV file and try again.");
}
transactionLB.DataSource = transactionList;
//var liness = transactionList;
//foreach (string line in liness.Skip(2))
// transactionList2.Add(line);
//Console.WriteLine(transactionList2);
//var source = new BindingSource();
//source.DataSource = transactionList2;
//trans_dgv.DataSource = source;
accountLB.DataSource = accountList;
summaryLB.DataSource = summaryList;
statementLB.DataSource = statementList;
}
catch (Exception)
{
MessageBox.Show("Cannot load CSV file, Ensure that a valid CSV file is selected and try again.");
}
}
catch (Exception)
{
MessageBox.Show("Cannot open File Explorer, Something is wrong :(");
}
}
}
EDIT 1:
The table has the following columns for the transaction list (each of the lists has different columns):
'Number' , 'Date' , 'Description1' , 'Description2' , 'Description3' , 'Amount' , 'Balance' , 'Accrued Charges'
An example of the data in a line of the transaction list:
9, 02 Sep, Petrol Card Purchase, Shell Kempton Park, 968143*7188 30 Aug, -714.45, -10661.88, 5.5
Some lines do contain null values.
If I understand correctly, it seems like you're wanting to get the value of the strings in your string list to appear in your DataGridViews. This can be a little tricky because the DataGridView needs to know which property to display and strings only have the Length property (which probably isn't what you're looking for). There are a lot of ways to go about getting the data you want into the DataGridView. For example you could use DataTables and choose which column you want displayed in the DataGridView. If you want to stick with using string lists, I think you could get this to work by modifying your DataSource line to look something like this:
trans_dgv.DataSource = transactionList.Select(x => new { Value = x }).ToList();
I hope this helps! Let me know if I've misunderstood your question. Thanks!
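The DataTable route mentioned above might look roughly like the sketch below. The column names come from the question's EDIT 1; the class and method names (TransactionTable, Build) are invented for illustration, and the code assumes a plain comma split is good enough, i.e. that no field contains a comma of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class TransactionTable
{
    // Column names taken from the question's description of the transaction list.
    static readonly string[] Columns =
    {
        "Number", "Date", "Description1", "Description2",
        "Description3", "Amount", "Balance", "Accrued Charges"
    };

    public static DataTable Build(IEnumerable<string> lines)
    {
        var table = new DataTable();
        foreach (string column in Columns)
            table.Columns.Add(column, typeof(string));

        foreach (string line in lines)
        {
            // Assumption: fields never contain commas themselves.
            string[] fields = line.Split(',');
            DataRow row = table.NewRow();
            // Copy at most one field per column; missing trailing fields stay DBNull.
            for (int i = 0; i < Columns.Length && i < fields.Length; i++)
                row[i] = fields[i].Trim();
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Binding would then be a single line such as trans_dgv.DataSource = TransactionTable.Build(transactionList.Skip(2)); which reuses the grid name and the Skip(2) from the commented-out code in the question. The other lists would each need their own column set.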

Data separated by column in C# from a text file

File txt
I have this text file and need to display its data organized in a table.
Note: it needs to be a C# console app.
This is all I have so far:
StreamReader sr = new StreamReader(@"filepatch.txt");
string ler = sr.ReadLine();
string linha = ";";
int cont = 0;
while((linha = sr.ReadLine())!= null)
{
string col = linha.Split(';')[2];
cont++;
Console.WriteLine("{0} : {1}", cont, linha);
}
Try this to get the file text:
var lines = System.IO.File.ReadAllLines(@"filepatch.txt");
Then you can use the returned string[] to carry out the rest of your logic.
foreach(var line in lines)
{
string[] cols = line.Split(';');
// Your logic here.
}
Cheers!
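For the "organized in a table" part, one possible way to fill in the loop body is to measure the widest cell in each column and pad the others to match. The sketch below is an assumption-heavy illustration: the file contents are not shown in the question, so it simply treats every line as semicolon-separated cells, and the names ColumnFormatter and Format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ColumnFormatter
{
    // Returns the lines with every column padded to the width of its widest cell.
    public static List<string> Format(IEnumerable<string> lines, char separator = ';')
    {
        List<string[]> rows = lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => l.Split(separator).Select(c => c.Trim()).ToArray())
            .ToList();
        if (rows.Count == 0) return new List<string>();

        // Find the widest cell in each column.
        int columnCount = rows.Max(r => r.Length);
        var widths = new int[columnCount];
        foreach (string[] row in rows)
            for (int i = 0; i < row.Length; i++)
                widths[i] = Math.Max(widths[i], row[i].Length);

        // Pad every cell to its column width and join the cells with " | ".
        return rows
            .Select(r => string.Join(" | ", r.Select((cell, i) => cell.PadRight(widths[i]))).TrimEnd())
            .ToList();
    }
}
```

In a console app this could be used as foreach (string row in ColumnFormatter.Format(lines)) Console.WriteLine(row); with lines being the array returned by ReadAllLines.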

How do I filter Directory.GetFiles() by a numeric range when file names are listed in numeric order?

I want to filter which files are returned by the Directory.GetFiles() function. The files in the directory are all text files named with 6-digit numbers in incremental order (for example: "200501.txt", "200502.txt", "200503.txt", and so on). I would like to enter a "Starting Invoice Number" and an "Ending Invoice Number" through 2 text box controls to return only the files within that range.
The current code is as follows...
using (var fbd = new FolderBrowserDialog())
{
DialogResult result = fbd.ShowDialog();
if (result == DialogResult.OK && !string.IsNullOrWhiteSpace(fbd.SelectedPath))
{
string[] fileDir = Directory.GetFiles(fbd.SelectedPath);
string[] files = fileDir;
foreach (string loopfile in files)
{
int counter = 0;
string line;
//Gets invoice number from text file name
//This strips all unnecessary strings out of the directory and file name
//need to change substring 32 to depending directory using
string loopfileName = loopfile.Substring(32);
string InvoiceNumberLong = Path.GetFileName(loopfile);
string InvoiceNumber = InvoiceNumberLong.Substring(0,(InvoiceNumberLong.Length - 4)).ToString();
var controlCount = new List<string>();
var EndCount = new List<string>();
//Read through text file line by line to find all instances of "control" and "------" string
//adds all line position of these strings to lists
System.IO.StreamReader file = new System.IO.StreamReader(loopfile);
while ((line = file.ReadLine()) != null)
{
if (line.Contains("Control"))
{
controlCount.Add(counter.ToString());
}
if (line.Contains("------"))
{
EndCount.Add(counter.ToString());
}
counter++;
}
}
}
}
Thank you in advance!
You can't use the built-in filter that the GetFiles method provides; it can only filter by wildcards. You can do it with some LINQ:
var files = Directory.EnumerateFiles(path, "*.txt")
.Where(d => int.TryParse(Path.GetFileNameWithoutExtension(d), out var value) && value > min && value < max);
Note: this uses the C# 7 out var syntax, but it can be converted for earlier versions if you are not using the latest. As written, the comparison excludes min and max themselves.
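If the starting and ending invoice numbers should themselves be included, and the compiler predates C# 7, the same filter could be written along these lines. This is a sketch, not the answerer's code: InvoiceFilter and InRange are invented names, and the inclusive bounds are an assumption about what "within that range" means.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class InvoiceFilter
{
    // Keeps the paths whose file name (without extension) is a number in [first, last].
    public static List<string> InRange(IEnumerable<string> paths, int first, int last)
    {
        return paths.Where(p =>
        {
            int number; // declared up front instead of using "out var"
            return int.TryParse(Path.GetFileNameWithoutExtension(p), out number)
                && number >= first && number <= last;
        }).ToList();
    }
}
```

It could be called as InvoiceFilter.InRange(Directory.EnumerateFiles(fbd.SelectedPath, "*.txt"), start, end), where start and end are the values parsed from the two text boxes.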

How to efficiently cross reference 2 text files? | Improve my code

Below an outline of what my code does:
Read TextFileA which has 150k lines.
Read TextFileB which has 150k lines and is a cross reference list for TextFileA.
.Split both text files and match specified elements.
Finally, output a 3rd text file which will contain values from both TextFileA and TextFileB.
The below code runs well until about 13,000 lines in and then the program becomes exceedingly slow.
Could someone explain why the program becomes exponentially slower and how I could improve on this code? Thanks.
private void BT_Xref_Click(object sender, EventArgs e)
{
//grabs file path from text box
string ManifestPath = TB_Manifest.Text;
//grabs parent directory from file path
string directoryName = Path.GetDirectoryName(ManifestPath);
//creates a new folder for the final output text file
string pathString = Path.Combine(directoryName, "Final Index");
Directory.CreateDirectory(pathString);
//list for matching text lines which will eventually be output to the final text file
List<string> NewData = new List<string>();
//initializes StreamReader for the first text file
StreamReader ManifestReader = new StreamReader(ManifestPath);
String[] ManifestArray = File.ReadAllLines(ManifestPath);
List<string> RemoveManifest = new List<string>(ManifestArray);
//initializes StreamReader for the second text file
StreamReader OutputReader = new StreamReader(TB_Complete.Text);
String[] OutputArray = File.ReadAllLines(TB_Complete.Text);
List<string> RemoveOutput = new List<string>(OutputArray);
//initializes a count which decides at what point a text file should be created
int shortcount = 0;
//.ReadLine is initialized to ignore the first line in both text files
string ManifestLine = ManifestReader.ReadLine();
string OutputLine = OutputReader.ReadLine();
foreach (string mfile in ManifestArray)
{
ManifestLine = ManifestReader.ReadLine();
string ManifestElement = ManifestLine.Split(',')[6];
string ManifestElement2 = ManifestLine.Split(',')[5];
//value to be retreived and output to final text file
string ManifestElementDate = ManifestElement2.Replace("/", "-");
//value to be compared with the other text file
string ManifestNoExt = Regex.Replace(ManifestElement, ("(\\.\\w+$)"),"");
//resets OutpuReader reader to ensure no lines are being skipped
OutputReader.BaseStream.Position = 0;
//counting the mfile position in the ManifestArray
//int removeIndex = Array.IndexOf(ManifestArray, mfile);
//remove by resising the array
//Array.Resize(ref ManifestArray, ManifestArray.Length - 1);
foreach (string ofile in OutputArray)
{
OutputLine = OutputReader.ReadLine();
//value to be comapred with other text file
string OutputElement = OutputLine.Split('|')[2];
//if values equal then add the specified line of text to the list.
if (ManifestNoExt.Equals(OutputElement))
{
NewData.Add(OutputLine + "|" + ManifestElementDate);
RemoveManifest.RemoveAll(item => item == ManifestLine);
if (NewData.Count == 1000)
{
//if youve reached the count then output files into a new text file
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
NewData.Clear();
}
break;
}
}
}
//once all line of text have been searched combine all text files in directory
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
{
foreach (var file in SplitTextFiles)
{
using (var input = File.OpenRead(file))
{
input.CopyTo(FinalIndexFile);
}
File.Delete(file);
}
}
//File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
}
You have an O(nm) algorithm here, and assuming that n and m are the same, it's actually O(n^2). That's not so good and is why it's slowing to a crawl: for 150k rows in each file, you are looking at up to 22,500,000,000 iterations of the inner loop. I'm not entirely certain what your code is trying to do, but based on the condition if (ManifestNoExt.Equals(OutputElement)), I think you can reduce the complexity drastically as follows:
Read in TextFileA, store values into a Dictionary based on ManifestNoExt as Key and mFile as value.
Next read in TextFileB and iterate over all rows in B and do a lookup in the dictionary that was constructed.
This will give you an algorithm that is O(n) + O(m), which will be fast.
Also, I am not sure why you are reading in the entire files and then reading them in again inside the loops (the contents of ManifestArray and OutputArray is the same as the files). That is certainly a cause for slow down as well since you are going to end up hammering the file system.
A completely untested version of this idea:
private void BT_Xref_Click(object sender, EventArgs e)
{
//grabs file path from text box
string ManifestPath = TB_Manifest.Text;
//grabs parent directory from file path
string directoryName = Path.GetDirectoryName(ManifestPath);
//creates a new folder for the final output text file
string pathString = Path.Combine(directoryName, "Final Index");
Directory.CreateDirectory(pathString);
//list for matching text lines which will eventually be output to the final text file
List<string> NewData = new List<string>();
String[] ManifestArray = File.ReadAllLines(ManifestPath);
List<string> RemoveManifest = new List<string>(ManifestArray);
String[] OutputArray = File.ReadAllLines(TB_Complete.Text);
List<string> RemoveOutput = new List<string>(OutputArray);
//initializes a count which decides at what point a text file should be created
int shortcount = 0;
//the first line of both text files is ignored by the Skip(1) calls below
Dictionary<string, Tuple<string, string>> ManifestMap = new Dictionary<string, Tuple<string, string>>();
foreach (string mfile in ManifestArray.Skip(1))
{
string ManifestLine = mfile;
string[] ManifestFields = ManifestLine.Split(',');
string ManifestElement = ManifestFields[6];
string ManifestElement2 = ManifestFields[5];
//value to be retrieved and output to final text file
string ManifestElementDate = ManifestElement2.Replace("/", "-");
//value to be compared with the other text file
string ManifestNoExt = Regex.Replace(ManifestElement, ("(\\.\\w+$)"),"");
//keep the first manifest line seen for each key; Add would throw on a duplicate key
if (!ManifestMap.ContainsKey(ManifestNoExt))
{
ManifestMap.Add(ManifestNoExt, Tuple.Create(ManifestElementDate, ManifestLine));
}
}
foreach (string ofile in OutputArray.Skip(1))
{
//value to be compared with other text file
string OutputElement = ofile.Split('|')[2];
//if the key exists in the manifest then add the specified line of text to the list.
Tuple<string, string> match;
if (ManifestMap.TryGetValue(OutputElement, out match))
{
NewData.Add(ofile + "|" + match.Item1);
//the RemoveManifest.RemoveAll call from the original is left out here:
//it scans the whole list on every match and its result is never used
if (NewData.Count == 1000)
{
//if you've reached the count then output files into a new text file
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
NewData.Clear();
}
//no break here: every line of the second file has to be checked
}
}
//once all line of text have been searched combine all text files in directory
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
{
foreach (var file in SplitTextFiles)
{
using (var input = File.OpenRead(file))
{
input.CopyTo(FinalIndexFile);
}
File.Delete(file);
}
}
//File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
}

C# Edit string in file - delete a character (000)

I am a rookie in C#, but I need to solve one problem.
I have several text files in a folder, and each text file has this structure:
IdNr 000000100
Name Name
Lastname Lastname
Sex M
.... etc...
Loading all the files from the folder is no problem, but I need to delete the leading zeros in IdNr, i.e. delete 000000 and leave 100. After that, the file should be saved. Each file has a different IdNr, which makes it harder :(
Yes, it is possible to edit each file manually, but with 3000 files that is not practical :)
Is there an algorithm in C# that could delete this 000000 and leave only the number 100?
Thank you All.
Vaclav
So, thank you ALL !
But in the End I have this Code :-) :
using System.IO;
namespace name
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Browse_Click(object sender, EventArgs e)
{
DialogResult dialog = folderBrowserDialog1.ShowDialog();
if (dialog == DialogResult.OK)
TP_zdroj.Text = folderBrowserDialog1.SelectedPath;
}
private void start_Click(object sender, EventArgs e)
{
try
{
foreach (string file in Directory.GetFiles(TP_zdroj.Text, "*.txt"))
{
string text = File.ReadAllText(file, Encoding.Default);
text = System.Text.RegularExpressions.Regex.Replace(text, "IdNr 000*", "IdNr ");
File.WriteAllText(file, text, Encoding.Default);
}
}
catch
{
MessageBox.Show("Warning...!");
return;
}
{
MessageBox.Show("Done");
}
}
}
}
Thank you ALL ! ;)
You can use int.Parse:
int number = int.Parse("000000100");
String withoutzeros = number.ToString();
Regarding your read/save file issue: do the files contain more than one record? Is that the header, or is each record a list of key/value pairs like "IdNr 000000100"? It's difficult to answer without this information.
Edit: Here's a simple but efficient approach which should work if the format is strict:
var files = Directory.EnumerateFiles(path, "*.txt", SearchOption.TopDirectoryOnly);
foreach (var fPath in files)
{
String[] oldLines = File.ReadAllLines(fPath); // load into memory is faster when the files are not really huge
String key = "IdNr ";
if (oldLines.Length != 0)
{
IList<String> newLines = new List<String>();
foreach (String line in oldLines)
{
String newLine = line;
if (line.Contains(key))
{
int numberRangeStart = line.IndexOf(key) + key.Length;
int numberRangeEnd = line.IndexOf(" ", numberRangeStart);
// no space after the number: it runs to the end of the line
if (numberRangeEnd < 0) numberRangeEnd = line.Length;
String numberStr = line.Substring(numberRangeStart, numberRangeEnd - numberRangeStart);
int number = int.Parse(numberStr);
String withoutZeros = number.ToString();
newLine = line.Replace(key + numberStr, key + withoutZeros);
}
newLines.Add(newLine);
}
File.WriteAllLines(fPath, newLines);
}
}
Use TrimStart
var trimmedText = number.TrimStart('0');
This should do it. It assumes your files have a .txt extension and that path holds the folder containing them, and it removes all occurrences of "000000" from each file.
foreach (string fileName in Directory.GetFiles(path, "*.txt"))
{
File.WriteAllText(fileName, File.ReadAllText(fileName).Replace("000000", ""));
}
These are the steps you would want to take:
Loop each file
Read file line by line
for each line split on " " and remove leading zeros from 2nd element
write the new line back to a temp file
after all lines processed, delete original file and rename temp file
do next file
(you can avoid the temp file part by reading each file in full into memory, but depending on your file sizes this may not be practical)
You can remove the leading zeros with something like this:
string s = "000000100";
s = s.TrimStart('0');
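The steps above, combined with TrimStart, can be sketched as follows. The names IdNrCleaner, FixLine and CleanFile, and the ".tmp" suffix, are hypothetical. The sketch assumes the key and the number are separated by spaces, only touches lines whose first token is IdNr, and uses the default StreamReader/StreamWriter encoding, which may need changing if the files are not UTF-8.

```csharp
using System;
using System.IO;

static class IdNrCleaner
{
    // Strips leading zeros from the value of an "IdNr <number>" line; other lines pass through unchanged.
    public static string FixLine(string line)
    {
        string[] parts = line.Split(new[] { ' ' }, 2);
        if (parts.Length < 2 || parts[0] != "IdNr") return line;

        string number = parts[1].Trim().TrimStart('0');
        if (number.Length == 0) number = "0"; // the value consisted only of zeros
        return parts[0] + " " + number;
    }

    // Rewrites one file through a temp file, then swaps it in place of the original.
    public static void CleanFile(string path)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(FixLine(line));
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Calling IdNrCleaner.CleanFile(file) inside a foreach over Directory.GetFiles(folder, "*.txt") would cover the "do next file" step. Restricting the change to the IdNr line avoids accidentally trimming zeros from other values in the file.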
Simply, read every token from the file and use this method:
var token = "000000100";
var result = token.TrimStart('0');
You can write a function similar to this one:
static IEnumerable<string> ModifiedLines(string file) {
string line;
using(var reader = File.OpenText(file)) {
while((line = reader.ReadLine()) != null) {
string[] tokens = line.Split(new char[] { ' ' });
line = string.Empty;
foreach (var token in tokens)
{
line += token.TrimStart('0') + " ";
}
yield return line;
}
}
}
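Usage (ToList() reads the whole file before it is overwritten; without it the lazy reader still holds the file open when WriteAllLines tries to write to it):
File.WriteAllLines(file, ModifiedLines(file).ToList());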
