Count rows in a text file that meet a condition

Count rows in a text file that meet a condition - c#

I have 2 scripts from Microsoft's LINQ samples. The first one will count all the lines of text in a text file. The second one will list only the records that meet a certain condition.
How can I apply the same condition to the first counting script?
string[] records = File.ReadAllLines(@"C:\Reports\MyReports.txt");
try
{
int numberOfRecords = records.Count();
Console.WriteLine(
"There are {0} records in the text file.",
numberOfRecords);
}
catch (OverflowException)
{
Console.WriteLine("The count is too large to store as an Int32.");
Console.WriteLine("Try using the LongCount() method instead.");
}
var targetLines = File.ReadAllLines(@"C:\Reports\MyReports.txt")
.Select((x, i) => new { Line = x, LineNumber = i })
.Where( x => x.Line.Contains(".dwg"))
.ToList();
foreach (var line in targetLines)
{
Console.WriteLine("{0} : {1}", line.LineNumber, line.Line);
}
File.WriteAllText(@"C:\Reports\MyReports2.txt", Util.ToCsvString(targetLines));

How can I apply the same condition to the first counting script?
Like this:
int numberOfRecords = records.Count(x => x.Contains(".dwg"));
The idea is to change the method that you are calling: instead of the parameterless* one, call the overload that takes a condition.
* Technically, Count() takes a single parameter - the list to which it is applied. The parameter is not visible, because it is passed implicitly using the extension method syntax.
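As a minimal sketch of the two overloads side by side, the snippet below counts all lines and then only the matching ones. The sample lines are hypothetical and stand in for the result of File.ReadAllLines, so the code runs without a file on disk.

```csharp
using System;
using System.Linq;

// Hypothetical sample lines standing in for File.ReadAllLines(...).
string[] records =
{
    "plan-a.dwg",
    "notes.txt",
    "plan-b.dwg",
};

// Parameterless overload: counts every element.
int total = records.Count();

// Predicate overload: counts only the elements for which the lambda returns true.
int matching = records.Count(line => line.Contains(".dwg"));

Console.WriteLine("There are {0} records, {1} of which contain .dwg.", total, matching);
```

Because records is a string[], the lambda receives each line directly as a string; there is no Line property unless the sequence is first projected into an anonymous type as in the second script.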

Related

Comparing Names in array to themselves

I have an array of names that I sorted alphabetically. Many names repeat, and I track the number of occurrences of each name as its popularity. I have been trying to figure out how to compare each index with the one next to it to see whether it is the same name. Each time the same name appears, a counter ticks up; when a different name is reached, the counter is checked against "foundNamePop", stored in a separate variable, and reset. The problem is that some input arrays have the same name repeating at the end of the array (i.e. Lane, Lane, Lane \0). That last run falls out of my if block and is never stored, because that branch only does "nameCounter++". I can't find a way to make sure every name is read and stored, whether the array ends with a repeated name or with single names that are all different, i.e. (Lane, Dane, Bane \0).
Let me also add that these .txt files can contain ~50 thousand names, and I have no idea what names are in there.
Why does the final if statement not work? Execution enters it every time. I ran it under the debugger and watched it step right into the block even when .ElementsAt(i).Value was greater than foundNamePop (5 in this instance).
var dict = new ConcurrentDictionary<string,int>(StringComparer.OrdinalIgnoreCase);
foreach (var name in updatedName)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
for (int i = 0; i < dict.Count; i++)
{
if (dict.ElementsAt(i).Value <= foundNamePop);
{
lessPopNameSum += dict.ElementAt(i).Value;
}
}

The simple solution is to add a check after the loop
if (foundNamePop >= nameCounter)
{
lessPopNameSum += nameCounter;
}
But it is not clear to me what you are actually computing. It looks like you are summing the duplicate names that have more duplicates than foundNamePop, but it is not clear what value this has, nor what meaning the result will have.
You should be able to use LINQ to get something similar with less code:
var lessPopNameSum = sameLengthName
.GroupBy(n => n)
.Select(group => group.Count())
.Where(c => c >= foundNamePop)
.Sum();
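The check after the loop suggested at the start of this answer can be sketched as follows. The question's original loop is not shown, so this is a reconstruction under assumptions: the array is already sorted, names are compared case-insensitively, and the goal is to sum the counts of names that occur at most foundNamePop times (the `<=` comparison used in the question). The sample data and threshold are hypothetical.

```csharp
using System;

// Hypothetical sample data; the array is assumed to be sorted already.
string[] sortedNames = { "Bane", "Dane", "Dane", "Dane", "Lane", "Lane" };
int foundNamePop = 2; // assumed threshold

int lessPopNameSum = 0;
int nameCounter = 1; // length of the current run of equal names

for (int i = 1; i < sortedNames.Length; i++)
{
    if (string.Equals(sortedNames[i], sortedNames[i - 1], StringComparison.OrdinalIgnoreCase))
    {
        // Same name as the previous element: the run continues.
        nameCounter++;
    }
    else
    {
        // A new name starts: record the finished run, then reset the counter.
        if (nameCounter <= foundNamePop)
        {
            lessPopNameSum += nameCounter;
        }
        nameCounter = 1;
    }
}

// The last run never meets a "different name", so it is recorded here.
if (sortedNames.Length > 0 && nameCounter <= foundNamePop)
{
    lessPopNameSum += nameCounter;
}

Console.WriteLine(lessPopNameSum);
```

Without the final check, the trailing "Lane, Lane" run would be dropped, which is the symptom described in the question.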

Although I like the elegance of the other posted solution, another alternative is to use a Dictionary to store a count of each of the names.
const int FoundNamePop = 2;
var names = new string[] { "Bill", "Jane", "Jeff", "Rebecca", "Bill" };
var count = FindPopularNames(names)
.Where(kvp => kvp.Value < FoundNamePop)
.Sum(kvp => kvp.Value);
// With 'FoundNamePop' set to two, the below line will print '3'.
Console.WriteLine($"Count: {count}");
static IDictionary<string, int> FindPopularNames(IEnumerable<string> names)
{
var dict = new ConcurrentDictionary<string, int>
(StringComparer.OrdinalIgnoreCase);
foreach (var name in names)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
return dict;
}

How to check how many times a number appears in a specific line inside a text file

So this is my code so far. It's a mess, I know. Any ideas about how I can make it work?
I want to output how many times a number appears inside a text file. I want to get the numbers from the specific lines in the file that start with the word Time.
Count the total number of members across all 3 time slots
The text file is like this:
*****************************
Participant: 1
Location: UK
Name: George
Phone Number: 69347653633
Time Slot: 1
*****************************
*****************************
Participant: 2
Location: FR
Name: Alex
Phone Number: 69635343623
Time Slot: 2
*****************************
*****************************
Participant: 3
Location: gr
Name: Maria
Phone Number: 694785896
Time Slot: 3
*****************************
For example, I want an output like this:
Total Member Registered for Slot 1: 5
Total Member Registered for Slot 2: 1
Total Member Registered for Slot 3: 3
The numbers are in a range of 1 to 3
The output that I get so far is:
1 was found 1 times
1 was found 1 times
1 was found 1 times
2 was found 1 times
3 was found 1 times
1 was found 1 times
1 was found 1 times
1 was found 1 times
1 was found 1 times
1 was found 1 times
Any ideas about how I can improve it and fix it?
public static void ReadNumbers()
{
// Declare list
string[] lines = File.ReadAllLines("text.TXT");
IEnumerable<string> selectLines = lines.Where(line => line.StartsWith("Time"));
foreach (var item in selectLines)
{
var getNumbers = (from num in item where char.IsNumber(num) select num).ToArray();
//Console.WriteLine(new string(getNumbers));
getNumbers.ToArray();
foreach (var group in getNumbers.GroupBy(n => n))
{
Console.WriteLine("{0} was found {1} times", group.Key, group.Count());
}
}
}

Updated Answer
I see that you have made a few edits to your original question and left comments on another answer, which leave this answer in need of a few adjustments. Specifically, you appear to want the results to be displayed in ascending order (from slot 1 to 3). You also said that:
I want also if a slot has not any appear in the file to display Number 2 appeared 0 times.
So here is my proposed solution:
public static void ReadNumbers()
{
string[] lines = File.ReadAllLines("text.TXT");
var groups = lines
.Where(line => line.StartsWith("Time"))
.Select(line => Int32.Parse(new String(line.Where(Char.IsDigit).ToArray())))
.GroupBy(number => number);
for(int i = 1; i <= 3; i ++)
{
var count = groups.FirstOrDefault(group => group.Key == i)?.Count() ?? 0;
Console.WriteLine($"Total Members Registered for Slot {i}: {count}");
}
}
Note: This code is untested but should work.
I would also like to add that it is generally not considered good etiquette to make changes to your question after accepting an answer, such that the changes require a change to said answer. Typically you would ask a new question in such a case.
Original Answer
Here's how I would do it:
public static void ReadNumbers()
{
string[] lines = File.ReadAllLines("text.TXT");
var groups = lines
.Where(line => line.StartsWith("Time"))
.Select(line => Int32.Parse(new string(line.Where(Char.IsDigit).ToArray())))
.GroupBy(number => number);
foreach(var group in groups)
Console.WriteLine($"{group.Key} appeared: {group.Count()} times");
}
Note that this approach assumes that your file follows the same format that you showed in your question.
It will also throw an error should your file have any occurrences of "Time" without also containing a number in the same line. For example, if your file contains a line like: "Time Slot: " or "Time Slot: SomeValueThatIsNotANumber" then it will throw.
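If throwing on malformed lines is not acceptable, one option is to replace Int32.Parse with int.TryParse and skip lines that do not parse. The following is only a sketch under assumptions: the sample lines are hypothetical, the "Time Slot:" prefix is taken from the file format shown in the question, and slots are assumed to run from 1 to 3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample lines standing in for File.ReadAllLines("text.TXT").
string[] lines =
{
    "Time Slot: 1",
    "Time Slot: ",     // no number: skipped instead of throwing
    "Time Slot: abc",  // not a number: skipped instead of throwing
    "Name: George",
    "Time Slot: 3",
    "Time Slot: 1",
};

Dictionary<int, int> counts = lines
    .Where(line => line.StartsWith("Time Slot:", StringComparison.Ordinal))
    .Select(line => line.Substring("Time Slot:".Length).Trim())
    // TryParse returns false for malformed text; map those lines to null.
    .Select(text => int.TryParse(text, out int slot) ? slot : (int?)null)
    .Where(slot => slot.HasValue)
    .GroupBy(slot => slot.Value)
    .ToDictionary(group => group.Key, group => group.Count());

for (int i = 1; i <= 3; i++)
{
    // Slots that never appear in the file are reported with a count of 0.
    int count = counts.TryGetValue(i, out int found) ? found : 0;
    Console.WriteLine($"Total Members Registered for Slot {i}: {count}");
}
```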

If you want to print every time slot even if they are not used, a general solution would be:
public static void ReadNumbers()
{
string[] lines = File.ReadAllLines("text.TXT");
// timeSlots[i] - how many members are registered in i-th time slot
// 4 is the number of time slots plus 1 (we skip the 0th element for convenience)
int[] timeSlots = new int[4];
var groups = lines
.Where(line => line.StartsWith("Time"))
.Select(line => Int32.Parse(new string(line.Where(Char.IsDigit).ToArray())))
.GroupBy(number => number);
foreach (var group in groups)
{
// group.Key - occupied time slot number
// group.Count() - how many members in the occupied time slot
if (group.Key < timeSlots.Count())
{
timeSlots[group.Key] = group.Count();
}
}
for (int i = 1; i < timeSlots.Count(); i++)
{
Console.WriteLine($"Time slot {i} appeared: {timeSlots[i]} times");
}
}

Here's an alternative solution using a different approach. It is not the best, but it is an alternative. A few notes:
It could use better error checking.
You get a lazy IEnumerable<Participant> with all their properties, which you can then use whenever you need them.
You can create a new class that represents your data (notes in code):
public class Participant
{
public IEnumerable<Participant> Participants {get; set;}
public int ParticipantNumber { get; set; }
public string ParticipantName { get; set; }
public string ParticipantLocation { get; set; }
public long ParticipantPhoneNumber { get; set; }
public int ParticipantTimeSlot { get; set; }
public void GetParticipants()
{
IEnumerable<string> lines = null;
using (OpenFileDialog ofd = new OpenFileDialog())
if (ofd.ShowDialog() == DialogResult.OK)
// remove empty lines and lines that start with *
lines = File.ReadLines(ofd.FileName).Where(line => !string.IsNullOrEmpty(line) && !line.StartsWith("***"));
// if we don't have anything return
if (lines == null)
return;
// get all our participants we can based on just 5 fields
Participants = lines.Select((value, index) => new { value, index})
.GroupBy(grp => grp.index / 5, myVal => myVal.value)
.Select(val => new Participant()
{
ParticipantNumber = int.TryParse(val.Select(s => s).Where(s=> s.StartsWith("Participant:"))
.FirstOrDefault().Replace("Participant:", string.Empty).Trim(), out int parNumber) ? parNumber : 0,
ParticipantLocation = val.Select(s => s).Where(s => s.StartsWith("Location:"))
.FirstOrDefault().Replace("Location:", string.Empty).Trim(),
ParticipantName = val.Select(s => s).Where(s => s.StartsWith("Name:"))
.FirstOrDefault().Replace("Name:", string.Empty).Trim(),
ParticipantPhoneNumber = long.TryParse(val.Select(s => s).Where(s => s.StartsWith("Phone Number:"))
.FirstOrDefault().Replace("Phone Number:", string.Empty).Trim(), out long parPhone) ? parPhone : 0,
ParticipantTimeSlot = int.TryParse(val.Select(s => s).Where(s => s.StartsWith("Time Slot:"))
.FirstOrDefault().Replace("Time Slot:", string.Empty).Trim(), out int parTime) ? parTime : 0
}) ;
}
}
public static class LinqExtentions
{
// Extension method by: Chris St Clair
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(this IEnumerable<T> seq, Func<T, T, bool> condition)
{
T prev = seq.First();
List<T> list = new List<T>() { prev };
foreach (T item in seq.Skip(1))
{
if (condition(prev, item) == false)
{
yield return list;
list = new List<T>();
}
list.Add(item);
prev = item;
}
yield return list;
}
}
Next, wherever you want to load these participants, you can add this:
Participant participant = new Participant(); // create new instance
participant.GetParticipants(); // actually grab the file and parse it
// here we actually group our participants based on your condition
var query = participant.Participants.GroupBy(p => p.ParticipantTimeSlot).Select(pNew => new { SlotNumber = pNew.ToList()[0].ParticipantTimeSlot, Count = pNew.Count() });
// finally write all the data out
Console.WriteLine(string.Join(Environment.NewLine, query.Select(a => $"Total Member Registered for Slot {a.SlotNumber}: {a.Count}")));
Update
Here's a query to print their id's and names:
var getNames = participant.Participants.Select(pNew => new { PartName = pNew.ParticipantName, PartNumber = pNew.ParticipantNumber });
Console.WriteLine(string.Join(Environment.NewLine, getNames.Select(a => $"Participant {a.PartNumber} name: {a.PartName}")));
If there's something you don't understand, please let me know; again, there are comments throughout the code.
Side Note:
You may need to make sure to import a few namespaces as well:
using System.Linq;
using System.Collections.Generic;
References:
Grouping sequential blocks of data using Linq - Chris St Clair

Parsing multiple string lines to numbers

I have such code:
string[] list_lines = System.IO.File.ReadAllLines(@"F:\VS\WriteLines.xls");
System.Console.WriteLine("Contents of Your Database = ");
foreach (var line in list_lines.OrderBy(line => line.Split(';')[3]))
{
Console.WriteLine("\t" + line);
}
I would like to TryParse the list_lines so they are numbers, not strings.
Is it possible to 'bulk' it somehow?
Each line consists of 5 strings after they are Split.
EDIT
I wrote this:
string[] list_lines = System.IO.File.ReadAllLines(@"F:\VS\WriteLines.xls");
int[] newList;
// Display the file contents by using a foreach loop.
System.Console.WriteLine("Contents of Your Database = ");
int.TryParse(list_lines[], out newList);
foreach (var line in newList.OrderBy(line => line.Split(';')[3]))
{
// Use a tab to indent each line of the file.
Console.WriteLine("\t" + line);
}
But I get an error on list_lines[]; it says that there must be a value.

Based on your previous question, it seems that you want to order the lines by the split result at index 3, treated as an int. You can do it this way:
foreach (var line in list_lines.OrderBy(line =>
{
int lineNo;
var success = int.TryParse(line.Split(';')[3], out lineNo);
if(success) return lineNo;
return int.MaxValue;
}))
{
Console.WriteLine("\t" + line);
}
I'm using int.MaxValue as default for when TryParse fails. This way, failed lines will come last. You can change the default to int.MinValue instead, if you want the failed lines to come first.
By the way, C# naming convention uses camel-case for variables, like lineNo and listLines instead of line_no and list_lines.
To get int[] that corresponds to each line, you can use similar logic, but now in a Select() method instead of OrderBy() :
int[] newList = list_lines.Select(line =>
{
int lineNo;
var success = int.TryParse(line.Split(';')[3], out lineNo);
if(success) return lineNo;
return int.MaxValue; //or whatever default value appropriate
})
.ToArray();

You can use SelectMany to flatten the list.
list_lines.SelectMany(line => line.Split(';')).Select(cell => int.Parse(cell));
If there can be non-number cells and you are looking for positive numbers you can add a Where clause
list_lines.SelectMany(line => line.Split(';')).Where(cell => cell.All(@char => char.IsDigit(@char))).Select(cell => int.Parse(cell));
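One caveat with the digit check: All returns true for an empty string, so an empty cell (for example from ";;") would pass the filter and then make int.Parse throw. A possible alternative, sketched below with hypothetical sample lines, is to let int.TryParse do both the check and the conversion. NumberStyles.None is used here on the assumption that only unsigned digit strings should be accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample lines standing in for the file contents.
string[] listLines = { "1;2;abc;4;5", "10;;30;40;50" };

List<int> numbers = listLines
    .SelectMany(line => line.Split(';'))
    // NumberStyles.None accepts digits only: no sign, whitespace, or separators.
    .Select(cell => int.TryParse(cell, NumberStyles.None, CultureInfo.InvariantCulture, out int value)
        ? value
        : (int?)null)
    .Where(value => value.HasValue)
    .Select(value => value.Value)
    .ToList();

Console.WriteLine(string.Join(", ", numbers));
```

Cells that are empty, non-numeric, or too large for an int are skipped rather than causing an exception.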

One way of doing it:
int number;
var intList = list_lines.Select(s => s.Split(';')
.Where(p => Int32.TryParse(p, out number))
.Select(y => Int32.Parse(y)))
.SelectMany(d=>d).ToList();

C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

My program creates a .csv file with a person's name and an integer next to it.
Occasionally there are two entries of the same name in the file, but with a different time. I only want one instance of each person.
I would like to take the mean of the two numbers to produce just one row for the name, where the number will be the average of the two existing.
So here Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just includes Alex Pitt, 88?
Here is how I am creating my CSV file if reference is required.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}

To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}
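The grouping step can be tried in isolation without PaceCalculator or a file. In this sketch the two lists are hypothetical stand-ins for NameGain() and PaceGain(); "Sam Jones" and the pace 90 are invented sample values, while the Alex Pitt numbers come from the question. Math.Round is an added assumption so that the output is a whole number.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample data standing in for NameGain() and PaceGain().
var names = new List<string> { "Alex Pitt", "Sam Jones", "Alex Pitt" };
var paces = new List<int> { 105, 90, 71 };

List<string> records = names
    .Zip(paces, (name, pace) => new { Name = name, Pace = pace })
    // Group the paces by name so each name yields exactly one row.
    .GroupBy(entry => entry.Name, entry => entry.Pace)
    .Select(group => string.Format(
        CultureInfo.InvariantCulture,
        "{0},{1}",
        group.Key,
        Math.Round(group.Average())))
    .ToList();

foreach (string record in records)
{
    Console.WriteLine(record);
}
```

Average returns a double, so without rounding a pair such as 105 and 72 would be written as 88.5. Math.Round with a single argument rounds midpoints to the nearest even number, which may or may not be the rounding you want.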

I would recommend merging the duplicates before you put everything into the CSV file.
Use:
// The list of unique names
List<string> duplicateChecker = new List<string>();
// Distinct() keeps one copy of each name. I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();
Now you can simply iterate through the new list and search for each value in NList. Use a foreach loop that looks up the index of the name in NList. After that, you can use the index to merge the integers with a simple math method.
//Something like this:
Make a foreach loop over every entry in duplicateChecker =>
Use Distinct again on duplicateChecker to make sure you won't go through the same duplicate twice =>
Get the value of the current string and search for it in NList =>
Get the index of the current element in NList and look up that index in PList =>
Get the integer from PList and store it in an array =>
// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is through with the first name, use a math method.
I hope you understand what I was trying to say. However, I would recommend using a unique identifier for your persons. Sooner or later two people will appear with the same name (as in a huge company).
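The steps described above could be sketched roughly as follows. This is one reading of the description, not a definitive implementation: nList and pList are hypothetical stand-ins for NList and PList, the sample values other than Alex Pitt's are invented, and the "math method" is assumed to be a plain average.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample data standing in for NList and PList.
var nList = new List<string> { "Alex Pitt", "Sam Jones", "Alex Pitt" };
var pList = new List<int> { 105, 90, 71 };

var merged = new List<string>();

// Distinct() yields each name once, so no name is processed twice.
foreach (string name in nList.Distinct())
{
    // Collect the pace at every index where this name occurs in nList.
    var pacesForName = new List<int>();
    for (int i = 0; i < nList.Count; i++)
    {
        if (nList[i] == name)
        {
            pacesForName.Add(pList[i]);
        }
    }

    // The "math method": here, the mean of the collected paces.
    double average = pacesForName.Average();
    merged.Add(name + ", " + average.ToString(CultureInfo.InvariantCulture));
}

foreach (string row in merged)
{
    Console.WriteLine(row);
}
```

This scans nList once per distinct name, which is fine for small lists; for large inputs a grouping or dictionary-based pass would avoid the repeated scans.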

Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First())
.ToList();

most matched field value

I have a DataTable. I can also use LINQ.
The DataTable has many columns and rows. One of the columns is called feedCode. Its type is string; in the database it is a nullable varchar of length 7.
feedCode may contain values such as 9051245, 9051246, 9051247, 9031454, 9021447.
The method must return the most common prefix (the first 3 characters of the string), which in this case is 905.
Thanks.

Try to use this code:
var feedCodes = new string[] { "9051245", "9051246", "9051247", "9051245", "9031454", "9021447" };
var mostOccuring = feedCodes.Where(feedCode => feedCode != null)
.GroupBy(feedCode => feedCode.Length < 3 ? feedCode : feedCode.Substring(0, 3))
.OrderByDescending(group => group.Count())
.FirstOrDefault();
if(mostOccuring == null)
{
//some exception handling
}
else
{
//process mostoccuring.Key
}
This code also handles feed codes shorter than 3 characters (even empty strings). If you don't want to include them, just filter them out in the Where clause.
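Since the question starts from a DataTable rather than an array, the same query can be applied to the table's rows. The sketch below builds a small in-memory table as an assumption about its shape (a single string column named feedCode, as in the question) and, unlike the code above, drops codes shorter than 3 characters.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical in-memory table mirroring the question's feedCode column.
var table = new DataTable();
table.Columns.Add("feedCode", typeof(string));
foreach (var value in new[] { "9051245", "9051246", "9051247", "9031454", "9021447" })
{
    table.Rows.Add(value);
}
table.Rows.Add(DBNull.Value); // the column is nullable

var mostCommonPrefix = table.Rows
    .Cast<DataRow>()
    // DBNull is not a string, so "as string" turns it into null.
    .Select(row => row["feedCode"] as string)
    .Where(code => code != null && code.Length >= 3)
    .GroupBy(code => code.Substring(0, 3))
    .OrderByDescending(group => group.Count())
    .Select(group => group.Key)
    .FirstOrDefault();

Console.WriteLine(mostCommonPrefix);
```

FirstOrDefault returns null when no row has a usable code, so the caller should handle that case.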

Maybe I didn't understand your question correctly, but maybe this will be a starting point for you:
//The feedCodes (I put one in twice, to have one appearing most often)
var values = new string[] { "9051245", "9051246", "9051247", null, "", "9051245", "9031454", "9021447" };
//Just filter the list for filled up values
var query = values.Where(value => !String.IsNullOrEmpty(value))
//and group them by their starting text
.GroupBy(value => value.Substring(0, 3))
//order by the most frequently occurring group first
.OrderByDescending(group => group.Count());
//Iterate over all groups or just take the first one with query.First() or query.FirstOrDefault()
foreach (var group in query)
{
Console.WriteLine(group.Key + " Count: " + group.Count());
}
