Parsing multiple string lines to numbers

I have such code:
string[] list_lines = System.IO.File.ReadAllLines(@"F:\VS\WriteLines.xls");
System.Console.WriteLine("Contents of Your Database = ");
foreach (var line in list_lines.OrderBy(line => line.Split(';')[3]))
{
Console.WriteLine("\t" + line);
}
I would like to TryParse the list_lines so they are numbers, not strings.
Is it possible to 'bulk' it somehow?
Each line consists of 5 strings after they are Split.
EDIT
I wrote this:
string[] list_lines = System.IO.File.ReadAllLines(@"F:\VS\WriteLines.xls");
int[] newList;
// Display the file contents by using a foreach loop.
System.Console.WriteLine("Contents of Your Database = ");
int.TryParse(list_lines[], out newList);
foreach (var line in newList.OrderBy(line => line.Split(';')[3]))
{
// Use a tab to indent each line of the file.
Console.WriteLine("\t" + line);
}
But I get an error on list_lines[]: the compiler says a value is expected there.

Based on your previous question, it seems that you want to order the lines by the fourth split field (index 3) as an int. You can do it this way:
foreach (var line in list_lines.OrderBy(line =>
{
int lineNo;
var success = int.TryParse(line.Split(';')[3], out lineNo);
if(success) return lineNo;
return int.MaxValue;
}))
{
Console.WriteLine("\t" + line);
}
I'm using int.MaxValue as default for when TryParse fails. This way, failed lines will come last. You can change the default to int.MinValue instead, if you want the failed lines to come first.
By the way, C# naming convention uses camel-case for variables, like lineNo and listLines instead of line_no and list_lines.
To get int[] that corresponds to each line, you can use similar logic, but now in a Select() method instead of OrderBy() :
int[] newList = list_lines.Select(line =>
{
int lineNo;
var success = int.TryParse(line.Split(';')[3], out lineNo);
if(success) return lineNo;
return int.MaxValue; //or whatever default value appropriate
})
.ToArray();

You can use SelectMany to flatten the list.
list_lines.SelectMany(line => line.Split(';')).Select(cell => int.Parse(cell));
If there can be non-numeric cells and you are only looking for non-negative integers, you can add a Where clause that keeps non-empty, all-digit cells:
list_lines.SelectMany(line => line.Split(';')).Where(cell => cell.Length > 0 && cell.All(@char => char.IsDigit(@char))).Select(cell => int.Parse(cell));

One way of doing it:
int number;
var intList = list_lines.Select(s => s.Split(';')
.Where(p => Int32.TryParse(p, out number))
.Select(y => Int32.Parse(y)))
.SelectMany(d=>d).ToList();
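
The answers above either parse each cell twice or throw on a bad cell. As a rough sketch, the same flattening can be done with a single TryParse call per cell. This assumes C# 7 or later (for the inline `out int` declaration); the `LineParser` class, the `ParseCells` helper, and the sample lines are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineParser
{
    // Hypothetical helper: flattens all lines into the integers they contain,
    // silently skipping cells that are not valid integers.
    public static List<int> ParseCells(IEnumerable<string> lines)
    {
        var numbers = new List<int>();
        foreach (var cell in lines.SelectMany(line => line.Split(';')))
        {
            // TryParse runs once per cell and never throws on bad input.
            if (int.TryParse(cell, out int value))
                numbers.Add(value);
        }
        return numbers;
    }

    static void Main()
    {
        // Made-up sample lines, five fields each.
        var lines = new[] { "1;2;abc;4;5", "6;;8;9;10" };
        Console.WriteLine(string.Join(", ", ParseCells(lines)));
        // prints: 1, 2, 4, 5, 6, 8, 9, 10
    }
}
```

Skipping bad cells loses their position within the line; if the position matters, returning a default value (as in the first answer) is probably the better fit.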

Related

C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

My program creates a .csv file with a persons name and an integer next to them.
Occasionally there are two entries of the same name in the file, but with a different time. I only want one instance of each person.
I would like to take the mean of the two numbers to produce just one row for the name, where the number will be the average of the two existing.
So here Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just includes Alex Pitt, 88?
Here is how I am creating my CSV file if reference is required.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}

To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}

I would recommend merging the duplicates before you put everything into the CSV file.
Use:
// The list that will hold each name exactly once
List<string> duplicateChecker = new List<string>();
// Distinct() removes the repeated names. I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();
Now you can simply iterate through the new list and search for each of its values in NList. Use a foreach loop that looks up every index of the name in NList. After that you can use those indices to merge the integers with a simple math method.
//Something like this:
Make a foreach loop over every entry in duplicateChecker =>
(duplicateChecker is already distinct, so no name is processed twice) =>
Get the value of the current string and search for it in NList =>
Get every index of the current element in NList and look up the same indices in PList =>
Get the integers from PList and store them in an array =>
// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is through with a name, use a math method (for example, the mean) on the stored integers.
I hope you understand what I was trying to say. However, I would recommend using a unique identifier for your persons. Sooner or later two people will appear with the same name (as happens in a large company).
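
The steps above can be sketched with a dictionary that collects the paces for each name in one pass, which avoids searching NList repeatedly. This is only an illustration: the `PaceMerger` class and `MergeDuplicates` helper are hypothetical, "Sam Lee" is made-up sample data, and it assumes C# 7 or later and that the two lists line up by index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PaceMerger
{
    // Hypothetical helper: names[i] pairs with paces[i]; returns one CSV row per
    // distinct name, in first-seen order, with the mean of that name's paces.
    public static List<string> MergeDuplicates(IList<string> names, IList<int> paces)
    {
        var order = new List<string>();                      // distinct names, first-seen order
        var pacesByName = new Dictionary<string, List<int>>();
        int count = Math.Min(names.Count, paces.Count);
        for (int i = 0; i < count; i++)
        {
            if (!pacesByName.TryGetValue(names[i], out List<int> bucket))
            {
                bucket = new List<int>();
                pacesByName[names[i]] = bucket;
                order.Add(names[i]);
            }
            bucket.Add(paces[i]);
        }
        // Average() returns a double; Math.Round gives a whole number for the file.
        return order
            .Select(name => name + ", " + Math.Round(pacesByName[name].Average()))
            .ToList();
    }

    static void Main()
    {
        var names = new List<string> { "Alex Pitt", "Sam Lee", "Alex Pitt" };
        var paces = new List<int> { 105, 90, 71 };
        foreach (var row in MergeDuplicates(names, paces))
            Console.WriteLine(row);
        // prints:
        // Alex Pitt, 88
        // Sam Lee, 90
    }
}
```

Note that Math.Round rounds midpoints to the nearest even number by default, so a mean of 88.5 would be written as 88.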

Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First())
.ToList();

Count rows in a text file that meet a condition

I have 2 scripts from Microsoft's LINQ samples. The first one will count all the lines of text in a text file. The second one will list only the records that meet a certain condition.
How can I apply the same condition to the first counting script?
string[] records = File.ReadAllLines(@"C:\Reports\MyReports.txt");
try
{
int numberOfRecords = records.Count();
Console.WriteLine(
"There are {0} records in the text file.",
numberOfRecords);
}
catch (OverflowException)
{
Console.WriteLine("The count is too large to store as an Int32.");
Console.WriteLine("Try using the LongCount() method instead.");
}
var targetLines = File.ReadAllLines(@"C:\Reports\MyReports.txt")
.Select((x, i) => new { Line = x, LineNumber = i })
.Where( x => x.Line.Contains(".dwg"))
.ToList();
foreach (var line in targetLines)
{
Console.WriteLine("{0} : {1}", line.LineNumber, line.Line);
}
File.WriteAllText(@"C:\Reports\MyReports2.txt", Util.ToCsvString(targetLines));

How can I apply the same condition to the first counting script?
Like this:
int numberOfRecords = records.Count(x => x.Contains(".dwg"));
The idea is to change the method that you are calling: instead of the parameterless* one, call the overload that takes a condition.
* Technically, Count() takes a single parameter - the list to which it is applied. The parameter is not visible, because it is passed implicitly using the extension method syntax.
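
As a small self-contained illustration of the predicate overload, the sketch below counts matching lines in an in-memory array instead of a file. The `RecordCounter` class, the `CountMatching` helper, and the sample lines are made up for the example.

```csharp
using System;
using System.Linq;

static class RecordCounter
{
    // Counts only the lines that contain the given text.
    public static int CountMatching(string[] records, string text)
    {
        // records is a string[], so the lambda parameter is the line itself.
        return records.Count(line => line.Contains(text));
    }

    static void Main()
    {
        // Made-up lines standing in for the contents of the report file.
        var records = new[] { "plan.dwg", "notes.txt", "site.dwg" };
        Console.WriteLine(CountMatching(records, ".dwg"));                  // prints 2
        // Equivalent result: filter first, then count what is left.
        Console.WriteLine(records.Where(l => l.Contains(".dwg")).Count());  // prints 2
    }
}
```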

Matching property/value data between two arrays

A certain API call is returning two arrays.
One array contains Property names, e.g
Properties[0] = "Heartbeat"
Properties[1] = "Heartbeat 2"
Properties[2] = "Some Other discovery method"
Another array contains values for the Properties array, e.g
Values[0] = "22/01/2007"
Values[1] = "23/02/2007"
Values[2] = "5/06/2008"
The values and properties array elements match up, e.g Values[0] is always the value of Properties[0], etc.
My aim is to get the most recent "Heartbeat*" value. Note that the Heartbeat properties and values are not always in elements 1 and 2 of the array, so I need to search through them.
Code looks like this:
static DateTime GetLatestHeartBeat(string[] Properties, string[] Values)
{
DateTime latestDateTime = new DateTime();
for(int i = 0; i < Properties.Length; i++)
{
if(Properties[i].Contains("Heart"))
{
DateTime currentDateTime;
DateTime.TryParse(Values[i],out currentDateTime);
if (currentDateTime > latestDateTime)
latestDateTime = currentDateTime;
}
}
return latestDateTime;
}
The above code gives me the desired result, only issue being the loop continues after there are no more Heartbeat values to find. Is there a more efficient way of performing the above?

While this doesn't address performance, I would simplify the code with a query like this:
var latestDateTime = Properties.Select((p, index) =>
new {p, v = DateTime.Parse(Values[index])})
.Where(e => e.p.Contains("Heart"))
.OrderByDescending(e => e.v).First().v;
Moving the parse after the Where clause limits the number of values that get parsed:
var latestDateTime = Properties.Select((p, index) =>
new {p, v = Values[index]})
.Where(e => e.p.Contains("Heart"))
.Select(e => DateTime.Parse(e.v))
.Max();
EDIT: Per @dbc's comments, changed .OrderByDescending(e => e).First(); to Max();
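
DateTime.Parse throws on a malformed value and depends on the current culture, and Max() throws on an empty sequence of DateTime. A possible variant that avoids both is sketched below. It assumes the values are day-first dates in the form shown in the question ("d/MM/yyyy") and C# 7 or later; the `Heartbeats` class and `Latest` method are names invented for the example.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class Heartbeats
{
    // Returns the latest parseable "Heart*" date, or null when there is none.
    public static DateTime? Latest(string[] properties, string[] values)
    {
        return properties
            .Zip(values, (p, v) => new { Name = p, Text = v })
            .Where(e => e.Name.Contains("Heart"))
            // Assumed format: day first, as in the question's sample values.
            .Select(e => DateTime.TryParseExact(
                        e.Text, "d/MM/yyyy", CultureInfo.InvariantCulture,
                        DateTimeStyles.None, out DateTime parsed)
                    ? parsed
                    : (DateTime?)null)
            .Max();   // Max over DateTime? skips nulls and returns null if nothing is left
    }

    static void Main()
    {
        var properties = new[] { "Heartbeat", "Heartbeat 2", "Some Other discovery method" };
        var values = new[] { "22/01/2007", "23/02/2007", "5/06/2008" };
        DateTime? latest = Latest(properties, values);
        Console.WriteLine(latest?.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        // prints: 2007-02-23
    }
}
```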

I'd find the indices that contain "Heart" (or other key word) in a parallel for loop to speed it up. Then iterate over those indices to find the latest one.
static DateTime GetLatestHeartBeat(string[] props, string[] vals)
{
ConcurrentBag<int> heartIndxs = new ConcurrentBag<int>();
// find the indices of "Heart" in parallel
Parallel.For(0, props.Length,
index =>
{
if (props[index].Contains("Heart"))
{
heartIndxs.Add(index);
}
});
// loop over each heart index to find the latest one
DateTime latestDateTime = new DateTime();
foreach (int i in heartIndxs)
{
DateTime currentDateTime;
if (DateTime.TryParse(vals[i], out currentDateTime) && (currentDateTime > latestDateTime))
latestDateTime = currentDateTime;
}
return latestDateTime;
}
If using DateTime.TryParse is really too slow you could use RegEx to parse the date string and do your own comparison. Honestly I'm not sure that is faster than just using the DateTime.TryParse. Here is a discussion of that topic: Which is Quicker: DateTime.TryParse or Regex

LINQ: Collapsing a series of strings into a set of "ranges"

I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
"aa009",
"ba023","ba024","ba025",
"bb025",
"ca002","ca003",
"cb004",
...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.

So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
return data.Select(item => new
{
Prefix = item.Substring(0, 2),
Number = int.Parse(item.Substring(2))
})
.GroupBy(item => item.Prefix)
.SelectMany(group => group.OrderBy(item => item.Number)
.GroupWhile((prev, current) =>
prev.Number + 1 == current.Number)
.Select(range =>
RangeAsString(group.Key,
range.First().Number,
range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
And then the simple helper method to convert each range into a string:
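private static string RangeAsString(string prefix, int start, int end)
{
// "D3" restores the zero padding that int.Parse drops; the width of 3 is taken from the sample data.
if (start == end)
return prefix + start.ToString("D3");
else
return string.Format("{0}{1:D3}-{0}{2:D3}", prefix, start, end);
}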
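
For comparison, here is a loop-based alternative (not the LINQ approach above) that keeps the original strings as the range endpoints, so any zero padding is carried through untouched. It is a sketch under two assumptions: every item is a two-character prefix followed by digits, and the input is already sorted as in the question. The `RangeCollapser` class and its members are invented names.

```csharp
using System;
using System.Collections.Generic;

static class RangeCollapser
{
    // Assumes a two-character prefix followed by digits, and sorted input.
    public static string Collapse(IEnumerable<string> sortedItems)
    {
        var ranges = new List<string>();
        string first = null, last = null;
        foreach (var item in sortedItems)
        {
            if (last != null && Follows(last, item))
            {
                last = item;            // extend the current run
                continue;
            }
            if (first != null)
                ranges.Add(Format(first, last));
            first = last = item;        // start a new run
        }
        if (first != null)
            ranges.Add(Format(first, last));
        return string.Join(",", ranges);
    }

    // True when both items share a prefix and current is exactly one more than previous.
    static bool Follows(string previous, string current)
    {
        return previous.Substring(0, 2) == current.Substring(0, 2)
            && int.Parse(previous.Substring(2)) + 1 == int.Parse(current.Substring(2));
    }

    // A run of one item is written as-is; longer runs as "first-last".
    static string Format(string first, string last)
    {
        return first == last ? first : first + "-" + last;
    }

    static void Main()
    {
        var data = new[] { "aa002", "aa003", "aa004", "aa005", "aa006", "aa007", "aa009",
                           "ba023", "ba024", "ba025", "bb025", "ca002", "ca003", "cb004" };
        Console.WriteLine(Collapse(data));
        // prints: aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004
    }
}
```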

Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);
I get these results:

most matched field value

I have a DataTable, and I can also use LINQ.
The DataTable has many columns and rows. One of the columns is called feedCode. Its type is string; in the database it is a nullable varchar of length 7.
feedCode may contain values such as 9051245, 9051246, 9051247, 9031454, 9021447.
How can I write a method that returns the most common prefix (the first 3 characters of the string), which in this case is 905?
Thanks.

Try to use this code:
var feedCodes = new string[] { "9051245", "9051246", "9051247", "9051245", "9031454", "9021447" };
var mostOccuring = feedCodes.Where(feedCode => feedCode != null)
.GroupBy(feedCode => feedCode.Length < 3 ? feedCode : feedCode.Substring(0, 3))
.OrderByDescending(group => group.Count())
.FirstOrDefault();
if(mostOccuring == null)
{
//some exception handling
}
else
{
//process mostOccuring.Key
}
This code also handles feed codes shorter than 3 characters (even empty strings). If you don't want to include them, filter them out in the Where clause.

Maybe I didn't understand your question correctly, but this may be a starting point for you:
//The feedCodes (I put one in twice, so that one appears most often)
var values = new string[] { "9051245", "9051246", "9051247", null, "", "9051245", "9031454", "9021447" };
//Just filter the list for filled up values
var query = values.Where(value => !String.IsNullOrEmpty(value))
//and group them by their starting text
.GroupBy(value => value.Substring(0, 3))
//order by the most occuring group first
.OrderByDescending(group => group.Count());
//Iterate over all groups or just take the first one with query.First() or query.FirstOrDefault()
foreach (var group in query)
{
Console.WriteLine(group.Key + " Count: " + group.Count());
}
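
Both answers work on a string array, while the question starts from a DataTable. A possible way to feed the same grouping from the table is sketched below. The `FeedCodes` class and `MostCommonPrefix` method are hypothetical names; the column name "feedCode" comes from the question. On .NET Framework this needs a reference to System.Data.DataSetExtensions for AsEnumerable() and Field<T>().

```csharp
using System;
using System.Data;
using System.Linq;

static class FeedCodes
{
    // Returns the most common three-character prefix in the feedCode column,
    // or null when no row has a code of at least three characters.
    public static string MostCommonPrefix(DataTable table)
    {
        return table.AsEnumerable()
            .Select(row => row.Field<string>("feedCode"))   // DBNull comes back as null
            .Where(code => code != null && code.Length >= 3)
            .GroupBy(code => code.Substring(0, 3))
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault();
    }

    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("feedCode", typeof(string));
        foreach (var code in new[] { "9051245", "9051246", "9051247", "9031454", "9021447" })
            table.Rows.Add(code);
        table.Rows.Add(DBNull.Value);   // the column is nullable
        Console.WriteLine(MostCommonPrefix(table));   // prints 905
    }
}
```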
