var [][] array remove specific words - c#

I have a small problem. I have a .csv file containing "NaN" values and doubles (0.6034, for example), and I am trying to read only the doubles into an array[y][x].
Currently I read the whole .csv, but I cannot manage to remove the "NaN" values afterward. (It should parse the CSV, add only the numbers to an array[y][x], and leave every "NaN" out.)
My current code:
var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray()).ToArray(); //reads WHOLE .CSV to array[][]
int max_Rows = 0, j, rank;
int max_Col = 0;
foreach (Array anArray in rows)
{
rank = anArray.Rank;
if (rank > 1)
{
// show the lengths of each dimension
for (j = 0; j < rank; j++)
{
}
}
else
{
}
// show the total length of the entire array or all dimensions
max_Col = anArray.Length; //displays columns
max_Rows++; //displays rows
}
I tried the search but couldn't really find anything that helped me.
I know this is probably really easy but I am new to C#.
The .CSV and the desired outcome:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
This is a sample .csv I have. I should have been clearer, sorry! There is a NaN in every line, and I want it to be displayed like this:
1;5
2;6
3;7
4;8
This is just a sample; the real .csv has around 60,000 values. I need to index the result as [y][x]: for example, [0][0] should display "1", [2][1] should display "7", and so on.
Thanks again for all your help!

You could filter the delimited values in each row.
I've modified your code a bit (the intermediate ToArray() before Where is not needed):
var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').Where(y => y != "NaN").ToArray()).ToArray();

If you want to remove all the lines that contain NaN (a typical CSV task: clearing out all incomplete lines), e.g.
123.0; 456; 789
2.1; NAN; 35 <- this line should be removed (has NaN value)
-5; 3; 18
You can implement it like this
// Requires: using System.Globalization; using System.IO; using System.Linq;
double[][] data = File
.ReadLines(filepath)
.Select(line => line.Split(new char[] {';', '\t'},
StringSplitOptions.RemoveEmptyEntries))
.Where(items => items // Filter first...
.All(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray()) // ... materialize at the very end
.ToArray();
Use string.Join to display rows:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Edit: The actual problem is to take 2nd and 3rd complete columns only from the CSV:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
desired outcome is
[[1, 5], [2, 6], [3, 7], [4, 8]]
Implementation:
double[][] data = File
.ReadLines(filepath)
.Select(line => line
.Split(new char[] {';'},
StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Take(2)
.Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase))
.ToArray())
.Where(items => items.Length == 2)
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray())
.ToArray();
Tests
// 1
Console.Write(data[0][0]);
// 5
Console.Write(data[0][1]);
// 2
Console.Write(data[1][0]);
All values in one go:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Outcome:
1;5
2;6
3;7
4;8
Edit 2: if you want to extract non-NaN values only (please note that the initial CSV structure will be ruined):
1;2;3            1;2;3
NAN;4;5          4;5    <- please notice that the structure is lost
6;NAN;7      ->  6;7
8;9;NAN;         8;9
NAN;10;NAN       10
NAN;NAN;11       11
then
double[][] data = File
.ReadLines(filepath)
.Select(line => line
.Split(new char[] {';'},
StringSplitOptions.RemoveEmptyEntries)
.Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
.Where(items => items.Any())
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray())
.ToArray();
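A further sketch, not part of the answer above: instead of comparing against the text "NaN", each cell can be run through double.TryParse and kept only if it parses to a real number. As far as I know, the invariant culture parses "NaN" to double.NaN, so the extra double.IsNaN check is what actually drops those cells. The in-memory lines array and the ParseCell helper are hypothetical stand-ins for File.ReadLines(filepath) and your own naming.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical in-memory sample standing in for File.ReadLines(filepath).
string[] lines =
{
    "NaN;NaN;NaN;NaN",
    "NaN;1;5;NaN",
    "NaN;2;6;NaN",
    "NaN;3;7;NaN",
    "NaN;4;8;NaN",
    "NaN;NaN;NaN;NaN"
};

// Hypothetical helper: returns the number, or null for NaN, empty, or malformed cells.
static double? ParseCell(string cell) =>
    double.TryParse(cell, NumberStyles.Float, CultureInfo.InvariantCulture, out double value)
        && !double.IsNaN(value)
        ? value
        : (double?)null;

double[][] data = lines
    .Select(line => line.Split(';')
        .Select(cell => ParseCell(cell))
        .Where(v => v.HasValue)
        .Select(v => v.Value)
        .ToArray())
    .Where(row => row.Length > 0)   // drop rows that contained only NaN
    .ToArray();

Console.WriteLine(data[0][0]);
Console.WriteLine(data[2][1]);
```

Like Edit 2, this keeps whatever numbers each row has, so it only preserves the [y][x] layout when every kept row has the same number of valid cells.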

Related

How To Repeat Split Method After X Times?

I have a .txt file which I would like to split using the split method. My current code is:
string[] alltext = File.ReadAllText(fullPath).Split(new[] { ',' }, 3);
The problem is that I want to loop through the whole text so that it is always split into groups of three pieces that belong together. If I have a text with:
testing, testing,
buenooo diasssss
testing, testing,
buenooo diasssss
testing, testing,
buenooo diasssss
(The format is hard to show here, but the pieces are on different lines, so reading line by line will most likely not work.)
I want "testing", "testing", "buenooo diasssss" to be displayed on my console although they are on different lines.
If each group were on a single line I would simply loop through the lines, but that does not work in this case.
You can first remove "\r\n"(new line) from the text, then split and select the first three items.
var alltext = File.ReadAllText(fullPath).Replace("\r\n","").Split(',').ToList().Take(3);
foreach(var item in alltext)
Console.WriteLine(item);
Edit
If you want all three items to be displayed in one line in the console:
int lineNumber = 0;
var alltext = File.ReadAllText(fullPath).Split(new string[] { "\r\n", "," }, StringSplitOptions.None).ToList();
alltext.RemoveAll(item => item == "");
while (lineNumber * 3 < alltext.Count)
{
var tempList = alltext.Skip(lineNumber * 3).Take(3).ToList();
lineNumber++;
Console.WriteLine("line {0} => {1}, {2}, {3}",lineNumber, tempList[0], tempList[1], tempList[2]);
}
result:
Try this:
var data =
File.ReadLines(fullpath)
.Select((x, n) => (line: x, group: n / 3)) // tag each line with the index of its 3-line group
.GroupBy(x => x.group, x => x.line)
.Select(g =>
String
.Concat(g)
.Split(',', StringSplitOptions.RemoveEmptyEntries)
.Select(part => part.Trim()));
That gives me:
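Another way to group the pieces, as a rough sketch: split the whole text on commas and newlines, then let Enumerable.Chunk cut the list into groups of three. This assumes .NET 6 or later (where Chunk is available), and the text variable is a hypothetical stand-in for File.ReadAllText(fullPath).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample text standing in for File.ReadAllText(fullPath).
string text = "testing, testing,\r\nbuenooo diasssss\r\ntesting, testing,\r\nbuenooo diasssss\r\n";

// Split on commas and on both newline styles, then trim and drop blank pieces.
List<string> items = text
    .Split(new[] { "\r\n", "\n", "," }, StringSplitOptions.RemoveEmptyEntries)
    .Select(item => item.Trim())
    .Where(item => item.Length > 0)
    .ToList();

// Chunk(3) yields arrays of up to three items; the last one may be shorter.
List<string> records = items
    .Chunk(3)
    .Select(chunk => string.Join(", ", chunk))
    .ToList();

foreach (string record in records)
    Console.WriteLine(record);
```

Because each chunk is joined rather than indexed, a trailing group with fewer than three items is printed as-is instead of throwing.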

Iterating multiple txt files in folder to read them in C#

Problem: I need to iterate through multiple files in a folder and read them. They are .txt files. While reading, I need to note which words occurred in each file.
For example:
File 1 text: "John is my friend friend" -> words: John, is, my, friend
File 2 text: "John is Mark" -> words: John, is, Mark
Previously I read the files and merged them into one big file, but that does not work for this, so I have to read them separately. The old idea:
string[] filesZ = { "1.txt", "2.txt" };
var allLinesZ = filesZ.SelectMany(i => System.IO.File.ReadAllLines(i));
System.IO.File.WriteAllLines("n.txt", allLinesZ.ToArray());
var logFileZ = File.ReadAllLines("n.txt");
So the first question is how to iterate through the files and read all of them without making one big file.
The second is how to count the words separately for each file. Currently, for one big file, I am using:
var logFileZ = File.ReadAllLines("n.txt");
List<string> LogListZ = new List<string>(logFileZ);
var fi = new Dictionary<string, int>();
LogListZ.ForEach(str => AddToDictionary(fi, str));
foreach (var entry in fi)
{
Console.WriteLine(entry.Key + ": " + entry.Value);
}
This is AddToDictionary:
static void AddToDictionary(Dictionary<string, int> dictionary, string input)
{
input.Split(new[] { ' ', ',', '.', '?', '!', '.' }, StringSplitOptions.RemoveEmptyEntries).ToList().ForEach(n =>
{
if (dictionary.ContainsKey(n))
dictionary[n]++;
else
dictionary.Add(n, 1);
});
}
I was thinking about looping through all the files (is that possible?) and keeping a counter inside the loop that records, for example, in how many files the word John appears. I don't need a specific file number, just the number of files in which a word occurs, without counting a repeated word (like friend in file 1) twice.
You don't have to do much for part one of your question: remove WriteAllLines, remove the ReadAllLines for "n.txt", rename allLinesZ variable to logFileZ, and add ToList or ToArray call:
var logFileZ = filesZ
.SelectMany(i => System.IO.File.ReadAllLines(i))
.ToList();
You can make a counter in one go as well: split each string as you go, feed it to SelectMany, use GroupBy, and convert to dictionary using Count() as the value:
var counts = filesZ
.SelectMany(i => System.IO.File.ReadAllLines(i)
.SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries))
.Distinct()) // each word is counted at most once per file
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
The call of Distinct() ensures that the same word will not be counted twice if it's in a single file.
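To see the effect of Distinct() on the two example files from the question, here is a small sketch that uses in-memory strings instead of real files. The fileTexts array is a hypothetical stand-in for the contents of 1.txt and 2.txt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the contents of 1.txt and 2.txt.
string[] fileTexts =
{
    "John is my friend friend",
    "John is Mark"
};

char[] separators = { ' ', ',', '.', '?', '!' };

// For every file: split into words, keep each word once, then count files per word.
Dictionary<string, int> fileCounts = fileTexts
    .SelectMany(text => text
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Distinct())
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var entry in fileCounts)
    Console.WriteLine(entry.Key + ": " + entry.Value);
```

Here "friend" ends up with a count of 1 even though it appears twice in the first text, while "John" and "is" get 2 because they occur in both.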

Groupby Linq C#

You guys are ruthless, makes a new programmer like me feel real welcome. :)
Alright let me try this one more time if I can correctly explain my situation. Like one of answer below I have a string that contains the following information. This was created using a while loop where each line ends with an Environment.Newline (There is no mistake in the first line, there is actually a blank line).
var s = @"
ABC-123, 80000, 1400
ABC-123, 70000, 1250
ABC-123, 65000, 1200
BCD-234, 90000, 1300
BCD-234, 95000, 1100
XYZ-111, 24000, 1000
XYZ-111, 24000, 1000";
I originally asked if there is a way to group by the first column ie. all ABC-123 are grouped together, the second is summed and the third column is averaged. Please ignore the sum and average, I just need to understand how to group first.
Here's where I get confused. Using this statement from one of the answers below:
var ss = s.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
.GroupBy(y => y[0]);
I understand what the answer is trying to do, but I need help writing the result back into a string (maybe that's not the right choice, I don't know, I'm always open to suggestions) so that I can use StreamWriter to save the result as a csv.
I've tried to understand IEnumerables but all the videos/websites just confuse the hell out of me. I've also tried outputting the results of ss so that maybe if I got a visual representation, then I could rewrite it but when I do I get the following results:
Console.WriteLine(ss);
Console.Read();
System.Linq.GroupedEnumerable`3[<>f__AnonymousType0`1[System.Char],System.Char,<>f__AnonymousType0`1[System.Char]]
The output I would want would be a string that looked like this.
output = "ABC-123, 215000, 1283
BCD-234, 185000, 1200
XYZ-111, 48000, 1000"
Assuming all your inputdata is one long string:
var s = @"ABC-123, 80000, 1400
ABC-123, 70000, 1250
ABC-123, 65000, 1200
BCD-234, 90000, 1300
BCD-234, 95000, 1100
XYZ-111, 24000, 1000
XYZ-111, 24000, 1000";
var ss = s.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries) //split by newlines
.Select(x => x.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)) //Split each line by ,
.GroupBy(y => y[0]); //group by first element of array
If it is already an array of strings, you can skip the first split by newlines.
EDIT: Well, we cannot explain the theory behind IEnumerable, LINQ and grouping here. There are plenty of excellent tutorials on that. Maybe you should learn some basics first before jumping into such stuff.
But for your particular problem this should do it:
var lines = ss.Select(z => new {cKey = z.Key,
c2sum = z.Select(a=> Convert.ToInt32(a[1])).Sum(),
c3avg = z.Select(a=> Convert.ToInt32(a[2])).Average()});
foreach (var l in lines)
Console.WriteLine("{0}, {1}, {2}", l.cKey, l.c2sum, l.c3avg); //or whatever stream you want to write to
Assuming that your output is an array of strings, one way would be to split the strings by commas and group by the first result:
list.Select(s => s.Split(','))
.GroupBy(a => a[0])
Note that the result will be an IEnumerable<IGrouping<string, string[]>>, i.e. groups of split rows. If you want the original strings, keep them in the initial Select:
list.Select(s => new {S = s, Parts = s.Split(',')})
.GroupBy(a => a.Parts[0])
.Select(g => new {Key = g.Key, lines = g.Select(a => a.S) } );
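To get from the grouped rows back to CSV text, one option is to build one line per group and join the lines with string.Join. The sketch below assumes the rows are already available as a string array (the rows variable is hypothetical) and that the average should be printed without decimals, as in the desired output from the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical input rows; in the question they come from one long string.
string[] rows =
{
    "ABC-123, 80000, 1400",
    "ABC-123, 70000, 1250",
    "ABC-123, 65000, 1200",
    "BCD-234, 90000, 1300",
    "BCD-234, 95000, 1100",
    "XYZ-111, 24000, 1000",
    "XYZ-111, 24000, 1000"
};

// One summary line per key: key, sum of column 2, average of column 3 (no decimals).
string[] summary = rows
    .Select(row => row.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries))
    .GroupBy(parts => parts[0])
    .Select(g => string.Format(
        CultureInfo.InvariantCulture,
        "{0}, {1}, {2:F0}",
        g.Key,
        g.Sum(parts => int.Parse(parts[1], CultureInfo.InvariantCulture)),
        g.Average(parts => int.Parse(parts[2], CultureInfo.InvariantCulture))))
    .ToArray();

// A single string that could be handed to StreamWriter.Write or File.WriteAllText.
string output = string.Join(Environment.NewLine, summary);
Console.WriteLine(output);
```

The F0 format rounds the average to the nearest whole number; use a different format string if you need the decimals.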

C# Parsing text file to extract specific line

I have a text file that contains a lot of data, each item on a new line,
but I want to extract only the lines that start with the values:
coordinates=(111,222,333)
There are several instances of this line, and I actually want to extract the part "111,222,333".
How can I do this?
Something like
var result = File.ReadAllLines(@"C:\test.txt")
.Where(p => p.StartsWith("coordinates=("))
.Select(p => p.Substring(13, p.IndexOf(')') - 13));
The first line is quite clear, the second line keeps only the lines that start with coordinates=(, and the third line extracts the substring (13 is the length of coordinates=().
result is an IEnumerable<string>. You can convert it to an array with result.ToArray()
var text = File.ReadAllText(path);
var result = Regex.Matches(text, @"coordinates=\((\d+),(\d+),(\d+)\)")
.Cast<Match>()
.Select(x => new
{
X = x.Groups[1].Value,
Y = x.Groups[2].Value,
Z = x.Groups[3].Value
})
.ToArray();
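If only the text between the parentheses is needed, a single capture group applied line by line is another option. This is a minimal sketch; the fileLines array is a hypothetical stand-in for File.ReadAllLines, and the pattern assumes the line begins with coordinates=( and has no nested parentheses.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample lines standing in for File.ReadAllLines(path).
string[] fileLines =
{
    "name=first",
    "coordinates=(111,222,333)",
    "other=1",
    "coordinates=(4,5,6)"
};

// Anchored pattern: the capture group grabs everything between the parentheses.
var pattern = new Regex(@"^coordinates=\(([^)]*)\)");

string[] coordinates = fileLines
    .Select(line => pattern.Match(line))
    .Where(match => match.Success)
    .Select(match => match.Groups[1].Value)
    .ToArray();

foreach (string value in coordinates)
    Console.WriteLine(value);
```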

C# - How to parse text file (space delimited numbers)?

Given a data file delimited by space,
10 10 10 10 222 331
2 3 3 4 45
4 2 2 4
How to read this file and load into an Array
Thank you
var fileContent = File.ReadAllText(fileName);
var array = fileContent.Split((string[])null, StringSplitOptions.RemoveEmptyEntries);
If you have numbers only and need a list of int as a result, you can do this:
var numbers = array.Select(arg => int.Parse(arg)).ToList();
It depends on the kind of array you want. If you want to flatten everything into a single-dimensional array, go with Alex Aza's answer, otherwise, if you want a 2-dimensional array that maps to the lines and elements within the text file:
var array = File.ReadAllLines(filename)
.Where(line => !string.IsNullOrWhiteSpace(line)) // Filter blank lines before splitting.
.Select(line => line.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
.Select(parts => parts.Select(int.Parse).ToArray()) // Assuming you want an int[][] array.
.ToArray();
Be aware that there is no error handling, so if parsing fails, the above code will throw an exception.
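If throwing on bad input is not what you want, one possible variation is to use int.TryParse and skip tokens that are not valid integers. This is only a sketch: the rawLines array is a hypothetical stand-in for File.ReadAllLines(filename), and silently skipping bad tokens is an assumption about the desired behaviour (you may prefer to log or reject the whole line).

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical sample lines standing in for File.ReadAllLines(filename).
string[] rawLines =
{
    "10 10 10 10 222 331",
    "2 3 3 4 45",
    "",
    "4 2 x 4"
};

int[][] parsed = rawLines
    .Where(line => !string.IsNullOrWhiteSpace(line))      // skip blank lines
    .Select(line => line
        .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
        // TryParse returns false for tokens such as "x" instead of throwing.
        .Select(token => int.TryParse(token, NumberStyles.Integer,
                CultureInfo.InvariantCulture, out int value)
            ? (int?)value
            : null)
        .Where(n => n.HasValue)
        .Select(n => n.Value)
        .ToArray())
    .ToArray();

Console.WriteLine("Rows: " + parsed.Length);
```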
You will be interested in StreamReader.ReadLine() and String.Split()
I couldn't get Quick Joe Smith's answer to work, so I modified it. I put the modified code into a static method within a "FileReader" class:
public static double[][] readWhitespaceDelimitedDoubles(string[] input)
{
double[][] array = input.Where(line => !String.IsNullOrWhiteSpace(line)) // Use this to filter blank lines.
.Select(line => line.Split((string[])null, StringSplitOptions.RemoveEmptyEntries))
.Select(line => line.Select(element => double.Parse(element)))
.Select(line => line.ToArray())
.ToArray();
return array;
}
For my application, I was parsing for double as opposed to int. To call the code, try using something like this:
string[] fileContents = System.IO.File.ReadAllLines(openFileDialog1.FileName);
double[][] fileContentsArray = FileReader.readWhitespaceDelimitedDoubles(fileContents);
Console.WriteLine("Number of Rows: {0,3}", fileContentsArray.Length);
Console.WriteLine("Number of Cols: {0,3}", fileContentsArray[0].Length);
