C# - How to parse text file (space delimited numbers)? - c#

Given a data file delimited by space,
10 10 10 10 222 331
2 3 3 4 45
4 2 2 4
How do I read this file and load it into an array?
Thank you

var fileContent = File.ReadAllText(fileName);
var array = fileContent.Split((string[])null, StringSplitOptions.RemoveEmptyEntries);
If you have numbers only and need a list of int as a result, you can do this:
var numbers = array.Select(arg => int.Parse(arg)).ToList();

It depends on the kind of array you want. If you want to flatten everything into a single-dimensional array, go with Alex Aza's answer. Otherwise, if you want a jagged array (int[][]) that maps to the lines and elements within the text file:
var array = File.ReadAllLines(filename)
    .Where(line => !string.IsNullOrWhiteSpace(line)) // Use this to filter blank lines.
    .Select(line => line.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
        .Select(int.Parse) // Assuming you want an int array.
        .ToArray())
    .ToArray();
Be aware that there is no error handling, so if parsing fails, the above code will throw an exception.
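If you would rather not get an exception on bad input, int.TryParse is one way to handle it. The following is only a sketch: the SafeIntParser class and its TryParseLines method are hypothetical names made up for illustration, and it assumes whitespace-separated integers in invariant-culture format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SafeIntParser
{
    // Hypothetical helper: parses every non-blank line into an int[].
    // Returns false and reports the first token that is not an integer,
    // instead of throwing a FormatException.
    public static bool TryParseLines(IEnumerable<string> lines, out int[][] rows, out string badToken)
    {
        rows = null;
        badToken = null;
        var result = new List<int[]>();

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // A null separator array means "split on any whitespace".
            var tokens = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            var row = new int[tokens.Length];

            for (int i = 0; i < tokens.Length; i++)
            {
                if (!int.TryParse(tokens[i], NumberStyles.Integer, CultureInfo.InvariantCulture, out row[i]))
                {
                    badToken = tokens[i];
                    return false;
                }
            }
            result.Add(row);
        }

        rows = result.ToArray();
        return true;
    }
}
```

You could call it with File.ReadLines(filename) as the first argument and decide for yourself what to do when it returns false (log the token, skip the file, and so on).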

You will be interested in StreamReader.ReadLine() and String.Split().
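A minimal sketch of how those two calls could fit together, assuming the file contains only integers separated by spaces. The LineByLineReader class and ReadRows method are hypothetical names; the method takes a TextReader so it works with a StreamReader over a file as well as a StringReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineByLineReader
{
    // Hypothetical helper: reads one line at a time and splits it on spaces.
    public static List<int[]> ReadRows(TextReader reader)
    {
        var rows = new List<int[]>();
        string line;

        // ReadLine returns null once the end of the input is reached.
        while ((line = reader.ReadLine()) != null)
        {
            var tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (tokens.Length == 0)
                continue; // blank line

            // int.Parse throws on anything that is not an integer.
            rows.Add(Array.ConvertAll(tokens, int.Parse));
        }
        return rows;
    }
}
```

For a file on disk, something like using (var reader = new StreamReader(fileName)) { var rows = LineByLineReader.ReadRows(reader); } should do. Reading line by line avoids loading the whole file into memory first.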

I couldn't get Quick Joe Smith's answer to work, so I modified it. I put the modified code into a static method within a "FileReader" class:
public static double[][] readWhitespaceDelimitedDoubles(string[] input)
{
    double[][] array = input.Where(line => !String.IsNullOrWhiteSpace(line)) // Use this to filter blank lines.
        .Select(line => line.Split((string[])null, StringSplitOptions.RemoveEmptyEntries))
        .Select(line => line.Select(element => double.Parse(element)))
        .Select(line => line.ToArray())
        .ToArray();
    return array;
}
For my application, I was parsing for double as opposed to int. To call the code, try using something like this:
string[] fileContents = System.IO.File.ReadAllLines(openFileDialog1.FileName);
double[][] fileContentsArray = FileReader.readWhitespaceDelimitedDoubles(fileContents);
Console.WriteLine("Number of Rows: {0,3}", fileContentsArray.Length);
Console.WriteLine("Number of Cols: {0,3}", fileContentsArray[0].Length);

Related

Lists, String Split Options and logic

I need a little help from someone more advanced. The code I have questions about is the following:
static void Main(string[] args)
{
    List<string> numbersAsStrings = Console.ReadLine()
        .Split('|')
        .Reverse()
        .ToList();
    List<int> numbers = new List<int>();
    foreach (var str in numbersAsStrings)
    {
        numbers.AddRange(str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(int.Parse)
            .ToList()
        );
        // Write the values from the old List into the new List.
        // They were already reversed above with .Reverse.
    }
    Console.WriteLine(string.Join(" ", numbers));
}
The exercise was as follows: Write a program to append several arrays of numbers.
Arrays are separated by ‘|’.
Values are separated by spaces (‘ ’, one or several).
Order the arrays from the last to the first, and their values from left to right.
Can someone please explain to me how this particular code reads the entries? I could not find the solution myself.
Kind regards
Console.ReadLine() will read all the characters typed in after the program has been run, until the user presses enter.
List<string> numbersAsStrings =
    Console.ReadLine() // read the input as a string
        .Split('|') // split the string into an array (delimited by |)
        .Reverse() // reverse the array
        .ToList(); // convert the array into a List
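To see what the rest of the loop does, it may help to trace it on a made-up input. The sketch below repeats the same steps as the question's code inside a hypothetical AppendArrays.Process method that takes the input as a string instead of reading the console; the sample input in the comments is an assumption, not part of the exercise text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AppendArrays
{
    // Hypothetical helper mirroring the question's Main, for input such as
    // "1 2 3 |4 5 6 |  7  8".
    public static string Process(string input)
    {
        // Split('|')  -> "1 2 3 ", "4 5 6 ", "  7  8"
        // Reverse     -> "  7  8", "4 5 6 ", "1 2 3 "
        IEnumerable<string> parts = Enumerable.Reverse(input.Split('|'));

        var numbers = new List<int>();
        foreach (var part in parts)
        {
            // RemoveEmptyEntries drops the empty strings produced by
            // repeated or leading/trailing spaces, so "  7  8" -> "7", "8".
            numbers.AddRange(part
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(int.Parse));
        }

        // The groups are now in reverse order, while the values inside
        // each group keep their left-to-right order.
        return string.Join(" ", numbers);
    }
}
```

For the sample input above, Process returns "7 8 4 5 6 1 2 3".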

var [][] array remove specific words

I have a little problem. I have a .csv with "NaN" values and doubles (0.6034, for example), and I am trying to read just the doubles of the CSV into an array[y][x].
Currently, I read the whole .csv, but I cannot manage to remove all "NaN" values afterward. (It should parse through the CSV, add just the numbers to an array[y][x], and leave all "NaN" out.)
My current code:
var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray()).ToArray(); // reads WHOLE .CSV to array[][]
int max_Rows = 0, j, rank;
int max_Col = 0;
foreach (Array anArray in rows)
{
    rank = anArray.Rank;
    if (rank > 1)
    {
        // show the lengths of each dimension
        for (j = 0; j < rank; j++)
        {
        }
    }
    else
    {
    }
    // show the total length of the entire array or all dimensions
    max_Col = anArray.Length; // displays columns
    max_Rows++; // displays rows
}
I tried the search but couldn't really find anything that helped me.
I know this is probably really easy but I am new to C#.
The .CSV and the desired outcome:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
This is a sample .csv I have. I should have been more clear, sorry! There is a NaN in every line, and I want it to display like this:
1;5
2;6
3;7
4;8
This is just a sample of the .csv; the real CSV has around 60,000 values. I need to get the input with [y][x]; for example, [0][0] should display "1" and [2][1] should display "7", and so on.
Thanks again for all your help!
You could do a filter of your delimited values in the array.
I've modified your code a bit.
File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray().Where(y => y != "NaN").ToArray()).ToArray();
If you want to remove all the lines that contain NAN (a typical task for CSV: clearing up all incomplete lines), e.g.
123.0; 456; 789
2.1; NAN; 35 <- this line should be removed (has NaN value)
-5; 3; 18
You can implement it like this:
double[][] data = File
    .ReadLines(filepath)
    .Select(line => line.Split(new char[] {';', '\t'},
        StringSplitOptions.RemoveEmptyEntries))
    .Where(items => items // Filter first...
        .All(item => !string.Equals("NAN", item.Trim(), StringComparison.OrdinalIgnoreCase))) // Trim: the sample has spaces after ';'
    .Select(items => items
        .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
        .ToArray()) // ... materialize at the very end
    .ToArray();
Use string.Join to display rows:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Edit: The actual problem is to take 2nd and 3rd complete columns only from the CSV:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
The desired outcome is
[[1, 5], [2, 6], [3, 7], [4, 8]]
Implementation:
double[][] data = File
    .ReadLines(filepath)
    .Select(line => line
        .Split(new char[] {';'},
            StringSplitOptions.RemoveEmptyEntries)
        .Skip(1)
        .Take(2)
        .Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase))
        .ToArray())
    .Where(items => items.Length == 2)
    .Select(items => items
        .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
        .ToArray())
    .ToArray();
Tests
// 1
Console.Write(data[0][0]);
// 5
Console.Write(data[0][1]);
// 2
Console.Write(data[1][0]);
All values in one go:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Outcome:
1;5
2;6
3;7
4;8
Edit 2: if you want to extract non-NaN values only (please notice that the initial CSV structure will be ruined):
1;2;3            1;2;3
NAN;4;5          4;5     <- please notice that the structure is lost
6;NAN;7     ->   6;7
8;9;NAN;         8;9
NAN;10;NAN       10
NAN;NAN;11       11
then
double[][] data = File
    .ReadLines(filepath)
    .Select(line => line
        .Split(new char[] {';'},
            StringSplitOptions.RemoveEmptyEntries)
        .Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
    .Where(items => items.Any())
    .Select(items => items
        .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
        .ToArray())
    .ToArray();

C# Removing Last Element of a string Array?

I am parsing a set of coordinates from an XML file. Each node will have coordinates like:
-82.5,34.1,0.000 -82.6,34.2,0.000
In the code below, the raw_coords variable is already assigned the above value, and I am trying to split it into the array lnglatset, which does look okay.
string[] lnglatset = raw_coords.Split(' '); // will yield e.g. [0]=-82.00,34.00,00000; the last set of zeros needs to go
foreach (string lnglat in lnglatset)
{
    Console.WriteLine(lnglat); // -82.5,34.1,0.000; looks fine
}
From the above, the final value needed would be:
coords = "-82.5 34.1, -82.6 34.2";//note the space between lng/lat
But how do I remove the junk values of 0.000 from each element of the array and put a space, instead of a comma, between the lng and lat values in each element? I tried calling a remove() function on lnglat, but that was not allowed within the foreach loop. Thanks!
You can take all parts except the last one using the Take method:
var parts = raw_coords.Split(' ')
    .Select(x => x.Split(','))
    .Select(x => string.Join(" ", x.Take(x.Length - 1)));
var result = string.Join(", ", parts); // ", " matches the desired "-82.5 34.1, -82.6 34.2"
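In a single line:
String result = String.Join(" ", raw_coords.Split(' ', ',')
    .Select(i => double.Parse(i))
    .Where(i => i != 0).Select(i => i.ToString()));
This removes every element that parses to 0 and joins the rest with single spaces, so the comma between coordinate pairs is lost as well. Note that a genuine 0 longitude or latitude would also be dropped.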
You can't alter the members of an IEnumerable during a foreach. Instead, you can just skip the last member when splitting each raw coordinate set:
lnglat.Split(',').Take(2).ToArray()
As others mentioned, you cannot modify the iteration variable inside a foreach. I learnt it the hard way and ended up using a simple "for" loop instead of foreach:
for (int i = 0; i < lnglatset.Length; i++)
{
    // lnglatset[i] can be reassigned here
}
You can use the Last() extension method from System.Linq.
string lastItemOfSplit = aString.Split(new char[] {@"\"[0], "/"[0]}).Last();

reading in text file and splitting by comma in c#

I have a text file whose format is like this
Number,Name,Age
I want to read "Number", the first column of this text file, into an array to find duplicates. Here are the two ways I tried to read in the file:
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Do you have any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
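For the uniqueness test alone, a HashSet of the raw first field is probably enough. This is just a sketch under the assumption that every non-blank line looks like Number,Name,Age; the DuplicateFinder class and FindDuplicateNumbers method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Hypothetical helper: returns each Number that appears on more than
    // one line, in the order the repeats were first detected.
    public static List<string> FindDuplicateNumbers(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Everything left of the first comma is the Number column.
            var number = line.Split(',')[0].Trim();

            // HashSet.Add returns false if the value was already present.
            if (!seen.Add(number) && !duplicates.Contains(number))
                duplicates.Add(number);
        }
        return duplicates;
    }
}
```

Passing File.ReadLines(path) as the argument would check the whole file. The numbers are compared as strings here, so "01" and "1" count as different values.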
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
    .Select(l => l.Trim().Split(','))
    .Where(arr => arr.Length >= 3)
    .Select(arr => new {
        Number = arr[0].Trim(),
        Name = arr[1].Trim(),
        Age = arr[2].Trim()
    })
    .GroupBy(x => x.Number)
    .Where(g => g.Count() > 1);
foreach (var dupNumGroup in numDuplicates)
    Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
        , dupNumGroup.Key
        , string.Join(",", dupNumGroup.Select(x => x.Name))
        , string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.Split solution, here is a really simple way of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file into an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them into their respective columns
foreach (string line in fileContents)
{
    var fields = line.Split(',');
    if (fields.Length < 3)
        throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
    // You would probably want to be more careful about your int parsing... (use TryParse)
    var number = int.Parse(fields[0]);
    var name = fields[1];
    var age = int.Parse(fields[2]);
    // If we already imported this number, continue on to the next record
    if (importedNumbers.Contains(number))
        continue; // You might also update the existing record at this point instead of just skipping...
    importedNumbers.Add(number); // Keep track of numbers we have imported
}

C# Parsing text file to extract specific line

I have a text file which contains a lot of data, each item on a new line,
but I want to extract the lines that start with the values:
coordinates=(111,222,333)
There are several instances of this line, and I actually want to extract the part "111,222,333".
How can I do this?
Something like
var result = File.ReadAllLines(@"C:\test.txt")
    .Where(p => p.StartsWith("coordinates=("))
    .Select(p => p.Substring(13, p.IndexOf(')') - 13));
The first line is quite clear; the second line keeps only the lines that start with "coordinates=("; the third line extracts the substring (13 is the length of "coordinates=(").
result is an IEnumerable<string>. You can convert it to an array with result.ToArray()
var text = File.ReadAllText(path);
var result = Regex.Matches(text, @"coordinates=\((\d+),(\d+),(\d+)\)")
    .Cast<Match>()
    .Select(x => new
    {
        X = x.Groups[1].Value,
        Y = x.Groups[2].Value,
        Z = x.Groups[3].Value
    })
    .ToArray();
