I have a .txt file which I would like to split using the split method. My current code is:
string[] alltext = File.ReadAllText(fullPath).Split(new[] { ',' }, 3);
The problem I now have is that I want it to loop through the whole file so that it always splits the text into groups of three pieces that belong together. If I have a text with:
testing, testing,
buenooo diasssss
testing, testing,
buenooo diasssss
testing, testing,
buenooo diasssss
(The format here is hard to display, but I want to show that the pieces are on different lines, so reading line by line will most likely not be possible.)
I want "testing", "testing", "buenooo diasssss" to be displayed on my console although they are on different lines.
If each group were on a single line I would simply loop through the lines, but this does not work in this case.
You can first remove "\r\n" (the newline sequence) from the text, then split and take the first three items.
var alltext = File.ReadAllText(fullPath).Replace("\r\n","").Split(',').ToList().Take(3);
foreach(var item in alltext)
Console.WriteLine(item);
Edit
If you want all three items to be displayed in one line in the console:
int lineNumber = 0;
var alltext = File.ReadAllText(fullPath).Split(new string[] { "\r\n", "," }, StringSplitOptions.None).ToList();
alltext.RemoveAll(item => item == "");
while (lineNumber * 3 < alltext.Count)
{
var tempList = alltext.Skip(lineNumber * 3).Take(3).ToList();
lineNumber++;
Console.WriteLine("line {0} => {1}, {2}, {3}",lineNumber, tempList[0], tempList[1], tempList[2]);
}
Try this:
var data =
    File.ReadLines(fullPath)
        .Select((x, n) => (line: x, group: n / 3))
        .GroupBy(x => x.group, x => x.line)
        .Select(g =>
            String
                .Concat(g)
                .Split(',', StringSplitOptions.RemoveEmptyEntries)
                .Select(part => part.Trim()));
I have a small problem. I have a .csv with "NaN" values and doubles (0.6034, for example), and I am trying to read just the doubles from the CSV into an array[y][x].
Currently, I read the whole .csv, but I cannot manage to remove all the "NaN" values afterward. (It should parse through the CSV, add only the numbers to an array[y][x], and leave all "NaN" values out.)
My current Code:
var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray()).ToArray(); //reads WHOLE .CSV to array[][]
int max_Rows = 0, j, rank;
int max_Col = 0;
foreach (Array anArray in rows)
{
rank = anArray.Rank;
if (rank > 1)
{
// show the lengths of each dimension
for (j = 0; j < rank; j++)
{
}
}
else
{
}
// show the total length of the entire array or all dimensions
max_Col = anArray.Length; //displays columns
max_Rows++; //displays rows
}
I tried the search but couldn't really find anything that helped me.
I know this is probably really easy but I am new to C#.
The .CSV and the desired outcome:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
This is a sample .csv I have. I should have been clearer, sorry! There is a NaN in every line, and I want it to display like this:
1;5
2;6
3;7
4;8
This is just a sample; the real .csv has around 60,000 values. I need to access the input as [y][x]: for example, [0][0] should display "1" and [2][1] should display "7", and so on.
Thanks again for all your help!
You could do a filter of your delimited values in the array.
I've modified your code a bit.
File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray().Where(y => y != "NaN").ToArray()).ToArray();
If you want to remove all the lines that contain NAN (typical task for CSV - clearing up all incomplete lines), e.g.
123.0; 456; 789
2.1; NAN; 35 <- this line should be removed (has NaN value)
-5; 3; 18
You can implement it like this
double[][] data = File
.ReadLines(filepath)
.Select(line => line.Split(new char[] {';', '\t'},
StringSplitOptions.RemoveEmptyEntries))
.Where(items => items // Filter first...
.All(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray()) // ... materialize at the very end
.ToArray();
Use string.Join to display rows:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Edit: The actual problem is to take 2nd and 3rd complete columns only from the CSV:
NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN
desired outcome is
[[1, 5], [2, 6], [3, 7], [4, 8]]
Implementation:
double[][] data = File
.ReadLines(filepath)
.Select(line => line
.Split(new char[] {';'},
StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Take(2)
.Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase))
.ToArray())
.Where(items => items.Length == 2)
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray())
.ToArray();
Tests
// 1
Console.Write(data[0][0]);
// 5
Console.Write(data[0][1]);
// 2
Console.Write(data[1][0]);
All values in one go:
string report = string.Join(Environment.NewLine, data
.Select(line => string.Join(";", line)));
Console.Write(report);
Outcome:
1;5
2;6
3;7
4;8
Edit 2: if you want to extract non NaN values only (please, notice that the initial CSV structure will be ruined):
1;2;3       ->  1;2;3
NAN;4;5     ->  4;5     <- please notice that the structure is lost
6;NAN;7     ->  6;7
8;9;NAN;    ->  8;9
NAN;10;NAN  ->  10
NAN;NAN;11  ->  11
then
double[][] data = File
.ReadLines(filepath)
.Select(line => line
.Split(new char[] {';'},
StringSplitOptions.RemoveEmptyEntries)
.Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
.Where(items => items.Any())
.Select(items => items
.Select(item => double.Parse(item, CultureInfo.InvariantCulture))
.ToArray())
.ToArray();
I tried a lot of possible solutions to this problem but it never seems to work. My problem is the following: I have a txt file with several lines. Each line has something like:
xxxxx yyyyyy
xxxxx yyyyyy
xxxxx yyyyyy
xxxxx yyyyyy
...
I want to store in one array of strings the xxxxx and in another array the yyyyy, for each line on the txt file, something like
string[] x;
string[] y;
string[1] x = xxxxx; // the x from the first line of the txt
string[2] x = xxxxx; // the x from the second line of the txt
string[3] x = xxxxx; // the x from the third line of the txt
...
and the same for string[] y;
... but i have no idea how to...
I would very much appreciate if someone showed me how to make the cycle for this problem i have.
You can use linq for this:
string test = "xxxxx yyyyyy xxxxx yyyyyy xxxxx yyyyyy xxxxx yyyyyy";
string[] testarray = test.Split(' ');
string[] arrayx= testarray.Where((c, i) => i % 2 == 0).ToArray<string>();
string[] arrayy = testarray.Where((c, i) => i % 2 != 0).ToArray<string>();
Basically, this code splits the string on spaces, then puts the strings at even positions in one array and those at odd positions in another.
Edit
You say in the comments that you don't understand this: Where((c, i) => i % 2 == 0). It takes the position of each string (i) and computes it modulo 2: it divides the position by 2 and checks whether the remainder equals 0. That is how you tell whether a number is even or odd.
Edit2
My first answer only works for one line. For several lines (your input source is a file with several lines), you'll need a foreach loop. Alternatively, you can do something like the following sample: read all the lines, join them into a single string, and then run the previously shown code on the result:
string[] file = File.ReadAllLines(@"yourfile.txt");
string allLines = string.Join(" ", file); //this joins all the lines into one
//Alternate way of joining the lines
//string allLines=file.Aggregate((i, j) => i + " " + j);
string[] testarray = allLines.Split(' ');
string[] arrayx= testarray.Where((c, i) => i % 2 == 0).ToArray<string>();
string[] arrayy = testarray.Where((c, i) => i % 2 != 0).ToArray<string>();
If I understand your question correctly, xxxxx and yyyyyy show up repeatedly, in which case the data looks something like 11111 222222 11111 222222 11111 222222.
There is a space (' ') between them, so:
1. Split the lines one by one within a loop.
2. Use ' ' as the delimiter when splitting each line.
3. Within another loop, use a counter to tell whether each string is at an odd or even position, and store the two kinds separately.
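The three steps above can be sketched roughly as follows. This is only an illustration: the class and method names (ColumnSplitter, SplitColumns) are made up for the example, the counter restarts on every line, and tabs are accepted as delimiters in addition to spaces, which is an assumption about the input.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnSplitter
{
    // Hypothetical helper: walks the lines one by one, splits each on
    // whitespace, and uses a position counter to route tokens.
    public static (string[] Xs, string[] Ys) SplitColumns(IEnumerable<string> lines)
    {
        var xs = new List<string>();
        var ys = new List<string>();

        foreach (string line in lines)                       // step 1: one line at a time
        {
            string[] tokens = line.Split(                    // step 2: split on the delimiter
                new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

            for (int counter = 0; counter < tokens.Length; counter++)
            {
                if (counter % 2 == 0)                        // step 3: even position -> x
                    xs.Add(tokens[counter]);
                else                                         //         odd position  -> y
                    ys.Add(tokens[counter]);
            }
        }

        return (xs.ToArray(), ys.ToArray());
    }

    public static void Main()
    {
        var (xs, ys) = SplitColumns(new[] { "11111 222222", "33333 444444" });
        Console.WriteLine(string.Join(",", xs)); // 11111,33333
        Console.WriteLine(string.Join(",", ys)); // 222222,444444
    }
}
```

With a real file, the lines could come from File.ReadLines(path) instead of the inline array.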
If I understood correctly, you have multiple lines, each line with two strings. Then, here is an answer that uses a plain old for:
public static void Main()
{
// This is just an example. In your case you would read the text from a file
const string text = @"x y
xx yy
xxx yyy";
var lines = text.Split(new[]{'\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
var xs = new string[lines.Length];
var ys = new string[lines.Length];
for(int i = 0; i < lines.Length; i++)
{
var parts = lines[i].Split(' ');
xs[i] = parts[0];
ys[i] = parts[1];
}
}
I am attempting to sort the lines of an .xls file by the fourth string in each line.
string[] list_lines = System.IO.File.ReadAllLines(@"E:\VS\WriteLines.xls");
// Display the file contents by using a foreach loop.
System.Console.WriteLine("Contents of Your Database = ");
foreach (var line in list_lines)
{
// Use a tab to indent each line of the file.
Console.WriteLine("\t" + line);
}
I am having trouble creating an algorithm that identifies the fourth element of each line and lists the content in alphabetical order.
The words in each line are separated by ' '.
Can anyone put me on a right direction please?
EDIT--------------------------
OK,
foreach (var line in list_lines.OrderBy(line => line.Split(' ')[3]))
solved the problem. The lines are sorted as I need. Excel replaces the ' ' spaces with ';', which is why it was giving an error.
Now, I guess, I need to parse that part of each string to an int, since it sorts by the first digit and not by the numeric value.
You can split the lines and then use the fourth item (index 3) in an OrderBy:
foreach (var line in list_lines.OrderBy(line => line.Split(' ')[3]))
{
}
Well, just sort the array:
string[] list_lines = ...;
// General case: not all strings have 4 parts
Array.Sort(list_lines, (left, right) => {
String[] partsLeft = left.Split(' ');
String[] partsRight = right.Split(' ');
if (partsLeft.Length < 4)
if (partsRight.Length < 4)
return String.Compare(left, right, StringComparison.OrdinalIgnoreCase);
else
return -1;
else if (partsRight.Length < 4)
return 1;
return String.Compare(partsLeft[3], partsRight[3], StringComparison.OrdinalIgnoreCase);
});
If all the lines are guaranteed to have at least 4 items, it can be simplified into
Array.Sort(list_lines, (left, right) =>
String.Compare(left.Split(' ')[3],
right.Split(' ')[3],
StringComparison.OrdinalIgnoreCase));
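The question's follow-up mentions that the fourth field sorts by its first digit rather than by its numeric value. A minimal sketch of a numeric sort is below, assuming every line has at least four fields and that the fourth one is a valid integer; the names NumericSort and SortByFourthFieldNumeric are hypothetical, and the separator is passed in because the question notes Excel may use ';' instead of ' '.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NumericSort
{
    // Hypothetical helper: orders lines by the integer value of the
    // fourth field (index 3). Throws if a line has fewer than four
    // fields or the field is not an integer.
    public static string[] SortByFourthFieldNumeric(string[] lines, char separator)
    {
        return lines
            .OrderBy(line => int.Parse(line.Split(separator)[3], CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        string[] lines = { "a b c 10", "d e f 9", "g h i 100" };

        // A string sort would put "10" and "100" before "9";
        // parsing to int gives 9, 10, 100.
        foreach (string line in SortByFourthFieldNumeric(lines, ' '))
            Console.WriteLine(line);
    }
}
```

If some lines may be malformed, int.TryParse with a fallback key would avoid the exception.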
I have a text file whose format is like this
Number,Name,Age
I want to read "Number", the first column of this text file, into an array to find duplicates. Here are the two ways I tried to read in the file.
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.split solution, here is a really simple method of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file in to an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them in to their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Count() < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}
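The comments in the loop above suggest using TryParse and tracking numbers already seen. One possible sketch of that idea is shown here, using a HashSet<int> for the lookups instead of List<int>.Contains. The names DuplicateFinder and FindDuplicateNumbers are invented for the example, and silently skipping lines whose first field is not an integer is an assumption rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Hypothetical helper: returns each Number that occurs more than once,
    // in the order the first repeat is encountered.
    public static List<int> FindDuplicateNumbers(IEnumerable<string> lines)
    {
        var seen = new HashSet<int>();
        var duplicates = new List<int>();

        foreach (string line in lines)
        {
            // Split always yields at least one element, so fields[0] is safe.
            string[] fields = line.Split(',');

            // Assumption: lines with a non-numeric first field are ignored.
            if (!int.TryParse(fields[0].Trim(), out int number))
                continue;

            // HashSet.Add returns false when the value is already present.
            if (!seen.Add(number) && !duplicates.Contains(number))
                duplicates.Add(number);
        }

        return duplicates;
    }

    public static void Main()
    {
        var lines = new[] { "1,Ann,30", "2,Bob,41", "1,Cid,25", "x,Dee,19", "1,Eve,50" };
        Console.WriteLine(string.Join(",", FindDuplicateNumbers(lines))); // 1
    }
}
```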
I have a bunch of text files that have a custom format, looking like this:
App Name
Export Layout
Produced at 24/07/2011 09:53:21
Field Name Length
NAME 100
FULLNAME1 150
ADDR1 80
ADDR2 80
Any whitespaces may be tabs or spaces. The file may contain any number of field names and lengths.
I want to get all the field names and their corresponding field lengths and perhaps store them in a dictionary. This information will be used to process a corresponding fixed width data file having the mentioned field names and field lengths.
I know how to skip lines using ReadLine(). What I don't know is how to say: "When you reach the line that starts with 'Field Name', skip one more line, then starting from the next line, grab all the words on the left column and the numbers on the right column."
I have tried String.Trim() but that doesn't remove the whitespaces in between.
Thanks in advance.
You can use SkipWhile(l => !l.TrimStart().StartsWith("Field Name")).Skip(1):
Dictionary<string, string> allFieldLengths = File.ReadLines("path")
.SkipWhile(l => !l.TrimStart().StartsWith("Field Name")) // skips lines that don't start with "Field Name"
.Skip(1) // go to next line
.SkipWhile(l => string.IsNullOrWhiteSpace(l)) // skip following empty line(s)
.Select(l =>
{ // anonymous method to use "real code"
var line = l.Trim(); // remove spaces or tabs from start and end of line
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
return new { line, token }; // return an anonymous type from the lambda
})
.Where(x => x.token.Length == 2) // ignore all lines that do not have exactly two fields (invalid data)
.Select(x => new { FieldName = x.token[0], Length = x.token[1] })
.GroupBy(x => x.FieldName) // groups lines by FieldName, every group contains its Key + all anonymous types which belong to this group
.ToDictionary(xg => xg.Key, xg => string.Join(",", xg.Select(x => x.Length)));
line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) will split by space and tabs and ignores all empty spaces. Use GroupBy to ensure that all keys are unique in the dictionary. In the case of duplicate field-names the Length will be joined with comma.
Edit: since you have requested a non-LINQ version, here it is:
Dictionary<string, string> allFieldLengths = new Dictionary<string, string>();
bool headerFound = false;
bool dataFound = false;
foreach (string l in File.ReadLines("path"))
{
string line = l.Trim();
if (!headerFound && line.StartsWith("Field Name"))
{
headerFound = true;
// skip this line:
continue;
}
if (!headerFound)
continue;
if (!dataFound && line.Length > 0)
dataFound = true;
if (!dataFound)
continue;
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
if (token.Length != 2)
continue;
string fieldName = token[0];
string length = token[1];
string lengthInDict;
if (allFieldLengths.TryGetValue(fieldName, out lengthInDict))
// append this length
allFieldLengths[fieldName] = lengthInDict + "," + length;
else
allFieldLengths.Add(fieldName, length);
}
I like the LINQ version more because it's much more readable and maintainable (imo).
Based on the assumption that the position of the header line is fixed, we may consider actual key-value pairs to start from the 9th line. Then, using the ReadAllLines method to return a String array from the file, we just start processing from index 8 onwards:
string[] lines = File.ReadAllLines(filepath);
Dictionary<string,int> pairs = new Dictionary<string,int>();
for(int i=8;i<lines.Length;i++)
{
string[] pair = Regex.Replace(lines[i],"(\\s)+",";").Split(';');
pairs.Add(pair[0],int.Parse(pair[1]));
}
This is a skeleton, not accounting for exception handling, but I guess it should get you started.
You can use String.StartsWith() to detect the "Field Name" line, then String.Split() with a null separator to split on whitespace. This will give you the field name and length strings.
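A rough sketch of that approach follows. The names LayoutParser and ParseLayout are hypothetical, storing lengths as int is an assumption, and a repeated field name simply overwrites the earlier entry. The null separator is cast to char[] so that the overload choice is unambiguous; with a null separator, Split treats white-space characters (spaces and tabs alike) as delimiters.

```csharp
using System;
using System.Collections.Generic;

public static class LayoutParser
{
    // Hypothetical helper: skips everything up to and including the
    // "Field Name" header line, then reads name/length pairs.
    public static Dictionary<string, int> ParseLayout(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, int>();
        bool headerSeen = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();

            if (!headerSeen)
            {
                // The header line itself is consumed here and never parsed.
                headerSeen = line.StartsWith("Field Name", StringComparison.Ordinal);
                continue;
            }

            // A null separator means "split on any white space".
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Blank lines, separator rows, and malformed rows are skipped.
            if (parts.Length == 2 && int.TryParse(parts[1], out int length))
                fields[parts[0]] = length;
        }

        return fields;
    }

    public static void Main()
    {
        var lines = new[] { "App Name", "Field Name   Length", "", "NAME\t100", "ADDR1   80" };
        foreach (var pair in ParseLayout(lines))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

For a file on disk, File.ReadLines(path) can be passed in place of the inline array.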