I have a text file whose format is like this
Number,Name,Age
I want to read the "Number" in the first column of this text file into an array to find duplicates. Here are the two ways I tried to read in the file:
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
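For the uniqueness test itself, a HashSet<string> is enough. Here is a minimal sketch, assuming the first field is compared as raw text. The names DuplicateFinder and FindDuplicateNumbers are made up for illustration and are not from the question:

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Returns every first-column value that occurs more than once,
    // in the order in which the first repeat was seen.
    public static List<string> FindDuplicateNumbers(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Everything left of the first comma (the whole line if there is no comma)
            string number = line.Split(',')[0].Trim();

            // HashSet.Add returns false when the value is already present
            if (!seen.Add(number) && !duplicates.Contains(number))
                duplicates.Add(number);
        }
        return duplicates;
    }
}
```

You would call it as DuplicateFinder.FindDuplicateNumbers(File.ReadLines(path)), which streams the file instead of loading it all at once.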
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.Split solution, here is a really simple way of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file in to an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them in to their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Length < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}
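The comments above suggest TryParse instead of Parse. A possible sketch of that, assuming the same Number,Name,Age layout; RecordParser and TryParseRecord are hypothetical names, and a malformed line makes the method return false rather than throw:

```csharp
using System;

public static class RecordParser
{
    // Tries to parse "Number,Name,Age"; returns false instead of throwing on bad input.
    public static bool TryParseRecord(string line, out int number, out string name, out int age)
    {
        number = 0;
        name = null;
        age = 0;

        if (string.IsNullOrWhiteSpace(line))
            return false;

        string[] fields = line.Split(',');
        if (fields.Length < 3)
            return false; // not enough columns

        name = fields[1].Trim();

        // Both numeric columns must parse for the record to count as valid
        return int.TryParse(fields[0].Trim(), out number)
            && int.TryParse(fields[2].Trim(), out age);
    }
}
```

Inside the loop you would then skip (or log) any line for which TryParseRecord returns false. Swapping the List<int> for a HashSet<int> would also make the "already imported" check cheaper on large files.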
Related
I have a big file and for the simplicity I am just showing a small part of it. The data looks like following:
NPSER NASER NQSER
10 5 3
TSSR MPSER JDNSR
15 10 6
What I need to do is to find for example NPSER and NASER and then assign the values NPSER as 10, NASER as 5 and NQSER as 3. For this small data set I could do as following:
TextReader infile = new StreamReader(fileName);
string line;
int NPSER, NASER, NQSER;
line = infile.ReadLine();
string[] words = line.Split('\t');
NPSER = Convert.ToInt32(words[0]);
NASER = Convert.ToInt32(words[1]);
NQSER = Convert.ToInt32(words[2]);
infile.Close();
Instead of reading each line and assigning values, I want to write a function that automatically fetches the right line when I search for up to three words in it, which would be easier and more efficient for a longer application.
I would appreciate other methods as well.
It would be easier if you can use LINQ:
var line = File.ReadLines("path")
.SkipWhile(l => !l.Contains("NPSER")) // change this condition to suit your needs
.Skip(1)
.First();
var values = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) // the sample data is tab-separated
.Select(int.Parse)
.ToArray();
int NPSER = values[0];
int NASER = values[1];
int NQSER = values[2];
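Since the question asks for a reusable function, the same idea could be wrapped up roughly like this. It is only a sketch: HeaderLookup and ReadValues are invented names, and it assumes that each header line is immediately followed by a line of integers in the same column order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderLookup
{
    private static readonly char[] Separators = { ' ', '\t' };

    // Finds the first header line that contains all requested names and
    // maps every name on that line to the integer directly below it.
    public static Dictionary<string, int> ReadValues(IEnumerable<string> lines, params string[] names)
    {
        string header = null;
        foreach (string line in lines)
        {
            if (header != null)
            {
                // This is the line right after the matching header
                string[] keys = header.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
                string[] values = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
                var result = new Dictionary<string, int>();
                for (int i = 0; i < keys.Length && i < values.Length; i++)
                    result[keys[i]] = int.Parse(values[i]);
                return result;
            }

            string[] words = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            if (names.All(n => words.Contains(n)))
                header = line; // remember it; the values are on the next line
        }
        throw new InvalidOperationException("No header line with all requested names, followed by a value line, was found.");
    }
}
```

A call such as HeaderLookup.ReadValues(File.ReadLines(fileName), "NPSER", "NASER") would then give you a dictionary you can index by name.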
I have a bunch of text files that have a custom format, looking like this:
App Name
Export Layout
Produced at 24/07/2011 09:53:21
Field Name Length
NAME 100
FULLNAME1 150
ADDR1 80
ADDR2 80
Any whitespaces may be tabs or spaces. The file may contain any number of field names and lengths.
I want to get all the field names and their corresponding field lengths and perhaps store them in a dictionary. This information will be used to process a corresponding fixed width data file having the mentioned field names and field lengths.
I know how to skip lines using ReadLine(). What I don't know is how to say: "When you reach the line that starts with 'Field Name', skip one more line, then starting from the next line, grab all the words on the left column and the numbers on the right column."
I have tried String.Trim() but that doesn't remove the whitespaces in between.
Thanks in advance.
You can use SkipWhile(l => !l.TrimStart().StartsWith("Field Name")).Skip(1):
Dictionary<string, string> allFieldLengths = File.ReadLines("path")
.SkipWhile(l => !l.TrimStart().StartsWith("Field Name")) // skips lines that don't start with "Field Name"
.Skip(1) // go to next line
.SkipWhile(l => string.IsNullOrWhiteSpace(l)) // skip following empty line(s)
.Select(l =>
{ // anonymous method to use "real code"
var line = l.Trim(); // remove spaces or tabs from start and end of line
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
return new { line, token }; // return an anonymous type holding the trimmed line and its tokens
})
.Where(x => x.token.Length == 2) // ignore all lines that do not have exactly two fields (invalid data)
.Select(x => new { FieldName = x.token[0], Length = x.token[1] })
.GroupBy(x => x.FieldName) // groups lines by FieldName, every group contains its Key + all anonymous types which belong to this group
.ToDictionary(xg => xg.Key, xg => string.Join(",", xg.Select(x => x.Length)));
line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) splits on spaces and tabs and drops the empty entries. GroupBy ensures that all keys in the dictionary are unique. In the case of duplicate field names the lengths are joined with commas.
Edit: since you have requested a non-LINQ version, here it is:
Dictionary<string, string> allFieldLengths = new Dictionary<string, string>();
bool headerFound = false;
bool dataFound = false;
foreach (string l in File.ReadLines("path"))
{
string line = l.Trim();
if (!headerFound && line.StartsWith("Field Name"))
{
headerFound = true;
// skip this line:
continue;
}
if (!headerFound)
continue;
if (!dataFound && line.Length > 0)
dataFound = true;
if (!dataFound)
continue;
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
if (token.Length != 2)
continue;
string fieldName = token[0];
string length = token[1];
string lengthInDict;
if (allFieldLengths.TryGetValue(fieldName, out lengthInDict))
// append this length
allFieldLengths[fieldName] = lengthInDict + "," + length;
else
allFieldLengths.Add(fieldName, length);
}
I like the LINQ version more because it's much more readable and maintainable (imo).
Based on the assumption that the position of the header line is fixed, we may consider actual key-value pairs to start from the 9th line. Then, using the ReadAllLines method to return a String array from the file, we just start processing from index 8 onwards:
string[] lines = File.ReadAllLines(filepath);
Dictionary<string,int> pairs = new Dictionary<string,int>();
for(int i=8;i<lines.Length;i++)
{
string[] pair = Regex.Replace(lines[i],"(\\s)+",";").Split(';');
pairs.Add(pair[0],int.Parse(pair[1]));
}
This is a skeleton, not accounting for exception handling, but I guess it should get you started.
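You can use String.StartsWith() to detect "Field Name". Then use String.Split() with a parameter of null to split by whitespace (add StringSplitOptions.RemoveEmptyEntries if the columns are padded with several spaces or tabs). This will get you your field name and length strings.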
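A rough sketch of that approach, assuming every data row is a name followed by an integer length. LayoutParser and ParseFieldLengths are illustrative names only. Rows that do not fit (blank lines, an underline row below the header) are simply skipped:

```csharp
using System;
using System.Collections.Generic;

public static class LayoutParser
{
    // Collects "name length" pairs that appear after the "Field Name" header line.
    public static Dictionary<string, int> ParseFieldLengths(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, int>();
        bool inTable = false;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (!inTable)
            {
                // Everything up to and including the header line is ignored
                inTable = line.StartsWith("Field Name", StringComparison.Ordinal);
                continue;
            }

            // A null separator array means "split on any whitespace"
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2 && int.TryParse(parts[1], out int length))
                fields[parts[0]] = length; // a repeated name overwrites the earlier length
        }
        return fields;
    }
}
```

Storing the length as an int rather than a string is a design choice here, made because the lengths are later used to cut up a fixed-width file.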
How can I remove a whole line from a text file if the first word matches to a variable I have?
What I'm currently trying is:
List<string> lineList = File.ReadAllLines(dir + "textFile.txt").ToList();
lineList = lineList.Where(x => x.IndexOf(user) <= 0).ToList();
File.WriteAllLines(dir + "textFile.txt", lineList.ToArray());
But I can't get it to remove.
The only mistake is that you are comparing the result of IndexOf with <= 0.
-1 is returned when the string does not contain the searched-for string.
<= 0 means either starts with or does not contain, so Where keeps exactly those lines.
== 0 means starts with. Those are the lines you want to drop, so keep the lines where IndexOf(user) != 0.
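To make the return values concrete: "bob 1".IndexOf("bob") is 0, "x bob".IndexOf("bob") is 2, and "alice".IndexOf("bob") is -1. A small sketch of the filter built on that, with LineFilter and RemoveLinesStartingWith as made-up names and an ordinal (case-sensitive) comparison assumed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    // Keeps every line that does NOT begin with the given text.
    public static List<string> RemoveLinesStartingWith(IEnumerable<string> lines, string user)
    {
        return lines
            // IndexOf == 0 means "starts with"; everything else survives the filter
            .Where(x => x.IndexOf(user, StringComparison.Ordinal) != 0)
            .ToList();
    }
}
```

The result can be passed straight to File.WriteAllLines.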
This method will read the file line-by-line instead of all at once. Also note that this implementation is case-sensitive.
It also assumes you aren't subjected to leading spaces.
using (var writer = new StreamWriter("temp.file"))
{
//here I only write back what doesn't match
foreach(var line in File.ReadLines("file").Where(x => !x.StartsWith(user)))
writer.WriteLine(line); // no double spacing: ReadLines strips the line terminators
}
File.Delete("file"); // File.Move throws if the destination already exists
File.Move("temp.file", "file");
You were pretty close, String.StartsWith handles that nicely:
// nb: for a case-SENSITIVE match, use StringComparison.Ordinal instead
File.WriteAllLines(
path,
File.ReadAllLines(path)
.Where(ll => !ll.StartsWith(user, StringComparison.OrdinalIgnoreCase))); // keep only the lines that do NOT match
For really large files that may not perform well. Instead:
// Write our new data to a temp file and read the old file On The Fly
var temp = Path.GetTempFileName();
try
{
File.WriteAllLines(
temp,
File.ReadLines(path)
.Where(
ll => !ll.StartsWith(user, StringComparison.OrdinalIgnoreCase)));
File.Copy(temp, path, true);
}
finally
{
File.Delete(temp);
}
Another issue noted was that both IndexOf and StartsWith will treat ABC and ABCDEF as matches if the user is ABC. A regular expression with a word boundary avoids that:
var matcher = new Regex(
@"^" + Regex.Escape(user) + @"\b", // <-- matches the first "word"
RegexOptions.IgnoreCase);
File.WriteAllLines(
path,
File.ReadAllLines(path)
.Where(ll => !matcher.IsMatch(ll))); // keep only the lines that do NOT match
Use `!= 0` instead of `<= 0`, so that only the lines starting with the user are dropped.
Hi, I know the title might sound a little confusing, but I'm reading in a text file with many lines of data.
Example
12345 Test
34567 Test2
I read in the text one line at a time and add each line to a list:
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line);
}
}
How do I then separate the 12345 from the Test so I can pull only the first column of data if I need it? For example, list(1).pars[1] would be 12345 and list(2).pars[2] would be Test2.
I know this sounds foggy, but I hope someone out there understands.
Maybe something like this:
string test="12345 Test";
var ls= test.Split(' ');
This will get you an array of strings. You can get them with ls[0] and ls[1].
If you just want the 12345 then ls[0] is the one to choose.
If you're ok with having a list of string[]'s you can simply do this:
var list = new List<string[]>();
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line.Split(' '));
}
}
string firstWord = list[0][0]; //12345
string secondWord = list[0][1]; //Test
When you have a string of text you can use the Split() method to split it in many parts. If you're sure every word (separated by one or more spaces) is a column you can simply write:
string[] columns = line.Split(' ');
There are several overloads of that function: you can specify whether blank fields are skipped (for example, columns[1] would be empty for a line of two words separated by two spaces). If you're sure about the number of columns you can also set a limit, so that any text after the last column is treated as a single field.
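As a quick illustration of those two overloads (SplitDemo and its method names are only for this sketch): "12345  Test" split with Split(' ') gives three entries because of the double space, while the variants below avoid that or stop after the first column.

```csharp
using System;

public static class SplitDemo
{
    private static readonly char[] Space = { ' ' };

    // Drops the empty strings produced by runs of spaces
    public static string[] SplitSkippingBlanks(string line)
    {
        return line.Split(Space, StringSplitOptions.RemoveEmptyEntries);
    }

    // Splits into at most two parts: the first column and the untouched remainder
    public static string[] SplitFirstColumn(string line)
    {
        return line.Split(Space, 2);
    }
}
```

For "12345 Test two", SplitFirstColumn returns "12345" and "Test two", which is handy when only the first column matters.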
In your case (add to the list only the first column) you may write:
if (String.IsNullOrWhiteSpace(line))
continue;
string[] columns = line.TrimStart().Split(new char[] { ' ' }, 2);
list.Add(columns[0]);
The first check skips empty lines and lines composed only of spaces. TrimStart() removes spaces from the beginning of the line (if any). The first column can't be empty (because of the TrimStart()), so you do not even need StringSplitOptions.RemoveEmptyEntries with an additional if (columns.Length > 1). Finally, if the file is small enough you can read it into memory with a single call to File.ReadAllLines() and simplify everything with a little LINQ:
list.AddRange(
File.ReadAllLines("test.txt")
.Where(x => !String.IsNullOrWhiteSpace(x))
.Select(x => x.TrimStart().Split(new char[] { ' ' }, 2)[0]));
Note that with the first parameter you can specify more than one valid separator.
When you have multiple spaces
Regex r = new Regex(" +");
string [] splitString = r.Split(stringWithMultipleSpaces);
var splitted = System.IO.File.ReadAllLines("Test.txt")
.Select(line => line.Split(' ')).ToArray();
var list1 = splitted.Select(split_line => split_line[0]).ToArray();
var list2 = splitted.Select(split_line => split_line[1]).ToArray();
I have a text file that contains a lot of data, each item on a new line,
but I want to extract the lines that start with values like this:
coordinates=(111,222,333)
There are several instances of this line, and I actually want to extract the part "111,222,333".
How can I do this?
Something like
var result = File.ReadAllLines(@"C:\test.txt")
.Where(p => p.StartsWith("coordinates=("))
.Select(p => p.Substring(13, p.IndexOf(')') - 13));
The first line is quite clear; the second line keeps only the lines that start with coordinates=(; the third line extracts the substring (13 is the length of coordinates=().
result is an IEnumerable<string>. You can convert it to an array with result.ToArray()
var text = File.ReadAllText(path);
var result = Regex.Matches(text, @"coordinates=\((\d+),(\d+),(\d+)\)")
.Cast<Match>()
.Select(x => new
{
X = x.Groups[1].Value,
Y = x.Groups[2].Value,
Z = x.Groups[3].Value
})
.ToArray();