Issue with empty columns in tab delimited text file - c#

I am trying to create a DataTable from a tab-delimited text file. I can read the values from the file easily. The problem is that when a column is empty in the text file, no empty cell is created in the DataTable; instead, the contents of the next non-empty column are shifted into the empty column's place.
Format of data in the text file:
    id  name  product  cost  company name
    1   abc   shoe           xxx
    2   xyz   chain          yyy
Data table obtained:
    id  name  product  cost  company name
    1   abc   shoe     xxx
    2   xyz   chain    yyy
My code to get the data:
    var reader = new StreamReader(File.OpenRead(@"d:\d.txt"));
    var table = new DataTable("SampleTable");
    string[] fieldValues = reader.ReadLine().Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < fieldValues.Length; i++)
    {
        table.Columns.Add(new DataColumn(fieldValues[i].ToString().Trim()));
    }
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine().Trim();
        var values = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
        string[] ee = values;
        var newRow = table.NewRow();
        for (int i = 0; i < ee.Length; i++)
        {
            newRow[i] = ee[i].Trim().ToString(); // like sample [0,0]
        }
        table.Rows.Add(newRow);
    }

You told Split to do exactly what you observe by setting the option StringSplitOptions.RemoveEmptyEntries - it removes empty entries.
Remove this option and it will keep the empty column.
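As a rough sketch of that change (not the asker's exact code), the loader below splits each line on tabs without RemoveEmptyEntries, so an empty cell stays in its own position. The class name TabFileLoader and the choice to read from a TextReader are assumptions made for illustration. It also avoids calling Trim() on the whole line, since that would strip leading and trailing tabs and drop empty first or last columns.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class TabFileLoader
{
    // Builds a DataTable from tab-delimited text, keeping empty cells in place.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable("SampleTable");

        // The first line holds the column names.
        string header = reader.ReadLine();
        if (header == null)
        {
            return table;
        }
        foreach (string name in header.Split('\t'))
        {
            table.Columns.Add(name.Trim());
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip completely blank lines
            }

            // No RemoveEmptyEntries: consecutive tabs yield an empty string.
            string[] values = line.Split('\t');
            DataRow row = table.NewRow();
            for (int i = 0; i < values.Length && i < table.Columns.Count; i++)
            {
                row[i] = values[i].Trim(); // trim each cell, not the whole line
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

With a File-based reader this could be called as TabFileLoader.Load(new StreamReader(path)), where path is whatever file you are reading.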

The problem might be that you split the line with the RemoveEmptyEntries option:
var values = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
The empty cell gets removed. Omit this parameter and it should work...

If you have an empty column, then you should read an empty string, not a null string.
In other words:
    1,abc,shoe,,xxx
The reason you're getting the result you're getting is this: StringSplitOptions.RemoveEmptyEntries

Related

Removing copy by reference in CSV

I have a CSV file, and I am trying to replace the 1st column's value with the 2nd column's value using String.Replace in C#. This works fine. But when I then try to replace the 2nd column with the 6th column's value, the 1st column's value is affected as well. Why?
    string[] lines = File.ReadAllLines(file);
    for (int i = 1; i < lines.Length; i++)
    {
        if (lines[i].Split(',')[1].Contains('.'))
        {
            lines[i] = lines[i].Replace(lines[i].Split(',')[0], lines[i].Split(',')[1]);
            lines[i] = lines[i].Replace(lines[i].Split(',')[1], lines[i].Split(',')[6]);
        }
    }
    File.WriteAllLines(file, lines);

There is a misunderstanding of what lines[i].Replace does. If you click on it and press F12 or F1 you will see that it's actually String.Replace(String, String).
From the documentation: "Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string."
In your code you want to move the value from column 2 to column 1, and so on, not replace all occurrences of column 1's value.
    for (int i = 1; i < lines.Length; i++)
    {
        // your if here.
        var columns = lines[i].Split(',');
        columns[0] = columns[1];
        columns[1] = columns[5];
        lines[i] = string.Join(",", columns);
    }
In the live demo, I removed the if and emulated the file read and write with a simple string[] and a string.
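To make the difference concrete, here is a small sketch that runs both approaches on a single line. The class name ColumnMove and the sample line are made up for illustration; WithReplace mirrors the question's two Replace calls (including its index 6), and WithColumns mirrors the answer's array assignment (index 5, the 6th column).

```csharp
using System;

// Hypothetical demo class; names and sample data are illustrative only.
public static class ColumnMove
{
    // Mirrors the question: String.Replace rewrites every occurrence
    // of the old value anywhere in the line, not just one column.
    public static string WithReplace(string line)
    {
        line = line.Replace(line.Split(',')[0], line.Split(',')[1]);
        line = line.Replace(line.Split(',')[1], line.Split(',')[6]);
        return line;
    }

    // Mirrors the answer: assign by column index, then join again.
    public static string WithColumns(string line)
    {
        string[] columns = line.Split(',');
        columns[0] = columns[1];
        columns[1] = columns[5];
        return string.Join(",", columns);
    }
}
```

For the input "10,1.5,c,d,e,f,2.5", the first Replace turns column 1 into "1.5", so the second Replace then rewrites both columns 1 and 2, giving "2.5,2.5,c,d,e,f,2.5". The index-based version gives "1.5,f,c,d,e,f,2.5", leaving column 1 with the value that was moved into it.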

Reading data from a .txt file and adding it to my Database

I've been working on a school assignment for a while and I have noticed I need a little help finishing it up. Specifically, I need help converting/importing a FlatFile.txt into my normalized DataBase in .
My flatfile has many rows, and in each row there are many attributes separated by a ' | '.
How can I add each line and take every element in the line to its unique corresponding attributes in my DataBase (assuming I already have my connection with my Database established)?
I know I might need to use something like:
    hint*: comm.Parameters.AddWithValue("@FirstName", firstname);
How can I make the first element of the .txt file equal to firstname?
I hope this made sense.
Here is what I have so far:
    string[] lines = File.ReadAllLines(pathString);
    string[][] myRecords = new string[lines.Count() + 1][];
    int k = 1;
    foreach (var line in lines)
    {
        var values = line.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 1; i < values.Length; i++)
        {
            if (myRecords[k] == null)
            {
                myRecords[k] = new string[values.Length + 1];
            }
            myRecords[k][i] = values[i];
            Console.WriteLine(myRecords[k][i]);
        }
        k++;
    }

You should be able to add each item from a line to the corresponding name in the database by using its position in the line; something like:
    comm.Parameters.AddWithValue("@FirstName", values[i]);
(Note: The exact syntax needed here will depend on what you use for this, but I'm assuming you'll be able to figure that part out).
...However, for something like that to work, you'll want to change the code parsing the line and remove StringSplitOptions.RemoveEmptyEntries, since that will affect the resulting number of data parts from each line.
An example:
    var line = "one|two|||five|six";
    // Will produce: {"one", "two", "five", "six"}
    var nonEmptyValues = line.Split(new char[] { '|' },
        StringSplitOptions.RemoveEmptyEntries);
    // Will produce: {"one", "two", "", "", "five", "six"} (empty strings, not null)
    var allValues = line.Split(new char[] { '|' });
The latter will allow your resulting array (values in your code) to always contain the correct number of "columns", which should make mapping from positions to DB column names easier.
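One possible way to do that positional mapping is sketched below. The class name PipeRecord and the parameter names passed in are assumptions, not part of the question's schema; the method only pairs each position with a name and leaves the database call to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the parameter names are supplied by the caller.
public static class PipeRecord
{
    // Pairs each '|'-separated value with the parameter name at the same position.
    public static Dictionary<string, string> Map(string line, string[] parameterNames)
    {
        // Keep empty entries so positions stay aligned with the names.
        string[] values = line.Split('|');
        if (values.Length != parameterNames.Length)
        {
            throw new FormatException(
                "Expected " + parameterNames.Length + " fields but found " + values.Length + ".");
        }

        var result = new Dictionary<string, string>();
        for (int i = 0; i < values.Length; i++)
        {
            result[parameterNames[i]] = values[i].Trim();
        }
        return result;
    }
}
```

Each resulting pair could then be handed to comm.Parameters.AddWithValue(pair.Key, pair.Value) inside a loop over the dictionary, assuming comm is the command object from the question.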

Read text file with fixed columns in C#

Is there any way to read text files with fixed columns in C# without using regex and Substring?
I want to read a file with fixed columns and transfer the columns to an Excel file (.xlsx).
example 1
POPULACAO
MUNICIPIO UF CENSO 2010
AC 78.507
AC 15.100
Rio Branco AC 336.038
Sena Madureira AC 38.029
example 2
POPULACAO
MUNICIPIO UF CENSO 2010
AC 78.507
Epitaciolândia AC 15.100
Rio Branco AC 336.038
Sena Madureira AC 38.029
Keep in mind that I have a case, as in the second example, where a column is blank. I can get the columns and the values using regex and/or Substring, but when a line looks like the one in example 2, the regex ignores that line of the file, and so does Substring.

Assuming you mean "fixed columns" extremely literally, so that every non-terminal column is exactly the same width and each column is separated by exactly one space, then yes, you can get away with using neither regex nor Substring. If that's the case (and bear in mind that it also implies every single person in the database has a name that's exactly four letters long), then you can just read the file in by lines. Id would be line[0].ToString(), name would be new string(new char[] { line[2], line[3], line[4], line[5] }), etc.
Or, for any given value:
    var str = new StringBuilder();
    for (int i = firstIndex; i < lastIndex; i++)
    {
        str.Append(line[i]);
    }
But this is basically just performing the exact function of Substring. Substring isn't your problem; handling empty values in the first (city) column is. So, for any given line, you need to check whether the city value is empty:
    foreach (var line in yourLines)
    {
        // Substring takes a start index and a length, not an end index
        if (string.IsNullOrWhiteSpace(line.Substring(cityStartIndex, cityLength)))
        {
            continue;
        }
    }
Alternatively, if you're sure the city name will always be at the very first index of the line:
    foreach (var line in yourLines)
    {
        if (line[0] == ' ') { continue; }
    }
And if the value you got from the city cell was valid, you'd store that value and continue on to using Substring with the indices of the rest of the values in the row.
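As a sketch of Substring-based slicing that tolerates a blank column, the helper below cuts a line into fields from a list of column widths. The class name FixedWidth and the widths used in any call are assumptions; the real widths have to be taken from the actual file layout.

```csharp
using System;

// Hypothetical helper; column widths are supplied by the caller.
public static class FixedWidth
{
    // Returns one column's text, trimmed; a short line yields what is available.
    public static string Field(string line, int start, int length)
    {
        if (start >= line.Length)
        {
            return string.Empty;
        }
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }

    // Slices a line into consecutive columns of the given widths.
    public static string[] Parse(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int position = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            fields[i] = Field(line, position, widths[i]);
            position += widths[i];
        }
        return fields;
    }
}
```

Because every field is taken by position, a blank city column simply comes back as an empty string and the remaining columns stay where they belong. The caller can then decide whether to skip such a row or keep it.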

If for whatever reason you don't want to use a regular expression or Substring(), you have a couple of other options:
String.Split, e.g. var columns = line.Split(' ');
String.Chars, using the known widths of each column to build your output;

Why not just use string.Split()?
Something like:
    using (StreamReader stream = new StreamReader(file)) {
        while (!stream.EndOfStream) {
            string line = stream.ReadLine();
            if (string.IsNullOrWhiteSpace(line))
                continue;
            // A null separator array splits on any whitespace
            string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            int ID = -1, age = -1;
            string name = null, training = null;
            ID = int.Parse(fields[0]);
            if (fields.Length > 1)
                name = fields[1];
            if (fields.Length > 2)
                age = int.Parse(fields[2]);
            if (fields.Length > 3)
                training = fields[3];
            // do stuff
        }
    }
The only downside to this is that it will allow fields of arbitrary length, and spaces inside a field will break that field apart.
As for regular expressions being ignored in the last case, try something like:
    Match m = Regex.Match(line, @"^(.{2}) (.{4}) (.{2})( +.+?)?$");

First, define a variable for each column in the file. Then go through the file line by line and assign each column to the correct variable. Substitute the correct start positions and lengths. This should be enough information to get you started parsing your file.
    // fields of the containing class
    private string id;
    private string name;
    private string age;
    private string training;

    // inside the method that reads the file
    string line;
    while ((line = file.ReadLine()) != null)
    {
        // placeholder positions: Substring(startIndex, length)
        id = line.Substring(0, 3);
        name = line.Substring(3, 10);
        age = line.Substring(13, 2);
        training = line.Substring(15, 10);
        ...
        if (string.IsNullOrWhiteSpace(name))
        {
            // ignore this line if the name is blank
        }
        else
        {
            // do something useful
        }
        counter++;
    }

Parsing a text file with a custom format in C#

I have a bunch of text files that have a custom format, looking like this:
App Name
Export Layout
Produced at 24/07/2011 09:53:21
Field Name Length
NAME 100
FULLNAME1 150
ADDR1 80
ADDR2 80
Any whitespaces may be tabs or spaces. The file may contain any number of field names and lengths.
I want to get all the field names and their corresponding field lengths and perhaps store them in a dictionary. This information will be used to process a corresponding fixed width data file having the mentioned field names and field lengths.
I know how to skip lines using ReadLine(). What I don't know is how to say: "When you reach the line that starts with 'Field Name', skip one more line, then starting from the next line, grab all the words on the left column and the numbers on the right column."
I have tried String.Trim() but that doesn't remove the whitespaces in between.
Thanks in advance.

You can use SkipWhile(l => !l.TrimStart().StartsWith("Field Name")).Skip(1):
    Dictionary<string, string> allFieldLengths = File.ReadLines("path")
        .SkipWhile(l => !l.TrimStart().StartsWith("Field Name")) // skips lines that don't start with "Field Name"
        .Skip(1) // go to next line
        .SkipWhile(l => string.IsNullOrWhiteSpace(l)) // skip following empty line(s)
        .Select(l =>
        {   // anonymous method to use "real code"
            var line = l.Trim(); // remove spaces or tabs from start and end of line
            string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            return new { line, token }; // return an anonymous type holding the line and its tokens
        })
        .Where(x => x.token.Length == 2) // ignore all lines that don't have exactly two fields (invalid data)
        .Select(x => new { FieldName = x.token[0], Length = x.token[1] })
        .GroupBy(x => x.FieldName) // groups lines by FieldName, every group contains its Key + all anonymous types which belong to this group
        .ToDictionary(xg => xg.Key, xg => string.Join(",", xg.Select(x => x.Length)));
line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) splits on spaces and tabs and drops all empty entries. Use GroupBy to ensure that all keys are unique in the dictionary. In the case of duplicate field names, the lengths are joined with a comma.
Edit: since you have requested a non-LINQ version, here it is:
    Dictionary<string, string> allFieldLengths = new Dictionary<string, string>();
    bool headerFound = false;
    bool dataFound = false;
    foreach (string l in File.ReadLines("path"))
    {
        string line = l.Trim();
        if (!headerFound && line.StartsWith("Field Name"))
        {
            headerFound = true;
            // skip this line:
            continue;
        }
        if (!headerFound)
            continue;
        if (!dataFound && line.Length > 0)
            dataFound = true;
        if (!dataFound)
            continue;
        // split on spaces and tabs, as in the LINQ version
        string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (token.Length != 2)
            continue;
        string fieldName = token[0];
        string length = token[1];
        string lengthInDict;
        if (allFieldLengths.TryGetValue(fieldName, out lengthInDict))
            // append this length
            allFieldLengths[fieldName] = lengthInDict + "," + length;
        else
            allFieldLengths.Add(fieldName, length);
    }
I like the LINQ version more because it's much more readable and maintainable (imo).

Based on the assumption that the position of the header line is fixed, we may consider actual key-value pairs to start from the 9th line. Then, using the ReadAllLines method to return a String array from the file, we just start processing from index 8 onwards:
    string[] lines = File.ReadAllLines(filepath);
    Dictionary<string,int> pairs = new Dictionary<string,int>();
    for (int i = 8; i < lines.Length; i++)
    {
        string[] pair = Regex.Replace(lines[i], "(\\s)+", ";").Split(';');
        pairs.Add(pair[0], int.Parse(pair[1]));
    }
This is a skeleton, not accounting for exception handling, but I guess it should get you started.

You can use String.StartsWith() to detect "FieldName". Then String.Split() with a parameter of null to split by whitespace. This will get you your fieldname and length strings.
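A minimal sketch of that idea follows. The class name LayoutParser is made up, and the choice to silently skip any line that is not a name followed by an integer (which also covers a separator or blank line after the header) is an assumption about how lenient the parsing should be.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class LayoutParser
{
    // Collects "name length" pairs that appear after the "Field Name" header line.
    public static Dictionary<string, int> ParseFieldLengths(IEnumerable<string> lines)
    {
        var lengths = new Dictionary<string, int>();
        bool headerSeen = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (!headerSeen)
            {
                // Everything up to and including the header line is skipped.
                headerSeen = line.StartsWith("Field Name", StringComparison.Ordinal);
                continue;
            }

            // A null separator array splits on any whitespace (spaces or tabs).
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            int length;
            if (parts.Length == 2 && int.TryParse(parts[1], out length))
            {
                lengths[parts[0]] = length; // a repeated name keeps the last length
            }
        }
        return lengths;
    }
}
```

It could be called with File.ReadLines(path) as the input sequence, where path is the layout file to parse.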

split a string from a text file into another list

Hi, I know the title might sound a little confusing, but I'm reading in a text file with many lines of data.
Example
12345 Test
34567 Test2
I read in the text one line at a time and add each line to a list:
    using (StreamReader reader = new StreamReader("Test.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            list.Add(line);
        }
    }
How do I then separate the 12345 from the Test so I can pull only the first column of data if I need to? For example, list(1).pars[1] would be 12345 and list(2).pars[2] would be Test2.
I know this sounds foggy, but I hope someone out there understands.

Maybe something like this:
    string test = "12345 Test";
    var ls = test.Split(' ');
This will get you an array of strings. You can get them with ls[0] and ls[1].
If you just want the 12345, then ls[0] is the one to choose.

If you're ok with having a list of string[]'s you can simply do this:
    var list = new List<string[]>();
    using (StreamReader reader = new StreamReader("Test.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            list.Add(line.Split(' '));
        }
    }
    string firstWord = list[0][0]; //12345
    string secondWord = list[0][1]; //Test

When you have a string of text you can use the Split() method to split it into many parts. If you're sure every word (separated by one or more spaces) is a column you can simply write:
    string[] columns = line.Split(' ');
There are several overloads of that function. You can specify whether blank fields are skipped (for example, columns[1] would be empty in a line composed of 2 words separated by two spaces). If you're sure about the number of columns you can also set that limit (so that any text after the last column is treated as a single field).
In your case (add only the first column to the list) you may write:
    if (String.IsNullOrWhiteSpace(line))
        continue;
    string[] columns = line.TrimStart().Split(new char[] { ' ' }, 2);
    list.Add(columns[0]);
The first check skips empty lines and lines composed only of spaces. TrimStart() removes spaces from the beginning of the line (if any). The first column can't be empty (because of the TrimStart()), so you do not even need StringSplitOptions.RemoveEmptyEntries with an additional if (columns.Length > 1). Finally, if the file is small enough you can read it into memory with a single call to File.ReadAllLines() and simplify everything with a little LINQ:
    list.AddRange(
        File.ReadAllLines("test.txt")
            .Where(x => !String.IsNullOrWhiteSpace(x))
            .Select(x => x.TrimStart().Split(new char[] { ' ' }, 2)[0]));
Note that with the first parameter you can specify more than one valid separator.

When you have multiple spaces
Regex r = new Regex(" +");
string [] splitString = r.Split(stringWithMultipleSpaces);
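For comparison, the sketch below puts three ways of splitting a line with repeated spaces side by side. The class name SpaceSplit and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class SpaceSplit
{
    // Plain split: two consecutive spaces produce an empty entry between them.
    public static string[] Plain(string s)
    {
        return s.Split(' ');
    }

    // Same separator, but empty entries are dropped.
    public static string[] SkipEmpty(string s)
    {
        return s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex split: a run of one or more spaces counts as a single separator.
    public static string[] WithRegex(string s)
    {
        return Regex.Split(s, " +");
    }
}
```

For "12345  Test" (two spaces), Plain returns three entries with an empty string in the middle, while SkipEmpty and WithRegex both return just "12345" and "Test". Dropping empty entries is fine here because the columns are whitespace-separated words; it would not be suitable for delimited files where an empty cell carries meaning.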

var splitted = System.IO.File.ReadAllLines("Test.txt")
.Select(line => line.Split(' ')).ToArray();
var list1 = splitted.Select(split_line => split_line[0]).ToArray();
var list2 = splitted.Select(split_line => split_line[1]).ToArray();
