Reading text data from tab delimited file without headers in C# - c#

I want to read text data from a .txt file in which the data is tab separated.
Following is the sample data:
How can I read the data into a single List of type string, without the headers?
I currently have the following code:
var sepList = new List<string>();
// Read the file and display it line by line.
using (var file = new StreamReader(docPath))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        var delimiters = '\t';
        var segments = line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        foreach (var segment in segments)
        {
            //Console.WriteLine(segment);
            sepList.Add(segment);
        }
    }
}

You can get your result with a single line.
var data = File.ReadLines(docPath)
    .Skip(1)
    .Select(x => x.Split(new char[] {'\t'}, StringSplitOptions.RemoveEmptyEntries))
    .SelectMany(k => k);
First, File.ReadLines enumerates the lines of the file lazily, producing a sequence that we can feed to the following operators. Skip(1) then drops the first line (the header). Split is applied to each remaining line, yielding an array of values per line, and SelectMany flattens those arrays into the single IEnumerable<string> assigned to the data variable. Calling ToList on the result materializes it into the List<string> you asked for.
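For completeness, here is a minimal sketch of the same pipeline wrapped in a method that actually returns the List<string>. The names TabFileReader and ReadFields are made up for illustration, and it assumes the first line of the file is the header.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TabFileReader   // hypothetical helper, not part of any library
{
    // Reads a tab-delimited file, skips the first (header) line, and returns
    // every non-empty field of the remaining lines in one flat list.
    public static List<string> ReadFields(string path)
    {
        return File.ReadLines(path)                       // lazy, one line at a time
            .Skip(1)                                      // drop the header line
            .SelectMany(line => line.Split(new[] { '\t' },
                StringSplitOptions.RemoveEmptyEntries))   // split and flatten
            .ToList();                                    // materialize the sequence
    }
}
```

Usage would be something like var sepList = TabFileReader.ReadFields(docPath);. Note that RemoveEmptyEntries silently drops empty cells, so the values can no longer be mapped back to columns by position.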

Related

C# List: Add double quotes when the field is string with LinQ [duplicate]

I don't think my question is a duplicate, because I am not asking how to convert lists into CSVs. But:
I am trying to convert a list into a comma-delimited CSV file.
However, some fields contain commas and semicolons.
A column will be split into at least two columns when there is a comma.
My code:
public void SaveToCsv<T>(List<T> listToBeConverted)
{
    var lines = new List<string>();
    IEnumerable<PropertyDescriptor> props = TypeDescriptor.GetProperties(typeof(T)).OfType<PropertyDescriptor>();
    //Get headers
    var header = string.Join(",", props.ToList().Select(x => x.Name));
    //Add all headers
    lines.Add(header);
    //LINQ to get all row data and add commas to separate them
    var valueLines = listToBeConverted.Select(row => string.Join(",", header.Split(',').Select(a => row.GetType().GetProperty(a).GetValue(row, null))));
    //add row data to List
    lines.AddRange(valueLines);
    ...
}
How do I modify the LINQ statement to add double quotes to the start and the end of a value when its type is System.String?
Use the property info for this purpose: check each property's PropertyType and wrap the value in quotes when it is a string.
void SaveToCsv<T>(List<T> listToBeConverted)
{
    var lines = new List<string>();
    IEnumerable<PropertyDescriptor> props = TypeDescriptor.GetProperties(typeof(T)).OfType<PropertyDescriptor>();
    //Get headers
    var header = string.Join(",", props.ToList().Select(x => x.Name));
    //Add all headers
    lines.Add(header);
    //LINQ to get all row data and add commas to separate them
    var valueLines = listToBeConverted.Select(row => string.Join(",", props.Select(x =>
    {
        var property = row.GetType().GetProperty(x.Name);
        //Wrap string values in double quotes; leave other types as they are
        if (property.PropertyType == typeof(string))
            return $"\"{property.GetValue(row, null)}\"";
        return property.GetValue(row, null);
    })));
    //add row data to List
    lines.AddRange(valueLines);
    ...
}
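One thing the snippet above does not handle is a string value that itself contains a double quote. The usual CSV convention (RFC 4180) is to double the embedded quote. Below is a rough sketch of that idea; CsvSketch, Quote and ToCsvLines are illustrative names, and writing the lines to a file is left out.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;

public static class CsvSketch   // hypothetical helper for illustration
{
    // Wraps a string in double quotes and doubles any embedded quote, so that
    // commas, semicolons and quotes all stay inside a single field.
    public static string Quote(string value)
    {
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Builds a header line plus one line per item; only string properties are quoted.
    public static List<string> ToCsvLines<T>(IEnumerable<T> items)
    {
        var props = TypeDescriptor.GetProperties(typeof(T))
            .OfType<PropertyDescriptor>()
            .ToList();

        var lines = new List<string> { string.Join(",", props.Select(p => p.Name)) };

        lines.AddRange(items.Select(item => string.Join(",", props.Select(p =>
        {
            object value = p.GetValue(item);
            return p.PropertyType == typeof(string)
                ? Quote((string)value)
                : Convert.ToString(value);   // non-strings are written unquoted
        }))));

        return lines;
    }
}
```

Non-string values are formatted with the current culture here, so a decimal comma could still break a column; quoting every field would be the more defensive choice.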
CSV stands for Comma-Separated Values, but some programs (Excel, for example) split columns on your system's default list separator rather than always on a comma.
In that case, use the system separator for columns and a line break for rows.
You can find your system default this way:
Open Control Panel
Open Clock and Region
Click on Additional Settings
There you can see and change the system's default separator

How can I read a text file with StreamReader selectively?

I am trying to use a loop to collect all the elements in a text file and select out of those elements specific ones to display.
{
string lines = File.ReadLines(path).Where(line => line.StartsWith("Name: ")).ToString();
foreach (string line in lines)
{
MessageList.Items.Add(lines);
}
}
The idea with this code is for the file stream to parse the entire document and only select the lines that start with Name:, ignoring all other ones so I can add the
I can't seem to get around the syntax error within the condition of the foreach loop. It says I'm trying to convert between char and string and the compiler is confused by my request. I've tried doing this with and without invoking ToString(), I've also tried it by declaring lines as a var instead of a string. I tried to do this without the lambda expression
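lines is not a string. Without the ToString() call, the query is of type IEnumerable<string>. Calling ToString() on it gives you one single string (not the lines of the file), and a foreach over a string yields char values, which is why the compiler complains about converting between char and string.
If you use var instead, the compiler will figure out the type for you. If you hover your mouse over the lines or line variable in your IDE, it will let you know the type.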
var lines = File.ReadLines(path).Where(line => line.StartsWith("Name: "));
foreach (var line in lines)
{
    MessageList.Items.Add(line);
}
If you want to be explicit about the type, this would be the code.
IEnumerable<string> lines = File.ReadLines(path).Where(line =>
    line.StartsWith("Name: "));
foreach (string line in lines)
{
    MessageList.Items.Add(line);
}
About var - https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/var
About File.ReadLines, including the return type - https://learn.microsoft.com/en-us/dotnet/api/system.io.file.readlines?view=net-6.0
You could also use:
File.ReadLines(path).Where(line => line.StartsWith("Name: ")).ToList()
    .ForEach(line => { MessageList.Items.Add(line); });

Sorting files in a directory by date in C# using Directory.GetFiles()

At the moment I have my code to get some files from a Dir.
foreach (var file in
Directory.GetFiles(MainForm.DIRECTORY_PATH, "*.csv"))
{
//Process File
string[] values = File.ReadAllLines(file)
.SelectMany(lineRead => lineRead.Split(',')
.Select(s => s.Trim()))
.ToArray();
I want to be able to order these files by date before I start reading and processing them.
I looked at a suggestion on MSDN to use DirectoryInfo:
DirectoryInfo DirInfo = new DirectoryInfo(MainForm.DIRECTORY_PATH);
var filesInOrder = from f in DirInfo.EnumerateFiles()
orderby f.CreationTime
select f;
foreach (var item in filesInOrder)
{
//Process File
string[] values = File.ReadAllLines(item )
.SelectMany(lineRead => lineRead.Split(',')
.Select(s => s.Trim()))
.ToArray();
}
This doesn't work, however: System.IO.File.ReadAllLines(item) gets a red underline, with an error that seems to say item is a string and not an actual file. :(
Does anyone know a solution to this or has had a similar issue? :)
Regards
J.
From MSDN: File.ReadAllLines(string path) takes a file path as input.
Opens a text file, reads all lines of the file, and then closes the file.
Here item is a FileInfo, not a string, so you have to pass its file path:
string[] values = File.ReadAllLines(item.FullName)
Your code:
foreach (var item in filesInOrder)
{
string[] values = File.ReadAllLines(item.FullName)
...............................
...............................
}
You can replace the whole block with the following chain of lambda expressions (values then holds one string[] per file):
var values = DirInfo.EnumerateFiles().OrderBy(f => f.CreationTime)
    .Select(x => File.ReadAllLines(x.FullName)
        .SelectMany(lineRead => lineRead.Split(',')
            .Select(s => s.Trim())).ToArray()
    );
Your first code snippet reads all lines in one file, whereas the second one reads from all files in the directory, so it is not very clear what you want to do.
The second code snippet cannot work, because the variable values is declared inside the loop. Its scope is limited to the code block of the loop, so the result will never be visible outside of the loop.
var filesInOrder = from f in DirInfo.EnumerateFiles() ...;
var items = new List<string>();
foreach (FileInfo f in filesInOrder) {
    using (StreamReader sr = f.OpenText()) {
        while (!sr.EndOfStream) {
            items.AddRange(sr.ReadLine().Split(','));
        }
    }
}
Here I define a List<string> before the loop that will hold all the items of all files. We need two loops: one that loops over the files (foreach) and one that reads the lines in each file and successively adds items to the list (while).
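Putting the pieces together, a self-contained version of the two loops might look like the sketch below. CsvDirectoryReader and ReadAllValues are names I made up, and I am assuming that only *.csv files are wanted and that "date" means CreationTime, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvDirectoryReader   // hypothetical helper for illustration
{
    // Reads every *.csv file in a directory, oldest first, and returns all
    // comma-separated values (trimmed) in one list.
    public static List<string> ReadAllValues(string directoryPath)
    {
        var items = new List<string>();
        var filesInOrder = new DirectoryInfo(directoryPath)
            .EnumerateFiles("*.csv")
            .OrderBy(f => f.CreationTime);

        foreach (FileInfo file in filesInOrder)
        {
            // File.ReadLines expects a path string, hence FullName.
            foreach (string line in File.ReadLines(file.FullName))
            {
                items.AddRange(line.Split(',').Select(s => s.Trim()));
            }
        }
        return items;
    }
}
```

If the files should be ordered by when they were last changed rather than created, swap CreationTime for LastWriteTime.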

c# Compare 2 CSV files and delete if it exists in second file

Basically I want to delete a row from List.csv if it exists in ListToDelete.csv, and output the results to a different file named newList.csv.
List.csv
1,A,V
2,B,W
3,C,X
4,D,Y
5,E,z
ListToDelete.csv
3
4
NewList.csv
1,A,V
2,B,W
5,E,z
I understand how to use StreamReader and StreamWriter to read and write files, but I can't see how to store only the first column of List.csv to compare it to the first column of ListToDelete.csv.
I initially stripped out everything in the first column using the Split method to do the comparison, but I also need to copy over the other two columns, and I can't see how to compare or loop through it correctly.
string list = "List.txt";
string listDelete = "ListToDelete.txt";
string newList = "newList.txt";
//Two calls to store all the text in string arrays so we can match the arrays. Using ReadAllLines instead of StreamReader so it populates the arrays automatically
var array1 = File.ReadAllLines(list);
var array2 = File.ReadAllLines(listDelete);
// Sets all the first columns from the CSV into an array
var firstcolumn = array1.Select(x => x.Split(',')[0]).ToArray();
//Matches whats in firstcolumn and array 2 to find duplicates and non duplicates
var duplicates = Array.FindAll(firstcolumn, line => Array.Exists(array2, line2 => line2 == line));
var noduplicates = Array.FindAll(firstcolumn, line => !Array.Exists(duplicates, line2 => line2 == line));
//Writes all the non duplicates to a different file
File.WriteAllLines(newList, noduplicates);
So that above code produces
1
2
5
But I also need the second and third columns to be written to the new file, so that it looks like
NewList.csv
1,A,V
2,B,W
5,E,z
You almost had it right. The problem is that noduplicates is selected from firstcolumn, which is only the first column {1,2,3,4,5}. noduplicates should be selected from the original list (array1), excluding the lines that start with one of the duplicates.
Correcting that single line as follows should fix the problem. The output then has 3 rows, and each row has 3 columns.
var noduplicates = Array.FindAll(array1, line => !Array.Exists(duplicates, line2 => line.StartsWith(line2)));
Furthermore, you don't need to parse the first column from the original array for matching. The code can be cleaned up like this:
string list = "List.csv";
string listDelete = "ListToDelete.csv";
string newList = "newList.txt";
var array1 = File.ReadAllLines(list);
var array2 = File.ReadAllLines(listDelete);
var noduplicates = Array.FindAll(array1, line => !Array.Exists(array2, line2 => line.StartsWith(line2)));
//Writes all the non duplicates to a different file
File.WriteAllLines(newList, noduplicates);
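One caveat with StartsWith: it is a prefix match, so an id of 1 in ListToDelete.csv would also remove a row that starts with 10, and an empty line in that file would match every row. If that matters for your data, a sketch that compares the first column exactly could look like this (CsvRowFilter and RemoveRows are made-up names, and it assumes the key is always the first comma-separated field):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvRowFilter   // hypothetical helper for illustration
{
    // Returns the rows whose first column is not listed in idsToDelete.
    // Keys are compared exactly, so "1" does not remove a row starting with "10".
    public static string[] RemoveRows(IEnumerable<string> rows, IEnumerable<string> idsToDelete)
    {
        // Blank lines in the delete list are ignored.
        var ids = new HashSet<string>(
            idsToDelete.Select(id => id.Trim()).Where(id => id.Length > 0));

        return rows
            .Where(row => !ids.Contains(row.Split(',')[0].Trim()))
            .ToArray();
    }
}
```

It would be called as File.WriteAllLines(newList, CsvRowFilter.RemoveRows(File.ReadLines(list), File.ReadLines(listDelete)));. The HashSet also keeps each lookup cheap when the delete list is long.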

Is there a better method of calling a comparison over a list of objects in C#?

I am reading in lines from a large text file. Among these lines are occasional strings from a preset list of possibilities, and I wish to check the line currently being read for a match to any of the strings in that list. If there is a match, I simply want to append the line to a different list and continue the loop I am using to read the file.
I was just wondering if there is a more efficient way to do a line.Contains() or equivalence check against, say, the first element in the list, then the second, etc., without using a nested loop or a long if statement filled with "or"s.
Example of what I have now:
List<string> possible = new List<string> { "Cat", "Dog" };
using (StreamReader sr = new StreamReader(someFile))
{
    string aLine;
    while ((aLine = sr.ReadLine()) != null)
    {
        if (...)
        {
            foreach (string element in possible)
            {
                if (aLine.Contains(element))
                {
                    // add to some other list
                    continue;
                }
            }
            // other stuff
        }
    }
}
I don't know about more efficient run-time wise, but you can eliminate a lot of code by using LINQ:
otherList.AddRange(File.ReadAllLines(someFile)
    .Where(line => possible.Any(p => line.Contains(p))));
I guess you are looking for:
if(possible.Any(r=> line.Contains(r)))
{
}
You can separate your work to Get Data and then Analyse Data. You don't have to do it in the same loop.
After reading the lines, there are many ways to filter them. The most readable and maintainable, IMO, is to use LINQ.
You can change your code to this:
// get lines
var lines = File.ReadLines("someFile");
// what I am looking for
var clues = new List<string> { "Cat", "Dog" };
// filter 1. Are there clues? This is if you only want to know
var haveCluesInLines = lines.Any(l => clues.Any(c => l.Contains(c)));
// filter 2. Get lines with clues
var linesWithClues = lines.Where(l => clues.Any(c => l.Contains(c)));
Edit:
Most likely you will have few clues and many lines. This example checks each line against the clues, and Any stops at the first clue that matches, so no line is tested more often than necessary.
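As a small self-contained sketch of filter 2, here it is wrapped in a method so it can be tried without a file. ClueFilter and LinesWithClues are illustrative names only. Keep in mind that string.Contains is case-sensitive, so "cat" would not match the clue "Cat".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClueFilter   // hypothetical helper for illustration
{
    // Returns the lines that contain at least one of the clues.
    // Any() short-circuits: it stops testing clues after the first match.
    public static List<string> LinesWithClues(IEnumerable<string> lines, IEnumerable<string> clues)
    {
        var clueList = clues.ToList();   // enumerate the clues only once
        return lines
            .Where(line => clueList.Any(clue => line.Contains(clue)))
            .ToList();
    }
}
```

With the question's variables this would be otherList.AddRange(ClueFilter.LinesWithClues(File.ReadLines(someFile), possible));, which still reads the file lazily, one line at a time.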
