Simple CSV reader? - C#

All,
I started out with what I thought was going to be a pretty simple task (convert a CSV to "wiki" format), but I'm hitting a few snags that I'm having trouble working through.
I have 3 main problems:
1) Some of the cells contain \r\n, so when reading line by line each new line is treated as a new cell.
2) Some of the rows contain "," (I tried switching to \t-delimited files, but I still run into a problem escaping it when it is between two "").
3) Some rows are completely blank except for the delimiter ("," or "\t"); others are incomplete (which is fine, I just need to make sure that each cell goes in the correct place).
I've tried a few of the CSV reader classes, but they would bump up against one of the problems listed above.
I'm trying to keep this app as small as possible, so I am also trying to avoid DLLs and large classes of which only a small portion does what I want.
So far I have two attempts that are not working.
Attempt 1 (doesn't handle \r\n in a cell)
OpenFileDialog openFileDialog1 = new OpenFileDialog();
openFileDialog1.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
openFileDialog1.Filter = "tab sep file (*.txt)|*.txt|All files (*.*)|*.*";
openFileDialog1.FilterIndex = 1;
openFileDialog1.RestoreDirectory = true;
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
if (cb_sortable.Checked)
{
header = "{| class=\"wikitable sortable\" border=\"1\" \r\n|+ Sortable table";
}
StringBuilder sb = new StringBuilder();
string line;
bool firstline = true;
StreamReader sr = new StreamReader(openFileDialog1.FileName);
sb.AppendLine(header);
while ((line = sr.ReadLine()) != null)
{
if (line.Replace("\t", "").Length > 1)
{
string[] hold;
string lead = "| ";
if (firstline && cb_header.Checked == true)
{
lead = "| align=\"center\" style=\"background:#f0f0f0;\"| ";
}
hold = line.Split('\t');
sb.AppendLine(table);
foreach (string row in hold)
{
sb.AppendLine(lead + row.Replace("\"", ""));
}
firstline = false;
}
}
sb.AppendLine(footer);
Clipboard.SetText(sb.ToString());
MessageBox.Show("Done!");
}
}
string header = "{| class=\"wikitable\" border=\"1\" ";
string footer = "|}";
string table = "|-";
Attempt 2 (can handle \r\n but shifts cells over blank cells; it's not complete yet)
OpenFileDialog openFileDialog1 = new OpenFileDialog();
openFileDialog1.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
openFileDialog1.Filter = "txt file (*.txt)|*.txt|All files (*.*)|*.*";
openFileDialog1.FilterIndex = 1;
openFileDialog1.RestoreDirectory = true;
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
if (cb_sortable.Checked)
{
header = "{| class=\"wikitable sortable\" border=\"1\" \r\n|+ Sortable table";
}
using (StreamReader sr = new StreamReader(openFileDialog1.FileName))
{
string text = sr.ReadToEnd();
string[] cells = text.Split('\t');
int columnCount = 0;
foreach (string cell in cells)
{
if (cell.Contains("\r\n"))
{
break;
}
columnCount++;
}
}
Basically all I need is a "split if not between \"", but I'm just at a loss right now.
Any tips or tricks would be greatly appreciated.

Checkout this project instead of rolling your own CSV parser.

You might take a look at http://www.filehelpers.com/ as well...
Don't try to do it by yourself if you can use libraries!

Try taking a look here. Your code doesn't make web requests, but effectively this shows you how to parse a csv that is returned from a web service.

There's a decent implementation here...
A Fast CSV Reader by Sébastien Lorion
It makes much more sense in this case to use tried-and-tested code rather than trying to roll your own.

For a specification that's essentially two pages long, the CSV format is deceptive in its simplicity. The majority of short parser implementations that can be found on the internet are blatantly incorrect in one way or another. That notwithstanding, the format hardly seems to call for 1k+ SLOC implementations.
public static class CsvImport {
/// <summary>
/// Parse a Comma Separated Value (CSV) source into rows of strings. [1]
///
/// The header row (if present) is not treated specially. No checking is
/// performed to ensure uniform column lengths among rows. If no input
/// is available, a single row containing String.Empty is returned. No
/// support is provided for debugging invalid CSV files. Callers who
/// desire such assistance are encouraged to use a TextReader that can
/// report the current line and column position.
///
/// [1] https://www.rfc-editor.org/rfc/rfc4180
/// </summary>
public static IEnumerable<string[]> Deserialize(TextReader input) {
if (input.Peek() == Sentinel) yield return new [] { String.Empty };
while (input.Peek() != Sentinel) {
// must read in entire row *now* to see if we're at end of input
yield return DeserializeRow(input).ToArray();
}
}
const int Sentinel = -1;
const char Quote = '"';
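// ListSeparator is a string that is only known at run time, so this cannot be a const char.
static readonly char Separator = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator[0];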
static IEnumerable<string> DeserializeRow(TextReader input) {
var field = new StringBuilder();
while (true) {
var c = input.Read();
if (c == Separator) {
yield return field.ToString();
field = new StringBuilder();
} else if (c == '\r') {
if (input.Peek() == '\n') {
input.Read();
}
yield return field.ToString();
yield break;
} else if (new [] { '\n', Sentinel }.Contains(c)) {
yield return field.ToString();
yield break;
} else if (c == Quote) {
field.Append(DeserializeQuoted(input));
} else {
field.Append((char) c);
}
}
}
static string DeserializeQuoted(TextReader input) {
var quoted = new StringBuilder();
while (input.Peek() != Sentinel) {
var c = input.Read();
if (c == Quote) {
if (input.Peek() == Quote) {
quoted.Append(Quote);
input.Read();
} else {
return quoted.ToString();
}
} else {
quoted.Append((char) c);
}
}
throw new UnexpectedEof("End-of-file inside quoted section.");
}
public class UnexpectedEof : Exception {
public UnexpectedEof(string message) : base(message) { }
}
}
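To tie this back to the question's goal, here is a minimal sketch that turns already-parsed rows into wiki table markup. WikiTable and FromRows are hypothetical names, and the markup strings are copied from the question's attempts. Blank cells are still written out, so later cells stay in their columns.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WikiTable
{
    // Hypothetical helper: renders parsed rows (one string[] per row) as wiki table markup.
    public static string FromRows(IEnumerable<string[]> rows, bool firstRowIsHeader)
    {
        var sb = new StringBuilder();
        sb.Append("{| class=\"wikitable\" border=\"1\"\n");
        bool first = true;
        foreach (string[] row in rows)
        {
            sb.Append("|-\n");
            // Only the first row gets the header styling, and only if requested.
            string lead = (first && firstRowIsHeader)
                ? "| align=\"center\" style=\"background:#f0f0f0;\"| "
                : "| ";
            foreach (string cell in row)
            {
                // Empty cells are emitted too, so columns do not shift.
                sb.Append(lead).Append(cell).Append('\n');
            }
            first = false;
        }
        sb.Append("|}");
        return sb.ToString();
    }
}
```

It could be fed with the rows produced by any parser that yields string arrays, for example `WikiTable.FromRows(rows, true)`.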


C#, adding chars to string including symbols

I'm converting CSV files to XML files in C#. I'm saving the CSV file in a list of strings, but characters such as ä, á, ê don't come through correctly.
public Lesson CsvToLesson(List<string> csv)
{
string lesName = csv[0][csv[0].Length - 3].ToString();
List<Word> words = new List<Word>();
for(int i = 3; i < csv.Count; i++)
{
string lang1 = "";
string lang2 = "";
bool firstWord = true;
foreach (char c in csv[i])
{
if (firstWord)
{
if(c != ';')
{
lang1 += c;
} else
{
firstWord = false;
}
} else {
if (c != ';')
{
lang2 += c;
}
else
{
break;
}
}
}
words.Add(new Word(lang1, lang2, 1, i));
}
return new Lesson(lesName, words);
}
to return them as an object called Lesson. The resulting XML looks like this:
<Word kasten="1" id="24">
<lang1>eine Sekret�rin</lang1>
<lang2>une secr�taire</lang2>
</Word>
The reading method:
public void saveCsv(string path)
{
string line;
List<string> csv = new List<string>();
StreamReader file = new StreamReader(path);
while((line = file.ReadLine()) != null)
{
csv.Add(line);
}
file.Close();
AddLesson(controller.CsvToLesson(csv));
}
How can I fix this?
Thanks to Cid.
The problem was that the CSV file wasn't saved as a UTF-8 CSV file. There are multiple CSV save options in Excel.
If you have the same problem, take a look at this post:
How to check encoding of a CSV file
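If re-saving the file is not an option, the reader can be told which encoding to expect. The sketch below only illustrates the effect and is not the code from the question: EncodingDemo and ReadWith are hypothetical names, and ISO-8859-1 is just an assumed example of a non-UTF-8 encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Decodes the given bytes through a StreamReader that uses an explicit encoding.
    public static string ReadWith(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Usage (assumed sample text):
    //   Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
    //   byte[] bytes = latin1.GetBytes("eine Sekretärin");
    //   ReadWith(bytes, Encoding.UTF8); // the ä is replaced by U+FFFD
    //   ReadWith(bytes, latin1);        // the ä survives
}
```

For a file on disk the same idea applies with `new StreamReader(path, encoding)`.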

Deleting Text In Between Two Lines

I'm trying to remove lines that are in between two different lines. Currently, I have:
string s = "";
String path = @"C:\TextFile";
StreamWriter sw = new StreamWriter(path, true);
StreamReader sr = new StreamReader(path, true);
s = sr.ReadLine();
if (s=="#Start")
{
while (s != "#End")
{
sw.WriteLine(s);
//need something here to overwrite existing data with s not just add s
}
}
sr.Close();
sw.Close();
The content of my text file looks like this:
#Start
facebook.com
google.com
youtube.com
#End
I tried to follow "Efficient way to delete a line from a text file"; however, it deletes any line containing a certain character, whereas there are other lines outside of the range containing .com that I don't want to remove.
I want to delete all the contents between #Start and #End, so that after the method runs the text file contains only:
#Start
#End
You have two problems:
1) You're only reading the first line, and then you're using that one value everywhere. Clearly if s == "#Start", it can't also satisfy the condition s == "#End", etc.
2) Even if you were reading each line, you expect that after #End there will be no more data - you don't loop through the rest of the lines, you just stop writing. Based on your question, I think you want to write all lines from the file and only change those between #Start and #End.
Perhaps a continuous loop like the one below would be better:
string s;
bool inRewriteBlock = false;
while ((s = sr.ReadLine()) != null)
{
if (s == "#Start")
{
inRewriteBlock = true;
sw.WriteLine(s); // keep the marker line itself
}
else if (s == "#End")
{
inRewriteBlock = false;
sw.WriteLine(s); // keep the marker line itself
}
else if (inRewriteBlock)
{
// Rewrite lines inside the block here; writing nothing deletes them.
}
else
{
sw.WriteLine(s);
}
}
By default, the code will output every line it reads verbatim. However, if it reads #Start it will enter a special mode (inRewriteBlock == true) where you can rewrite those lines however you want. Once it reaches #End it will transition back into the default mode (inRewriteBlock == false).
You can simply do this (this assumes the file can be held in memory):
string path = @"C:\Users\test\Desktop\Test.txt";
List<string> fileData = File.ReadAllLines(path).ToList();
// File.ReadAllLines(path).Select(y => y.Trim()).ToList(); would also strip leading/trailing spaces from each line
int startsWith = fileData.IndexOf("#Start");
int endsWith = fileData.IndexOf("#End");
// RemoveRange takes (index, count), so the count is the number of lines strictly between the markers
if(startsWith != -1 && endsWith != -1)
fileData.RemoveRange(startsWith+1, endsWith-startsWith-1);
File.WriteAllLines("C:\\Test\\Test1.txt", fileData.ToArray());
It doesn't account for special scenarios, such as #Start appearing at the end of the file with no #End.
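A minimal sketch of how those special cases might be handled: if either marker is missing, or #End never appears after #Start, the lines are returned unchanged. MarkerBlock and RemoveBetween are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class MarkerBlock
{
    // Removes the lines strictly between the first start marker and the
    // next end marker after it. Returns a copy; the input is not modified.
    public static List<string> RemoveBetween(IEnumerable<string> lines, string start, string end)
    {
        var result = new List<string>(lines);
        int startIndex = result.IndexOf(start);
        if (startIndex == -1) return result;          // no start marker
        int endIndex = result.IndexOf(end, startIndex + 1);
        if (endIndex == -1) return result;            // no end marker after the start
        result.RemoveRange(startIndex + 1, endIndex - startIndex - 1);
        return result;
    }
}
```

Usage would look like `File.WriteAllLines(path, MarkerBlock.RemoveBetween(File.ReadAllLines(path), "#Start", "#End"));`.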
You should check and rewrite everything between #Start and #End, rather than only checking whether the file starts with "#Start".
You can try this:
//Read all text lines first
string[] readText = File.ReadAllLines(path);
//Open the text file to write
var oStream = new FileStream(path, FileMode.Truncate, FileAccess.Write, FileShare.Read);
StreamWriter sw = new System.IO.StreamWriter(oStream);
bool inRewriteBlock = false;
foreach (var s in readText)
{
if (s.Trim() == "#Start")
{
inRewriteBlock = true;
sw.WriteLine(s);
}
else if (s.Trim() == "#End")
{
inRewriteBlock = false;
sw.WriteLine(s);
}
else if (inRewriteBlock)
{
// Rewrite data here (in this case the line is deleted, so do nothing)
}
else
{
sw.WriteLine(s);
}
}
sw.Close();

C# Edit string in file - delete a character (000)

I am a rookie in C#, but I need to solve one problem.
I have several text files in a folder, and each text file has this structure:
IdNr 000000100
Name Name
Lastname Lastname
Sex M
.... etc...
Loading all the files from the folder is no problem, but I need to delete the leading zeros in IdNr, i.e. delete 000000 and leave 100, and then save the file. Each file has a different IdNr, which makes it harder :(
Yes, it is possible to edit each file manually, but with 3000 files that is not practical :)
Is there an algorithm in C# that could delete this 000000 and leave only the number 100?
Thank you all.
Vaclav
So, thank you all!
In the end I have this code :-) :
using System.IO;
namespace name
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Browse_Click(object sender, EventArgs e)
{
DialogResult dialog = folderBrowserDialog1.ShowDialog();
if (dialog == DialogResult.OK)
TP_zdroj.Text = folderBrowserDialog1.SelectedPath;
}
private void start_Click(object sender, EventArgs e)
{
try
{
foreach (string file in Directory.GetFiles(TP_zdroj.Text, "*.txt"))
{
string text = File.ReadAllText(file, Encoding.Default);
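// Strip any run of leading zeros after "IdNr ", but only when a digit follows, so a lone 0 is kept
text = System.Text.RegularExpressions.Regex.Replace(text, @"IdNr 0+(?=\d)", "IdNr ");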
File.WriteAllText(file, text, Encoding.Default);
}
}
catch
{
MessageBox.Show("Warning...!");
return;
}
{
MessageBox.Show("Done");
}
}
}
}
Thank you ALL ! ;)
You can use int.Parse:
int number = int.Parse("000000100");
String withoutzeros = number.ToString();
Regarding your read/save file issue: do the files contain more than one record? Is that a header, or is each record a list of keys and values like "IdNr 000000100"? It's difficult to answer without this information.
Edit: Here's a simple but efficient approach which should work if the format is strict:
var files = Directory.EnumerateFiles(path, "*.txt", SearchOption.TopDirectoryOnly);
foreach (var fPath in files)
{
String[] oldLines = File.ReadAllLines(fPath); // load into memory is faster when the files are not really huge
String key = "IdNr ";
if (oldLines.Length != 0)
{
IList<String> newLines = new List<String>();
foreach (String line in oldLines)
{
String newLine = line;
if (line.Contains(key))
{
int numberRangeStart = line.IndexOf(key) + key.Length;
int numberRangeEnd = line.IndexOf(" ", numberRangeStart);
if (numberRangeEnd == -1) numberRangeEnd = line.Length; // the number runs to the end of the line
String numberStr = line.Substring(numberRangeStart, numberRangeEnd - numberRangeStart);
int number = int.Parse(numberStr);
String withoutZeros = number.ToString();
newLine = line.Replace(key + numberStr, key + withoutZeros);
}
newLines.Add(newLine);
}
File.WriteAllLines(fPath, newLines);
}
}
Use TrimStart
var trimmedText = number.TrimStart('0');
This should do it. It assumes your files have a .txt extension, and it removes all occurrences of "000000" from each file.
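// folder is the directory containing the files; GetFiles needs a path as well as the search pattern
foreach (string fileName in Directory.GetFiles(folder, "*.txt"))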
{
File.WriteAllText(fileName, File.ReadAllText(fileName).Replace("000000", ""));
}
These are the steps you would want to take:
Loop each file
Read file line by line
for each line split on " " and remove leading zeros from 2nd element
write the new line back to a temp file
after all lines processed, delete original file and rename temp file
do next file
(you can avoid the temp file part by reading each file in full into memory, but depending on your file sizes this may not be practical)
You can remove the leading zeros with something like this:
string s = "000000100";
s = s.TrimStart('0');
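The steps above can be sketched as follows. This is only an illustration under a few assumptions: the IdNr line starts with "IdNr " followed directly by the number, the temp file name is made up by appending ".tmp", and the default StreamReader/StreamWriter encoding is acceptable. IdNrFixer, FixLine, and FixFile are hypothetical names.

```csharp
using System;
using System.IO;

public static class IdNrFixer
{
    // Strips leading zeros from the value on lines such as "IdNr 000000100".
    // All other lines are returned unchanged.
    public static string FixLine(string line)
    {
        const string key = "IdNr ";
        if (!line.StartsWith(key, StringComparison.Ordinal)) return line;
        string raw = line.Substring(key.Length);
        string value = raw.TrimStart('0');
        // Keep a single zero if the value consisted only of zeros.
        if (value.Length == 0 && raw.Length > 0) value = "0";
        return key + value;
    }

    // Rewrites one file through a temp file, then swaps it in for the original.
    public static void FixFile(string path)
    {
        string temp = path + ".tmp"; // assumed temp file name
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(temp))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(FixLine(line));
            }
        }
        File.Delete(path);
        File.Move(temp, path);
    }
}
```

Calling `IdNrFixer.FixFile(file)` for each entry of `Directory.GetFiles(folder, "*.txt")` would cover the whole folder.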
Simply, read every token from the file and use this method:
var token = "000000100";
var result = token.TrimStart('0');
You can write a function similar to this one:
static IEnumerable<string> ModifiedLines(string file) {
string line;
using(var reader = File.OpenText(file)) {
while((line = reader.ReadLine()) != null) {
string[] tokens = line.Split(new char[] { ' ' });
line = string.Empty;
foreach (var token in tokens)
{
line += token.TrimStart('0') + " ";
}
yield return line;
}
}
}
Usage:
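// Materialize the lines first (needs System.Linq): the lazy iterator keeps the file open
// for reading while it is enumerated, so writing to the same file directly would fail.
File.WriteAllLines(file, ModifiedLines(file).ToList());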

Fastest way to find strings in a file

I have a log file that is usually no more than 10 KB (the file size can go up to 2 MB at most), and I want to find out whether at least one group of these strings occurs in the file. The strings will be on different lines, like:
ACTION:.......
INPUT:...........
RESULT:..........
I need to know whether at least one such group exists in the file. I have to do this about 100 times for a test (the log is different each time, so I have to reload and read it), so I am looking for the fastest and best way to do this.
I looked through the forums for the fastest way, but I don't think my file is big enough for those solutions.
Thanks for looking.
I would read it line by line and check the conditions. Once you have seen a group you can quit. This way you don't need to read the whole file into memory. Like this:
public bool ContainsGroup(string file)
{
using (var reader = new StreamReader(file))
{
var hasAction = false;
var hasInput = false;
var hasResult = false;
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (!hasAction)
{
if (line.StartsWith("ACTION:"))
hasAction = true;
}
else if (!hasInput)
{
if (line.StartsWith("INPUT:"))
hasInput = true;
}
else if (!hasResult)
{
if (line.StartsWith("RESULT:"))
hasResult = true;
}
if (hasAction && hasInput && hasResult)
return true;
}
return false;
}
}
This code checks if there is a line starting with ACTION then one with INPUT and then one with RESULT. If the order of those is not important then you can omit the if () else if () checks. In case the line does not start with the strings replace StartsWith with Contains.
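For the case where the order is not important, a minimal sketch could track which prefixes are still missing and stop as soon as all have been seen. LogScan and ContainsAllPrefixes are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LogScan
{
    // Returns true once every prefix has been seen at the start of some line,
    // in any order. Enumeration stops early when nothing is missing any more.
    public static bool ContainsAllPrefixes(IEnumerable<string> lines, params string[] prefixes)
    {
        var missing = new HashSet<string>(prefixes);
        foreach (string line in lines)
        {
            missing.RemoveWhere(p => line.StartsWith(p, StringComparison.Ordinal));
            if (missing.Count == 0) return true;
        }
        return missing.Count == 0;
    }
}
```

With `File.ReadLines(file)` as the input, the file is read lazily: `LogScan.ContainsAllPrefixes(File.ReadLines(file), "ACTION:", "INPUT:", "RESULT:")`.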
Here's one possible way to do it:
string fileContents;
string[] logFiles = Directory.GetFiles(@"C:\Logs");
foreach (string file in logFiles)
{
using (StreamReader sr = new StreamReader(file))
{
fileContents = sr.ReadToEnd(); // StreamReader has ReadToEnd, not ReadAllText
if (fileContents.Contains("ACTION:") || fileContents.Contains("INPUT:") || fileContents.Contains("RESULT:"))
{
// Do what you need to here
}
}
}
You may need to do some variation based on your exact implementation needs - for example, what if the word spans two lines, does the line need to start with the word, etc.
Added
Alternate line-by-line check:
string[] lines;
string[] logFiles = Directory.GetFiles(@"C:\Logs");
foreach (string file in logFiles)
{
// File.ReadAllLines opens, reads, and closes the file in one call
lines = File.ReadAllLines(file);
foreach (string line in lines)
{
if (line.Contains("ACTION:") || line.Contains("INPUT:") || line.Contains("RESULT:"))
{
// Do what you need to here
}
}
}
Take a look at How to Read Text From a File. You might also want to take a look at the String.Contains() method.
Basically you will loop through all the files. For each file read line-by-line and see if any of the lines contains 1 of your special "Sections".
You don't have much of a choice with text files when it comes to efficiency. The easiest way would definitely be to loop through each line of data. When you grab a line in a string, split it on the spaces. Then match those words to your words until you find a match. Then do whatever you need.
I don't know how to do it in c# but in vb it would be something like...
Dim yourString as string
Dim words as string()
Do While objReader.Peek() <> -1
yourString = objReader.ReadLine()
words = yourString.split(" ")
For Each word in words()
If Myword = word Then
do stuff
End If
Next
Loop
Hope that helps
This code sample searches for strings in a large text file. The words are contained in a HashSet. It writes the found lines in a temp file.
if (File.Exists(@"temp.txt")) File.Delete(@"temp.txt");
String line;
String oldLine = "";
using (var fs = File.OpenRead(largeFileName))
using (var sr = new StreamReader(fs, Encoding.UTF8, true))
{
HashSet<String> hash = new HashSet<String>();
hash.Add("house");
using (var sw = new StreamWriter(@"temp.txt"))
{
while ((line = sr.ReadLine()) != null)
{
foreach (String str in hash)
{
if (oldLine.Contains(str))
{
sw.WriteLine(oldLine);
// write the next line as well (optional)
sw.WriteLine(line + "\r\n");
}
}
oldLine = line;
}
}
}

Delete specific line from a text file?

I need to delete an exact line from a text file, but I cannot for the life of me work out how to go about doing this.
Any suggestions or examples would be greatly appreciated.
Related Questions
Efficient way to delete a line from a text file (C#)
If the line you want to delete is based on the content of the line:
string line = null;
string line_to_delete = "the line i want to delete";
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
if (String.Compare(line, line_to_delete) == 0)
continue;
writer.WriteLine(line);
}
}
}
Or if it is based on line number:
string line = null;
int line_number = 0;
int line_to_delete = 12;
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
line_number++;
if (line_number == line_to_delete)
continue;
writer.WriteLine(line);
}
}
}
The best way to do this is to open the file in text mode, read each line with ReadLine(), and then write it to a new file with WriteLine(), skipping the one line you want to delete.
There is no generic delete-a-line-from-file function, as far as I know.
One way to do it if the file is not very big is to load all the lines into an array:
string[] lines = File.ReadAllLines("filename.txt");
string[] newLines = RemoveUnnecessaryLine(lines);
File.WriteAllLines("filename.txt", newLines);
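RemoveUnnecessaryLine is not defined in the snippet above. One possible sketch, assuming the line is identified by its exact content, is shown below; LineFilter and RemoveLine are hypothetical names.

```csharp
using System;
using System.Linq;

public static class LineFilter
{
    // Returns a new array without any line that exactly equals lineToRemove.
    public static string[] RemoveLine(string[] lines, string lineToRemove)
    {
        return lines
            .Where(l => !string.Equals(l, lineToRemove, StringComparison.Ordinal))
            .ToArray();
    }
}
```

Note that this removes every matching line, not just the first one.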
Hope this simple and short code will help.
List<string> linesList = File.ReadAllLines("myFile.txt").ToList();
linesList.RemoveAt(0);
File.WriteAllLines("myFile.txt", linesList.ToArray());
OR use this
public void DeleteLinesFromFile(string strLineToDelete)
{
string strFilePath = "Provide the path of the text file";
string strSearchText = strLineToDelete;
string strOldText;
string n = "";
StreamReader sr = File.OpenText(strFilePath);
while ((strOldText = sr.ReadLine()) != null)
{
if (!strOldText.Contains(strSearchText))
{
n += strOldText + Environment.NewLine;
}
}
sr.Close();
File.WriteAllText(strFilePath, n);
}
You can actually use C# generics for this to make it real easy:
var file = new List<string>(System.IO.File.ReadAllLines("C:\\path"));
file.RemoveAt(12);
File.WriteAllLines("C:\\path", file.ToArray());
This can be done in three steps:
// 1. Read the content of the file
string[] readText = File.ReadAllLines(path);
// 2. Empty the file
File.WriteAllText(path, String.Empty);
// 3. Fill up again, but without the deleted line
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (!s.Equals(lineToBeRemoved))
{
writer.WriteLine(s);
}
}
}
Read and remember each line
Identify the one you want to get rid of
Forget that one
Write the rest back over the top of the file
I cared about the file's original end line characters ("\n" or "\r\n") and wanted to maintain them in the output file (not overwrite them with what ever the current environment's char(s) are like the other answers appear to do). So I wrote my own method to read a line without removing the end line chars then used it in my DeleteLines method (I wanted the option to delete multiple lines, hence the use of a collection of line numbers to delete).
DeleteLines was implemented as a FileInfo extension and ReadLineKeepNewLineChars a StreamReader extension (but obviously you don't have to keep it that way).
public static class FileInfoExtensions
{
public static FileInfo DeleteLines(this FileInfo source, ICollection<int> lineNumbers, string targetFilePath)
{
var lineCount = 1;
using (var streamReader = new StreamReader(source.FullName))
{
using (var streamWriter = new StreamWriter(targetFilePath))
{
string line;
while ((line = streamReader.ReadLineKeepNewLineChars()) != null)
{
if (!lineNumbers.Contains(lineCount))
{
streamWriter.Write(line);
}
lineCount++;
}
}
}
return new FileInfo(targetFilePath);
}
}
public static class StreamReaderExtensions
{
private const char EndOfFile = '\uffff';
/// <summary>
/// Reads a line, similar to ReadLine method, but keeps any
/// new line characters (e.g. "\r\n" or "\n").
/// </summary>
public static string ReadLineKeepNewLineChars(this StreamReader source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
char ch = (char)source.Read();
if (ch == EndOfFile)
return null;
var sb = new StringBuilder();
while (ch != EndOfFile)
{
sb.Append(ch);
if (ch == '\n')
break;
ch = (char)source.Read();
}
return sb.ToString();
}
}
Are you on a Unix operating system?
You can do this with the "sed" stream editor. Read the man page for "sed"
What?
Open the file, seek to the position, then overwrite the line in the stream using null.
Got it? Simple, stream-based, no array that eats memory, fast.
This works in VB. Example: search for the line culture=id, where culture is the name and id is the value, and change it to culture=en.
Fileopen(1, "text.ini")
dim line as string
dim currentpos as long
while true
line = lineinput(1)
dim namevalue() as string = split(line, "=")
if namevalue(0) = "line name value that i want to edit" then
currentpos = seek(1)
fileclose()
dim fs as filestream("test.ini", filemode.open)
dim sw as streamwriter(fs)
fs.seek(currentpos, seekorigin.begin)
sw.write(null)
sw.write(namevalue + "=" + newvalue)
sw.close()
fs.close()
exit while
end if
msgbox("org ternate jua bisa, no line found")
end while
that's all..use #d
