Dealing with fields containing unescaped double quotes with TextFieldParser - c#

I am trying to import a CSV file using TextFieldParser. A particular CSV file is causing me problems due to its nonstandard formatting. The CSV in question has its fields enclosed in double quotes. The problem appears when there is an additional set of unescaped double quotes within a particular field.
Here is an oversimplified test case that highlights the problem. The actual CSV files I am dealing with are not all formatted the same and have dozens of fields, any of which may contain these possibly tricky formatting issues.
TextReader reader = new StringReader("\"Row\",\"Test String\"\n" +
"\"1\",\"This is a test string. It is parsed correctly.\"\n" +
"\"2\",\"This is a test string with a comma, which is parsed correctly\"\n" +
"\"3\",\"This is a test string with double \"\"double quotes\"\". It is parsed correctly\"\n" +
"\"4\",\"This is a test string with 'single quotes'. It is parsed correctly\"\n" +
"5,This is a test string with fields that aren't enclosed in double quotes. It is parsed correctly.\n" +
"\"6\",\"This is a test string with single \"double quotes\". It can't be parsed.\"");
using (TextFieldParser parser = new TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
while (!parser.EndOfData)
{
string[] fields= parser.ReadFields();
Console.WriteLine("This line was parsed as:\n{0},{1}",
fields[0], fields[1]);
}
}
Is there any way to properly parse a CSV with this type of formatting using TextFieldParser?

I agree with Hans Passant's advice that it is not your responsibility to parse malformed data. However, in accord with the Robustness Principle, someone faced with this situation may attempt to handle specific types of malformed data. The code I wrote below works on the data set specified in the question. Basically, it detects the parser error on the malformed line, determines whether the line is wrapped in double quotes based on its first character, and then manually splits the line and strips all the wrapping double quotes.
using (TextFieldParser parser = new TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
while (!parser.EndOfData)
{
string[] fields = null;
try
{
fields = parser.ReadFields();
}
catch (MalformedLineException ex)
{
if (parser.ErrorLine.StartsWith("\""))
{
var line = parser.ErrorLine.Substring(1, parser.ErrorLine.Length - 2);
fields = line.Split(new string[] { "\",\"" }, StringSplitOptions.None);
}
else
{
throw;
}
}
Console.WriteLine("This line was parsed as:\n{0},{1}", fields[0], fields[1]);
}
}
I'm sure it is possible to concoct a pathological example where this fails (e.g. commas adjacent to double-quotes within a field value) but any such examples would probably be unparseable in the strictest sense, whereas the problem line given in the question is decipherable despite being malformed.
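The "decipherable despite being malformed" idea can also be sketched as a small hand-written splitter. This is only an illustration, not part of TextFieldParser: `LenientCsv.SplitLine` is a hypothetical helper, and it assumes that a double quote closes a field only when it is followed by a comma or the end of the line. Any other quote inside a quoted field is kept as literal text, so a stray quote that happens to sit directly before a comma would still be misread.

```csharp
using System.Collections.Generic;
using System.Text;

public static class LenientCsv
{
    // Hypothetical helper: splits one CSV line, tolerating unescaped quotes inside quoted fields.
    public static string[] SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            bool hasNext = i + 1 < line.Length;

            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (hasNext && line[i + 1] == '"')
                {
                    // Properly escaped quote (""): keep one quote and skip the second.
                    current.Append('"');
                    i++;
                }
                else if (!hasNext || line[i + 1] == ',')
                {
                    // Quote followed by a comma or end of line: treat it as the closing quote.
                    inQuotes = false;
                }
                else
                {
                    // Unescaped quote in the middle of a field: keep it as literal text.
                    current.Append('"');
                }
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else if (c == '"' && current.Length == 0)
            {
                // A quote at the very start of a field opens a quoted field.
                inQuotes = true;
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields.ToArray();
    }
}
```

Such a helper could be called on parser.ErrorLine inside the catch block instead of the Substring/Split pair, since it does not depend on the line starting with a double quote.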

Jordan's solution is quite good, but it makes an incorrect assumption that the error line will always begin with a double-quote. My error line was this:
170,"CMS ALT",853,,,NON_MOVEX,COM,NULL,"2014-04-25","" 204 Route de Trays"
Notice the last field had extra/unescaped double quotes, but the first field was fine. So Jordan's solution didn't work. Here is my modified solution based on Jordan's:
using(TextFieldParser parser = new TextFieldParser(new StringReader(csv))) {
parser.Delimiters = new [] {","};
while (!parser.EndOfData) {
string[] fields = null;
try {
fields = parser.ReadFields();
} catch (MalformedLineException ex) {
string errorLine = SafeTrim(parser.ErrorLine);
fields = errorLine.Split(',');
}
}
}
You may want to handle the catch block differently, but the general concept works great for me.

It may be easier to just do this manually, and it would certainly give you more control:
Edit:
For your clarified example, I still suggest handling the parsing manually:
using System.IO;
string[] csvFile = File.ReadAllLines(pathToCsv);
foreach (string line in csvFile)
{
// get the first comma in the line
// everything before this index is the row number
// everything after is the row value
int firstCommaIndex = line.IndexOf(',');
//Note: SubString used here is (startIndex, length)
string row = line.Substring(0, firstCommaIndex);
string rowValue = line.Substring(firstCommaIndex+1).Trim();
Console.WriteLine("This line was parsed as:\n{0},{1}",
row, rowValue);
}
For a generic CSV that does not allow commas in the fields:
using System.IO;
string[] csvFile = File.ReadAllLines(pathToCsv);
foreach (string line in csvFile)
{
string[] fields = line.Split(',');
Console.WriteLine("This line was parsed as:\n{0},{1}",
fields[0], fields[1]);
}

Working Solution :
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = false;
string[] colFields = csvReader.ReadFields();
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
else
{
if (fieldData[i][0] == '"' && fieldData[i][fieldData[i].Length - 1] == '"')
{
fieldData[i] = fieldData[i].Substring(1, fieldData[i].Length - 2);
}
}
}
csvData.Rows.Add(fieldData);
}
}

If you don't set HasFieldsEnclosedInQuotes = true, the parser returns more fields than there are columns whenever a value contains a comma.
For example:
"Col1","Col2","Col3"
"Test1", 100, "Test1,Test2"
"Test2", 200, "Test22"
This file has 3 columns, but the first data row is parsed into 4 fields, which is wrong.

Please set HasFieldsEnclosedInQuotes = true on TextFieldParser object before you start reading file.
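The difference can be checked with a few lines. This is a minimal sketch: `QuoteOptionDemo.CountFields` is a hypothetical helper name, and the test line is the first data row from the example above with the spaces after the commas removed to keep the comparison simple.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuoteOptionDemo
{
    // Hypothetical helper: parses a single line and reports how many fields came back.
    public static int CountFields(string line, bool fieldsInQuotes)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = fieldsInQuotes;
            return parser.ReadFields().Length;
        }
    }
}
```

Calling CountFields on "Test1",100,"Test1,Test2" with fieldsInQuotes set to true should give 3 fields, while false should give 4, because the comma inside the quoted value is then treated as a delimiter.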

Related

What delimiter can be used for .csv that has "," in it's column

I want to read .csv file and write to other .csv file.
I use streamwriter and use string.split with delimiter (',').
using (StreamWriter file = new StreamWriter(destFile, true))
{
string[] lines = System.IO.File.ReadAllLines(inputFile);
foreach (string line in lines)
{
if (line != lines[0])
{
string[] values = line.Split(',');
file.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9}",
values[43], values[0], values[11], values[12], values[13], values[15], values[14], values[28], values[22], values[9]);
}}}
However, a few columns have a , in their data, which produces incorrect output because the program counts it as a delimiter.
I have tried the TinyCsvParser library, but it also uses a delimiter, which produces the same result. When I changed to the CsvHelper library, it does not use a delimiter, but the input file has a column named B/S, so I am stuck there because a property cannot have that name.
[Name("B/S")]
private double p = 0;
public double B/S
{
get
{
return p;
}
set
{
double result;
result = double.Parse(Principal) * value / Day / 100;
p = Math.Abs(result);
}
}
What should I replace the delimiter with?
If I understand you correctly...
Try this:
using (StreamWriter file = new StreamWriter(destFile, true))
{
string[] lines = System.IO.File.ReadAllLines(inputFile);
foreach (string line in lines)
{
string[] delimiter = line==lines[0] ? new string[]{","} : new string[]{"\",\"", "\""};
string[] values = line.Split(delimiter, StringSplitOptions.RemoveEmptyEntries);
//...
}
}
EDIT
Check this out: .NET Fiddle
Note: if you want to split a string by a quotation mark, you need to escape the quotation mark with a backslash inside the string literal. See: How to: Put Quotation Marks in a String (Windows Forms)
I recommend using a CSV library to write the CSV file. You can also check how CSV should be formatted in its definition, RFC 4180.
Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields.
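When writing the output file by hand, the quoting rules from RFC 4180 can be applied per field. The sketch below is one way to do it; `CsvWriting.EscapeField` and `CsvWriting.ToLine` are hypothetical names introduced here, not part of any library. A field is wrapped in double quotes only when it contains a comma, a double quote, or a line break, and embedded double quotes are doubled.

```csharp
using System;
using System.Linq;

public static class CsvWriting
{
    // Quotes a field only when needed and doubles any embedded double quotes.
    public static string EscapeField(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-split values into one CSV line.
    public static string ToLine(params string[] fields)
    {
        return string.Join(",", fields.Select(f => EscapeField(f)));
    }
}
```

With this in place, the WriteLine call in the question could pass its selected values through ToLine instead of a fixed format string, so values containing commas survive a round trip.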

C# Split a string and build a stringarray out of the string [duplicate]

I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on the separator Environment.NewLine.ToCharArray(), because on Windows this will create one empty string element for each new line.
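The second pitfall is easy to reproduce. In this small sketch the characters '\r' and '\n' are hard-coded, which is what Environment.NewLine.ToCharArray() yields on Windows; the class and method names are made up for the illustration.

```csharp
using System;

public static class NewlineSplitDemo
{
    // Splitting on the individual characters treats "\r\n" as two separators.
    public static string[] SplitOnChars(string text)
    {
        return text.Split(new[] { '\r', '\n' });
    }

    // Splitting on strings matches "\r\n" as a single separator first.
    public static string[] SplitOnStrings(string text)
    {
        return text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
    }
}
```

For the input "a\r\nb", SplitOnChars returns three elements (with an empty string in the middle), while SplitOnStrings returns the expected two.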
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string, and the delimiter
result = "My simple string".Split(" ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Count(); i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
The RemoveEmptyEntries option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.NewLine, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might still be present (e.g. when on Windows but splitting a string with OS X newline characters). It is probably not the fastest method, though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
The examples here are great and helped me with a current "challenge": splitting RSA keys so they can be presented in a more readable way. Based on Steve Cooper's solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Split the string into a list of chunks, each n characters long
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting an RSA key with a width of 33 characters and wrapped in quotes is then simply:
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Output:
Hopefully someone finds it useful...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.IO;
string textToSplit;
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input As String) As String()
Return input.Split(Environment.NewLine)
End Function
C#:
string[] SplitOnNewLine(string input)
{
return input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
}

Find string in txt file using a list c# [closed]

I am trying to find out if a .txt file contains words stored in a list named Abreviated. This list is filled by reading values from a csv file as shown below;
StreamReader sr = new StreamReader(@"C:\textwords.csv");
string TxtWrd = sr.ReadLine();
while ((TxtWrd = sr.ReadLine()) != null)
{
Words = TxtWrd.Split(Seperators, StringSplitOptions.None);
Abreviated.Add(Words[0]);
Expanded.Add(Words[1]);
}
I would like to use this list to check if a .txt file contains any of the words in the list. The .txt file is read using a StreamReader and is stored as a string, FileContent (FC in the snippet). The code I have to try to find the matches is below:
if (FC.Contains(Abreviated.ToString()))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
This always runs the else branch even though one of the words is in the text file.
Any advice on how to get this working?
Thanks in advance!
You can use a key-value data structure to store each abbreviated word together with its full form. In C#, Dictionary is the generic implementation for storing key-value pairs.
I've refactored your code to make it easier to reuse.
internal class FileParser
{
internal Dictionary<string, string> WordDictionary = new Dictionary<string, string>();
private string _filePath;
private char Seperators => ',';
internal FileParser(string filePath)
{
_filePath = filePath;
}
internal void Parse()
{
StreamReader sr = new StreamReader(_filePath);
string TxtWrd = sr.ReadLine();
while ((TxtWrd = sr.ReadLine()) != null)
{
var words = TxtWrd.Split(Seperators, StringSplitOptions.None);
//WordDictionary.TryAdd(Words[0], Words[1]); // available in .NET corefx https://github.com/dotnet/corefx/issues/1942
if (!WordDictionary.ContainsKey(words[0]))
WordDictionary.Add(words[0], words[1]);
}
}
internal bool IsWordAvailable(string word)
{
return WordDictionary.ContainsKey(word);
}
}
Now, you can reuse above class within your assembly like in following way :
public class Program
{
public static void Main(string[] args)
{
var fileParser = new FileParser(@"C:\textwords.csv");
fileParser.Parse(); // fill the dictionary before looking anything up
if(fileParser.IsWordAvailable("abc"))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
}
}
You are comparing your entire file's content to the string representation of a collections of words. You need to compare each individual word found in the file content to your abbreviated list. One way you could do the comparison is to split the file content into individual words and then look those up individually against your abbreviated list.
string[] fileWords = FC.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
bool hasMatch = false;
foreach (string fileWord in fileWords)
{
if(Abbreviated.Contains(fileWord))
{
hasMatch = true;
break;
}
}
if (hasMatch)
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
I would recommend switching your abbreviated collection to a HashSet or a Dictionary that also includes your matching expanded text for the abbreviation. Also, there are probably alternate ways to do the search you are looking for with regex.
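The Dictionary suggestion could look roughly like the sketch below. The names `AbbreviationLookup`, `Build`, and `FindMatches` are invented for the illustration, and the separator list and the case-insensitive comparison are assumptions about what "a word" means in the text file.

```csharp
using System;
using System.Collections.Generic;

public static class AbbreviationLookup
{
    // Builds an abbreviation -> expansion map from lines such as "wuu2,what are you up to".
    public static Dictionary<string, string> Build(IEnumerable<string> csvLines)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in csvLines)
        {
            // Split on the first comma only, so the expansion may itself contain commas.
            string[] parts = line.Split(new[] { ',' }, 2);
            if (parts.Length == 2)
            {
                map[parts[0].Trim()] = parts[1].Trim();
            }
        }
        return map;
    }

    // Returns every word of the file content that is a known abbreviation.
    public static List<string> FindMatches(string fileContent, Dictionary<string, string> map)
    {
        var separators = new[] { ' ', '.', ',', ';', ':', '!', '?', '\t', '\r', '\n' };
        var found = new List<string>();
        foreach (string word in fileContent.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (map.ContainsKey(word) && !found.Contains(word))
            {
                found.Add(word);
            }
        }
        return found;
    }
}
```

Each lookup is then a hash lookup rather than a scan of the whole list, and the expansion for a match is available as map[word].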
I'm unsure on what some of your variables are so this may be slightly different to what you have, but gives the same functionality.
static void Main(string[] args)
{
List<string> abbreviated = new List<string>();
List<string> expanded = new List<string>();
StreamReader sr = new StreamReader("textwords.csv");
string TxtWrd = "";
while ((TxtWrd = sr.ReadLine()) != null)
{
Debug.WriteLine("line: " + TxtWrd);
string[] Words = TxtWrd.Split(new char[] { ',' } , StringSplitOptions.None);
abbreviated.Add(Words[0]);
expanded.Add(Words[1]);
}
if (abbreviated.Contains("wuu2"))
{
//show message box
} else
{
//don't
}
}
As mentioned in one of the comments, a Dictionary might be better suited for this.
This assumes that the data in your file is in the following format, with a new set on each line.
wuu2,what are you up to
If all you want to do is check if a text file contains words in your list, you can read the entire contents of the file into a string (instead of line by line), split the string on your separators, and then check if the intersection of the words in the text file and your list of words has any items:
// Get the "separators" into a list
var wordsFile = @"c:\public\temp\textWords.csv"; // (@"C:\textwords.csv");
var separators = File.ReadAllText(wordsFile).Split(',');
// Get the words of the file into a list (add more delimeters as necessary)
var txtFile = @"c:\public\temp\temp.txt";
var allWords = File.ReadAllText(txtFile).Split(new[] {' ', '.', ',', ';', ':', '\r', '\n'});
// Get the intersection of the file words and the separator words
var commonWords = allWords.Intersect(separators).ToList().Distinct();
if (commonWords.Any())
{
Console.WriteLine("The text file contains the following matching words:");
Console.WriteLine(string.Join(", ", commonWords));
}
else
{
Console.WriteLine("The file did not contain any matching words.");
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();

Use continue key word to processed with the loop

I am reading data line by line from the columns of an Excel file (which is actually a comma-separated CSV file) that is sent by an external entity. Among the columns to be read is the time, which is in 00.00 format, so a split method is used to read the different columns. However, the file sometimes comes with extra columns (extra commas between the elements), so the split elements are not always correct. Below is the code used to read and split the different columns; these elements are then saved in the database.
public void SaveFineDetails()
{
List<string> erroredFines = new List<string>();
try
{
log.Debug("Start : SaveFineDetails() - Saving Downloaded files fines..");
if (!this.FileLines.Any())
{
log.Info(string.Format("End : SaveFineDetails() - DataFile was Empty"));
return;
}
using (RAC_TrafficFinesContext db = new RAC_TrafficFinesContext())
{
this.FileLines.RemoveAt(0);
this.FileLines.RemoveAt(FileLines.Count - 1);
int itemCnt = 0;
int errorCnt = 0;
int duplicateCnt = 0;
int count = 0;
foreach (var line in this.FileLines)
{
count++;
log.DebugFormat("Inserting {0} of {1} Fines..", count.ToString(), FileLines.Count.ToString());
string[] bits = line.Split(',');
int bitsLength = bits.Length;
if (bitsLength == 9)
{
string fineNumber = bits[0].Trim();
string vehicleRegistration = bits[1];
string offenceDateString = bits[2];
string offenceTimeString = bits[3];
int trafficDepartmentId = this.TrafficDepartments.Where(tf => tf.DepartmentName.Trim().Equals(bits[4], StringComparison.InvariantCultureIgnoreCase)).Select(tf => tf.DepartmentID).FirstOrDefault();
string proxy = bits[5];
decimal fineAmount = GetFineAmount(bits[6]);
DateTime fineCreatedDate = DateTime.Now;
DateTime offenceDate = GetOffenceDate(offenceDateString, offenceTimeString);
string username = Constants.CancomFTPServiceUser;
bool isAartoFine = bits[7] == "1" ? true : false;
string fineStatus = "Sent";
try
{
var dupCheck = db.GetTrafficFineByNumber(fineNumber);
if (dupCheck != null)
{
duplicateCnt++;
string ExportFileName = (base.FileName == null) ? string.Empty : base.FileName;
DateTime FileDate = DateTime.Now;
db.CreateDuplicateFine(ExportFileName, FileDate, fineNumber);
}
else
{
var adminFee = db.GetAdminFee();
db.UploadFTPFineData(fineNumber, fineAmount, vehicleRegistration, offenceDate, offenceDateString, offenceTimeString, trafficDepartmentId, proxy, false, "Imported", username, adminFee, isAartoFine, dupCheck != null, fineStatus);
}
itemCnt++;
}
catch
{
errorCnt++;
}
}
else
{
erroredFines.Add(line);
continue;
}
}
Now the problem is that this file doesn't always come with 9 elements as we expect. For example, in the sample data below the lines are not the same (ignore the first line, it is the header).
On the first line, FM is supposed to be part of 36DXGP instead of being a separate element, which means there is now an extra column. This brings us to the issue at hand, the time element: because of the extra comma, the value read as the time is now something else (20161216), so the split on the time element does not work at all. So what I did was read the incorrect line and check its length; if the length is not 9, add it to the error list and continue.
But my continue keyword doesn't seem to work: it gets into the else part and then goes back to read the very same error line.
I have checked the answers on Break vs Continue and they provide good examples of how continue works. I introduced the else because the format in those examples did not work for me (the else did not make any difference either). Here is the sample data.
NOTE: the first line to be read starts with 96.
H,1789,,,,,,,,
96/17259/801/035415,FM,36DXGP,20161216,17.39,city hall-cape town,Makofane,200,0,0
MA/80/034808/730,CA230721,20170117,17.43,malmesbury,PATEL,200,0,0,
What is it that I am doing wrong here?
I have found a way to solve my problem. There was an issue with the length of the line: the trailing comma caused an empty element. I got rid of this empty element with the code below and determined the new length:
bits = bits.Where(x => !string.IsNullOrEmpty(x)).ToArray();
int length = bits.Length;
All is well now
I suggest you use the following overload for performance and readability reasons:
line.Split(new char[] {','}, StringSplitOptions.RemoveEmptyEntries);
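The effect of the trailing comma on the element count can be seen in isolation. The sketch below uses made-up helper names; it also includes a TrimEnd variant as an alternative, on the assumption that only the trailing comma should be ignored, because RemoveEmptyEntries also drops empty values in the middle of a line and shifts the remaining columns.

```csharp
using System;

public static class TrailingCommaDemo
{
    // Plain split: a trailing comma produces one extra, empty element.
    public static int CountAll(string line)
    {
        return line.Split(',').Length;
    }

    // Drops every empty element, including empty columns in the middle of the line.
    public static int CountNonEmpty(string line)
    {
        return line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Drops only trailing commas and keeps empty columns in the middle.
    public static int CountWithoutTrailing(string line)
    {
        return line.TrimEnd(',').Split(',').Length;
    }
}
```

For the third sample line, CountAll gives 10 while the other two give 9. For a line such as "a,,b," the three methods give 4, 2, and 3 respectively, which shows where the two fixes differ.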

Parse Text File Into Dictionary

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02
This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// " 235235 Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { "  " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}
So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whitespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value, take everything from the end of the previous value up to the colon as the next label and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless your entire development team is very proficient with them.
I would try something like this:
While the string contains a triple space, replace it with a double space.
Replace all ": " and ":  " (colon followed by a double space) with ":".
Replace all "  " (double space) with '\n' (new line).
If a line doesn't contain ':', skip it. Otherwise, use string.Split(':'). This gives you arrays of 2 strings (key and value). Some of them may have whitespace at the beginning or at the end.
Use string.Trim() to get rid of that whitespace.
Add the resulting key and value to the Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).
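The steps above can be sketched as follows. This is a rough illustration under the stated assumptions (pairs on the same line are separated by at least two spaces, and labels contain only single spaces); `SpaceNormalizingParser` is a made-up name. It deviates from the steps in two small ways: the colon followed by a double space is replaced before the colon followed by a single space, and each line is cut at its first colon with IndexOf instead of Split(':'), so a value containing a colon stays intact.

```csharp
using System;
using System.Collections.Generic;

public static class SpaceNormalizingParser
{
    public static Dictionary<string, string> Parse(string text)
    {
        // Step 1: collapse any run of three or more spaces down to a double space.
        while (text.Contains("   "))
        {
            text = text.Replace("   ", "  ");
        }

        // Step 2: remove the spaces that follow a colon.
        text = text.Replace(":  ", ":").Replace(": ", ":");

        // Step 3: the remaining double spaces separate pairs, so turn them into line breaks.
        text = text.Replace("  ", "\n");

        var result = new Dictionary<string, string>();
        foreach (string line in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Steps 4 to 6: skip lines without a colon, then trim and store the pair.
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            if (key.Length > 0)
            {
                result[key] = value;
            }
        }
        return result;
    }
}
```

The indexer assignment means a repeated label overwrites the earlier value instead of throwing, which may or may not be what you want for a configuration file.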
This is probably the dirtiest function I´ve ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it work? Well, after splitting the line you know what to expect at each position of the array, so I just play with the even and odd indexes.
You will understand it when you debug this function :D. It fills the Dictionary that you need.
I have another idea. Do the values contain spaces? If not, you could do it like this:
Ignore whitespace until you read some other character (the first character of the key).
Read the string until ':' occurs.
Trim the key that you get.
Ignore whitespace until you read some other character (the first character of the value).
Read until you reach a whitespace character.
Trim the value that you get.
If it is the end, then stop. Otherwise, go back to step 1.
Good luck.
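If the values really never contain spaces, the scan described above can be written compactly with a regular expression instead of a character loop. The sketch below is only one possible reading of those steps: `LabelValueScanner` is a hypothetical name, and the pattern assumes that labels contain no colons and that a label and its value sit on the same line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LabelValueScanner
{
    // Label: everything up to the next colon. Value: the next run of non-whitespace characters.
    private static readonly Regex Pair =
        new Regex(@"([^:\r\n]+?)[ \t]*:[ \t]*(\S+)");

    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(text))
        {
            // Trim the label, because it can pick up the spaces left after the previous value.
            result[m.Groups[1].Value.Trim()] = m.Groups[2].Value;
        }
        return result;
    }
}
```

Because the value is defined as a run of non-whitespace characters, this works even when two pairs on one line are separated by a single space, which the space-counting approaches cannot handle.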
Maybe something like this would work; be careful with the ':' character.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend organizing that file in some way so that you'll always know what format it comes in.
