C# parsing and arrays

I'm making a program that parses some data, and somehow I'm not receiving what I need.
I have data in a file in the following order:
1111
username
email@email.com
IMAGE01: http://www.1234567890.net/image/cc_141019050341.png
So I made an array named "lines" with one entry per line of text in the file, and then:
this.videoId = lines[0];
this.clientUser = lines[1];
this.clientEmail = lines[2];
this.textLines = new List<string>();
this.imageLines = new Dictionary<int,string>();
for (int i = 3; i < lines.Length; i++)
{
if (lines[i].Contains("IMAGE"))
{
int imgNumber = Int32.Parse(
lines[i].Substring(Math.Max(0, lines[i].Length - 10), 2)
);
this.imageLines.Add(imgNumber, lines[i].Substring(Math.Max(0, lines[i].Length - 7)));
}
else
{
this.textLines.Add(lines[i]);
}
}
Then I write each piece of parsed data to a different .txt file:
using (StreamWriter emailTxt = new StreamWriter(@"txt/" + "user_email.txt"))
{
emailTxt.Write(nek.clientEmail);
}
using (StreamWriter userTxt = new StreamWriter(@"txt/" + "user_data.txt"))
{
userTxt.Write(nek.clientUser + Environment.NewLine + unixTime);
}
using (StreamWriter imageTxt = new StreamWriter(@"txt/" + "user_images.txt"))
{
foreach (KeyValuePair<int, string> kp in nek.imageLines)
{
imageTxt.WriteLine(string.Format("{0:00}: {1}", kp.Key, kp.Value));
}
}
But somehow I'm retrieving all the data correctly, except imageTxt, which should be:
http://www.1234567890.net/image/cc_141019050341.png
I'm receiving:
05: 341.png
Any ideas why? Thank you for your time.

Your substrings are hitting the tail of cc_141019050341.png rather than the IMAGE01 prefix, because lines[i].Length - 10 and lines[i].Length - 7 both land inside the file name:
cc_141019(the first substring extracts this 05)0(the second substring extracts this 341.png)
I would suggest you use a regex to extract the parts you want, something like:
IMAGE(?<num>\d+).*?:\s(?<url>.*)
You can use it in your code like this:
var match = new Regex(@"IMAGE(?<num>\d+).*?:\s(?<url>.*)").Match(lines[i]);
if (match.Success)
{
var url = match.Groups["url"].Value;
var strNum = match.Groups["num"].Value;
}
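Putting it together, here's a minimal sketch of the whole parsing loop rewritten around that regex (same lines, imageLines and textLines members as in your original code; note the added using directive):
using System.Text.RegularExpressions; // needed for Regex

var imageRegex = new Regex(@"IMAGE(?<num>\d+).*?:\s(?<url>.*)");
for (int i = 3; i < lines.Length; i++)
{
    var match = imageRegex.Match(lines[i]);
    if (match.Success)
    {
        // "num" captures the digits right after IMAGE, "url" everything after ": "
        int imgNumber = Int32.Parse(match.Groups["num"].Value);
        this.imageLines.Add(imgNumber, match.Groups["url"].Value);
    }
    else
    {
        this.textLines.Add(lines[i]);
    }
}
For your sample line, num is "01" and url is the full http://www.1234567890.net/image/cc_141019050341.png.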

Related

C# - Split CSV File by Removing Bad Rows

I have a csv file with 2 million rows and a file size of 2 GB. But due to a couple of free-text form columns, some rows contain stray CRLFs, which cause the file to fail to load into the SQL Server table. I get an error that the last column does not end with ".
I have the following code, but it gives an OutOfMemoryException when reading from fileName. The line is:
var lines = File.ReadAllLines(fileName);
How can I fix it? Ideally, I would like to split the file into two files: good rows and bad rows. Or delete the rows that do not end with " followed by CRLF.
int goodRow = 0;
int badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
String lineOut = string.Empty;
String str = string.Empty;
var lines = File.ReadAllLines(fileName);
StringBuilder sbGood = new StringBuilder();
StringBuilder sbBad = new StringBuilder();
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
sbGood.AppendLine(line);
}
else
{
badRow++;
sbBad.AppendLine(line);
}
}
if (badRow > 0)
{
File.WriteAllText(badRowFileName, sbBad.ToString());
}
if (goodRow > 0)
{
File.WriteAllText(goodRowFileName, sbGood.ToString());
}
sbGood.Clear();
sbBad.Clear();
msg = msg + "Good Rows - " + goodRow.ToString() + " Bad Rows - " + badRow.ToString() + " Done.";
You can translate that code like this to be much more efficient:
int goodRow = 0, badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
var lines = File.ReadLines(fileName); // IEnumerable<string> is not IDisposable, so it cannot go in a using
using (var swGood = new StreamWriter(goodRowFileName))
using (var swBad = new StreamWriter(badRowFileName))
{
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
swGood.WriteLine(line);
}
else
{
badRow++;
swBad.WriteLine(line);
}
}
}
msg += $"Good Rows: {goodRow,9} Bad Rows: {badRow,9} Done.";
But I'd also look at using a real csv parser for this. There are plenty on NuGet. That might even let you clean up the data on the fly.
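As a minimal sketch of that idea, here's TextFieldParser from Microsoft.VisualBasic.FileIO (a ready-made parser that ships with .NET; CsvHelper on NuGet is another option). It understands quoted fields, so a free-text column with an embedded CRLF comes back inside a single record instead of breaking the row:
using Microsoft.VisualBasic.FileIO; // add a reference to Microsoft.VisualBasic

using (var parser = new TextFieldParser(fileName))
using (var swGood = new StreamWriter(goodRowFileName))
using (var swBad = new StreamWriter(badRowFileName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // CRLF inside quotes stays within one record
    while (!parser.EndOfData)
    {
        try
        {
            string[] fields = parser.ReadFields();
            // note: string.Join does not re-quote fields that contain commas
            swGood.WriteLine(string.Join(",", fields));
        }
        catch (MalformedLineException)
        {
            swBad.WriteLine(parser.ErrorLine); // the raw line that failed to parse
        }
    }
}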
I would not suggest reading the entire file into memory, then processing the file, then writing all modified contents out to the new file.
Instead, use file streams:
using (var rdr = new StreamReader(fileName))
using (var wrtrGood = new StreamWriter(goodRowFileName))
using (var wrtrBad = new StreamWriter(badRowFileName))
{
string line = null;
while ((line = rdr.ReadLine()) != null)
{
if (line.Contains(charGood))
{
goodRow++;
wrtrGood.WriteLine(line);
}
else
{
badRow++;
wrtrBad.WriteLine(line);
}
}
}

Using a CSV parser, I am trying to select multiple items from the CSV. It won't let me choose more than two however

Like I said, I'm using a CSV parser, which can be found here: The CSV Parser.
I am able to successfully run the code using this:
using (var stream = File.OpenRead(fileSaveLocation))
using (var reader = new StreamReader(stream))
{
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
var header2 = data.Item1;
var lines = data.Item2;
foreach (var line in lines.Take(5))
{
for (var i = 0; i < header2.Count; i++)
if (!string.IsNullOrEmpty(line[i]))
{
sb.Append(header2[i] + "=" + line[i]);
sb.Append(Environment.NewLine);
}
}
}
But I want to be able to select about 10 items. So if I try to add a new variable like:
var test = data.Item3;
It won't work.
When I do try to run it, it tells me this:
Error 1 'System.Tuple<...,System.Collections.Generic.IEnumerable<...>>'
does not contain a definition for 'Item3' and no extension method
'Item3' accepting a first argument of type
'System.Tuple<...,System.Collections.Generic.IEnumerable<...>>'
could be found (are you missing a using directive or an assembly
reference?) C:\repo\Scriptalizer\default.aspx.cs 82 37 Scriptalizer(1)
It will throw an error before I ever try to run the program. It says it cannot resolve Item3. How can I get it to let me put as many columns as I want?
Also, is there a way to dynamically select items? Say the user can input "ignore the first 3 lines" for example, how could I declare these and get the correct columns?
Haven't used this CSV parsing library in particular, but ParseHeadAndTail returns a two-item tuple (the header plus the sequence of rows), so there is no Item3 to resolve; you select more rows and columns by filtering Item2, not by adding tuple items. It sounds like you need the following:
using (var stream = File.OpenRead(fileSaveLocation))
{
using (var reader = new StreamReader(stream))
{
// Get the header and rows as a two-item tuple
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
// Get header and and rows into separate variables
var header2 = data.Item1;
var lines = data.Item2;
// This is where you get the rows you want
// So in this example, we'll skip the first
// 3 lines and then get the next 10 lines
var filteredLines = lines.Skip(3).Take(10);
// Iterate through the lines and do whatever you need to do
foreach (var line in filteredLines)
{
for (var i = 0; i < header2.Count; i++)
{
if (!string.IsNullOrEmpty(line[i]))
{
sb.Append(header2[i] + "=" + line[i]);
sb.Append(Environment.NewLine);
}
}
}
}
}
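For the dynamic part of your question, the skip/take counts don't have to be literals; a sketch (userSkipInput and userTakeInput are hypothetical strings taken from user input):
int skipCount = int.Parse(userSkipInput); // e.g. "3" from "ignore the first 3 lines"
int takeCount = int.Parse(userTakeInput); // e.g. "10"
var filteredLines = lines.Skip(skipCount).Take(takeCount);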
This is what I ended up doing, which worked.
using (var stream = File.OpenRead(fileSaveLocation))
using (var reader = new StreamReader(stream))
{
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
var header2 = data.Item1;
var lines = data.Item2;
for (int i = 4; i < header2.Count; i++)
{
sb.Append("\"" + header2[i] + "\"");
if (i != header2.Count - 1) sb.Append(",");
}
sb.Append(Environment.NewLine);
foreach (var line in lines)
{
for (var i = 4; i < header2.Count; i++)
{
sb.Append("\"" + line[i] + "\"");
if (i != header2.Count - 1)
{
sb.Append(",");
}
}
sb.Append(Environment.NewLine);
}
}

Change the name of headers in CSV file using CSVHelper in C#

I am using the CSV Helper library to produce CSV files for the user to populate and upload into the system. My issue is that the WriteHeader method just writes the attributes of a class with names like "PropertyValue", which is not user friendly. Is there a method I can use to make the text produced user friendly but still able to successfully map the class to the file's data?
My code looks like the following:
public ActionResult UploadPropertyCSV(HttpPostedFileBase file)
{
List<PropertyModel> properties = new List<PropertyModel>();
RIMEDb dbContext = new RIMEDb();
bool success = false;
foreach (string requestFiles in Request.Files)
{
if (file != null && file.ContentLength > 0 && file.FileName.EndsWith(".csv"))
{
using(StreamReader str = new StreamReader(file.InputStream))
{
using(CsvHelper.CsvReader theReader = new CsvHelper.CsvReader(str))
{
while (theReader.Read())
{
RIMUtil.PropertyUploadCSVRowHelper row = new RIMUtil.PropertyUploadCSVRowHelper()
{
UnitNumber = theReader.GetField(0),
StreetNumber = theReader.GetField(1),
StreetName = theReader.GetField(2),
AlternateAddress = theReader.GetField(3),
City = theReader.GetField(4)
};
Property property = new Property();
property.UnitNumber = row.UnitNumber;
property.StreetNumber = row.StreetNumber;
property.StreetName = row.StreetName;
property.AlternateAddress = row.AlternateAddress;
property.City = dbContext.PostalCodes.Where(p => p.PostalCode1 == row.PostalCode).FirstOrDefault().City;
dbContext.Properties.Add(property);
try
{
dbContext.SaveChanges();
success = true;
}
catch(System.Data.Entity.Validation.DbEntityValidationException ex)
{
success = false;
RIMUtil.LogError("Ptoblem validating fields in database. Please check your CSV file for errors.");
}
catch(Exception e)
{
RIMUtil.LogError("Error saving property to database. Please check your CSV file for errors.");
}
}
}
}
}
}
return Json(success);
}
I'm wondering if there's some metadata tag or something I can put on top of each attribute in my PropertyUploadCSVRowHelper class to put the text I want produced in the file.
Thanks in advance
Not sure if this existed 2 years ago, but now we can change the property/column name by using the following attribute:
[CsvHelper.Configuration.Attributes.Name("Column/Field Name")]
Full code:
using CsvHelper;
using System.Collections.Generic;
using System.IO;
namespace Test
{
class Program
{
class CsvColumns
{
private string column_01;
[CsvHelper.Configuration.Attributes.Name("Column 01")] // changes header/column name Column_01 to Column 01
public string Column_01 { get => column_01; set => column_01 = value; }
}
static void Main(string[] args)
{
List<CsvColumns> csvOutput = new List<CsvColumns>();
CsvColumns rows = new CsvColumns();
rows.Column_01 = "data1";
csvOutput.Add(rows);
string filename = "test.csv";
using (StreamWriter writer = File.CreateText(filename))
using (CsvWriter csv = new CsvWriter(writer)) // disposing the writer flushes the records
{
csv.WriteRecords(csvOutput);
}
}
}
}
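If your CsvHelper version predates the Name attribute, the same renaming can be done with a class map; here's a sketch (the friendly header strings are made-up examples, and very old versions call the base class CsvClassMap instead of ClassMap):
// Sketch: map model properties to friendly CSV header names.
public sealed class PropertyRowMap : CsvHelper.Configuration.ClassMap<RIMUtil.PropertyUploadCSVRowHelper>
{
    public PropertyRowMap()
    {
        Map(m => m.UnitNumber).Name("Unit Number");     // hypothetical header text
        Map(m => m.StreetNumber).Name("Street Number"); // hypothetical header text
    }
}

// Then register it before reading or writing:
// theReader.Configuration.RegisterClassMap<PropertyRowMap>();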
This might not be answering your question directly, as you said you wanted to use CsvHelper, but if you're only writing small files, this is a simple function that I use to generate CSV. Note, CsvHelper will be much better for larger files, as this is just building a string and not streaming the data.
Just customise the columns array in the code below to suit your needs.
public string GetCsv(string[] columns, List<object[]> data)
{
StringBuilder CsvData = new StringBuilder();
//add column headers
string[] s = new string[columns.Length];
for (Int32 j = 0; j < columns.Length; j++)
{
s[j] = columns[j];
if (s[j].Contains("\"")) //replace " with ""
s[j].Replace("\"", "\"\"");
if (s[j].Contains("\"") || s[j].Contains(" ")) //add "'s around any string with space or "
s[j] = "\"" + s[j] + "\"";
}
CsvData.AppendLine(string.Join(",", s));
//add rows
foreach (var row in data)
{
for (int j = 0; j < columns.Length; j++)
{
s[j] = row[j] == null ? "" : row[j].ToString();
if (s[j].Contains("\"")) //replace " with ""
s[j].Replace("\"", "\"\"");
if (s[j].Contains("\"") || s[j].Contains(" ")) //add "'s around any string with space or "
s[j] = "\"" + s[j] + "\"";
}
CsvData.AppendLine(string.Join(",", s));
}
return CsvData.ToString();
}
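A quick usage sketch (the column names and data are made up):
// Produces:
// Name,City
// "Jane ""JJ"" Doe",Oslo
string csv = GetCsv(
    new[] { "Name", "City" },
    new List<object[]> { new object[] { "Jane \"JJ\" Doe", "Oslo" } });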
Here is a fiddle example of how to use it: https://dotnetfiddle.net/2WHf6o
Good luck.

Copying CSV file while reordering/adding empty columns

For example, every line of the incoming file has values for 3 out of 10 columns, in an order different from the output (except the first line, which is a header with the column names):
col2,col6,col4 // first line - column names
2, 5, 8 // subsequent lines - values for 3 columns
and the output is expected to have
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
so the output should be "" for col0,col1,col3,col5,col7,col8,col9, and the values from col2,col6,col4 taken from the input file. For the shown second line (2,5,8) the expected output is ",,2,,8,,5,,,".
Below is the code I've tried; it is slower than I want.
I have two lists.
The first list, filecolumnnames, is created by splitting a delimited string (line), and this list gets recreated for every line in the file.
The second list, list, has the order in which the first list needs to be rearranged and re-concatenated.
This works
string fileName = "F:\\temp.csv";
//file data has first row col3,col2,col1,col0;
//second row: 4,3,2,1
//so on
string fileName_recreated = "F:\\temp_1.csv";
int count = 0;
const Int32 BufferSize = 1028;
using (var fileStream = File.OpenRead(fileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
String line;
List<int> list = new List<int>();
string orderedcolumns = "\"\"";
string tableheader = "col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10";
List<string> tablecolumnnames = new List<string>();
List<string> filecolumnnames = new List<string>();
while ((line = streamReader.ReadLine()) != null)
{
count = count + 1;
StringBuilder sb = new StringBuilder("");
tablecolumnnames = tableheader.Split(',').ToList();
if (count == 1)
{
string fileheader = line;
//fileheader=""col2,col1,col0"
filecolumnnames = fileheader.Split(',').ToList();
foreach (string col in tablecolumnnames)
{
int index = filecolumnnames.IndexOf(col);
if (index == -1)
{
sb.Append(",");
// orderedcolumns=orderedcolumns+"+\",\"";
list.Add(-1);
}
else
{
sb.Append(filecolumnnames[index] + ",");
//orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\"";
list.Add(index);
}
// MessageBox.Show(orderedcolumns);
}
}
else
{
filecolumnnames = line.Split(',').ToList();
foreach (int items in list)
{
//MessageBox.Show(items.ToString());
if (items == -1)
{
sb.Append(",");
}
else
{
sb.Append(filecolumnnames[items] + ",");
}
}
//expected format sb.Append(filecolumnnames[3] + "," + filecolumnnames[2] + "," + filecolumnnames[2] + ",");
//sb.Append(orderedcolumns);
var result = String.Join (", ", list.Select(index => filecolumnnames[index]));
}
using (FileStream fs = new FileStream(fileName_recreated, FileMode.Append, FileAccess.Write))
using (StreamWriter sw = new StreamWriter(fs))
{
sw.WriteLine(sb.ToString());
}
}
}
I am trying to make it faster by constructing a string, orderedcolumns, and removing the second foreach loop (which runs for every row), replacing it with the constructed string.
So if you uncomment the orderedcolumns string construction orderedcolumns = orderedcolumns + "+filecolumnnames["+index+"]" + "+\",\""; and uncomment the append sb.Append(orderedcolumns); I expect the values inside the constructed string, but when I append orderedcolumns it appends the text itself, i.e.
""+","+filecolumnnames[3]+","+filecolumnnames[2]+","+filecolumnnames[1]+","+filecolumnnames[0]+","+","+","+","+","+","+","
i.e. I instead want it to take the value inside filecolumnnames[3] and not the filecolumnnames[3] expression itself.
Expected value: if that line has 1,2,3,4, I want the output to be 4,3,2,1, as filecolumnnames[3] will have 4, filecolumnnames[2] will have 3, and so on.
String.Join is the way to construct comma/space delimited strings from a sequence:
var result = String.Join(", ", list.Select(index => filecolumnnames[index]));
Since you are reading only a subset of the columns, and the orders in input and output don't match, I'd use a dictionary to hold each row of input:
var row = filecolumnnames
.Zip(line.Split(','), (Name, Value) => new { Name, Value })
.ToDictionary(x => x.Name, x => x.Value);
For output I'd fill the sequence from defaults or the input row:
var outputLine = String.Join(",",
tablecolumnnames
.Select(name => row.ContainsKey(name) ? row[name] : ""));
Note code is typed in and not compiled.
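Put together, a compiling sketch of the whole copy based on that approach (the file paths and the fixed output header are taken from the question; needs using System.IO and using System.Linq):
var tablecolumnnames = "col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10".Split(',');
using (var reader = new StreamReader(@"F:\temp.csv"))
using (var writer = new StreamWriter(@"F:\temp_1.csv"))
{
    // file header, e.g. col3,col2,col1,col0
    string[] filecolumnnames = reader.ReadLine().Split(',');
    writer.WriteLine(string.Join(",", tablecolumnnames));
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        var row = filecolumnnames
            .Zip(line.Split(','), (Name, Value) => new { Name, Value })
            .ToDictionary(x => x.Name, x => x.Value);
        writer.WriteLine(string.Join(",",
            tablecolumnnames.Select(name => row.ContainsKey(name) ? row[name] : "")));
    }
}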
orderedcolumns = orderedcolumns + "+filecolumnnames["+index+"]" + "+\",\"";
should be
orderedcolumns = orderedcolumns + filecolumnnames[index] + ",";
You should, however, use Join as others have pointed out. Or, if orderedcolumns were a StringBuilder:
orderedcolumns.AppendFormat("{0},", filecolumnnames[index]);
You will have to deal with the extra ',' on the end.

How to create a generic text file parser for any kind of text file?

I want to create a generic text file parser in C# for any kind of text file. I actually have 4 applications; all 4 get their input data in txt format, but the text files are not homogeneous in nature. I have tried fixed-width delimiting:
private static DataTable FixedWidthDiliminatedTxtRead()
{
string[] fields;
StringBuilder sb = new StringBuilder();
List<StringBuilder> lst = new List<StringBuilder>();
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(testOCC))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(new int[12] { 2,25,8,12,13,5,6,3,10,11,10,24 });
for (int col = 1; col < 13; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count)
dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
but I feel it's very rigid, and this line really varies application to application, so I need to make it configurable. Any better way?
tfp.SetFieldWidths(new int[12] { 2,25,8,12,13,5,6,3,10,11,10,24 });
File nature:
It's a report kind of file.
The positions of the columns are very similar.
The row data of each file is different.
I found this as a reference:
http://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F
Any other thoughts?
If the only thing different is the field widths, you could just try sending the field widths in as a parameter:
private static DataTable FixedWidthDiliminatedTxtRead(int[] fieldWidthArray)
{
string[] fields;
StringBuilder sb = new StringBuilder();
List<StringBuilder> lst = new List<StringBuilder>();
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(testOCC))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(fieldWidthArray);
for (int col = 1; col < 13; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count)
dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
If you will have more logic to grab the data, you might want to consider defining an interface or abstract class for a GenericTextParser and create concrete implementations for each other file.
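A minimal sketch of that abstraction (all of these names are invented for illustration):
// Hypothetical interface: each file format gets its own implementation.
public interface IGenericTextParser
{
    DataTable Parse(string fileName);
}

public class FixedWidthParser : IGenericTextParser
{
    private readonly int[] fieldWidths;
    public FixedWidthParser(int[] fieldWidths) { this.fieldWidths = fieldWidths; }

    public DataTable Parse(string fileName)
    {
        // delegate to a fixed-width reader like the one above
        // (a real version would pass fileName through as well)
        return FixedWidthDiliminatedTxtRead(fieldWidths);
    }
}

public static class TextParserFactory
{
    // the widths (or delimiters) per application would come from configuration
    public static IGenericTextParser Create(string fileFormat)
    {
        switch (fileFormat)
        {
            case "occ-report":
                return new FixedWidthParser(new[] { 2, 25, 8, 12, 13, 5, 6, 3, 10, 11, 10, 24 });
            default:
                throw new NotSupportedException(fileFormat);
        }
    }
}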
Hey, I made one of these last week.
I did not write it with the intention of other people using it, so I apologize in advance if it's not documented well, but I cleaned it up for you. ALSO, I grabbed several segments of code from Stack Overflow, so I am not the original author of several pieces of this.
The places you need to edit are the path and pathOut, and the separators of text:
char[] delimiters = new char[]
It searches for part of a word and then grabs the whole word. I used a C# console application for this.
Here you go:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace UniqueListofStringFinder
{
class Program
{
static void Main(string[] args)
{
string path = @"c:\Your Path\in.txt";
string pathOut = @"c:\Your Path\out.txt";
string data = "!";
Console.WriteLine("Current Path In is set to: " + path);
Console.WriteLine("Current Path Out is set to: " + pathOut);
Console.WriteLine(Environment.NewLine + Environment.NewLine + "Input String to Search For:");
string input = Console.ReadLine();
// Create the file if it does not exist.
if (!File.Exists(path))
{
// Create the file.
using (FileStream fs = File.Create(path))
{
Byte[] info =
new UTF8Encoding(true).GetBytes("This is some text in the file.");
// Add some information to the file.
fs.Write(info, 0, info.Length);
}
}
List<string> Spec = new List<string>();
using (StreamReader file = new StreamReader(path))
{
while (!file.EndOfStream)
{
string s = file.ReadLine();
if (s.Contains(input))
{
char[] delimiters = new char[] { '\r', '\n', '\t', ')', '(', ',', '=', '"', '\'', '<', '>', '$', ' ', '#', '[', ']' };
string[] parts = s.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
foreach (string word in parts)
{
if (word.Contains(input))
{
if( word.IndexOf(input) == 0)
{
Spec.Add(word);
}
}
}
}
}
Spec.Sort();
}
Console.WriteLine();
StringBuilder builder = new StringBuilder();
foreach (string s in Spec) // Loop through all strings
{
builder.Append(s).Append(Environment.NewLine); // Append string to StringBuilder
}
string result = builder.ToString(); // Get string from StringBuilder
Program a = new Program();
data = a.uniqueness(result);
int i = a.writeFile(data,pathOut);
}
public string uniqueness(string rawData )
{
if (rawData == "")
{
return "Empty Data Set";
}
List<string> dataVar = new List<string>();
List<string> holdData = new List<string>();
bool testBool = false;
using (StringReader reader = new StringReader(rawData))
{
string line;
while ((line = reader.ReadLine()) != null)
{
foreach (string s in holdData)
{
if (line == s)
{
testBool = true;
}
}
if (testBool == false)
{
holdData.Add(line);
}
testBool = false;
// Do something with the line
}
}
int i = 0;
string dataOut = "";
foreach (string s in holdData)
{
dataOut += s + "\r\n";
i++;
}
// Write the string to a file.
return dataOut;
}
public int writeFile(string dataOut, string pathOut)
{
try
{
System.IO.StreamWriter file = new System.IO.StreamWriter(pathOut);
file.WriteLine(dataOut);
file.Close();
}
catch (Exception ex)
{
dataOut += ex.ToString();
return 1;
}
return 0;
}
}
}
private static DataTable FixedWidthTxtRead(string filename, int[] fieldWidths)
{
string[] fields;
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(filename))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(fieldWidths);
for (int col = 1; col <= fieldWidths.Length; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count) dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
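A usage sketch (the widths are the ones from the question; the path is hypothetical):
DataTable table = FixedWidthTxtRead(
    @"C:\reports\occ.txt", // hypothetical path
    new[] { 2, 25, 8, 12, 13, 5, 6, 3, 10, 11, 10, 24 });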
Here's what I did:
I built a factory for the type of processor needed (based on file type/format), which abstracted the file reader.
I then built a collection object that contained a set of triggers for each field I was interested in (also contained the property name for which this field is destined). This settings collection is loaded in via an XML configuration file, so all I need to change are the settings, and the base parsing process can react to how the settings are configured. Finally I built a reflection wrapper wherein once a field is parsed, the corresponding property on the model object is set.
As the file flowed through, the triggers for each setting evaluated each line's value. When a trigger found what it was set to find (via pattern matching, or column length values), it fired an event that bubbled up and set a property on the model object. I can show some pseudo code if you're interested. It needs some work for efficiency's sake, but I like the concept.
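Such pseudo code might look roughly like this (every type and name here is invented to illustrate the concept; real triggers would come from the XML settings file):
// Invented sketch of the trigger/settings-driven parse loop.
public class FieldSetting
{
    public string PropertyName { get; set; } // destination property on the model
    public Regex Trigger { get; set; }       // pattern identifying the field's line;
                                             // assumed to define a named group "value"
}

public T Parse<T>(string fileName, IEnumerable<FieldSetting> settings) where T : new()
{
    var model = new T();
    foreach (string line in File.ReadLines(fileName))
    {
        foreach (var setting in settings)
        {
            var match = setting.Trigger.Match(line);
            if (!match.Success) continue;
            // the reflection wrapper: set the matched value onto the model
            typeof(T).GetProperty(setting.PropertyName)
                     .SetValue(model, match.Groups["value"].Value);
        }
    }
    return model;
}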
