How to make a custom parser for a text file (C#)

I have set up four columns in a DataTable, and I want these columns to be filled with values from a text file. I used a regex to remove a particular line from the text file.
My objective is to show the text file in a grid using the DataTable, so I am first trying to create the DataTable and remove the line (shown in the program) using a regex.
Here is my full code.
namespace class
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
StreamReader sreader = File.OpenText(@"C:\FareSearchRegex.txt");
string line;
DataTable dt = new DataTable();
DataRow dr;
dt.Columns.Add("PTC");
dt.Columns.Add("CUR");
dt.Columns.Add("TAX");
dt.Columns.Add("FARE BASIS");
while ((line = sreader.ReadLine()) != null)
{
var pattern = "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------";
var result = Regex.Replace(line,pattern," ");
dt.Rows.Add(line);
}
}
}
class Class1
{
string PTC;
string CUR;
float TAX;
public string gsPTC
{
get{ return PTC; }
set{ PTC = value; }
}
public string gsCUR
{
get{ return CUR; }
set{ CUR = value; }
}
public float gsTAX
{
get{ return TAX; }
set{ TAX = value; }
}
}
}

If your format is strict (e.g. always 4 columns) and you want to remove only this complete line, I don't see any reason to use regex:
var rows = File.ReadLines(@"C:\FareSearchRegex.txt")
.Where(l => l != "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------")
.Select(l => new { line = l, items = l.Split(','), row = dt.Rows.Add() });
foreach (var x in rows)
x.row.ItemArray = x.items;
(This assumes that the fields are separated by commas.)
Edit: This works with your pastebin:
string header = " PTC CUR TAX FARE BASIS";
bool takeNextLine = false;
foreach (string line in File.ReadLines(@"C:\FareSearchRegex.txt"))
{
if (line.StartsWith(header))
takeNextLine = true;
else if (takeNextLine)
{
var tokens = line.Split(new[] { @" " }, StringSplitOptions.RemoveEmptyEntries);
dt.Rows.Add().ItemArray = tokens.Where((t, i) => i != 2).ToArray();
takeNextLine = false;
}
}
(Since you have an empty column that you want to exclude from the result, I've used the clumsy and possibly error-prone query Where((t, i) => i != 2).)

To parse the file you'll need to:
Split the text of the file into data chunks. A chunk, in your case, can be identified by the header PTC CUR TAX FARE BASIS and by the TOTAL line. To split the text you'll need to tokenize the input as follows: (i) define a regular expression to match the headers; (ii) define a regular expression to match the TOTAL lines (footers). Using (i) and (ii), you can join the matches by their order of appearance and determine the extent of each chunk (see the line with (x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length } in the code). Then use the String.Substring method to separate the chunks.
Extract the data from each individual chunk. Knowing that data is split by lines you just have to iterate through all lines in a chunk (ignoring header and footer) and process each line.
This code should help:
string file = @"C:\FareSearchRegex.txt";
string text = File.ReadAllText(file);
var headerRegex = new Regex(@"^(\)>)?\s+PTC\s+CUR\s+TAX\s+FARE BASIS$", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var totalRegex = new Regex(@"^\s+TOTAL[\w\s.]+?$",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var lineRegex = new Regex(@"^(?<Num>\d+)?\s+(?<PTC>[A-Z]+)\s+\d+\s(?<Cur>[A-Z]{3})\s+[\d.]+\s+(?<Tax>[\d.]+)",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var dataIndices =
headerRegex.Matches(text).Cast<Match>()
.Select((m, index) => new{ Index = index, Match = m })
.Join(totalRegex.Matches(text).Cast<Match>().Select((m, index) => new{ Index = index, Match = m }),
x => x.Index,
x => x.Index,
(x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length });
var items = dataIndices
.Aggregate(new List<string>(), (list, x) =>
{
var item = text.Substring(x.StartIndex, x.EndIndex - x.StartIndex);
list.Add(item);
return list;
});
var result = items.SelectMany(x =>
{
var lines = x.Split(new string[]{Environment.NewLine, "\r", "\n"}, StringSplitOptions.RemoveEmptyEntries);
return lines.Skip(1) //Skip header
.Take(lines.Length - 2) // Ignore footer
.Select(line =>
{
var match = lineRegex.Match(line);
return new
{
Ptc = match.Groups["PTC"].Value,
Cur = match.Groups["Cur"].Value,
Tax = Convert.ToDouble(match.Groups["Tax"].Value)
};
});
});

Related

How to Split and Sum Members of a String Value

I have a database column that is a text field, and this text field contains values that look like
I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109
and can vary sometimes to look like:
I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300
Where 'I' represents the invoice Id, 'A' the invoice amount, 'D' the date in YYYYMMDD format and 'F' the original foreign currency value if the invoice was from a foreign supplier.
I am fetching that column and binding it to a datagrid which has a button labelled "Show Amount". On button click, it fetches the selected row and splits the string to extract "A"
I need to fetch all the sections with A= within the column result... i.e
A=97920
A=77360
A=43975
Then sum them all together and display the result on a label.
I have tried splitting using '|' first, extracting the substring 'A=' then splitting it using ';' to get the amount after "=".
string cAlloc;
string[] amount;
string InvoiceTotal;
string SupplierAmount;
string BalanceUnpaid;
DataRowView dv = invoicesDataGrid.SelectedItem as DataRowView;
if (dv != null)
{
cAlloc = dv.Row.ItemArray[7].ToString();
InvoiceTotal = dv.Row.ItemArray[6].ToString();
if (invoicesDataGrid.Columns[3].ToString() == "0")
{
lblAmount.Foreground = Brushes.Red;
lblAmount.Content = "No Amount Has Been Paid Out to the Supplier";
}
else
{
amount = cAlloc.Split('|');
foreach (string i in amount)
{
string toBeSearched = "A=";
string code = i.Substring(i.IndexOf(toBeSearched) + toBeSearched.Length);
string[] res = code.Split(';');
SupplierAmount = res[0];
float InvTotIncl = float.Parse(InvoiceTotal, CultureInfo.InvariantCulture.NumberFormat);
float AmountPaid = float.Parse(SupplierAmount, CultureInfo.InvariantCulture.NumberFormat);
float BalUnpaid = InvTotIncl - AmountPaid;
BalanceUnpaid = Convert.ToString(BalUnpaid);
if (BalUnpaid == 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " No Balance Remaining, Supplier Invoice Paid in Full";
}
else if (BalUnpaid < 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " Supplier Paid an Excess of " + BalanceUnpaid;
}
else
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " You Still Owe the Supplier a Total of " + BalanceUnpaid; ;
}
}
}
But I am only able to extract A=43975, the very last "A=", instead of all three. I also have not figured out how to sum the strings. Somebody help... please.

Regex is the preferred solution. Alternatively, split, split, and split:
var cAlloc = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var amount = cAlloc.Split('|');
decimal sum = 0;
foreach (string i in amount)
{
foreach (var t in i.Split(';'))
{
var p = t.Split('=');
if (p[0] == "A")
{
var s = decimal.Parse(p[1], CultureInfo.InvariantCulture);
sum += s;
break;
}
}
}

You're doing too much work for something like this. Here's a simpler solution using Regex:
var in1 = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var in2 = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var reg = @"A=(\d+(\.\d+)?)";
// Parse with the invariant culture so that '.' is always read as the decimal separator.
var sum1 = Regex.Matches(in1, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
var sum2 = Regex.Matches(in2, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));

If the invoice amount is always located as a second value in the set you can access it directly by index after split:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts[1].Trim().Substring(2), CultureInfo.InvariantCulture);
totalSum += invoiceAmount;
}
Otherwise, you can use a little more "flexible" solution like this:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts.Select(ip => ip.Trim()).First(ip => ip.StartsWith("a=", StringComparison.OrdinalIgnoreCase)).Substring(2), CultureInfo.InvariantCulture);
totalSum += invoiceAmount;
}
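The split-based answers above can be condensed into a single helper. The following is only a minimal sketch: the class name AllocationParser and the method name SumAmounts are hypothetical, and it assumes every amount uses '.' as the decimal separator, as in the sample strings from the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class AllocationParser
{
    // Sums every "A=<amount>" field in a string such as
    // "I=1;A=10.5;D=20190101|I=2;A=4.5;D=20190102".
    public static decimal SumAmounts(string allocations)
    {
        if (string.IsNullOrWhiteSpace(allocations))
            return 0m;

        return allocations
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries) // one record per invoice
            .SelectMany(record => record.Split(';'))                     // key=value fields
            .Select(field => field.Trim())
            .Where(field => field.StartsWith("A=", StringComparison.Ordinal))
            // Invariant culture: '.' is the decimal separator regardless of the machine's locale.
            .Sum(field => decimal.Parse(field.Substring(2), CultureInfo.InvariantCulture));
    }
}
```

In the button handler from the question, the result could be compared against the invoice total once, after the loop, instead of overwriting the label on every iteration. Using decimal rather than float avoids rounding surprises with currency values.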

Import the input: "Deserialisation"
Given the following input, we have a list of objects with properties named I, A, and D:
var input = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
Given this simple class:
public class inputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
}
Parsing it will look like:
var inputItems =
input.Split('|')
.Select(
x =>
x.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
)
)
.Select(
x => // Manual mapping from the dictionary to inputClass.
// If the dictionary keys match the object's properties, we could use something more generic.
new inputClass
{
I = decimal.Parse(x["I"], CultureInfo.InvariantCulture.NumberFormat),
A = decimal.Parse(x["A"], CultureInfo.InvariantCulture.NumberFormat),
D = decimal.Parse(x["D"], CultureInfo.InvariantCulture.NumberFormat),
}
)
.ToList();
Does it look complex? Let's give inputClass the responsibility of initialising itself from a string of the form PropertyName=Value[;PropertyName=Value]:
public inputClass(string input, NumberFormatInfo numberFormat)
{
var dict = input
.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
);
I = decimal.Parse(dict["I"], numberFormat);
A = decimal.Parse(dict["A"], numberFormat);
D = decimal.Parse(dict["D"], numberFormat);
}
Then the parsing is simple:
var inputItems = input.Split('|').Select(x => new inputClass(x, CultureInfo.InvariantCulture.NumberFormat));
Once we have a more usable structure, a list of objects, we can easily compute Sum, Avg, Max, Min:
var sumA = inputItems.Sum(x => x.A);
Producing the output: "Serialisation"
To produce the output, we define a class similar to the input class. The class should be able to produce the string PropertyName=Value[;PropertyName=Value], so it overrides ToString():
public class outputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
public decimal F { get; set; }
public override string ToString()
{
return $"I={I};A={A};D={D};F={F}";
}
}
Then we compute the output list from the input list and produce the "serialisation" string:
// Process the input into the output.
var outputItems = new List<outputClass>();
foreach (var item in inputItems)
{
// compute things to be able to create the next output item
item.A++;
outputItems.Add(
new outputClass { A = item.A, D = item.D, I = item.I, F = 42 }
);
}
// "Serialisation"
var outputString = String.Join("|", outputItems);
Online Demo. https://dotnetfiddle.net/VcEQmf
Long story short:
Define a class with the properties you will use or display.
Add a constructor that takes a string like "I=5212;A=97920;D=20181121".
NB: the string may contain properties that are not mapped to the object.
Override ToString() so the object can easily produce its own serialisation.
NB: properties and values that are not stored in the object will not appear in the serialisation result.
Now you simply have to split on your object separator "|", and you are ready to work with real objects instead of that awkward string.
PS:
There was a small misunderstanding about your two types of input: I mentally treated them as input and output. Don't mind those names. It can be the same class; that doesn't change anything in this answer.

Difficulty using Orderby in Foreach loop C#

I have been trying to order my files by the substring at the end of their names, which is a number that indicates their position relative to the rest of the files (example: fs-1632_1.txt --> fs-1632_2.txt).
I am currently able to get the numbers and turn them into ints; I just have problems getting the OrderBy method to work correctly. I am mostly working off of this example of OrderBy.
internal class Data
{
public string Name { get; set; }
public double Number { get; set; }
}
private void OrderByEx1(List<FileInfo> files)
{
int num = 0;
int index_num = 0;
string file_num = "";
string file_name = "";
foreach (FileInfo file in files)
{
file_name = file.FullName;
file_name = Path.GetFileNameWithoutExtension(file_name);
index_num = file_name.LastIndexOf("_") + 1;
file_num = file_name.Substring(index_num);
num = Int32.Parse(file_num);
Data[] set = {new Data {Name = file_name, Number = num }};
}
IEnumerable<Data> query = set.OrderBy(data => data.Number);
foreach (Data file_s in query)
MessageBox.Show($"{file_s.Name} {file_s.Number}");
}

No need for the foreach-loop. You could use this safe LINQ approach:
files = files
.Select(f => new { File = f, Name = Path.GetFileNameWithoutExtension(f.Name) })
.Select(x => new
{
x.File,
x.Name,
Token = x.Name.Substring(x.Name.LastIndexOf("_", StringComparison.Ordinal) + 1)
})
.Select(x => new
{
x.File,
x.Name,
x.Token,
IsInt = int.TryParse(x.Token, out int number),
ParsedNumber = number
})
.OrderByDescending(x => x.IsInt)
.ThenBy(x => x.ParsedNumber)
.Select(x => x.File)
.ToList();
If there is no number or it can't be parsed to int the file will be listed at the bottom.

You are declaring a Data array named set, adding a single element to it, and then restarting the loop, forgetting what you loaded in the previous iteration. The ordering is executed only after you exit the loop, but at that point the set array contains a single element, the last one.
You need to add your Data objects to a list and then order that list:
List<Data> dataFiles = new List<Data>();
foreach (FileInfo file in files)
{
file_name = file.FullName;
file_name = Path.GetFileNameWithoutExtension(file_name);
index_num = file_name.LastIndexOf("_") + 1;
file_num = file_name.Substring(index_num);
num = Int32.Parse(file_num);
dataFiles.Add(new Data {Name = file_name, Number = num });
}
// If you don't need the query var you can just order directly in the foreach loop
// IEnumerable<Data> query = dataFiles.OrderBy(data => data.Number);
foreach (Data file_s in dataFiles.OrderBy(data => data.Number))
{
MessageBox.Show(file_s.Name + " " + file_s.Number);
}
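Both answers boil down to "extract the trailing number, then sort by it". As a rough, self-contained sketch of that idea (the class name FileOrdering and both method names are hypothetical, and it works on plain path strings rather than FileInfo objects so it can be tried without touching the disk):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileOrdering
{
    // Returns the integer after the last '_' in the file name (extension ignored),
    // or null when the name has no such number.
    public static int? TrailingNumber(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int underscore = name.LastIndexOf('_');
        if (underscore < 0)
            return null;

        int number;
        return int.TryParse(name.Substring(underscore + 1), out number) ? number : (int?)null;
    }

    // Orders paths by their trailing number; names without a number go last.
    public static List<string> OrderByTrailingNumber(IEnumerable<string> paths)
    {
        return paths
            .OrderBy(p => TrailingNumber(p).HasValue ? 0 : 1)
            .ThenBy(p => TrailingNumber(p) ?? 0)
            .ToList();
    }
}
```

Sorting on the parsed integer rather than on the name matters once the numbers reach two digits: as strings, "fs-1632_10" sorts before "fs-1632_2". With a List<FileInfo>, the same key function could be applied to each file's Name.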

Group multiple rows containing index and create list of custom objects for each index

I have a List of strings (read from a file) in this order and format, and I need to convert it into a List of class instances.
1.0.1.0.1, Type: DateTime, Value: 06/03/2013 11:06:10
1.0.1.0.2, Type: DateTime, Value: 06/03/2014 11:06:10
1.0.1.0.3, Type: DateTime, Value: 06/03/2015 11:06:10
1.0.1.0.4, Type: DateTime, Value: 06/03/2016 11:06:10
1.0.1.0.5, Type: DateTime, Value: 06/03/2017 11:06:10
1.0.1.1.1, Type: Integer, Value: 1
1.0.1.1.2, Type: Integer, Value: 2
1.0.1.1.3, Type: Integer, Value: 3
1.0.0.1.4, Type: Integer, Value: 4
1.0.1.1.5, Type: Integer, Value: 5
1.0.1.2.1, Type: String, Value: Hello
1.0.1.2.2, Type: String, Value: Hello1
1.0.1.2.3, Type: String, Value: Hello2
1.0.1.2.4, Type: String, Value: Hello3
1.0.1.2.5, Type: String, Value: Hello4
Here is my class
public class MyData
{
public DateTime DateTime {get;set;}
public int Index {get;set;}
public string Value {get;set;}
}
Now what I want is to convert it into a list of C# objects.
Something like this...
List<MyData> myDataList = new List<MyData>();
MyData data1 = new MyData();
data1.DateTime = "06/03/2013 11:06:10";
data1.Index = 1;
data1.Value = "Hello";
myDataList.Add(data1);
MyData data2 = new MyData();
data2.DateTime = "06/03/2014 11:06:10";
data2.Index = 2;
data2.Value = "Hello1";
myDataList.Add(data2);
and so on..
This is what I have tried so far.
List<List<string>> allLists = lines
.Select(str => new { str, token = str.Split('.') })
.Where(x => x.token.Length >= 4)
.GroupBy(x => string.Concat(x.token.Take(4)))
.Select(g => g.Select(x => x.str).ToList())
.ToList();
Do I really need to iterate, or can I modify my LINQ query to get the desired output?
Here is my iteration.
foreach (var list in allLists)
{
MyData data = new MyData();
var splittedstring = list[0].Split(',').ToList();
if (splittedstring.Count == 3)
{
var valueData = splittedstring [2];
var indexof = valueData.IndexOf(':');
var value = valueData.Substring(indexof + 1);
// But Over here, how will get DateTime and Index ?
data.Value = value;
}
}

First, fix your GroupBy: string.Concat(x.token.Take(4)) can produce ambiguous keys when the dot-separated numbers have different lengths. For example, 1.23.4.5 and 12.3.4.5 would both produce the string "12345". Use string.Join with a non-numeric separator instead:
.GroupBy(x => string.Join("|", x.token.Take(4)))
Now for the main part of your question: an easy fix would be to add a static method that parses the list of three strings, and use it in your LINQ query:
List<MyData> dataList = lines
.Select(str => new { str, token = str.Split('.') })
.Where(x => x.token.Length >= 4)
.GroupBy(x => string.Join("|", x.token.Take(4)))
.Select(g => g.Select(x => x.str).ToList())
.Where(list => list.Count == 3)
.Select(MyDataFromList)
.ToList();
...
private static MyData MyDataFromList(List<string> parts) {
if (parts.Count != 3) {
throw new ArgumentException(nameof(parts));
}
var byType = parts
.Select(ToTypeAndValue)
.ToDictionary(t => t.Item1, t => t.Item2);
return new MyData {
DateTime = DateTime.Parse(byType["DateTime"])
, Index = int.Parse(byType["Integer"])
, Value = byType["String"]
};
}
private static Tuple<string,string> ToTypeAndValue(string s) {
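// Split into at most three parts so that commas inside the value are preserved.
var tokens = s.Split(new[] { ',' }, 3);
if (tokens.Length != 3) return null;
// Split on the first ':' only, because DateTime values contain colons themselves.
var typeParts = tokens[1].Split(new[] { ':' }, 2);
if (typeParts.Length != 2 || typeParts[0].Trim() != "Type") return null;
var valueParts = tokens[2].Split(new[] { ':' }, 2);
if (valueParts.Length != 2 || valueParts[0].Trim() != "Value") return null;
return Tuple.Create(typeParts[1].Trim(), valueParts[1].Trim());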
}
Note that the above code makes an assumption that the three types are unique (hence the use of Dictionary<string,string>). This is required, because the structure of your data provides no other way to tie the values to fields of MyData.

You can do this using regular expressions. It would look like:
public List<MyData> GetData(string str){
var regexDate = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*DateTime.*Value:\s*(?<val>.*)");
var regexInteger = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*Integer.*Value:\s*(?<val>.*)");
var regexString = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*String.*Value:\s*(?<val>.*)");
var dict = new Dictionary<int, MyData>();
foreach (Match myMatch in regexDate.Matches(str))
{
if (!myMatch.Success) continue;
var index = int.Parse(myMatch.Groups["id"].Value);
dict[index] = new MyData()
{
Index = index,
DateTime = DateTime.ParseExact(myMatch.Groups["val"].Value, "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture)
};
}
foreach (Match myMatch in regexInteger.Matches(str))
{
if (!myMatch.Success) continue;
var index = int.Parse(myMatch.Groups["id"].Value);
dict[index].Index = Int32.Parse(myMatch.Groups["val"].Value);
}
foreach (Match myMatch in regexString.Matches(str))
{
if (!myMatch.Success) continue;
var index = int.Parse(myMatch.Groups["id"].Value);
dict[index].Value = myMatch.Groups["val"].Value;
}
return dict.Values.ToList();
}

Here is my solution to your problem. I have already tested it; you can test it here: Raw To Custom List
string text = rawData;
//Raw Data Is the exact data you read from textfile without modifications.
List<MyData> myDataList = new List<MyData>();
string[] eElco = text.Split( new[] { Environment.NewLine }, StringSplitOptions.None );
var tmem = eElco.Count();
var eachP = tmem / 3;
List<string> unDefVal = new List<string>();
foreach (string rw in eElco)
{
String onlyVal = rw.Split(new[] { "Value: " } , StringSplitOptions.None)[1];
unDefVal.Add(onlyVal);
}
for (int i = 0; i < eachP; i++)
{
int ind = Int32.Parse(unDefVal[i + eachP]);
DateTime oDate = DateTime.ParseExact(unDefVal[i], "dd/MM/yyyy HH:mm:ss",System.Globalization.CultureInfo.InvariantCulture);
MyData data1 = new MyData();
data1.DateTime = oDate;
data1.Index = ind;
data1.Value = unDefVal[i + eachP + eachP];
myDataList.Add(data1);
Console.WriteLine("Val1 = {0}, Val2 = {1}, Val3 = {2}",
myDataList[i].Index,
myDataList[i].DateTime,
myDataList[i].Value);
}

Here is my solution, using Regex. It could be improved by providing a conditional regex match based on the matched type named group (string), but I think the concept is clearer this way, and the regex easier to work with. As it stands, the dates are not validated; they are assumed to be in the format the OP wrote them.
This solution is tolerant to some extra spaces and parameters containing commas, but intolerant to inexact matches, i.e. extra fields added or removed in the rows in the future, etc.
The idea is to first parse the rows to a more "friendly" format, and then group the friendly format by index and return the MyData rows by iterating each group (by index).
Regex r = new Regex(@"^(?<fieldName>(\d\.)+(?<index>\d*)), *Type: *(?<dataType>.*), *Value: (?<dataValue>.*)$");
public class MyData
{
public DateTime DateTime { get; set; }
public int Index { get; set; }
public string Value { get; set; }
}
class LogRow
{
public int Index { get; set; }
public string Type { get; set; }
public string Value { get; set; }
}
//In a parser I would rather not be too defensive, I let exceptions bubble up
IEnumerable<LogRow> ParseRows(IEnumerable<string> lines)
{
foreach (var line in lines)
{
var match = r.Matches(line).AsQueryable().Cast<Match>().Single();
yield return new LogRow()
{
Index = int.Parse(match.Groups["index"].Value),
Type = match.Groups["dataType"].Value,
Value = match.Groups["dataValue"].Value
};
}
}
IEnumerable<MyData> RowsToData(IEnumerable<LogRow> rows)
{
var byIndex = rows.GroupBy(b => b.Index).OrderBy(b=> b.Key);
//assume that rows exist for all MyData fields for a given index
foreach (var group in byIndex)
{
var rawRow = group.ToDictionary(g => g.Type, g => g);
var date = DateTime.ParseExact(rawRow["DateTime"].Value, "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
yield return new MyData() { Index = group.Key, DateTime = date, Value = rawRow["String"].Value };
}
}
Usage:
var myDataList = RowsToData(ParseRows(File.ReadAllLines("input.txt"))).ToList();

I'd just go for the manual approach... and since that list of integers at the start contains indices for the objects and for the properties, it'd only be logical to use these instead of the type strings.
Using a Dictionary, you can use that object index to make a new object at the moment you find any of its properties, and store it using that index. Whenever you encounter another property for the same index, you retrieve the object and fill in that property on it.
public static List<MyData> getObj(String[] lines)
{
Dictionary<Int32, MyData> myDataDict = new Dictionary<Int32, MyData>();
const String valueStart = "Value: ";
foreach (String line in lines)
{
String[] split = line.Split(',');
// Too many fail cases; I just ignore any line that stops matching at any point.
if (split.Length < 3)
continue;
String[] numData = split[0].Trim().Split('.');
if (numData.Length < 5)
continue;
// Using the 4th number as property identifier. Could also use the
// type string, but switch/case on a numeric value is more elegant.
Int32 prop;
if (!Int32.TryParse(numData[3], out prop))
continue;
// Object index, used to reference the objects in the Dictionary.
Int32 index;
if (!Int32.TryParse(numData[4], out index))
continue;
String typeDef = split[1].Trim();
String val = split[2].TrimStart();
if (!val.StartsWith(valueStart))
continue;
val = val.Substring(valueStart.Length);
MyData data;
if (myDataDict.ContainsKey(index))
data = myDataDict[index];
else
{
data = new MyData();
myDataDict.Add(index, data);
}
switch (prop)
{
case 0:
if (!"Type: DateTime".Equals(typeDef))
continue;
DateTime dateVal;
// Don't know if this date format is correct; adapt as needed.
if (!DateTime.TryParseExact(val, "dd/MM/yyyy HH:mm:ss", System.Globalization.CultureInfo.InvariantCulture, System.Globalization.DateTimeStyles.None, out dateVal))
continue;
data.DateTime = dateVal;
break;
case 1:
if (!"Type: Integer".Equals(typeDef))
continue;
Int32 numVal;
if (!Int32.TryParse(val, out numVal))
continue;
data.Index = numVal;
break;
case 2:
if (!"Type: String".Equals(typeDef)) continue;
data.Value = val;
break;
}
}
return new List<MyData>(myDataDict.Values);
}
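The idea shared by the last few answers (use the final number of the dotted key as the object identifier, then fill in one property per row) can also be written as a single LINQ pipeline. This is only a sketch under stated assumptions: the class name MyDataParser and the regex are hypothetical, the date format is assumed to be dd/MM/yyyy HH:mm:ss (the sample data does not disambiguate day and month), and MyData is repeated here only so the example stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class MyData
{
    public DateTime DateTime { get; set; }
    public int Index { get; set; }
    public string Value { get; set; }
}

public static class MyDataParser
{
    // key = dotted numbers such as "1.0.1.2.3"; the type and the value follow.
    private static readonly Regex Row = new Regex(
        @"^\s*(?<key>\d+(?:\.\d+)*)\s*,\s*Type:\s*(?<type>\w+)\s*,\s*Value:\s*(?<value>.*?)\s*$");

    public static List<MyData> Parse(IEnumerable<string> lines)
    {
        return lines
            .Select(line => Row.Match(line))
            .Where(m => m.Success)                 // silently skip lines that do not match
            .Select(m => new
            {
                // The last number of the key identifies the object the row belongs to.
                ObjectId = int.Parse(m.Groups["key"].Value.Split('.').Last(), CultureInfo.InvariantCulture),
                Type = m.Groups["type"].Value,
                Value = m.Groups["value"].Value
            })
            .GroupBy(r => r.ObjectId)
            .OrderBy(g => g.Key)
            .Select(g =>
            {
                // Assumes each type appears exactly once per object; a missing or
                // duplicated type makes ToDictionary or the indexer throw.
                var byType = g.ToDictionary(r => r.Type, r => r.Value);
                return new MyData
                {
                    DateTime = DateTime.ParseExact(byType["DateTime"], "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture),
                    Index = int.Parse(byType["Integer"], CultureInfo.InvariantCulture),
                    Value = byType["String"]
                };
            })
            .ToList();
    }
}
```

Grouping on the last number rather than on the first four is the key difference from the query in the question: the first four numbers identify the property, so grouping on them collects all DateTime rows together instead of all rows of one object.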

C# Use Regex to split on Words

This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.
Using the info in the example below, a string might look like:
"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"
One further point about the above example, at least three of the features we have to parse out will occasionally include additional values. Here is an updated fake example string.
"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"
The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.
Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?
public List<string> FeatureFilterStrings
{
// All possible feature types from the EWSD switch.
get
{
return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
}
}
public void Parse(string input){
Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };
Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
string[] ms = regex.Split(updatedInput);
List<string> queryLines = new List<string>();
// takes the parsed out data and assigns it to the queryLines List<string>
foreach (string m in ms)
{
queryLines.Add(m);
}
var features = queryLines.Where(queryFilter);
foreach (string feature in features)
{
foreach (Match m in Regex.Matches(workLine, valueExpression))
{
string key = m.Groups["key"].Value.Trim();
string value = String.Empty;
value = Regex.Replace(m.Groups["value"].Value.Trim(), @"s", String.Empty);
AddKeyValue(key, value);
}
}
private void AddKeyValue(string key, string value)
{
try
{
// Check if key already exists. If it does, remove the key and add the new key with updated value.
// Value information appends to what is already there so no data is lost.
if (this.ContainsKey(key))
{
this.Remove(key);
this.Add(key, value.Split('&'));
}
else
{
this.Add(key, value.Split('&'));
}
}
catch (ArgumentException)
{
// Already added to the dictionary.
}
}
}
Further information, the string information does not have a set number of spaces between each key/value, each string may not include all of the values, and the features aren't always in the same order. Welcome to parsing old telephone switch information.

I would create a dictionary from your input string
string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
Test the code:
foreach(var kv in dict)
{
Console.WriteLine(kv.Key + "=" + kv.Value);
}

This might be a simple alternative for you.
Try this code:
var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(xs => xs[0], xs => xs[1]);
I then get the following dictionary:
DIVIDE = KE48
CLACOS = 4556D
DIV = 3466
INT = 4567
Based on your updated input, things get more complicated, but this works:
var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
Func<string, char, string> tighten =
(i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));
var parts =
tighten(tighten(input, '&'), ',')
.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts
.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(
xs => xs[0],
xs => xs
.Skip(1)
.SelectMany(x => x.Split(','))
.SelectMany(x => x.Split('&'))
.ToArray());
I get this dictionary:
DIVIDE = KE48, KE49, KE50
CLACOS = 4566D
DIV = 3466
INT = 4567, 4568
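If the regex route from the question is preferred, the DIVIDE/DIV clash can be avoided by anchoring each key with word boundaries and trying longer keys first. Note that in a verbatim string, \\b means a literal backslash followed by b, not a word boundary, and that an unparenthesised alternation applies \b only to its first branch. The sketch below is one possible shape, not the asker's actual class: FeatureParser and Parse are hypothetical names, and values are assumed never to contain a key name followed by '='.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeatureParser
{
    // Splits "KEY = value KEY = value ..." into a dictionary of key -> values.
    public static Dictionary<string, string[]> Parse(string input, IEnumerable<string> keys)
    {
        // Longer keys first, so "DIVIDE" is tried before "DIV"; the \b anchors
        // keep a key from matching inside a longer word.
        string alternation = string.Join("|",
            keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));

        // The value runs lazily up to the next "KEY =" or to the end of the input.
        var pattern = new Regex(
            @"\b(?<key>" + alternation + @")\b\s*=\s*(?<value>.*?)(?=\s*\b(?:" + alternation + @")\b\s*=|\s*$)",
            RegexOptions.Singleline);

        var result = new Dictionary<string, string[]>();
        foreach (Match m in pattern.Matches(input))
        {
            string[] values = m.Groups["value"].Value
                .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(v => v.Trim())
                .Where(v => v.Length > 0)
                .ToArray();
            // A key that appears twice overwrites its earlier values.
            result[m.Groups["key"].Value] = values;
        }
        return result;
    }
}
```

Because the pattern tolerates any amount of whitespace around '=' and between pairs, and does not depend on key order, it should cope with the irregular spacing and missing features described in the question.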

C# Processing Fixed Width Files - Solution Not Working

I have implemented Cuong's solution here:
C# Processing Fixed Width Files
Here is my code:
var lines = File.ReadAllLines(@fileFull);
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
var list = new List<KeyValuePair<int, int>>();
int startIndex = 0;
for (int i = 0; i < widthList.Count(); i++)
{
var pair = new KeyValuePair<int, int>(startIndex, widthList[i]);
list.Add(pair);
startIndex += widthList[i];
}
var csvLines = lines.Select(line => string.Join(",",
list.Select(pair => line.Substring(pair.Key, pair.Value))));
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
@fileFull = file path and name
The issue I have is the first line of the input file also contains digits. So it could be AAAAAABBC111111111DD2EEEEEE etc. For some reason the output from Cuong's code gives me CSV headings like 1111RRRR and 222223333.
Does anyone know why this is and how I would fix it?
Header row example:
AAAAAAAAAAAAAAAABBBBBBBBBBCCCCCCCCDEFCCCCCCCCCGGGGGGGGHHHHHHHHIJJJJJJJJKKKKLLLLMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPPPPQQQQ1111RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR222222222333333333444444444555555555666666666777777777888888888999999999S00001111TTTTTTTTTTTTUVWXYZ!"£$$$$$$%&
Converted header row:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCCDEFCCCCCC C C C GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111RRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR2222 222223333 333334444 444445555 555556666 666667777 777778888 888889999 99999S000 0 1111 TTTTTTTTTTTT U V W X Y Z ! ",ï¿½,$$$$$$,%,&,"
Jodrell - I implemented your suggestion but the header output is like:
BBBBBBBBBBCCCCCC CCCCCCCCD DEFCCCC GGGGGGGG HHHHHHH IJJJJJJ KKKKLLL LLL MMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPPQQQQ1111RRRRRRRRRRRRRRRRR QQQ 111 RRR 33333333 44444444 55555555 66666666 77777777 88888888 99999999 S0000111 111 TTT UVWXYZ!"ï¿½$$ %&

As Jodrell already mentioned, your code doesn't work because it assumes that the character representing each column header is distinct. Changing the code that parses the header widths fixes it.
Replace:
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
With:
var widthList = new List<int>();
var header = lines.First().ToArray();
for (int i = 0; i < header.Length; i++)
{
if (i == 0 || header[i] != header[i-1])
widthList.Add(0);
widthList[widthList.Count-1]++;
}
Parsed header columns:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCC D E F CCCCCCCCC GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 222222222 333333333 444444444 555555555 666666666 777777777 888888888 999999999 S 0000 1111 TTTTTTTTTTTT U V W X Y Z ! " £ $$$$$$ % &
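The fix above amounts to run-length encoding the header: a new column starts whenever the character changes, regardless of whether that character was seen earlier in the line. A minimal standalone sketch of that technique follows; the class name FixedWidth and both method names are hypothetical, and trimming each field is an assumption about the desired output rather than something the original code does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedWidth
{
    // Returns (start, length) for each run of identical consecutive characters.
    public static List<KeyValuePair<int, int>> ColumnsFromHeader(string header)
    {
        var columns = new List<KeyValuePair<int, int>>();
        int start = 0;
        for (int i = 1; i <= header.Length; i++)
        {
            // Close the current run at the end of the line or when the character changes.
            if (i == header.Length || header[i] != header[start])
            {
                columns.Add(new KeyValuePair<int, int>(start, i - start));
                start = i;
            }
        }
        return columns;
    }

    // Cuts one line into its columns and joins them as a CSV record.
    public static string ToCsvLine(string line, IEnumerable<KeyValuePair<int, int>> columns)
    {
        return string.Join(",", columns.Select(c =>
        {
            // Tolerate lines that are shorter than the header.
            if (c.Key >= line.Length) return "";
            string field = line.Substring(c.Key, Math.Min(c.Value, line.Length - c.Key)).Trim();
            // Quote fields containing commas or quotes, doubling embedded quotes.
            return field.IndexOfAny(new[] { ',', '"' }) >= 0
                ? "\"" + field.Replace("\"", "\"\"") + "\""
                : field;
        }));
    }
}
```

Grouping with GroupBy(c => c), as in the question, merges every occurrence of a character into one group, which is why the two separate blocks of 1s and the two blocks of Cs in the header produced wrong widths. Comparing each character only with its predecessor keeps repeated characters in separate columns.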

EDIT
Because the problem annoyed me I wrote some code that handles " and ,. This code replaces the header row with comma delimited alternating zeros and ones. Any commas or double quotes in the body are appropriately escaped.
static void FixedToCsv(string sourceFile)
{
if (sourceFile == null)
{
// Throw exception
}
var dir = Path.GetDirectoryName(sourceFile);
var destFile = string.Format(
"{0}{1}",
Path.GetFileNameWithoutExtension(sourceFile),
".csv");
if (dir != null)
{
destFile = Path.Combine(dir, destFile);
}
if (File.Exists(destFile))
{
// Throw Exception
}
var blocks = new List<KeyValuePair<int, int>>();
using (var output = File.OpenWrite(destFile))
{
using (var input = File.OpenText(sourceFile))
{
var outputLine = new StringBuilder();
// Make header
var header = input.ReadLine();
if (header == null)
{
return;
}
var even = false;
var lastc = header.First();
var counter = 0;
var blockCounter = 0;
foreach(var c in header)
{
counter++;
if (c == lastc)
{
blockCounter++;
}
else
{
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter - 1,
blockCounter));
blockCounter = 1;
outputLine.Append(',');
even = !even;
}
outputLine.Append(even ? '1' : '0');
lastc = c;
}
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter,
blockCounter));
outputLine.AppendLine();
var lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
// Process Body
var inputLine = input.ReadLine();
while (inputLine != null)
{
foreach(var block in blocks.Select(b =>
inputLine.Substring(b.Key, b.Value)))
{
var sanitisedBlock = block;
if (block.Contains(',') || block.Contains('"'))
{
sanitisedBlock = string.Format(
"\"{0}\"",
block.Replace("\"", "\"\""));
}
outputLine.Append(sanitisedBlock);
outputLine.Append(',');
}
outputLine.Remove(outputLine.Length - 1, 1);
outputLine.AppendLine();
lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
inputLine = input.ReadLine();
}
}
}
}
1 is repeated in your header row, so your two blocks of four get counted as one block of eight, and everything goes wrong from there.
(There is a block of four 1s after the Qs and another block of four 1s after the 0s.)
Essentially, your header row is invalid or, at least, doesn't work with the proposed solution.
Okay, you could do something like this:
public void FixedToCsv(string fullFile)
{
var lines = File.ReadAllLines(fullFile);
var firstLine = lines.First();
var widths = new List<KeyValuePair<int, int>>();
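var innerCounter = 0;
var outerCounter = 0;
var firstLineChars = firstLine.ToCharArray();
var lastChar = firstLineChars[0];
foreach(var c in firstLineChars)
{
if (c == lastChar)
{
innerCounter++;
}
else
{
// A new run starts here: record the start index and width of the previous run.
widths.Add(new KeyValuePair<int, int>(
outerCounter - innerCounter,
innerCounter));
innerCounter = 1;
lastChar = c;
}
outerCounter++;
}
// Record the final run, which the loop never closes.
widths.Add(new KeyValuePair<int, int>(
outerCounter - innerCounter,
innerCounter));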
var csvLines = lines.Select(line => string.Join(",",
widths.Select(pair => line.Substring(pair.Key, pair.Value))));
// Get filePath and fileName from fullFile here.
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
}
