Custom File Parser - c#

I am building a parser for a custom pipe delimited file format and I am finding my code to be very bulky, could someone suggest better methods of parsing this data?
The file's data is broken down by a line delimited by a pipe (|), each line starts with a record type, followed by an ID, followed by different number of columns after.
Ex:
CDI|11111|OTHERDATA|somemore|other
CEX001|123131|DATA|data
CCC|123131|DATA|data1|data2|data3|data4|data5|data6
. I am splitting by pipe, then grabbing the first two columns, and then using a switch checking the first line and calling a function that will parse the remaining into an object purpose built for that record type. I would really like a more elegant method.
public Dictionary<string, DataRecord> Parse()
{
var data = new Dictionary<string, DataRecord>();
var rawDataDict = new Dictionary<string, List<List<string>>>();
foreach (var line in File.ReadLines(_path))
{
var split = line.Split('|');
var Id = split[1];
if (!rawDataDict.ContainsKey(Id))
{
rawDataDict.Add(Id, new List<List<string>> {split.ToList()});
}
else
{
rawDataDict[Id].Add(split.ToList());
}
}
rawDataDict.ToList().ForEach(pair =>
{
var key = pair.Key.ToString();
var values = pair.Value;
foreach (var value in values)
{
var recordType = value[0];
switch (recordType)
{
case "CDI":
var cdiRecord = ParseCdi(value);
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key, CdiRecords = new List<CdiRecord>() { cdiRecord }
});
}
else
{
data[key].CdiRecords.Add(cdiRecord);
}
break;
case "CEX015":
var cexRecord = ParseCex(value);
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key,
CexRecords = new List<Cex015Record>() { cexRecord }
});
}
else
{
data[key].CexRecords.Add(cexRecord);
}
break;
case "CPH":
CphRecord cphRecord = ParseCph(value);
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key,
CphRecords = new List<CphRecord>() { cphRecord }
});
}
else
{
data[key].CphRecords.Add(cphRecord);
}
break;
}
}
});
return data;
}

Try out FileHelper, here is your exact example - http://www.filehelpers.net/example/QuickStart/ReadFileDelimited/
Given you're data of
CDI|11111|OTHERDATA|Datas
CEX001|123131|DATA
CCC|123131
You could create a class to model this to allow FileHelpers to parse the delimited file:
[DelimitedRecord("|")]
public class Record
{
public string Type { get; set; }
public string[] Fields { get; set; }
}
Then we could allow FileHelpers to parse in to this object type:
var engine = new FileHelperEngine<Record>();
var records = engine.ReadFile("Input.txt");
After we've got all the records loaded in to Record objects we can use a bit of linq to pull them in to their given types
var cdis = records.Where(x => x.Type == "CDI")
.Select(x => new Cdi(x.Fields[0], x.Fields[1], x.Fields[2])
.ToArray();
var cexs = records.Where(x => x.Type == "CEX001")
.Select(x => new Cex(x.Fields[0], x.Fields[1)
.ToArray();
var cccs = records.Where(x => x.Type == "CCC")
.Select(x => new Ccc(x.Fields[0])
.ToArray();
You could also simplify the above using something like AutoMapper - http://automapper.org/
Alternatively you could use ConditionalRecord attributes which will only parse certain lines if they match a given criteria. This will however be slower the more record types you have but you're code will be cleaner and FileHelpers will be doing most of the heavy lifting:
[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CDI")]
public class Cdi
{
public string Type { get; set; }
public int Number { get; set; }
public string Data1 { get; set; }
public string Data2 { get; set; }
public string Data3 { get; set; }
}
[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CEX001")]
public class Cex001
{
public string Type { get; set; }
public int Number { get; set; }
public string Data1 { get; set; }
}
[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CCC")]
public class Ccc
{
public string Type { get; set; }
public int Number { get; set; }
}
var input =
#"CDI|11111|Data1|Data2|Data3
CEX001|123131|Data1
CCC|123131";
var CdiEngine = new FileHelperEngine<Cdi>();
var cdis = CdiEngine.ReadString(input);
var cexEngine = new FileHelperEngine<Cex001>();
var cexs = cexEngine.ReadString(input);
var cccEngine = new FileHelperEngine<Ccc>();
var cccs = cccEngine.ReadString(input);

Your first loop isn't really doing anything other than organizing your data differently. You should be able to eliminate it and use the data as it is from the file. Something like this should give you what you want:
foreach (var line in File.ReadLines(_path))
{
var split = line.Split('|');
var key = split[1];
var value = split;
var recordType = value[0];
switch (recordType)
{
case "CDI":
var cdiRecord = ParseCdi(value.ToList());
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key, CdiRecords = new List<CdiRecord>() { cdiRecord }
});
}
else
{
data[key].CdiRecords.Add(cdiRecord);
}
break;
case "CEX015":
var cexRecord = ParseCex(value.ToList());
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key,
CexRecords = new List<Cex015Record>() { cexRecord }
});
}
else
{
data[key].CexRecords.Add(cexRecord);
}
break;
case "CPH":
CphRecord cphRecord = ParseCph(value.ToList());
if (!data.ContainsKey(key))
{
data.Add(key, new DataRecord
{
Id = key,
CphRecords = new List<CphRecord>() { cphRecord }
});
}
else
{
data[key].CphRecords.Add(cphRecord);
}
break;
}
};
Caveat: This is just put together here and hasn't been properly checked for syntax.

Related

Creating dynamic range aggregations in Nest

I have a set of POCO facets with ranges, that I have received from a client. I need to convert these ultimately into an AggregationDictionary. I cannot figure out the syntax for creating a dynamic set of aggregations (possilbly of type RangeAggregationDescriptor) and need help with this.
My POCO objects are below:
public class TypedFacets
{
public string Name { get; set; }
public string Field { get; set; }
public IReadOnlyCollection<Range> RangeValues { get; set; } = new List<Range>();
public int Size { get; set; }
}
public class Range
{
public string Name { get; set; }
public double? From { get; set; }
public double? To { get; set; }
}
The Nest generation looks like below:
var facets = new List<TypedFacets>()
{
new TypedFacets()
{
Name = "potatoRange",
Field = "potatoRange",
RangeValues = new List<Range>()
{
new Range()
{
From = 0,
To = null,
Name = "chips"
},
new Range()
{
From = 1,
To = null,
Name = "crisps"
}
}
}
};
var aggregations = new AggregationContainerDescriptor<Template>();
facets.Where(f => f.RangeValues.Any()).ToList().ForEach(f =>
{
var rad = new RangeAggregationDescriptor<Template>();
f.RangeValues.ToList().ForEach(rangeValue =>
{
rad = rad.Ranges(rs => rs.From(rangeValue.From).To(rangeValue.To).Key(rangeValue.Name));
});
// this line doesn't work and needs to change
aggregations.Range(f.Name, r => r
.Field(f.Field).Ranges(rs => rad.Ranges));
});
return ((IAggregationContainer)aggregations).Aggregations;
I'm not sure how to fix the above. Any help would be appreciated.
I eventually found the solution for this. You can create the dynamic ranges as per below
private Func<AggregationRangeDescriptor, IAggregationRange>[] CreateRangeRanges(TypedFacets rangedAgg)
{
var rangeRanges = new List<Func<AggregationRangeDescriptor, IAggregationRange>>();
rangedAgg.RangeValues.ToList().ForEach(rangeValue =>
{
rangeRanges.Add(rs => rs.From(rangeValue.From).To(rangeValue.To).Key(rangeValue.Name));
});
return rangeRanges.ToArray();
}
And then assing them like below
facets.Where(f => f.RangeValues.Any()).ToList().ForEach(f =>
{
aggregations.Range(f.Name, r => r
.Field(f.Field).Ranges(CreateRangeRanges(f)));
});

Order lines from a text file by number found in a certain spot of the line

I'm sorry that I interrupt you in this manner, I'm new to C# and I've been struggling with this problem for days... Maybe it will seem easy for you :)
I have this text file in this format
name|ID|domain|grade|verdict
Ryan|502322|Computers|9,33|Undefined
Marcel|302112|Automatics|6,22|Undefined
Alex|301234|Computers|5,66|Undefined
Leo|201122|Automatics|3,22|Undefined
How can I sort the text file using any methods (including LINQ) so that the list from the text file will be ordered by domain, and then descending by the grade column? Like this:
name|ID|domain|grade|verdict
Marcel|302112|Automatics|6,22|Undefined
Leo|201122|Automatics|3,22|Undefined
Ryan|502322|Computers|9,33|Undefined
Alex|301234|Computers|5,66|Undefined
To read the file, I'm using var Students = File.ReadAllLines(#"filepath");, I don't know if it's the smartest approach, and then I write using File.WriteAllLines
Thanks in advance! Sorry once again, I know it should be easy, but for me is really tuff :(
You can use some thing like this:
var students= File.ReadAllLines(#"filepath");
var headers = lines[0];
students = lines.Skip(1).ToArray();
var orders = lines.Select(x => x.Split('|'))
.Select(x => new { Domain = x[2], Grade = int.Parse(x[3].Replace(",", "")), All = x })
.OrderBy(x => x.Domain).ThenByDescending(x => x.Grade).Select(x => string.Join("|", x.All)).ToList();
orders.Insert(0, headers);
students=orders.ToArray();
try following code:
private void ReadFile()
{
char Delimiter = '|';
string[] Lines = File.ReadAllLines(#"E:\RaftehHa.txt", Encoding.Default);
List<string[]> FileRows = Lines.Select(line =>
line.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries)).ToList();
DataTable dt = new DataTable();
dt.Columns.AddRange(FileRows[0].Select(col => new DataColumn() { ColumnName = col }).ToArray());
FileRows.RemoveAt(0);
FileRows.ForEach(row => dt.Rows.Add(row));
DataView dv = dt.DefaultView;
dv.Sort = " ID ASC ";
dt = dv.ToTable();
dataGridView1.DataSource = dt;
}
A bit the same as already mentioned above, but as you mention you are new to C#, I have tried to add a little bit of structure to the code, but leaving the completion to you.
public class Data
{
public Data(string inputLine)
{
var split = inputLine.Split('|');
Name = split[0];
Id = int.Parse(split[1]);
Domain = split[2];
Grade = double.Parse(split[3].Replace(",", "."));
Verdict = split[4];
}
public string Name { get; }
public int Id { get; }
public string Domain { get; }
public double Grade { get; }
public string Verdict { get; }
}
public class DataFile
{
public static IEnumerable Read(string fileName)
{
var input = File.ReadAllLines(fileName);
return input.Skip(1).Select(p => new Data(p)); // skip header
}
public static void Write(IEnumerable data)
{
// todo :)
}
}
void Main()
{
var input = DataFile.Read(#"C:\Temp\ExampleData.txt");
var result = input.OrderBy(p => p.Domain).ThenByDescending(p => p.Grade);
DataFile.Write(result);
}
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
enum DomainType {
Automatics, // 0
Computers // 1
}
class Data {
public int Id { get; set; }
public string Name { get; set; }
public string Verdict { get; set; }
public DomainType Domain { get; set; }
public Tuple<int, int> Grade { get; set; }
}
public static class Program {
static IEnumerable<Data> FileContent(string path) {
string line;
using (var reader = File.OpenText(path))
{
bool skipHeader = false;
while((line = reader.ReadLine()) != null)
{
if (!skipHeader) {
skipHeader = true;
continue;
}
var fields = line.Split('|');
string name = fields[0];
int id = int.Parse(fields[1]);
var domain = (DomainType)Enum.Parse(typeof(DomainType), fields[2]);
var grade = Tuple.Create(int.Parse(fields[3].Split(',')[0]),
int.Parse(fields[3].Split(',')[1]));
string verdict = fields[4];
var data = new Data() {
Name = name, Id = id, Domain = domain, Grade = grade, Verdict = verdict };
yield return data;
}
}
}
public static void Main() {
var result = FileContent("path_to_file").OrderBy(data => data.Domain);
foreach (var line in result) {
Console.WriteLine(line.Name);
}
}
}

Best approach to compare if one list is subset of another in C#

I have the below two classes:
public class FirstInner
{
public int Id { get; set; }
public string Type { get; set; }
public string RoleId { get; set; }
}
public class SecondInner
{
public int Id { get; set; }
public string Type { get; set; }
}
Again, there are lists of those types inside the below two classes:
public class FirstOuter
{
public int Id { get; set; }
public string Name { get; set; }
public string Title { get; set; }
public List<FirstInner> Inners { get; set; }
}
public class SecondOuter
{
public int Id { get; set; }
public string Name { get; set; }
public List<SecondInner> Inners { get; set; }
}
Now, I have list of FirstOuter and SecondOuter. I need to check if FirstOuter list is a subset of SecondOuter list.
Please note:
The names of the classes cannot be changed as they are from different systems.
Some additional properties are present in FirstOuter but not in SecondOuter. When comparing subset, we can ignore their presence in SecondOuter.
No.2 is true for FirstInner and SecondInner as well.
List items can be in any order---FirstOuterList[1] could be found in SecondOuterList[3], based on Id, but inside that again need to compare that FirstOuterList[1].FirstInner[3], could be found in SecondOuterList[3].SecondInner[2], based on Id.
I tried Intersect, but that is failing as the property names are mismatching. Another solution I have is doing the crude for each iteration, which I want to avoid.
Should I convert the SecondOuter list to FirstOuter list, ignoring the additional properties?
Basically, here is a test data:
var firstInnerList = new List<FirstInner>();
firstInnerList.Add(new FirstInner
{
Id = 1,
Type = "xx",
RoleId = "5"
});
var secondInnerList = new List<SecondInner>();
secondInner.Add(new SecondInner
{
Id = 1,
Type = "xx"
});
var firstOuter = new FirstOuter
{
Id = 1,
Name = "John",
Title = "Cena",
Inners = firstInnerList
}
var secondOuter = new SecondOuter
{
Id = 1,
Name = "John",
Inners = secondInnerList,
}
var firstOuterList = new List<FirstOuter> { firstOuter };
var secondOuterList = new List<SecondOuter> { secondOuter };
Need to check if firstOuterList is part of secondOuterList (ignoring the additional properties).
So the foreach way that I have is:
foreach (var item in firstOuterList)
{
var secondItem = secondOuterList.Find(so => so.Id == item.Id);
//if secondItem is null->throw exception
if (item.Name == secondItem.Name)
{
foreach (var firstInnerItem in item.Inners)
{
var secondInnerItem = secondItem.Inners.Find(sI => sI.Id == firstInnerItem.Id);
//if secondInnerItem is null,throw exception
if (firstInnerItem.Type != secondInnerItem.Type)
{
//throw exception
}
}
}
else
{
//throw exception
}
}
//move with normal flow
Please let me know if there is any better approach.
First, do the join of firstOuterList and secondOuterList
bool isSubset = false;
var firstOuterList = new List<FirstOuter> { firstOuter };
var secondOuterList = new List<SecondOuter> { secondOuter };
var jointOuterList = firstOuterList.Join(
secondOuterList,
p => new { p.Id, p.Name },
m => new { m.Id, m.Name },
(p, m) => new { FOuterList = p, SOuterList = m }
);
if(jointOuterList.Count != firstOuterList.Count)
{
isSubset = false;
return;
}
foreach(var item in jointOuterList)
{
var jointInnerList = item.firstInnerList.Join(
item.firstInnerList,
p => new { p.Id, p.Type },
m => new { m.Id, m.type },
(p, m) => p.Id
);
if(jointInnerList.Count != item.firstInnerList.Count)
{
isSubset = false;
return;
}
}
Note: I am assuming Id is unique in its outer lists. It means there will not be multiple entries with same id in a list. If no, then we need to use group by in above query
I think to break the question down..
We have two sets of Ids, the Inners and the Outers.
We have two instances of those sets, the Firsts and the Seconds.
We want Second's inner Ids to be a subset of First's inner Ids.
We want Second's outer Ids to be a subset of First's outer Ids.
If that's the case, these are a couple of working test cases:
[TestMethod]
public void ICanSeeWhenInnerAndOuterCollectionsAreSubsets()
{
HashSet<int> firstInnerIds = new HashSet<int>(GetFirstOuterList().SelectMany(outer => outer.Inners.Select(inner => inner.Id)).Distinct());
HashSet<int> firstOuterIds = new HashSet<int>(GetFirstOuterList().Select(outer => outer.Id).Distinct());
HashSet<int> secondInnerIds = new HashSet<int>(GetSecondOuterList().SelectMany(outer => outer.Inners.Select(inner => inner.Id)).Distinct());
HashSet<int> secondOuterIds = new HashSet<int>(GetSecondOuterList().Select(outer => outer.Id).Distinct());
bool isInnerSubset = secondInnerIds.IsSubsetOf(firstInnerIds);
bool isOuterSubset = secondOuterIds.IsSubsetOf(firstOuterIds);
Assert.IsTrue(isInnerSubset);
Assert.IsTrue(isOuterSubset);
}
[TestMethod]
public void ICanSeeWhenInnerAndOuterCollectionsAreNotSubsets()
{
HashSet<int> firstInnerIds = new HashSet<int>(GetFirstOuterList().SelectMany(outer => outer.Inners.Select(inner => inner.Id)).Distinct());
HashSet<int> firstOuterIds = new HashSet<int>(GetFirstOuterList().Select(outer => outer.Id).Distinct());
HashSet<int> secondInnerIds = new HashSet<int>(GetSecondOuterList().SelectMany(outer => outer.Inners.Select(inner => inner.Id)).Distinct());
HashSet<int> secondOuterIds = new HashSet<int>(GetSecondOuterList().Select(outer => outer.Id).Distinct());
firstInnerIds.Clear();
firstInnerIds.Add(5);
firstOuterIds.Clear();
firstOuterIds.Add(5);
bool isInnerSubset = secondInnerIds.IsSubsetOf(firstInnerIds);
bool isOuterSubset = secondOuterIds.IsSubsetOf(firstOuterIds);
Assert.IsFalse(isInnerSubset);
Assert.IsFalse(isOuterSubset);
}
private List<FirstOuter> GetFirstOuterList() { ... }
private List<SecondOuter> GetSecondOuterList() { ... }

Statement is not updating data item

I have this LINQ statement that tries to set the 1st element in the collection of string[]. But it doesn't work.
Below is the LINQ statement.
docSpcItem.Where(x => x.DocID == 2146943)
.FirstOrDefault()
.FinishingOptionsDesc[0] = "new value";
public string[] FinishingOptionsDesc
{
get
{
if (this._FinishingOptionsDesc != null)
{
return (string[])this._FinishingOptionsDesc.ToArray(typeof(string));
}
return null;
}
set { this._FinishingOptionsDesc = new ArrayList(value); }
}
What's wrong with my LINQ statement above?
Couple of things.. There are some problems with your get and set. I would just use auto properties like this..
public class DocSpcItem
{
public string[] FinishingOptionsDesc { get; set; }
public int DocID { get; set; }
}
Next for your linq statement, depending on the presence of an item with an id of 2146943 you might be setting a new version of the object rather than the one you intended. This should work..
[TestMethod]
public void Linq()
{
var items = new List<DocSpcItem>();
//2146943
for (var i = 2146930; i <= 2146950; i++)
{
items.Add(new DocSpcItem()
{ DocID = i
, FinishingOptionsDesc = new string[]
{ i.ToString() }
}
);
}
var item = items.FirstOrDefault(i => i.DocID == 2146943);
if (item != null)
{
item.FinishingOptionsDesc = new string[]{"The New Value"};
}
}
and
public class DocSpcItem
{
public string[] FinishingOptionsDesc { get; set; }
public int DocID { get; set; }
}

trying to search through 2 arrays at once C# and print out any matching info

I have a search method that is trying to match a director name to an entry in one and/or two arrays. However, I cannot figure out how to print the names of multiple titles if the director has more than one movie in either array.
Right now my code looks like this:
if (director == myDVDs[i].Director || director == myBlu[i].Director)
{
Console.WriteLine("{0}: {1}", i + 1, myBlu[i].Title);
EndOptions();
}
else if (director != myDVDs[i].Director && director != myBlu[i].Director)
{
Console.WriteLine("{0} does not have a movie in the database try again", director);
Console.Clear();
Search();
}
You can use LINQ's Concat, like this:
var matching = myDVDs.Concat(myBlu).Where(d => d.Director == director);
int count = 1;
foreach (var m in matching) {
Console.WriteLine("{0}: {1}", count++, m.Title);
}
Generic solution which simplyfies adding a new movie type. To find titles used LINQ Select(), Where(), SelectMany(), yo might need adding using System.Linq to use these methods:
var dvds = new List<IMovie>
{
new DvdMovie {Title = "DVD1", Director = "Director1"},
new DvdMovie {Title = "DVD2", Director = "Director1"},
new DvdMovie {Title = "DVD3", Director = "Director2"}
};
var bluerays = new List<IMovie>
{
new BlueRayMovie {Title = "BR1", Director = "Director3"},
new BlueRayMovie {Title = "BR2", Director = "Director3"},
new BlueRayMovie {Title = "BR3", Director = "Director1"}
};
var allMovies = new List<IEnumerable<IMovie>> {dvds, bluerays};
string searchFor = "Director1";
// Main query, all other initialization code and types are below
IEnumerable<string> titles = allMovies.SelectMany(l =>
l.Where(sl => sl.Director == searchFor)
.Select(m => m.Title));
if (titles != null && titles.Count() > 0)
{
// OUTPUTs: DVD1, DVD2, BR3
foreach (var title in titles)
{
Console.WriteLine(title);
}
}
else
{
Console.WriteLine("Nothing found for " + searchFor);
}
Types
public interface IMovie
{
string Director { get; set; }
string Title { get; set; }
}
// TODO: Make immutable.
// Make setter private and introduce parametrized constructor
public class DvdMovie : IMovie
{
public string Director { get; set; }
public string Title { get; set; }
}
// TODO: Make immutable.
// Make setter private and introduce parametrized constructor
public class BlueRayMovie : IMovie
{
public string Director { get; set; }
public string Title { get; set; }
}

Categories