Parse tab delimited file with more than one table - c#

Can anyone recommend a good method using C# (maybe FileHelpers) that would let me parse a file formatted like this?
%T person
%F id name address city
%R 1 Bob 999 Main St Burbank
%R 2 Sara 829 South st Pasadena
%T houses
%F id personid housetype Color
%R 25 1 House Red
%R 26 2 condo Green
I'd like to get the two tables into a DataTable or something else that I could query with LINQ.
The file is tab delimited.

Sample parser for this kind of data
public IEnumerable<Dictionary<string, string>> Parse(TextReader reader)
{
var state = new State { Handle = ExpectTableTitle };
return GenerateFrom(reader)
.Select(line => state.Handle(line.Split('\t'), state))
.Where(returnIt => returnIt)
.Select(returnIt => state.Row);
}
private bool ExpectTableTitle(string[] lineParts, State state)
{
if (lineParts[0] == "%T")
{
state.TableTitle = lineParts[1];
state.Handle = ExpectFieldNames;
}
else
{
Console.WriteLine("Expected %T but found '"+lineParts[0]+"'");
}
return false;
}
private bool ExpectFieldNames(string[] lineParts, State state)
{
if (lineParts[0] == "%F")
{
state.FieldNames = lineParts.Skip(1).ToArray();
state.Handle = ExpectRowOrTableTitle;
}
else
{
Console.WriteLine("Expected %F but found '" + lineParts[0] + "'");
}
return false;
}
private bool ExpectRowOrTableTitle(string[] lineParts, State state)
{
if (lineParts[0] == "%R")
{
state.Row = lineParts.Skip(1)
.Select((x, i) => new { Value = x, Index = i })
.ToDictionary(x => state.FieldNames[x.Index], x => x.Value);
state.Row.Add("_tableTitle",state.TableTitle);
return true;
}
return ExpectTableTitle(lineParts, state);
}
public class State
{
public string TableTitle;
public string[] FieldNames;
public Dictionary<string, string> Row;
public Func<string[], State, bool> Handle;
}
private static IEnumerable<string> GenerateFrom(TextReader reader)
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
Then just convert/map each resultant Dictionary to one of your domain objects based on the _tableTitle entry.
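As a minimal sketch of that mapping step: the `Person` and `House` classes and the `RowMapper` helper below are hypothetical names, not part of the answer above, and they assume every row carries the field names shown in the sample file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain types; the properties mirror the %F field names in the sample file.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
}

public class House
{
    public int Id { get; set; }
    public int PersonId { get; set; }
    public string HouseType { get; set; }
    public string Color { get; set; }
}

public static class RowMapper
{
    // Keeps only the rows tagged "person" and maps each one to a Person.
    public static List<Person> ToPeople(IEnumerable<Dictionary<string, string>> rows)
    {
        return rows
            .Where(r => r["_tableTitle"] == "person")
            .Select(r => new Person
            {
                Id = int.Parse(r["id"]),
                Name = r["name"],
                Address = r["address"],
                City = r["city"]
            })
            .ToList();
    }

    // Keeps only the rows tagged "houses" and maps each one to a House.
    public static List<House> ToHouses(IEnumerable<Dictionary<string, string>> rows)
    {
        return rows
            .Where(r => r["_tableTitle"] == "houses")
            .Select(r => new House
            {
                Id = int.Parse(r["id"]),
                PersonId = int.Parse(r["personid"]),
                HouseType = r["housetype"],
                Color = r["Color"]
            })
            .ToList();
    }
}
```

Because Parse reads lazily from the TextReader, it is probably safest to materialize the rows once with `Parse(reader).ToList()` before passing them to both methods. The two resulting lists can then be joined with LINQ on `Person.Id` and `House.PersonId`.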
Here's a test harness using your sample data. To read from a file, pass in a StreamReader instead of a StringReader.
const string data = @"%T\tperson
%F\tid\tname\taddress\tcity
%R\t1\tBob\t999 Main St\tBurbank
%R\t2\tSara\t829 South st\tPasadena
%T\thouses
%F\tid\tpersonid\thousetype\tColor
%R\t25\t1\tHouse\tRed
%R\t26\t2\tcondo\tGreen";
var reader = new StringReader(data.Replace("\\t","\t"));
var rows = Parse(reader);
foreach (var row in rows)
{
foreach (var entry in row)
{
Console.Write(entry.Key);
Console.Write('\t');
Console.Write('=');
Console.Write('\t');
Console.Write(entry.Value);
Console.WriteLine();
}
Console.WriteLine();
}
Output:
id = 1
name = Bob
address = 999 Main St
city = Burbank
_tableTitle = person
id = 2
name = Sara
address = 829 South st
city = Pasadena
_tableTitle = person
id = 25
personid = 1
housetype = House
Color = Red
_tableTitle = houses
id = 26
personid = 2
housetype = condo
Color = Green
_tableTitle = houses

Related

Regex - Capture every line based on condition

To revisit a solution I had here over a year ago:
/* ----------------- jobnameA ----------------- */
insert_job: jobnameA job_type: CMD
date_conditions: 0
alarm_if_fail: 1
/* ----------------- jobnameB ----------------- */
insert_job: jobnameB job_type: CMD
date_conditions: 1
days_of_week: tu,we,th,fr,sa
condition: s(job1) & s(job2) & (v(variable1) = "Y" | s(job1)) & (v(variable2) = "Y"
alarm_if_fail: 1
job_load: 1
priority: 10
/* ----------------- jobnameC ----------------- */
...
I use the following regex to capture each job that uses a variable v(x) in its condition parameter (only jobnameB matches here):
(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)
I now need each line caught as parameter and value groups while satisfying the same conditions.
This regex will get each line with parameter and value as separate capture groups, but it won't take into account the presence of variables v(x), so it grabs all jobs:
(?:^([\w_]*\:) ([^\n]+))
And, the following expression will get me as far as the first line (insert_job) of the satisfying jobs, but it ends there instead of grabbing all parameters.
(?:^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/)(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?(?:^([\w_]*\:) ([^\n]+))
Any further help will be appreciated.
I think this would be much easier if you broke it up into steps. I am using LINQ for this:
var jobsWithVx = Regex.Matches(src, @"(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)").Cast<Match>().Select(m => m.Value);
var jobParameters = jobsWithVx.Select(j => Regex.Matches(j, @"(?ms)^([\w_]+\:) (.+?)$")).Select(m => m.Cast<Match>().Select(am => am.Groups));
Then you can work with the job parameters:
foreach (var aJobsParms in jobParameters) {
foreach (var jobParm in aJobsParms) {
// work with job and parm
}
// alternatively, convert to a Dictionary
var jobDict = aJobsParms.ToDictionary(jpgc => jpgc[1].Value, jpgc => jpgc[2].Value);
// then work with the dictionary
}
Sample that runs in LINQPad:
var src = @"/* ----------------- jobnameA ----------------- */
insert_job: jobnameA job_type: CMD
date_conditions: 0
alarm_if_fail: 1
/* ----------------- jobnameB ----------------- */
insert_job: jobnameB job_type: CMD
date_conditions: 1
days_of_week: tu,we,th,fr,sa
condition: s(job1) & s(job2) & (v(variable1) = ""Y"" | s(job1)) & (v(variable2) = ""Y""
alarm_if_fail: 1
job_load: 1
priority: 10
/* ----------------- jobnameC ----------------- */
";
var jobsWithVx = Regex.Matches(src, @"(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)").Cast<Match>().Select(m => m.Value);
var jobParameters = jobsWithVx.Select(j => Regex.Matches(j, @"(?ms)^([\w_]+\:) (.+?)$")).Select(m => m.Cast<Match>().Select(am => am.Groups));
jobParameters.Dump();
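The "break it up into steps" idea can also be sketched with three small regexes instead of one large one. This is an alternative under stated assumptions, not the code from the answer above: `JobFilter` and `JobsUsingVariables` are hypothetical names, every job is assumed to start with a `/* ... */` header line, and parameter names are assumed to be unique within a job (otherwise `ToDictionary` throws).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class JobFilter
{
    // Returns one parameter/value dictionary per job whose condition line uses v(...).
    public static List<Dictionary<string, string>> JobsUsingVariables(string src)
    {
        // Step 1: split before every header comment; the lookahead keeps the header in its block.
        var blocks = Regex.Split(src, @"(?m)^(?=[ \t]*/\*)");

        return blocks
            // Step 2: keep only blocks whose condition line contains a v( call.
            .Where(b => Regex.IsMatch(b, @"(?m)^[ \t]*condition:.*\bv\("))
            // Step 3: capture "name: value" at the start of each line; \r? tolerates CRLF input.
            .Select(b => Regex.Matches(b, @"(?m)^[ \t]*(\w+): (.+?)\r?$")
                .Cast<Match>()
                .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value))
            .ToList();
    }
}
```

Note that a line holding two parameters, such as `insert_job: jobnameB job_type: CMD`, ends up as a single entry keyed `insert_job` whose value still contains `job_type: CMD`; splitting that further would need an extra step.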
I've been parsing text files for over 40 years. I tried for a while to use Regex to split your 'name: value' inputs but was unsuccessful, so I finally wrote my own method. Take a look at what I did with the days of the week:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
Job.Load(FILENAME);
}
}
public class Job
{
public static List<Job> jobs = new List<Job>();
public string name { get;set;}
public string job_type { get;set;}
public int date_conditions { get; set;}
public DayOfWeek[] days_of_week { get; set; }
public string condition { get; set; }
public int alarm_if_fail { get; set; }
public int job_load { get; set; }
public int priority { get; set;}
public static void Load(string filename)
{
Job newJob = null;
StreamReader reader = new StreamReader(filename);
string inputLine = "";
while ((inputLine = reader.ReadLine()) != null)
{
inputLine = inputLine.Trim();
if ((inputLine.Length > 0) && (!inputLine.StartsWith("/*")))
{
List<KeyValuePair<string, string>> groups = GetGroups(inputLine);
foreach (KeyValuePair<string, string> group in groups)
{
switch (group.Key)
{
case "insert_job" :
newJob = new Job();
Job.jobs.Add(newJob);
newJob.name = group.Value;
break;
case "job_type":
newJob.job_type = group.Value;
break;
case "date_conditions":
newJob.date_conditions = int.Parse(group.Value);
break;
case "days_of_week":
List<string> d_of_w = new List<string>() { "su", "mo", "tu", "we", "th", "fr", "sa" };
newJob.days_of_week = group.Value.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(x => (DayOfWeek)d_of_w.IndexOf(x)).ToArray();
break;
case "condition":
newJob.condition = group.Value;
break;
case "alarm_if_fail":
newJob.alarm_if_fail = int.Parse(group.Value);
break;
case "job_load":
newJob.job_load = int.Parse(group.Value);
break;
case "priority":
newJob.priority = int.Parse(group.Value);
break;
}
}
}
}
reader.Close();
}
public static List<KeyValuePair<string, string>> GetGroups(string input)
{
List<KeyValuePair<string, string>> groups = new List<KeyValuePair<string, string>>();
string inputLine = input;
while(inputLine.Length > 0)
{
int lastColon = inputLine.LastIndexOf(":");
string value = inputLine.Substring(lastColon + 1).Trim();
int lastWordStart = inputLine.Substring(0, lastColon - 1).LastIndexOf(" ") + 1;
string name = inputLine.Substring(lastWordStart, lastColon - lastWordStart);
groups.Insert(0, new KeyValuePair<string,string>(name,value));
inputLine = inputLine.Substring(0, lastWordStart).Trim();
}
return groups;
}
}
}

Updating nested list values with Linq?

I am trying to update a nested list in C# which looks like this
List<Users>
- UserType
- List<UserComponents>
- - UserComponentKey
- - Count
Here's a written example:
List of users:
UserType = 1
UserComponents
- UserComponentKey = XYZ
- Count = 3
UserType = 2
UserComponents
- UserComponentKey = XYZ
- Count = 7
I need to update UserComponentKey XYZ for UserType 2 only. Currently my updates are broken: they change XYZ for all user types. Here are my current methods, neither of which works, because they update the UserComponent count value for ALL user types that contain the specified component key, not just the specific user type I am targeting.
CLASSES:
public class Users
{
public string UserType { get; set; }
public List<UserComponent> UserComponents { get; set; }
}
public class UserComponent
{
public string UserComponentKey { get; set; }
public int Count { get; set; }
}
METHOD 1:
Users.Where(us => us.UserType == "2")
.First().UserComponents
.Where(uc => uc.UserComponentKey == "XYZ")
.First().Count = value;
METHOD 2:
if(users.UserType == "2")
{
foreach(var component in users.UserComponents)
{
switch(component.UserComponentKey)
{
case "XYZ":
component.Count = value;
break;
}
}
}
CODE GENERATING LIST (similar to):
List<UserComponent> UserComponents = new List<UserComponent>();
if (Item.UserAddOn != null)
{
for (var i = 0; i < Item.UserAddOn.First().Count; i++)
{
UserComponents.Add(new UserComponent
{
UserComponentKey = Item.UserAddOn[i].ComponentKey,
Count = 0
});
}
}
if (Item.User != null)
{
for (var i = 0; i < Item.User.First().Count; i++)
{
Users.Add(new User()
{
UserType = Item.User[i].ComponentKey,
Count = 0,
UsersComponents = UserComponents
});
}
}
I have stripped out actual values etc, but hopefully someone can point me in the right direction here.
Thanks!
I'm missing the information needed to write a snippet you can use, so I will simply explain it. An object variable is in reality a reference (a pointer, if you are familiar with C/C++) to the location where the object resides. When you add an object to a list, you add its location. If you add the same object to multiple lists, each list holds the same location, and therefore editing the object through one list changes it in all of them.
var uc1 = new UserComponent { Count = 1 };
var uc2 = new UserComponent { Count = 2 };
var uc3 = new UserComponent { Count = 2 };
var u1 = new User();
var u2 = new User();
u1.UserComponents.Add(uc1);
u1.UserComponents.Add(uc2);
u2.UserComponents.Add(uc1);
u2.UserComponents.Add(uc3);
Console.Write(u1.UserComponents[0].Count); //Outputs 1
Console.Write(u1.UserComponents[1].Count); //Outputs 2
Console.Write(u2.UserComponents[0].Count); //Outputs 1
Console.Write(u2.UserComponents[1].Count); //Outputs 2
u2.UserComponents[0].Count = 5;
u2.UserComponents[1].Count = 6;
Console.Write(u1.UserComponents[0].Count); //Outputs 5 (uc1 is shared with u2)
Console.Write(u1.UserComponents[1].Count); //Outputs 2 (uc2 belongs only to u1)
Console.Write(u2.UserComponents[0].Count); //Outputs 5
Console.Write(u2.UserComponents[1].Count); //Outputs 6 (uc3 belongs only to u2)
So your code to change values is fine, but when you build up your list, you need to create distinct UserComponents if they are not linked together.
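A minimal sketch of building the list so that no two users share a component object. The `UserFactory` helper and its parameters are hypothetical; the two classes repeat the ones from the question so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UserComponent
{
    public string UserComponentKey { get; set; }
    public int Count { get; set; }
}

public class Users
{
    public string UserType { get; set; }
    public List<UserComponent> UserComponents { get; set; }
}

public static class UserFactory
{
    // Creates one Users entry per user type. The inner Select runs once per user,
    // so every user gets its own list holding freshly created UserComponent objects.
    public static List<Users> Build(IEnumerable<string> userTypes, IEnumerable<string> componentKeys)
    {
        var keys = componentKeys.ToList();
        return userTypes
            .Select(type => new Users
            {
                UserType = type,
                UserComponents = keys
                    .Select(key => new UserComponent { UserComponentKey = key, Count = 0 })
                    .ToList()
            })
            .ToList();
    }
}
```

With a list built this way, setting `Count` through the user whose `UserType` is "2" should leave the component with the same key under user type "1" untouched.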
Your first call to First() is wrong. Try it like this:
Users.Where((us) => us.UserType == "2")
.SelectMany((us) => us.UserComponents)
.Where((uc) => uc.UserComponentKey == "XYZ")
.First()
.Count = value;
Suggestion: Why don't you make UserType an int?
Maybe this helps:
List<Users> _users = new List<Users>();
_users.Add(new Users() { UserType = "1", UserComponents = new List<UserComponent>() { new UserComponent() { Count = 0, UserComponentKey = "XYZ" } } });
_users.Add(new Users() { UserType = "2", UserComponents = new List<UserComponent>() { new UserComponent() { Count = 2, UserComponentKey = "XYZ" } } });
_users.Add(new Users() { UserType = "3", UserComponents = new List<UserComponent>() { new UserComponent() { Count = 5, UserComponentKey = "XYZ" } } });
_users.Where(us => us.UserType == "2").First().UserComponents.Where(uc => uc.UserComponentKey == "XYZ").First().Count = 356;
foreach (Users us in _users)
{
Console.WriteLine("UserType: " + us.UserType);
foreach (UserComponent uc in us.UserComponents)
{
Console.WriteLine("Key: {0} Value: {1}", uc.UserComponentKey, uc.Count);
}
}

Read a file 2 by 2 lines using Linq

I'm trying to read a simple TXT file using LINQ, but my difficulty is reading the file two lines at a time. For this I wrote a simple function, but I believe it should be possible to read the TXT file in pairs of lines directly with LINQ...
My code to read the text lines is:
private struct Test
{
public string Line1, Line2;
};
static List<Test> teste_func(string[] args)
{
List<Test> exemplo = new List<Test>();
var lines = File.ReadAllLines(args[0]).Where(x => x.StartsWith("1") || x.StartsWith("7")).ToArray();
for(int i=0;i<lines.Length;i++)
{
Test aux = new Test();
aux.Line1 = lines[i];
i+=1;
aux.Line2 = lines[i];
exemplo.Add(aux);
}
return exemplo;
}
Before I create this function, I tried to do this:
var lines = File.ReadAllLines(args[0]).Where(x=>x.StartsWith("1") || x.StartsWith("7")).Select(x =>
new Test
{
Line1 = x.Substring(0, 10),
Line2 = x.Substring(0, 10)
});
But obviously this processes the file line by line and creates a new struct for each single line...
So, how can I read the lines two at a time with LINQ?
--- Edit
Maybe it's possible to create a new LINQ extension function to do that?
Func<T> Get2Lines<T>(this Func<T> obj....) { ... }
Something like this?
public static IEnumerable<B> MapPairs<A, B>(this IEnumerable<A> sequence,
Func<A, A, B> mapper)
{
var enumerator = sequence.GetEnumerator();
while (enumerator.MoveNext())
{
var first = enumerator.Current;
if (enumerator.MoveNext())
{
var second = enumerator.Current;
yield return mapper(first, second);
}
else
{
//What should we do with left over?
}
}
}
Then
File.ReadAllLines(...)
.Where(...)
.MapPairs((a1,a2) => new Test() { Line1 = a1, Line2 = a2 })
.ToList();
File.ReadLines("example.txt")
.Where(x => x.StartsWith("1") || x.StartsWith("7"))
.Select((l, i) => new {Index = i, Line = l})
.GroupBy(o => o.Index / 2, o => o.Line)
.Select(g => new Test(g));
public struct Test
{
public Test(IEnumerable<string> src)
{
var tmp = src.ToArray();
Line1 = tmp.Length > 0 ? tmp[0] : null;
Line2 = tmp.Length > 1 ? tmp[1] : null;
}
public string Line1 { get; set; }
public string Line2 { get; set; }
}
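On .NET 6 or later, the built-in `Enumerable.Chunk` method offers another way to take the lines two at a time. The sketch below is an alternative to the answers above, not a replacement for them; `LinePairs` and `Pair` are hypothetical names, and a value tuple stands in for the Test struct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinePairs
{
    // Filters the lines as in the question, then groups them into consecutive pairs.
    // Chunk(2) yields arrays of up to two elements; a trailing single line gets a null Line2.
    public static List<(string Line1, string Line2)> Pair(IEnumerable<string> lines)
    {
        return lines
            .Where(x => x.StartsWith("1") || x.StartsWith("7"))
            .Chunk(2)
            .Select(c => (Line1: c[0], Line2: c.Length > 1 ? c[1] : null))
            .ToList();
    }
}
```

A call such as `LinePairs.Pair(File.ReadLines("example.txt"))` would stream the file instead of loading it whole, since `File.ReadLines` is lazy.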

C# merge field values if the ID's match

How do I merge objects in a list if their IDs match and their texts don't?
My objects are added to a list:
List<MyObject> list = new List<MyObject>();
This could be my list (each row is an object):
ID Text
1 this is some text
2 text1
1 more text
1 a little more
2 text 2
3 XXX
Then I would like the result to be:
ID Text
1 this is some text more text a little more
2 text1 text2
3 XXX
I've tried with a for loop inside a for loop, but I just can't figure it out.
for (int i = 0; i < OrderList.Count; i++)
{
bool existsMoreThanOnce = false;
for (int j = i; j < OrderList.Count; j++)
{
duplicates.Add(OrderList[i]);
if (OrderList[i].OrderNumber == OrderList[j].OrderNumber && OrderList[i].OrderText != OrderList[j].OrderText)
{
if(!uniques.Contains(OrderList[j]))
{
duplicates.Add(OrderList[j]);
existsMoreThanOnce = true;
}
}
}
if (existsMoreThanOnce == false)
{
uniques.Add(OrderList[i]);
}
}
First I create a class
public class Items
{
public int ID { get; set; }
public string Text { get; set; }
public Items(int id, string text)
{
ID = id;
Text = text;
}
}
Now the logic of my code is:
List<Items> objItems = new List<Items>();
objItems.Add(new Items(1,"Rahul"));
objItems.Add(new Items(2, "Rohit"));
objItems.Add(new Items(1, "Kumar"));
objItems.Add(new Items(2, "Verma"));
List<Items> objNew = new List<Items>(); //it will hold result
string str = "";
for (int i = 0; i < objItems.Count; i++)
{
if (objItems[i].ID > 0)
{
str = objItems[i].Text;
for (int j = i + 1; j < objItems.Count; j++)
{
if (objItems[i].ID == objItems[j].ID)
{
str += " " + objItems[j].Text; // separator goes before each appended text
objItems[j].ID = -1;
}
}
objNew.Add(new Items(objItems[i].ID, str));
}
}
The objNew list contains the required output.
var result = list1.Concat(list2)
.GroupBy(x => x.ID)
.Where(g => g.GroupBy(x=>x.Text).Count() > 1)
.Select(x => x.Key)
.ToList();
You can start with LINQ's GroupBy.
var output = input.GroupBy(i => i.ID)
.Select(i => new { ID = i.Key,
Text = String.Join(" ",
i.Select(x => x.Text).ToArray()) });
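To make the GroupBy approach concrete, here is a small self-contained sketch. `Merger` and `MergeById` are hypothetical names; `MyObject` follows the shape described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int ID { get; set; }
    public string Text { get; set; }
}

public static class Merger
{
    // Groups the items by ID and joins the texts of each group with a single space.
    // LINQ to Objects keeps groups in order of first appearance and preserves
    // the original order of the items inside each group.
    public static List<MyObject> MergeById(IEnumerable<MyObject> input)
    {
        return input
            .GroupBy(o => o.ID)
            .Select(g => new MyObject
            {
                ID = g.Key,
                Text = string.Join(" ", g.Select(o => o.Text))
            })
            .ToList();
    }
}
```

If identical texts under the same ID should appear only once, adding `.Distinct()` after `g.Select(o => o.Text)` would probably be enough.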
First I created a class to hold your list items:
public class MyObject
{
public int ID { get; set; }
public string Text { get; set; }
}
Then I inserted the dummy values into it:
List<MyObject> obj = new List<MyObject>
{
new MyObject{ID=1, Text="this is some text"},
new MyObject{ID=2, Text="text1"},
new MyObject{ID=1, Text="more text"},
new MyObject{ID=1, Text="a little more"},
new MyObject{ID=2, Text="text 2"},
new MyObject{ID=3, Text="XXX"}
};
List<MyObject> obj2 = new List<MyObject>(); //this list will hold your output
//the linq query will filter out the uniques ids.
var uniqueIds = (from a in obj select new { a.ID, a.Text }).GroupBy(x => x.ID).ToList();
//then iterated through all the unique ids to merge the texts and list them under the unique ids.
int id=0;
foreach (var item in uniqueIds)
{
string contText = "";
for (int j = 0; j < item.Count(); j++)
{
contText += item.ElementAt(j).Text + " ";
id = item.ElementAt(j).ID;
}
obj2.Add(new MyObject { ID = id, Text = contText });
}
The list obj2 will then hold your desired output.

Linq write new list from old list sublist, change said list, write back to old list

I'm rather new to Linq. I'm having trouble coding this.
I have a list with many different sublists.
oldList[0] some type
oldList[1] another different type
oldList[2] the type I want
oldList[3] more types
I want to select all the parameters from a specific type and write them to a temp list.
If that temp list is empty, I want to assign some values (values don't actually matter).
After changing the values, I want to write temp list back into oldList.
Please advise. This is a huge learning experience for me.
public void myFunction(list)
{
//list contains at least 5 sublists of various type
//check if the type I want is null
IEnumerable<TypeIWant> possiblyEmptyList = list.OfType<TypeIWant>(); //find the type I want from the list and save it
if (possiblyEmptyList == null) //this doesn't work and possiblyEmptyList.Count <= 1 doesn't work
{
//convert residence address to forwarding address
IEnumerable<ReplacementType> replacementList = list.OfType<ReplacementType>();
forwardingAddress = replacementList.Select(x => new TypeIWant /* this statement functions exactly the way I want it to */
{
Address1 = x.Address1,
Address2 = x.Address2,
AddressType = x.AddressType,
City = x.City,
CountryId = x.CountryId,
CountyRegion = x.CountyRegion,
Email = x.Email,
ConfirmEmail = x.ConfirmEmail,
Fax = x.Fax,
Telephone = x.Telephone,
State = x.State,
PostalCode = x.PostalCode
});
//write forwarding address back to list
//don't know how to do this
}
LINQ's purpose is querying. It doesn't let you replace items in a collection with other items. Use a simple loop instead:
IEnumerable<TypeIWant> possiblyEmptyList = list.OfType<TypeIWant>();
if (!possiblyEmptyList.Any())
{
for (int i = 0; i < list.Count; i++)
{
ReplacementType item = list[i] as ReplacementType;
if (item == null)
continue;
list[i] = ConvertToTypeIWant(item);
}
}
And the conversion (which is better done with something like AutoMapper):
private TypeIWant ConvertToTypeIWant(ReplacementType x)
{
return new TypeIWant
{
Address1 = x.Address1,
Address2 = x.Address2,
AddressType = x.AddressType,
City = x.City,
CountryId = x.CountryId,
CountyRegion = x.CountyRegion,
Email = x.Email,
ConfirmEmail = x.ConfirmEmail,
Fax = x.Fax,
Telephone = x.Telephone,
State = x.State,
PostalCode = x.PostalCode
};
}
Not LINQ but an example.
class Program
{
static void Main(string[] args)
{
// Vars
var list = new List<List<string>>();
var a = new List<string>();
var b = new List<string>();
var c = new List<string> { "one", "two", "three" };
var d = new List<string>();
// Add Lists
list.Add(a);
list.Add(b);
list.Add(c);
list.Add(d);
// Loop through list
foreach (var x in list)
{
if (x.Count < 1)
{
var tempList = new List<string>();
tempList.Add("some value");
x.Clear();
x.AddRange(tempList);
}
}
// Print
int count = 0;
foreach (var l in list)
{
count++;
Console.Write("(List " + count + ") ");
foreach (var s in l)
{
Console.Write(s + " ");
}
Console.WriteLine("");
}
}
}
(List 1) some value
(List 2) some value
(List 3) one two three
(List 4) some value
