Produce List with intent items from row text

Produce List with intent items from row text - c#

I have some row text. I want from this text to create Events and intent activities, put them on lists and then import them on a db. Let me be more specific. Let’s assume that I have the following sentence:
“On Party1 we will take some drinks, eat very good food and play ps3 games. On Party2 we will do karaoke and dance with hip hop songs.”
My desire output want to be:
Event 1: Party1
Activitie1: take some drinks
Activitie2: eat very good food
Activitie3: play ps3 games
Event2: Party2
Activitie1: do karaoke
Activitie2: dance with hip hop songs
What I have done until now, is allowing special characters “#,*” . So when I have * I produce an event and when I have # I produce an activity. As a result I can produce events and activities but I can’t match them. I need to find a way to match(intent) the produced elements. My code:
string astring = Convert.ToString(lblrowtext.Text);
if (astring.IndexOf("#") >= 0 || astring.IndexOf("*") >= 0)
{
string[] Eventsarray;
string[] Activitiessarray;
List<string> listofEvents = new List<string>();
List<string> listofActivities = new List<string>();
Activitiessarray = Regex.Matches(astring, #"#([^\*#]*)").Cast<Match>()
.Select(m => m.Groups[1].Value).ToArray();
Eventsarray = Regex.Matches(astring, #"\*([^\*#]*)").Cast<Match>()
.Select(m => m.Groups[1].Value).ToArray();
if (Activitiessarray != null)
{
if (Activitiessarray.Count() > 0)
{
listofActivities = new List<string>(Activitiessarray);
ViewState["listofActivities"] = listofActivities;
Repeater1.DataSource = listofActivities;
Repeater1.DataBind();
}
if (Eventsarray != null)
{
if (Eventsarray.Count() > 0)
{
listofEvents = new List<string>(Eventsarray);
ViewState["listofEvents"] = listofEvents;
Repeater2.DataSource = listofEvents;
Repeater2.DataBind();
}
}
}
}

You can achieve this like below:
private static void Main(string[] args)
{
var updationIndex = 0;
const string inputString = "On *Party1 #we will take some drinks, #eat very good food and #play ps3 games.# ";
Func<string, char, char, string> getMyString = (givenString, skipTill, takeTill) =>
{
var opString =
new string(
givenString.ToCharArray()
.SkipWhile(x => x != skipTill)
.Skip(1)
.TakeWhile(x => x != takeTill)
.ToArray());
updationIndex = inputString.IndexOf(givenString, StringComparison.CurrentCultureIgnoreCase)
+ opString.Length;
return opString;
};
var eventName = getMyString(inputString, '*', '#');
Console.WriteLine("Event" + eventName);
Console.WriteLine("Activities: ");
while (updationIndex < inputString.Length)
{
var activity = getMyString(inputString.Remove(0, updationIndex), '#', '#');
Console.WriteLine(activity);
if (string.IsNullOrWhiteSpace(activity))
{
break;
}
}
}

Create an Event class to which you add Activities. Each time you encounter a *, create a new Event instance.
Psuedo-C#:
class Event
{
string Description;
List<Activity> Activities;
}
class Activity
{
string Description;
}
List<Event> ParseEventLines(IEnumerable<string> allLines)
{
var result = new List<Event>();
Event currentEvent = null;
foreach (var line in allLines)
{
if (line.StartsWith("*"))
{
currentEvent = new Event { Description = line };
result.Add(currentEvent);
}
else if (line.StartsWith("#"))
{
currentEvent.Activities.Add(new Activity { Description = line });
}
}
return result;
}

Related

binary search in a sorted list in c#

I am retrieving client id\ drum id from a file and storing them in a list.
then taking the client id and storing it in another list.
I need to display the client id that the user specifies (input_id) on a Datagrid.
I need to get all the occurrences of this specific id using binary search.
the file is already sorted.
I need first to find the occurrences of input_id in id_list.
The question is: how to find all the occurrences of input_id in the sorted list id_list using binary search?
using(StreamReader sr= new StreamReader(path))
{
List<string> id_list = new List<string>();
List<string> all_list= new List<string>();
List<int> indexes = new List<int>();
string line = sr.ReadLine();
line = sr.ReadLine();
while (line != null)
{
all_list.Add(line);
string[] break1 = line.Split('/');
id_list.Add(break1[0]);
line = sr.ReadLine();
}
}
string input_id = textBox1.Text;
Data in the file:
client id/drum id
-----------------
123/321
231/3213
321/213123 ...

If the requirement was to use binary search I would create a custom class with a comparer, and then find an element and loop forward/backward to get any other elements. Like:
static void Main(string[] args
{
var path = #"file path...";
// read all the Ids from the file.
var id_list = File.ReadLines(path).Select(x => new Drum
{
ClientId = x.Split('/').First(),
DrumId = x.Split('/').Last()
}).OrderBy(o => o.ClientId).ToList();
var find = new Drum { ClientId = "231" };
var index = id_list.BinarySearch(find, new DrumComparer());
if (index != -1)
{
List<Drum> matches = new List<Drum>();
matches.Add(id_list[index]);
//get previous matches
for (int i = index - 1; i > 0; i--)
{
if (id_list[i].ClientId == find.ClientId)
matches.Add(id_list[i]);
else
break;
}
//get forward matches
for (int i = index + 1; i < id_list.Count; i++)
{
if (id_list[i].ClientId == find.ClientId)
matches.Add(id_list[i]);
else
break;
}
}
}
public class Drum
{
public string DrumId { get; set; }
public string ClientId { get; set; }
}
public class DrumComparer : Comparer<Drum>
{
public override int Compare(Drum x, Drum y) =>
x.ClientId.CompareTo(y.ClientId);
}

If i understand you question right then this should be a simple where stats.
// read all the Ids from the file.
var Id_list = File.ReadLines(path).Select(x => new {
ClientId = x.Split('/').First(),
DrumId = x.Split('/').Last()
}).ToList();
var foundIds = Id_list.Where(x => x.ClientId == input_id);

creating array of bad names to check and replace in c#

I'm looking to create a method that loops through an list and replaces with matched values with a new value. I have something working below but it really doesnt follow the DRY principal and looks ugly.
How could I create a dictionary of value pairs that would hold my data of values to match and replace?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}

In your question, you can try to use linq and Replace to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as #user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}

If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}

If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));

We can try removing all the characters but letters and apostroph (Cote d'Ivoire has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '\''));
...

I made a comment under answer of #Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.

Use foreach to loop through list (or array) as sequence of pairs

I'm creating a simple program that reads username and password from Customers.txt file and verifies if it's correct. Customer.txt is formatted to have sequence of the username then password separated by commas (ignore safety concerns for the exercise):
JohnDoe,1234,JaneDoe,5678,...
Is it possible to create a foreach loop to iterate through the strArray[] and check pairs {strArray[0], strArray[1]} then {strArray[2],strArray[3]} and so on to see if user put the right credentials?
private void enter_click_Click(object sender, RoutedEventArgs e)
{
StreamReader reader1 = new StreamReader("Customers.txt");
string text = reader1.ReadToEnd();
reader1.Close();
string[] Strarray = text.Split(',');
StreamReader reader2 = new StreamReader("Admin.txt");
string text2 = reader2.ReadToEnd();
reader2.Close();
string[] AdminArray = text2.Split(',');
if (username_txt.Text == AdminArray[0] && passwordBox1.Password == AdminArray[1])
{
AdminPage admin = new AdminPage();
admin.Activate();
admin.Show();
method.CheckDate();
return;
}
if (username_txt.Text == Strarray[0] && passwordBox1.Password == Strarray[1])
{
ATM_Screen atm = new ATM_Screen();
atm.Activate();
atm.Show();
method.CheckDate();
return;
}

Yes, you can do this:
var usernames = Strarray.Where((s, i) => { return i % 2 == 0; });
var passwords = Strarray.Where((s, i) => { return i % 2 != 0; });
var userPasswords = usernames.Zip(passwords, (l, r) => new { username = l, password = r });
foreach(var userPassword in userPasswords) {
if (userPassword.username == "rob" && userPassword.password == "robspassword") {
}
}
Edit based on comment:
You can do this for multiple valid credentials:
var allowedCredentials = new List<Tuple<String, String>> {
new Tuple<String, String>("Rob", "Robspassword"),
new Tuple<String, String> ("SomeoneElse", "SomeoneElsespassword"
};
var inputCredentials = new List<string> { "Rob","Robspassword","Rob","Notrobspassword" };
var usernames = inputCredentials.Where((s, i) => { return i % 2 == 0; });
var passwords = inputCredentials.Where((s, i) => { return i % 2 != 0; });
var userPasswords = usernames.Zip(passwords, (l, r) => new { username = l, password = r });
foreach(var userPassword in userPasswords) {
if (allowedCredentials.Any(ac => ac.Item1 == userPassword.username
&& ac.Item2 == userPassword.password)
{
//Valid
}
}

Using foreach is kind of hard to get pairs of elements from sequence:
you can iterate through sequence normally and remember first element on the odd iterations, perform operation on every even iteration
you can convert sequence into pairs first using LINQ and indexing or Zip even and odd half.
Collecting pairs approach:
string name;
bool even = false;
foreach(var text in items)
{
if (even)
{
var password = item;
// check name, password here
}
else
{
name = text;
}
even = !even;
}
Zip-based approach
foreach(var pair in items
.Where((v,id)=>id % 2 == 1) // odd - names
.Zip(items.Where((v,id)=>id % 2 == 0), // even - passwords
(name,password)=> new { Name = name, Password = password}))
{
// check pair.Name, pair.Password
}
Note: It would be much easier to use regular for loop with increment of 2.

How to make the custom parser for text file

Actually I set four columns using data table and I want this column retrieve value from text file. I used regex for remove the particular line from the text file.
My objective is that I want to show text file on the grid using data table so first I am trying to create data table and remove the line (show at the program) using regex.
Here I post my full code.
namespace class
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
StreamReader sreader = File.OpenText(#"C:\FareSearchRegex.txt");
string line;
DataTable dt = new DataTable();
DataRow dr;
dt.Columns.Add("PTC");
dt.Columns.Add("CUR");
dt.Columns.Add("TAX");
dt.Columns.Add("FARE BASIS");
while ((line = sreader.ReadLine()) != null)
{
var pattern = "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------";
var result = Regex.Replace(line,pattern," ");
dt.Rows.Add(line);
}
}
}
class Class1
{
string PTC;
string CUR;
float TAX;
public string gsPTC
{
get{ return PTC; }
set{ PTC = value; }
}
public string gsCUR
{
get{ return CUR; }
set{ CUR = value; }
}
public float gsTAX
{
get{ return TAX; }
set{ TAX = value; }
}
}
}

If your format is strict(e.g. always 4 columns) and you want to remove only this complete line i don't see any reason to use regex:
var rows = File.ReadLines(#"C:\FareSearchRegex.txt")
.Where(l => l != "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------")
.Select(l => new { line = l, items = l.Split(','), row = dt.Rows.Add() });
foreach (var x in rows)
x.row.ItemArray = x.items;
(assumed that the fields are separated by comma)
Edit: This works with your pastebin:
string header = " PTC CUR TAX FARE BASIS";
bool takeNextLine = false;
foreach (string line in File.ReadLines(#"C:\FareSearchRegex.txt"))
{
if (line.StartsWith(header))
takeNextLine = true;
else if (takeNextLine)
{
var tokens = line.Split(new[] { #" " }, StringSplitOptions.RemoveEmptyEntries);
dt.Rows.Add().ItemArray = tokens.Where((t, i) => i != 2).ToArray();
takeNextLine = false;
}
}
(since you have an empty column which you want to exclude from the result i've used the clumsy and possibly error-prone(?) query Where((t, i) => i != 2))

To parse the file you'll need to:
Split the text of the file into data chunks. A chunk, in your case can be identified by the header PTC CUR TAX FARE BASIS and by the TOTAL line. To split the text you'll need to tokenize the input as follows> (i) define a regular expression to match the headers, (ii) define a regular expression to match the Total lines (footers); Using (i) and (ii) you can join them by the order of appearance index and determine the total size of each chunk (see the line with (x,y)=>new{StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length}) below). Use String.Substring method to separate the chunks.
Extract the data from each individual chunk. Knowing that data is split by lines you just have to iterate through all lines in a chunk (ignoring header and footer) and process each line.
This code should help:
string file = #"C:\FareSearchRegex.txt";
string text = File.ReadAllText(file);
var headerRegex = new Regex(#"^(\)>)?\s+PTC\s+CUR\s+TAX\s+FARE BASIS$", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var totalRegex = new Regex(#"^\s+TOTAL[\w\s.]+?$",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var lineRegex = new Regex(#"^(?<Num>\d+)?\s+(?<PTC>[A-Z]+)\s+\d+\s(?<Cur>[A-Z]{3})\s+[\d.]+\s+(?<Tax>[\d.]+)",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var dataIndices =
headerRegex.Matches(text).Cast<Match>()
.Select((m, index) => new{ Index = index, Match = m })
.Join(totalRegex.Matches(text).Cast<Match>().Select((m, index) => new{ Index = index, Match = m }),
x => x.Index,
x => x.Index,
(x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length });
var items = dataIndices
.Aggregate(new List<string>(), (list, x) =>
{
var item = text.Substring(x.StartIndex, x.EndIndex - x.StartIndex);
list.Add(item);
return list;
});
var result = items.SelectMany(x =>
{
var lines = x.Split(new string[]{Environment.NewLine, "\r", "\n"}, StringSplitOptions.RemoveEmptyEntries);
return lines.Skip(1) //Skip header
.Take(lines.Length - 2) // Ignore footer
.Select(line =>
{
var match = lineRegex.Match(line);
return new
{
Ptc = match.Groups["PTC"].Value,
Cur = match.Groups["Cur"].Value,
Tax = Convert.ToDouble(match.Groups["Tax"].Value)
};
});
});

Determining value jumps in List<T>

I have a class:
public class ShipmentInformation
{
public string OuterNo { get; set; }
public long Start { get; set; }
public long End { get; set; }
}
I have a List<ShipmentInformation> variable called Results.
I then do:
List<ShipmentInformation> FinalResults = new List<ShipmentInformation>();
var OuterNumbers = Results.GroupBy(x => x.OuterNo);
foreach(var item in OuterNumbers)
{
var orderedData = item.OrderBy(x => x.Start);
ShipmentInformation shipment = new ShipmentInformation();
shipment.OuterNo = item.Key;
shipment.Start = orderedData.First().Start;
shipment.End = orderedData.Last().End;
FinalResults.Add(shipment);
}
The issue I have now is that within each grouped item I have various ShipmentInformation but the Start number may not be sequential by x. x can be 300 or 200 based on a incoming parameter. To illustrate I could have
Start = 1, End = 300
Start = 301, End = 600
Start = 601, End = 900
Start = 1201, End = 1500
Start = 1501, End = 1800
Because I have this jump I cannot use the above loop to create an instance of ShipmentInformation and take the first and last item in orderedData to use their data to populate that instance.
I would like some way of identifying a jump by 300 or 200 and creating an instance of ShipmentInformation to add to FinalResults where the data is sequnetial.
Using the above example I would have 2 instances of ShipmentInformation with a Start of 1 and an End of 900 and another with a Start of 1201 and End of 1800

Try the following:
private static IEnumerable<ShipmentInformation> Compress(IEnumerable<ShipmentInformation> shipments)
{
var orderedData = shipments.OrderBy(s => s.OuterNo).ThenBy(s => s.Start);
using (var enumerator = orderedData.GetEnumerator())
{
ShipmentInformation compressed = null;
while (enumerator.MoveNext())
{
var current = enumerator.Current;
if (compressed == null)
{
compressed = current;
continue;
}
if (compressed.OuterNo != current.OuterNo || compressed.End < current.Start - 1)
{
yield return compressed;
compressed = current;
continue;
}
compressed.End = current.End;
}
if (compressed != null)
{
yield return compressed;
}
}
}
Useable like so:
var finalResults = Results.SelectMany(Compress).ToList();

If you want something that probably has terrible performance and is impossible to understand, but only uses out-of-the box LINQ, I think this might do it.
var orderedData = item.OrderBy(x => x.Start);
orderedData
.SelectMany(x =>
Enumerable
.Range(x.Start, 1 + x.End - x.Start)
.Select(n => new { time = n, info = x))
.Select((x, i) => new { index = i, time = x.time, info = x.info } )
.GroupBy(t => t.time - t.info)
.Select(g => new ShipmentInformation {
OuterNo = g.First().Key,
Start = g.First().Start(),
End = g.Last().End });
My brain hurts.
(Edit for clarity: this just replaces what goes inside your foreach loop. You can make it even more horrible by putting this inside a Select statement to replace the foreach loop, like in rich's answer.)

How about this?
List<ShipmentInfo> si = new List<ShipmentInfo>();
si.Add(new ShipmentInfo(orderedData.First()));
for (int index = 1; index < orderedData.Count(); ++index)
{
if (orderedData.ElementAt(index).Start ==
(si.ElementAt(si.Count() - 1).End + 1))
{
si[si.Count() - 1].End = orderedData.ElementAt(index).End;
}
else
{
si.Add(new ShipmentInfo(orderedData.ElementAt(index)));
}
}
FinalResults.AddRange(si);

Another LINQ solution would be to use the Except extension method.
EDIT: Rewritten in C#, includes composing the missing points back into Ranges:
class Program
{
static void Main(string[] args)
{
Range[] l_ranges = new Range[] {
new Range() { Start = 10, End = 19 },
new Range() { Start = 20, End = 29 },
new Range() { Start = 40, End = 49 },
new Range() { Start = 50, End = 59 }
};
var l_flattenedRanges =
from l_range in l_ranges
from l_point in Enumerable.Range(l_range.Start, 1 + l_range.End - l_range.Start)
select l_point;
var l_min = 0;
var l_max = l_flattenedRanges.Max();
var l_allPoints =
Enumerable.Range(l_min, 1 + l_max - l_min);
var l_missingPoints =
l_allPoints.Except(l_flattenedRanges);
var l_lastRange = new Range() { Start = l_missingPoints.Min(), End = l_missingPoints.Min() };
var l_missingRanges = new List<Range>();
l_missingPoints.ToList<int>().ForEach(delegate(int i)
{
if (i > l_lastRange.End + 1)
{
l_missingRanges.Add(l_lastRange);
l_lastRange = new Range() { Start = i, End = i };
}
else
{
l_lastRange.End = i;
}
});
l_missingRanges.Add(l_lastRange);
foreach (Range l_missingRange in l_missingRanges) {
Console.WriteLine("Start = " + l_missingRange.Start + " End = " + l_missingRange.End);
}
Console.ReadKey(true);
}
}
class Range
{
public int Start { get; set; }
public int End { get; set; }
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Produce List with intent items from row text - c#

Related

binary search in a sorted list in c#

creating array of bad names to check and replace in c#

Use foreach to loop through list (or array) as sequence of pairs

How to make the custom parser for text file

Determining value jumps in List<T>

Categories

Resources