Displaying sentences using string chunks - C#

Here is a program I wrote to display all the strings in an XML file that contain both "what" and "your". The XML file contains a few sentences such as:
how are you, what is your name, what is your school name. My program only displays a sentence if the two words appear directly one after the other. How can I break the search string into chunks and then check each chunk against the XML?
The code which I tried is:
var doc = XDocument.Load("dic.xml");
string findString = "what your";
var results = doc.Descendants("s")
.Where(d => d.Value.Contains(findString.ToLower()))
.Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Thanks in advance.

You need to check whether each result contains both "what" and "your". Your original code was looking for the single string "what your", not the two separate strings "what" and "your". See the documentation for string.Contains(string) for details.
Code
var doc = XDocument.Load("dic.xml");
// Keep only the elements whose text contains both words, in any position.
var results = doc.Descendants("s").Where(d => d.Value.Contains("what") && d.Value.Contains("your")).Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Edit: Misread your original code and put the filtering in the wrong spot
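To cover the "break a string into chunks" part of the question, here is a minimal sketch. It assumes the chunks are separated by whitespace and that the match should ignore case. The names ChunkSearch, ToChunks, ContainsAllChunks and Filter are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSearch
{
    // Splits the search text on whitespace and returns the lower-cased chunks.
    public static string[] ToChunks(string findString)
    {
        return findString
            .ToLowerInvariant()
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // True when the sentence contains every chunk, in any order or position.
    public static bool ContainsAllChunks(string sentence, string[] chunks)
    {
        string lowered = sentence.ToLowerInvariant();
        return chunks.All(chunk => lowered.Contains(chunk));
    }

    // Filters a sequence of sentences down to those that contain every chunk.
    public static List<string> Filter(IEnumerable<string> sentences, string findString)
    {
        string[] chunks = ToChunks(findString);
        return sentences.Where(s => ContainsAllChunks(s, chunks)).ToList();
    }
}
```

With the XML from the question this could be called as ChunkSearch.Filter(doc.Descendants("s").Select(d => d.Value), "what your"). Note that this is a substring test, so a chunk such as "you" would also match "your".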

Related

Splitting of values in a JSON array

I have my JSON ["[\"~:bbl:P5085\",\"~:cosco:NoTag\"]"] coming in through
options.Type1.Values()
I am trying to keep only the values tagged with bbl, so from the above I want to keep P5085 and remove everything else. There can be multiple bbl values and I need to keep all of them. I tried the code below but it is not working. The split gives me
P5085","~:cosco
I don't understand what I am doing wrong in the code below. Can someone provide a fix?
private void InitializePayload(JsonTranslatorOptions options)
{
_payload.Add("ubsub:attributes", _attributes);
_payload.Add("ubsub:relations", _relations);
JArray newType = new JArray();
foreach (JValue elem in options.Type1.Values())
{
if (elem.ToString().Contains("rdl"))
{
string val = elem.ToString().Split(":")[1];
newType.Add(val);
}
}
_payload.Add("ubsub:type", newType);
}
Try this:
var input = "['[\"~:bbl:P5085\",\"~:cosco:NoTag\"]']";
var BBLs_List = JArray.Parse(input)
.SelectMany(m => JArray.Parse(m.ToString()))
.Select(s => s.ToString().Split(":"))
.Where(w => w[1] == "bbl")
.Select(s => s[2])
.ToList();
As I explain in the comments this isn't JSON, except at the top level which is an array with a single string value. That specific string could be parsed as a JSON array itself, but its values can't be handled as JSON in any way. They're just strings.
While you could try parsing and splitting that string, it would be a lot safer to find the actual specification of that format and write a parser for it. Or find a library for that API.
You could use the following code for parsing, but it's slow, not very readable and based on assumptions that can easily break - what happens if a value contains a colon?
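foreach (var longString in JArray.Parse(input))
{
    // Each top-level element is a string that itself holds a JSON array.
    foreach (var smallString in JArray.Parse((string)longString))
    {
        // Convert the token to a string before splitting it.
        var values = ((string)smallString).Split(':');
        if (values[1] == "bbl")
        {
            return values[2];
        }
    }
}
return null;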
You could convert that to LINQ, but that would be just as hard to read:
var value = JArray.Parse(input)
    .SelectMany(longString => JArray.Parse((string)longString))
    .Select(smallString => ((string)smallString).Split(':'))
    .Where(values => values[1] == "bbl")
    .Select(values => values[2])
    .FirstOrDefault();
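On the "what happens if a value contains a colon?" point, one possible guard is to limit the split to three parts and to skip malformed entries. The sketch below assumes the inner strings have already been extracted from the JSON and always have the shape "~:tag:value". That shape is only inferred from the single sample in the question, not from any specification, and the names TaggedValues and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TaggedValues
{
    // Returns the value part of every entry shaped like "~:tag:value" whose tag matches.
    // Entries with fewer than three parts are skipped instead of throwing.
    public static List<string> Extract(IEnumerable<string> entries, string tag)
    {
        var result = new List<string>();
        foreach (string entry in entries)
        {
            // Limit the split to three parts so that colons inside the value are kept.
            string[] parts = entry.Split(new[] { ':' }, 3);
            if (parts.Length == 3 && parts[1] == tag)
            {
                result.Add(parts[2]);
            }
        }
        return result;
    }
}
```

Unlike FirstOrDefault, this keeps every matching value, which is what the question asks for when several bbl entries are present.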

How to compare list of strings to a string where elements in the list might have letters be scrambled up?

I'm trying to write a lambda expression to compare members of a list to a string, but I also need to catch elements in the list that might have their letter scrambled up.
Here's the code I have right now:
List<string> listOfWords = new List<String>() { "abc", "test", "teest", "tset"};
var word = "test";
var results = listOfWords.Where(s => s == word);
foreach (var i in results)
{
Console.Write(i);
}
So this code will find string "test" in the list and will print it out, but I also want it to catch cases like "tset". Is this possible to do easily with linq or do I have to use loops?
How about sorting the letters and seeing if the resulting sorted sequences of chars are equal?
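// Sort the target once and materialize it so it is not re-sorted for every list element.
var wordSorted = word.OrderBy(c => c).ToArray();
var matches = listOfWords.Where(w => w.OrderBy(c => c).SequenceEqual(wordSorted));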
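The sorted-letters idea can also be written with a reusable key, which is handy if you compare against many words. This is a minimal sketch; the names Anagrams, SortedKey and FindScrambled are made up for illustration, and the comparison is case-sensitive and ordinal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Anagrams
{
    // Builds a canonical key by sorting the characters of the word.
    public static string SortedKey(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Returns every word that uses exactly the same letters as the target.
    public static List<string> FindScrambled(IEnumerable<string> words, string target)
    {
        string targetKey = SortedKey(target);
        return words
            // The length check is a cheap filter that avoids sorting obvious mismatches.
            .Where(w => w.Length == target.Length && SortedKey(w) == targetKey)
            .ToList();
    }
}
```

For the list in the question, Anagrams.FindScrambled(listOfWords, "test") keeps "test" and "tset" and drops "abc" and "teest".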

Transforming List<string> into a tokenised string

I have a list of strings in a List container class that look like the following:
MainMenuItem|MenuItem|subItemX
..
..
..
..
MainMenuItem|MenuItem|subItem99
What I am trying to do is transform the string, using LINQ, so that the first item for each of the tokenised string is removed.
This is the code I already have:
protected static List<string> _menuItems = GetMenuItemsFromXMLFile();
_menuItems.Where(x => x.Contains(menuItemToSearch)).ToList();
First line of code is returning an entire XML file with all the menu items that exist within an application in a tokenised form;
The second line is saying 'get me all menu items that belong to menuItemToSearch'.
menuItemToSearch is contained in the delimited string that is returned. How do I remove it using linq?
EXAMPLE
Before transform: MainMenuItem|MenuItem|subItem99
After transform : MenuItem|subItem99
Hope the example illustrates my intentions
Thanks
You can take the substring that starts just after the first pipe symbol '|' to remove the first item from a string, like this:
var str = "MainMenuItem|MenuItem|subItemX";
var dropFirst = str.Substring(str.IndexOf('|')+1);
Apply this to all strings from the list in a LINQ Select to produce the desired result:
var res = _menuItems
.Where(x => x.Contains(menuItemToSearch))
.Select(str => str.Substring(str.IndexOf('|')+1))
.ToList();
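One thing to watch: Contains matches menuItemToSearch anywhere in the string, not only in the first token. If the item to remove is always the first token (an assumption based on the example in the question), a prefix test is stricter. A small sketch with hypothetical names MenuItems and ChildrenOf:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MenuItems
{
    // Keeps the items whose first token equals parent and drops that token.
    public static List<string> ChildrenOf(IEnumerable<string> items, string parent)
    {
        string prefix = parent + "|";
        return items
            .Where(item => item.StartsWith(prefix, StringComparison.Ordinal))
            .Select(item => item.Substring(prefix.Length))
            .ToList();
    }
}
```

Here MenuItems.ChildrenOf(_menuItems, menuItemToSearch) would skip an entry like "Other|MainMenuItem|x", which the Contains version would keep.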
Maybe something like this can help you.
var regex = new Regex("[^\\|]+\\|(.+)");
var list = new List<string>(new string[] { "MainMenuItem|MenuItem|subItem99", "MainMenuItem|MenuItem|subItem99" });
// Groups[1].Value is the captured text after the first pipe; without .Value the result would be a list of Group objects.
var result = list.Where(p => regex.IsMatch(p)).Select(p => regex.Match(p).Groups[1].Value).ToList();
This should work correctly.

Process part of the regex match before replacing it

I'm writing a function that will parse a file similar to an XML file from a legacy system.
....
<prod pid="5" cat='gov'>bla bla</prod>
.....
<prod cat='chi'>etc etc</prod>
....
.....
I currently have this code:
buf = Regex.Replace(entry, "<prod(?:.*?)>(.*?)</prod>", "<span class='prod'>$1</span>");
Which was working fine until it was decided that we also wanted to show the categories.
The problem is, categories are optional and I need to run the category abbreviation through a SQL query to retrieve the category's full name.
eg:
SELECT * FROM cats WHERE abbr='gov'
The final output should be:
<span class='prod'>bla bla</span><span class='cat'>Government</span>
Any idea on how I could do this?
Note1: The function is done already (except this part) and working fine.
Note2: Cannot use XML libraries, regex has to be used
Regex.Replace has an overload that takes a MatchEvaluator, which is basically a Func<Match, string>. So, you can dynamically generate a replacement string.
buf = Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match => {
    var attrText = match.Groups["attr"].Value;
    var text = match.Groups["text"].Value;
    // Now, parse your attributes (\1 refers back to the opening quote character)
    var attributes = Regex.Matches(attrText, @"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1")
        .Cast<Match>()
        .ToDictionary(
            m => m.Groups["name"].Value,
            m => m.Groups["value"].Value);
    string category;
    if (attributes.TryGetValue("cat", out category))
    {
        // Your SQL here etc...
        var label = GetLabelForCategory(category);
        return String.Format("<span class='prod'>{0}</span><span class='cat'>{1}</span>", WebUtility.HtmlEncode(text), WebUtility.HtmlEncode(label));
    }
    // Generate the result string
    return String.Format("<span class='prod'>{0}</span>", WebUtility.HtmlEncode(text));
});
This should get you started.
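For reference, a self-contained variant of the same MatchEvaluator idea. It is only a sketch: the dictionary stands in for the SQL lookup, the names ProdRewriter and Rewrite are hypothetical, and categories that are missing from the dictionary are silently skipped (that behaviour is an assumption, the question does not say what should happen).

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ProdRewriter
{
    private static readonly Regex ProdTag =
        new Regex(@"<prod(?<attr>.*?)>(?<text>.*?)</prod>", RegexOptions.Singleline);

    // Matches cat='...' or cat="..."; \1 refers back to the opening quote.
    private static readonly Regex CatAttribute =
        new Regex(@"\bcat\s*=\s*(['""])(?<value>.*?)\1");

    // categoryNames is a stand-in for the database lookup (abbreviation -> full name).
    public static string Rewrite(string entry, IDictionary<string, string> categoryNames)
    {
        return ProdTag.Replace(entry, match =>
        {
            string text = WebUtility.HtmlEncode(match.Groups["text"].Value);
            string result = "<span class='prod'>" + text + "</span>";

            // The category is optional, so only append it when it is present and known.
            Match cat = CatAttribute.Match(match.Groups["attr"].Value);
            string fullName;
            if (cat.Success && categoryNames.TryGetValue(cat.Groups["value"].Value, out fullName))
            {
                result += "<span class='cat'>" + WebUtility.HtmlEncode(fullName) + "</span>";
            }
            return result;
        });
    }
}
```

If the lookup really has to hit the database for every match, it may be worth loading the cats table into such a dictionary once, rather than running one query per tag.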

Speedily Read and Parse Data

As of now, I am using this code to open a file and read it into a list and parse that list into a string[]:
string CP4DataBase =
"C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;
foreach (var line in splitCP4DataBaseLines)
tempCP4List.Add(line + Environment.NewLine);
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
.Split(concattedUnitPart)
.Where(c => !string.IsNullOrEmpty(c)).ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on.. and will only get larger.
The REGEX is placing each "UNIT PARTS" and the following code until the NEXT "UNIT PARTS" into a string[].
After this, I am checking each string[] to see if the "NAME" section exists in a different list. If it does exist, I am outputting that "UNIT PARTS" at the end of a textfile.
This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
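Each concatenation copies the entire string built so far, so the total work grows quadratically with the number of lines. Use a StringBuilder for repeated concatenation:
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
    builder.AppendLine(line);
    line1CP4PartLines++;
}
string concattedUnitPart = builder.ToString();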
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine,
splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
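If line-by-line processing is acceptable, the grouping can be done without a regex or a RichTextBox. The following is a rough sketch under some assumptions: every unit starts with a line that is exactly "UNIT","PARTS", blank lines can be dropped, and anything before the first marker (such as the ***CP4*** header) is ignored, which differs from Regex.Split, which would return that header as the first element. The names UnitSplitter and SplitIntoUnits are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class UnitSplitter
{
    private const string UnitMarker = "\"UNIT\",\"PARTS\"";

    // Groups lines into blocks; each block starts at a UNIT/PARTS marker line.
    public static List<string> SplitIntoUnits(IEnumerable<string> lines)
    {
        var units = new List<string>();
        StringBuilder current = null;

        foreach (string rawLine in lines)
        {
            string line = rawLine.TrimEnd('\r');
            if (line.Trim() == UnitMarker)
            {
                // A new unit begins: store the previous one, if any.
                if (current != null) units.Add(current.ToString());
                current = new StringBuilder();
                continue; // the marker itself is dropped, as Regex.Split would do
            }
            // Lines before the first marker and empty lines are ignored.
            if (current != null && line.Length > 0)
            {
                current.AppendLine(line);
            }
        }
        if (current != null) units.Add(current.ToString());
        return units;
    }
}
```

It could be fed with File.ReadLines(CP4DataBase) from System.IO, which streams the file instead of loading it into a control first.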
You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);
string CP4DataBase =
"C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
List<string> line1CP4Components = new List<string>(
UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
.OfType<Match>()
.Select(m => m.Value)
);
return line1CP4Components.ToArray();
You may be able to drop StripEmptyLines, but your original code is doing this via the Where(c => !string.IsNullOrEmpty(c)). Also, your original code duplicates the '\r' of each "\r\n" (carriage return/line feed) pair. I assumed this was accidental rather than intentional.
Also, you don't seem to be using the value in 'line1CP4PartLines', so I omitted it. It was seemingly inconsistent with the later omission of empty lines, so I guess you're not depending on it. If you need this value, a simple regex can tell you how many lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;
// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));
//or create a Dictionary<int,string[]> object
string strData = string.Empty;//hold the line item data which is read in line by line
string[] strStockListRecord = null;//string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; //dictionary object that will hold the KeyValuePair of text file contents in a DictList
List<string> lstStockListRecord = null;//Generic list that will store all the lines from the .prnfile being processed
if (File.Exists(strExtraLoadFileLoc + strFileName))
{
try
{
lstStockListRecord = new List<string>();
List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
int intLineCount = 0;
foreach (string strLineSplit in lstStrLinesStockRecord)
{
lstStockListRecord.Add(strLineSplit);
dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
lstStockListRecord.Clear();
intLineCount++;
}//foreach (string strlineSplit in lstStrLinesStockRecord)
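lstStrLinesStockRecord.Clear();
lstStrLinesStockRecord = null;
lstStockListRecord.Clear();
lstStockListRecord = null;
}
catch (IOException ex)
{
// Report or log the failure in whatever way suits your application.
Console.WriteLine(ex.Message);
}
}
//Alter the code to fit what you are doing..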
