Process part of the regex match before replacing it - c#

I'm writing a function that will parse a file similar to an XML file from a legacy system.
....
<prod pid="5" cat='gov'>bla bla</prod>
.....
<prod cat='chi'>etc etc</prod>
....
.....
I currently have this code:
buf = Regex.Replace(entry, "<prod(?:.*?)>(.*?)</prod>", "<span class='prod'>$1</span>");
Which was working fine until it was decided that we also wanted to show the categories.
The problem is, categories are optional and I need to run the category abbreviation through a SQL query to retrieve the category's full name.
e.g.:
SELECT * FROM cats WHERE abbr='gov'
The final output should be:
<span class='prod'>bla bla</span><span class='cat'>Government</span>
Any idea on how I could do this?
Note 1: The function is done already (except this part) and working fine.
Note 2: I cannot use XML libraries; regex has to be used.

Regex.Replace has an overload that takes a MatchEvaluator, which is basically a Func<Match, string>. So, you can dynamically generate a replacement string.
buf = Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match => {
var attrText = match.Groups["attr"].Value;
var text = match.Groups["text"].Value;
// Now, parse your attributes (Cast and ToDictionary need using System.Linq;)
var attributes = Regex.Matches(attrText, @"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1")
.Cast<Match>()
.ToDictionary(
m => m.Groups["name"].Value,
m => m.Groups["value"].Value);
string category;
if (attributes.TryGetValue("cat", out category))
{
// Your SQL here etc...
var label = GetLabelForCategory(category);
return String.Format("<span class='prod'>{0}</span><span class='cat'>{1}</span>", WebUtility.HtmlEncode(text), WebUtility.HtmlEncode(label));
}
// Generate the result string
return String.Format("<span class='prod'>{0}</span>", WebUtility.HtmlEncode(text));
});
This should get you started.
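A self-contained sketch of the same MatchEvaluator idea, assuming C# 9+ top-level statements. The SQL lookup is replaced by a hypothetical in-memory dictionary (categoryNames) so the example runs on its own; only the 'gov' / 'Government' pair comes from the question, and an unknown abbreviation falls back to the abbreviation itself. Since only cat is needed, it uses a single Regex.Match on the attribute text instead of building a dictionary of all attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in for: SELECT ... FROM cats WHERE abbr = @abbr
var categoryNames = new Dictionary<string, string>
{
    ["gov"] = "Government"
};

string Transform(string entry) =>
    Regex.Replace(entry, @"<prod(?<attr>[^>]*)>(?<text>.*?)</prod>", match =>
    {
        string text = WebUtility.HtmlEncode(match.Groups["text"].Value);

        // Look for an optional cat='...' or cat="..." attribute; \1 closes with the same quote.
        Match cat = Regex.Match(match.Groups["attr"].Value, @"\bcat\s*=\s*(['""])(?<value>.*?)\1");
        if (!cat.Success)
            return $"<span class='prod'>{text}</span>";

        string abbr = cat.Groups["value"].Value;
        // Fall back to the raw abbreviation when the lookup has no entry.
        string label = categoryNames.TryGetValue(abbr, out var name) ? name : abbr;
        return $"<span class='prod'>{text}</span><span class='cat'>{WebUtility.HtmlEncode(label)}</span>";
    });

Console.WriteLine(Transform("<prod pid=\"5\" cat='gov'>bla bla</prod>"));
```

In the real function, the dictionary lookup would become your SQL call; using a parameterized query for the abbreviation keeps the legacy file's contents out of the SQL text.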

Related

Unity Lists Finding 2 letters in a list containing 5 letter words [duplicate]

I have a list like so and I want to be able to search within this list for a substring coming from another string. Example:
List<string> list = new List<string>();
string srch = "There";
list.Add("1234 - Hello");
list.Add("4234 - There");
list.Add("2342 - World");
I want to search for "There" within my list and return "4234 - There". I've tried:
var mySearch = list.FindAll(s => s.Substring(srch));
foreach(var temp in mySearch)
{
string result = temp;
}
With Linq, just retrieving the first result:
string result = list.FirstOrDefault(s => s.Contains(srch));
To do this without LINQ (e.g. on earlier .NET versions such as .NET 2.0), you can use List<T>'s FindAll method, which in this case returns all items in the list that contain the search term:
var resultList = list.FindAll(delegate(string s) { return s.Contains(srch); });
To return all the entries:
IEnumerable<string> result = list.Where(s => s.Contains(srch));
Only the first one:
string result = list.FirstOrDefault(s => s.Contains(srch));
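A minimal runnable sketch putting those answers together on the question's own sample data, assuming C# 9+ top-level statements; it uses only List<T>.FindAll, Enumerable.Where and Enumerable.FirstOrDefault.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "1234 - Hello", "4234 - There", "2342 - World" };
string srch = "There";

// First match only (null when no entry contains srch).
var first = list.FirstOrDefault(s => s.Contains(srch));

// Every match, via LINQ.
List<string> allLinq = list.Where(s => s.Contains(srch)).ToList();

// Every match without LINQ: List<T>.FindAll takes a Predicate<T>.
List<string> allFindAll = list.FindAll(s => s.Contains(srch));

Console.WriteLine(first); // 4234 - There
```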
What you've written causes the compile error:
The best overloaded method match for 'string.Substring(int)' has some invalid arguments
Substring is used to get part of a string by character position and, optionally, length.
For example,
srch.Substring(1, 3) returns the string "her"
As others have mentioned, you should use Contains, which tells you whether one string occurs within another. If you want to know the actual position, use IndexOf.
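A short sketch contrasting the three methods on the question's srch value; StringComparison.Ordinal is passed to IndexOf so the result does not depend on the current culture.

```csharp
using System;

string srch = "There";

// Substring(startIndex, length): 3 characters starting at index 1.
string part = srch.Substring(1, 3);

// Contains: does "her" occur anywhere inside srch?
bool hasHer = srch.Contains("her");

// IndexOf: where does it occur? -1 means "not found".
int pos = srch.IndexOf("her", StringComparison.Ordinal);
int missing = srch.IndexOf("xyz", StringComparison.Ordinal);

Console.WriteLine($"{part} {hasHer} {pos} {missing}"); // her True 1 -1
```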
I had to solve the same problem.
You need this:
myList.Where(listStringEntry => myString.IndexOf(listStringEntry) != -1)
where myList is a List<string> holding the values that myString has to contain at any position.
So, in effect, you check whether myString contains any of the entries in the list.
Hope this is what you wanted...
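A runnable sketch of that reversed search, with hypothetical values for myList and myString; Any is shown for the case where you only need a yes/no answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: the list holds fragments, myString is the text to scan.
var myList = new List<string> { "Hello", "There", "Moon" };
string myString = "4234 - There";

// Keep each list entry that occurs somewhere inside myString.
List<string> found = myList
    .Where(entry => myString.IndexOf(entry, StringComparison.Ordinal) != -1)
    .ToList();

// Or just ask whether any entry occurs at all.
bool anyFound = myList.Any(entry => myString.Contains(entry));
```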
I like to use IndexOf or Contains:
someString.IndexOf("this");
someString.Contains("this");
And for case-insensitive matching, use:
YourObj yourobj = list.FirstOrDefault(obj => obj.SomeString.ToLower().Contains("some substring"));
OR
YourObj yourobj = list.FirstOrDefault(obj => obj.SomeString.ToUpper().Contains("SOME SUBSTRING"));
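Converting with ToLower()/ToUpper() works, but it creates a new string for every element and depends on the current culture. A sketch of a swapped-in alternative, String.IndexOf with StringComparison.OrdinalIgnoreCase, on the question's list; ContainsIgnoreCase is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "1234 - Hello", "4234 - There", "2342 - World" };

// Case-insensitive "contains" without building lower/upper-case copies.
bool ContainsIgnoreCase(string text, string value) =>
    text.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;

var hit = list.FirstOrDefault(s => ContainsIgnoreCase(s, "THERE"));
```

On .NET Core 2.1 and later, s.Contains(value, StringComparison.OrdinalIgnoreCase) expresses the same thing directly.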

Make a new object from csv file using LINQ c#

I have a CSV file:
Date,Open,High,Low,Close,Volume,Adj Close
2011-09-23,24.90,25.15,24.69,25.06,64768100,25.06
2011-09-22,25.30,25.65,24.60,25.06,96278300,25.06
...
and I have a class StockQuote with fields
Date, Open, High...
How can I make a list of StockQuote objects from the CSV file using LINQ?
I'm trying something like this:
string[] Data = parser.ReadFields();
var query = from d in Data
where !String.IsNullOrWhiteSpace(d)
let data=d.Split(',')
select new StockQuote()
{
Date=data[0], Open=double.Parse(data[1]),
...
You can do something like this:
var yourData = File.ReadAllLines("yourFile.csv")
.Skip(1)
.Select(x => x.Split(','))
.Select(x => new
{
Date= x[0],
Open = double.Parse(x[1]),
High = double.Parse(x[2]),
Low = double.Parse(x[3]),
Close = double.Parse(x[4]),
Volume = double.Parse(x[5]),
AdjClose = double.Parse(x[6])
});
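The query above produces an anonymous type rather than StockQuote, and double.Parse without a format provider uses the current culture, so "24.90" can fail on machines whose decimal separator is a comma. A sketch under those assumptions: a named value tuple stands in for the StockQuote class, the sample rows are inlined instead of read with File.ReadAllLines, and every field is parsed with CultureInfo.InvariantCulture. It still splits naively on commas, so it only suits files without quoted fields.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Sample rows from the question; in practice use File.ReadAllLines("yourFile.csv").
string[] lines =
{
    "Date,Open,High,Low,Close,Volume,Adj Close",
    "2011-09-23,24.90,25.15,24.69,25.06,64768100,25.06",
    "2011-09-22,25.30,25.65,24.60,25.06,96278300,25.06"
};

var inv = CultureInfo.InvariantCulture;

// A named tuple stands in for the StockQuote class from the question.
var quotes = lines
    .Skip(1)                                       // skip the header row
    .Where(line => !string.IsNullOrWhiteSpace(line))
    .Select(line => line.Split(','))
    .Select(f => (
        Date: DateTime.ParseExact(f[0], "yyyy-MM-dd", inv),
        Open: double.Parse(f[1], inv),
        High: double.Parse(f[2], inv),
        Low: double.Parse(f[3], inv),
        Close: double.Parse(f[4], inv),
        Volume: long.Parse(f[5], inv),             // volumes are whole numbers
        AdjClose: double.Parse(f[6], inv)))
    .ToList();
```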
You should not be using LINQ, Regex or the like for CSV parsing. For CSV parsing, use a CSV parser.
LINQ and Regex will work only until you run into an escaped control character, a multiline field or something of the sort. Then they will simply break, and probably be unfixable.
Take a look at this question:
Parsing CSV files in C#, with header
The answer mentioning the CSV parser built into .NET seems fine.
And no, you don't need Linq for this.
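A minimal sketch of the built-in parser route using Microsoft.VisualBasic.FileIO.TextFieldParser (the question's parser.ReadFields() call appears to come from this class). It handles a quoted field that contains a comma, which a plain Split(',') would cut in two. It assumes .NET Core 3.0 or later, or .NET Framework with a reference to Microsoft.VisualBasic; the CSV text is a hypothetical sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here and is usable from C#

// Hypothetical input: the first data row has a quoted field containing a comma.
string csv = "Name,Note\n\"Smith, John\",plain\nDoe,simple";

var rows = new List<string[]>();
using (var parser = new TextFieldParser(new StringReader(csv)))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;

    parser.ReadFields(); // skip the header row
    while (!parser.EndOfData)
    {
        rows.Add(parser.ReadFields());
    }
}
```

For a real file, new TextFieldParser("yourFile.csv") opens it by path, and each string[] row can then be mapped to a StockQuote as in the previous answer.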

Find all string parts starting with [ and ending with ] in a long string

I have an interesting problem for which I want to find the best solution. I have tried my best with regex. What I want is to find all the col_x values in this string using C#, with a regular expression or any other method.
[col_5] is a central heating boiler manufacturer produce boilers under [col_6]
brand name . Your selected [col_7] model name is a [col_6] [col_15] boiler.
[col_6] [col_15] boiler [col_7] model [col_10] came in production untill
[col_11]. [col_6] model product index number is [col_1] given by SEDBUK
'Seasonal Efficiency of a Domestic Boiler in the UK'. [col_6] model have
qualifier [col_8] and GCN [col_9] 'Boiler Gas Council No'. [col_7] model
source of heat for a boiler combustion is a [col_12].
The expected output is an array:
var data =["col_5","col_10","etc..."]
Edit
My attempt:
string text = "[col_1]cc[col_2]asdfsd[col_3]";
var matches = Regex.Matches(text, @"[[^#]*]");
var uniques = matches.Cast<Match>().Select(match => match.Value).ToList().Distinct();
foreach(string m in uniques)
{
Console.WriteLine(m);
}
but with no success.
Try something like this:
string[] result = Regex.Matches(input, @"\[(col_\d+)\]").
Cast<Match>().
Select(x => x.Groups[1].Value).
ToArray();
I think that's what you need:
string pattern = @"\[(col_\d+)\]";
MatchCollection matches = Regex.Matches(input, pattern);
string[] results = matches.Cast<Match>().Select(x => x.Groups[1].Value).ToArray();
Replace input with your input string.
I hope it helps.
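The attempt fails because an unescaped [ starts a character class, so [[^#]*] matches runs of the characters [, ^ and # followed by ], not a bracketed name. A sketch combining the escaped pattern from these answers with the Distinct() from the attempt, on part of the question's text; it assumes first-seen order matters, which Enumerable.Distinct preserves in the current .NET implementation even though its documentation does not promise it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "[col_5] is a central heating boiler manufacturer produce boilers under [col_6] " +
               "brand name . Your selected [col_7] model name is a [col_6] [col_15] boiler.";

// \[ and \] are escaped because bare [ ] would start a character class.
string[] columns = Regex.Matches(input, @"\[(col_\d+)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)   // text inside the brackets only
    .Distinct()                       // drop repeats such as the second [col_6]
    .ToArray();

Console.WriteLine(string.Join(",", columns));
```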
This is a little hacky, but you could do this:
var myMessage = @"[col_5] is a central heating boiler..."; //etc.
var values = Enumerable.Range(1, 100)
.Select(x => "[col_" + x + "]")
.Where(x => myMessage.Contains(x))
.ToList();
Assuming there is a known maximum col_x (in this case I assumed 100), it just tries them all by brute force, returning only the ones it finds in the text.
If you know there are only so many columns to hunt for, I would personally try this instead of Regex, as I have had too many bad experiences burning hours on Regex.

Displaying sentences using string chunks

Here is a program I made to display all the possible strings containing "who" and "your" within an XML file. The XML file contains a few sentences like:
how are you, what is your name, what is your school name. The program I wrote only displays a sentence if "who" and "you" come one right after the other. How can I break a string into chunks and then check each chunk against the XML?
The code I tried is:
var doc = XDocument.Load("dic.xml");
string findString = "what your";
var results = doc.Descendants("s")
.Where(d => d.Value.Contains(findString.ToLower()))
.Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Thanks in advance.
You would need to check whether each result contains "who" and "your". Your original code was looking for the single string "who your", not the two separate strings "who" and "your". See this link for information on string.Contains(string).
Code
var doc = XDocument.Load("dic.xml");
var results = doc.Descendants("s").Where(d => d.Value.Contains("your") || d.Value.Contains("who")).Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Edit: Misread your original code and put the filtering in the wrong spot
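A sketch of the chunking the question asks about: split findString on spaces and keep only sentences that contain every chunk (All), rather than either one (||). A hard-coded list stands in for doc.Descendants("s").Select(d => d.Value) so the example runs without dic.xml. Note that Contains matches substrings, so "your" would also match "yours".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample standing in for the <s> elements of dic.xml.
var sentences = new List<string> { "how are you", "what is your name", "what is your school name" };

string findString = "what your";

// Break the phrase into chunks (words), ignoring repeated spaces.
string[] words = findString.ToLowerInvariant()
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Keep sentences that contain every chunk, in any position and order.
var results = sentences
    .Where(s => words.All(w => s.ToLowerInvariant().Contains(w)))
    .ToList();

foreach (string result in results)
{
    Console.WriteLine(result);
}
```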

Transforming List<string> into a tokenised string

I have a list of strings in a List container class that look like the following:
MainMenuItem|MenuItem|subItemX
..
..
..
..
MainMenuItem|MenuItem|subItem99
What I am trying to do is transform the strings, using LINQ, so that the first item of each tokenised string is removed.
This is the code I already have:
protected static List<string> _menuItems = GetMenuItemsFromXMLFile();
_menuItems.Where(x => x.Contains(menuItemToSearch)).ToList();
The first line of code returns the contents of an XML file listing all the menu items that exist within the application, in tokenised form.
The second line is saying 'get me all menu items that belong to menuItemToSearch'.
menuItemToSearch is contained in the delimited string that is returned. How do I remove it using LINQ?
EXAMPLE
Before transform: MainMenuItem|MenuItem|subItem99
After transform : MenuItem|subItem99
Hope the example illustrates my intentions.
Thanks
You can take the substring that starts just after the first pipe symbol '|' to remove the first item from a string, like this:
var str = "MainMenuItem|MenuItem|subItemX";
var dropFirst = str.Substring(str.IndexOf('|')+1);
Apply this to all strings from the list in a LINQ Select to produce the desired result:
var res = _menuItems
.Where(x => x.Contains(menuItemToSearch))
.Select(str => str.Substring(str.IndexOf('|')+1))
.ToList();
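An alternative sketch using String.Split with a count of 2, which splits only at the first '|'. An entry with no pipe comes back unchanged, the same result Substring(IndexOf('|') + 1) gives. The sample list, including the "NoPipeHere" entry, is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var menuItems = new List<string> { "MainMenuItem|MenuItem|subItemX", "MainMenuItem|MenuItem|subItem99", "NoPipeHere" };

// Split into at most 2 parts: the first token, then everything after the first '|'.
List<string> trimmed = menuItems
    .Select(s => s.Split(new[] { '|' }, 2))
    .Select(parts => parts.Length == 2 ? parts[1] : parts[0])
    .ToList();

trimmed.ForEach(Console.WriteLine);
```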
Maybe something like this can help you:
var regex = new Regex("[^\\|]+\\|(.+)");
var list = new List<string>(new string[] { "MainMenuItem|MenuItem|subItem99", "MainMenuItem|MenuItem|subItem99" });
var result = list.Where(p => regex.IsMatch(p)).Select(p => regex.Match(p).Groups[1].Value).ToList();
This should work correctly.
