Find all string parts starting with [ and ending with ] in a long string - C#

I have an interesting problem and I am looking for the best solution; I have tried my best with regex. I want to extract all the col_x values from the string below in C#, using a regular expression or any other method.
[col_5] is a central heating boiler manufacturer produce boilers under [col_6]
brand name . Your selected [col_7] model name is a [col_6] [col_15] boiler.
[col_6] [col_15] boiler [col_7] model [col_10] came in production untill
[col_11]. [col_6] model product index number is [col_1] given by SEDBUK
'Seasonal Efficiency of a Domestic Boiler in the UK'. [col_6] model have
qualifier [col_8] and GCN [col_9] 'Boiler Gas Council No'. [col_7] model
source of heat for a boiler combustion is a [col_12].
The output expected is an array
var data =["col_5","col_10","etc..."]
Edit
my attempt :
string text = "[col_1]cc[col_2]asdfsd[col_3]";
var matches = Regex.Matches(text, @"[[^@]*]");
var uniques = matches.Cast<Match>().Select(match => match.Value).ToList().Distinct();
foreach(string m in uniques)
{
Console.WriteLine(m);
}
but no success.

Try something like this:
string[] result = Regex.Matches(input, @"\[(col_\d+)\]").
Cast<Match>().
Select(x => x.Groups[1].Value).
ToArray();

I think that's what you need:
string pattern = @"\[(col_\d+)\]";
MatchCollection matches = Regex.Matches(input, pattern);
string[] results = matches.Cast<Match>().Select(x => x.Groups[1].Value).ToArray();
Replace input with your input string.
I hope it helps
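Since the question's own attempt calls Distinct, here is a minimal sketch that combines the pattern above with de-duplication. The class and method names (ColumnNames, Extract) are invented for illustration and are not part of any answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnNames
{
    // Returns every distinct col_N placeholder found in the input, without the brackets.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"\[(col_\d+)\]")
            .Cast<Match>()                      // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value)     // group 1 is the part inside the brackets
            .Distinct()
            .ToArray();
    }
}
```

Usage: `string[] data = ColumnNames.Extract(text);`. Other bracketed text such as "[note]" is ignored because the pattern only accepts col_ followed by digits.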

This is a little hacky but you could do this.
var myMessage = @"[col_5] is a central heating boiler..."; //etc.
var values = Enumerable.Range(1, 100)
.Select(x => "[col_" + x + "]")
.Where(x => myMessage.Contains(x))
.ToList();
This assumes there is a known maximum x for col_x (I assumed 100 here). It simply tries every candidate by brute force and keeps only the ones found in the text. Note that the returned values keep their brackets, e.g. "[col_5]".
If you know that there are only so many columns to hunt for, I would personally try this instead of Regex, as I have had too many bad experiences burning hours on Regex.
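If the maximum column number is not known, another regex-free option is to scan for the brackets directly. This is only a sketch under the assumption that brackets are never nested; BracketScanner and FindBracketed are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BracketScanner
{
    // Collects the text between each '[' and the next ']' using only IndexOf/Substring.
    // Duplicates are skipped; nested brackets are not handled.
    public static List<string> FindBracketed(string input)
    {
        var found = new List<string>();
        int open = input.IndexOf('[');
        while (open != -1)
        {
            int close = input.IndexOf(']', open + 1);
            if (close == -1) break; // unmatched '[': nothing more to collect

            string inner = input.Substring(open + 1, close - open - 1);
            if (!found.Contains(inner)) found.Add(inner);

            open = input.IndexOf('[', close + 1);
        }
        return found;
    }
}
```

Unlike the brute-force loop, this returns whatever sits between the brackets, so it also picks up placeholders that do not follow the col_x naming.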

Related

Unity Lists Finding 2 letters in a list containing 5 letter words [duplicate]

I have a list like so and I want to be able to search within this list for a substring coming from another string. Example:
List<string> list = new List<string>();
string srch = "There";
list.Add("1234 - Hello");
list.Add("4234 - There");
list.Add("2342 - World");
I want to search for "There" within my list and return "4234 - There". I've tried:
var mySearch = list.FindAll(S => s.substring(srch));
foreach(var temp in mySearch)
{
string result = temp;
}
With Linq, just retrieving the first result:
string result = list.FirstOrDefault(s => s.Contains(srch));
To do this w/o Linq (e.g. for earlier .NET version such as .NET 2.0) you can use List<T>'s FindAll method, which in this case would return all items in the list that contain the search term:
var resultList = list.FindAll(delegate(string s) { return s.Contains(srch); });
To return all the entries:
IEnumerable<string> result = list.Where(s => s.Contains(search));
Only the first one:
string result = list.FirstOrDefault(s => s.Contains(search));
What you've written causes the compile error
The best overloaded method match for 'string.Substring(int)' has some invalid arguments
Substring is used to get part of string using character position and/or length of the resultant string.
for example
srch.Substring(1, 3) returns the string "her"
As others have mentioned, you should use Contains, which tells you whether one string occurs within another. If you wanted to know the actual position, you'd use IndexOf.
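To make the difference between the three methods concrete, here is a small sketch using the list from the question. SearchDemo, FindFirst and TextBefore are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchDemo
{
    // Contains answers "is the term in this item?"; the result is null when nothing matches.
    public static string FindFirst(IEnumerable<string> items, string term)
    {
        return items.FirstOrDefault(s => s.Contains(term));
    }

    // IndexOf answers "where is it?" (-1 when absent); Substring then cuts by position.
    public static string TextBefore(string item, string term)
    {
        int position = item.IndexOf(term, StringComparison.Ordinal);
        return position == -1 ? item : item.Substring(0, position);
    }
}
```

With the question's data, `SearchDemo.FindFirst(list, "There")` gives "4234 - There", and `SearchDemo.TextBefore("4234 - There", "There")` gives "4234 - " because "There" starts at index 7.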
I had to solve the same problem.
You need this:
myList.Where(listStringEntry => myString.IndexOf(listStringEntry) != -1)
Where:
myList is List<String> has the values
that myString has to contain at any position
So de facto you search if myString contains any of the entries from the list.
Hope this is what you wanted...
I like to use IndexOf or Contains:
someString.IndexOf("this");
someString.Contains("this");
And for a case-insensitive search, normalize both sides to the same case:
YourObj yourobj = list.FirstOrDefault(obj => obj.SomeString.ToLower().Contains("some substring"));
OR
YourObj yourobj = list.FirstOrDefault(obj => obj.SomeString.ToUpper().Contains("SOME SUBSTRING"));
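An alternative that avoids creating lowered or uppered copies is the IndexOf overload that takes a StringComparison. A minimal sketch, with CaseInsensitiveSearch and FindFirst as illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaseInsensitiveSearch
{
    // Returns the first item containing the term, ignoring case, or null if none does.
    public static string FindFirst(IEnumerable<string> items, string term)
    {
        return items.FirstOrDefault(
            s => s.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

OrdinalIgnoreCase compares without culture rules; whether that or a culture-aware comparison is appropriate depends on the data.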

Check string if it contain any of string array value, then get the substring

This may be a sub-question to this SO Question. I want to check the string against an array of string or list.
Example
string address = "1st nice ave 1st floor";
//For now, I'm getting the list from a text file but could move to use an EF
string[] streetType = File.ReadLines(AppDomain.CurrentDomain.BaseDirectory + @"streetType.csv")
.Where(x => x.Length > 0)
.Select(y => y.ToLowerInvariant())
.ToArray();
the purpose is to strip the extra address details after the avenue, the csv file contains all USPS accepted street type.
This is what I have now
//this only returns boolean value, I got this from the SO above
streetType.Any(testaddress.ToLower().Contains);
//I also have this
Array.Exists<string>(streetType, (Predicate<string>)delegate (string s)
{
return testaddress.IndexOf(s, StringComparison.OrdinalIgnoreCase) > -1;
});
I've been looking for hours for a way to resolve this. Then I came across the SO question, which is almost exactly what I want, but I also need to get the substring for stripping.
If there's a linq query, that would be awesome. The only way I can think of doing this is with foreach and inner if.
Example of the array values
ave
avenue
pkwy
Update:
Here is my answer. I forgot to mention that the array lookup needs to match the exact string from the address string, so I ended up using regex. This is the expanded/modified answer of @giladGreen.
var result = from item in streetTypes
let index = Regex.Match(address.ToLowerInvariant(), @"\b" + item.ToLowerInvariant() + @"\b")
where index.Success == true
select address.ToLowerInvariant().Substring(0, index.Index + item.Length);
Can somebody convert this to lambda expression? I tried I failed.
Thank you all
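One possible method-syntax (lambda) form of the query above: the let clause becomes a Select into an anonymous type, followed by Where and a final Select. This is a sketch rather than a definitive translation; AddressTrimmer and TrimAfterStreetType are invented names, and the Regex.Escape call is an addition that is not in the original query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AddressTrimmer
{
    // Keeps the lowered address up to and including the first whole-word
    // occurrence of a street type; yields one result per matching street type.
    public static IEnumerable<string> TrimAfterStreetType(string address, IEnumerable<string> streetTypes)
    {
        string lowered = address.ToLowerInvariant();
        return streetTypes
            .Select(item => new
            {
                Item = item,
                // Regex.Escape (an addition) guards against regex metacharacters in the list.
                Match = Regex.Match(lowered, @"\b" + Regex.Escape(item.ToLowerInvariant()) + @"\b")
            })
            .Where(x => x.Match.Success)
            .Select(x => lowered.Substring(0, x.Match.Index + x.Item.Length));
    }
}
```

For "1st nice ave 1st floor" and the sample values ave, avenue and pkwy, only "ave" matches as a whole word, so the sequence contains the single entry "1st nice ave".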
Use IndexOf to check whether item is present in address and, if so, return the part of the string before it:
var result = from item in streetType
let index = address.IndexOf(item)
where index != -1
select address.Substring(0, index);
One way to do this would be to simply Split each address on the streetType list, and then take the first item (at index[0]) from the resulting array:
addresses = addresses
.Select(address => address.Split(streetTypes.ToArray(), StringSplitOptions.None)[0])
.ToList();
I might be inclined to do something like this:
string[] markers = "ave avenue pkwy".Split();
string address = "1st nice ave 1st floor";
var result = markers
.Select((marker, index) => new
{
markerIndex = index,
addressPosition = address.IndexOf(marker)
})
.FirstOrDefault(x => x.addressPosition != -1);
// returns { markerIndex = 0, addressPosition = 9 }
Then result is either null (if no marker is found) or an object containing both markerIndex, which tells you which marker was found first, and addressPosition, which tells you the character index at which the marker string was found.

displaying sentence using string chunks

Here is a program I wrote to display all the sentences in an XML file that contain "who" & "your". The XML file contains a few sentences like:
how are you,what is your name,what is your school name. The program I wrote only displays a sentence if both "who" and "you" come directly one after the other. How can I break a string into chunks and then check each of them against the XML?
The code which I tried is:
var doc = XDocument.Load("dic.xml");
string findString = "what your";
var results = doc.Descendants("s")
.Where(d => d.Value.Contains(findString.ToLower()))
.Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Thanks in advance.
You would need to check if each result contains "who" and "your". Your original code was looking for the string "who your" not the two strings "who" and "your". See this link for information on string.Contains(string)
Code
var doc = XDocument.Load("dic.xml");
var results = doc.Descendants("s").Where(d => d.Value.Contains("your") && d.Value.Contains("who")).Select(d => d.Value);
foreach (string result in results)
{
Console.WriteLine(result);
}
Edit: Misread your original code and put the filtering in the wrong spot
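To avoid hard-coding the words, the search string can be split into chunks and each sentence checked against all of them. A minimal sketch, assuming the words are separated by spaces; ChunkFilter and ContainingAllWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkFilter
{
    // Keeps the sentences that contain every space-separated word of findString,
    // in any order and not necessarily adjacent (case-insensitive substring match).
    public static List<string> ContainingAllWords(IEnumerable<string> sentences, string findString)
    {
        string[] words = findString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return sentences
            .Where(s => words.All(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

With the XML from the question, the call could look like `ChunkFilter.ContainingAllWords(doc.Descendants("s").Select(d => d.Value), "what your")`. Because this is a substring match, "you" would also match inside "your".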

Process part of the regex match before replacing it

I'm writing a function that will parse a file similar to an XML file from a legacy system.
....
<prod pid="5" cat='gov'>bla bla</prod>
.....
<prod cat='chi'>etc etc</prod>
....
.....
I currently have this code:
buf = Regex.Replace(entry, "<prod(?:.*?)>(.*?)</prod>", "<span class='prod'>$1</span>");
Which was working fine until it was decided that we also wanted to show the categories.
The problem is, categories are optional and I need to run the category abbreviation through a SQL query to retrieve the category's full name.
eg:
SELECT * FROM cats WHERE abbr='gov'
The final output should be:
<span class='prod'>bla bla</span><span class='cat'>Government</span>
Any idea on how I could do this?
Note1: The function is done already (except this part) and working fine.
Note2: Cannot use XML libraries, regex has to be used
Regex.Replace has an overload that takes a MatchEvaluator, which is basically a Func<Match, string>. So, you can dynamically generate a replacement string.
buf = Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match => {
var attrText = match.Groups["attr"].Value;
var text = match.Groups["text"].Value;
// Now, parse your attributes
var attributes = Regex.Matches(attrText, @"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1")
.Cast<Match>()
.ToDictionary(
m => m.Groups["name"].Value,
m => m.Groups["value"].Value);
string category;
if (attributes.TryGetValue("cat", out category))
{
// Your SQL here etc...
var label = GetLabelForCategory(category);
return String.Format("<span class='prod'>{0}</span><span class='cat'>{1}</span>", WebUtility.HtmlEncode(text), WebUtility.HtmlEncode(label));
}
// Generate the result string
return String.Format("<span class='prod'>{0}</span>", WebUtility.HtmlEncode(text));
});
This should get you started.
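For experimenting without a database, the same MatchEvaluator idea can be tried with an in-memory lookup. In this sketch the dictionary is a stand-in for the SQL query and holds made-up data, and ProdConverter and ToSpans are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ProdConverter
{
    // Stand-in for the SQL lookup (hypothetical data, for illustration only).
    private static readonly Dictionary<string, string> Labels =
        new Dictionary<string, string> { { "gov", "Government" } };

    public static string ToSpans(string entry)
    {
        return Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match =>
        {
            string result = "<span class='prod'>"
                + WebUtility.HtmlEncode(match.Groups["text"].Value) + "</span>";

            // Look for an optional cat='...' or cat="..." attribute.
            Match cat = Regex.Match(match.Groups["attr"].Value,
                @"\bcat\s*=\s*(['""])(?<value>.*?)\1");
            string label;
            if (cat.Success && Labels.TryGetValue(cat.Groups["value"].Value, out label))
            {
                result += "<span class='cat'>" + WebUtility.HtmlEncode(label) + "</span>";
            }
            return result;
        });
    }
}
```

An entry with no cat attribute, or with an abbreviation missing from the lookup, produces only the prod span.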

String parsing C# creating segments?

I have a string in the form of:
"company=ABCorp, location=New York, revenue=10million, type=informationTechnology"
I want to be able to parse this string and get "name", "value" pairs in the form of
company = ABCorp
location = New York etc.
Any suitable data structure could be used to store these. I was thinking maybe a Dictionary<string, string>() but I'm open to suggestions.
Is there a suitable way of doing this in C#?
EDIT: My final goal here is to have something like this:
Array[company] = ABCorp.
Array[location] = New York.
What data structure could we use to achieve the above? My first thought is a Dictionary, but I am not sure if I'm missing anything.
thanks
Using String.Split and ToDictionary, you could do:
var original = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
var split = original.Split(',').Select(s => s.Trim().Split('='));
Dictionary<string,string> results = split.ToDictionary(s => s[0], s => s[1]);
string s = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
var pairs = s.Split(',')
.Select(x => x.Split('='))
.ToDictionary(x => x[0], x => x[1]);
pairs is a Dictionary containing the key/value pairs. The only caveat is that you will probably want to trim the white space after each comma; otherwise keys such as " location" keep a leading space.
It depends a lot on the expected syntax. One way to do this is to use String.Split:
http://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx
First split on comma, then iterate over all items in the string list returned and split those on equality.
However, this requires that commas and equals signs are never present in the values.
I'm assuming a weak RegEx/LINQ background so here's a way to do it without anything "special".
string text = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
string[] pairs = text.Split(',');
Dictionary<string, string> dictData = new Dictionary<string, string>();
foreach (string currPair in pairs)
{
string[] data = currPair.Trim().Split('=');
dictData.Add(data[0], data[1]);
}
This has the requirement that a comma (,) and an equal-sign (=) never exist in the data other than as delimiters.
This relies heavily on String.Split.
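A slightly more defensive variant of the loop above is sketched here: it trims keys and values, splits on the first '=' only so a value may contain '=', and skips fragments without one. Commas inside values are still not supported. PairParser and Parse are names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

public static class PairParser
{
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (string pair in text.Split(','))
        {
            // Limit the split to 2 parts so only the first '=' acts as the delimiter.
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length != 2) continue; // no '=' in this fragment: skip it

            // The indexer overwrites on duplicate keys instead of throwing like Add.
            result[parts[0].Trim()] = parts[1].Trim();
        }
        return result;
    }
}
```

After `var data = PairParser.Parse(text);`, `data["location"]` holds "New York", which matches the Array[location] access the question asks for.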
