Regex to get multiple filters - c#

I am new to regex and am trying to find all files with extensions such as .cs, .json, etc.
However, only one extension is matched, i.e. only the first filter value.
Code:
string ext = "json|cs|xml";
Regex RegEx = new Regex(@"<(Compile|Content|None) Include=\""([^\""]+." + ext + @")\""( /)?>",RegexOptions.IgnoreCase);
Match match = RegEx.Match(line); //Only takes json, does not take cs or xml
So it matches only .json files.
Can anyone help me with this regex?

Short answer: wrap the ext variable in parentheses so that the alternation applies only to the extensions.
Alternation (|) has the lowest precedence in a regex, so your current group reads as "one or more non-quote characters, then any character, then json", OR just "cs", OR just "xml". With the extra parentheses (as below) it reads as "one or more non-quote characters, then any character, then any one of the extensions you provide". Note that the unescaped . matches any character; write \. if you want a literal dot.
Replace
<(Compile|Content|None) Include=\""([^\""]+." + ext + @")\""( /)?>
with
<(Compile|Content|None) Include=\""([^\""]+.(" + ext + @"))\""( /)?>
PS. I find Expresso very useful in debugging Regular Expressions. Not affiliated, just been using for quite a number of years.
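As a minimal sketch of the grouped pattern, the helper below builds the regex from ext and returns the matched Include path. The names IncludeFilter and MatchInclude are hypothetical, and escaping the dot as \. is an assumption that goes slightly beyond the pattern quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IncludeFilter
{
    // Hypothetical helper: returns the Include path if it ends in one of the
    // pipe-separated extensions in ext, otherwise an empty string.
    public static string MatchInclude(string line, string ext)
    {
        // The inner parentheses keep the alternation local to the extensions.
        var regex = new Regex(
            @"<(Compile|Content|None) Include=\""([^\""]+\.(" + ext + @"))\""( /)?>",
            RegexOptions.IgnoreCase);

        Match match = regex.Match(line);
        // Group 1 is the element name, group 2 the full path, group 3 the extension.
        return match.Success ? match.Groups[2].Value : string.Empty;
    }
}
```

Calling IncludeFilter.MatchInclude(line, "json|cs|xml") on a project-file line such as <Compile Include="Program.cs" /> should then return Program.cs rather than failing on every extension but the first.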

Related

Regex for Information Extraction from CS File

Below is a snapshot of lines from my CS file from C# code and I'm trying to extract fields mandatory or supported fields from my class file.
1) Is there a way for me to dynamically load the cs file into the .NET application and extract the information out, starting just by loading cs file from file path?
2) Following to the question above, I'm currently resorting to extract information out thru Regex.
First Regex - (m_oSupportedFields.).+?(?=EnumSupported.Mandatory;|EnumSupported.Supported)
and result as below :-
Second Regex - (..+)\=
and result as below :-
What I'm trying to achieve is to extract Personal.Forename, Personal.Surname and other fields by a Regex (one Regex for EnumSupported.Mandatory, and one for EnumSupported.Supported).
Also, I'm trying to cater for malformed lines such as
m_oSupportedFields.Personal.DOB.Day.Supported=EnumSupported.Supported;
(Note there are no spaces around the equal sign)
or
m_oSupportedFields.Personal.DOB.Day.Supported = EnumSupported.Supported;
(Note the double space between)
or even
m_oSupportedFields.Personal.Surname.Supported =
EnumSupported.Mandatory;
(Note the Enum is on second line)
Please advise on how I should write the Regex for such situations.
Thanks.
UPDATED in TEXTUAL VERSION
m_oSupportedFields.Personal.Surname.Supported = EnumSupported.Mandatory;
m_oSupportedFields.Personal.Forename.Supported = EnumSupported.Mandatory;
m_oSupportedFields.Personal.MiddleName.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Day.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Month.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Year.Supported = EnumSupported.Supported;
So from each line, you want to extract the part after m_oSupportedFields. and before .Supported =, as well as the part after the =. Before the = you want to allow only spaces, but after it any whitespace, including a line break.
Your regular expression will be: ^m_oSupportedFields\.([\w\.]+)\.Supported *=\s*(EnumSupported\.\w+);
Omit the ^ if you don't want to require that the match start at the beginning of the input (or at the beginning of a line, when RegexOptions.Multiline is used).
Using C#, you can access the match groups like this:
using System;
using System.Text.RegularExpressions;
string regex = @"^m_oSupportedFields\.([\w\.]+)\.Supported *=\s*(EnumSupported\.\w+);";
string input = @"m_oSupportedFields.Personal.DOB.Day.Supported=EnumSupported.Supported;";
foreach (Match m in Regex.Matches(input, regex))
{
    // Groups[0] is the whole match; the parenthesized groups start at index 1.
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
}
// Console:
// Personal.DOB.Day
// EnumSupported.Supported
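To run the same pattern over a whole file rather than a single line, ^ has to match at the start of every line, which is what RegexOptions.Multiline does. The following is a sketch under that assumption; SupportedFieldParser and Parse are hypothetical names, and it assumes each assignment starts in the first column of its line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SupportedFieldParser
{
    // Same pattern as above; Multiline makes ^ match after every line break.
    private static readonly Regex FieldRegex = new Regex(
        @"^m_oSupportedFields\.([\w\.]+)\.Supported *=\s*(EnumSupported\.\w+);",
        RegexOptions.Multiline);

    // Returns (field path, enum value) pairs in the order they appear.
    public static List<KeyValuePair<string, string>> Parse(string source)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in FieldRegex.Matches(source))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```

Because \s* after the = also matches line breaks, an assignment whose enum value sits on the following line is still captured as one pair.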
1) Is there a way for me to dynamically load the cs file into the .NET application and extract the information out, starting just by loading cs file from file path?
Possibly. The .NET Compiler Platform (Roslyn, the "compiler as a service"), which VS2015 now uses, can parse C# source files. Look into creating a stand-alone code analysis tool.
extract Personal.Forename, Personal.Surname and other fields by a Regex (one Regex for EnumSupported.Mandatory, and one for EnumSupported.Supported).
A pattern can be very general or very specific about what it captures. As the pattern becomes more general, its complexity increases, along with the supporting code needed to extract the data.
Capture into Enumerable Dynamic Entities
This is a specific pattern that takes the results and places them into a LINQ sequence of dynamic entities. Note that it handles the possible line split.
string data = @"
m_oSupportedFields.Personal.Surname.Supported =
EnumSupported.Mandatory;
m_oSupportedFields.Personal.Forename.Supported=EnumSupported.Mandatory;
m_oSupportedFields.Personal.MiddleName.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Day.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Month.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Year.Supported = EnumSupported.Supported;
";
string pattern = @"
Personal\. # Anchor for match
(?<Full> # Grouping for Or condition
(?<Name>[^.]+) # Just the name
| # Or
(?<Combined>[^.]+\.[^.]+) # Name/subname
) # End Or Grouping
(?=\.Supported) # Look ahead to anchor to Supported (does not capture)
\.Supported
\s*= # Possible whitespace and =
[\s\r\n]*EnumSupported\.
(?<SupportType>Mandatory|Supported) # Capture support type";
// IgnorePatternWhitespace lets us comment the pattern instead of writing
// it on one line. It does not affect how the pattern matches in any way.
// Requires: using System.Linq; using System.Text.RegularExpressions;
var results = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
    .OfType<Match>()
    .Select(mt => new
    {
        FullName = mt.Groups["Full"].Value,
        IsName = mt.Groups["Name"].Success,
        IsCombined = mt.Groups["Combined"].Success,
        Type = mt.Groups["SupportType"].Value
    });
The results look like this:
Note that it can determine whether the extracted name is a single name (Forename) or a two-part name (DOB.Day). Either one is captured into the named group "Full" (exposed as FullName), while the success of the "Name" and "Combined" groups serves as the "is-a" booleans.

C# .NET Regex remove all quotes of quotes excluding one instance in a sentence

I have description field which is:
16" Alloy Upgrade
In CSV format it appears like this:
"16"" Alloy Upgrade "
What would be the best use of regex to maintain the original format? As I'm learning, I would appreciate it being broken down for my understanding.
I'm already using Regex to split some text separating 2 fields which are: code, description. I'm using this:
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))
My thoughts are to remove the quotes, then remove the delimiter excluding use in sentences.
Thanks in advance.
If you don't want to/can't use a standard CSV parser (which I'd recommend), you can strip all non-doubled quotes using a regex like this:
Regex.Replace(text, @"(?<!"")""(?!"")", string.Empty)
That regex matches every " character that is neither preceded nor followed by another ", so the enclosing quotes are removed while a doubled "" is left as is; replace "" with " afterwards to recover the original text.
I wouldn't use a regex here, since they are often confusing and it is unclear what they do (the one in your question, for example). Instead, this method should do the trick:
public string CleanField(string input)
{
    // The length check guards against a field that is a single quote character.
    if (input.Length >= 2 && input.StartsWith("\"") && input.EndsWith("\""))
    {
        string output = input.Substring(1, input.Length - 2);
        output = output.Replace("\"\"", "\"");
        return output;
    }
    else
    {
        // If it doesn't start and end with quotes, it doesn't look escaped, so just hand it back.
        return input;
    }
}
It may need tweaking, but in essence it checks whether the string starts and ends with a quote (which it should if it is an escaped field). If so, it takes the inside part (with Substring) and then replaces each doubled quote ("") with a single one ("). The code is a bit ugly due to all the escaping, but there is no avoiding that.
The nice thing is that this can easily be combined with a bit of LINQ to convert an existing array.
processedFieldArray = inputfieldArray.Select(CleanField).ToArray();
I'm using arrays here purely because your linked page seems to use them where you are wanting this solution.
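Putting the pieces together, here is a sketch that splits one CSV line with the separator regex from the question and then cleans each field with the method above. CsvLine and SplitAndClean are hypothetical names, and the sketch assumes a single record per line with no line breaks inside quoted fields.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvLine
{
    // The question's separator: a comma followed by an even number of quotes,
    // i.e. a comma that is not inside a quoted field.
    private static readonly Regex Separator =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    // Strips the enclosing quotes and collapses each doubled quote to one.
    public static string CleanField(string input)
    {
        if (input.Length >= 2 && input.StartsWith("\"") && input.EndsWith("\""))
        {
            return input.Substring(1, input.Length - 2).Replace("\"\"", "\"");
        }
        return input;
    }

    public static string[] SplitAndClean(string line)
    {
        return Separator.Split(line).Select(CleanField).ToArray();
    }
}
```

For anything beyond simple input like this, a standard CSV parser remains the safer choice, as the first answer recommends.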

Return RegExp C# with linebreak

I’m having a problem with Regular Expressions in C#.
What I have is a string representing a page (HTML etc.). The string also contains \r\n, \r and \n in different places, now I’m trying to match something in the string:
Match currentMatch = Regex.Match(contents, "Title: <strong>(.*?)</strong>");
string org = currentMatch.Groups[1].ToString();
This works fine, however, when I want to match something that has any of the characters mentioned earlier (line breaks) in the string, it doesn’t return anything (empty, no match):
Match currentMatch = Regex.Match(contents, "Description: <p>(.*?)</p>");
string org = currentMatch.Groups[1].ToString();
It does however work if I add the following lines above the match:
contents = contents.Replace("\r", " ");
contents = contents.Replace("\n", " ");
However, I don’t like modifying the source. What can I do about this?
By default, the . does not match newline characters. You can change this with RegexOptions.Singleline, which treats the whole input string as one line, i.e. the dot also matches newline characters.
Match currentMatch = Regex.Match(contents, "Description: <p>(.*?)</p>", RegexOptions.Singleline);
By the way, I hope you are aware that regex is normally not the right tool for dealing with HTML?
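To see the difference the option makes, here is a small sketch that runs the Description pattern with a caller-supplied RegexOptions value. DescriptionExtractor and Extract are hypothetical names, and returning an empty string for "no match" is an assumption made for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DescriptionExtractor
{
    // Returns the text between <p> and </p>, or an empty string if nothing matches.
    public static string Extract(string contents, RegexOptions options)
    {
        Match m = Regex.Match(contents, "Description: <p>(.*?)</p>", options);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

With contents such as "Description: <p>line one\r\nline two</p>", passing RegexOptions.None finds nothing because the dot stops at \n, while RegexOptions.Singleline returns both lines without modifying the source string.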

RegEx : (double)quoted strings

I'm using C# Regex to search for quoted strings in a script text.
I use this expression: new Regex("\"((?:\\\\.|[^\\\\\"]*)*)\""),
i.e. "((?:\\.|[^\\\"]*)*)"
which is meant to skip over escaped \" sequences.
This makes Regex.Matches run forever for some input strings.
Never mind this problem with .NET Regex; I know my expression is not the best one.
Before, I used the expression (?<!\\)".*?(?<!\\)", but it does not handle the "\\" input string.
The aim is to detect quoted strings before I analyze script code.
Can anyone suggest a good expression?
It has to work for :
echo("Hello" + yourName + ", here is \"MyTest\"");
path = "\\" + file;
echo("path ends with \\");
(beware, \ characters are edited strangely by this site)
Thanks a lot.
A quoted string with backslash escapes is usually matched using
"((?:[^\\"]|\\.)*)"
See http://www.ideone.com/JiJwa.
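Here is a sketch of that expression in C#. Each pass through the group consumes either one ordinary character or one backslash-plus-character pair, and the two alternatives cannot match the same text, so there is no nested quantifier like the one in the question's expression. QuotedStringFinder and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStringFinder
{
    // As a regex: "((?:[^\\"]|\\.)*)"
    private static readonly Regex Quoted =
        new Regex(@"""((?:[^\\""]|\\.)*)""");

    // Returns the contents of every quoted string, escapes left as written.
    public static List<string> Find(string script)
    {
        var found = new List<string>();
        foreach (Match m in Quoted.Matches(script))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```

Note that the dot does not match \n by default, so a backslash placed directly before a line break would end the match attempt; whether that matters depends on the script language being analyzed.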

Using RegEx to replace invalid characters

I have a directory with lots of folders, sub-folder and all with files in them. The idea of my project is to recurse through the entire directory, gather up all the names of the files and replace invalid characters (invalid for a SharePoint migration).
However, I'm completely unfamiliar with Regular Expressions. The characters I need to get rid of in file names are: ~, #, %, &, *, { } , \, /, :, <>, ?, -, | and ""
I want to replace these characters with a blank space. I was hoping to use a string.Replace() method to look through all these file names and do the replacement.
So far, the only code I've gotten to is the recursion. I was thinking of the recursion scanning the drive, fetching the names of these files and putting them in a List<string>.
Can anybody help me with how to find/replace invalid chars with RegEx with those specific characters?
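// "\\\\" in a regular C# string is \\ in the regex, i.e. a literal backslash.
string pattern = "[\\\\~#%&*{}/:<>?|\"-]";
string replacement = " ";
Regex regEx = new Regex(pattern);
string sanitized = Regex.Replace(regEx.Replace(input, replacement), @"\s+", " ");
This replaces each invalid character with a space and then collapses runs of whitespace into a single space.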
Is there a way to get rid of extra spaces?
Try something like this:
string pattern = " *[\\\\~#%&*{}/:<>?|\"-]+ *";
string replacement = " ";
Regex regEx = new Regex(pattern);
string sanitized = regEx.Replace(input, replacement);
Consider learning a bit about regular expressions yourself, as they are also very useful in development (e.g. search/replace in Visual Studio).
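Wrapped up as a reusable method, the second pattern might look like the sketch below. FileNameSanitizer and Sanitize are hypothetical names, the verbatim string spells the backslash as \\ so that it is part of the character class, and the final Trim() is an assumption added to drop a leading or trailing space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameSanitizer
{
    // A run of invalid characters plus any spaces around it.
    // The trailing - inside the class is a literal hyphen.
    private static readonly Regex Invalid =
        new Regex(@" *[\\~#%&*{}/:<>?|""-]+ *");

    public static string Sanitize(string name)
    {
        // Each run collapses to one space; Trim() is an added assumption.
        return Invalid.Replace(name, " ").Trim();
    }
}
```

This can then be applied to each entry of the List<string> of file names gathered by the recursion, for example with names.Select(FileNameSanitizer.Sanitize).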
