Regular expression does not fit to pattern

I want to go through a list of header files and record the files that they include. My problem is that the pattern does not match.
This link shows the pattern that I thought would work: https://regex101.com/r/jbJLxT/3
string rgxPat = "\\#include\\s+\"(?:\\w+\\/)*(\\w +\\.(?:hed|he|hdb|h))\"";
Regex incLRgx = new Regex(rgxPat, RegexOptions.IgnoreCase);
for (int i = alrdyChckd; i < missFiles.Count; i++)
{
    tmpStr = baseSBFolder + "\\" + missFiles[i].getPath() + "\\" + missFiles[i].getName();
    System.IO.StreamReader actFile = new System.IO.StreamReader(tmpStr);
    while((actLine = actFile.ReadLine()) != null)
    {
        Match match = incLRgx.Match(actLine);
        if(match.Success)
        {
            missFiles.Add(baseSB.getFileByName(match.Groups[1].Value.ToString()));
        }
    }
    alrdyChckd++;
}
I checked the variables in the debugger, and the Match call always reports failure, even though the pattern and the actual line seem to agree.
Another problem is that I cannot write the double quotes as I wanted with the string = @"[pattern]" form, because a double quote closes the string.

This will give you the paths:
/^\#include\s+"(.+)"$/gm
Output of $1:
FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_cpif.h
EbsPartition/EbsCluster/EbsCluster_generic/EbsCore/Src/ebscore_basetypes.h
FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_types.h
If you want just the filenames then use:
/^\#include\s+".*\/([^\/]+)"$/gm
and $1 will give you:
rsc_cpif.h
ebscore_basetypes.h
rsc_types.h

You can capture the group by name to get the file path:
\#include(?<path>\s+"(?:\w+\/)*(\w+\.(?:hed|he|hdb|h))")
match.Groups["path"].Value.ToString()
This will give you the file path captured as "FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_cpif.h"
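One thing worth checking in the pattern from the question is the space in "\\w +": as written, it asks for a single word character followed by one or more spaces, which an include line does not contain. The sketch below assumes that space is unintended. It also writes the pattern as a verbatim string, where a double quote is escaped by doubling it, which addresses the quoting problem from the question. The names IncludeParser and ExtractFileName are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IncludeParser
{
    // Verbatim string: backslashes are literal, and "" stands for one double quote.
    // Assumption: the space in "\w +" from the question was not intended.
    private static readonly Regex IncludeRegex = new Regex(
        @"#include\s+""(?:\w+/)*(\w+\.(?:hed|he|hdb|h))""",
        RegexOptions.IgnoreCase);

    // Returns the included file name without its directory part,
    // or null when the line is not a matching include.
    public static string ExtractFileName(string line)
    {
        Match match = IncludeRegex.Match(line);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string line = "#include \"FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_cpif.h\"";
        Console.WriteLine(ExtractFileName(line)); // rsc_cpif.h
    }
}
```

If the full relative path is needed instead of the file name, the capturing parentheses could be moved so that they also enclose the (?:\w+/)* part.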

Related

How to get all files ending with the extension "_<fileNum>of<totalFileNum>" and sometimes without? [duplicate]

A user specifies a file name that can be either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to somehow extract the "<name>" part from the full file name.
Basically, I am looking for a solution to the method "ExtractName()" in the following example:
string fileName = "example_File"; // This var is specified by user
string extractedName = ExtractName(fileName); // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName); // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName); // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName); // Must return "example"
Edit: Here's what I've tried so far:
string ExtractName(string fullName)
{
    return fullName.Substring(0, fullName.LastIndexOf('_'));
}
But this clearly does not work for the case where the full name is just "<name>".
Thanks

This would be easier to parse using Regex, because you don't know how many digits either number will have.
var inputs = new[]
{
    "example_File",
    "example_File2_1of5",
    "examp_File_3of15",
    "example_12of15"
};
var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
    var match = pattern.Match(input);
    if (!match.Success)
    {
        // file doesn't end with "#of#", so use the whole input
        Console.WriteLine(input);
    }
    else
    {
        // it does end with "#of#", so use the first capture group
        Console.WriteLine(match.Groups[1].Value);
    }
}
This code returns:
example_File
example_File2
examp_File
example
The Regex pattern has three parts:
^ and $ are anchors to ensure you capture the entire string, not just a subset of characters.
(.+) - match everything, be as greedy as possible.
(_\d+of\d+) - match "_#of#", where "#" can be any number of consecutive digits.
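The same pattern can be wrapped into the ExtractName() method the question asks for. This is a minimal sketch: the class name FileNameHelper is an assumption, and the second capture group is dropped because only the name part is used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameHelper
{
    // ^(.+) is greedy, but backtracks just enough for _<digits>of<digits>$ to match.
    private static readonly Regex SuffixRegex = new Regex(@"^(.+)_\d+of\d+$");

    public static string ExtractName(string fileName)
    {
        Match match = SuffixRegex.Match(fileName);
        // No "_#of#" suffix: the whole input is already the name.
        return match.Success ? match.Groups[1].Value : fileName;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractName("example_File"));       // example_File
        Console.WriteLine(ExtractName("example_File2_1of5")); // example_File2
        Console.WriteLine(ExtractName("examp_File_3of15"));   // examp_File
        Console.WriteLine(ExtractName("example_12of15"));     // example
    }
}
```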

Extract ID and replace everything in `Example HTML`

I am new to regular expressions. I have the following text in my HTML and would like to replace it with something else.
Example HTML:
{{Object id='foo'}}
Extract the id into a variable like this:
string strId = "foo";
So far I have the following Regular Expression code that will capture the Example HTML:
string strStart = "Object";
string strFind = "{{(" + strStart + ".*?)}}";
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase);
Match matchRegExp = regExp.Match(html);
while (matchRegExp.Success)
{
    //At this point, I have this variable:
    //{{Object id='foo'}}
    //I can find the id='foo' (see below)
    //but not sure how to extract 'foo' and use it
    string strFindInner = "id='(.*?)'"; //"{{Slider";
    Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
    Match matchRegExpInner = regExpInner.Match(matchRegExp.Value.ToString());
    //Do something with 'foo'
    matchRegExp = matchRegExp.NextMatch();
}
I understand this might have a simple solution. I am hoping to learn more about regular expressions, but more importantly I am hoping for a suggestion on how to approach this more cleanly and efficiently.
Thank you
Edit:
Is this an example that I could potentially use: c# regex replace

I have not solved my initial question with regular expressions. For the time being I moved to a simpler solution using Substring, IndexOf and string.Split. I understand that my code needs to be cleaned up, but I thought I would post the answer that I have so far.
string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
string strObject = "Object"; //Example
//When found, this will contain "{{Object id='foo'}}"
string strCode = "";
//ie: "id='foo'"
string strCodeInner = "";
//Tags will be a list, but in this example, only "id='foo'"
string[] tags = { };
//Looking for the following "{{Object "
string strFindStart = "{{" + strObject + " ";
int intFindStart = html.IndexOf(strFindStart);
//Then ending in the following
string strFindEnd = "}}";
int intFindEndIndex = html.IndexOf(strFindEnd);
int intFindEnd = intFindEndIndex + strFindEnd.Length;
//Must find both Start and End conditions
if (intFindStart != -1 && intFindEndIndex != -1)
{
    strCode = html.Substring(intFindStart, intFindEnd - intFindStart);
    //Remove Start and End
    strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, "");
    //Split by spaces, this needs to be improved if more than IDs are to be used
    //but for proof of concept this is perfect
    tags = strCodeInner.Split(new char[] { ' ' });
}
Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
    string[] tagSplit = tag.Split(new char[] { '=' });
    dictTags.Add(tagSplit[0], tagSplit[1].Replace("'", "").Replace("\"", ""));
}
//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
//get the object (ie: Slider) and return the html for slider with the ID of foo
//strView (not shown) holds that replacement html
html = html.Replace(strCode, strView);
/*
"html" variable may contain:
<p>Start of Example</p>
<p id="foo">This is the replacement text</p>
<p>End of example</p>
*/
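For comparison, the regex route from the question can do the extraction and the replacement in one pass: a capture group holds the id, and Regex.Replace with a match evaluator swaps each tag for whatever a callback returns. This is only a sketch. TagReplacer, ReplaceTags and the render callback are hypothetical names, and it assumes the tag always has exactly the form {{Object id='...'}}.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagReplacer
{
    // Group 1 captures the text between the single quotes of id='...'.
    private static readonly Regex TagRegex = new Regex(
        @"\{\{Object\s+id='([^']*)'\}\}", RegexOptions.IgnoreCase);

    // Replaces every tag with the html that render returns for its id.
    public static string ReplaceTags(string html, Func<string, string> render)
    {
        return TagRegex.Replace(html, match => render(match.Groups[1].Value));
    }

    public static void Main()
    {
        string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
        // Stand-in for the database lookup that the answer does not show.
        string result = ReplaceTags(html, id => "<p id=\"" + id + "\">This is the replacement text</p>");
        Console.WriteLine(result);
    }
}
```

Passing the renderer as a callback keeps the lookup (database or otherwise) out of the parsing code, so the pattern can be tested on its own.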

The fastest way to trim string in C#

I need to trim the paths in a million strings like this:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
to
src\my_component\my_file.cpp
I.e. remove the absolute part of the path. What is the fastest way to do that?
My attempt using regex:
Regex.Replace(path, @"(.*?)\src", ""),

I wouldn't use regex for this; plain string methods are enough.
If the path prefix is always the same:
const string partToRemove = @"C:\workspace\my_projects\my_app\";
if (path.StartsWith(partToRemove, StringComparison.OrdinalIgnoreCase))
    path = path.Substring(partToRemove.Length);
If the prefix is variable, you can get the last index of \src\:
var startIndex = path.LastIndexOf(@"\src\", StringComparison.OrdinalIgnoreCase);
if (startIndex >= 0)
    path = path.Substring(startIndex + 1);
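The variable-prefix variant can be packaged as a small helper so it is easy to call in a loop. A minimal sketch, with the names PathTrimmer and TrimToSrc chosen here only for illustration; paths without a \src\ segment are returned unchanged.

```csharp
using System;

public static class PathTrimmer
{
    // Keeps everything from the last "src\" segment onward.
    public static string TrimToSrc(string path)
    {
        int startIndex = path.LastIndexOf(@"\src\", StringComparison.OrdinalIgnoreCase);
        // +1 skips the backslash in front of "src".
        return startIndex >= 0 ? path.Substring(startIndex + 1) : path;
    }

    public static void Main()
    {
        string path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
        Console.WriteLine(TrimToSrc(path)); // src\my_component\my_file.cpp
    }
}
```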

Create the Regex once with new and reuse it.
There is a (significant) cost to creating a regex.
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
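Applied to the path question, reusing one Regex instance might look like the sketch below. The pattern and the names (PathRegexTrimmer, PrefixRegex, TrimPrefix) are assumptions, not something from the original posts, and whether this is fast enough compared with plain Substring calls would need to be measured.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathRegexTrimmer
{
    // Built once and shared by every call.
    // Lazily matches from the start up to the first backslash that is followed by "src\".
    private static readonly Regex PrefixRegex = new Regex(
        @"^.*?\\(?=src\\)", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string TrimPrefix(string path)
    {
        return PrefixRegex.Replace(path, "");
    }

    public static void Main()
    {
        string path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
        Console.WriteLine(TrimPrefix(path)); // src\my_component\my_file.cpp
    }
}
```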

I'm not sure if you need speed here, but if you always get the full path, you could do a simple .Substring():
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
Console.WriteLine(path.Substring(32));
However, I think you should sanitize your input first; in this case, the Uri class could do the parsing step:
var root = @"C:\workspace\my_projects\my_app\";
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var relative = new Uri(root).MakeRelativeUri(new Uri(path));
Console.WriteLine(relative.OriginalString.Replace("/", "\\"));
Note that Uri replaces \ with /, which is why the .Replace call is there.

I can't think of anything faster than this:
path.Substring(32);
The part before src is constant, and src starts at index 32.
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
                                ^
However, if it is not always constant, you can find the index once and do the rest inside the loop.
int startInd = path.IndexOf(@"\src\") + 1;
// Do this inside the loop, 1 million times
path.Substring(startInd);

If your files all end in "src/filename.ext", you could use the Path class in the .NET Framework and avoid the caveats that come with paths and file names:
result = @"src\" + Path.GetFileName(path);
Either way, you should first double-check that the conversion is really the thing that takes too long.

When using IndexOf and Substring, how do I get the right start and end indexes? And how do I encode Hebrew characters?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string I want to extract from is, for example:
הנקה
What I want to get is only the word after the title: הנקה
The second problem is that when I extract it, I see gibberish like this instead of Hebrew: ������

One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
    Console.WriteLine(match.Groups[1]);
}
else {
    Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
    document.write(results[1]);
}
else {
    document.write("not found");
}

Okay, here is a solution using String.Substring(), String.Split() and String.IndexOf():
String str = "הנקה"; // Assume this is the string being passed in (unusual escape sequences were added)
int splitStart = str.IndexOf("title="); // Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // Where to end
/* What we are trying to extract is this: title="הנקה">הנקה
 * (given without escape sequences)
 */
String extracted = str.Substring(splitStart, splitEnd - splitStart); // Extract the required portion
String[] splitted = extracted.Split('"'); // Now split on "
Console.WriteLine(splitted[1]); // Prints ???? on the console, but put a breakpoint here and check the values in the split array
The catch is that I had to use escape sequences in an unusual way here. You can ignore that, since you are simply passing in the string to scan.
This actually works, but you cannot see it with the provided Console.WriteLine(splitted[1]);
If you put a breakpoint there and inspect the split array, you can see that the text is extracted. You can confirm it with the following screenshot.
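Both answers depend on an input string that actually contains the anchor markup, and on a console that can show Hebrew. The sketch below puts the two together under stated assumptions: the sample markup is hypothetical (in particular the forumId value is made up), TitleExtractor and ExtractTitle are illustrative names, and setting Console.OutputEncoding to UTF-8 may help the console display the text, though the result also depends on the terminal and its font.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleExtractor
{
    // In a verbatim string, "" stands for one double quote.
    private static readonly Regex TitleRegex = new Regex(@"title=""([^""]+)""");

    // Returns the value of the first title="..." attribute, or null if there is none.
    public static string ExtractTitle(string html)
    {
        Match match = TitleRegex.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Ask the console to emit UTF-8 instead of the default code page.
        Console.OutputEncoding = Encoding.UTF8;
        // Hypothetical sample markup; the forumId value is made up.
        string html = "<a href=\"Forums2008/forumPage.aspx?forumId=1\" title=\"הנקה\">הנקה</a>";
        Console.WriteLine(ExtractTitle(html) ?? "not found");
    }
}
```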

Replace a character in a string in C# based on position with a string

I want to replace characters in a string with another string in C#.
I have tried the following.
In the program below, I want to replace the set of characters between ':' and the first occurrence of '-' with some other characters.
I was able to extract the set of characters between ':' and the first occurrence of '-'.
Can anyone say how to insert these back into the source string?
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of characters between ':' and the first occurrence of '-' with the characters extracted from the same position in another string of the same form.
Can anyone help with how this can be done?
Thank you.

If you want to replace the first ":Number-" in the source with the content from the target, you can use the following regex.
var pattern1 = new Regex(@":\d{1,3}-");
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
if (pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
    var res = pattern1.Replace(source, pattern1.Match(target).Value);
    // "tcm:10-426-8"
}
Edit: To avoid replacing your string with something empty, add an if clause before the actual replacement.

Try a regex solution. This method takes the source and target strings and performs a regex replace on the first, targeting the first number after the 'tcm:' prefix, which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
    return rx.Replace(source, new MatchEvaluator((Match m) =>
    {
        var targetMatch = rx.Match(target);
        if (targetMatch.Success)
            return targetMatch.Value;
        return m.Value; //don't replace if no match
    }));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
    Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
    Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
    Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}

I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);

Here's a version without regex:
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8
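The IndexOf approach can be wrapped in a method that also covers strings with a missing ':' or '-'. This is a sketch under the assumption that both inputs look like "tcm:<number>-<rest>"; TcmHelper and ReplaceFirstNumber are names invented here, and returning the source unchanged on malformed input is a design choice, not something the question specifies.

```csharp
using System;

public static class TcmHelper
{
    // Replaces the number between ':' and the first following '-' in source
    // with the corresponding number from target.
    public static string ReplaceFirstNumber(string source, string target)
    {
        int sourceColon = source.IndexOf(':');
        int targetColon = target.IndexOf(':');
        if (sourceColon < 0 || targetColon < 0)
            return source; // no prefix separator, nothing to replace

        int sourceDash = source.IndexOf('-', sourceColon + 1);
        int targetDash = target.IndexOf('-', targetColon + 1);
        if (sourceDash < 0 || targetDash < 0)
            return source; // no '-' after the ':'

        string number = target.Substring(targetColon + 1, targetDash - targetColon - 1);
        return source.Substring(0, sourceColon + 1) + number + source.Substring(sourceDash);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFirstNumber("tcm:7-426-8", "tcm:10-15-2"));    // tcm:10-426-8
        Console.WriteLine(ReplaceFirstNumber("tcm:100-426-8", "tcm:5-15-2"));   // tcm:5-426-8
        Console.WriteLine(ReplaceFirstNumber("tcm:10-426-8", "tcm:100-15-2"));  // tcm:100-426-8
    }
}
```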
