retain the newline in a regex Match, c# - c#

So, i've created the following regex which captures everything i need from my string:
const string tag = ":59";
var result = Regex.Split(message, String.Format(":{0}[^:]?:*[^:]*", tag),RegexOptions.Multiline);
the string follows this patter:
:59A:/sometext\n
somemore text\n
:71A:somemore text
I'm trying to capture everything in between :59A: and :71A: - this isn't fixed in stone though, as :71A: could be something else. hence, why i was using [^:]
EDIT
So, just to be clear on my requirements. I have a file(string) which is passed into a C# method, which should return only those values specified in the parameter tag. For instance, if the file(string) contains the following tags:
:20:
:21:
:59A:
:71A:
and i pass in 59 then i only need to return everything in between the start of tag :59A: and the start of the next tag, which in this instance is :71A:, but could be something else.

You can use the following code to match what you need:
string input = ":59A:/sometext\nsomemore text\n:71A:somemore text";
string pattern = "(?<=:[^:]+:)[^:]+\n";
var m = Regex.Match(input, pattern, RegexOptions.Singleline).Value;
If you want to use your tag constant, you can use this code
const string tag = ":59";
string input = ":59A:/sometext\nsomemore text\n:71A:somemore text";
string pattern = String.Format("(?<={0}[^:]*:)[^:]+\n", tag);
var m = Regex.Match(input, pattern, RegexOptions.Singleline).Value;

Related

trim string before character but still keep the remain part after it

So I have this string which I have to trim and manipulate a little with it.
My string example:
string test = "studentName_123.pdf";
Now, what I want to do is somehow extract only the _123 part and at the end I need to have studentName.pdf
What I have tried:
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+".pdf";
This also works but the thing is that I don't want to add the ".pdf" suffix at the end of the string manually because I can have strings that are not pdf, for ex. studentName.docx , studentName.png.
So basically I just want the "_123" part removed but still keep the remain part after that.
I think this might help you:
string test = "studentName_123.pdf";
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+ test.Substring(test.LastIndexOf("."),test.Length - test.LastIndexOf(".") );
Using Remove(int startIndex, int count):
string test = "studentName_123.pdf";
string test_extracted = test.Remove(test.LastIndexOf("_"), test.LastIndexOf(".") - test.LastIndexOf("_"));
Sounds like you mean something like this?
string extension = Path.GetExtension(test);
string pdfName = Path.GetFileNameWithoutExtension(test).Split('_')[0];
string fullName = pdfName + extension;
Since you know what value you will always be replacing in your strings, "_123", to base on your example, just utilize the replace method and replace it with nothing since the method expects two arguments;
string test_extracted = test.replace('_123', '');
This could be solved with a regular expression like this
(\w*)_.*(\.\w*) where the first capture group (\w*) matches everything before the underscore and the second group (\.\w*) matches the file extensions.
Lastly we just have to concat the groups without the stuff inbetween like so:
string test = "studentName_123.pdf";
var regex = Regex.Match(test, #"(\w*)_.*(\.\w*)");
string newString = regex.Groups[1].Value + regex.Groups[2].Value;

Identify the string that does not exists in another string using regex and C#

I am trying to capture a string that does not contains in another string.
string searchedString = " This is my search string";
string subsetofSearchedString = "This is my";
My output should be "Search string". I would like to go with only regex so that I can handle complex strings.
The below is the code that I have tried so far and I am not successful.
Match match = new Regex(subsetofSearchedString ).Match(searchedString );
if (!string.IsNullOrWhiteSpace(match.Value))
{
UnmatchedString= UnmatchedString.Replace(match.Value, string.Empty);
}
Update : The above code is not working for the below texts.
text1 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, DriverExposure Owner :Jaimee Watson_csr Author:
text2 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, Driver
Match match = new Regex(text2).Match(text1);
You can use Regex.Split:
var ans = Regex.Split(searchedString, subsetofSearchedString);
If you want the answer as a single string minus the subset, you can join it:
var ansjoined = String.Join("", ans);
Replacing with String.Empty will also work:
var ans = Regex.Replace(searchedString, subsetOfSearchedString, String.Empty);
Answer :
Regex wasn't working for me because of the presence of metacharacters in my string. Regex.Escape did not help me with the comparison.
String Contains worked like a charm here
if (text1.Contains(text2))
{
status = TestResult.Pass;
text1= text1.Replace(text2, string.Empty);
}

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip a string of data that is may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code i'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But pulls in an Error when the string does not contain any hyphen's.
How about just doing this?
stringin.Split('-')[0].Trim();
You could even specify the maximum number of substrings using overloaded Split constructor.
stringin.Split('-', 1)[0].Trim();
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
Which gives an index out of range exception (There is no hyphen to find).
Make a simple change to your regex and it works with or without - ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^
It seems that you want the (sub)string starting from the dash ('-') if original one contains '-' or the original string if doesn't have dash.
If it's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you, it makes the solution much more complex than it can be.
For string I am sure will have no more than a couple of dashes I would use this code, because it is one liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain high amount of dashes, it is not recommended to use this approach - it will create high amount of different strings, although you are looking just for the first one. So, the solution would look like something like this code:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;
you don't need a regex for this. A simple IndexOf function will give you the index of the hyphen, then you can clean it up from there.
This is also a great place to start writing unit tests as well. They are very good for stuff like this.
Here's what the code could look like :
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;

Replace string with regular expression and my own parameter

In my html I've serval token like this:
{PROP_1_1}, {PROP_1_2}, {PROP_37871_1} ...
Actually I replace that token with the following code:
htmlBuffer = htmlBuffer.Replace("{PROP_" + prop.PropertyID + "_1}", prop.PropertyDefaultHtml);
where prop is a custom object. But in this case it affects only the tokens ending with '_1'. I would like to propagate this logic to all the rest ending up with '_X' where X is numeric.
How could I implement a regexp pattern to achieve this?
You can use Regex.Replace():
Regex rgx = new Regex("{PROP_" + prop.PropertyID + "_\d+}");
htmlBuffer = rgx.Replace(htmlBuffer, prop.PropertyDefaultHtml);
You can do even better, you can catch both identifiers in a regular expression. That way you can loop through the references that exist in the string and get the properties for those, instead of looping through all the properties that you have and check if there is any reference for them in the string.
Example:
htmlBuffer = Regex.Replace(htmlBuffer, #"{PROP_(\d+)_(\d+)}", m => {
int id = Int32.Parse(m.Groups[1].Value);
int suffix = Int32.Parse(m.Groups[2].Value);
return properties[id].GetValue(suffix);
});

replace a character in a string in c# based on position with a string

I want to replace a charecter in a string with a string in c#.
I have tried the following,
Here in the following program, i want replace set of charecters between charecters ':' and first occurance of '-' with some others charecters.
I could able to extract the set of charecters between ':' and first occurance of '-'.
Can any one say how to insert these back in the source string.
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of charectes between ':' and '-'(firstoccurance) and the charecters extracetd from the same sort of string.
Can any help how it can be done.
Thank you.
If you want to replace the first ":Number-" from the source with the content from target, you can use the following regex.
var pattern1 = New Regex(":\d{1,3}-{1}");
if(pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var res = pattern1.Replace(source, pattern1.Match(target).Value);
// "tcm:10-426-8"
}
Edit: To not have your string replaced with something empty, add an if-clause before the actualy replacing.
Try a regex solution - first this method, takes the source and target strings, and performs a regex replace on the first, targetting the first numbers after the 'tcm', which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}
I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);
Heres without regex
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8

Categories