Extract substring from string with Regex

Extract substring from string with Regex - c#

Imagine that users are inserting strings in several computers.
On one computer, the pattern in the configuration will extract some characters of that string, lets say position 4 to 5.
On another computer, the extract pattern will return other characters, for instance, last 3 positions of the string.
These configurations (the Regex patterns) are different for each computer, and should be available for change by the administrator, without having to change the source code.
Some examples:
Original_String Return_Value
User1 - abcd78defg123 78
User2 - abcd78defg123 78g1
User3 - mm127788abcd 12
User4 - 123456pp12asd ppsd
Can it be done with Regex?
Thanks.

Why do you want to use regex for this? What is wrong with:
string foo = s.Substring(4,2);
string bar = s.Substring(s.Length-3,3);
(you can wrap those up to do a bit of bounds-checking on the length easily enough)
If you really want, you could wrap it up in a Func<string,string> to put somewhere - not sure I'd bother, though:
Func<string, string> get4and5 = s => s.Substring(4, 2);
Func<string,string> getLast3 = s => s.Substring(s.Length - 3, 3);
string value = "abcd78defg123";
string foo = getLast3(value);
string bar = get4and5(value);

If you really want to use regex:
^...(..)
And:
.*(...)$

To have a regex capture values for further use you typically use (), depending on the regex compiler it might be () or for microsoft MSVC I think it's []
Example
User4 - 123456pp12asd ppsd
is most interesting in that you have here 2 seperate capture areas. Is there some default rule on how to join them together, or would you then want to be able to specify how to make the result?
Perhaps something like
r/......(..)...(..)/\1\2/ for ppsd
r/......(..)...(..)/\2-\1/ for sd-pp
do you want to run a regex to get the captures and handle them yourself, or do you want to run more advanced manipulation commands?

I'm not sure what you are hoping to get by using RegEx. RegEx is used for pattern matching. If you want to extract based on position, just use substring.

It seems to me that Regex really isn't the solution here. To return a section of a string beginning at position pos (starting at 0) and of length length, you simply call the Substring function as such:
string section = str.Substring(pos, length)

Grouping. You could match on /^.{3}(.{2})/ and then look at group $1 for example.
The question is why? Normal string handling i.e. actual substring methods are going to be faster and clearer in intent.

Related

How to localize a string in unity where different languages may have different grammars

I'm translating a Unity game and some of the lines go like
Unlock at XXXX
where "XXXX" is replaced at runtime by an arbitrary substring. Easy enough to replace the wildcards, but to translate the quote, I can't simply concatenate a + b, as some languages will have the value before or inside the string. I figured I needed to, effectively, de-replace it, ie isolate and keep the substring and translate whatever's around it.
Problem is that while I can easily do the second part, I can't think of any avenues for the first. I know to get the character index of what I'm looking for, but the value takes up an arbitrary number of characters, and I can't use whitespace since some languages don't use it. Can't use digit detection since not all of the values are going to be numbers. I tried asking Google, but I couldn't translate "find whatever replaces a wildcard" into something keyword-searchable.
In short, what I'm looking for is a way to find the "XXXX" (the easy part) and then find whatever replaces it in the string (the less-easy part).
Thanks in advance.

I eventually found a workaround, thanks to everybody's kind advice. I stored the substring and referred to it in a special translation method that does take in a value. Thanks for your kind help, everybody.
public static string TranslateWithValue (string text, string value, int language) {
string sauce = text.Replace (value, "XXXX");
sauce = Translate (sauce, language);
sauce = sauce.Replace ("XXXX", value);
return sauce;
}

Usually, I use string.Format in such cases. In your case, I'd declare 2 localizeable strings:
string unlockFormat = "Unlock at {0}";
string unlockValue = "next level";
When you need the unlock condition displayed, you can combine the strings like that:
string unlockCondition = string.Format(unlockFormat, unlockValue);
which will produce the string "Unlock at next level".
Both unlockFormat and unlockValue can be translated, and the translator can move {0} wherever needed.

Build regular expression for replacing duplicated string into single word

I'm working of filtering comments. I'd like to replace string like this:
llllolllllllllllooooooooooooouuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooouuuuuuuuuuuuuuuuudddddddddddddd
with two words: lol loud
string like this:
cuytwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
with: cuytw
And string like this:
hyyuyuyuyuyuyuyuyuyuyuyuyuyu
with: hyu
but not modify strings like look, geek.
Is there any way to achieve this with single regular expression in C#?

I think I can answer this categorically.
This definitely cant be done with RegEx or even standard code due to your input and output requirements without at minimum some sort of dictionary and algorithm to try and reduce doubles in a permutation check for legitimate words.
The result (at best) would give you a list of possible non mutually-exclusive combinations of nonsense words and legitimate words with doubles.
In fact, I'd go as far to say with your current requirements and no extra specificity on rules, your input and output are generically impossible and could only be taken at face value for the cases you have given.

I'm not sure how to use RegEx for this problem, but here is an alternative which is arguably easier to read.*
Assuming you just want to return a string comprising the distinct letters of the input in order, you can use GroupBy:
private static string filterString(string input)
{
var groups = input.GroupBy(c => c);
var output = new string(groups.Select(g => g.Key).ToArray());
return output;
}
Passes:
Returns loud for llllolllllllllllooooooooooooouuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooouuuuuuuuuuuuuuuuudddddddddddddd
Returns cuytw for cuytwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
Returns hyu for hyyuyuyuyuyuyuyuyuyuyuyuyuyu
Failures:
Returns lok for look
Returns gek for geek
* On second read you want to leave words like look and geek alone; this is a partial answer.

c# convert string that has ctrl+z to regular string

i have a string like this:
some_string="A simple demo of SMS text messaging." + Convert.ToChar(26));
what is the SIMPLEST way of me getting rid of the char 26?
please keep in mind that sometimes some_string has char 26 and sometimes it does not, and it can be in different positions too, so i need to know what is the most versatile and easiest way to get rid of char 26?

If it can be in different positions (not just the end):
someString = someString.Replace("\u001A", "");
Note that you have to use the return value of Replace - strings are immutable, so any methods which look like they're changing the contents actually return a new string with the appropriate changes.

If it's only at the end:
some_string.TrimEnd((char)26)
If it can be anywhere then forget this and use Jon Skeet's answer.

Conditional Regex Replace in C# without MatchEvaluator

So, Im trying to make a program to rename some files. For the most part, I want them to look like this,
[Testing]StupidName - 2[720p].mkv
But, I would like to be able to change the format, if so desired. If I use MatchEvaluators, you would have to recompile every time. Thats why I don't want to use the MatchEvaluator.
The problem I have is that I don't know how, or if its possible, to tell Replace that if a group was found, include this string. The only syntax for this I have ever seen was something like (?<group>:data), but I can't get this to work. Well if anyone has an idea, im all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace, the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want to do is make it so that [] is only included if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because I actually get to compile a function. But this would not be a easy solution for users of the program. The only other idea I have to solve this is to actually make my own Regex.Replace function that accepts special syntax.

It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = #"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
string[] replacePatterns =
{
"[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
"${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
"(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.

Extracting a string starting with x and ending with y

First of all, I did a search on this and was able to find how to use something like String.Split() to extract the string based on a condition. I wasn't able to find however, how to extract it based on an ending condition as well. For example, I have a file with links to images: http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg
You will notice that all the images start with http:// and end with .jpg. However, .jpg is succeeded by http:// without a space, making this a little more difficult.
So basically I'm trying to find a way (Regex?) to extract a string from a string that starts with http:// and ends with .jpg

Regex is the easiest way to do this. If you're not familiar with regular expressions, you might check out Regex Buddy. It's a relatively cheap little tool that I found extremely useful when I was learning. For your particular case, a possible expression is:
(http://.+?\.jpg)
It probably requires some more refinement, as there are boundary cases that could trip this up, but it would work if the file is a simple list.
You can also do free quick testing of expressions here.
Per your latest comment, if you have links to other non-images as well, then you need to make sure it doesn't start at the http:// for one link and read all the way to the .jpg for the next image. Since URLs are not allowed to have whitespace, you can do it like this:
(http://[^\s]+\.jpg)
This basically says, "match a string starting with http:// and ending with .jpg where there is at least one character between the two and none of those characters are whitespace".

Regex RegexObj = new Regex("http://.+?\\.jpg");
Match MatchResults = RegexObj.Match(subject);
while (MatchResults.Success) {
//Do something with it
MatchResults = MatchResults.NextMatch();
}

In your specific case, you could always split if by ".jpg". You will probably end up with one empty element at the end of the array, and have to append the .jpg at the end of each file if you need that. Apart from that I think it would work.
Tested the following code and it worked fine:
public void SplitTest()
{
string test = "http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";
string[] items = test.Split(new string[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries);
}
It even get rid of the empty entry...

The following LINQ will separate by http: and make sure to only get values that end with jpg.
var images = from i in imageList.Split(new[] {"http:"},
StringSplitOptions.RemoveEmptyEntries)
where i.EndsWith(".jpg")
select "http:" + i;

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Extract substring from string with Regex - c#

If you really want to use regex: ^...(..) And: .*(...)$

I'm not sure what you are hoping to get by using RegEx. RegEx is used for pattern matching. If you want to extract based on position, just use substring.

It seems to me that Regex really isn't the solution here. To return a section of a string beginning at position pos (starting at 0) and of length length, you simply call the Substring function as such: string section = str.Substring(pos, length)

Grouping. You could match on /^.{3}(.{2})/ and then look at group $1 for example. The question is why? Normal string handling i.e. actual substring methods are going to be faster and clearer in intent.

Related

How to localize a string in unity where different languages may have different grammars

Build regular expression for replacing duplicated string into single word

c# convert string that has ctrl+z to regular string

Conditional Regex Replace in C# without MatchEvaluator

Extracting a string starting with x and ending with y

Categories

Resources