C# substrings: Removing first three characters

I have a string from which I would like to remove the first three characters. How do I go about this using substrings, or is there any other way?
string temp = "01_Barnsley";
string textIWant = "Barnsley";
Thank you

You can use the String.Substring(Int32) method.
Retrieves a substring from this instance. The substring starts at a
specified character position and continues to the end of the string.
string textIWant = temp.Substring(3);
Alternatively, you can use the String.Remove(Int32, Int32) method.
Returns a new string in which a specified number of characters in the
current instance beginning at a specified position have been deleted.
string textIWant = temp.Remove(0, 3);
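A minimal sketch of both calls, assuming C# 9 or later so it can use top-level statements. DropFirst is a hypothetical helper, not part of .NET; it is there to show that Substring(3) and Remove(0, 3) both throw ArgumentOutOfRangeException when the string has fewer than three characters, so a length check may be worth adding if the input is not guaranteed.

```csharp
using System;

string temp = "01_Barnsley";

// Substring(3): everything from index 3 to the end of the string.
string viaSubstring = temp.Substring(3);

// Remove(0, 3): delete 3 characters starting at index 0.
string viaRemove = temp.Remove(0, 3);

// Hypothetical helper: both calls above throw ArgumentOutOfRangeException
// for strings shorter than 3 characters, so guard the length first.
string DropFirst(string s, int n) => s.Length > n ? s.Substring(n) : string.Empty;

Console.WriteLine(viaSubstring);      // Barnsley
Console.WriteLine(viaRemove);         // Barnsley
Console.WriteLine(DropFirst("ab", 3).Length); // 0
```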

You can use the String.Substring(Int32) method:
string textIWant = temp.Substring(3);
or the String.Remove(Int32, Int32) method:
string textIWant = temp.Remove(0, 3);

If there is a pattern to the data, one can use that to extract out what is needed using Regular Expressions.
So if one knows there are numbers (\d is the regex for a digit, and a + means one or more) followed by an underscore, that is the pattern to exclude. Now we tell the parser what we want to capture by defining a group with ( ) notation. Within that group we capture everything with .+: the period (.) means any character, and the + as seen before means one or more.
The match for the whole pattern (not what we want) is stored at group index 0. We want the first subgroup match, at index 1, which is our data.
Console.WriteLine(Regex.Match("01_Barnsley", @"\d+_(.+)").Groups[1].Value); // Barnsley
Console.WriteLine(Regex.Match("9999_Omegaman", @"\d+_(.+)").Groups[1].Value); // Omegaman
Notice how we don't have to worry if there are more than two digits? Whereas Substring(3) returns the wrong text once the number grows, the regex parser handles it thanks to the flexibility of our pattern.
Summary
If there is a distinct pattern to the data, and the data may change, use regex. The minimal learning curve can pay off handsomely. If you truly just need something at a specific point that is unchanging, use substring.
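A sketch of the regex approach with a success check, assuming C# 9+ top-level statements. The ^ anchor and the TryGetAfterNumericPrefix helper (a hypothetical name, following the usual Try pattern) are additions for illustration; the answer's own pattern is \d+_(.+). Checking Match.Success lets you tell "no prefix found" apart from a real result.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true and the text after "<digits>_" when the
// input starts with that prefix; otherwise returns false and an empty string.
bool TryGetAfterNumericPrefix(string input, out string rest)
{
    Match m = Regex.Match(input, @"^\d+_(.+)");
    rest = m.Success ? m.Groups[1].Value : string.Empty;
    return m.Success;
}

foreach (string sample in new[] { "01_Barnsley", "9999_Omegaman", "Barnsley" })
{
    // Prints Barnsley, Omegaman, then (no match)
    Console.WriteLine(TryGetAfterNumericPrefix(sample, out string value) ? value : "(no match)");
}
```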

A solution using LINQ (requires using System.Linq;):
string temp = "01_Barnsley";
string textIWant = new string(temp.Skip(3).ToArray());
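A self-contained version of the LINQ answer, assuming C# 9+ top-level statements. One practical difference worth noting: Enumerable.Skip does not throw when the sequence is shorter than the count, it simply yields nothing, so a short input produces an empty string instead of an exception.

```csharp
using System;
using System.Linq;

string temp = "01_Barnsley";

// Skip(3) yields every character after the first three;
// new string(char[]) turns them back into a string.
string textIWant = new string(temp.Skip(3).ToArray());

// Unlike Substring(3), Skip(3) on a 2-character string yields no characters.
string shortInput = new string("ab".Skip(3).ToArray());

Console.WriteLine(textIWant);         // Barnsley
Console.WriteLine(shortInput.Length); // 0
```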

Related

Get Regex.Matches to start the match at Position 0

I am trying to use Regex to count the number of times a certain string appears in another comma-separated string.
I am using Regex.Matches(comma-separated string, certain string).Count to get the number. The only issue I have is that I want it to count as a match only if it lines up at the start of an item in the comma-separated string.
For instance, if I have the comma separated string
string comma_separated = "dog,cat,bird,blackdog,dog(1)";
and want to see how many times the search string matches with the contents of the comma-separated string
string search = "dog";
I use:
int count = Regex.Matches(comma_separated, search).Count;
I would expect it to be 2, since "dog" appears at the start of two items ("dog" and "dog(1)") in
"dog,cat,bird,blackdog,dog(1)",
however it returns 3, since it also matches the dog part of blackdog.
Is there any way I can get it to count a match only when the match begins at the start of an item? Or am I just using Regex incorrectly?

As noted in the comments, a regex may not be the most logical way for you to achieve your desired result. However, if you would like to use a regex to find your matches, something like this would provide your desired result:
(?<=,|^)dog
This will perform a "positive lookbehind" to ensure that the word "dog" is preceded by either a comma or is at the start of the string you are searching.
More info available on lookarounds in Regex here: https://www.regular-expressions.info/lookaround.html
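A runnable sketch of the lookbehind pattern, assuming C# 9+ top-level statements. Regex.Escape is an addition so that search terms containing regex metacharacters are treated literally. Since the comments suggested a regex may not be necessary, a plain Split-based count is shown alongside for comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string commaSeparated = "dog,cat,bird,blackdog,dog(1)";
string search = "dog";

// Lookbehind: count the term only where it follows a comma or the start of the string.
int regexCount = Regex.Matches(commaSeparated, @"(?<=,|^)" + Regex.Escape(search)).Count;

// Non-regex alternative: split on commas and count items that start with the term.
int splitCount = commaSeparated
    .Split(',')
    .Count(item => item.StartsWith(search, StringComparison.Ordinal));

Console.WriteLine(regexCount); // 2
Console.WriteLine(splitCount); // 2
```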

string comma_separated = "dog,cat,bird,blackdog,dog(1)";
int count = Regex.Matches(comma_separated, string.Format(@"\b{0}\b", Regex.Escape("dog")), RegexOptions.IgnoreCase).Count;
Surrounding the search text with \b (a word boundary) on both sides finds only "EXACT" whole-word matches within the text.
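The same word-boundary idea wrapped in a small helper, assuming C# 9+ top-level statements; CountWholeWord is a hypothetical name. It assumes the search term begins and ends with word characters, since \b next to a non-word character such as "(" behaves differently. Note that whole-word matching also rejects "dogs" and "hotdog", which differs slightly from the "starts an item" rule in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Counts case-insensitive whole-word occurrences of term in text.
int CountWholeWord(string text, string term) =>
    Regex.Matches(text, $@"\b{Regex.Escape(term)}\b", RegexOptions.IgnoreCase).Count;

Console.WriteLine(CountWholeWord("dog,cat,bird,blackdog,dog(1)", "dog")); // 2
Console.WriteLine(CountWholeWord("Dog,dogs,hotdog", "dog"));              // 1
```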

Try using this pattern: search = @"\bdog"; where \b matches a word boundary.

Using Regex.Split to remove anything non numeric and splitting on -

I'm not sure why, but for some reason the Regex.Split method is going over my head. I'm trying to look through tutorials for what I need and can't seem to find anything.
I am simply reading an Excel document and want to format a string such as $145,000-$179,999 to give me two strings, 145000 and 179999. At the same time, I'd like to prune a string such as '$180,000-Limit to simply 180000.
var loanLimits = Regex.Matches(Result.Rows[row + 2 + i][column].ToString(), @"\d+");
The above code seems to chop '$145,000-$179,999 up into 4 parts: 145, 000, 179, 999. Any ideas on how to achieve what I'm asking?

Regular expressions match exactly character by character (they have no concept of a "number" or a "word"; you have to define that yourself in your expression). The expression you are using, \d+, uses the character class \d, which means any digit 0-9 (and + means match one or more). In the input $145,000, notice that the part you are looking for is not composed only of digits; it also includes commas. So the regular expression finds every continuous run of characters that matches, which gives the four groups of digits.
There are a couple of ways to approach the problem.
Include , in your regular expression, so (\d|,)+, which means match as many characters in a row as possible that are either a digit or a comma. There will be two matches, 145,000 and 179,999, from which you can then remove the commas with myStr.Replace(",", "").
Do as you say in the title and remove all non-numeric characters. You could use Regex.Replace with the expression [^\d-]+, which matches anything that is not a digit or a hyphen, and replace those matches with "". The result would be 145000-179999, which you can split with a simple non-regex split, myStr.Split('-'), to get your two parts.
Note that for your second example ($180,000-Limit), you'll need an extra check on the number of results returned by Matches in the first approach, or by Split in the second approach (ignoring empty entries), to determine whether the range contained two numbers or only one.
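Both approaches from the answer above, sketched as runnable code under the assumption of C# 9+ top-level statements. ViaMatches and ViaReplaceAndSplit are hypothetical names. [\d,]+ is used as an equivalent of (\d|,)+ without the capturing group, and StringSplitOptions.RemoveEmptyEntries is one way to do the "extra check" for $180,000-Limit, which becomes "180000-" after the replace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Approach 1: match runs of digits and commas, then strip the commas.
string[] ViaMatches(string input) =>
    Regex.Matches(input, @"[\d,]+")
         .Cast<Match>()
         .Select(m => m.Value.Replace(",", ""))
         .Where(s => s.Length > 0)   // a lone comma would leave an empty string
         .ToArray();

// Approach 2: delete everything except digits and hyphens, then split on '-'.
string[] ViaReplaceAndSplit(string input) =>
    Regex.Replace(input, @"[^\d-]+", "")
         .Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);

foreach (string cell in new[] { "$145,000-$179,999", "$180,000-Limit" })
{
    Console.WriteLine($"{cell}: [{string.Join(", ", ViaMatches(cell))}] / [{string.Join(", ", ViaReplaceAndSplit(cell))}]");
}
```

With either approach, checking the array length (2 versus 1) tells you whether the cell held a range or a single limit.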

You can try treating each part separately: split the string on - and extract only the digits from each part.
List<string> mystrings = new List<string>();
string[] parts = Result.Rows[row + 2 + i][column].ToString().Split('-');
foreach (var item in parts)
{
    // Keep only the digits of each part, e.g. "$145,000" -> "145000"
    string result = Regex.Replace(item, @"[^\d]", "");
    mystrings.Add(result);
}

An alternative to using regex is to use the built-in string and char methods in .NET (plus LINQ, via using System.Linq;). Assuming the input string will always have a single hyphen:
string input = "$145,000-$179,999";
var split = input.Split( '-' )
.Select( x => string.Join( "", x.Where( char.IsLetterOrDigit ) ) )
.ToList();
string first = split.First(); //145000
string second = split.Last(); //179999
First, split the string using the standard Split method.
Then, for each item in the collection, keep only the letters or digits: x.Where...
Then join those characters back into a string using the standard Join method.
Finally, take the first and last items in the collection as your two strings.
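A variation on the answer above, sketched under the assumption of C# 9+ top-level statements; ExtractNumbers is a hypothetical name. It swaps char.IsLetterOrDigit for char.IsDigit so that a word such as "Limit" is dropped, and discards empty parts, which covers the question's $180,000-Limit case without assuming there are always two numbers.

```csharp
using System;
using System.Linq;

// Split on '-' and keep only the digit characters of each part;
// parts with no digits at all (e.g. "Limit") are discarded.
string[] ExtractNumbers(string input) =>
    input.Split('-')
         .Select(part => new string(part.Where(char.IsDigit).ToArray()))
         .Where(digits => digits.Length > 0)
         .ToArray();

Console.WriteLine(string.Join(" ", ExtractNumbers("$145,000-$179,999"))); // 145000 179999
Console.WriteLine(string.Join(" ", ExtractNumbers("$180,000-Limit")));    // 180000
```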

Get substring from string in C# using Regular Expression

I have a string like:
Brief Exercise 1-1 Types of Businesses Brief Exercise 1-2 Forms of Organization Brief Exercise 1-3 Business Activities.
I want to break above string using regular expression so that it can be like:
Types of Businesses
Forms of Organization
Business Activities.
Please don't say that I can break it using 1-1, 1-2 and 1-3 because it will bring the word "Brief Exercise" in between the sentences. Later on I can have Exercise 1-1 or Problem 1-1 also. So I want some general Regular expression.
Is there an efficient regular expression for this scenario?

var regex = new Regex(@"Brief (?:Exercise|Problem) \d+-\d+\s");
var result = string.Join("\n", regex.Split(x).Where(a => !string.IsNullOrEmpty(a)));
The regex will match "Brief " followed by either "Exercise" or "Problem" (the ?: makes the group non-capturing), followed by a space, then one or more digits, a "-", one or more digits, and then a whitespace character.
The second statement uses Split to break the string (x) into an array, then LINQ's Where to skip all the empty entries (otherwise the result would include the empty string at the beginning; you could use Skip(1) instead of Where(a => !string.IsNullOrEmpty(a))), and finally uses string.Join to combine the array back into a string with \n as the separator.
You could use regex.Replace to convert directly to \n, but you would end up with a \n at the beginning that you would have to strip.
--EDIT--
If the first number is always 1 and the second number is roughly 1-50, you could use the following regex, which supports 0-59:
var regex = new Regex(@"Brief (?:Exercise|Problem) 1-[1-5]?\d\s");
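A runnable sketch of the Split approach, assuming C# 9+ top-level statements. Two changes are assumptions on my part: the optional (?:Brief )? prefix, because the question says "Exercise 1-1" or "Problem 1-1" may appear later, and Trim(), which removes the space that Split leaves before each following heading.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Brief Exercise 1-1 Types of Businesses Brief Exercise 1-2 Forms of Organization Brief Exercise 1-3 Business Activities.";

// "(?:Brief )?" makes the "Brief " prefix optional; both groups are non-capturing,
// so Split does not insert the captured text into the result.
var heading = new Regex(@"(?:Brief )?(?:Exercise|Problem) \d+-\d+\s");

string[] titles = heading.Split(input)
    .Select(part => part.Trim())     // drop the space left before the next heading
    .Where(part => part.Length > 0)  // drop the empty piece before the first heading
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, titles));
```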

This regular expression will match "Brief Exercise 1-" followed by a digit and an optional second digit:
@"Brief Exercise 1-\d\d?"
Update:
Since you might have "Problem" as well, an alternation between Exercise and Problem is also needed (using non-capturing parentheses):
@"Brief (?:Exercise|Problem) 1-\d\d?"

Why don't you do it the easy way? If the repeating part is always "Brief Exercise #-#", replace it with some separator character and then split the resulting string to obtain what you want.
If you do it otherwise you will always have to take care of special cases.
string pattern = @"Brief Exercise \d+-\d+";
Regex reg = new Regex(pattern);
string replaced = reg.Replace(yourstring, "|");
string[] results = replaced.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

Regular expression for numbers in string

The input string "134.45sdfsf" passed to the following statement
System.Text.RegularExpressions.Regex.Match(input, pattern).Success;
returns true for the following patterns:
pattern = "[0-9]+"
pattern = "\\d+"
Q1) I am like, what the hell! I am specifying only digits, not special characters or letters. So what is wrong with my pattern, if I want the statement above to return false?
Q2) Once I get the right pattern to match just the digits, how do I extract all the numbers in a string?
Let's say for now I just want to get the integers in a string in the format "int.int^int" (for example, "11111.222^3333"; in this case, I want to extract the strings "11111", "222" and "3333").
Any idea?
Thanks

You are specifying that it contains at least one digit anywhere, not that they are all digits. You are looking for the expression ^\d+$. The ^ and $ denote the start and end of the string, respectively.
Use Regex.Split to split on any run of non-digit characters. For example:
string input = "123&$456";
var isAllDigit = Regex.IsMatch(input, @"^\d+$");
var numbers = Regex.Split(input, @"[^\d]+");
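A sketch contrasting the unanchored and anchored patterns, and extracting every run of digits with Regex.Matches instead of Split, assuming C# 9+ top-level statements. Matches avoids the empty leading or trailing entry that Split can produce when the input starts or ends with a non-digit. In .NET, \d also matches non-ASCII Unicode digits, so [0-9] is the stricter choice if only ASCII digits should count.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Unanchored: succeeds if a run of digits appears anywhere in the input.
bool containsDigits = Regex.IsMatch("134.45sdfsf", @"\d+");

// Anchored with ^ and $: succeeds only if the whole input is digits.
bool allDigits = Regex.IsMatch("134.45sdfsf", @"^\d+$");

// Every run of digits in an "int.int^int" string.
string[] numbers = Regex.Matches("11111.222^3333", @"\d+")
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();

Console.WriteLine(containsDigits);            // True
Console.WriteLine(allDigits);                 // False
Console.WriteLine(string.Join(",", numbers)); // 11111,222,3333
```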

It returns true because the pattern found a match somewhere in the string.
If you want the whole input to be checked, anchor the pattern:
^[0-9]+$

Q1) Both patterns are correct; they match a run of digits anywhere in the input, which is why Success is true for "134.45sdfsf".
Q2) Assuming you are looking for the number pattern "5 digits, dot, 3 digits, ^, 4 digits", here is what you're looking for (requires using System.Diagnostics; and using System.Text.RegularExpressions;):
var regex = new Regex(@"(?<first>[0-9]{5})\.(?<second>[0-9]{3})\^(?<third>[0-9]{4})");
var match = regex.Match("11111.222^3333");
Debug.Print(match.Groups["first"].ToString());
Debug.Print(match.Groups["second"].ToString());
Debug.Print(match.Groups["third"].ToString());
I prefer named capture groups; they give a clearer way to access the captured values than numbered groups.

Regular Expression with Groups and Values in C#

I am trying to write a simple regex to convert some two digit years to four digit years in a pipe delimited file. I am using:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/([\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|$1$220$3|'");
What I want is |10/31/09| to be replaced with |10312009|.
What I am getting is |10$22009|
I think the problem is .NET is evaluating $1 and $3 but is thinking there is a group in the middle with no value ($220 maybe?). How can I let .NET know that the 20 is a constant value instead of part of the group value?
Thanks in advance

Your intuition about the problem is correct: the second backreference is being interpreted as $220, not $2. To fix this, use curly braces:
dateFormat.Replace(contents, @"|$1${2}20$3|'");

Your regex text doesn't parse. Was the "[" supposed to be there? Wrap the number in {} to fix the replace issue:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${1}${2}20${3}|'");
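A runnable sketch of the fix, assuming C# 9+ top-level statements, using the corrected pattern without the stray "[" and without the trailing apostrophe in the replacement. It shows the ${2} form and the named-group form side by side. Hard-coding "20" assumes every two-digit year belongs to 2000-2099.

```csharp
using System;
using System.Text.RegularExpressions;

string contents = "|10/31/09|";

// Numbered groups: ${2} ends the group reference explicitly, so "20" is literal text
// (written as $220 it would be read as a reference to a nonexistent group 220).
var numbered = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");
string viaNumbers = numbered.Replace(contents, "|$1${2}20$3|");

// Named groups avoid the ambiguity and read more clearly.
var named = new Regex(@"\|(?<month>\d\d)/(?<day>\d\d)/(?<year>\d\d)\|");
string viaNames = named.Replace(contents, "|${month}${day}20${year}|");

Console.WriteLine(viaNumbers); // |10312009|
Console.WriteLine(viaNames);   // |10312009|
```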

You can modify your regex to use named groups instead. The syntax for a named group is (?<name>pattern). Then, in your Replace call, you can use the group names instead of the group numbers.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<day>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${day}20${year}|'");

I don't know how to do that, but here is my workaround: use named groups.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<date>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${date}20${year}|'");
Hope this helps.

Try this:
string contents = "|10/31/09|";
Regex dateFormat = new Regex(@"\|(?<mm>\d\d)/(?<dd>\d\d)/(?<yy>\d\d)\|");
Console.WriteLine(dateFormat.Replace(contents, "|${mm}${dd}20${yy}|"));
More information:
Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match into the replacement text. You can use $1, $2, $3, etc... to insert the text matched between capturing parentheses into the replacement text. Use $$ to insert a single dollar sign into the replacement text. To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are less than 19 backreferences, the $19 will be interpreted as literal text, and appear in the result string as such. To insert the text from a named capturing group, use ${name}. Improper use of the $ sign may produce an undesirable result string, but will never cause an exception to be raised.
From http://www.regular-expressions.info/dotnet.html
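The substitution rules quoted above can be checked with a few one-line replacements, sketched here assuming C# 9+ top-level statements. The inputs are made up for illustration. The "$19" case mirrors what the question ran into with "$220": with no group 19, the text is kept literally.

```csharp
using System;
using System.Text.RegularExpressions;

string wholeMatch = Regex.Replace("abc", "b", "[$&]");         // $& inserts the entire match
string dollar     = Regex.Replace("cost 5", @"(\d)", "$$$1");  // $$ is a literal '$', then $1
string braced     = Regex.Replace("a1", @"(\d)", "${1}9");     // group 1 followed by a literal 9
string ambiguous  = Regex.Replace("a1", @"(\d)", "$19");       // no group 19, so "$19" stays literal
string named      = Regex.Replace("a1", @"(?<d>\d)", "<${d}>"); // ${name} inserts a named group

Console.WriteLine(wholeMatch); // a[b]c
Console.WriteLine(dollar);     // cost $5
Console.WriteLine(braced);     // a19
Console.WriteLine(ambiguous);  // a$19
Console.WriteLine(named);      // a<1>
```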

I see problems with your regular expression, namely the unmatched [ character. The following works fine:
\|(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2})\|
That will group the month, day, and year results. You can then replace with the following string, which yields |10/31/2009| (the / after $2 keeps it from being read as $220):
|$1/$2/20$3|
