Trim String value with particular pattern in C#.NET


I have a string that is 900-1000 characters long.
The string follows the pattern
"Number:something,somestringNumber:something,somestring"
and so on. Example string:
"23:value,ordernew14:valueagain,orderagain"
The requirement is that whenever the string grows beyond 1000 characters, I have to remove the first 500 characters. Then, if the result does not start with a Number, I have to remove characters until the first character is a digit:
sortinfo = sortinfo.Remove(0, 500);
sortinfo = new string(sortinfo.SkipWhile(c => !char.IsDigit(c)).ToArray());
The code above does this.
In the example above, removing 5 characters gives
14:valueagain,orderagain
which is exactly what I want.
But if the string is
23:value,or3dernew14:valueagain,orderagain
and I remove 5 characters, the output is
3dernew14:valueagain,orderagain
whereas the requirement is to get
14:valueagain,orderagain
so everything downstream breaks because the result is no longer in the correct format.
How can I do this?
My full code:
class Program
{
static void Main(string[] args)
{
string str;
str=TrimSortInfo("23:value,ord4er24:valueag4ain,order6again15:value,order"); // breaking value
//str = TrimSortInfo("23:value,order24:valueagain,orderagain15:value,order"); //working value
Console.WriteLine(str);
Console.ReadLine();
}
static string TrimSortInfo(string sortinfo)
{
if (sortinfo.Length > 15)
{
sortinfo = sortinfo.Remove(0, 15);
sortinfo = new string(sortinfo.SkipWhile(c => !char.IsDigit(c))
.ToArray());
return sortinfo;
}
return sortinfo;
}
}

Using a regex:
static Regex rx = new Regex("(?<=.*?)[0-9]+:.*");
static string TrimSortInfo(string sortinfo, int trimLength = 15)
{
if (sortinfo.Length > trimLength)
{
return rx.Match(sortinfo, trimLength).Value;
}
return sortinfo;
}
Note that there is a big risk here: you could trim in the middle of a number.
So "xxxxxxxxxxxxxx24:something" could be trimmed to "4:something".
The regex looks for a sequence of one or more digits ([0-9]+), followed by a :, followed by all the remaining characters (.*). Before this sequence there can be any other characters, but as few as possible ((?<=.*?)). Because (?<=...) is a lookbehind, this leading part is not included in the match.
In the end the regex can be simplified to:
static Regex rx = new Regex("[0-9]+:.*");
because it is unanchored, so the match begins at the first position where the pattern can match.
To avoid trimming in the middle of a number:
static Regex rx = new Regex("(?:[^0-9])([0-9]+:.*)");
static string TrimSortInfo(string sortinfo, int trimLength = 15)
{
if (sortinfo.Length > trimLength)
{
return rx.Match(sortinfo, trimLength - 1).Groups[1].Value;
}
return sortinfo;
}
We cheat a little. To trim 15 characters, we skip 14 characters (trimLength - 1), then match one non-digit character ((?:[^0-9]), which we ignore), followed by the captured part: the digits, the : and everything else (([0-9]+:.*)). Note the use of Groups[1].Value to read only the captured part.
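As a rough check of the approach above against the inputs from the question, here is a minimal sketch. The class and method names (SortInfoTrimmer, TrimToRecordStart) are made up for illustration, and the guard for trimLength < 1 is an added assumption, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

static class SortInfoTrimmer
{
    // One non-digit character, then the captured record: digits, ':' and the rest.
    private static readonly Regex RecordStart = new Regex("[^0-9]([0-9]+:.*)");

    // Hypothetical helper: drops roughly trimLength characters, then advances
    // to the next "Number:" record that is preceded by a non-digit.
    public static string TrimToRecordStart(string sortInfo, int trimLength)
    {
        if (trimLength < 1 || sortInfo.Length <= trimLength)
        {
            return sortInfo;
        }
        // Start one character early so the non-digit before the number can match.
        return RecordStart.Match(sortInfo, trimLength - 1).Groups[1].Value;
    }
}
```

With the breaking input from the question, TrimToRecordStart("23:value,or3dernew14:valueagain,orderagain", 5) skips the stray 3 (it is not followed by digits and a colon) and returns the record starting at 14. A value that itself contains a digit followed by a colon would still confuse this pattern.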

Related

Find 3 or more whitespaces with regex in C# [duplicate]

As said above, I want to find 3 or more whitespace characters with regex in C#. So far I have tried
\s{3,} and [ ]{3,} on Somestreet 155/ EG 47. Neither worked. What did I do wrong?

\s{3,} matches three or more whitespace characters in a row. To match a string with three whitespace characters anywhere, you need a pattern such as \s.*\s.*\s.
So this would match:
a b c d
a b c
a b
abc d e f
a
a b // ends in 1 space
// just 3 spaces
a // ends in 3 spaces

Linq is an alternative way to count spaces:
string source = "Somestreet 155/ EG 47";
bool result = source
.Where(c => c == ' ') // spaces only
.Skip(2) // skip 2 of them
.Any(); // do we have at least 1 more (i.e. a 3rd space)?
Edit: If you want to count all whitespace characters rather than just spaces, the Where clause should be
...
.Where(c => char.IsWhiteSpace(c))
...

You could count the whitespace matches:
if (Regex.Matches(yourString, @"\s+").Count >= 3) {...}
The + makes sure that consecutive matches to \s only count once, so "Somestreet 155/ EG 47" has three matches but "Somestreet 155/ EG47" only has two.
If the string is long, then it could take more time than necessary to get all the matches then count them. An alternative is to get one match at a time and bail out early if the required number of matches has been met:
static bool MatchesAtLeast(string s, Regex re, int matchCount)
{
bool success = false;
int startPos = 0;
while (!success)
{
Match m = re.Match(s, startPos);
if (m.Success)
{
matchCount--;
success = (matchCount <= 0);
startPos = m.Index + m.Length;
if (startPos > s.Length - 2) { break; }
}
else { break; }
}
return success;
}
static void Main(string[] args)
{
Regex re = new Regex(@"\s+");
string s = "Somestreet 155/ EG\t47";
Console.WriteLine(MatchesAtLeast(s, re, 3)); // outputs True
Console.ReadLine();
}

Try ^\S*\s\S*\s\S*\s\S*$ instead.
\S matches non-whitespace characters, ^ matches the beginning of the string and $ matches the end of the string.
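One thing worth checking with this pattern: because \S* cannot consume whitespace, it accepts strings with exactly three whitespace characters rather than three or more (apart from .NET's $ also matching just before a final newline). A minimal sketch, with an invented class name:

```csharp
using System.Text.RegularExpressions;

static class ExactlyThreeWhitespace
{
    // Three single whitespace characters, each surrounded by runs of non-whitespace.
    private static readonly Regex Pattern = new Regex(@"^\S*\s\S*\s\S*\s\S*$");

    public static bool IsMatch(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

So "Somestreet 155/ EG 47" matches, but a string with a fourth space, such as "a b c d e", does not.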

Regex for textbox to accept only numbers, comma and not 0, decimal or special characters

I am looking for a regex that accepts only numbers and commas, but not 0, decimals, special characters, negative numbers, or whitespace between the numbers. It may allow whitespace at the start and end of the value. It should also allow 0 inside values such as 10, 100, 20, but I don't want the textbox to accept a single-digit zero.
I have tried multiple options but couldn't find one that solves my problem.
string testAnswer = textbox.value;
Regex answerRegex = new Regex(@"\s+(?<!-)[1-9][0-9,]*\s+");
if (testAnswer.Contains(","))
{
testAnswer = testAnswer.Replace(",", "");
Response.Write(testAnswer);
}
Match answerMatch = answerRegex.Match(testAnswer.Trim());
if (testAnswer == String.Empty || !answerMatch.Success)
{
valid = false;
answer.CssClass = "error";
}
else
{
answer.CssClass = "comp";
}

I think this will do what you want.
(\s+|^)(?<!-)[1-9][0-9,]*(\s+|$)
EDIT: I think I figured out your problem in your code. You're only checking based on Success property.
You need to check answerMatch.Groups in your code as well.
If you checked answerMatch.Groups[0] when you enter 7+9 string, you will realize that the match only matches 7 and discards the rest (+9) of the string.
Ok, I have tested this more extensively and I'm sure that this now works. I've modified your code a little bit so I can use it to demonstrate my test.
string testAnswer = "7 7+9700 700 007 400 -7 8";
bool valid = true;
string retVal = "";
Regex answerRegex = new Regex(@"(\s+|^)(?<!-)[1-9][0-9,]*(\s+|$)");
if (testAnswer.Contains(","))
{
testAnswer = testAnswer.Replace(",", "");
retVal = testAnswer;
}
MatchCollection answerMatch = answerRegex.Matches(testAnswer.Trim());
if (testAnswer == String.Empty || answerMatch.Count <= 0)
{
valid = false;
retVal = "error";
}
else
{
retVal = "comp";
foreach(Match m in answerMatch) {
Console.WriteLine(m.Groups[0].Value);
}
}
testAnswer holds my test string to be checked.
The output of the test program is as follows:
7
700
400
8
That output proves that it rejects negative numbers as well as special characters within the number string.
As for the regex string (\s+|^)(?<!-)[1-9][0-9,]*(\s+|$), here is the breakdown:
(\s+|^) matches either the beginning of the string or leading whitespace.
(?<!-) is a negative lookbehind that rejects a match preceded by a minus sign.
[1-9][0-9,]* matches the numbers you are interested in, including commas.
(\s+|$) matches either trailing whitespace or the end of the string.

If I got you right it should be
Regex answerRegex = new Regex(@"^[1-9][0-9]*$");
Using * instead of + after [0-9] lets single-digit values such as 1 pass as well as 10, etc.
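Here is a sketch of how the anchored pattern could be combined with the trimming and comma removal that the question's code already does. The class and method names are hypothetical, and stripping every comma before validating is an assumption carried over from the question, so an input like ",,1" would also pass.

```csharp
using System.Text.RegularExpressions;

static class AnswerValidator
{
    // A first digit 1-9 followed by any number of digits 0-9, and nothing else.
    private static readonly Regex PositiveInteger = new Regex(@"^[1-9][0-9]*$");

    public static bool IsValid(string raw)
    {
        if (raw == null)
        {
            return false;
        }
        // Leading/trailing whitespace and commas are tolerated, as in the question.
        string cleaned = raw.Trim().Replace(",", "");
        return PositiveInteger.IsMatch(cleaned);
    }
}
```

Because the pattern is anchored at both ends, the whole value has to be a number, so inputs such as "7+9", "-7" or "7 8" are rejected without inspecting match groups.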

How to verify if my text contains a word using regex and C#

I want to verify whether my text contains a string that starts with 3 letters [a-zA-Z] (e.g. CRM) followed by 9 digits [0-9],
like this: "CRM123456789"

Use anchors in order to do an exact string match. ^ asserts that we are at the start and $ asserts that we are at the end.
^[a-zA-Z]{3}[0-9]{9}$
{num} is a repetition quantifier that repeats the previous token the number of times given inside the curly braces. So {9} in [0-9]{9} repeats the previous token [0-9] exactly 9 times.

You don't need regex for that task:
bool valid = input.Length == 12
&& input.StartsWith("CRM")
&& input.Substring(3).All(Char.IsDigit);
If CRM was only an example and all letters are allowed as first three characters:
bool valid = input.Length == 12
&& input.Remove(3).All(Char.IsLetter)
&& input.Substring(3).All(Char.IsDigit);

Simple:
bool valid = Regex.IsMatch(input, @"^[a-zA-Z]{3}[0-9]{9}$");

If your "CRM123456789" appears in some longer text, you'd need to check the boundaries that work for you. In my case, there are often words next to punctuation marks, or spaces. I'd use:
(?<=^|\p{P}|\p{Zs}|\b)[a-zA-Z]{3}[0-9]{9}(?=$|\p{P}|\p{Zs}|\b)

You can do it using linq:
if(input.Length == 12) // check characters are 12
{
if(input.Take(3).All(x=> Char.IsLetter(x)) // First 3 are alphabets
&& input.Skip(3).All(x=>Char.IsDigit(x))) // next all numbers
return true;
else
return false;
}
else
{
return false;
}

using System;
using System.Text.RegularExpressions;
class TestRegularExpressionValidation
{
static void Main()
{
string[] listofinputs =
{
"CRM32323324",
"232dsf12414",
"adfn adfm srf333333333 sdj",
"srf333333333",
"saca dfd444444444r.",
"CRM876969697",
};
string sPattern = "^\\w{3}\\d{9}$";
foreach (string s in listofinputs)
{
System.Console.Write("{0,14}", s);
if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern))
{
System.Console.WriteLine(" - valid");
}
else
{
System.Console.WriteLine(" - invalid");
}
}
// Keep the console window open in debug mode.
System.Console.WriteLine("Press any key to exit.");
System.Console.ReadKey();
}
}
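One caveat about the pattern in the program above: \w also matches digits and the underscore, so ^\w{3}\d{9}$ is looser than the three-letters requirement in the question. The following sketch (class and method names invented for illustration) puts the two patterns side by side:

```csharp
using System.Text.RegularExpressions;

static class CodePatterns
{
    // Loose: \w accepts letters, digits and '_' in the first three positions.
    public static bool MatchesLoose(string s)
    {
        return Regex.IsMatch(s, @"^\w{3}\d{9}$");
    }

    // Strict: only ASCII letters in the first three positions.
    public static bool MatchesStrict(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z]{3}[0-9]{9}$");
    }
}
```

Both accept "CRM123456789", but only the loose pattern accepts an all-digit string such as "123456789012".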

Using regex or string manipulation when creating permalinks

I have the following method (which also looks expensive) for creating permalinks, but it lacks a few things that are quite important for a nice permalink:
public string createPermalink(string text)
{
text = text.ToLower().TrimStart().TrimEnd();
foreach (char c in text.ToCharArray())
{
if (!char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c))
{
text = text.Replace(c.ToString(), "");
}
if (char.IsWhiteSpace(c))
{
text = text.Replace(c, '-');
}
}
if (text.Length > 200)
{
text = text.Remove(200);
}
return text;
}
A few things it is lacking:
If someone enters text like this:
"My choiches are:foo,bar" it gets returned as "my-choiches-arefoobar"
when it should be "my-choiches-are-foo-bar".
And if someone enters multiple whitespace characters, they get returned as "---", which is not nice to have in a URL.
Is there a better way to do this with regex (I have only used it a few times)?
UPDATE:
The requirements are:
Any non-digit or non-letter characters at the beginning or end are not allowed
Any non-digit or non-letter characters should be replaced by "-"
The replacement "-" characters should not repeat, as in "---"
Finally, the string is cut at index 200 to ensure it's not too long

Change to
public string createPermalink(string text)
{
text = text.ToLower();
StringBuilder sb = new StringBuilder(text.Length);
// We want to skip the first hyphenable characters and go to the "meat" of the string
bool lastHyphen = true;
// You can enumerate directly a string
foreach (char c in text)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(c);
lastHyphen = false;
}
else if (!lastHyphen)
{
// We use lastHyphen to not put two hyphens consecutively
sb.Append('-');
lastHyphen = true;
}
if (sb.Length == 200)
{
break;
}
}
// Remove the last hyphen
if (sb.Length > 0 && sb[sb.Length - 1] == '-')
{
sb.Length--;
}
return sb.ToString();
}
If you really want to use regexes, you can do something like this (based on the code of Justin)
Regex rgx = new Regex(@"^\W+|\W+$");
Regex rgx2 = new Regex(@"\W+");
return rgx2.Replace(rgx.Replace(text.ToLower(), string.Empty), "-");
The first regex searches for non-word characters (1 or more) at the beginning (^) or at the end of the string ($) and removes them. The second one replaces one or more non-word characters with -.
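Put together with the 200-character limit from the question, the two-regex version might look like the sketch below. The class name, the maxLength parameter and the use of ToLowerInvariant (instead of ToLower) are assumptions made for this example. Note that \W treats the underscore as a word character, so underscores are kept.

```csharp
using System.Text.RegularExpressions;

static class PermalinkSketch
{
    // Non-word characters at the very start or the very end of the text.
    private static readonly Regex Edges = new Regex(@"^\W+|\W+$");
    // Any remaining run of non-word characters.
    private static readonly Regex Inner = new Regex(@"\W+");

    public static string Create(string text, int maxLength = 200)
    {
        string slug = Edges.Replace(text.ToLowerInvariant(), string.Empty);
        slug = Inner.Replace(slug, "-");
        if (slug.Length > maxLength)
        {
            // Cutting can leave a hyphen at the end, so trim it again.
            slug = slug.Substring(0, maxLength).TrimEnd('-');
        }
        return slug;
    }
}
```

For the example from the question, Create("My choiches are:foo,bar") gives "my-choiches-are-foo-bar".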

This should solve the problem that you have explained. Please let me know if it needs any further explanation.
Just as an FYI, the regex makes use of lookarounds to get it done in one pass.
//This will find any non-word characters, lumping them into one match if there is more than one
//It will ignore non-word characters at the beginning or end of the string
Regex rgx = new Regex(@"(?!\W+$)\W+(?<!^\W+)");
//This will then replace those matches with a -
string result = rgx.Replace(input, "-");
To keep the string from going beyond 200 characters, you will have to use Substring. If you do this before the regex, you will be OK, but if you do it after, you run the risk of having a trailing dash again.
Example (Substring throws if the string is shorter than the requested length, so check first):
myString.Length > 200 ? myString.Substring(0, 200) : myString

I use an iterative approach for this - because in some cases you might want certain characters to be turned into words instead of having them turned into '-' characters - e.g. '&' -> 'and'.
But when you're done you'll also end up with a string that potentially contains multiple '-' - so you have a final regex that collapses all multiple '-' characters into one.
So I would suggest using an ordered list of regexes, and then run them all in order. This code is written to go in a static class that is then exposed as a single extension method for System.String - and is probably best merged into the System namespace.
I've hacked it from code I use, which had extensibility points (e.g. you could pass in a MatchEvaluator on construction of the replacement object for more intelligent replacements; and you could pass in your own IEnumerable of replacements, as the class was public), and therefore it might seem unnecessarily complicated - judging by the other answers I'm guessing everybody will think so (but I have specific requirements for the SEO of the strings that are created).
The list of replacements I use might not be exactly correct for your uses - if not, you can just add more.
private class SEOSymbolReplacement
{
private Regex _rx;
private string _replacementString;
public SEOSymbolReplacement(Regex r, string replacement)
{
//null-checks required.
_rx = r;
_replacementString = replacement;
}
public string Execute(string input)
{
//null-check required
return _rx.Replace(input, _replacementString);
}
}
private static readonly SEOSymbolReplacement[] Replacements = {
new SEOSymbolReplacement(new Regex(@"#", RegexOptions.Compiled), "Sharp"),
new SEOSymbolReplacement(new Regex(@"\+", RegexOptions.Compiled), "Plus"),
new SEOSymbolReplacement(new Regex(@"&", RegexOptions.Compiled), " And "),
new SEOSymbolReplacement(new Regex(@"[|:'\\/,_]", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"\s+", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"[^\p{L}\d-]",
RegexOptions.IgnoreCase | RegexOptions.Compiled), ""),
new SEOSymbolReplacement(new Regex(@"-{2,}", RegexOptions.Compiled), "-")};
/// <summary>
/// Transforms the string into an SEO-friendly string.
/// </summary>
/// <param name="str"></param>
public static string ToSEOPathString(this string str)
{
if (str == null)
return null;
string toReturn = str;
foreach (var replacement in Replacements)
{
toReturn = replacement.Execute(toReturn);
}
return toReturn;
}

.Net Removing all the first 0 of a string

I got the following :
01.05.03
I need to convert that to 1.5.3
The problem is I cannot only trim the 0 because if I got :
01.05.10
I need to convert that to 1.5.10
So, what's the better way to solve that problem ? Regex ? If so, any regex example doing that ?

Expanding on the answer of @FrustratedWithFormsDesigner:
string Strip0s(string s)
{
return string.Join<int>(".", from x in s.Split('.') select int.Parse(x));
}

Regex-replace
(?<=^|\.)0+
with the empty string. The regex is:
(?<= # begin positive look-behind (i.e. "a position preceded by")
^|\. # the start of the string or a literal dot †
) # end positive look-behind
0+ # one or more "0" characters
† note that not all regex flavors support variable-length look-behind, but .NET does.
If you expect this kind of input: "00.03.03" and want to keep one leading zero in this case (like "0.3.3"), use this expression instead:
(?<=^|\.)0+(?=\d)
and again replace with the empty string.
From the comments (thanks Kobi): There is a more concise expression that does not require look-behind and is equivalent to my second suggestion:
\b0+(?=\d)
which is
\b # a word boundary (a position between a word char and a non-word char)
0+ # one or more "0" characters
(?=\d) # positive look-ahead: a position that's followed by a digit
This works because the 0 happens to be a word character, so word boundaries can be used to find the first 0 in a row. It is a more compatible expression, because many regex flavors do not support variable-length look-behind, and some (like JavaScript) no look-behind at all.
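Applied with Regex.Replace, the concise expression could be used as in the following sketch. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

static class VersionText
{
    // Removes zeros that start a run of digits, but only while another digit
    // follows, so a component consisting solely of zeros keeps one "0".
    public static string StripLeadingZeros(string s)
    {
        return Regex.Replace(s, @"\b0+(?=\d)", "");
    }
}
```

With the inputs from the question, "01.05.03" becomes "1.5.3" and "01.05.10" becomes "1.5.10"; "00.03.03" becomes "0.3.3".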

You could split the string on ., then trim the leading 0s on the results of the split, then merge them back together.
I don't know of a way to do this in a single operation, but you could write a function that hides this and makes it look like a single operation. ;)
UPDATE:
I didn't even think of the other guy's regex. Yeah, that will probably do it in a single operation.

Here's another way you could do what FrustratedWithFormsDesigner suggests:
string s = "01.05.10";
string s2 = string.Join(
".",
s.Split('.')
.Select(str => str.TrimStart('0'))
.ToArray()
);
This is almost the same as dtb's answer, but doesn't require that the substrings be valid integers (it would also work with, e.g., "000A.007.0HHIMARK").
UPDATE: If you'd want any strings consisting of all 0s in the input string to be output as a single 0, you could use this:
string s2 = string.Join(
".",
s.Split('.')
.Select(str => TrimLeadingZeros(str))
.ToArray()
);
public static string TrimLeadingZeros(string text) {
int number;
if (int.TryParse(text, out number))
return number.ToString();
else
return text.TrimStart('0');
}
Example input/output:
00.00.000A.007.0HHIMARK // input
0.0.A.7.HHIMARK // output

There's also the old-school way which probably has better performance characteristics than most other solutions mentioned. Something like:
static public string NormalizeVersionString(string versionString)
{
if(versionString == null)
throw new ArgumentNullException("versionString");
bool insideNumber = false;
StringBuilder sb = new StringBuilder(versionString.Length);
foreach(char c in versionString)
{
if(c == '.')
{
sb.Append('.');
insideNumber = false;
}
else if(c >= '1' && c <= '9')
{
sb.Append(c);
insideNumber = true;
}
else if(c == '0')
{
if(insideNumber)
sb.Append('0');
}
}
return sb.ToString();
}

string s = "01.05.10";
string newS = s.Replace(".0", ".");
newS = newS.StartsWith("0") ? newS.Substring(1, newS.Length - 1) : newS;
Console.WriteLine(newS);
NOTE: You will have to thoroughly check the possible input combinations.

This looks like a date format. If so, I would use date processing code:
DateTime time = DateTime.Parse("01.02.03");
String newFormat = time.ToString("d.M.yy");
or even better
String newFormat = time.ToShortDateString();
which will respect your and your clients' culture settings.
If this data is not a date then don't use this :)

I had a similar requirement: parsing strings with street addresses where some of the house numbers had leading zeroes, which I needed to remove while keeping the rest of the text intact. I slightly edited the accepted answer to meet my requirements; maybe someone will find it useful. It does basically the same as the accepted answer, except that I check whether each part of the string can be parsed as an integer and fall back to the original string value when it cannot:
string Strip0s(string s)
{
int outputValue;
return
string.Join(" ",
from x in s.Split(new[] { ' ' })
select int.TryParse(x, out outputValue) ? outputValue.ToString() : x);
}
Input: "Islands Brygge 34 B 07 TV"
Output: "Islands Brygge 34 B 7 TV"
