What's the most efficient way to format the following string?

I have a very simple question, and I shouldn't be hung up on this, but I am. Haha!
I have a string that I receive in the following format(s):
123
123456-D53
123455-4D
234234-4
123415
The desired output, post formatting, is:
123-455-444
123-455-55
123-455-5
or
123-455
The format is ultimately dependent upon the total number of characters in the original string.
I have several ideas of how to do this, but I keep thinking there's a better way than string.Replace and concatenation.
Thanks for the suggestions.
Ian

Tanascius is right, but I can't comment or upvote because of my lack of rep. If you want additional info on string.Format, I've found this helpful:
http://blog.stevex.net/string-formatting-in-csharp/

I assume this does not merely rely upon the inputs always being numeric? If so, I'm thinking of something like this
private string ApplyCustomFormat(string input)
{
StringBuilder builder = new StringBuilder(input.Replace("-", ""));
int index = 3;
while (index < builder.Length)
{
builder.Insert(index, "-");
index += 4;
}
return builder.ToString();
}

Here's a method that uses a combination of regular expressions and LINQ to extract groups of three letters at a time and then joins them together again. Note: it assumes that the input has already been validated. The validation can also be done with a regular expression.
string s = "123456-D53";
string[] groups = Regex.Matches(s, @"\w{1,3}")
.Cast<Match>()
.Select(match => match.Value)
.ToArray();
string result = string.Join("-", groups);
Result:
123-456-D53
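
The validation mentioned above could look roughly like this. It is only a sketch: it assumes that valid input is a run of letters or digits, optionally followed by a single hyphen and another run (as in the question's samples). The pattern and the InputValidator name are illustrative and not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class InputValidator
{
    // Assumed format: alphanumerics, then optionally one '-' and more alphanumerics.
    private static readonly Regex Valid =
        new Regex(@"^[A-Za-z0-9]+(-[A-Za-z0-9]+)?$");

    // Returns true when the input matches the assumed format.
    public static bool IsValid(string input)
    {
        return input != null && Valid.IsMatch(input);
    }
}
```

With this pattern, InputValidator.IsValid("123456-D53") is true and InputValidator.IsValid("123--456") is false.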

EDIT: See history for old versions.
You could use char.IsDigit() for finding digits, only.
var output = new StringBuilder();
var digitCount = 0;
foreach( var c in input )
{
if( char.IsDigit( c ) )
{
output.Append( c );
digitCount++;
if( digitCount % 3 == 0 )
{
output.Append( "-" );
}
}
}
// Remove possible last -
return output.ToString().TrimEnd('-');
This code should fill from left to right (now I got it, first read, then code) ...
Sorry, I still can't test this right now.

Not the fastest, but easy on the eyes (ed: to read):
string Normalize(string value)
{
if (String.IsNullOrEmpty(value)) return value;
int appended = 0;
var builder = new StringBuilder(value.Length + value.Length/3);
for (int ii = 0; ii < value.Length; ++ii)
{
if (Char.IsLetterOrDigit(value[ii]))
{
builder.Append(value[ii]);
if ((++appended % 3) == 0) builder.Append('-');
}
}
return builder.ToString().TrimEnd('-');
}
Uses a guess to pre-allocate the StringBuilder's length. This will accept any Alphanumeric input with any amount of junk being added by the user, including excess whitespace.

Related

String small replacement [duplicate]

I am searching for a simple way to remove underscores from strings and replace the character that follows each one (and the first character) with its uppercase letter.
For example:
From: "data" to: "Data"
From: "data_first" to: "DataFirst"
From: "data_first_second" to: "DataFirstSecond"

Who needs more than one line of code?
var output = Regex.Replace(input, "(?:^|_)($|.)", m => m.Groups[1].Value.ToUpper());

This approach is a small finite-state machine that iterates through the string: it has a finite set of states ("first letter of a word, at the start or following an underscore" vs. "character inside a word"). It does little more than the task requires. You can use a regular expression for the same effect, but it is unlikely to execute fewer instructions at runtime, because the regex engine adds its own matching overhead.
The advantage of this approach is performance: it allocates no intermediate strings, and it iterates through the input string only once, giving a time complexity of O(n) and a space complexity of O(n) for the output. Asymptotically this cannot be improved upon, since every character has to be read at least once.
public static String ConvertUnderscoreSeparatedStringToPascalCase(String input) {
Boolean isFirstLetter = true;
StringBuilder output = new StringBuilder( input.Length );
foreach(Char c in input) {
if( c == '_' ) {
isFirstLetter = true;
continue;
}
if( isFirstLetter ) {
output.Append( Char.ToUpper( c ) );
isFirstLetter = false;
}
else {
output.Append( c );
}
}
return output.ToString();
}

You can use String.Split and following LINQ query:
IEnumerable<string> newStrings = "data_first_second".Split('_')
.Select(t => new String(t.Select((c, index) => index == 0 ? Char.ToUpper(c) : c).ToArray()));
string result = String.Join("", newStrings);

All the other answers are valid. For a culture-aware way:
var textInfo = CultureInfo.CurrentCulture.TextInfo;
var modifiedString = textInfo.ToTitleCase(originalString).Replace("_", "");
I've made a fiddle: https://dotnetfiddle.net/NAr5PP

I would do something like this:
string test = "data_first_second";
string[] testArray=test.Split('_');
StringBuilder modifiedString = new StringBuilder();
foreach (string t in testArray)
{
modifiedString.Append(t.First().ToString().ToUpper() + t.Substring(1));
}
test = modifiedString.ToString();

Use LINQ and Split method like this:
var result = string.Join("",str.Split('_')
.Select(c => c.First().ToString()
.ToUpper() + String.Join("", c.Skip(1))));

Check string for invalid characters? Smartest way?

I would like to check a string for invalid characters, meaning characters that should not be there. Which characters those are varies, but I don't think that's important. What matters is how to do the check, and what the easiest and best-performing way is.
Let's say I only want strings that contain 'A-Z', spaces, '.', '$', and '0-9'.
So a string like "HELLO STaCKOVERFLOW" is invalid because of the 'a'.
Now, how do I do that? I could make a List<char> of every character that is not allowed and check the string against it. That's probably not a good idea, because there are a lot of such characters. But I could make a list of all the allowed characters instead, right? And then? Do I have to compare every char in the string against the List<char>? Is there any smart code for this? Another question: to add A-Z to the List<char> I would have to add 26 chars manually, but as far as I know these are 65-90 in the ASCII table. Can I add them more easily? Any suggestions? Thank you.

You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).

I have just written such a function, plus an extended version that restricts the first and last characters when needed. The original function only checks whether the string consists entirely of valid characters. The extended function takes two extra integers: the number of characters at the beginning of the valid-character list to skip when checking the first and the last character, respectively. In practice it simply calls the original function three times. In the example below it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}

If you are using C#, you can do this easily with a List and Contains. It works the same way for single characters (as strings) and for multi-character strings:
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}

If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.

Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");

I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
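
The same idea can be turned around into a whitelist check. The following is a sketch only, assuming the allowed set from the question (A-Z, 0-9, space, '.' and '$'). CharValidator and IsValid are illustrative names. A HashSet<char> gives expected constant-time lookups, so the check is a single pass over the input.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CharValidator
{
    // Allowed characters (assumed from the question), built once.
    private static readonly HashSet<char> Allowed =
        new HashSet<char>("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .$");

    // True when every character of the input is in the allowed set.
    public static bool IsValid(string input)
    {
        return input.All(c => Allowed.Contains(c));
    }
}
```

For example, CharValidator.IsValid("HELLO STACKOVERFLOW") is true, while CharValidator.IsValid("HELLO STaCKOVERFLOW") is false because of the lowercase 'a'.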

How to remove a duplicate set of characters in a string

For example a string contains the following (the string is variable):
http://www.google.comhttp://www.google.com
What would be the most efficient way of removing the duplicate url here - e.g. output would be:
http://www.google.com

I assume that input contains only urls.
string input = "http://www.google.comhttp://www.google.com";
// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
.Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
.Distinct();
StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
// when building the output, insert "http://" before each address so
// that it resembles the original
output.Append("http://");
output.Append(distinctAddress);
}
Console.WriteLine(output);

Efficiency has various definitions: code size, total execution time, CPU usage, space usage, time to write the code, etc. If you want to be "efficient", you should know which one of these you're trying for.
I'd do something like this:
string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
string secondHalf = url.Substring(url.Length / 2);
if (url.StartsWith(secondHalf))
{
url = secondHalf;
}
}
Depending on the kinds of duplicates you need to remove, this may or may not work for you.

Collect the strings into a list and use Distinct. If your string contains http addresses, you can extract them with the regex http:.+?(?=(http:)|$) and RegexOptions.Singleline:
var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();

Given that you don't know the length of the string, whether something is duplicated, or what is duplicated:
string yourPrimaryString = "http://www.google.comhttp://www.google.com";
// Shortest run that counts as a duplicate; tune this for your input.
int minLength = 4;
for (int i = 0; i < yourPrimaryString.Length; i++)
{
// Try the longest candidate first so a whole duplicate is removed in one piece.
for (int len = (yourPrimaryString.Length - i) / 2; len >= minLength; len--)
{
string search = yourPrimaryString.Substring(i, len);
// Look for a second occurrence after the first one.
int duplicate = yourPrimaryString.IndexOf(search, i + len, StringComparison.Ordinal);
if (duplicate != -1)
{
yourPrimaryString = yourPrimaryString.Remove(duplicate, len);
break;
}
}
}
This iterates over every start position, takes the substrings beginning there, and searches for a second occurrence of each one later in the string.
The problem is short repeats: in ABCDA, a search for A finds the second A and removes it. That is why you need to specify how long a duplicate has to be (minLength above) if the input is variable. Maybe my code helps you.

Converting "9954-4740-4491-4414" to "99:54:47:40:44:91:44:14" using Regular Expressions

If the title isn't clear enough, here's a procedural way of approaching the problem:
[TestMethod]
public void Foo()
{
var start = "9954-4740-4491-4414";
var sb = new StringBuilder();
var j = 0;
for (var i = 0 ; i < start.Length; i++)
{
if ( start[i] != '-')
{
if (j == 2)
{
sb.AppendFormat(":{0}", start[i]);
j = 1;
}
else
{
sb.Append(start[i]);
j++;
}
}
}
var end = sb.ToString();
Assert.AreEqual(end, "99:54:47:40:44:91:44:14");
}

If you're using .NET 4, all you need is this:
string result = string.Join(":", Regex.Matches(start, @"\d{2}").Cast<Match>());
For .NET 3.5 you need to provide a string[] to Join:
string[] digitPairs = Regex.Matches(start, @"\d{2}")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
string result = string.Join(":", digitPairs);

I agree with "why bother with regular expressions?"
string.Join(":", str.Split('-').Select(s => s.Insert(2, ":")));

Regex.Replace version, although I like Mark's answer better:
string res = Regex.Replace(start,
@"(\d{2})(\d{2})-(\d{2})(\d{2})-(\d{2})(\d{2})-(\d{2})(\d{2})",
@"$1:$2:$3:$4:$5:$6:$7:$8");

After a while of experimenting, I've found a way to do it by using a single regular expression that works with input of unlimited length:
Regex.Replace(start, @"(?'group'\d\d)-|(?'group'\d\d)(?!$)", @"$1:")
When using named groups (the (?'name') stuff) with same name, captures are stored in the same group. That way, it is possible to replace distinct matches with same value.
It also makes use of negative lookahead (the (?!) stuff).
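
To make the two mechanisms easier to follow, here is the same pattern wrapped in a small helper. This is a sketch: ColonFormatter and Format are illustrative names, and the replacement uses the named substitution ${group} instead of $1, which refers to the same capture.

```csharp
using System.Text.RegularExpressions;

public static class ColonFormatter
{
    public static string Format(string input)
    {
        // Both alternatives capture into the same named group, so a single
        // replacement covers "pair followed by '-'" (the hyphen is consumed)
        // and "pair that is not at the end of the input" (negative lookahead).
        return Regex.Replace(
            input,
            @"(?'group'\d\d)-|(?'group'\d\d)(?!$)",
            "${group}:");
    }
}
```

ColonFormatter.Format("9954-4740-4491-4414") returns "99:54:47:40:44:91:44:14". The final pair matches neither alternative, so it is left without a trailing colon.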

You don't need them: strip the '-' characters and then insert a colon between each pair of numbers. Unless I've misunderstood the desired output format.
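
A minimal sketch of that regex-free approach, assuming the input contains only digits and '-' separators. PairFormatter and Format are illustrative names.

```csharp
using System.Text;

public static class PairFormatter
{
    public static string Format(string input)
    {
        // Strip the separators first.
        string digits = input.Replace("-", "");
        var sb = new StringBuilder(digits.Length + digits.Length / 2);
        for (int i = 0; i < digits.Length; i++)
        {
            // Insert a colon before every pair except the first.
            if (i > 0 && i % 2 == 0) sb.Append(':');
            sb.Append(digits[i]);
        }
        return sb.ToString();
    }
}
```

PairFormatter.Format("9954-4740-4491-4414") returns "99:54:47:40:44:91:44:14".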

How can I read input as two different answers

Say I get the following question
Console.WriteLine("Which teams have faced eachother? - use Red vs Blue format");
My answer to the question above will contain two teams. How can I read them as two separate values?
I only want to read [Red] and [Blue], but the "vs" part in between has to be there.
I hope you understand what I am trying to say. My English is not great.
Best regards,
PS: as you can tell, I am pretty new to programming.
Edit: oh, and this is all in C#.

You can use String.Split():
var answers = userInput.Split(new String[] { "vs" }, StringSplitOptions.RemoveEmptyEntries);
if (answers.Length == 2) {
// Trim the spaces left around "vs"
var red = answers[0].Trim();
var blue = answers[1].Trim();
}

There are many options. You can use the Split function to turn the input into an array and drop the "vs",
or simply use the String.Replace("vs", "") function to replace the "vs" string with an empty value.

You can try using a regular expression:
Match m = Regex.Match(userInput, @"^(?<team1>.+) vs (?<team2>.+)$");
if (m.Success)
{
string team1 = m.Groups["team1"].Value;
string team2 = m.Groups["team2"].Value;
}
Regex.Match takes the input string as its first parameter and the pattern as its second.

You can read everything as one string and then split it with "vs" as the separator. You'll get an array of the two strings you need.

Use the String.Split function, as others have suggested. This will split your string into an array of strings. Then, identify which string in the array is the 'vs' string. Take the value of the index prior to 'vs' and after 'vs'. For example:
string input = "Which teams have faced eachother? - use Red vs Blue format";
string[] inputArray = input.Split( ' ' );
int vsLocation = 0;
for ( int i = 0; i < inputArray.Length; i++ ) {
if ( inputArray[i] == "vs" ) {
vsLocation = i;
break;
}
}
if ( vsLocation > 0) {
string team1 = inputArray[vsLocation - 1];
string team2 = inputArray[vsLocation + 1];
}
