Parsing a string to get a specific value

Parsing a string to get a specific value - c#

I'm new to C#. I'm parsing for a lot number in a 2D barcode. The actual lot number 'A2351' is hidden in this barcode string "+M727PP011/$$3201001A2351S". I would like to break this barcode up in separate string blocks but the delimiters are not consistent.
The letter prefix in front of the 4 digit lot number can be a 'A', 'P', or a 'D' There is a single letter following the lot number that can be ignored.
string Delimiter = "/$$3";
//barcode format:M###PP###/$$3 ddmmyy lotnumprefix 'A' followed by lotNum
string lotNum= "+M727PP011/$$3201001A2351S";
string[] split = lotNum.Split(new[] {Delimiter}, StringSplitOptions.None);
How do I extract the lot number after the date?

Based on your initial example and then the subsequent edit in which you showed how you are solving this, it sounds like the lot number is always in the same place. It would be cleaner (and more in line with standard C# code) to use a single call to string.Substring(int,int) rather than the two lines you are using which also require pulling in the VB library. You just need to call Substring and give it the starting index and the length.
So this code:
string lotNum = Strings.Right(barcode, 6);
lotNum = lotNum.Remove((lotNum.Length - 1), 1);
Can be done with this single substring call:
string lotNum = barcode.Substring(barcode.Length - 6, 5);
Edit
Just further clarification on why it might be better to use the call to Substring. In C# string objects are immutable. That means that when you make the call to Strings.Right you are getting back a new string object. When you then call lotNum.Remove you do not "remove" a character from the existing string, a new string is allocated with the character(s) removed and is returned to you. So with your code there are two new string allocations when trying to extract the lot number. When you make the call to Substring you will get back a new string, but instead of getting a new string that you immediately then modify and get a second new string, you will only need to allocate one new string to extract the lot number. In the example you have given there probably would not be any noticeable performance/memory issue, but it is something that could potentially lead to trouble if this code was in a tight loop or something like that.

If you're just trying to get the lot number, it's really dependent on the format of the input string (is it a consistent length, are there any reliable prefixes/suffixes relative to the data you're trying to parse that you can reference from, etc). It looks like your data is definable by its static position in the string, so it looks like you could use the substring
(with an index of 20?) method to accomplish what you want.

Related

Optimizing string manipulation

It is 2019 and we have a banking project which uses mainframe as data store and transactions.
We are using DTO's (Commarea, plain c# class) that is converted to plain string (this is how mainframe works) then sent to Mainframe.
While converting a class to string representation we use several string operations such as substring, pad left, pad right, trim etc.
As you can imagine, this causes several string allocations and hence garbage collection. It is usually at generation 0 but still.
Especially types like Decimal which is a Pack type in mainframe that fits into 8 bytes creates several strings.
I tried using ReadonlySpan<char> for example for substring. See example.
However, there are operations like PadRight, PadLeft which is not avaiable, because it is a read only span.
Update:
To clarify a part of conversion happens as follows:
val.Trim().Substring(5).PadRight(10);
I know that this creates 3 string. I know strings are immutable. My question is about doing the above operation with ReadonlySpan or Memory.
I can not use ReadonlySpan only for substring because as soon as I call ToString method I m losing the benefits.
I have to call ToString all the way at the end.
Is there another construct that supports other operations behind substring, that I can actually add remove data to the memory?
Thanks.

Using ReadOnlySpan can help reduce the number of string allocations in your code, but it won't eliminate them completely. This is because ReadOnlySpan is a read-only view of a sequence of characters, so you cannot modify the underlying data using a ReadOnlySpan.
To avoid unnecessary string allocations, you can use the string.AsSpan() method to get a ReadOnlySpan view of a string, and then use the Span.Slice() method to get substrings without allocating new strings. For example, you could use the following code to get a substring of a string without allocating a new string:
string val = "Hello world";
ReadOnlySpan<char> span = val.AsSpan();
ReadOnlySpan<char> substring = span.Slice(5);
However, as mentioned earlier, you cannot use ReadOnlySpan to modify the underlying data, so you will still need to allocate new strings for operations like PadRight and PadLeft. To avoid these allocations, you can use a StringBuilder to build up the string piece by piece, and then call ToString() on the StringBuilder when you're done. This will allow you to perform string operations without allocating new strings for each operation.
In summary, using ReadOnlySpan can help reduce the number of string allocations in your code, but it won't eliminate them completely. To avoid allocating new strings for each string operation, you can use a StringBuilder to build up the final string piece by piece.
string val = "Hello world";
StringBuilder builder = new StringBuilder(val.Length);
// Trim the string
builder.Append(val.Trim());
// Get a substring starting at the 5th character
builder.Append(val, 5, val.Length - 5);
// Pad the string with spaces to the right, to make it 10 characters long
builder.PadRight(10, ' ');
// Convert the final string to a regular string
string result = builder.ToString();

Most efficient way of adding/removing a character to beginning of string?

I was doing a small 'scalable' C# MVC project, with quite a bit of read/write to a database.
From this, I would need to add/remove the first letter of the input string.
'Removing' the first character is quite easy (using a Substring method) - using something like:
String test = "HHello world";
test = test.Substring(1,test.Length-1);
'Adding' a character efficiently seems to be messy/awkward:
String test = "ello World";
test = "H" + test;
Seeing as this will be done for a lot of records, would this be be the most efficient way of doing these operations?
I am also testing if a string starts with the letter 'T' by using, and adding 'T' if it doesn't by:
String test = "Hello World";
if(test[0]!='T')
{
test = "T" + test;
}
and would like to know if this would be suitable for this

If you have several records and to each of the several records field you need to append a character at the beginning, you can use String.Insert with an index of 0 http://msdn.microsoft.com/it-it/library/system.string.insert(v=vs.110).aspx
string yourString = yourString.Insert( 0, "C" );
This will pretty much do the same of what you wrote in your original post, but since it seems you prefer to use a Method and not an operator...
If you have to append a character several times, to a single string, then you're better using a StringBuilder http://msdn.microsoft.com/it-it/library/system.text.stringbuilder(v=vs.110).aspx

Both are equally efficient I think since both require a new string to be initialized, since string is immutable.
When doing this on the same string multiple times, a StringBuilder might come in handy when adding. That will increase performance over adding.
You could also opt to move this operation to the database side if possible. That might increase performance too.

For removing I would use the remove command as this doesn't require to know the length of the string:
test = test.Remove(0, 1);
You could also treat the string as an array for the Add and use
test = test.Insert(0, "H");
If you are always removing and then adding a character you can treat the string as an array again and just replace the character.
test = (test.ToCharArray()[0] = 'H').ToString();
When doing lots of operations to the same string I would use a StringBuilder though, more expensive to create but faster operations on the string.

custom string format puzzler

We have a requirement to display bank routing/account data that is masked with asterisks, except for the last 4 numbers. It seemed simple enough until I found this in unit testing:
string.Format("{0:****1234}",61101234)
is properly displayed as: "****1234"
but
string.Format("{0:****0052}",16000052)
is incorrectly displayed (due to the zeros??): "****1600005252""
If you use the following in C# it works correctly, but I am unable to use this because DevExpress automatically wraps it with "{0: ... }" when you set the displayformat without the curly brackets:
string.Format("****0052",16000052)
Can anyone think of a way to get this format to work properly inside curly brackets (with the full 8 digit number passed in)?
UPDATE: The string.format above is only a way of testing the problem I am trying to solve. It is not the finished code. I have to pass to DevExpress a string format inside braces in order for the routing number to be formatted correctly.

It's a shame that you haven't included the code which is building the format string. It's very odd to have the format string depend on the data in the way that it looks like you have.
I would not try to do this in a format string; instead, I'd write a method to convert the credit card number into an "obscured" string form, quite possibly just using Substring and string concatenation. For example:
public static string ObscureFirstFourCharacters(string input)
{
// TODO: Argument validation
return "****" + input.Substring(4);
}
(It's not clear what the data type of your credit card number is. If it's a numeric type and you need to convert it to a string first, you need to be careful to end up with a fixed-size string, left-padded with zeroes.)

I think you are looking for something like this:
string.Format("{0:****0000}", 16000052);
But I have not seen that with the * inline like that. Without knowing better I probably would have done:
string.Format("{0}{1}", "****", str.Substring(str.Length-4, 4);
Or even dropping the format call if I knew the length.
These approaches are worthwhile to look through: Mask out part first 12 characters of string with *?
As you are alluding to in the comments, this should also work:
string.Format("{0:****####}", 16000052);
The difference is using the 0's will display a zero if no digit is present, # will not. Should be moot in your situation.
If for some reason you want to print the literal zeros, use this:
string.Format("{0:****\0\052}", 16000052);
But note that this is not doing anything with your input at all.

How do I prevent a string from appearing in a result string when a set of child strings are concatenated to form the result string?

I have 5 strings, let's call them
EarthString
FireString
WindString
WaterString
HeartString
All of them can have varying length, any of them can be empty, or can be very long (but never null).
These 5 strings are very good friends, and every weekend they are concatenated to form a result string using this c# statement
ResultString = EarthString + FireString + WindString + WaterString + HeartString
Depending on the values of these strings, sometimes (only sometimes), ResultString will contain "Captain Planet" as a substring.
My question is, how do I manipulate each of the 5 strings before they are concatenated, so that when they are combined, "Captain Planet" will never appear as a substring in the resultant string?
The only way I can think of right now is to examine each character in each string, in sequential order, but that seems very tedious. Since each of the 5 good friends strings can be of any length, examining the characters individually will also require some kind of concatenation before we can determine whether any character need to be dropped.
Edit: The resultant string is a filtered version of the 5 strings concatenated together, all the other content remain the same except the "Captain Planet" string is dropped. Yes, i'm looking for a solution which allows the 5 strings to be manipulated before concatenation. (this is actually a simplification of a bigger programming problem i'm encountering). Thanks guys.

If you want to do it pre-concat you could
Assign the start and end of each string a numeric value based on the portion of "CaptainPlanet" they contein. Ex: if Air = "net the big captain" then it would get 3 for a start value and 7 for an end value. to determine if you could concat 2 values safely you would just check to see if the end of the left string + start of the right string were not equal to the total length of "CaptainPlanet". If you had very large strings this would allow you to inspect just the first x and last x characters of the string to compute the start/end value.
This solution doesn't account for short strings like ei air = "Cap" , earth ="tain" and fire="Planet". In that case you would need to have a special case for tokens that are shorter than the length of "CaptainPlanet" For those.

Is there a particular reason you can't just do this?
ResultString.Replace("CaptainPlanet", "x");

If it doesn't matter how many chars will be dropped, you can remove f.e. all 'C' in all strings.

The original answer cleared all of the strings, but as pointed out by J.Steen, there was already a formulation of the expected output. So there we go.
Run elementString.Replace("Captain Planet", "") on every substring.
Now you have to identify all the prefixes / suffixes of "Captain Planet" on each of the substrings, and keep that information so that it can be processed before contatenation. That is, e.g. if the substring ends with "Capt", then you should have an information that "substring contains at the end a prefix of the 4 first letters of 'Captain Planet'". You also have to consider the cases of complete substrings (e.g. one of the strings is "ptain Pla"). The problem also becomes more complex if any of the e.g. prefixes can be recursive or repeated (e.g. "CaptainCap" contains 2 kinds of valid prefixes for "CaptainCaptain", and "apt" can be found at two locations in the resulting string);
You process that information before concatenation so that the result string has the same thing as ResultString.Replace("Captain Planet", ""). Congratulations, you have made your program much more complex than necessary!
But in short, you cannot get both the result that you want (all of the substrings intact except for the combined result output) and do the processing wholly before the concatenation step.

Replacing part of text in richtextbox

I need to compare a value in a string to what user typed in a richtextbox.
For example: if a richtextbox holds string rtbText = "aaaka" and I compare this to another variable string comparable = "ka"(I want it to compare backwards). I want the last 2 letters from rtbText (comparable has only 2 letters) to be replaced with something that was predetermined(doesn't really matter what).
So rtbText should look like this:
rtbText = "aaa(something)"
This doesn't really have to be compared it can just count letters in comparable and based on that it can remove 2 letters from rtbText and replace them with something else.
UPDATE:
Here is what I have:
int coLen = comparable.Length;
comparable = null;
TextPointer caretBack = rtb.CaretPosition.GetPositionAtOffset(coLen, LogicalDirection.Backward);
TextRange rtbText = new TextRange(rtb.CaretPosition, caretBack);
string text = rtbText.Text;
rtbText returns an empty string or I get an error for everything longer than 3 characters. What am I doing wrong?
Let me elaborate it a little bit further. I have a listbox that holds replacements for values that user types in rtb. The values(replacements) are coming from there, meaning that I don't really need to go through the whole text to check values. I just need to check the values right before caret. I am comparing these values to what I have stored in another variable (comparable).
Please let me know if you don't understand something.
I did my best to explain what needs to be done.
Thank you

You could use Regex.Replace.
// this replaces all occurances of "ka" with "Replacement"
Regex replace = new Regex("ka");
string result = replace.Replace("aaaka","Replacemenet");

gumenimeda, I had similar problems few weeks ago. I found my self doing the following (I asume you will have more than one occurance in the RichTextBox that you will need to change), note that I did it for Windows Forms where I have access directly to the Rtf text of the control, not quite sure if it will work well in your scenario:
I find all the occurancies of the string (using IndexOf for example) and store them in a List for example.
Sort the list in descending order (max index goes first, the one before him second, etc)
Start replacing the occurancies directly in the RichTextBox, by removing the characters I don't need and appending the characters I need.
The sorting in step 2 is necessary as we always want to start from the last occurance going up to the first. Starting from the first occurance or any other and going down will have an unpleasant surprise - if the length of the chunk you want to remove and the length of the chunk you want to append are different in length, the string will be modified and all other occurancies will be invalid (for example if the second occurance was in at position 12 and your new string is 2 characters longer than the original, it will become 14th). This is not an issue if we go from the last to the first occurance as the change in string will not affect the next occurance in the list).
Ofcourse I can not be sure that this is the fastest way that can be used to achieve the desired result. It's just what I came up with and what worked for me.
Good luck!

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.