C# Regex.Replace (or String.Replace) only partially works - c#

I run a repeated Regex.Replace over a string, replacing certain "variables" with their "values". The thing is, some get replaced and some don't!
I have to analyze certain batch files (IBM JCL batch language, to be precise) and search them for JCL variables (rules: a JCL variable starts with "&" and ends with a space, ",", "." or the start of another variable, i.e. "&"). My function is supposed to take a string containing variables and an array of variables and their values as input, then search the string and replace the JCL variables with their values. So I run a for loop, and for each variable-value struct in the array I run Regex.Replace (in order to prevent "&TOSP." being mistaken for "&TO." and to adhere to the JCL variable rules above):
private string ReplaceDSNVarsWithValues(string _DSN, JCLvar[] VarsAndValues)
{
    //FIXME: doesn't work for TIPfile and doesn't pick up all &var
    for (int Fa = 0; Fa < VarsAndValues.Length / 2; ++Fa)
    {
        _DSN = Regex.Replace(_DSN, "&" + VarsAndValues[Fa].JCLvariable + "[^A-Za-z0-9]", VarsAndValues[Fa].JCLvalue);
    }
    return _DSN;
}
Eg. I have this as a string to replace:
string _DSN = "&TOSP..COPY.&SYSTEM..SP&APL..BVSIN.SAVEC.D&MES.&DEN..V&VER.K99";
And then I have an array of structs containing variable-value pairs, e.g.
JCLvar[1].variable = "APL",JCLvar[1].value = "PROD"
Combine that and it should result in the "SP&APL." part changing to "SPPROD".
The problem is, only SOME of the variables get replaced:
&TOSP..COPY.&SYSTEM..SP&APL..BVSIN.SAVEC.D&MES.&DEN..V&VER.K99 gets changed to SP.COPY.DBA0.SPPROD.BVSIN.SAVEC.D&MESDENV&VER.K99 as it should (disregard &MES,&DEN - these are not filled in the VarsAndValues array and therefore don't get replaced), but in
&TO..#ZDSK99.PODVYP.M&MES.U&DEN..SUC.RES, the "&TO." doesn't get changed at all, although it exists in the array and, via debugging, I can see that it is being passed to the regex (but it doesn't get changed).
How come SOME variables get replaced and others don't?
In the VarsAndValues array, the order of variables matters: if "TOSP" is first, it gets replaced and "&TO" does not, while if "TO" is first, it gets replaced and "&TOSP" doesn't. I therefore suspect that Regex.Replace either somehow fails to do repeated replacements of similar expressions/variables in the same string, or fails to recognize the variable/expression to be replaced. But I see no reason for the first possibility, and the second one is impossible, as the expressions to be replaced clearly remain in the string.
//Note - I know it's certainly not nice coding, but it's more a single-purpose script I wrote to save me weeks of manual work than anything else

I don't see anything wrong with your regex. But why are you iterating over only half of VarsAndValues?
for(int Fa=0;Fa<VarsAndValues.Length/2;++Fa)
tells me you're stopping halfway through the array, so if TOSP happens to fall in the second half, it won't be replaced.
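A minimal sketch of the loop with the bound corrected to cover the whole array. The JclVar struct below is a hypothetical stand-in for the question's JCLvar type (its real definition is not shown), and the pattern is an assumption based on the JCL rules described in the question, not the asker's original pattern: a negative lookahead rejects names that continue with a letter or digit, and at most one terminating period is consumed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's JCLvar struct.
public struct JclVar
{
    public string JCLvariable;
    public string JCLvalue;
}

public static class JclSubstitution
{
    public static string ReplaceVars(string dsn, JclVar[] varsAndValues)
    {
        // Visit every entry; the original loop stopped at Length / 2.
        foreach (JclVar v in varsAndValues)
        {
            string value = v.JCLvalue;
            // Assumed rule: the name must not continue with a letter or digit,
            // and a single "." directly after the name is consumed as its terminator.
            string pattern = "&" + Regex.Escape(v.JCLvariable) + @"(?![A-Za-z0-9])\.?";
            // A MatchEvaluator keeps any "$" in the value from being read as a substitution token.
            dsn = Regex.Replace(dsn, pattern, m => value);
        }
        return dsn;
    }
}
```

Because of the lookahead, "&TO" no longer matches inside "&TOSP", so the order of the entries in the array should not matter in this sketch.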

Related

How to localize a string in unity where different languages may have different grammars

I'm translating a Unity game and some of the lines go like
Unlock at XXXX
where "XXXX" is replaced at runtime by an arbitrary substring. It is easy enough to replace the wildcards, but to translate the line I can't simply concatenate a + b, as some languages will have the value before or inside the string. I figured I needed to, effectively, de-replace it, i.e. isolate and keep the substring and translate whatever is around it.
Problem is that while I can easily do the second part, I can't think of any avenues for the first. I know to get the character index of what I'm looking for, but the value takes up an arbitrary number of characters, and I can't use whitespace since some languages don't use it. Can't use digit detection since not all of the values are going to be numbers. I tried asking Google, but I couldn't translate "find whatever replaces a wildcard" into something keyword-searchable.
In short, what I'm looking for is a way to find the "XXXX" (the easy part) and then find whatever replaces it in the string (the less-easy part).
Thanks in advance.
I eventually found a workaround, thanks to everybody's kind advice. I stored the substring and referred to it in a special translation method that does take in a value. Thanks for your kind help, everybody.
public static string TranslateWithValue (string text, string value, int language) {
    string sauce = text.Replace (value, "XXXX");
    sauce = Translate (sauce, language);
    sauce = sauce.Replace ("XXXX", value);
    return sauce;
}
Usually, I use string.Format in such cases. In your case, I'd declare 2 localizable strings:
string unlockFormat = "Unlock at {0}";
string unlockValue = "next level";
When you need the unlock condition displayed, you can combine the strings like this:
string unlockCondition = string.Format(unlockFormat, unlockValue);
which will produce the string "Unlock at next level".
Both unlockFormat and unlockValue can be translated, and the translator can move {0} wherever needed.
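To illustrate how the placeholder can move per language, here is a small sketch. The UnlockText class, the language codes, and the non-English format string are all hypothetical; the "xx" entry is a placeholder showing the value placed first, not a real translation.

```csharp
using System.Collections.Generic;

public static class UnlockText
{
    // Illustrative format strings only; "xx" is a made-up language code
    // whose format puts the value before the translated text.
    private static readonly Dictionary<string, string> UnlockFormats =
        new Dictionary<string, string>
        {
            { "en", "Unlock at {0}" },
            { "xx", "{0} <translated text>" },
        };

    public static string Build(string languageCode, string value)
    {
        // string.Format inserts the value wherever the translator placed {0}.
        return string.Format(UnlockFormats[languageCode], value);
    }
}
```

The calling code stays the same for every language; only the stored format string changes.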

Parsing a string to get a specific value

I'm new to C#. I'm parsing for a lot number in a 2D barcode. The actual lot number 'A2351' is hidden in this barcode string "+M727PP011/$$3201001A2351S". I would like to break this barcode up into separate string blocks, but the delimiters are not consistent.
The letter prefix in front of the 4-digit lot number can be an 'A', 'P', or 'D'. There is a single letter following the lot number that can be ignored.
string Delimiter = "/$$3";
//barcode format:M###PP###/$$3 ddmmyy lotnumprefix 'A' followed by lotNum
string lotNum= "+M727PP011/$$3201001A2351S";
string[] split = lotNum.Split(new[] {Delimiter}, StringSplitOptions.None);
How do I extract the lot number after the date?
Based on your initial example and then the subsequent edit in which you showed how you are solving this, it sounds like the lot number is always in the same place. It would be cleaner (and more in line with standard C# code) to use a single call to string.Substring(int,int) rather than the two lines you are using which also require pulling in the VB library. You just need to call Substring and give it the starting index and the length.
So this code:
string lotNum = Strings.Right(barcode, 6);
lotNum = lotNum.Remove((lotNum.Length - 1), 1);
Can be done with this single substring call:
string lotNum = barcode.Substring(barcode.Length - 6, 5);
Edit
Just further clarification on why it might be better to use the call to Substring. In C# string objects are immutable. That means that when you make the call to Strings.Right you are getting back a new string object. When you then call lotNum.Remove you do not "remove" a character from the existing string, a new string is allocated with the character(s) removed and is returned to you. So with your code there are two new string allocations when trying to extract the lot number. When you make the call to Substring you will get back a new string, but instead of getting a new string that you immediately then modify and get a second new string, you will only need to allocate one new string to extract the lot number. In the example you have given there probably would not be any noticeable performance/memory issue, but it is something that could potentially lead to trouble if this code was in a tight loop or something like that.
If you're just trying to get the lot number, it's really dependent on the format of the input string (is it a consistent length, are there any reliable prefixes/suffixes relative to the data you're trying to parse that you can reference from, etc). It looks like your data is definable by its static position in the string, so it looks like you could use the substring (with an index of 20?) method to accomplish what you want.
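Both ideas above can be sketched side by side. The BarcodeParser class and its method names are hypothetical, and the regex encodes an assumed layout taken from the question's comment (the "/$$3" delimiter, six date digits, then a prefix of A, P or D followed by four digits); real barcodes may deviate from it.

```csharp
using System.Text.RegularExpressions;

public static class BarcodeParser
{
    // Assumed layout: "/$$3", six date digits, then the lot number
    // (one of A, P, D followed by four digits).
    private static readonly Regex LotPattern =
        new Regex(@"/\$\$3\d{6}([APD]\d{4})");

    // Anchors on the delimiter and date, so it does not depend on total length.
    public static string ExtractLotByPattern(string barcode)
    {
        Match m = LotPattern.Match(barcode);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Position-based variant: the five characters before the final letter.
    public static string ExtractLotByPosition(string barcode)
    {
        return barcode.Length >= 6 ? barcode.Substring(barcode.Length - 6, 5) : null;
    }
}
```

For the sample barcode "+M727PP011/$$3201001A2351S", both methods return "A2351". The pattern-based variant returns null when the assumed layout is not found, which may be preferable to silently returning the wrong characters.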

Copy Every thing after a regex is matched

I have to create a function GetSourceCodeOfClass("ClassName", FilePath). This function will be called more than 10,000 times to get source code from C# files, and from every source file I have to extract the source code of a complete class, i.e.
" Class someName { everything in the body, including the signature } "
This is simple if a file contains a single class, but many source files will contain more than one class, and the bigger problem is that there may be nested classes inside a single class.
I want the following:
I want to extract the complete source of a given class.
If the file contains more than one class, I want to extract only the source code of the specified class.
If the file contains more than one class and my specified class has nested classes in it, I want to capture my class's source as well as all of its nested classes.
I have an algorithm in mind:
1 - Open the file
2 - Match a regex (C# class signature), parameterized with the class name:
@"(public|private|internal|protected|inline)?[\t ]*(static)?[\t ]class[\t ]" + sOurClassName + @"(([\t ][:][\t ]([a-zA-z]+(([ ])[,]([ ])\w+))+))?\s[\n\r\t\s]?{"
3 - If the regex matches in the source file
4 - Start copying at that point until the same regex matches again, but without the class-name parameter
The regex is:
@" (public|private|internal|protected)?[\t ]*(static)?[\t ]class[\t ]\w+(([\t ][:][\t ]([a-zA-z]+(([ ])[,]([ ])\w+))+))?\s[\n\r\t\s]?{"
(This is where I have no clue and am stuck. I want to copy everything from the first match up to the second match, or from the first match to the end of the file.)
Copying nested classes is still an issue and I am still thinking about it. If someone has an idea, help with this would be welcome too.
Note: match.Groups[0] or match.Groups[1] will only copy the signature, but I want the complete source of the class; that's why I am doing it this way.
BTW, I am using C#.
I agree with Nathan's sentiment that you would be better using an existing C#-aware parser. Trying to write a regex for the task is a lot of work, and you are unlikely to get it right on the first try. It may work on your first example code, or even the first few, but eventually you'll find some code that's slightly different than what you expected and the regex will fail to catch something important.
That said, if you are comfortable with that limitation and risk, the general technique you are asking about (if I understand correctly…the question isn't entirely clear) is common enough, and worth understanding if you expect to use regex a lot. The key points to understand are that with a Match object, you can call the NextMatch() method to obtain the next match in the text, and that when calling the Regex.Match() method, you can pass the start and length of a substring you want to check, and it will limit its processing to that substring.
You can use the latter point to switch from one regex to another mid-parse.
In your scenario, I understand it to be that you want to run a regex containing the specific class name, to find that particular class in the file, and then to search the text after the initial match for any subsequent class in the file. If the second search finds something, you want to only return the text from the start of the first match to the start of the second match. If the second search finds nothing, you want to return the text from the start of the first match to the end of the whole file.
If that's correct, then something like this should work:
string ExtractClass(string fileContents, Regex classRegex, Regex nonClassRegex)
{
    Match match1 = classRegex.Match(fileContents);
    if (!match1.Success)
    {
        return null;
    }
    Match match2 = nonClassRegex.Match(fileContents, match1.Index + match1.Length);
    if (!match2.Success)
    {
        return fileContents.Substring(match1.Index);
    }
    return fileContents.Substring(match1.Index, match2.Index - match1.Index);
}
I should note that between two class declarations, or between the end of a lone class declaration and the actual end of the file there can easily be other non-white-space text that isn't part of the class declaration. I assume you have a plan for dealing with that.
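As a rough usage sketch of the ExtractClass technique, the following wraps it with two deliberately simplified regexes. These patterns are assumptions for illustration and are not the asker's patterns; the ClassExtractor class and the ExtractByName helper are hypothetical, and the helper takes the file contents rather than a file path.

```csharp
using System.Text.RegularExpressions;

public static class ClassExtractor
{
    // Same logic as ExtractClass above: text from the named class up to the next class, or to the end.
    public static string ExtractClass(string fileContents, Regex classRegex, Regex anyClassRegex)
    {
        Match match1 = classRegex.Match(fileContents);
        if (!match1.Success)
        {
            return null;
        }
        // Search for any later class declaration, starting just past the first match.
        Match match2 = anyClassRegex.Match(fileContents, match1.Index + match1.Length);
        if (!match2.Success)
        {
            return fileContents.Substring(match1.Index);
        }
        return fileContents.Substring(match1.Index, match2.Index - match1.Index);
    }

    // Hypothetical helper with simplified patterns (modifiers and base lists are ignored).
    public static string ExtractByName(string className, string fileContents)
    {
        var classRegex = new Regex(@"\bclass\s+" + Regex.Escape(className) + @"\b");
        var anyClassRegex = new Regex(@"\bclass\s+\w+");
        return ExtractClass(fileContents, classRegex, anyClassRegex);
    }
}
```

With these patterns the extracted text starts at the class keyword, so any modifiers before it are not included, and a nested class declaration would end the extraction early, since it also matches the second regex. That is the same limitation as the approach described above.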
If the above doesn't address your need, you should examine your question closely, and edit it both for length and clarity.

Comparing Strings in .NET

I am running into what must be a HUGE misunderstanding...
I have an object with a string component ID, I am trying to compare this ID to a string in my code in the following way...
if(object.ID == "8jh0086s")
{
//Execute code
}
However, when debugging, I can see that ID is in fact "8jh0086s" but the code is not being executed. I have also tried the following
if(String.Compare(object.ID,"8jh0086s")==0)
{
//Execute code
}
as well as
if(object.ID.Equals("8jh0086s"))
{
//Execute code
}
And I still get nothing...however I do notice that when I am debugging the '0' in the string object.ID does not have a line through it, like the one in the compare string. But I don't know if that is affecting anything. It is not the letter 'o' or 'O', it's a zero but without a line through it.
Any ideas??
I suspect there's something not easily apparent in one of your strings, like a non-printable character for example.
Try running both strings through this to look at their actual byte values. Both arrays should contain the same numerical values.
var test1 = System.Text.Encoding.UTF8.GetBytes(object.ID);
var test2 = System.Text.Encoding.UTF8.GetBytes("8jh0086s");
==== Update from first comment ====
A very easy way to do this is to use the immediate window or watch statements to execute those statements and view the results without having to modify your code.
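A small sketch of the byte comparison suggested above, rendered as hex so the two values can be compared by eye. The StringBytes class and Dump method are hypothetical names.

```csharp
using System;
using System.Text;

public static class StringBytes
{
    // Renders the UTF-8 bytes of a string as dash-separated hex pairs.
    public static string Dump(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }
}
```

For example, Dump("8jh0086s") gives 38-6A-68-30-30-38-36-73, while the same text followed by a zero-width space (U+200B) gains the trailing bytes E2-80-8B, even though both render identically in most debugger views.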
Your first example should be correct.
My guess is there is an un-rendered character present in the Object.ID.
You can inspect this further by debugging, copying both values into an editor like Notepad++ and turning on view all symbols.
I suspect you answered your own question. If one string has O and the other has 0, then they will compare differently. I have been in similar situations where strings seem the same but they really aren't. Worst-case, write a loop to compare each individual character one at a time and you might find some subtle difference like that.
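The character-by-character loop mentioned above could look like the following sketch. StringDiff and FirstDifference are hypothetical names.

```csharp
using System;

public static class StringDiff
{
    // Returns the index of the first position where the strings differ,
    // or -1 if they are identical. If one string is a prefix of the other,
    // the length of the shorter string is returned.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i])
            {
                return i;
            }
        }
        return a.Length == b.Length ? -1 : shared;
    }
}
```

Comparing "8jh0086s" with "8jhO086s" (capital letter O in place of the first zero) returns 3, which points straight at the suspicious character.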
Alternatively, if object.ID is not a string, but perhaps something of type "object" then look at this:
http://blog.coverity.com/2014/01/13/inconsistent-equality
The example uses int, not string, but it can give you an idea of the complications with == when dealing with different objects. But I suspect this is not your problem since you explicitly called String.Compare. That was the right thing to do, and it tells you that the strings really are different!

How do I prevent a string from appearing in a result string when a set of child strings are concatenated to form the result string?

I have 5 strings, let's call them
EarthString
FireString
WindString
WaterString
HeartString
All of them can have varying length, any of them can be empty, or can be very long (but never null).
These 5 strings are very good friends, and every weekend they are concatenated to form a result string using this C# statement:
ResultString = EarthString + FireString + WindString + WaterString + HeartString
Depending on the values of these strings, sometimes (only sometimes), ResultString will contain "Captain Planet" as a substring.
My question is, how do I manipulate each of the 5 strings before they are concatenated, so that when they are combined, "Captain Planet" will never appear as a substring in the resultant string?
The only way I can think of right now is to examine each character in each string, in sequential order, but that seems very tedious. Since each of the 5 good-friend strings can be of any length, examining the characters individually will also require some kind of concatenation before we can determine whether any character needs to be dropped.
Edit: The resultant string is a filtered version of the 5 strings concatenated together; all the other content remains the same, except that the "Captain Planet" string is dropped. Yes, I'm looking for a solution which allows the 5 strings to be manipulated before concatenation. (This is actually a simplification of a bigger programming problem I'm encountering.) Thanks, guys.
If you want to do it pre-concat you could
Assign the start and end of each string a numeric value based on the portion of "CaptainPlanet" they contain. For example, if Air = "net the big captain", then it would get 3 for its start value and 7 for its end value. To determine whether you can concatenate two values safely, check that the end value of the left string plus the start value of the right string does not equal the total length of "CaptainPlanet". With very large strings, this lets you inspect just the first x and last x characters of each string to compute the start/end values.
This solution doesn't account for short strings, e.g. air = "Cap", earth = "tain" and fire = "Planet". In that case you would need a special case for tokens that are shorter than the length of "CaptainPlanet".
Is there a particular reason you can't just do this?
ResultString = ResultString.Replace("CaptainPlanet", "x");
If it doesn't matter how many characters get dropped, you can remove, for example, every 'C' from all of the strings.
The original answer cleared all of the strings, but as pointed out by J.Steen, there was already a formulation of the expected output. So there we go.
Run elementString.Replace("Captain Planet", "") on every substring.
Now you have to identify all the prefixes / suffixes of "Captain Planet" on each of the substrings, and keep that information so that it can be processed before concatenation. That is, e.g. if the substring ends with "Capt", then you should record that "the substring contains at its end a prefix consisting of the first 4 letters of 'Captain Planet'". You also have to consider the cases of complete substrings (e.g. one of the strings is "ptain Pla"). The problem also becomes more complex if any of the e.g. prefixes can be recursive or repeated (e.g. "CaptainCap" contains 2 kinds of valid prefixes for "CaptainCaptain", and "apt" can be found at two locations in the resulting string).
You process that information before concatenation so that the result string has the same thing as ResultString.Replace("Captain Planet", ""). Congratulations, you have made your program much more complex than necessary!
But in short, you cannot get both the result that you want (all of the substrings intact except for the combined result output) and do the processing wholly before the concatenation step.
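For comparison, a minimal sketch of the post-concatenation filter that these answers point to. PlanetFilter and Combine are hypothetical names. The repeat-until-stable loop is an extra assumption about the requirement (that the banned text must never appear in the final result); a single Replace call, as written in the answers, can leave a fresh occurrence behind when the removed text sat inside another copy.

```csharp
public static class PlanetFilter
{
    public const string Banned = "Captain Planet";

    public static string Combine(params string[] parts)
    {
        // Concatenate first, then filter: pieces split across strings are caught too.
        string result = string.Concat(parts);
        // Removing one occurrence can join its neighbours into a new one,
        // so repeat until nothing is left to remove.
        while (result.Contains(Banned))
        {
            result = result.Replace(Banned, "");
        }
        return result;
    }
}
```

For instance, Combine("Cap", "tain", " Pla", "net", "!") returns "!", even though no single input contains the banned text on its own.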
