Alternative to Regex.Unescape in C# for quotes (") - c#

I am facing a problem that I am sure has a simple answer, but I cannot seem to find it.
I am comparing string data from two tables using C# code.
When the data is null or empty in both tables, I want the comparison to return true, meaning the values are identical.
I am using string.IsNullOrEmpty to check for null or empty values.
The problem is that in one table the string value is "", while the other table has the same value escaped, appearing as "\"\"".
I assumed Regex.Unescape would solve this, but it does not seem to work: the comparison still reports that the two values are different.
One solution I found is to check directly whether str == "\"\"".
But are there any cleaner options?

I think you are mixing things up here.
If your strings come from the same data source, then either all of them are escaped, or none of them are (and if that's not the case, you have bigger problems than the one you are describing).
So, if they are not escaped, and one of them contains "" and the other contains \"\", then they are not equal: one is 2 characters long and the other is 4.
So I'm assuming that they are escaped, that your first string is actually empty in the database (it doesn't contain any characters), and that the second one is \"\".
You can then use Regex.Unescape (if they are always escaped), but those two strings are still not the same: the first is empty, and the second contains, once unescaped, "". The first string has no characters and the second has two, so it is no wonder they do not compare equal.
Now, if they are indeed escaped, it does not make sense that one contains "", because those characters should be escaped. And if this is not the case, then you have a very specific problem which is not the one you asked about: you need to determine whether a string comes escaped or not from the data source, and that's basically impossible unless there is a very specific set of rules that decides it.
If the data source contains a random mix of escaped and unescaped strings, imagine it returns the string \"\": how do you determine whether the content is escaped and means {'"','"'} (2 characters, each of them a double quote), or is not escaped and is the 4 characters {'\','"','\','"'} (one backslash, one double quote, one backslash and one double quote)? There's just no way to tell unless you have a specification that defines those rules (or another field saying whether the string is escaped).
So, back to your question: although you haven't posted any code, my guess is that the code is not wrong: either your expectations are wrong (you want \"\" to mean the string is empty, but it doesn't mean that), or your data is wrong.
Either way, there's no generic code solution to either of those. There are specific code solutions for specific cases (like the one you are showing), but not a generic one: with the information you gave in your question, it's just impossible.
After all this babbling, now for a specific answer, if your table A contains unescaped strings and your table B contains escaped strings:
stringFromTableA == Regex.Unescape(stringFromTableB)
This should return true if stringFromTableA contains "" and stringFromTableB contains \"\". Check it. Neither of those is empty, so string.IsNullOrEmpty() will return false for both.
And an update: if you are checking those string values in the Visual Studio debugger, remember that the debugger shows them escaped. So if you are seeing "" in one and \"\" in the other, then your first string is empty (and string.IsNullOrEmpty will return true), and your second string contains two double quotes: string.IsNullOrEmpty will return false, since it is not actually null or empty. Regex.Unescape will do nothing in this case, since your string doesn't contain any \ and there's nothing to unescape; it's just the debugger showing those backslashes.
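A minimal sketch of the comparison described above, assuming table A holds raw text and table B holds the same text with backslash escapes that Regex.Unescape understands. The class and method names are hypothetical, and Regex.Unescape can throw an ArgumentException when its input contains an unrecognized escape sequence, so real data may need a guard.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedComparison
{
    // Hypothetical helper. Assumption: rawFromTableA is stored unescaped and
    // escapedFromTableB is stored with backslash escapes.
    public static bool AreEquivalent(string rawFromTableA, string escapedFromTableB)
    {
        // Null and empty are treated as the same "no value" state.
        if (string.IsNullOrEmpty(rawFromTableA) && string.IsNullOrEmpty(escapedFromTableB))
            return true;

        // Exactly one side has content, so they cannot match.
        if (string.IsNullOrEmpty(rawFromTableA) || string.IsNullOrEmpty(escapedFromTableB))
            return false;

        // Unescape only the side that is known to be escaped, then compare.
        return rawFromTableA == Regex.Unescape(escapedFromTableB);
    }
}
```

With this helper, a raw value of two double quotes matches the escaped four-character value \"\", while an empty string does not match it, which is the distinction the answer draws.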

Related

Replacing single quotes in full sql queries with C#

In our C# desktop-application we generate a lot of dynamic sql-queries. Now we have some troubles with single quotes in strings. Here's a sample:
INSERT INTO Addresses (CompanyName) VALUES ('Thomas' Imbiss')
My question is: How can I find and replace all single quotes between 2 other single quotes in a string? Unfortunately I can't replace the single quotes when creating the different queries. I can only do that after the full query is created and right before the query gets executed.
I tried this pattern (Regular Expressions): "\w\'\w"
But this pattern doesn't work, because after "s'" there's a space instead of a char.
I am sorry to say there is no solution with the approach you expect.
For example, have these columns and values:
column A, value ,A',
column B, value ,B',
If they are together in column list, you have ',A',',',B','.
Now, where is the boundary between first and second value? It is ambiguous.
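The ambiguity can be reproduced with a short sketch. It assumes the query builder wraps each value in single quotes without escaping; the class and method names are hypothetical and the code is deliberately unsafe, for illustration only.

```csharp
using System;
using System.Linq;

public static class NaiveSqlValues
{
    // Deliberately unsafe: wraps each value in single quotes without escaping,
    // the way the problematic query builder is assumed to work.
    public static string JoinUnescaped(params string[] values) =>
        string.Join(",", values.Select(v => "'" + v + "'"));
}
```

Calling JoinUnescaped("x','y", "z") and JoinUnescaped("x", "y','z") produces the same text, 'x','y','z', so nothing that runs after the query is built can tell which values were intended.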
You must take action when creating text fields for SQL. Either use SQL parameters or properly escape quotes and other problematic characters there.
Consider showing the ambiguous example above to your managers and pushing the whole problem back as algorithmically unsolvable at your end. Or offer to implement guesswork, and ask them whether they will be happy if the contents of several text fields can get mixed up in cases like the one above.
At query-creation time, if they do not want to start using SQL parameters, quoting any input string is as simple as doubling every single quote in it:
string Enquote(string input)
{
    // Requires using System.Linq. Non-ASCII input gets an N'...' (Unicode) literal.
    return (input.All(c => c < 128) ? "'" : "N'")
        + input.Replace("'", "''")
        + "'";
}
Of course, this can have problems with deliberately malformed Unicode strings (surrogate pairs used to hide a '), but it is not normally possible to produce such strings through the user interface. Generally this can still be quicker than converting all queries to versions with SQL parameters.

Why are some unintended symbols added to my string?

I wrote a console application which fetches strings from some fields in a SharePoint list. Then I simply write the strings to the console. This works fine for most fields. The issue is caused by one MultiLineTextField with RichText enabled, where I had to remove all the HTML tags.
Even after all the tags are removed, the strings seem to contain question marks which were never added to the string. The weirdest thing about this is that when I set a breakpoint and look at the string's value there are no question marks, but they suddenly appear in the console output.
The only thing I could think of was to trim the string, because sometimes they appear in front of the actual string and sometimes at the end of it, but never in between.
So this is what I tried:
myString = myString.Trim();
myString = myString.Replace("?",string.Empty);
But this does not solve the issue. Besides, it would not be a smart solution if one of the strings were supposed to contain question marks. For detailed code please see the link above.
Also Convert.ToBase64String(Encoding.UTF8.GetBytes(myString)) gives me the following output:
4oCLTWVobCwgRWllciwgV2Fzc2VyLCBIYWNrZmxlaXNjaCA=
There are probably some non-printing Unicode (or possibly low ASCII) characters at the end of the string. The console has a different encoding, and will often render such characters as ?. Basically: use the indexer (yourString[n]) or yourString.ToCharArray() to investigate what is actually in the string around the location of the ?.
With the edit, we can see that the string has a zero-width space (decimal 8203) at the start.
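One way to check this is to decode the Base64 value from the question and list the numeric value of each character. The sketch below is only an illustration: the class and method names are hypothetical, and removing every character in the Unicode "Format" category is an assumption about what should be stripped, not something the original poster confirmed.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class StringInspector
{
    // Rebuilds the original string from the Base64-encoded UTF-8 bytes.
    public static string FromBase64Utf8(string base64) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(base64));

    // Numeric value of every UTF-16 code unit, so invisible characters show up.
    public static int[] CodeUnits(string text) =>
        text.Select(c => (int)c).ToArray();

    // Assumption: invisible formatting characters such as U+200B are unwanted.
    public static string StripFormatChars(string text) =>
        new string(text
            .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format)
            .ToArray());
}
```

Trim() does not help here because a zero-width space is not classified as white space, which is why the question mark survives trimming.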
Sounds like you're maybe having a problem with unicode characters. Chances are you're outputting the string as ASCII instead of Unicode. Take a look at this question as it sounds like you may be experiencing the same problem.

Comparing Strings in .NET

I am running into what must be a HUGE misunderstanding...
I have an object with a string component ID, I am trying to compare this ID to a string in my code in the following way...
if(object.ID == "8jh0086s")
{
//Execute code
}
However, when debugging, I can see that ID is in fact "8jh0086s" but the code is not being executed. I have also tried the following
if(String.Compare(object.ID,"8jh0086s")==0)
{
//Execute code
}
as well as
if(object.ID.Equals("8jh0086s"))
{
//Execute code
}
And I still get nothing...however I do notice that when I am debugging the '0' in the string object.ID does not have a line through it, like the one in the compare string. But I don't know if that is affecting anything. It is not the letter 'o' or 'O', it's a zero but without a line through it.
Any ideas??
I suspect there's something not easily apparent in one of your strings, like a non-printable character for example.
Try running both strings through this to look at their actual byte values. Both arrays should contain the same numerical values.
var test1 = System.Text.Encoding.UTF8.GetBytes(object.ID);
var test2 = System.Text.Encoding.UTF8.GetBytes("8jh0086s");
==== Update from first comment ====
A very easy way to do this is to use the immediate window or watch statements to execute those statements and view the results without having to modify your code.
Your first example should be correct.
My guess is there is an un-rendered character present in the Object.ID.
You can inspect this further by debugging, copying both values into an editor like Notepad++ and turning on view all symbols.
I suspect you answered your own question. If one string has O and the other has 0, then they will compare differently. I have been in similar situations where strings seem the same but they really aren't. Worst-case, write a loop to compare each individual character one at a time and you might find some subtle difference like that.
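The character-by-character loop suggested above could look like the following sketch. The class and method names are hypothetical; it compares UTF-16 code units and reports the first index at which the two strings differ.

```csharp
using System;

public static class StringDiff
{
    // Returns the index of the first UTF-16 code unit where a and b differ,
    // or -1 when the two strings are identical.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i]) return i;
        }

        // Same prefix: the strings differ only if one is longer than the other.
        return a.Length == b.Length ? -1 : shared;
    }
}
```

Once the index is known, casting the character at that position to int on each side shows exactly which characters are involved, including invisible ones.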
Alternatively, if object.ID is not a string, but perhaps something of type "object" then look at this:
http://blog.coverity.com/2014/01/13/inconsistent-equality
The example uses int, not string, but it can give you an idea of the complications with == when dealing with different objects. But I suspect this is not your problem since you explicitly called String.Compare. That was the right thing to do, and it tells you that the strings really are different!
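To illustrate the point about ==, here is a small sketch with strings rather than int, assuming the operands are typed as object. The class and method names are hypothetical. With object-typed operands, == compares references, while Equals dispatches to the string implementation and compares content.

```csharp
using System;

public static class EqualityDemo
{
    public static (bool OperatorResult, bool EqualsResult) CompareAsObjects(string text)
    {
        // Two distinct string instances with identical content
        // (for non-empty text; an empty char array yields the shared empty string).
        object left = new string(text.ToCharArray());
        object right = new string(text.ToCharArray());

        // == on operands typed as object is a reference comparison;
        // Equals is virtual and ends up in String.Equals, which compares content.
        return (left == right, left.Equals(right));
    }
}
```

For a non-empty input such as "8jh0086s", the operator result is false and the Equals result is true.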

How do I prevent a string from appearing in a result string when a set of child strings are concatenated to form the result string?

I have 5 strings, let's call them
EarthString
FireString
WindString
WaterString
HeartString
All of them can have varying length, any of them can be empty, or can be very long (but never null).
These 5 strings are very good friends, and every weekend they are concatenated to form a result string using this c# statement
ResultString = EarthString + FireString + WindString + WaterString + HeartString
Depending on the values of these strings, sometimes (only sometimes), ResultString will contain "Captain Planet" as a substring.
My question is, how do I manipulate each of the 5 strings before they are concatenated, so that when they are combined, "Captain Planet" will never appear as a substring in the resultant string?
The only way I can think of right now is to examine each character in each string, in sequential order, but that seems very tedious. Since each of the 5 good friends strings can be of any length, examining the characters individually will also require some kind of concatenation before we can determine whether any character need to be dropped.
Edit: The resultant string is a filtered version of the 5 strings concatenated together; all the other content remains the same, except that the "Captain Planet" string is dropped. Yes, I'm looking for a solution which allows the 5 strings to be manipulated before concatenation (this is actually a simplification of a bigger programming problem I'm encountering). Thanks guys.
If you want to do it pre-concat you could
Assign the start and end of each string a numeric value based on the portion of "CaptainPlanet" they contain. For example, if Air = "net the big captain", it would get 3 for a start value and 7 for an end value. To determine whether you can concatenate two values safely, you just check that the end value of the left string plus the start value of the right string is not equal to the total length of "CaptainPlanet". If you had very large strings, this would let you inspect just the first x and last x characters of each string to compute the start/end values.
This solution doesn't account for short strings, e.g. air = "Cap", earth = "tain" and fire = "Planet". For those, you would need a special case for tokens that are shorter than the length of "CaptainPlanet".
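The start and end values from this answer can be computed as in the sketch below. The class and method names are hypothetical, and the case-insensitive comparison is an assumption taken from the answer's example, where lowercase "captain" counts as a match.

```csharp
using System;

public static class BoundaryOverlap
{
    // Length of the longest proper suffix of target that text starts with.
    public static int StartValue(string text, string target)
    {
        for (int k = Math.Min(text.Length, target.Length - 1); k > 0; k--)
        {
            string suffix = target.Substring(target.Length - k);
            if (text.StartsWith(suffix, StringComparison.OrdinalIgnoreCase)) return k;
        }
        return 0;
    }

    // Length of the longest proper prefix of target that text ends with.
    public static int EndValue(string text, string target)
    {
        for (int k = Math.Min(text.Length, target.Length - 1); k > 0; k--)
        {
            string prefix = target.Substring(0, k);
            if (text.EndsWith(prefix, StringComparison.OrdinalIgnoreCase)) return k;
        }
        return 0;
    }
}
```

For "net the big captain" and the target "CaptainPlanet" this gives a start value of 3 and an end value of 7, matching the example. Two neighbours are flagged when the left end value plus the right start value equals the target length, for instance 7 for a string ending in "Captain" plus 6 for one starting with "Planet". Because only the longest overlap is recorded, this remains a sketch of the idea rather than a complete check.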
Is there a particular reason you can't just do this?
ResultString.Replace("CaptainPlanet", "x");
If it doesn't matter how many characters are dropped, you can, for example, remove every 'C' from all of the strings.
The original answer cleared all of the strings, but as pointed out by J.Steen, there was already a formulation of the expected output. So there we go.
Run elementString.Replace("Captain Planet", "") on every substring.
Now you have to identify all the prefixes / suffixes of "Captain Planet" in each of the substrings, and keep that information so that it can be processed before concatenation. That is, if a substring ends with "Capt", for example, you should record that "the substring ends with a prefix made of the first 4 letters of 'Captain Planet'". You also have to consider substrings that lie entirely inside the phrase (e.g. one of the strings is "ptain Pla"). The problem also becomes more complex if any of the prefixes can be recursive or repeated (e.g. "CaptainCap" contains 2 kinds of valid prefixes for "CaptainCaptain", and "apt" can be found at two locations in the resulting string);
You process that information before concatenation so that the result string has the same thing as ResultString.Replace("Captain Planet", ""). Congratulations, you have made your program much more complex than necessary!
But in short, you cannot get both the result that you want (all of the substrings intact except for the combined result output) and do the processing wholly before the concatenation step.
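For comparison, the post-concatenation approach is only a few lines. This is a sketch with hypothetical names. The loop is an added assumption: it repeats the replacement because removing one occurrence can splice a new "Captain Planet" together from the text on either side, which a single Replace call would leave behind.

```csharp
using System;

public static class PlanetFilter
{
    // Concatenates the parts first, then removes the forbidden phrase.
    public static string Combine(params string[] parts)
    {
        string result = string.Concat(parts);

        // Repeat until the phrase is gone, since one removal can create another match.
        while (result.Contains("Captain Planet"))
        {
            result = result.Replace("Captain Planet", "");
        }
        return result;
    }
}
```

Combine("Cap", "tain Pl", "anet", " saves", " the day") returns " saves the day", even though no single input contains the phrase.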

C# Regex.Replace (or String.Replace) only partially works

I run a repeated Regex.Replace over a string, replacing certain "variables" with their "values". Thing is, some get replaced and some don't!
I have to analyze certain batch files (IBM JCL batch language, to be precise) and search them for JCL variables (rules: a JCL variable starts with "&" and ends with a space, ",", "." or the start of another variable, that being "&"). My function is supposed to take the string with variables and an array of variables and their values as input, then search the string and replace the JCL variables with their values. So I run a for loop, and for each variable-value struct in the array I run Regex.Replace (in order to prevent "&TOSP." being mistaken for "&TO." and to adhere to the JCL variable rules, see above):
private string ReplaceDSNVarsWithValues(string _DSN,JCLvar[] VarsAndValues)
{
    //FIXME: does not work for TIPfile and does not pick up all &var
    for(int Fa=0;Fa<VarsAndValues.Length/2;++Fa)
    {
        _DSN = Regex.Replace(_DSN, "&"+VarsAndValues[Fa].JCLvariable+"[^A-Za-z0-9]", VarsAndValues[Fa].JCLvalue);
    }
    return _DSN;
}
Eg. I have this as a string to replace:
string _DSN = "&TOSP..COPY.&SYSTEM..SP&APL..BVSIN.SAVEC.D&MES.&DEN..V&VER.K99";
And then I have an array of struct containing couples of variable and value, eg.
JCLvar[1].variable = "APL",JCLvar[1].value = "PROD"
Combine that and it should result in the "SP&APL." part changing to "SPPROD".
The problem is, only SOME of the variables get replaced:
&TOSP..COPY.&SYSTEM..SP&APL..BVSIN.SAVEC.D&MES.&DEN..V&VER.K99 gets changed to SP.COPY.DBA0.SPPROD.BVSIN.SAVEC.D&MESDENV&VER.K99 as it should (disregard &MES and &DEN: they are not filled in the VarsAndValues array and therefore don't get replaced), but in
&TO..#ZDSK99.PODVYP.M&MES.U&DEN..SUC.RES, the "&TO." doesn't get changed at all, although it exists in the array and, while debugging, I can see it being passed to the regex.
How come SOME variables get replaced and others don't?
In the VarsAndValues array, the order of variables matters: if "TOSP" comes first, it gets replaced and "&TO" does not, while if "TO" comes first, it gets replaced and "&TOSP" doesn't. That made me suspect that Regex.Replace either fails to do repeated replacements of similar expressions/variables in the same string, or fails to recognize the variable/expression to be replaced. But I see no reason for the first possibility, and the second one is impossible, as the expressions to be replaced clearly stay there.
//Note - I know it's certainly not nice coding, but it's more a single-purpose script I wrote to save me weeks of manual work than anything else
I don't see anything wrong with your regex. But why are you iterating over only half of VarsAndValues?
for(int Fa=0;Fa<VarsAndValues.Length/2;++Fa)
tells me you're stopping halfway through the array, so if TOSP happens to fall in the second half, it won't be replaced.
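Besides looping over the whole array, one alternative is a single pass that matches every variable and looks it up. The sketch below is not the asker's code and rests on several assumptions: the values live in a dictionary rather than a JCLvar array, variable names consist only of letters and digits, and only a terminating "." is consumed (the original pattern consumes whichever non-alphanumeric character follows). All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JclSubstitution
{
    // Replaces every &NAME (with an optional terminating period) whose NAME is a
    // key in values; unknown variables are left exactly as they were.
    public static string ReplaceVars(string dsn, IReadOnlyDictionary<string, string> values)
    {
        return Regex.Replace(
            dsn,
            @"&([A-Za-z0-9]+)\.?",
            match => values.TryGetValue(match.Groups[1].Value, out var value)
                ? value
                : match.Value);
    }
}
```

Because the name group takes the longest run of letters and digits, "&TOSP" is never mistaken for "&TO", and the order of the entries no longer matters. Using a match evaluator instead of a replacement string also means that a "$" inside a value is inserted literally instead of being read as a substitution.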
