compare strings with one having nonreadable chars - c#

I am having a problem comparing two strings. One string has one or more non-readable characters in it, while the other string is the same but in readable form.
When I try this comparison, it does not match:
if ("Alemria" == "Almería")...
The string Almería is stored in a table.
How can this be done?

Use an overload of string.Equals that takes a StringComparison value, and choose one of the current-culture members (StringComparison.CurrentCulture or StringComparison.CurrentCultureIgnoreCase).
You will need to set the current culture to one that sorts these characters correctly.
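A minimal sketch of that answer, assuming Spanish (es-ES) is the culture you want; the class and method names are hypothetical. Be aware that culture-sensitive equality still treats í and i as different letters, so this alone will not match Almería with Almeria. What it does help with is the same text stored in different Unicode forms (for example a precomposed í versus i followed by a combining accent), which normally compare equal under culture rules but not under ordinal rules.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class illustrating string.Equals with a culture-sensitive StringComparison.
public static class CultureEquality
{
    // Compares a and b using the rules of the thread's current culture.
    public static bool EqualsInCurrentCulture(string a, string b) =>
        string.Equals(a, b, StringComparison.CurrentCulture);

    // Temporarily switches the current culture to Spanish (Spain), compares, then restores it.
    public static bool EqualsInSpanish(string a, string b)
    {
        CultureInfo previous = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo("es-ES");
            return EqualsInCurrentCulture(a, b);
        }
        finally
        {
            CultureInfo.CurrentCulture = previous;
        }
    }
}
```

In a real application you would more likely set the culture once at startup than switch it around every comparison.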

Depending on how strict you want to be, you might try this. If you know the characters in question come only from the Spanish alphabet, you could strip them out of the seed values (maybe with a Regex) and modify your comparison logic to do the same with the target records. For example, remove all 'ñ' and 'n' characters from both sides, and maybe add a length comparison to increase reliability. Of course, do this for all the special characters, not just 'ñ'.
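A rough sketch of that stripping idea. The character set below (Spanish accented vowels, ü and ñ, each paired with its plain letter, in both cases) is an assumption, as are the class and method names. The length check assumes precomposed characters, where í counts as one char.

```csharp
using System.Text.RegularExpressions;

public static class LooseSpanishMatch
{
    // Assumed set: every accented Spanish letter plus its plain counterpart, upper and lower case.
    // Stripping both members of each pair makes "í" and "i" (or "ñ" and "n") indistinguishable.
    private static readonly Regex Ambiguous = new Regex("[áaéeíióoúuüñnÁAÉEÍIÓOÚUÜÑN]");

    private static string Strip(string s) => Ambiguous.Replace(s, "");

    public static bool LooselyEquals(string seed, string target) =>
        seed.Length == target.Length          // cheap extra check suggested in the answer
        && Strip(seed) == Strip(target);      // compare only the letters that remain
}
```

This is deliberately loose: because every vowel is stripped, LooselyEquals("Almería", "Alemria") also returns true, which may or may not be what you want.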

See if this article helps: you could replace all accented characters in the word and then do your comparison.
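One common way to replace accented characters (not necessarily the one the article uses) is to decompose the string with Unicode normalization form D, drop the nonspacing combining marks, and recompose. The class name below is hypothetical.

```csharp
using System.Globalization;
using System.Text;

public static class Accents
{
    // "í" decomposes to "i" + U+0301 (combining acute); dropping the mark leaves the base letter.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever is left so the result is in the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

After this you can compare with an ordinal comparison, e.g. string.Equals(Accents.RemoveDiacritics(a), Accents.RemoveDiacritics(b), StringComparison.Ordinal). Letters without a canonical decomposition, such as ø or ł, pass through unchanged.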

I suppose CompareOptions.IgnoreNonSpace is what you are looking for. With it, Compare ignores accents, diacritics, and vowel marks.
using System.Globalization;
string str1 = "mun";
string str2 = "mün";
int result1 = string.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace);
int result2 = string.Compare(str1, str2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
But Alemria will differ from Almería anyway: besides the accent, the e and m are in a different order, so the letters themselves do not match.
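The IgnoreNonSpace option above can be wrapped into a small equality helper and applied to the strings from the question. The class and method names are hypothetical, and the invariant culture is an assumption; pass a specific culture's CompareInfo if you need its rules.

```csharp
using System.Globalization;

public static class AccentInsensitive
{
    // True when a and b differ at most in nonspacing marks such as accents and diacritics.
    public static bool EqualIgnoringAccents(string a, string b) =>
        CultureInfo.InvariantCulture.CompareInfo.Compare(a, b, CompareOptions.IgnoreNonSpace) == 0;
}
```

With this, "Almeria" and "Almería" compare equal, while "Alemria" still does not, because its letters are in a different order.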

Try
if (myString.StartsWith("Almer")) // replace myString with your string variable
    MessageBox.Show("String matched"); // do whatever you want to do here

Related

Replacing Characters in String C#

I need to replace a series of characters in a file name in C#. After doing many searches, I can't find a good example of replacing all characters between two specific ones. For example, the file name would be:
"TestExample_serialNumber_Version_1.0_.pdf"
All I want is the final product to be "serialNumber".
Is there a special character I can use to replace all characters up to and including the first underscore? Then I could run the Replace method again to remove everything after and including the next underscore. I've heard of using regex, but I've done something similar in Java and it seemed much easier to accomplish. I must not be understanding the string formats in C#.
I would imagine it would look something like:
name.Replace("T?_", "");//where ? equals any characters between
name.Replace("_?", "");
Rather than "replace", just use a regex to extract the part you want. Something like:
(?:TestExample_)(.*)(?:_Version)
Would give you the serialnumber part in a capture group.
Or, if TestExample is variable (in which case you need your question to be more specific about exactly what pattern you are matching), you could probably just do:
(?:_)(.*)(?:_Version)
Assuming the Version part is constant.
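The variable-prefix pattern can be used from C# the same way. This sketch assumes, as the answer does, that "_Version" is constant; the class and method names are hypothetical, and it returns null when the pattern does not match.

```csharp
using System.Text.RegularExpressions;

public static class SerialNumber
{
    // (?:_) skips the first underscore, (.*) captures the serial number, (?:_Version) ends it.
    private static readonly Regex Pattern = new Regex("(?:_)(.*)(?:_Version)");

    public static string Extract(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because .* is greedy and the match starts at the first underscore, a prefix that itself contains underscores would leak into the captured value.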
In C#, you could do something like:
using System.Text.RegularExpressions;
var regex1 = new Regex("(?:TestExample_)(.*)(?:_Version)");
string testString = "TestExample_serialNumber_Version_1.0_.pdf";
string serialNum = regex1.Match(testString).Groups[1].Value;
As an alternative to regex, you could find the first instance of an underscore then find the next instance of an underscore and take the substring between those indices.
string myStr = "TestExample_serialNumber_Version_1.0_.pdf";
string splitStr = "_";
int startIndex = myStr.IndexOf(splitStr) + 1;
string serialNum = myStr.Substring(startIndex, myStr.IndexOf(splitStr, startIndex) - startIndex);

C# Trouble with Regex.Replace

Been scratching my head all day about this one!
Ok, so I have a string which contains the following:
?\"width=\"1\"height=\"1\"border=\"0\"style=\"display:none;\">');
I want to convert that string to the following:
?\"width=1height=1border=0style=\"display:none;\">');
I could theoretically just do a String.Replace on "\"1\"" etc. But this isn't really a viable option as the string could theoretically have any number within the expression.
I also thought about removing the string "\"", however there are other occurrences of this which I don't want to be replaced.
I have been attempting to use the Regex.Replace method as I believe this exists to solve problems along my lines. Here's what I've got:
chunkContents = Regex.Replace(chunkContents, "\".\"", ".");
Now that really messes things up (it replaces the correct elements, but with a full stop), but I think you can see what I am attempting to do with it. I am also worried that this will only work for single digits (\"1\" rather than \"11\"). That led me to think about using "*" or "+" rather than ".", but I foresaw the problem of this picking up all of the text in between the desired characters (which are dotted all over the place), whereas I only want to replace the ones with numeric characters between them.
Hope I've explained that clearly enough, will be happy to provide any extra info if needed :)
Try this
using System.Text.RegularExpressions;
var str = "?\"width=\"1\"height=\"1234\"border=\"0\"style=\"display:none;\">');";
str = Regex.Replace(str , "\"(\\d+)\"", "$1");
(\\d+) is a capturing group that looks for one or more digits and $1 references what the group captured.
This works
using System;
using System.Text.RegularExpressions;

String input = @"?\""width=\""1\""height=\""1\""border=\""0\""style=\""display:none;\"">');";
//replace the entire match of the regex with only what's captured (the number)
String result = Regex.Replace(input, @"\\""(\d+)\\""", match => match.Result("$1"));
//control string for expected result
String shouldBe = @"?\""width=1height=1border=0style=\""display:none;\"">');";
//prints True
Console.WriteLine(result.Equals(shouldBe).ToString());

C# - Fastest way to find one of a set of strings in another string

I need to check whether a string contains any swear words.
Following some advice from another question here, I made a HashSet containing the words:
HashSet<string> swearWords = new HashSet<string>() { "word_one", "word_two", "etc" };
Now I need to see if any of the values contained in swearWords are in my string.
I've seen it done the other way round, eg:
swearWords.Contains(myString)
But this will return false.
What's the fastest way to check if any of the words in the HashSet are in myString?
NB: I figure I can use a foreach loop to check each word in turn, and break if a match is found, I'm just wondering if there's a faster way.
If you place your swears in an IEnumerable<> implementing container:
var containsSwears = swearWords.Any(w => myString.Contains(w));
Note: HashSet<> implements IEnumerable<>
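A self-contained version of that one-liner. Any stops at the first word that matches. The case-insensitive, culture-free OrdinalIgnoreCase comparison is an assumption (plain Contains is case-sensitive), and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringSwearCheck
{
    // Substring test for each word; OrdinalIgnoreCase is an assumed choice of comparison.
    public static bool ContainsAny(string text, IEnumerable<string> words) =>
        words.Any(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
}
```

Being a substring test, it also fires on words that merely contain a listed word (e.g. "etcetera" contains "etc").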
You could try a regex, but I'm not sure it's faster.
Regex rx = new Regex("(" + string.Join("|", swearWords.Select(w => Regex.Escape(w))) + ")");
rx.IsMatch(myString);
If you have a really large set of swear words, you could use the Aho–Corasick algorithm: http://tomasp.net/blog/ahocorasick.aspx
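A compact sketch of Aho–Corasick in C#, written independently of the linked article. It only answers "does the text contain any pattern?": a trie of the words, breadth-first failure links, and match flags inherited along those links. Like string.Contains, it is case-sensitive and substring-based. The class name is hypothetical.

```csharp
using System.Collections.Generic;

public sealed class AhoCorasickMatcher
{
    private readonly List<Dictionary<char, int>> _next = new List<Dictionary<char, int>>();
    private readonly List<int> _fail = new List<int>();
    private readonly List<bool> _isMatch = new List<bool>();

    public AhoCorasickMatcher(IEnumerable<string> patterns)
    {
        NewNode(); // node 0 is the root (empty prefix)
        foreach (string pattern in patterns)
        {
            if (pattern.Length == 0) continue; // ignore empty patterns
            int node = 0;
            foreach (char c in pattern)
            {
                if (!_next[node].TryGetValue(c, out int child))
                {
                    child = NewNode();
                    _next[node][c] = child;
                }
                node = child;
            }
            _isMatch[node] = true; // a whole pattern ends at this node
        }
        BuildFailureLinks();
    }

    private int NewNode()
    {
        _next.Add(new Dictionary<char, int>());
        _fail.Add(0);
        _isMatch.Add(false);
        return _next.Count - 1;
    }

    // Breadth-first: a node's failure link is the longest proper suffix of its prefix that is
    // also in the trie. A node also counts as a match if its failure target is one.
    private void BuildFailureLinks()
    {
        var queue = new Queue<int>(_next[0].Values); // depth-1 nodes keep failure link 0 (root)
        while (queue.Count > 0)
        {
            int node = queue.Dequeue();
            foreach (KeyValuePair<char, int> edge in _next[node])
            {
                int f = _fail[node];
                while (f != 0 && !_next[f].ContainsKey(edge.Key)) f = _fail[f];
                _fail[edge.Value] = _next[f].TryGetValue(edge.Key, out int target) ? target : 0;
                _isMatch[edge.Value] |= _isMatch[_fail[edge.Value]];
                queue.Enqueue(edge.Value);
            }
        }
    }

    public bool ContainsAny(string text)
    {
        int node = 0;
        foreach (char c in text)
        {
            // Follow failure links until some state can consume c, or we are back at the root.
            while (node != 0 && !_next[node].ContainsKey(c)) node = _fail[node];
            node = _next[node].TryGetValue(c, out int child) ? child : 0;
            if (_isMatch[node]) return true;
        }
        return false;
    }
}
```

Build the matcher once (a HashSet<string> can be passed directly, since it is an IEnumerable<string>) and reuse it. Each check is then a single pass over the input string, however many words are in the list.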
The main problem with such schemes is defining what a word is in the context of the string you want to check.
Naive implementations such as those using input.Contains simply do not have the concept of a word; they will "detect" swear words even when that was not the intent.
Breaking words on whitespace is not going to cut it (consider also punctuation marks, etc).
Breaking on characters other than whitespace is going to raise culture issues: what characters are considered word-characters exactly?
Assuming that your stopword list only uses the Latin alphabet, a practical choice would be to assume that words are sequences consisting of only Latin characters. So a reasonable starting solution would be
var words = Regex.Split(myString, @"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Pc}\p{Lm}]");
The regex above is the standard class \W modified to not include digits; for more info, see http://msdn.microsoft.com/en-us/library/20bw873z.aspx. For other approaches, see this question and possibly the CodeProject link supplied in the accepted answer.
Having split the input string, you can iterate over words and replace those that match anything in your list (use swearWords.Contains(word) to check) or simply detect if there are any matches at all with
var anySwearWords = words.Intersect(swearWords).Any();
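Put together, the split-then-check approach might look like this. It reuses the answer's character class, with an added "+" so that runs of separators count as one, and it filters out the empty strings that Split leaves at the edges. The class and method names are hypothetical. Case sensitivity follows whatever comparer the HashSet was created with.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSwearCheck
{
    // Separator: anything that is not a letter or connector punctuation (digits split words too).
    private static readonly Regex Separator =
        new Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Pc}\p{Lm}]+");

    public static IEnumerable<string> Words(string text) =>
        Separator.Split(text).Where(w => w.Length > 0);

    // Whole-word check: "darned" does not match "darn".
    public static bool ContainsSwearWord(string text, HashSet<string> swearWords) =>
        Words(text).Any(w => swearWords.Contains(w));
}
```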
You could split "myString" into an IEnumerable type, and then use "Overlaps" on them?
http://msdn.microsoft.com/en-us/library/bb355623(v=vs.90).aspx
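A minimal sketch of the Overlaps suggestion. The separator list is an assumption chosen for illustration (the caveats above about what counts as a word still apply), and the class name is hypothetical. Overlaps uses the set's own comparer and returns true as soon as one element is found in the set.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapsSwearCheck
{
    // Assumed separators: whitespace and common punctuation.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    public static bool ContainsSwearWord(string text, HashSet<string> swearWords) =>
        swearWords.Overlaps(text.Split(Separators, StringSplitOptions.RemoveEmptyEntries));
}
```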

Comparing strings with quotation marks

Hello guys, I'm trying to create a program in C# that compares two strings which contain double quotation marks. My problem is how to compare them for equality, because it seems the compiler ignores the words within the quotation marks and does not give me the right comparison.
An example is if
string1 = Hi "insert name" here.
string2 = Hi "insert name" here.
I want to use string1.Equals(string2), but it tells me the strings are not equal. How do I do this? Please help.
PS. I have no control over what the strings will look like, as they are dynamic variables, so I can't just add escape sequences to them.
string s1 = "Hi \"insert name\" here.";
string s2 = "Hi \"insert name\" here.";
Console.WriteLine((s1 == s2).ToString()); //True
I have no problem ...
.NET will not ignore string values with double quotes when doing comparisons. I think your analysis of what is happening is flawed. For example, given these values:
var string1 = "This contains a \"quoted value\"";
var string2 = "This contains a \"quoted value\"";
var string3 = "This contains a \"different value\"";
string1.Equals(string2) will equal true, and string2.Equals(string3) will equal false.
Here are some potential reasons why you're not seeing an expected result when comparing:
One string may contain different quote characters than another. For example, "this", and “this” are completely different strings.
Your comparison may be failing due to other content not matching. For example, one string may have trailing spaces, and the other may not.
You may be comparing two objects instead of two strings. If the variables are typed as object, the == operator compares references (whether they are the same object), not the string contents, so the wrong comparison may be happening.
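For the first two causes, a sketch like the following can help: it normalizes typographic double quotes to the plain '"' and trims surrounding whitespace before an ordinal comparison. It also includes a dump of UTF-16 code units that makes invisible differences visible. Which characters to normalize is an assumption about your data, and all names here are hypothetical.

```csharp
using System;
using System.Linq;

public static class QuoteTolerantCompare
{
    // Assumed normalization: curly double quotes (U+201C, U+201D) become '"', outer whitespace is trimmed.
    public static string Canonicalize(string s) =>
        s.Replace('\u201C', '"').Replace('\u201D', '"').Trim();

    public static bool EqualsIgnoringQuoteStyle(string a, string b) =>
        string.Equals(Canonicalize(a), Canonicalize(b), StringComparison.Ordinal);

    // Diagnostic: each UTF-16 code unit in hex, e.g. "0041 201C" for "A“".
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
}
```

Printing CodeUnits for both strings side by side is often the quickest way to see which character actually differs.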
There are many more potential causes for your issue, but it's not because string comparison ignores double quotes. The more details you provide in your question, the easier it is for us to narrow down what you're seeing.

string IndexOf and Replace

I ran into this problem today and wonder whether anyone knows why this test can fail (depending on culture). The aim is to check whether the test text contains two adjacent spaces, which it does according to string.IndexOf, even though I told Replace to replace every occurrence of two adjacent spaces. After some testing, it seems \xAD (the soft hyphen) is somehow causing this issue.
using NUnit.Framework;

public class ReplaceIndexOfSymmetryTest
{
    [Test]
    public void IndexOfShouldNotFindReplacedString()
    {
        string testText = "\x61\x20\xAD\x20\x62";
        const string TWO_SPACES = "  ";
        const string ONE_SPACE = " ";
        string result = testText.Replace(TWO_SPACES, ONE_SPACE);
        Assert.IsTrue(result.IndexOf(TWO_SPACES) < 0);
    }
}
Yes, I've come across the same thing before (although with different characters). Basically IndexOf will take various aspects of "special" Unicode characters into account when finding matches, whereas Replace just treats the strings as a sequence of code points.
From the IndexOf docs:
This method performs a word (case-sensitive and culture-sensitive) search using the current culture. The search begins at the first character position of this instance and continues until the last character position.
... and from Replace:
This method performs an ordinal (case-sensitive and culture-insensitive) search to find oldValue.
You could use the overload of IndexOf which takes a StringComparison, and force it to perform an ordinal comparison though.
Like Jon said, use StringComparison.Ordinal to get it right.
Assert.IsTrue(result.IndexOf(TWO_SPACES, StringComparison.Ordinal) < 0);
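A standalone illustration of the difference, outside NUnit (names are hypothetical). The sample text is "a", space, soft hyphen U+00AD, space, "b", so no two spaces are adjacent as code units. Ordinal IndexOf and Replace therefore find nothing. A culture-sensitive IndexOf may skip the ignorable soft hyphen and report a match starting at the first space.

```csharp
using System;

public static class SoftHyphenDemo
{
    public const string TwoSpaces = "  ";

    // Same string as "\x61\x20\xAD\x20\x62" in the question.
    public static readonly string Sample = "a \u00AD b";

    // Code-unit comparison: the soft hyphen sits between the spaces, so there is no match (-1).
    public static int OrdinalIndex() => Sample.IndexOf(TwoSpaces, StringComparison.Ordinal);

    // Culture-sensitive comparison: the soft hyphen may be treated as ignorable.
    public static int CultureIndex() => Sample.IndexOf(TwoSpaces, StringComparison.CurrentCulture);
}
```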
