I want to know how Microsoft write algorithm for string comparison.
string.equal and string.compare
Do they compare character by character like this:
int matched = 1;
for (int i = 0; i < str1.Length; i++)
{
if (str1[i] == str2[i])
{
matched++;
}
else
{
break;
}
}
if (matched == str1.Length) return true;
Or match all at once
if (str1[0] == str2[0] && str1[1] == str2[1] && str1[2] == str2[2]) return true;
I trying pressing F12 on the string.equal function but it got me to the function declaration not the actual code. Thanks
After Thilo mentioned to look at the source i was able to find this... this is how Microsoft wrote it.
public static bool Equals(String a, String b) {
if ((Object)a==(Object)b) {
return true;
}
if ((Object)a==null || (Object)b==null) {
return false;
}
if (a.Length != b.Length)
return false;
return EqualsHelper(a, b);
}
But this raise a question whether is faster by checking character by character or doing a complete match?
Looking at the source (copied below):
null check
reference identity
different length => not equal
go over the binary encoding of the characters in a bit of an unrolled loop
this raise a question whether is faster by checking character by character or doing a complete match
I don't understand the question. You cannot do a "complete match" without checking each of the characters. What you can do is bail out as soon as you find a mismatch. That reduces runtime a bit, but does not change the fact that it is O(n).
// Determines whether two strings match.
[Pure]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public bool Equals(String value) {
if (this == null) //this is necessary to guard against reverse-pinvokes and
throw new NullReferenceException(); //other callers who do not use the callvirt instruction
if (value == null)
return false;
if (Object.ReferenceEquals(this, value))
return true;
if (this.Length != value.Length)
return false;
return EqualsHelper(this, value);
}
[System.Security.SecuritySafeCritical] // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// unroll the loop
#if AMD64
// for AMD64 bit platform we unroll by 12 and
// check 3 qword at a time. This is less code
// than the 32 bit case and is shorter
// pathlength
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
Related
I'm trying to know if a string contains a length between 5 and 10 and at the same time, 7-10 letters are in upper case. The idea is to detect if a message sent by a user is 70%-100% capped.
This is what I have tried so far:
bool IsMessageUpper(string input)
{
if (input.Length.Equals(5 <= 10) && (input.Take(7).All(c => char.IsLetter(c) && char.IsUpper(c))))
{
return true;
}
else
{
return false;
}
}
You can rewrite your method in this way
bool IsMessageUpper(string input)
{
int x = input.Length;
return x>=7 && x<= 10 && input.Count(char.IsUpper) >= 7;
}
You can also add some safety checks to handle undesidered inputs
bool IsMessageUpper(string input)
{
int x = (input ?? "").Length;
return x>=7 && x<= 10 && input.Count(char.IsUpper) >= 7;
}
I need to write a function which verifies parentheses are balanced in a string. Each open parentheses should have a corresponding close parentheses and they should correspond correctly.
For example, the function should return true for the following strings:
(if (any? x) sum (/1 x))
I said (it's not (yet) complete). (she didn't listen)
The function should return false for the following strings:
:-)
())(
OPTIONAL BONUS
Implement the solution as a recursive function with no mutation / side-effects.
Can you please help me out in writing using C# because I'm new to .NET technologies.
Thanks.
Here's what I've tried so far. It works for sending parameters as open and close brackets, but I'm confused with just passing a string... and also I should not use stack.
private static bool Balanced(string input, string openParenthesis, string closedParenthesis)
{
try
{
if (input.Length > 0)
{
//Obtain first character
string firstString = input.Substring(0, 1);
//Check if it is open parenthesis
//If it is open parenthesis push it to stack
//If it is closed parenthesis pop it
if (firstString == openParenthesis)
stack.Push(firstString);
else if (firstString == closedParenthesis)
stack.Pop();
//In next iteration, chop off first string so that it can iterate recursively through rest of the string
input = input.Substring(1, input.Length - 1);
Balanced(input, openParenthesis, closedParenthesis); //this call makes the function recursive
}
if (stack.Count == 0 && !exception)
isBalanced = true;
}
catch (Exception ex)
{
exception = true;
}
return isBalanced;
}
You don't need to use any recursion method for such a simple requirement, just try this simple method and it works like a charm:
public bool AreParenthesesBalanced(string input) {
int k = 0;
for (int i = 0; i < input.Length; i++) {
if (input[i] == '(') k++;
else if (input[i] == ')'){
if(k > 0) k--;
else return false;
}
}
return k == 0;
}
I have used the startIndex and increment with each recursive call
List<string> likeStack = new List<string>();
private static bool Balanced(string input, string openParenthesis, string closedParenthesis , int startIndex)
{
try
{
if (startIndex < input.Length)
{
//Obtain first character
string firstString = input.Substring(startIndex, 1);
//Check if it is open parenthesis
//If it is open parenthesis push it to stack
//If it is closed parenthesis pop it
if (firstString == openParenthesis)
likeStack.Add(firstString);
else if (firstString == closedParenthesis)
likeStack.RemoveAt(likeStack.Count -1);
//In next iteration, chop off first string so that it can iterate recursively through rest of the string
Balanced(input, openParenthesis, closedParenthesis , startIndex + 1); //this call makes the function recursive
}
if (likeStack.Count == 0 && !exception)
isBalanced = true;
}
catch (Exception ex)
{
exception = true;
}
return isBalanced;
}
How's this recursive version?
public static bool Balanced(string s)
{
var ix = -1;
return Balanced(s, false, ref ix);
}
private static bool Balanced(string s, bool inParens, ref int ix)
{
ix++;
while (ix < s.Length)
{
switch (s[ix++])
{
case '(':
if (!Balanced(s, true, ref ix))
return false;
break;
case ')':
return inParens;
}
}
return !inParens;
}
Note: Using .Net 4.0
Consider the following piece of code.
String ad = "FE23658978541236";
String ad2 = "00FABE002563447E".ToLower();
try
{
PhysicalAddress.Parse(ad);
}
catch (Exception)
{
//We dont get here, all went well
}
try
{
PhysicalAddress.Parse(ad2);
}
catch (Exception)
{
//we arrive here for what reason?
}
try
{
//Ok, I do it myself then.
ulong dad2 = ulong.Parse(ad2, System.Globalization.NumberStyles.HexNumber);
byte[] bad2 = BitConverter.GetBytes(dad2);
if (BitConverter.IsLittleEndian)
{
bad2 = bad2.Reverse().ToArray<byte>();
}
PhysicalAddress pa = new PhysicalAddress(bad2);
}
catch (Exception ex)
{
//We don't get here as all went well
}
So an exception is thrown in PhysicalAddress.Parse method when trying to parse an address with lower case. When I look at the source code of .Net its totally clear to me why.
Its because of the following piece of code.
if (value >= 0x30 && value <=0x39){
value -= 0x30;
}
else if (value >= 0x41 && value <= 0x46) {
value -= 0x37;
}
That is found within the Parse method.
public static PhysicalAddress Parse(string address) {
int validCount = 0;
bool hasDashes = false;
byte[] buffer = null;
if(address == null)
{
return PhysicalAddress.None;
}
//has dashes?
if (address.IndexOf('-') >= 0 ){
hasDashes = true;
buffer = new byte[(address.Length+1)/3];
}
else{
if(address.Length % 2 > 0){ //should be even
throw new FormatException(SR.GetString(SR.net_bad_mac_address));
}
buffer = new byte[address.Length/2];
}
int j = 0;
for (int i = 0; i < address.Length; i++ ) {
int value = (int)address[i];
if (value >= 0x30 && value <=0x39){
value -= 0x30;
}
else if (value >= 0x41 && value <= 0x46) {
value -= 0x37;
}
else if (value == (int)'-'){
if (validCount == 2) {
validCount = 0;
continue;
}
else{
throw new FormatException(SR.GetString(SR.net_bad_mac_address));
}
}
else{
throw new FormatException(SR.GetString(SR.net_bad_mac_address));
}
//we had too many characters after the last dash
if(hasDashes && validCount >= 2){
throw new FormatException(SR.GetString(SR.net_bad_mac_address));
}
if (validCount%2 == 0) {
buffer[j] = (byte) (value << 4);
}
else{
buffer[j++] |= (byte) value;
}
validCount++;
}
//we too few characters after the last dash
if(validCount < 2){
throw new FormatException(SR.GetString(SR.net_bad_mac_address));
}
return new PhysicalAddress(buffer);
}
Can this be considered a bug? Or is it so very wrong to use a lower cased hex values in a string? Or is there some convention I am unaware of.
Personally, I consider this programmer unfriendly.
From MSDN:
The address parameter must contain a string that can only consist of
numbers and upper-case letters as hexadecimal digits. Some examples of
string formats that are acceptable are as follows .... Note that an address that contains f0-e1-d2-c3-b4-a5 will fail to parse and throw an exception.
So you could simply do: PhysicalAddress.Parse(ad.ToUpper());
No, it's only a bug if it doesn't do something the documentation states that it does, or it does something the documentation states that it doesn't. The mere fact that it doesn't behave as you expect doesn't make it a bug. You could of course consider it a bad design decision (or, as you put it so eloquently, programmer-unfriendly) but that's not the same thing.
I tend to agree with you there since I like to follow the "be liberal in what you expect, consistent in what you deliver" philosophy and the code could probably be easily fixed with something like:
if (value >= 0x30 && value <=0x39) {
value -= 0x30;
}
else if (value >= 0x41 && value <= 0x46) {
value -= 0x37;
}
else if (value >= 0x61 && value <= 0x66) { // added
value -= 0x57; // added
} // added
else if ...
though, of course, you'd have to change the doco as well, and run vast numbers of tests to ensure you hadn't stuffed things up.
Regarding the doco, it can be found here and the important bit is repeated below (with my bold):
The address parameter must contain a string that can only consist of numbers and upper-case letters as hexadecimal digits. Some examples of string formats that are acceptable are as follows:
001122334455
00-11-22-33-44-55
F0-E1-D2-C3-B4-A5
Note that an address that contains f0-e1-d2-c3-b4-a5 will fail to parse and throw an exception.
my program should print a message on the screen if the formula that the user entered is good for the terms(you can only use digits and letters, u can't start with '(' and like mathematical formula, for each open bracket, has to be a suitable(and in the right place) close bracket.
here some formulas that the program should accepts and prints:
True-
a(aa(a)aaa(aa(a)aa)aa)aaaaa
a(((())))
here some formulas that the program should not accepts and prints:
False-
()()()
)()()(
but the program always prints False
thanks for helping
Heres the code:EDIT
bool IsNumeric(char character)
{
return "0123456789".Contains(character);
// or return Char.IsNumber(character);
}
bool IsLetter(char character)
{
return "ABCDEFGHIJKLMNOPQRSTUVWXWZabcdefghigjklmnopqrstuvwxyz".Contains(character);
}
bool IsRecognized(char character)
{
return IsBracket(character) | IsNumeric(character) | IsLetter(character);
}
public bool IsValidInput(string input)
{
if (String.IsNullOrEmpty(input) || IsBracket(input[0]))
{
return false;
}
var bracketsCounter = 0;
for (var i = 0; i < input.Length; i++)
{
var character = input[i];
if (!IsRecognized(character))
{
return false;
}
if (IsBracket(character))
{
if (character == '(')
bracketsCounter++;
if (character == ')')
bracketsCounter--;
}
}
if (bracketsCounter > 0)
{
return false;
}
return bracketsCounter==0;
}
}
}
Your algorithm is unnecessarily complex - all you need is a loop and a counter.
Check the initial character for ( (you already do that)
Set the counter to zero, and go through each character one by one
If the character is not a letter or a parentheses, return false
If the character is an opening (, increment the counter
If the character is a closing ), decrement the counter; if the counter is less than zero, return false
Return true if the count is zero after the loop has ended; otherwise return false
Is debugging this hard really? This condition:
((!IsNumeric(st[i])) && (st[i] != '(') && (st[i] != ')')&&((st[i]<'a')||(st[i]>'z')||(st[i]<'A')||(st[i]>'Z')))
return false;
is obviously wrong. It returns false for a every time. You don't take into consideration that a is greater than Z.
EDIT:
so how can i make it easier to read? that the only way i figured. do u
have other solution for this problem?
As for that condition block - use smaller methods / functions, for example.
bool IsBracket(char character)
{
return (character == '(' | character == ')');
}
bool IsNumeric(char character)
{
return "0123456789".Contains(character);
// or return Char.IsNumber(character);
}
bool IsLetter(char character)
{
// see why this is NOT prone to fail just because 'a' is greater than 'Z' in C#?
return (character >= 'a' & character <= 'z') |
(character >= 'A' & character <= 'Z');
// or return Regex.IsMatch(character.ToString(), "[a-zA-Z]", RegexOptions.None);
// or return Char.IsLetter(character);
}
// now you can implement:
bool IsRecognized(char character)
{
return IsBracket(character) | IsNumeric(character) | IsLetter(character);
}
and then in your big method you could just safely use:
if (!IsRecognized(st[i]))
return false;
It may look like an overkill for such a trivial example, but it's a better approach in principle, and certainly more readable.
And after that, you could reduce your code to something along the lines of:
bool IsInputValid(string input)
{
if (String.IsNullOrEmpty(input) || IsBracket(input[0]))
{
return false;
}
var bracketsCounter = 0;
for (var i = 0; i < input.Length; i++)
{
var character = input[i];
if (!IsRecognized(character))
{
return false;
}
if (IsBracket(character)) // redundant?
{
if (character == '(') // then what?
if (character == ')') // then what?
}
if (bracketsCounter < what?)
{
what?
}
}
return bracketsCounter == what?;
}
(dasblinkenlight's algorithm)
EDIT 10th April
You got it wrong.
bool IsNumeric(char character)
{
return "0123456789".Contains(character);
// or return Char.IsNumber(character);
}
bool IsLetter(char character)
{
return "ABCDEFGHIJKLMNOPQRSTUVWXWZabcdefghigjklmnopqrstuvwxyz".Contains(character);
}
bool IsRecognized(char character)
{
return IsBracket(character) | IsNumeric(character) | IsLetter(character);
}
public bool IsValidInput(string input)
{
if (String.IsNullOrEmpty(input) || IsBracket(input[0]))
{
return false;
}
var bracketsCounter = 0;
for (var i = 0; i < input.Length; i++)
{
var character = input[i];
if (!IsRecognized(character))
{
return false;
}
if (IsBracket(character))
{
if (character == '(')
bracketsCounter++;
if (character == ')')
bracketsCounter--;
}
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
if (bracketsCounter < 0) // NOT "> 0", and HERE - INSIDE the for loop
{
return false;
}
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
}
return bracketsCounter==0;
}
}
}
By the way, you also made a mistake in your IsLetter method:
...UVWXWZ? Should be UVWXYZ
Please tell me how to validate GUID in .net and it is unique for always?
Guid's are unique 99.99999999999999999999999999999999% of the time.
It depends on what you mean by validate?
Code to determine that a Guid string is in fact a Guid, is as follows:
private static Regex isGuid =
new Regex(#"^(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}$", RegexOptions.Compiled);
internal static bool IsGuid(string candidate, out Guid output)
{
bool isValid = false;
output = Guid.Empty;
if(candidate != null)
{
if (isGuid.IsMatch(candidate))
{
output=new Guid(candidate);
isValid = true;
}
}
return isValid;
}
2^128 is a very, very large number. It is a billion times larger than the number of picoseconds in the life of the universe. Too large by a long shot to ever validate, the answer is doomed to be "42". Which is the point of using them: you don't have to. If you worry about getting duplicates then you worry for the wrong reason. The odds your machine will be destroyed by a meteor impact are considerably larger.
Duck!
Here's a non-Regex answer that should be pretty fast:
public static bool IsHex(this char c)
{
return ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
}
public static bool IsGuid(this string s)
{
// Length of a proper GUID, without any surrounding braces.
const int len_without_braces = 36;
// Delimiter for GUID data parts.
const char delim = '-';
// Delimiter positions.
const int d_0 = 8;
const int d_1 = 13;
const int d_2 = 18;
const int d_3 = 23;
// Before Delimiter positions.
const int bd_0 = 7;
const int bd_1 = 12;
const int bd_2 = 17;
const int bd_3 = 22;
if (s == null)
return false;
if (s.Length != len_without_braces)
return false;
if (s[d_0] != delim ||
s[d_1] != delim ||
s[d_2] != delim ||
s[d_3] != delim)
return false;
for (int i = 0;
i < s.Length;
i = i + (i == bd_0 ||
i == bd_1 ||
i == bd_2 ||
i == bd_3
? 2 : 1))
{
if (!IsHex(s[i])) return false;
}
return true;
}
You cannot validate GUID's uniqueness. You just hope it was generated with a tool that produces unique 16 bytes. As for validation, this simple code might work (assuming you are dealing with GUID's string representation:
bool ValidateGuid(string theGuid)
{
try { Guid aG = new Guid(theGuid); }
catch { return false; }
return true;
}
If you're looking for a way to determine if it's the format of the actual .Net Guid type, take a look at this article. A quick regex does the trick.
this question was already discussed in this post. You may find more interesting details
In .net 4, you can use this extension method
public static class GuidExtensions
{
public static bool IsGuid(this string value)
{
Guid output;
return Guid.TryParse(value, out output);
}
}
i wrote a extension for this
public static bool TryParseGuid(this Guid? guidString)
{
if (guidString != null && guidString != Guid.Empty)
{
if (Guid.TryParse(guidString.ToString(), out _))
{
return true;
}
else
return false;
}
return false;
}