Ignoring a line when comparing two strings - C#

I need to compare two strings, each holding an HTML document of about 300 lines. They should be identical except for one line that contains a date in the format dd/MM/yyyy hh:mm:ss, so I need to ignore that line.
The problem is that one HTML document is a static file that I use as the baseline for the comparison, and the other is fetched at runtime from a URL, so the line with the date will always differ.
The line has no identifying attribute such as id or name, and neither do its parent elements. What options do I have for ignoring this line in the comparison?

Remove the date time with a Regex.Replace, then compare the strings.
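A minimal sketch of the Regex.Replace approach, assuming the date always appears as dd/MM/yyyy hh:mm:ss with two-digit day, month, hour, minute and second fields. The class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlDateInsensitiveCompare
{
    // Matches dd/MM/yyyy hh:mm:ss, e.g. 21/07/2010 13:50:07.
    // Assumption: this is the only date format that varies between the two documents.
    private static readonly Regex DatePattern =
        new Regex(@"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}");

    // Hypothetical helper: removes every date/time from both strings,
    // then compares what is left character by character.
    public static bool EqualIgnoringDates(string expectedHtml, string actualHtml)
    {
        string expected = DatePattern.Replace(expectedHtml, string.Empty);
        string actual = DatePattern.Replace(actualHtml, string.Empty);
        return string.Equals(expected, actual, StringComparison.Ordinal);
    }
}
```

Note that this ignores every date in that format, not only the one on the problem line, so it is only suitable if no other date in the page needs to be checked.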

You can try to find the position in the string of the character sequence that starts the date line.
Suppose your date line starts with "mydate".
Get the first part of the string from index 0 to indexOf("mydate") from the two files and compare them (if you do not find "mydate", then something is really different, the date line was not found).
Then get the second part of the string from the index of what should be directly after the date line from the two files and compare them.
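The steps above could look roughly like this. It is only a sketch: it assumes the date line ends at the next '\n' and that the marker text ("mydate" in the description above) occurs nowhere earlier in the document. The helper names are hypothetical:

```csharp
using System;

public static class MarkerSplitCompare
{
    // Hypothetical helper. 'marker' is the text that starts the date line.
    public static bool EqualOutsideDateLine(string s1, string s2, string marker)
    {
        int start1 = s1.IndexOf(marker, StringComparison.Ordinal);
        int start2 = s2.IndexOf(marker, StringComparison.Ordinal);
        if (start1 < 0 || start2 < 0)
            return false; // the date line was not found in one of the strings

        // First part: everything before the marker.
        if (!string.Equals(s1.Substring(0, start1), s2.Substring(0, start2), StringComparison.Ordinal))
            return false;

        // Second part: everything after the end of the date line.
        return string.Equals(Tail(s1, start1), Tail(s2, start2), StringComparison.Ordinal);
    }

    // Returns the text after the first '\n' at or after 'from',
    // or an empty string when the date line is the last line.
    private static string Tail(string s, int from)
    {
        int end = s.IndexOf('\n', from);
        return end < 0 ? string.Empty : s.Substring(end + 1);
    }
}
```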

You can remove the date/times from both HTML strings using a regex, then compare them.

A simple solution consists of identifying the characters of the static HTML (s1) that are not identical to the HTML (s2) fetched from the URL.
A prerequisite is to update the static HTML s1 by replacing the DateTime with a placeholder string like "##.##.####.##.##.##". The placeholder must have the same length as the DateTime (19 characters for dd/MM/yyyy hh:mm:ss), and none of its characters may match any character (including separators) of the DateTime in s2.
string originalDateTimeString = "##.##.####.##.##.##";
// The strings can only match if they have the same length
bool compareok = s1.Length == s2.Length;
// Check every character; when two characters differ, append the one from s1 to diff1
string diff1 = "";
int lastDiffIndex = -1;
for (int i = 0; i < s1.Length && compareok; i++)
{
    if (s1[i] != s2[i])
    {
        // Check that the differences are consecutive
        compareok = lastDiffIndex == -1 || lastDiffIndex == i - 1;
        diff1 += s1[i];
        lastDiffIndex = i;
    }
}
// The comparison succeeds if the differing characters spell out the placeholder
compareok = compareok && diff1 == originalDateTimeString;

Related

How do you find a delimited/isolated substring with string.contains?

I am trying to parse out and identify some values from strings that I have in a list.
I am using string.Contains to identify the value I'm looking for, but I am getting hits even if the value is surrounded by other text. How can I make sure I only get a hit if the value is isolated?
Example parse:
Looking for value = "302"
string sale =
"199708. (30), italiano, delim fabricata modella, serialNumber302. tnr F18529302E.";
var result = sale.ToLower().Contains("302");
In this example I get a hit for "serialNumber302" and "F18529302E", which is incorrect in this context: I only want a hit when "302" is isolated, as in "dontfind302 shouldfind 302".
Any ideas on how to do this?
If you try Regex, you can define a word boundary using \b:
string sale =
"199708. (30), italiano, delim fabricata modella, serialNumber302. tnr F18529302E.";
bool result = Regex.IsMatch(sale, @"\b302\b"); // false
sale = "A string with 302 isolated";
result = Regex.IsMatch(sale, @"\b302\b"); // true
So 302 will only be found if it is at the start of the string, at the end of the string, or if it is surrounded by non-word characters i.e. not a-z A-Z 0-9 or _
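If the value to look for comes from a variable rather than a literal, the pattern can be built at runtime. The following is a sketch under the assumption that the value starts and ends with a word character (otherwise \b would not sit where you expect); ContainsIsolated is a made-up name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsolatedValueSearch
{
    // Hypothetical helper: true when 'value' occurs in 'text' as a whole word.
    // Regex.Escape keeps characters such as '.' or '+' in 'value' from being
    // interpreted as regex syntax.
    public static bool ContainsIsolated(string text, string value)
    {
        string pattern = @"\b" + Regex.Escape(value) + @"\b";
        return Regex.IsMatch(text, pattern);
    }
}
```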
EDIT: From the comments I realised that it wasn't clear whether or not "serialNum302" should get a hit. I assumed so in this answer.
I see a few easy ways you could do this:
1) If the input is always a number as in the example, one option would be to only search for substrings not surrounded by more numbers, by examining all the results of an initial search and comparing their neighboring characters against the string "0123456789". I really don't think this is the best option though, because sooner or later it's going to break when it misinterprets one of the other bits of data.
2) If the string sale always has the serial number in the format "serialNumber[Num]", instead of just looking for Num, look for "serialNumber" + Num, as this is less likely to be messed up with the other data.
3) From your string, it looks like you have a standardized format that's being introduced to the system. In this case, parse it in a standardized way, e.g. by splitting it into substrings at the commas, then parsing each substring differently as it requires.
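As a rough illustration of option 3, the string can be split into tokens and each token compared as a whole. The separator set below is a guess based on the single sample string, not a known format, and HasToken is a hypothetical name:

```csharp
using System;
using System.Linq;

public static class TokenSearch
{
    // Assumed separators, inferred from the one example in the question.
    private static readonly char[] Separators = { ',', ' ', '.', '(', ')' };

    // True when one of the tokens equals 'value' exactly (ignoring case).
    public static bool HasToken(string sale, string value)
    {
        return sale
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Any(token => string.Equals(token, value, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the sample string, the tokens include "serialNumber302" and "F18529302E" as whole units, so a search for "302" does not hit them.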

Converting string array to int

I'm having a weird problem, trying to take a string from a string array
and convert it to an integer.
Take a look at this code snippet:
string date = "‎21/‎07/‎2010 ‏‎13:50";
var date1 = date.Split(' ')[0];
string[] dateArray = date1.Split('/');
string s = "21";
string t1 = dateArray[0];
bool e = string.Compare(s, t1) == 0; //TRUE
int good = Convert.ToInt32(s); //WORKING!
int bad = Convert.ToInt32(t1); //Format exception - Input string was not in a correct format.
Can someone please explain why the conversion with s works, while with t1 fails?
Your string is full of hidden characters, causing it to break. There are four U+200E (left-to-right mark) and one U+200F (right-to-left mark) characters in it.
Here's a clean string to try on:
string date = "21/07/2010 13:50";
Why do you use string.Compare(s, t1) == 0 to test if the strings are equal? This overload of Compare does a culture-sensitive comparison, and a result of 0 does not mean that the strings are identical. To check whether the strings consist of identical sequences of char values, use ordinal comparison. Ordinal comparison can be done, for example, with
bool e = s == t1;
In your case, the strings have different Lengths, and they already differ at the first index: s[0] != t1[0].
Your string date contains left-to-right marks and right-to-left marks. This may happen when you copy and paste from Arabic text (or text in another right-to-left language).
To remove these characters from the ends of your string (not from the middle), you can use something like
t1 = t1.Trim('\u200E', '\u200F');
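If the marks can also appear in the middle of a string, one option is to drop every character in the Unicode "Format" category before parsing. This is a sketch, not a general sanitizer: that category covers more than U+200E and U+200F (for example the soft hyphen), which may or may not be what you want. StripFormatChars is a made-up name:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class InvisibleCharCleaner
{
    // Removes every char whose Unicode category is Format (Cf),
    // which includes U+200E (left-to-right mark) and U+200F (right-to-left mark).
    public static string StripFormatChars(string input)
    {
        return new string(input
            .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format)
            .ToArray());
    }
}
```

For example, StripFormatChars applied to the original date string gives a string whose first '/'-separated part can be passed to Convert.ToInt32.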

C# Regex.Match to decimal

I have a string "-4.00 %" which I need to convert to a decimal so that I can declare it as a variable and use it later. The string itself is found in string[] rows. My code is as follows:
foreach (string[] row in rows)
{
string row1 = row[0].ToString();
Match rownum = Regex.Match(row1.ToString(), @"\-?\d+\.+?\d+[^%]");
string act = Convert.ToString(rownum); //wouldn't convert match to decimal
decimal actual = Convert.ToDecimal(act);
textBox1.Text = (actual.ToString());
}
This results in "Input string was not in a correct format." Any ideas?
Thanks.
I see two things happening here that could contribute.
You are treating the Regex Match as though it were a string, but Regex.Match returns a Match object.
Rather than converting rownum to a string, look at rownum.Groups[0].Value, which holds the matched text.
Secondly, you have no parenthesised group to capture. @"(\-?\d+\.+?\d+)%" will create a capture group around the number and leave the % outside it. This may not matter, I don't know exactly how C# behaves in this circumstance, but if you start stretching your regexes you will want to use bracketed capture groups, so you might as well start as you mean to go on.
Here's a modified version of your code that changes the regex to use a capturing group and explicitly look for a %. As a consequence, this also simplifies the parsing to decimal (no longer need an intermediary string):
EDIT : check rownum.Success as per executor's suggestion in comments
string[] rows = new[] { "abc -4.01%", "def 6.45%", "monkey" };
foreach (string row in rows)
{
    // regex captures number but not %
    Match rownum = Regex.Match(row.ToString(), @"(\-?\d+\.+?\d+)%");
    // check for match
    if (!rownum.Success) continue;
    // get value of first (and only) capture
    string capture = rownum.Groups[1].Value;
    // convert to decimal
    decimal actual = decimal.Parse(capture);
    // TODO: do something with actual
}
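The string in the question is "-4.00 %", with a space before the percent sign, which a pattern ending directly in % would not match. A variant that tolerates optional white space and parses with the invariant culture (so '.' is always the decimal separator) might look like this. It is a sketch; PercentParser and TryParsePercent are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PercentParser
{
    // Optional minus sign, digits, optional fractional part,
    // then optional white space and a literal '%'.
    private static readonly Regex PercentPattern =
        new Regex(@"(-?\d+(?:\.\d+)?)\s*%");

    // Returns false (and sets value to 0) when no percentage is found
    // or the captured text cannot be parsed.
    public static bool TryParsePercent(string text, out decimal value)
    {
        value = 0m;
        Match match = PercentPattern.Match(text);
        return match.Success &&
               decimal.TryParse(match.Groups[1].Value,
                                NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                                CultureInfo.InvariantCulture,
                                out value);
    }
}
```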
If you're going to use the Match class to handle this, you have to access the Match.Groups property to get the collection of captured groups; Groups[0] is the entire match. If you can guarantee that there is always exactly one match, you can get its text with:
string act = rownum.Groups[0].Value;
Otherwise you'll need to iterate over the matches as in the MSDN documentation.

Thousand separated value to integer

I want to convert a thousands-separated value to an integer, but I am getting an exception.
double d = Convert.ToDouble("100,100,100");
works fine and gives d = 100100100, but
int n = Convert.ToInt32("100,100,100");
throws a format exception:
Input string was not in a correct format
Why?
Try this:
int i = Int32.Parse("100,100,100", NumberStyles.AllowThousands);
Note that the Parse method will throw an exception on an invalid string, so you might want to check out the TryParse method as well:
string s = ...;
int i;
if (Int32.TryParse(s, NumberStyles.AllowThousands, CultureInfo.InvariantCulture, out i))
{
// if you are here, you were able to parse the string
}
What Convert.ToInt32 is actually calling in your example is Int32.Parse.
The Int32.Parse(string) method allows only three kinds of input: white space, a sign, and digits, in the configuration [ws][sign]digits[ws] (bracketed elements are optional).
Since yours contained commas, it threw an exception.
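To keep the default [ws][sign]digits[ws] behaviour and additionally accept group separators, the styles can be combined. The sketch below assumes ',' is the group separator, which is why it pins the invariant culture; TryParseGrouped is a made-up name:

```csharp
using System;
using System.Globalization;

public static class ThousandsParser
{
    // NumberStyles.Integer covers leading/trailing white space and a leading sign;
    // AllowThousands adds group separators on top of that.
    private const NumberStyles Style = NumberStyles.Integer | NumberStyles.AllowThousands;

    // Parses text such as "100,100,100" or "-1,234" without throwing on bad input.
    public static bool TryParseGrouped(string text, out int value)
    {
        return int.TryParse(text, Style, CultureInfo.InvariantCulture, out value);
    }
}
```

Passing NumberStyles.AllowThousands on its own would reject a leading minus sign, which is why it is combined with NumberStyles.Integer here.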
Because you're supposed to pass a string containing a plain integer (optionally preceded by a +/- sign), with no thousands separator. You have to remove the separator before passing the string to the ToInt32 routine.
You can't have separators, just numbers 0 thru 9, and an optional sign.
http://msdn.microsoft.com/en-us/library/sf1aw27b.aspx

Copy first few strings separated by a symbol in c#

I have a string consisting of integer digits followed by "|" followed by some binary data.
Example:
321654|<some binary data here>
How do I get the number at the front of the string with the lowest possible resource usage?
I did get the index of the symbol:
string s = "321654654|llasdkjjkwerklsdmv";
int d = s.IndexOf("|");
string n = s.Substring(d + 1).Trim();//did try other trim but unsuccessful
What should I do next? I tried CopyTo, but CopyTo only supports char[].
Assuming you only want the numbers before the pipe, you can do:
string n = s.Substring(0, d);
(Make it d + 1 if you want the pipe character to also be included.)
I might be wrong, but I think you are under the impression that the parameter to string.Substring(int) represents "length." It does not; it represents the "start-index" of the desired substring, taken up to the end of the string.
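To make the difference concrete: Substring(start) returns everything from start to the end, while Substring(start, length) returns length characters beginning at start. A small sketch that also guards against a missing pipe (IndexOf returns -1 in that case, and Substring(0, -1) would throw); TryGetPrefix is a hypothetical name:

```csharp
using System;

public static class PrefixExtractor
{
    // Puts the text before the first '|' into 'prefix'.
    // Returns false and an empty prefix when the string contains no '|'.
    public static bool TryGetPrefix(string s, out string prefix)
    {
        int d = s.IndexOf('|');
        prefix = d < 0 ? string.Empty : s.Substring(0, d); // start index 0, length d
        return d >= 0;
    }
}
```

For the sample string this gives "321654654", whereas s.Substring(d + 1) gives the part after the pipe.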
s.Substring(0,d);
You can use String.Split(); here is a reference: http://msdn.microsoft.com/en-us/library/ms228388%28VS.80%29.aspx
string n = s.Split('|')[0]; // this gets you the numbers
string o = s.Split('|')[1]; // this gets you the letters
