I am working on a project which reads in 2 CSV files:
var myFullCsv = ReadFile(myFullCsvFilePath);
var masterCsv = ReadFile(csvFilePath);
and then creates a new var containing the extra lines that exist in myFullCsv but not in masterCsv. The code is great because of its simplicity:
var extraFilesCsv = myFullCsv.Except(masterCsv);
The csv files read in contain data like this:
c01.jpg,95182,24f77a1e,\Folder1\FolderA\,
c02.jpg,131088,c17b1f13,\Folder1\FolderA\,
c03.jpg,129485,ddc964ec,\Folder1\FolderA\,
c04.jpg,100999,930ee633,\Folder1\FolderA\,
c05.jpg,101638,b89f1f28,\Folder1\FolderA\,
However, I have just found a situation where the case of some characters in each file does not match. For example (JPG in caps):
c01.JPG,95182,24f77a1e,\Folder1\FolderA\,
If the data is like this then it is not included in extraFilesCsv but I need it to be. Can anybody tell me how I can make this code insensitive to the case of the text?
Edit: Sorry, I forgot that ReadFile was not a standard command. Here is the code:
public static IEnumerable<string> ReadFile(string path)
{
    string line;
    using (var reader = File.OpenText(path))
        while ((line = reader.ReadLine()) != null)
            yield return line;
}
I'm assuming you've read in both csv files and have a collection of strings representing each file.
You can pass an IEqualityComparer<T> in the call to Except(), which determines how the elements of the two collections are compared.
You can create your own comparer or, assuming both collections are of strings, try specifying an existing one that ignores case:
var extraFilesCsv
= myFullCsv.Except(masterCsv, StringComparer.CurrentCultureIgnoreCase);
By default, if you don't specify a comparer, it uses EqualityComparer<TElement>.Default, which differs based on the class type you're comparing.
For strings, it first does a straight-up a==b comparison by default, which is case-sensitive. (The exact implementation on the string class is a little more complicated, but it's probably unnecessary to post it here.)
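As a minimal sketch of the approach above, the following uses inline sample lines in place of the two files. The class name ExceptDemo, the method ExtraLines and the sample data are made up for illustration. It uses StringComparer.OrdinalIgnoreCase instead of the culture-aware comparer, on the assumption that file names and hashes should be compared without culture rules; swap in CurrentCultureIgnoreCase if that assumption does not hold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns the lines of 'full' that have no case-insensitive match in 'master'.
    public static List<string> ExtraLines(IEnumerable<string> full, IEnumerable<string> master)
    {
        return full.Except(master, StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the two CSV files.
        var master = new[] { @"c01.jpg,95182,24f77a1e,\Folder1\FolderA\," };
        var full = new[]
        {
            @"c01.JPG,95182,24f77a1e,\Folder1\FolderA\,",
            @"c02.jpg,131088,c17b1f13,\Folder1\FolderA\,"
        };

        // Only the c02 line is written, because c01.JPG matches c01.jpg when case is ignored.
        foreach (var line in ExtraLines(full, master))
            Console.WriteLine(line);
    }
}
```

Keep in mind that Except is a set operation, so duplicate lines in myFullCsv appear only once in the result.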
Related
At line 161, I want to insert my text into the parameter t, but it doesn't change when I debug it, although the parameter tmp has already changed.
I want this Text in the UI to change when my parameter t changes.
With respect to your specific issue, Insert is defined as:
public string Insert (int startIndex, string value);
and returns a new string. In C#, strings aren't modified, new strings are created. In this way, they act like a value type, even though they're a reference type. In other words, once a string is created, it is never modified - it's 'immutable'. So, you need to store your newly created string.
In cases like this, I like to use string interpolation, as it gives me a slightly clearer picture of what the final string will look like.
var tmp = System.Text.Encoding.UTF8.GetString ( e.Message );
t.text = $"{tmp}\n{t.text}"; // Note that a newline is represented as \n
Or, if you add the System.Text namespace, you could reduce it down to:
using System.Text;
...
t.text = $"{Encoding.UTF8.GetString ( e.Message )}\n{t.text}";
The string type in c# is immutable, therefore Insert returns a new string instead of modifying the current one.
Do:
t.text = t.text.Insert(0, tmp + "\n");
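To see the immutability described above in isolation, here is a small sketch that uses plain strings instead of the UI text field. The class name InsertDemo and the helper Prepend are illustrative only and are not part of the original code.

```csharp
using System;

public static class InsertDemo
{
    // Prepends 'message' and a newline to 'existing', returning the new string.
    public static string Prepend(string existing, string message)
    {
        return existing.Insert(0, message + "\n");
    }

    public static void Main()
    {
        string text = "old line";

        // The result is discarded here, so 'text' is unchanged.
        text.Insert(0, "ignored\n");
        Console.WriteLine(text);

        // The result is stored here, so 'text' now starts with "new line".
        text = Prepend(text, "new line");
        Console.WriteLine(text);
    }
}
```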
See also
How to modify string contents in C#
I just discovered this nice tool, XmlUnit, which allows me to compare 2 different XML documents and display any discrepancies.
string control = "<a><b attr=\"abc\"></b></a>";
string test = "<a><b attr=\"xyz\"></b></a>";
var myDiff = DiffBuilder.Compare(Input.FromString(control))
    .WithTest(Input.FromString(test))
    .Build();
Assert.IsFalse(myDiff.HasDifferences(), myDiff.ToString());
However, I have found that the myDiff.ToString() only displays the first difference encountered.
Is there a way to display them all?
I just found the solution:
Assert.IsFalse(myDiff.HasDifferences(), string.Join(Environment.NewLine, myDiff.Differences));
I assume that you are using the xmlunit.net library (You didn't say the name of the tool that you found but your example seems to match).
You can search their GitHub repo and find the DiffBuilder class file. If you look at the Build method you will see it returns a Diff object. If you go to the Diff class file you will find that its ToString method looks like this:
public override string ToString() {
    return ToString(formatter);
}
That doesn't tell you a lot, but if you go to the other ToString overload you find this:
public string ToString(IComparisonFormatter formatter) {
    if (!HasDifferences()) {
        return "[identical]";
    }
    return differences.First().Comparison.ToString(formatter);
}
Now we are getting somewhere. We now know that Diff stores its list of differences in a private differences field and why ToString() only returns one difference (The .First() call). If you look through that class you will find that there's a public property called Differences which exposes that field as an IEnumerable. So the way to get all differences is to loop through that property and collect all of them like so.
string control = "<a><b attr=\"abc\" attr2=\"123\"></b></a>";
string test = "<a><b attr=\"xyz\" attr2=\"987\"></b></a>";
var myDiff = DiffBuilder.Compare(Input.FromString(control))
    .WithTest(Input.FromString(test))
    .Build();
var sb = new StringBuilder();
foreach (var dif in myDiff.Differences)
{
    sb.AppendLine(dif.Comparison.ToString());
}
Assert.IsFalse(myDiff.HasDifferences(), sb.ToString());
Note that I got the syntax for formatting the difference from the Diff class's ToString code. Also notice that I added a second attribute to your examples to demonstrate that this really is showing all the differences.
I'm having some issues comparing a string that is received from Request.QueryString with a line from a .resx file.
The code assigns Request.QueryString to a variable named q, then passes it to a function that checks whether a line contains the value of q:
while ((line = filehtml.ReadLine()) != null)
{
    if (line.ToLower().Contains(q.ToLower().ToString()))
        HttpContext.Current.Response.Write("<b>Content found!</b>");
    else
        HttpContext.Current.Response.Write("<b>Content not found!</b>");
}
As it's a search in static files, special characters must be considered, and searching for Iberê, for example, isn't returning true because .Contains, .IndexOf or .LastIndexOf is comparing iberê, which comes from q, with iberê, which comes from the line.
Consider that I already tried to use ResXResourceReader (which can't be found by Visual Studio), ResourceReader and ResourceManager (these I couldn't set a static file by the path to be read).
EDIT:
Problem solved. There was an instance of SpecialChars overwriting the value of q with the EntitiesEncode method.
The problem is that the ê character is escaped in both strings. So if you did something like this, it wouldn't work:
string line = "sample iberê text";
string q = "iberê";
if (line.Contains(q)) {
    // do something
}
You need to unescape the strings. Use HttpUtility in the System.Web assembly. This will work:
line = System.Web.HttpUtility.HtmlDecode(line);
q = System.Web.HttpUtility.HtmlDecode(q);
if (line.Contains(q)) {
    // do something
}
As suggested by @r3bel below, if you're using .NET 4 or above you can also use System.Net.WebUtility.HtmlDecode, so you don't need an extra assembly reference.
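A minimal sketch of the WebUtility variant, assuming the query arrives HTML-encoded while the line from the file does not. The class name DecodeDemo, the helper ContainsDecoded and the sample strings are made up for illustration; the case-insensitive IndexOf is my substitution for the ToLower calls in the question.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    // Decodes HTML entities in both strings, then does a case-insensitive search.
    public static bool ContainsDecoded(string line, string query)
    {
        string decodedLine = WebUtility.HtmlDecode(line);
        string decodedQuery = WebUtility.HtmlDecode(query);
        return decodedLine.IndexOf(decodedQuery, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // "&ecirc;" is the HTML entity for ê (written as \u00EA in the first literal).
        Console.WriteLine(ContainsDecoded("sample Iber\u00EA text", "iber&ecirc;"));
    }
}
```

Decoding both sides means it does not matter which of the two strings was encoded.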
In C++, we can define a custom locale that enables stream object to ignore non-digits in the file, and reads only the integers.
Can we do something similar in C#? How can we efficiently read only integers from a text file? Do C# stream objects use a locale? If yes, can we define a custom locale to use with a stream object so that it ignores unwanted characters while reading the file?
Here is one example in C++ which efficiently counts frequency of words in a text file:
Elegant ways to count the frequency of words in a file
My proposal:
public void ReadJustNumbers()
{
    // Matches each run of one or more digits
    Regex r = new Regex(@"\d+");
    using (var sr = new StreamReader("xxx"))
    {
        string line;
        while (null != (line = sr.ReadLine()))
        {
            foreach (Match m in r.Matches(line))
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}
where xxx is the file name. Obviously, you will use the matched digits in a more elegant way than dumping them to the console ;)
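One possible way to consume the matches as actual integers is sketched below. It reads from any TextReader so that it can be tried without a file; the class name NumberReader and the method ReadIntegers are hypothetical. Runs of digits that do not fit in an int are skipped, and signs are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class NumberReader
{
    private static readonly Regex Digits = new Regex(@"\d+");

    // Lazily yields every run of digits in 'reader' that parses as an int.
    public static IEnumerable<int> ReadIntegers(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in Digits.Matches(line))
            {
                int value;
                if (int.TryParse(m.Value, out value))
                    yield return value;
            }
        }
    }

    public static void Main()
    {
        // A StringReader stands in for the file; use a StreamReader for real input.
        using (var reader = new StringReader("a1b22\nfoo 333 bar"))
        {
            foreach (int n in ReadIntegers(reader))
                Console.WriteLine(n);
        }
    }
}
```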
I would like to know, if I have an English dictionary in a text file, what the best way is to check whether a given string is a proper and correct English word. My dictionary contains about 100000 English words and I have to check on average 60000 words in one go. I am just looking for the most efficient way. Also, should I store all the strings first, or just process them as they are generated?
Thanx
100k is not too great a number, so you can just pop everything in a HashSet<string>.
HashSet lookup is hash-based, so it will be lightning fast.
An example of how this might look in code:
string[] lines = File.ReadAllLines(@"C:\MyDictionary.txt");
HashSet<string> myDictionary = new HashSet<string>();
foreach (string line in lines)
{
    myDictionary.Add(line);
}
string word = "aardvark";
if (myDictionary.Contains(word))
{
    Console.WriteLine("There is an aardvark");
}
else
{
    Console.WriteLine("The aardvark is a lie");
}
You should probably use HashSet<string> if you're using .NET 3.5 or higher.
Just load the dictionary of valid words into a HashSet<string> and then either use Contains on each candidate string, or use some of the set operators to find all words which aren't valid.
For example:
// There are loads of ways of loading words from a file, of course
var valid = new HashSet<string>(File.ReadAllLines("dictionary.txt"));
var candidates = new HashSet<string>(File.ReadAllLines("candidate.txt"));
var validCandidates = candidates.Intersect(valid);
var invalidCandidates = candidates.Except(valid);
You may also wish to use case-insensitive comparisons or something similar - use the StringComparer static properties to get at appropriate instances of StringComparer which you can pass to the HashSet constructor.
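A short sketch of the case-insensitive variant, using in-memory word lists instead of files. The class name WordCheck, the method BuildDictionary and the sample words are illustrative assumptions, and OrdinalIgnoreCase is just one of the available StringComparer instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCheck
{
    // Builds a set whose lookups ignore case.
    public static HashSet<string> BuildDictionary(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var valid = BuildDictionary(new[] { "apple", "Banana" });
        var candidates = new[] { "APPLE", "banana", "cherry" };

        // Only "cherry" is written, since the other two match when case is ignored.
        foreach (var word in candidates.Where(w => !valid.Contains(w)))
            Console.WriteLine(word);
    }
}
```

The comparer is fixed when the set is constructed, so every later Contains call uses it automatically.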
If you're using .NET 2, you can use a Dictionary<string, whatever> as a poor-man's set - basically use whatever you like as the value, and just check for keys.