I can't figure out why this bit of code is failing; it seems simple enough.
Code:
string[] ignore = File.ReadAllLines(@"logicfiles\[flag]-[ignore-these-links].txt");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(rawHtml);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
string linkUrl = link.GetAttributeValue("href", string.Empty);
if (!ignore.Any(linkUrl.Contains) && linkUrl.Length < 10 && !linkUrl.StartsWith("/"))
{
DataGridViewLinks.Rows.Add(linkUrl, keywordUsed, "", "", engineUsed);
}
}
The above code does not work: it just adds every URL to the DataGrid. The part that is failing is !ignore.Any(linkUrl.Contains). The ignore array contains strings like facebook, youtube, etc.; if linkUrl does NOT contain any of these strings, it should be added to the DataGrid.
But if I do this:
string[] ignore = File.ReadAllLines(@"logicfiles\[flag]-[ignore-these-links].txt");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(rawHtml);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
string linkUrl = link.GetAttributeValue("href", string.Empty);
if (linkUrl.Length < 10 && !linkUrl.StartsWith("/"))
{
DataGridViewLinks.Rows.Add(linkUrl, keywordUsed, "", "", engineUsed);
}
}
and take that part of the code away, the other two conditions work perfectly, so I know the part of the logic that is not working is !ignore.Any(linkUrl.Contains).
I cannot see why. If someone could point out the issue, it would be appreciated.
Your Contains logic is fine. There may be something wrong with the values that are being passed in from the text file. An upper / lower case issue or similar.
I recommend printing out or otherwise inspecting both the parsed url values, and the filter strings coming in from the text file and ensuring you're comparing what you think you are.
Here is just the logic for the Contains with a set of values showing it working:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
var linkUrls = new List<string>{
"https://youtube.com/2134",
"https://google.com/2134",
"https://microsoft.com/2134"
};
var ignores = new List<string>{
"youtube",
"somethingElse"
};
foreach (var linkUrl in linkUrls)
{
if (!ignores.Any(linkUrl.Contains))
{
Console.WriteLine($"passed filter: {linkUrl}");
}
}
}
}
Output:
passed filter: https://google.com/2134
passed filter: https://microsoft.com/2134
See:
https://dotnetfiddle.net/8wud9O
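If the mismatch does turn out to be casing or stray whitespace in the text file, as suspected above, a minimal sketch of a more forgiving filter could look like the following. The names LinkFilter, CleanIgnoreList and IsIgnored are hypothetical, not part of the original code, and the sketch assumes an ordinal, case-insensitive comparison is what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkFilter
{
    // Hypothetical helper: trims each entry and drops blank lines,
    // because Contains("") is true for every string.
    public static List<string> CleanIgnoreList(IEnumerable<string> rawLines)
    {
        return rawLines
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();
    }

    // True when the URL contains any ignore term, ignoring case.
    public static bool IsIgnored(string linkUrl, IEnumerable<string> ignores)
    {
        return ignores.Any(term =>
            linkUrl.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}

public class Program
{
    public static void Main()
    {
        var ignores = LinkFilter.CleanIgnoreList(new[] { " YouTube ", "", "facebook" });
        foreach (var url in new[] { "https://youtube.com/2134", "https://google.com/2134" })
        {
            Console.WriteLine($"{url} ignored: {LinkFilter.IsIgnored(url, ignores)}");
        }
    }
}
```

In the original loop, the condition would then read !LinkFilter.IsIgnored(linkUrl, ignore) in place of !ignore.Any(linkUrl.Contains).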
I have a folder with a lot of files like this:
2016-01-02-03-abc.txt
2017-01-02-03-defjh.jpg
2018-05-04-03-hij.txt
2022-05-04-03-klmnop.jpg
I need to extract the pattern from each group of filenames.
For example, I need the pattern 01-02-03 from the first two files placed in a list. I also need the pattern 05-04-03 placed in the same list. So, my list will look like this:
01-02-03
05-04-03
Here is what I have so far. I can successfully remove the characters but getting one instance of a pattern back into a list is beyond my pay grade:
public void GetPatternsToList()
{
//Get all filenames with characters removed and place in listbox.
List<string> files = new List<string>(Directory.EnumerateFiles(folderBrowserDialog1.SelectedPath));
foreach (var file in files)
{
var removeallbeforefirstdash = file.Substring(file.IndexOf("-") + 1); // removes everything before the dash in the filename
var finalfile = removeallbeforefirstdash.Substring(0,removeallbeforefirstdash.LastIndexOf("-")); // removes everything after dash in name -- will crash if file without dash is in folder (not sure how to fix this either)
string[] array = finalfile.ToArray(); // I need to do the above with each file in the list and then place it back in an array to display in a listbox
List<string> filesList = array.ToList();
listBox1.DataSource = filesList;
}
}
You could do it this way:
public void GetPatternsToList()
{
var files = Directory.GetFiles(folderBrowserDialog1.SelectedPath);
var patterns = new HashSet<string>();
foreach (var file in files)
{
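// Split only the file name, so dashes in the folder path are ignored.
var splitFileName = Path.GetFileName(file).Split('-').Skip(1).Take(3);
var joinedFileName = string.Join("-", splitFileName);
if (!string.IsNullOrEmpty(joinedFileName))
patterns.Add(joinedFileName);
}
// DataSource requires an IList, so copy the set into a list.
listBox1.DataSource = patterns.ToList();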
}
I used a HashSet<string> in order to avoid adding duplicate patterns to the DataSource.
A few remarks that aren't related to your question, but your code in general:
I would pass the SelectedPath as a string to the method
I would let the method return you the HashSet
If you implement the above, please also name the method accordingly
All of the above is of course optional for you, but would improve your code quality.
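The remarks above could be sketched roughly as follows. The names FilePatterns, ExtractPattern and GetDistinctPatterns are hypothetical, and the sketch assumes the pattern is everything between the first and the last dash of the file name, as in the sample files.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FilePatterns
{
    // Extracts the middle part of one name,
    // e.g. "2016-01-02-03-abc.txt" -> "01-02-03".
    // Returns null when the name has fewer than two dashes.
    public static string ExtractPattern(string fileName)
    {
        int first = fileName.IndexOf('-');
        int last = fileName.LastIndexOf('-');
        if (first < 0 || last <= first)
            return null;
        return fileName.Substring(first + 1, last - first - 1);
    }

    // Takes the folder as a parameter and returns the distinct patterns.
    public static HashSet<string> GetDistinctPatterns(string folderPath)
    {
        var patterns = new HashSet<string>();
        foreach (var file in Directory.GetFiles(folderPath))
        {
            // Work on the file name only, so dashes in the folder path do not interfere.
            var pattern = ExtractPattern(Path.GetFileName(file));
            if (!string.IsNullOrEmpty(pattern))
                patterns.Add(pattern);
        }
        return patterns;
    }
}
```

The form would then call something like listBox1.DataSource = FilePatterns.GetDistinctPatterns(folderBrowserDialog1.SelectedPath).ToList(); (with System.Linq imported), which keeps the UI code separate from the parsing and also avoids the crash on file names without a dash.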
Try this:
public void GetPatternsToList()
{
List<string> files = new List<string>(Directory.EnumerateFiles(folderBrowserDialog1.SelectedPath));
List<string> resultFiles = new List<string>();
foreach (var file in files)
{
var removeallbeforefirstdash = file.Substring(file.IndexOf("-") + 1); // removes everything before the dash in the filename
var finalfile = removeallbeforefirstdash.Substring(0, removeallbeforefirstdash.LastIndexOf("-")); // removes everything after dash in name -- will crash if file without dash is in folder (not sure how to fix this either)
resultFiles.Add(finalfile);
}
listBox1.DataSource = resultFiles.Distinct().ToList();
}
I have a list (in a text file) whose lines pair a field with a description, like this:
field20D.name = Reference
field20[101].name = Sender's Reference
field20[102].name = File Reference
field20[102_STP].name = File Reference
field20[103].name = Sender's Reference
The numbers in [], like 101 and 102, are message types.
How can I write the code so that, when I have a property with any value in that list, I get the equivalent description for it?
Example: when a field has the value "20D", build the string "20D - Reference".
Here's a good class that can do what you stated above.
using System;
using System.Collections.Generic;
namespace TestConsoleProject
{
public class WeirdLineFormatReader
{
public IEnumerable<Tuple<string, string>> ReadLines(IEnumerable<string> lines)
{
foreach (string line in lines)
{
// split each line on the =
string[] strLineArray = line.Split('=');
// get the first and second values of the split line
string field = strLineArray[0];
string value = strLineArray[1];
// remove the first field word
field = field.Substring("field".Length);
// remove the .name portion
field = field.Replace(".name", "");
// remove the surrounding white-space
field = field.Trim();
// remove all white space before/after the description
value = value.Trim();
yield return new Tuple<string, string>(field, value);
}
}
}
}
Here's a quick console project that will use the class to output your format to the console the way you want.
using System;
using System.IO;
namespace TestConsoleProject
{
class Program
{
static void Main(string[] args)
{
var lines = File.ReadLines(args[0]);
var reader = new WeirdLineFormatReader();
var tuples = reader.ReadLines(lines);
foreach (var tuple in tuples)
Console.WriteLine("{0} - {1}", tuple.Item1, tuple.Item2);
Console.ReadKey();
}
}
}
Just for the fun of it, and also because I suspect you are only showing us a couple of the lines in a much larger text file, here's a format for unit testing when you find that you need to add more code to the ReadLines(IEnumerable<string>) method later.
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Linq;
using TestConsoleProject;
namespace UnitTestProject
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestFormatter_WithoutBrackets()
{
// Arrange
var reader = new WeirdLineFormatReader();
string[] lines = {
"field20D.name = Reference"
};
// Act
var tuples = reader.ReadLines(lines).ToList();
// Assert
// Assert.AreEqual takes the expected value first, then the actual value.
Assert.AreEqual("20D", tuples[0].Item1, "Field 20D did not format correctly, Actual:" + tuples[0].Item1);
Assert.AreEqual("Reference", tuples[0].Item2, "Field 20D's description did not format correctly, Actual:" + tuples[0].Item2);
}
[TestMethod]
public void TestFormatter_WithBrackets()
{
// Arrange
var reader = new WeirdLineFormatReader();
string[] lines = {
"field20[103].name = Sender's Reference"
};
// Act
var tuples = reader.ReadLines(lines).ToList();
// Assert
// Assert.AreEqual takes the expected value first, then the actual value.
Assert.AreEqual("20[103]", tuples[0].Item1, "Field 20[103] did not format correctly, Actual:" + tuples[0].Item1);
Assert.AreEqual("Sender's Reference", tuples[0].Item2, "Field 20[103]'s description did not format correctly, Actual:" + tuples[0].Item2);
}
}
}
Using this unit test project you can quickly write new tests for edge cases that you discover. After you modify the ReadLines() method, you can re-run all the unit tests to see if you broke any of the older tests.
Pseudo:
create a dictionary of key value pairs
split on carriage return
split on equals sign (can there be an equals sign in the value?)
regex the first half to get everything between 'field' and '.name', put that in the key
put the split from after the equals sign in the value
now you can reference your dictionary by key:
entry = myDictionary["20D"];
return $"{entry.Key} - {entry.value}";
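The pseudocode above could be turned into C# along these lines. This is only a sketch: FieldDescriptions, Parse and Describe are hypothetical names, and the regular expression assumes every useful line has the form field<key>.name = <description>. Everything after the first such equals sign is kept as the description, so an equals sign inside the value does no harm.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldDescriptions
{
    // Captures the text between "field" and ".name", then the text after '='.
    private static readonly Regex LinePattern =
        new Regex(@"^field(.+?)\.name\s*=\s*(.*)$");

    // Builds the key -> description dictionary; lines that do not match are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            var match = LinePattern.Match(line.Trim());
            if (match.Success)
                map[match.Groups[1].Value] = match.Groups[2].Value.Trim();
        }
        return map;
    }

    // Returns "key - description", or just the key when it is unknown.
    public static string Describe(Dictionary<string, string> map, string key)
    {
        string description;
        return map.TryGetValue(key, out description)
            ? key + " - " + description
            : key;
    }
}
```

Note that indexing a Dictionary returns the value only, not a key/value pair, which is why Describe combines the key it was given with the looked-up description.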
I want to build a list without duplicates from a file that has many lines with an identifier, which is sometimes repeated. When I try using List<string>.Contains, it doesn't work. I think this is because I'm adding objects instead of strings directly.
public List<string> obterRelacaoDeBlocos()
{
List<string> listaDeBlocos = new List<string>();
foreach(string linhas in arquivos.obterLinhasDoArquivo())
{
string[] linhaQuebrada = linhas.Split('|');
string bloco = linhaQuebrada[1].ToString();
if (listaDeBlocos.Contains((string)bloco) != true)
{
listaDeBlocos.Add( bloco + ":" + listaDeBlocos.Contains(bloco).ToString());
}
}
return listaDeBlocos;
}
You're appending ":" + listaDeBlocos.Contains(bloco).ToString() to the string before you add it to the list. That's not going to match when you encounter the same word again, so Contains will return false and the same word will get added again.
I don't see what point it serves to append ":False" to the end of each string in the list anyway (Contains is always false at that point), so just remove that part and it should work.
if (!listaDeBlocos.Contains(bloco))
{
listaDeBlocos.Add(bloco);
}
Since you're only interested in one part of each string, based on how you're splitting, you could rewrite your method using LINQ. This is untested but should work:
public List<string> obterRelacaoDeBlocos()
{
return arquivos.obterLinhasDoArquivo().Select(x => x.Split('|')[1]).Distinct().ToList();
}
I'm new to Roslyn. I'm writing a code fix provider that transforms foreach blocks that iterate through the results of a Select, e.g.
foreach (var item in new int[0].Select(i => i.ToString()))
{
...
}
to
foreach (int i in new int[0])
{
var item = i.ToString();
...
}
To do this, I need to insert a statement at the beginning of the BlockSyntax inside the ForEachStatementSyntax that represents the foreach block. Here is my code for that:
var blockStatement = forEach.Statement as BlockSyntax;
if (blockStatement == null)
{
return document;
}
forEach = forEach.WithStatement(
blockStatement.WithStatements(
blockStatement.Statements.Insert(0, selectorStatement)));
Unfortunately, doing that results in the whitespace being off:
foreach (int i in new int[0])
{
var item = i.ToString();
...
}
I Googled solutions for this. I came across this answer, which recommended using either Formatter.Format or SyntaxNode.NormalizeWhitespace.
I can't use Formatter.Format because that takes a Workspace parameter, and it looks like I don't have access to a Workspace per Roslyn: Current Workspace in Diagnostic with code fix project.
I tried using NormalizeWhitespace() on the syntax root of the document, but that invasively formatted other code not related to the fix. I tried using it on just the ForEachStatementSyntax associated with the foreach block, and then calling syntaxRoot = syntaxRoot.ReplaceNode(oldForEach, newForEach), but that results in the entire foreach block not being properly indented.
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var array = new int[0];
int length = array.Length;
foreach (int i in array)
{
string item = i.ToString();
} }
}
}
So is it possible to simply insert the statement with the correct indentation in the first place, without having to format other code?
Thanks.
You can add the Formatter annotation to the nodes that you want the formatter to run on, using WithAdditionalAnnotations:
blockStatement.Statements.Insert(0, selectorStatement.WithAdditionalAnnotations(Formatter.Annotation))
I have seen several posts giving examples of how to read from text files, and examples on how to make a string 'public' (static or const), but I haven't been able to combine the two inside a 'function' in a way that is making sense to me.
I have a text file called 'MyConfig.txt'.
In that, I have 2 lines.
MyPathOne=C:\TestOne
MyPathTwo=C:\TestTwo
I want to be able to read that file when I start the form, making both MyPathOne and MyPathTwo accessible from anywhere inside the form, using something like this :
ReadConfig("MyConfig.txt");
The way I am trying to do that now, which is not working, is this:
public voice ReadConfig(string txtFile)
{
using (StreamReader sr = new StreamResder(txtFile))
{
string line;
while ((line = sr.ReadLine()) !=null)
{
var dict = File.ReadAllLines(txtFile)
.Select(l => l.Split(new[] { '=' }))
.ToDictionary( s => s[0].Trim(), s => s[1].Trim());
}
public const string MyPath1 = dic["MyPathOne"];
public const string MyPath2 = dic["MyPathTwo"];
}
}
The txt file will probably never grow over 5 or 6 lines, and I am not stuck on using StreamReader or dictionary.
As long as I can access the path variables by name from anywhere, and it doesn't add like 400 lines of code or something, then I am OK with doing whatever would be best, safest, fastest, easiest.
I have read many posts where people say the data should be stored in XML, but I figure that part really doesn't matter so much, because reading the file and getting the variables would be almost the same either way. That aside, I would rather be able to use a plain txt file that somebody (the end user) could edit without having to understand XML. (That means, of course, lots of checks for blank lines, whether the path exists, etc. I am OK with doing that part, I just want to get this part working first.)
I have read about different ways using ReadAllLines into an array, and some say to create a new separate 'class' file (which I don't really understand yet..but working on it). Mainly I want to find a 'stable' way to do this.
(project is using .Net4 and Linq by the way)
Thanks!!
The code you've provided doesn't even compile. Instead, you could try this:
public string MyPath1;
public string MyPath2;
public void ReadConfig(string txtFile)
{
using (StreamReader sr = new StreamReader(txtFile))
{
// Declare the dictionary outside the loop:
var dict = new Dictionary<string, string>();
// (This loop reads every line until EOF or the first blank line.)
string line;
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
// Split each line around '=':
var tmp = line.Split(new[] { '=' },
StringSplitOptions.RemoveEmptyEntries);
// Add the key-value pair to the dictionary:
dict[tmp[0]] = tmp[1];
}
// Assign the values that you need:
MyPath1 = dict["MyPathOne"];
MyPath2 = dict["MyPathTwo"];
}
}
To take into account:
You can't declare public fields inside methods.
You can't initialize const fields at run-time. Instead you provide a constant value for them at compilation time.
Got it. Thanks!
public static string Path1;
public static string Path2;
public static string Path3;
public void ReadConfig(string txtFile)
{
using (StreamReader sr = new StreamReader(txtFile))
{
var dict = new Dictionary<string, string>();
string line;
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
dict = File.ReadAllLines(txtFile)
.Select(l => l.Split(new[] { '=' }))
.ToDictionary( s => s[0].Trim(), s => s[1].Trim());
}
Path1 = dict["PathOne"];
Path2 = dict["PathTwo"];
Path3 = Path1 + @"\Test";
}
}
You need to define the variables outside the function to make them accessible to other functions.
public string MyPath1; // (Put these at the top of the class.)
public string MyPath2;
public void ReadConfig(string txtFile)
{
var dict = File.ReadAllLines(txtFile)
.Select(l => l.Split(new[] { '=' }))
.ToDictionary( s => s[0].Trim(), s => s[1].Trim()); // read the entire file into a dictionary.
MyPath1 = dict["MyPathOne"];
MyPath2 = dict["MyPathTwo"];
}
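Since the file is meant to be edited by end users, a slightly more defensive parse may be worth having. The following is a sketch only: ConfigParser and Parse are hypothetical names, and treating keys as case-insensitive is an assumption, not something the question asks for. It skips blank lines and lines without an equals sign, and splits on the first equals sign only.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigParser
{
    // Parses "Key=Value" lines into a dictionary.
    // Assumption: keys are compared case-insensitively.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq <= 0)
                continue; // blank line, no '=', or '=' in the first column

            string key = line.Substring(0, eq).Trim();
            if (key.Length == 0)
                continue; // only whitespace before '='

            // Everything after the first '=' is the value.
            settings[key] = line.Substring(eq + 1).Trim();
        }
        return settings;
    }
}
```

ReadConfig could then call ConfigParser.Parse(File.ReadAllLines(txtFile)) and use TryGetValue to report a missing key instead of throwing.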
This question is similar to Get parameters out of text file
(I put an answer there. I "can't" paste it here.)
(Unsure whether I should "flag" this question as duplicate. "Flagging" "closes".)
(Do duplicate questions ever get consolidated? Each can have virtues in the wording of the [often lame] question or the [underreaching and overreaching] answers. A consolidated version could have the best of all, but consolidation is rarely trivial.)