Logic issue when saving URLs to file - c#

My goal with this piece of code is to avoid saving duplicate domains to a .txt file when a checkbox is ticked.
Code:
// save to file here
if (footprints.Any(externalUrl.Contains))
{
// Load all URLs into an array ...
var hash = new List<string>(File.ReadAllLines(@"Links\" + lblFootprintsUsed.Text));
// Find the domain root url e.g. site.com ...
var root = Helpers.GetRootUrl(externalUrl);
if (chkBoxDoNotSaveDuplicateDomains.Checked == true)
{
if (!hash.Contains(Helpers.GetRootUrl(externalUrl)))
{
using (var sr = new StreamWriter(@"Links\" + lblFootprintsUsed.Text, true))
{
// before saving, turn &amp; into & and get rid of #038; altogether ...
var newURL = externalUrl.Replace("&amp;", "&").Replace("#038;", " ");
sr.WriteLine(newURL);
footprintsCount++;
}
}
}
if (chkBoxDoNotSaveDuplicateDomains.Checked == false)
{
if (!hash.Contains(externalUrl))
{
using (var sr = new StreamWriter(@"Links\" + lblFootprintsUsed.Text, true))
{
// before saving, turn &amp; into & and get rid of #038; altogether ...
var newURL = externalUrl.Replace("&amp;", "&").Replace("#038;", " ");
sr.WriteLine(newURL);
footprintsCount++;
}
}
}
}
The code above first checks whether a footprint pattern occurs in the URL. If it does, all URLs already saved are loaded into a List. The check !hash.Contains(externalUrl) is supposed to stop duplicate URLs being added to the .txt file, but testing shows that exact duplicates still get written (the first issue, which I had not noticed before). I then added !hash.Contains(Helpers.GetRootUrl(externalUrl)), which is supposed to stop duplicate domains being added.
So, unchecked, the code should not add duplicate URLs to the file.
And, checked, it should not add duplicate domains to the file.
Both seem to fail. I cannot see anything wrong in the code itself. Is there anything I am missing or could do better? Any help is appreciated.

You write the full URL to the file, but the check compares the saved lines against the root URL only, so a saved line never equals it and the check never matches.
Change the condition
if (!hash.Contains(Helpers.GetRootUrl(externalUrl)))
to
if (!hash.Any(x => x.Contains(Helpers.GetRootUrl(externalUrl))))
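The substring test above can also match a domain that merely appears somewhere inside another URL. A minimal sketch of a stricter check follows, assuming the saved lines are absolute URLs. The LinkFilter class is hypothetical, and its GetRootUrl is only a stand-in for the question's Helpers.GetRootUrl (whose implementation is not shown), here taken to return the host name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkFilter
{
    // Hypothetical stand-in for Helpers.GetRootUrl: the host of an absolute URL,
    // or the input unchanged when it cannot be parsed as one.
    public static string GetRootUrl(string url)
    {
        Uri uri;
        return Uri.TryCreate(url, UriKind.Absolute, out uri) ? uri.Host : url;
    }

    // Decides whether a URL should be written, given the lines already in the file.
    public static bool ShouldSave(IEnumerable<string> savedLines, string url, bool skipDuplicateDomains)
    {
        if (skipDuplicateDomains)
        {
            // Compare host with host, not host with full URL.
            string root = GetRootUrl(url);
            return !savedLines.Any(line =>
                string.Equals(GetRootUrl(line), root, StringComparison.OrdinalIgnoreCase));
        }

        // Exact-duplicate check on the full URL.
        return !savedLines.Contains(url);
    }
}
```

Note that the question's code checks externalUrl but writes the cleaned-up newURL. If the two differ, the exact-duplicate check can never match, so it is probably worth passing the same string to the check that is written to the file.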

How to check if element exists in my for each loop

I need to check whether an element exists. If it does, I want to open a URL, then go back to the original page and continue writing as before. I tried a few approaches but they kept throwing exceptions. I added comments to the lines in question. I just can't figure out how to implement it.
foreach (string line in File.ReadLines(@"C:\tumblrextract\in7.txt"))
{
if (line.Contains("@"))
{
searchEmail.SendKeys(line);
submitButton.Click();
var result = driver.FindElement(By.ClassName("invite_someone_success")).Text;
if (driver.FindElements(By.ClassName("invite_someone_failure")).Count != 0)
// If invite_someone_failure exists open this url
driver.Url = "https://www.tumblr.com/lookup";
else
// Then back to following page and continue searchEmail.SendKeys(line); submitButton.Click(); write loop
driver.Url = "https://www.tumblr.com/following";
using (StreamWriter writer = File.AppendText("C:\\tumblrextract\\out7.txt"))
{
writer.WriteLine(result + ":" + line);
}
}
}
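Which exception are you getting? It may be a NullReferenceException, but more likely it is a NoSuchElementException: FindElement throws when nothing matches, and By.ClassName(...) itself is never null, so a null check on it would not help. Consider checking that the element exists before reading its text:
var successElements = driver.FindElements(By.ClassName("invite_someone_success"));
if (successElements.Count > 0) {
var result = successElements[0].Text;
}
The above is not verified or exact code, just an outline.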
You are using Selenium, and several lines of your code can throw exceptions. Bear in mind that I don't know the Tumblr website or its HTML structure.
But first:
You are in a foreach loop, and every time a page loads, all previously found elements go stale, so these lines:
var searchEmail = driver.FindElement(By.Name("follow_this"));
var submitButton = driver.FindElement(By.Name("submit"));
will probably refer to stale elements on the next iteration (StaleElementReferenceException).
Repeat them after:
driver.Url = "https://www.tumblr.com/following";
Second:
When you use the FindElement method you have to make sure the element exists, or a NoSuchElementException will be thrown. FindElements, by contrast, returns an empty collection when nothing matches:
var result = driver.FindElement(By.ClassName("invite_someone_success")).Text;
var isThere = driver.FindElements(By.ClassName("invite_someone_failure"));
The .NET Selenium client has a helper class for this, ExpectedConditions, which you can use together with WebDriverWait
to check that an element is present before trying to read its text.
I invite you to read up on how Selenium works, especially StaleElementReferenceException.
Have fun.

Issues creating and writing data to a CSV file using C#

I'm using C# Code in Ranorex 5.4.2 to create a CSV file, have data gathered from an XML file and then have it write this into the CSV file. I've managed to get this process to work but I'm experiencing an issue where there are 12 blank lines created beneath the gathered data.
I have a file called CreateCSVFile which creates the CSV file and adds the headers in, the code looks like this:
writer.WriteLine("PolicyNumber,Surname,Postcode,HouseNumber,StreetName,CityName,CountyName,VehicleRegistrationPlate,VehicleMake,VehicleModel,VehicleType,DateRegistered,ABICode");
writer.WriteLine("");
writer.Flush();
writer.Close();
The next one to run is MineDataFromOutputXML. The program I am automating provides insurance quotes, and an output XML file is created containing the client's details. I've set up a mining process with a variable declared at the top, like this:
string _PolicyHolderSurname = "";
[TestVariable("3E92E370-F960-477B-853A-0F61BEA62B7B")]
public string PolicyHolderSurname
{
get { return _PolicyHolderSurname; }
set { _PolicyHolderSurname = value; }
}
and then there is another section of code which gathers the information from the XML file:
var QuotePolicyHolderSurname = (XmlElement)xmlDoc.SelectSingleNode("//cipSurname");
string QuotePolicyHolderSurnameAsString = QuotePolicyHolderSurname.InnerText.ToString();
PolicyHolderSurname = QuotePolicyHolderSurnameAsString;
Report.Info( "Policy Holder Surname As String = " + QuotePolicyHolderSurnameAsString);
Report.Info( "Quote Policy Holder Surname = " + QuotePolicyHolderSurname.InnerText);
The final file is called SetDataSource and it puts the information into the CSV file, there is a variable declared at the top like this:
string _PolicyHolderSurname = "";
[TestVariable("222D47D2-6F66-4F05-BDAF-7D3B9D335647")]
public string PolicyHolderSurname
{
get { return _PolicyHolderSurname; }
set { _PolicyHolderSurname = value; }
}
This is then the code that adds it into the CSV file:
string Surname = PolicyHolderSurname;
Report.Info("Surname = " + Surname);
dataConn.Rows.Add(new string[] { Surname });
dataConn.Store();
There are multiple items in the Mine and SetDataSource files and the output looks like this in Notepad++:
(Screenshot: the CSV file after the code has been run.)
I believe the problem lies in the CreateCSVFile and the writer.WriteLine function. I have commented this region out but it then produces the CSV with just the headers showing.
I've asked some of the developers I work with but most don't know C# very well and no one has been able to solve this issue yet. If it makes a difference this is on Windows Server 2012r2.
Any questions about this please ask, I can provide the whole files if needed, they're just quite long and repetitive.
Thanks
Ben Jardine
I had to do exactly the same thing in Ranorex. Since the question is a bit old I didn't check your code, but here is what I did, and it works. I found an example (probably on Stack Overflow) that creates a CSV file in C#; here is my adaptation for use in a Ranorex UserCodeCollection:
[UserCodeCollection]
public class UserCodeCollectionDemo
{
[UserCodeMethod]
public static void ConvertXmlToCsv()
{
System.IO.File.Delete("E:\\Ranorex_test.csv");
XDocument doc = XDocument.Load("E:\\lang.xml");
string csvOut = string.Empty;
StringBuilder sColumnString = new StringBuilder(50000);
StringBuilder sDataString = new StringBuilder(50000);
foreach (XElement node in doc.Descendants(GetServerLanguage()))
{
foreach (XElement categoryNode in node.Elements())
{
foreach (XElement innerNode in categoryNode.Elements())
{
// "{0}," gives you the output in comma-separated format.
string sNodePath = categoryNode.Name + "_" + innerNode.Name;
sColumnString.AppendFormat("{0},", sNodePath);
sDataString.AppendFormat("{0},", innerNode.Value);
}
}
}
if ((sColumnString.Length > 1) && (sDataString.Length > 1))
{
sColumnString.Remove(sColumnString.Length-1, 1);
sDataString.Remove(sDataString.Length-1, 1);
}
string[] lines = { sColumnString.ToString(), sDataString.ToString() };
System.IO.File.WriteAllLines(@"E:\Ranorex_test.csv", lines);
}
}
For your information, a simple version of my xml looks like that:
<LANGUAGE>
<ENGLISH ID="1033">
<TEXT>
<IDS_TEXT_CANCEL>Cancel</IDS_TEXT_CANCEL>
<IDS_TEXT_WARNING>Warning</IDS_TEXT_WARNING>
</TEXT>
<LOGINCLASS>
<IDS_LOGC_DLGTITLE>Log In</IDS_LOGC_DLGTITLE>
</LOGINCLASS>
</ENGLISH>
<FRENCH ID="1036">
<TEXT>
<IDS_TEXT_CANCEL>Annuler</IDS_TEXT_CANCEL>
<IDS_TEXT_WARNING>Attention</IDS_TEXT_WARNING>
</TEXT>
<LOGINCLASS>
<IDS_LOGC_DLGTITLE>Connexion</IDS_LOGC_DLGTITLE>
</LOGINCLASS>
</FRENCH>
</LANGUAGE>

When using MergeField FieldCodes in OpenXml SDK in C# why do field codes disappear or fragment?

I have been working successfully with the C# OpenXml SDK (Unofficial Microsoft Package 2.5 from NuGet) for some time now, but have recently noticed that the following line of code returns different results depending on what mood Microsoft Word appears to be in when the file gets saved:
var fields = document.Descendants<FieldCode>();
From what I can tell, when creating the document in the first place (using Word 2013 on Windows 8.1) if you use the Insert->QuickParts->Field and choose MergeField from the Field names left hand pane, and then provide a Field name in the field properties and click OK then the field code is correctly saved in the document as I would expect.
Then when using the aforementioned line of code I will receive a field code count of 1 field. If I subsequently edit this document (and even leave this field well alone) the subsequent saving could mean that this field code no longer is returned in my query.
Another case of the same curiousness is when I see the FieldCode nodes split across multiple items. So rather than seeing say:
" MERGEFIELD Author \\* MERGEFORMAT "
As the node name, I will see:
" MERGEFIELD Aut"
"hor \\* MERGEFORMAT"
Split as two FieldCode node values. I have no idea why this would be the case, but it certainly makes my ability to match nodes that much more exciting. Is this expected behaviour? A known bug? I don't really want to have to crack open the raw xml and edit this document to work until I understand what is going on. Many thanks all.
I came across this very problem myself, and found a solution that exists within OpenXML: a utility class called MarkupSimplifier which is part of the PowerTools for Open XML project. Using this class solved all the problems I was having that you describe.
The full article is located here.
Here are some pertinent excerpts:
Perhaps the most useful simplification that this performs is to merge adjacent runs with identical formatting.
It goes on to say:
Open XML applications, including Word, can arbitrarily split runs as necessary. If you, for instance, add a comment to a document, runs will be split at the location of the start and end of the comment. After MarkupSimplifier removes comments, it can merge runs, resulting in simpler markup.
An example of the utility class in use is:
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
RemoveComments = true,
RemoveContentControls = true,
RemoveEndAndFootNotes = true,
RemoveFieldCodes = false,
RemoveLastRenderedPageBreak = true,
RemovePermissions = true,
RemoveProof = true,
RemoveRsidInfo = true,
RemoveSmartTags = true,
RemoveSoftHyphens = true,
ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);
I have used this many times with Word 2010 documents using VS2015 .Net Framework 4.5.2 and it has made my life much, much easier.
Update:
I have revisited this code and found that it cleans up runs in MERGEFIELD fields but not in IF fields that reference merge fields, e.g.
{if {MERGEFIELD When39} = "Y???" "Y" "N" }
I have no idea why this might be so, and examination of the underlying XML offers no hints.
Word will often split a text run into multiple text runs for no reason I've ever understood. Before searching, comparing, tidying, etc., we preprocess the body with a method that combines adjacent runs with identical formatting into a single text run.
/// <summary>
/// Combines the identical runs.
/// </summary>
/// <param name="body">The body.</param>
public static void CombineIdenticalRuns(W.Body body)
{
List<W.Run> runsToRemove = new List<W.Run>();
foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
{
List<W.Run> runs = para.Elements<W.Run>().ToList();
for (int i = runs.Count - 2; i >= 0; i--)
{
W.Text text1 = runs[i].GetFirstChild<W.Text>();
W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
if (text1 != null && text2 != null)
{
string rPr1 = "";
string rPr2 = "";
if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
if (rPr1 == rPr2)
{
text1.Text += text2.Text;
runsToRemove.Add(runs[i + 1]);
}
}
}
}
foreach (W.Run run in runsToRemove)
{
run.Remove();
}
}
I tried to simplify the document with PowerTools, but the result was a corrupted Word file. I wrote this routine to simplify only field codes with specific names; it works in all parts of the document (main document part, headers and footers):
internal static void SimplifyFieldCodes(WordprocessingDocument document)
{
var masks = new string[] { Constants.VAR_MASK, Constants.INP_MASK, Constants.TBL_MASK, Constants.IMG_MASK, Constants.GRF_MASK };
SimplifyFieldCodesInElement(document.MainDocumentPart.RootElement, masks);
foreach (var headerPart in document.MainDocumentPart.HeaderParts)
{
SimplifyFieldCodesInElement(headerPart.Header, masks);
}
foreach (var footerPart in document.MainDocumentPart.FooterParts)
{
SimplifyFieldCodesInElement(footerPart.Footer, masks);
}
}
internal static void SimplifyFieldCodesInElement(OpenXmlElement element, string[] regexpMasks)
{
foreach (var run in element.Descendants<Run>()
.Select(item => (Run)item)
.ToList())
{
var fieldChar = run.Descendants<FieldChar>().FirstOrDefault();
if (fieldChar != null && fieldChar.FieldCharType == FieldCharValues.Begin)
{
string fieldContent = "";
List<Run> runsInFieldCode = new List<Run>();
var currentRun = run.NextSibling();
while ((currentRun is Run) && currentRun.Descendants<FieldCode>().FirstOrDefault() != null)
{
var currentRunFieldCode = currentRun.Descendants<FieldCode>().FirstOrDefault();
fieldContent += currentRunFieldCode.InnerText;
runsInFieldCode.Add((Run)currentRun);
currentRun = currentRun.NextSibling();
}
// If the FieldCode spans more than one Run and is one we must change, put the complete text in the first Run and remove the rest
if (runsInFieldCode.Count > 1)
{
// Check the field code to see whether it is one we must simplify (so that TOC, PAGEREF, etc. are left alone)
bool applyTransform = false;
foreach (string regexpMask in regexpMasks)
{
Regex regex = new Regex(regexpMask);
Match match = regex.Match(fieldContent);
if (match.Success)
{
applyTransform = true;
break;
}
}
if (applyTransform)
{
var currentRunFieldCode = runsInFieldCode[0].Descendants<FieldCode>().FirstOrDefault();
currentRunFieldCode.Text = fieldContent;
runsInFieldCode.RemoveAt(0);
foreach (Run runToRemove in runsInFieldCode)
{
runToRemove.Remove();
}
}
}
}
}
}
Hope this helps!!!
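If the goal is only to read the complete field codes rather than to rewrite the document, the fragments can be joined while reading. Here is a minimal sketch that works on the raw WordprocessingML with LINQ to XML instead of the OpenXml SDK. FieldCodeReader and ReadFieldCodes are hypothetical names, and the sketch assumes complex fields (w:fldChar begin/end pairs) only; simple fields stored as w:fldSimple are not handled.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class FieldCodeReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per complex field, built by concatenating every
    // w:instrText fragment found between a "begin" and its matching "end".
    // A stack keeps nested fields (e.g. a MERGEFIELD inside an IF) apart:
    // the inner code is returned first, and the outer code excludes the inner text.
    public static List<string> ReadFieldCodes(XElement root)
    {
        var codes = new List<string>();
        var open = new Stack<StringBuilder>();

        // Descendants() yields elements in document order.
        foreach (XElement element in root.Descendants())
        {
            if (element.Name == W + "fldChar")
            {
                string type = (string)element.Attribute(W + "fldCharType");
                if (type == "begin")
                {
                    open.Push(new StringBuilder());
                }
                else if (type == "end" && open.Count > 0)
                {
                    codes.Add(open.Pop().ToString());
                }
            }
            else if (element.Name == W + "instrText" && open.Count > 0)
            {
                open.Peek().Append(element.Value);
            }
        }

        return codes;
    }
}
```

Because this never modifies the document, it sidesteps the question of which runs are safe to merge, at the cost of not cleaning up the markup for later edits.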

MultipartMemoryStreamProvider: filename?

I already asked here how I can read uploaded files in Web Api without the need to save them.
This question was answered with the MultipartMemoryStreamProvider, but how do I get the file name with this method to derive the type of the uploaded file from it?
Kind regards
There is an example in this DotNetNuke Code here (See the PostFile() method).
Updated based on @FilipW's comment...
Get the content item you require and then access the filename property.
Something like this :
var provider = new MultipartMemoryStreamProvider();
var task = request.Content.ReadAsMultipartAsync(provider).
ContinueWith(o =>
{
//Select the appropriate content item this assumes only 1 part
var fileContent = provider.Contents.SingleOrDefault();
if (fileContent != null)
{
var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);
}
});//Ending Bracket
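To get from the file name to a file type, the quoted value in the Content-Disposition header has to be unquoted first. Below is a small sketch of that step on its own, using only System.Net.Http.Headers. UploadNames and its two methods are hypothetical helpers, not part of Web API.

```csharp
using System.IO;
using System.Net.Http.Headers;

public static class UploadNames
{
    // Returns the client-supplied file name without surrounding quotes,
    // or null when the part carries no file name.
    public static string GetFileName(ContentDispositionHeaderValue disposition)
    {
        if (disposition == null || string.IsNullOrEmpty(disposition.FileName))
        {
            return null;
        }
        return disposition.FileName.Trim('"');
    }

    // Returns the lower-cased extension (including the dot), or null without a file name.
    public static string GetExtension(ContentDispositionHeaderValue disposition)
    {
        string name = GetFileName(disposition);
        return name == null ? null : Path.GetExtension(name).ToLowerInvariant();
    }
}
```

With the provider from the answer above, this would be called as UploadNames.GetExtension(fileContent.Headers.ContentDisposition). The name comes from the client, so it is best treated as a hint only; fileContent.Headers.ContentType, when present, is a second hint.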

Can ConfigurationManager retain XML comments on Save()?

I've written a small utility that allows me to change a simple AppSetting for another application's App.config file, and then save the changes:
//save a backup copy first.
var cfg = ConfigurationManager.OpenExeConfiguration(pathToExeFile);
cfg.SaveAs(cfg.FilePath + "." + DateTime.Now.ToFileTime() + ".bak");
//reopen the original config again and update it.
cfg = ConfigurationManager.OpenExeConfiguration(pathToExeFile);
var setting = cfg.AppSettings.Settings[keyName];
setting.Value = newValue;
//save the changed configuration.
cfg.Save(ConfigurationSaveMode.Full);
This works well, except for one side effect. The newly saved .config file loses all the original XML comments, but only within the AppSettings area. Is it possible to retain the XML comments from the original configuration file's AppSettings area?
Here's a pastebin of the full source if you'd like to quickly compile and run it.
I jumped into Reflector.Net and looked at the decompiled source for this class. The short answer is no, it will not retain the comments. The way Microsoft wrote the class is to generate an XML document from the properties on the configuration class. Since the comments don't show up in the configuration class, they don't make it back into the XML.
And what makes this worse is that Microsoft sealed all of these classes so you can't derive a new class and insert your own implementation. Your only option is to move the comments outside of the AppSettings section or use XmlDocument or XDocument classes to parse the config files instead.
Sorry. This is an edge case that Microsoft just didn't plan for.
Here is a sample function that you could use to save the comments. It allows you to edit one key/value pair at a time. I've also added some stuff to format the file nicely based on the way I commonly use the files (You could easily remove that if you want). I hope this might help someone else in the future.
public static bool setConfigValue(Configuration config, string key, string val, out string errorMsg) {
try {
errorMsg = null;
string filename = config.FilePath;
//Load the config file as an XDocument
XDocument document = XDocument.Load(filename, LoadOptions.PreserveWhitespace);
if(document.Root == null) {
errorMsg = "Document was null for XDocument load.";
return false;
}
XElement appSettings = document.Root.Element("appSettings");
if(appSettings == null) {
appSettings = new XElement("appSettings");
document.Root.Add(appSettings);
}
XElement appSetting = appSettings.Elements("add").FirstOrDefault(x => x.Attribute("key").Value == key);
if (appSetting == null) {
//Create the new appSetting
appSettings.Add(new XElement("add", new XAttribute("key", key), new XAttribute("value", val)));
}
else {
//Update the current appSetting
appSetting.Attribute("value").Value = val;
}
//Format the appSetting section
XNode lastElement = null;
foreach(var elm in appSettings.DescendantNodes()) {
if(elm.NodeType == System.Xml.XmlNodeType.Text) {
if(lastElement?.NodeType == System.Xml.XmlNodeType.Element && elm.NextNode?.NodeType == System.Xml.XmlNodeType.Comment) {
//Any time the last node was an element and the next is a comment add two new lines.
((XText)elm).Value = "\n\n\t\t";
}
else {
((XText)elm).Value = "\n\t\t";
}
}
lastElement = elm;
}
//Make sure the end tag for appSettings is on a new line.
var lastNode = appSettings.DescendantNodes().Last();
if (lastNode.NodeType == System.Xml.XmlNodeType.Text) {
((XText)lastNode).Value = "\n\t";
}
else {
appSettings.Add(new XText("\n\t"));
}
//Save the changes to the config file.
document.Save(filename, SaveOptions.DisableFormatting);
return true;
}
catch (Exception ex) {
errorMsg = "There was an exception while trying to update the config value for '" + key + "' with value '" + val + "' : " + ex.ToString();
return false;
}
}
If comments are critical, it might just be that your only option is to read & save the file manually (via XmlDocument or the new Linq-related API). If however those comments are not critical, I would either let them go or maybe consider embedding them as (albeit redundant) data elements.
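As a quick check of the manual approach, the sketch below updates one appSetting in an in-memory config string with XDocument and returns the result, leaving comments where they were. ConfigCommentDemo and UpdateSetting are hypothetical names, and the sketch assumes the usual configuration/appSettings/add layout with no XML namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ConfigCommentDemo
{
    // Changes the value of one <add key="..."> entry and returns the updated XML.
    // Comments and whitespace are part of the XDocument tree, so they are written back.
    public static string UpdateSetting(string configXml, string key, string newValue)
    {
        XDocument doc = XDocument.Parse(configXml, LoadOptions.PreserveWhitespace);

        XElement setting = doc.Root
            .Elements("appSettings")
            .Elements("add")
            .FirstOrDefault(e => (string)e.Attribute("key") == key);

        if (setting == null)
        {
            throw new ArgumentException("No appSetting with key '" + key + "'.", nameof(key));
        }

        setting.SetAttributeValue("value", newValue);

        // ToString() omits the XML declaration; use doc.Save(path) to write a file.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Unlike Configuration.Save, this touches only the attribute that changed, so everything else in the file, comments included, stays as it was.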
