I'm trying to make a C# program that takes an XML file and turns it into an HTML file. The most obvious way to do this would be with an HtmlTextWriter object, yet I question needing 6 lines of code to write one tag, an attribute, a line of content, and a closing tag. Is there a cleaner / more efficient way to do this?
The program is using an XML file (format defined by XML Schema) to customize and populate an HTML template with data. An example is shown below:
static string aFileName;
static XmlDocument aParser;
static HtmlTextWriter HTMLIOutput;
static StringWriter HTMLIBuffer;
static StreamWriter HTMLOutIFile;
static HtmlTextWriter HTMLEOutput;
static StringWriter HTMLEBuffer;
static StreamWriter HTMLOutEFile;
HTMLIBuffer = new StringWriter();
HTMLIOutput = new HtmlTextWriter(HTMLIBuffer);
XmlElement feed = aParser.DocumentElement;
HTMLIOutput.WriteBeginTag("em");
HTMLIOutput.WriteAttribute("class", "updated");
HTMLIOutput.Write(HtmlTextWriter.TagRightChar);
HTMLIOutput.Write("Last updated: " +
feed.SelectSingleNode("updated").InnerText.Trim());
HTMLIOutput.WriteEndTag("em");
HTMLIOutput.WriteLine();
HTMLIOutput.WriteLine("<br>");
To write something such as <em class="updated">Last updated: 07/16/2018</em><br />, do I really need to have so many different lines just constructing parts of a tag?
Note: Yes, I could write the contents to the file directly, but if possible I would prefer a more intelligent way so there's less human error involved.
You can always use Obisoft.HSharp:
var Document = new HDoc(DocumentOptions.BasicHTML);
Document["html"]["body"].AddChild("div");
Document["html"]["body"]["div"].AddChild("a", new HProp("href", "/#"));
Document["html"]["body"]["div"].AddChild("table");
Document["html"]["body"]["div"]["table"].AddChildren(
new HTag("tr"),
new HTag("tr", "SomeText"),
new HTag("tr", new HTag("td")));
var Result = Document.GenerateHTML();
Console.WriteLine(Result);
or System.Xml.Linq:
var html = new XElement("html",
    new XElement("head",
        new XElement("title", "My Page")
    ),
    new XElement("body",
        "this is some text"
    )
);
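Applied to the tag from the question, the System.Xml.Linq approach might look like the sketch below. The names UpdatedTagSketch and Render are hypothetical, and the sketch assumes the updated element is a direct, un-namespaced child of the document root, as the SelectSingleNode("updated") call in the question suggests.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class UpdatedTagSketch
{
    // Builds <em class="updated">Last updated: ...</em><br /> from the feed XML.
    public static string Render(string feedXml)
    {
        XElement feed = XElement.Parse(feedXml);

        // The cast yields null when the element is missing, hence the fallback.
        string updated = ((string)feed.Element("updated") ?? "").Trim();

        var em = new XElement("em",
            new XAttribute("class", "updated"),
            "Last updated: " + updated);   // text content is escaped automatically
        var br = new XElement("br");       // an empty element serializes as <br />

        return em.ToString(SaveOptions.DisableFormatting)
             + br.ToString(SaveOptions.DisableFormatting);
    }
}
```

Keep in mind that XElement emits XML, so every empty element is written in self-closing form; that is fine for br, but elements such as script or div should be given (possibly empty) text content if the output is meant to be parsed as HTML.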
Is using something like Razor not applicable here? Because if you're doing a lot of html generation using a view engine can make it a lot easier. It was also built to be used outside of ASP.NET.
However, sometimes that's not what you need. Have you considered using the TagBuilder class, which is part of ASP.NET MVC? There is also the HtmlTextWriter in System.Web.UI (for Web Forms). I would recommend one of these if you are making controls or HTML helpers.
This is my suggestion:
Deserialize the XML into C# objects
Use a template engine such as RazorEngine to generate the HTML
I have used RazorEngine in the past to generate email templates (in HTML format). The templates use a syntax similar to ASP.NET MVC views (.cshtml), and you can even get IntelliSense to work with them. Templates are also much easier to create and maintain than XSLT or TagBuilder code.
Consider the following model:
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
You can create a string with the HTML template, or use a file. I recommend using a file with a .cshtml extension, so you get syntax highlighting and IntelliSense, as already mentioned.
The template text is the following:
@using RazorEngine.Templating
@using RazorDemo1
@inherits TemplateBase<Person>
<div>
Hello <strong>@Model.FirstName @Model.LastName</strong>
</div>
Loading the template and generating the HTML:
using System;
using System.IO;
using RazorEngine;
using RazorEngine.Templating;
namespace RazorDemo1
{
    class Program
    {
        static void Main(string[] args)
        {
            string template = File.ReadAllText("./Templates/Person.cshtml");
            var person = new Person
            {
                FirstName = "Rui",
                LastName = "Jarimba"
            };
            string html = Engine.Razor.RunCompile(template, "templateKey", typeof(Person), person);
            Console.WriteLine(html);
        }
    }
}
Output:
<div>
Hello <strong>Rui Jarimba</strong>
</div>
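The example above covers only the second step of the suggestion. For the first step, deserializing the XML into C# objects, a minimal sketch with XmlSerializer could look like this. The PersonXml class and the shape of the input XML (a Person root element with FirstName and LastName children) are assumptions for illustration; a real schema would need matching classes or mapping attributes.

```csharp
using System.IO;
using System.Xml.Serialization;

// Same model as in the answer above.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical helper for illustration.
public static class PersonXml
{
    // Deserializes XML such as
    // <Person><FirstName>Rui</FirstName><LastName>Jarimba</LastName></Person>
    public static Person Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Person));
        using (var reader = new StringReader(xml))
        {
            return (Person)serializer.Deserialize(reader);
        }
    }
}
```

The resulting Person instance can then be passed to Engine.Razor.RunCompile as the model.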
I've been following tutorials on how to scrape information using HTMLAgilityPack, here is an example:
using System;
using System.Linq;
using System.Net;
namespace web_scraping_test
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.yellowpages.com/search?search_terms=Software&geo_location_terms=Sydney2C+ND");
            // XPath attribute tests use '@': select every <a> whose class is 'business-name'.
            var names = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
            foreach (var item in names)
            {
                Console.WriteLine(item.InnerText);
            }
        }
    }
}
Getting the data was easy here because the elements share a common class name that is simple to select.
I'm trying to use the same approach to scrape information from this site: https://osu.ppy.sh/beatmapsets/354163#osu/780200
but I have no idea what the correct markup is to get 'Stitches', 'Shawn Mendes', and the other values shown in the diagram.
For 'Shawn Mendes' the markup is '<a class="beatmapset-header__details-text beatmapset-header__details-text--artist" href="https://osu.ppy.sh/beatmapsets?q=Shawn%20Mendes">Shawn Mendes</a>'
but I'm not sure how to implement this in the code. I've replaced the URL and changed the class name, but the path to this text seems a lot more complicated on this site. Any advice would be appreciated, thanks!
All of the details you're looking for appear to be in a JSON object in the markup. There is a script block with the ID "json-beatmapset"; if you scrape its content and parse the JSON it contains, it should be smooth sailing after that.
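A minimal sketch of that idea, using HtmlAgilityPack as in the question plus System.Text.Json, might look as follows. The class name BeatmapsetJsonSketch is hypothetical, and the JSON property names "title" and "artist" are assumptions; check the actual script content for the real field names.

```csharp
using System;
using System.Text.Json;
using HtmlAgilityPack;

// Hypothetical helper for illustration.
public static class BeatmapsetJsonSketch
{
    public static (string Title, string Artist) ReadTitleAndArtist(HtmlDocument doc)
    {
        // Locate the script block by its id attribute.
        HtmlNode script = doc.DocumentNode.SelectSingleNode("//script[@id='json-beatmapset']");
        if (script == null)
        {
            throw new InvalidOperationException("script#json-beatmapset not found");
        }

        // The script body is plain JSON, so it can be parsed directly.
        using (JsonDocument json = JsonDocument.Parse(script.InnerText))
        {
            JsonElement root = json.RootElement;
            // "title" and "artist" are assumed property names.
            return (root.GetProperty("title").GetString(),
                    root.GetProperty("artist").GetString());
        }
    }
}
```

With the code from the question, this would be called as BeatmapsetJsonSketch.ReadTitleAndArtist(web.Load(url)).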
Is it possible to use a C#/.Net based library like OpenXmlPowerTools in a Flask/Python 3 web-app?
My research tells me that I might be able to do so via a wrapper or by creating a microservice.
Would like to know your thoughts.
You can do this with pythonnet. I've created a sample class in C# that references the OpenXmlPowerTools and DocumentFormat.OpenXml assemblies and provides two methods:
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Packaging;
namespace CodeSnippets.OpenXmlWrapper
{
public class OpenXmlPowerToolsWrapper
{
public static string GetMainDocumentPart(string path)
{
using WordprocessingDocument wordDocument = WordprocessingDocument.Open(path, true);
return wordDocument.MainDocumentPart.GetXElement().ToString();
}
public static string FinishReview(string path)
{
using WordprocessingDocument wordDocument = WordprocessingDocument.Open(path, true);
var settings = new SimplifyMarkupSettings
{
AcceptRevisions = true,
RemoveComments = true
};
MarkupSimplifier.SimplifyMarkup(wordDocument, settings);
return wordDocument.MainDocumentPart.GetXElement().ToString();
}
}
}
The corresponding sample Python client looks like this, noting that clr is a module provided by pythonnet:
import clr
import shutil
clr.AddReference(
r"..\CodeSnippets.OpenXmlWrapper\bin\x64\Debug\net471\CodeSnippets.OpenXmlWrapper")
from CodeSnippets.OpenXmlWrapper import OpenXmlPowerToolsWrapper
wrapper = OpenXmlPowerToolsWrapper()
# Display contents before finishing review, showing that the document contains revision markup.
xml_with_revision_markup = wrapper.GetMainDocumentPart("DocumentWithRevisionMarkup.docx")
print("Document before finishing review:\n")
print(xml_with_revision_markup)
# Finish review, removing all revision markup.
print("\nFinishing review ...")
shutil.copyfile("DocumentWithRevisionMarkup.docx", "Result.docx")
xml_without_revision_markup = wrapper.FinishReview("Result.docx")
# Display contents after finishing review, showing that the revision markup was removed.
print("\nDocument after finishing review:\n")
print(xml_without_revision_markup)
You'll find the full source code in my CodeSnippets GitHub repo. Look at the CodeSnippets.OpenXmlWrapper and CodeSnippets.OpenXmlWrapper.PythonClient projects.
Depending on your use case, you will also be able to use the OpenXmlPowerTools directly. I just implemented a wrapper to provide a simplified interface.
I am using GetSafeHtmlFragment in my website and I found that all tags except <p> and <a> are removed.
I researched around and found that there is no fix for it from Microsoft.
Is there a replacement for it, or is there any other solution?
Thanks.
Amazing that Microsoft in the 4.2.1 version terribly overcompensated for a security leak in the 4.2 XSS library and now still hasn't updated a year later. The GetSafeHtmlFragment method should have been renamed to StripHtml as I read someone commenting somewhere.
I ended up using the HtmlSanitizer library suggested in this related SO issue. I liked that it was available as a package through NuGet.
This library basically implements a variation of the white-list approach the now accepted answer uses. However it is based on CsQuery instead of the HTML Agility library. The package also gives some additional options, like being able to keep style information (e.g. HTML attributes). Using this library resulted in code in my project something like below, which - at least - is a lot less code than the accepted answer :).
using Html;
...
var sanitizer = new HtmlSanitizer();
sanitizer.AllowedTags = new List<string> { "p", "ul", "li", "ol", "br" };
string sanitizedHtml = sanitizer.Sanitize(htmlString);
An alternative solution would be to use the Html Agility Pack in conjunction with your own tag whitelist:
using System;
using System.IO;
using System.Text;
using System.Linq;
using System.Collections.Generic;
using HtmlAgilityPack;
class Program
{
static void Main(string[] args)
{
var whiteList = new[]
{
"#comment", "html", "head",
"title", "body", "img", "p",
"a"
};
var html = File.ReadAllText("input.html");
var doc = new HtmlDocument();
doc.LoadHtml(html);
var nodesToRemove = new List<HtmlAgilityPack.HtmlNode>();
var e = doc
.CreateNavigator()
.SelectDescendants(System.Xml.XPath.XPathNodeType.All, false)
.GetEnumerator();
while (e.MoveNext())
{
var node =
((HtmlAgilityPack.HtmlNodeNavigator)e.Current)
.CurrentNode;
if (!whiteList.Contains(node.Name))
{
nodesToRemove.Add(node);
}
}
nodesToRemove.ForEach(node => node.Remove());
var sb = new StringBuilder();
using (var w = new StringWriter(sb))
{
doc.Save(w);
}
Console.WriteLine(sb.ToString());
}
}
With Xamarin Android, it is possible to create localized strings for multi-language apps, as is shown in their Android documentation:
http://docs.xamarin.com/guides/android/application_fundamentals/resources_in_android/part_5_-_application_localization_and_string_resources
However, I have various try/catch blocks in my Model which send error messages back as strings. Ideally I'd like to keep the Model and Controller parts of my solution entirely cross platform but I can't see any way to effectively localize the messages without passing a very platform specific Android Context to the Model.
Does anyone have ideas about how this can be achieved?
I'm using .net resource files instead of the Android ones. They give me access to the strings from code, wherever it is.
The only thing I can't do automatically is reference those strings from layouts. To deal with that I've written a quick utility which parses the resx file and creates an Android resource file with the same values. It gets run before the Android project builds so all the strings are in place when it does.
Disclaimer: I haven't actually tested this with multiple languages yet.
This is the code for the utility:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
namespace StringThing
{
class Program
{
static void Main(string[] args)
{
string sourceFile = args[0];
string targetFile = args[1];
Dictionary<string, string> strings = LoadDotNetStrings(sourceFile);
WriteToTarget(targetFile, strings);
}
static Dictionary<string, string> LoadDotNetStrings(string file)
{
var result = new Dictionary<string, string>();
XmlDocument doc = new XmlDocument();
doc.Load(file);
XmlNodeList nodes = doc.SelectNodes("//data");
foreach (XmlNode node in nodes)
{
string name = node.Attributes["name"].Value;
string value = node.ChildNodes[1].InnerText;
result.Add(name, value);
}
return result;
}
static void WriteToTarget(string targetFile, Dictionary<string, string> strings)
{
StringBuilder bob = new StringBuilder();
bob.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
bob.AppendLine("<resources>");
foreach (string key in strings.Keys)
{
bob.Append(" ");
bob.AppendLine(string.Format("<string name=\"{0}\">{1}</string>", key, strings[key]));
}
bob.AppendLine("</resources>");
System.IO.File.WriteAllText(targetFile, bob.ToString());
}
}
}
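One thing to watch in the utility above: values are written into the output with string.Format, so a string containing & or < would produce invalid XML, and ChildNodes[1] relies on the whitespace layout of the resx file. A hedged alternative sketch using LINQ to XML, which looks up the value element by name and escapes text automatically, is shown below. The class and method names are hypothetical, and the sketch handles XML escaping only.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical variant of the conversion step, for illustration.
public static class ResxToAndroidSketch
{
    // Converts resx XML text into the body of an Android strings resource file.
    public static string Convert(string resxXml)
    {
        XDocument resx = XDocument.Parse(resxXml);

        var resources = new XElement("resources",
            from data in resx.Descendants("data")
            let name = (string)data.Attribute("name")
            let value = data.Element("value")   // looked up by name, not by position
            where name != null && value != null
            select new XElement("string",
                new XAttribute("name", name),
                value.Value));                   // special characters are escaped on output

        return resources.ToString();
    }
}
```

To write the result to disk with an XML declaration, the resources element can be wrapped in an XDocument and saved with its Save method.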
For Xamarin, you can also look at Vernacular https://github.com/rdio/vernacular
You can write code with minimal effort without worrying about the translation. Feed the generated IL to Vernacular to get translatable strings in iOS, Android, and Windows Phone formats.
I've created a slightly ugly solution at Xamarin iOS localization using .NET which you might find helpful.
I am looking for an OFX file parser library in C#. I have searched the web but there seems to be none. Does anyone know of a good-quality C# OFX file parser? I need to process some bank statement files which are in OFX format.
Update
I have managed to find a C# library for parsing OFX.
Here is the link: ofx sharp. This codebase seems to be the best starting point for my solution.
I tried to use the ofx sharp library, but realised it doesn't work if the file is not valid XML. It seems to parse, but the values come out empty.
I made a change in the OFXDocumentParser.cs where I first fix the file to become valid XML and then let the parser continue. Not sure if you experienced the same issue?
Inside of the method:
private string SGMLToXML(string file)
I added a few lines that first transform file into newfile; the SgmlReader then processes newfile, as in the following code:
string newfile = ParseHeader(file);
newfile = SGMLToXMLFixer.Fix_SONRS(newfile);
newfile = SGMLToXMLFixer.Fix_STMTTRNRS(newfile);
newfile = SGMLToXMLFixer.Fix_CCSTMTTRNRS(newfile);
//reader.InputStream = new StringReader(ParseHeader(file));
reader.InputStream = new StringReader(newfile);
SGMLToXMLFixer is a new class I added to the OFXSharp library. It basically scans all the opening tags and verifies that each one has a closing tag too.
namespace OFXSharp
{
    public static class SGMLToXMLFixer
    {
        public static string Fix_SONRS(string original)
        { .... }
        public static string Fix_STMTTRNRS(string original)
        { .... }
        public static string Fix_CCSTMTTRNRS(string original)
        { .... }
        private static string Fix_Transactions(string file, string transactionTag, int lastIdx, out int lastIdx_new)
        { .... }
        private static string Fix_Transactions_Recursive(string file_modified, int lastIdx, out int lastIdx_new)
        { .... }
    }
}
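The method bodies are elided in the answer, so the following is not the author's implementation. Purely to illustrate the general idea of adding missing closing tags, here is a small regex-based sketch that closes leaf elements of the form <TAG>value. The class name SgmlLeafCloserSketch is hypothetical, and the sketch does not handle the OFX header or unclosed aggregate elements.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration; not the SGMLToXMLFixer from the answer.
public static class SgmlLeafCloserSketch
{
    // Matches <TAG>value where the value runs up to the next '<', line break,
    // or end of input, and is not already followed by </TAG>.
    private static readonly Regex UnclosedLeaf = new Regex(
        @"<([A-Za-z0-9_.]+)>([^<\r\n]+)(?=[<\r\n]|\z)(?!</\1>)",
        RegexOptions.Compiled);

    public static string CloseLeafTags(string sgml)
    {
        return UnclosedLeaf.Replace(sgml, m =>
        {
            string tag = m.Groups[1].Value;
            string value = m.Groups[2].Value.Trim();

            // Only whitespace after an opening tag is not a value; leave it untouched.
            if (value.Length == 0)
            {
                return m.Value;
            }
            return "<" + tag + ">" + value + "</" + tag + ">";
        });
    }
}
```

For example, a line such as <TRNAMT>-50.00 becomes <TRNAMT>-50.00</TRNAMT>, while <NAME>Shop</NAME> and aggregate tags like <STMTTRN> on their own line are left as they are.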
Try http://www.codeproject.com/KB/aspnet/Ofx_to_DataSet.aspx. The code uses Framework 3.5 and transforms an OFX file into a DataSet, which may help with what you're trying to do.