I'm looking for a C# library or Web Service that will parse an HTML email that contains VML and report any syntax errors.
Does anyone know if such a beast exists?
Thanks
I'm not aware of a library or web service (although that could well just be due to a lack of knowledge on my part), but you could probably write your own validator reasonably easily. There are published schemas for the various versions of Word HTML emails and Office (e.g. http://www.microsoft.com/downloads/details.aspx?FamilyId=0B764C08-0F86-431E-8BD5-EF0E9CE26A3A&displaylang=en, http://www.microsoft.com/download/en/details.aspx?id=101), so you should be able to use them to validate your content.
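A minimal sketch of such a validator, using only the .NET `System.Xml.Schema` classes, is below. It assumes the email markup is well-formed XML (real Word-generated HTML often is not, so it may need tidying first), and the tiny `shape` schema in the usage example is a made-up stand-in for the real Office/VML schemas from the downloads above. `SchemaValidator` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidator
{
    // Validates an XML string against an XSD string and returns every
    // error or warning reported, instead of stopping at the first one.
    public static List<string> Validate(string xml, string xsd)
    {
        var problems = new List<string>();

        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader); // null = use the schema's own targetNamespace
        }

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        // Collect schema violations rather than letting them throw.
        settings.ValidationEventHandler += (sender, e) =>
            problems.Add($"{e.Severity} (line {e.Exception.LineNumber}): {e.Message}");

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { } // reading the whole document drives validation
            }
        }
        catch (XmlException ex)
        {
            // The input is not well-formed XML at all (e.g. an unclosed tag).
            problems.Add($"Error (line {ex.LineNumber}): {ex.Message}");
        }
        return problems;
    }
}
```

For example, against a schema that requires a `type` attribute on `<shape>`, `<shape type="rect"/>` yields an empty list, while `<shape/>` reports the missing attribute.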
What are my options for editing Word documents?
We have a hosted business web application (written in C# using JavaScript libraries and Web API 2), and our users would like some basic document-management functionality. From within our application, they would like to complete documentation that currently exists as Word 2013 documents.
Is there something that would allow us to upload a .docx file and convert it to some web-friendly XML format that allows online editing? Or can Office Web Apps be used directly within the browser for Word edits if the client has a valid Word license? Is SharePoint Online integration an option? Or is there an option out there that I am not aware of? Any direction would be greatly appreciated!
To complete your task, you can use the DevExpress ASPxRichEdit and ASPxSpreadSheet controls. They support the most popular rich text and spreadsheet formats (including MS Office documents).
Both controls are web-based (ActiveX isn't required), standalone (you don't need to integrate external services into your application) and work in all modern browsers. Also, they have a built-in file manager, so you can use them with minimal coding.
Moreover, both controls are distributed as a part of the ASP.NET controls suite, which includes a lot of other web components.
If your client has the correct licenses and you already have a solution that provides the basic document-management features (uploading and downloading documents, etc.), then I would opt for Office Web Apps. This solution requires some reading and a certain architecture (its own server, for instance), but it is probably one of the best Word document editors currently out there. Microsoft's documentation for Office Web Apps Server 2013 covers the basics.
This approach lets you use either a SharePoint integration or a custom WOPI host. I've analysed and searched for different tools, and other than Google Docs, this would be the best option currently out there.
If you do take the Office Web Apps Server approach with a custom WOPI host, you can find several WOPI host samples on the internet:
MVC6 WopiHost based on marx-yu's WOPI host
Building an Office Web Apps (OWA) WOPI Host
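At its core, a custom WOPI host is a small set of HTTP endpoints that Office Web Apps Server calls back into. The sketch below only shows how incoming requests map to WOPI operations, assuming the URL shapes and `X-WOPI-Override` header values from Microsoft's WOPI protocol documentation; `WopiRouter` is a hypothetical name. A real host would also need to validate the `access_token` and the WOPI proof-key headers, and actually serve and store the file data.

```csharp
public static class WopiRouter
{
    // Maps an incoming request (method, URL path without query string, and the
    // X-WOPI-Override header, which may be null) to the WOPI operation it represents.
    public static string Resolve(string method, string path, string wopiOverride)
    {
        // Expected shapes: /wopi/files/{id} and /wopi/files/{id}/contents
        string[] parts = path.Trim('/').Split('/');
        if (parts.Length < 3 || parts[0] != "wopi" || parts[1] != "files")
            return "NotFound";

        bool isFile = parts.Length == 3;
        bool isContents = parts.Length == 4 && parts[3] == "contents";

        if (isFile && method == "GET") return "CheckFileInfo";   // file metadata as JSON
        if (isContents && method == "GET") return "GetFile";     // raw file bytes
        if (isContents && method == "POST" && wopiOverride == "PUT") return "PutFile";
        if (isFile && method == "POST")
        {
            // Lock operations share the file URL and differ only by X-WOPI-Override.
            switch (wopiOverride)
            {
                case "LOCK": return "Lock";
                case "UNLOCK": return "Unlock";
                case "REFRESH_LOCK": return "RefreshLock";
            }
        }
        return "NotImplemented";
    }
}
```

Keeping the routing separate like this makes it easy to plug the operations into whatever web framework hosts them (Web API 2 in the question's case).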
As far as I know, Google Docs can help with your issue, but you cannot build it into your web app. Aceoffix could be an alternative too; it lets a web project edit MS Office documents with full functionality.
I have a CLI application which is able to edit XML files with some parameters.
However, I now need a more powerful way to do it.
I want to give users the option to edit XML files using custom code from a .txt file, for total control over the XML editing.
For example:
#CODE File<file name for XML editing>
<code>
# Custom XML parser/editing code
for elem in tree.iter(tag='location'):
    if elem.text == 'J':
        elem.text = 'January'
</code>
What would be the safest way to do this in .NET/C#? That is, the user should only be able to edit the XML file, and not do anything else that compromises the security of the system (like deleting files).
I'm thinking of using a JavaScript engine and running the JavaScript code from the file. I believe JavaScript would limit what the user is able to do. I also considered C# and Python, but these may introduce security issues.
Edit:
One requirement is that it must work on Mono.
I have chosen the IronJS .NET runtime together with a JavaScript XML library, XML for <SCRIPT> (its W3C DOM Parser).
I also looked at other JavaScript runtimes for .NET, such as Javascript .NET, Jurassic and Jint (I opted for IronJS because of its better performance). I also tested some .NET Lua libraries, namely KopiLua, but opted for the JavaScript solution because it seemed more complete, better documented and easier to use.
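Neither the question nor the edit takes this route, but if the edits users need are as narrow as the Python example above, another option is to avoid running user code at all: the .txt file contains data-only rules that the host application interprets. The sketch below assumes a made-up rule format `element|oldText|newText` (one rule per line, `#` for comments) and applies it with LINQ to XML; `RuleEditor` and the format are hypothetical. It trades the flexibility of a script engine for having nothing to sandbox.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RuleEditor
{
    // Applies rules of the (hypothetical) form "element|oldText|newText".
    // Rules are data, not code, so a rule file cannot touch the file system,
    // the network, etc.; it can only perform this one kind of text replacement.
    public static string Apply(string xml, string rules)
    {
        XDocument doc = XDocument.Parse(xml);
        foreach (string line in rules.Split('\n'))
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0 || trimmed.StartsWith("#")) continue; // blank lines and comments

            string[] parts = trimmed.Split('|');
            if (parts.Length != 3)
                throw new FormatException("Expected element|oldText|newText but got: " + trimmed);

            // ToList() so the tree is not modified while it is being enumerated.
            foreach (XElement e in doc.Descendants(parts[0]).Where(x => x.Value == parts[1]).ToList())
                e.Value = parts[2];
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

With this format, the single line `location|J|January` does what the Python example does.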
First of all, I hope my question doesn't bother you. I really need to get an idea of how I can accomplish this, but unfortunately I'm really a beginner; I'm crawling when it comes to programming. I'm struggling to learn it the best way I can. Thank you for any help you can give me.
Here's the task: I was told to find a way to collect some data from a website using a C# application. This will be done every day, in order to update the data we'll use to calculate a financial index.
I know my question might sound vague; anyway, even telling me how I can be more precise would help me. I know I seem desperate, but, personal issues aside, my scholarship kind of depends on it.
Thanks in advance! (Please don't mind the bad English; I'm Brazilian, and my English might not be that good yet.)
First, your English is fine. In fact, I thought you were a native speaker until you said otherwise.
The term you're looking for is 'site scraping'. See this question: "Options for HTML scraping?". The second answer points to the HTML Agility Pack library, which you can use.
Now, there are two possibilities here. The first is you have to parse the HTML and scrape your data out of it. This is more computationally intensive and depends on the layout of the page. If they change the way the site looks, it could break the scraper.
The second possibility is that they provide some XML or JSON web service you can consume. In this case you aren't scraping anything, but are rather using a true data feed. If the layout of the site changes, your code will not break. Whether your target site offers this kind of data feed is up to the site.
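To illustrate why a data feed is more robust, here is a minimal sketch of consuming a JSON feed with `System.Text.Json`. The feed's shape (an array of objects with `symbol` and `price`) and the `Quote`/`FeedReader` names are hypothetical; a real site documents its own format. Notice that nothing in it depends on how the page looks.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shape of one item in the site's JSON feed.
public class Quote
{
    public string Symbol { get; set; } = "";
    public decimal Price { get; set; }
}

public static class FeedReader
{
    // Turns the feed's JSON text into typed objects.
    public static List<Quote> Parse(string json)
    {
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<List<Quote>>(json, options) ?? new List<Quote>();
    }
}
```

The JSON text itself would be downloaded the same way as an HTML page (e.g. with `WebClient.DownloadString`).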
If I understand your question, you're being asked to do some Web Scraping, where you 1) download the contents of a web page and 2) try to parse data from that content.
For step #1, you should look into using a WebClient object in C# to download the HTML from the web page. You can give a WebClient object the URL you want to download the content from and obtain a String containing the content (probably HTML) of the URL.
How you go about doing step #2 depends on what content is present at the web site. If you know of certain patterns you're looking for in the HTML, you can search the HTML string using various methods. A more general solution for parsing HTML data can be found through using the Html Agility Pack, which will let you handle the HTML as a tree structure (DOM).
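A minimal sketch of both steps follows, assuming the "known pattern" route rather than the Html Agility Pack (which is a third-party package). The `<td class="rate">` pattern and the `PageScraper` name are made up for illustration; a real pattern must match the target page's actual markup, and it breaks as soon as that markup changes. Note that `WebClient` is marked obsolete in .NET 6 and later (warning SYSLIB0014) in favour of `HttpClient`, although it still works.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class PageScraper
{
    // Step 1: download the page's HTML as a string.
    public static string Download(string url)
    {
#pragma warning disable SYSLIB0014 // WebClient is obsolete in .NET 6+, but still functional
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
#pragma warning restore SYSLIB0014
    }

    // Step 2 (pattern search): return the trimmed contents of every
    // <td class="rate">...</td> cell. The class name is a hypothetical example.
    public static List<string> ExtractRates(string html)
    {
        var rates = new List<string>();
        var pattern = new Regex(@"<td class=""rate"">\s*([^<]+?)\s*</td>", RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
        {
            rates.Add(m.Groups[1].Value);
        }
        return rates;
    }
}
```

Usage would be `PageScraper.ExtractRates(PageScraper.Download(url))`, where `url` is the page you were asked to collect data from.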
Use the WebClient class to get the page.
Turn the HTML into XML.
Use XPath to select the data you are interested in.
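A minimal sketch of the last step is below, using the `XPathSelectElements` extension from `System.Xml.XPath` on a LINQ to XML document. It assumes the HTML has already been converted into well-formed XML with no XHTML namespace declaration (with one, the XPath expression needs a namespace prefix); `XPathScraper` is a hypothetical name, and the table in the usage example is invented.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathScraper
{
    // Runs an XPath query over well-formed (X)HTML and returns the text of each match.
    // XDocument.Parse throws an XmlException on typical unconverted tag-soup HTML.
    public static List<string> SelectCells(string xhtml, string xpath)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.XPathSelectElements(xpath).Select(e => e.Value.Trim()).ToList();
    }
}
```

For example, `//table[@id='rates']/tr/td[2]` selects the second cell of every row in a table whose `id` is `rates`.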
OK, this is a pretty straightforward app design, and a lot of the code you need already exists and can be reused. Since you're a beginner, I'll break it down into the steps you need to take and recommend approaches.
1) You will use classes from System.Net to pull the web pages (WebClient being the easiest to use). You will want to have this part of the program run on a schedule if you can (using the scheduled-jobs feature of the OS) and have it just pull the pages and drop them in a folder.
2) You have a second job which runs separately, pulling unread files from that folder, parsing them (the HTML Agility Pack library is best for this) and then storing the data in an index of some kind (Lucene is best for that).
3) You have a front end application of some sort (web or desktop) which queries that index for the information you're looking for.
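Step 2 of this design can be sketched with nothing but `System.IO`, assuming the download job saves pages as `*.html` files into an inbox folder. Here a file counts as "read" once it has been moved into a `processed` subfolder. The `InboxProcessor` name and the folder layout are hypothetical, and the title-grabbing regex is only a placeholder where the real job would use a proper HTML parser and write to the index.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class InboxProcessor
{
    // Picks up pages that the download job left in inboxDir, extracts something
    // from each one, then moves the file to inboxDir/processed so it is not read twice.
    public static Dictionary<string, string> ProcessPending(string inboxDir)
    {
        string processedDir = Path.Combine(inboxDir, "processed");
        Directory.CreateDirectory(processedDir); // no-op if it already exists

        var results = new Dictionary<string, string>();
        foreach (string file in Directory.GetFiles(inboxDir, "*.html"))
        {
            string html = File.ReadAllText(file);

            // Placeholder "parse": grab the page <title>.
            Match m = Regex.Match(html, @"<title>(.*?)</title>",
                                  RegexOptions.IgnoreCase | RegexOptions.Singleline);
            string name = Path.GetFileName(file);
            results[name] = m.Success ? m.Groups[1].Value.Trim() : "";

            // Throws IOException if a file with the same name was already processed.
            File.Move(file, Path.Combine(processedDir, name));
        }
        return results;
    }
}
```

Because unread files are simply whatever is still in the inbox, the two jobs can run on independent schedules without sharing any other state.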
I need an HTML parser that can identify errors in generated HTML and, if tags are not closed, close them and return valid HTML.
More detail: I am getting data from a database and truncating each record to show partial detail on my website; clicking a "more" button then shows the complete content. After truncating the string, I need to validate it.
I have already used the Html Agility Pack, but I am new to it. If this library solves my issue, please point me to how (a tutorial), or suggest another library.
I don't think such a library exists. The problem is that some libraries can indeed identify errors in your HTML, but they can't fix them for you.
I think using the W3C validator as a service is the best starting point here. There's an open-source library that uses the W3C validator's API to validate a document and tells you whether it is valid, along with any errors and warnings. I would start with this and go on from there.
W3C Markup Validator library in C#
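For the narrow case in the question (truncating your own, trusted HTML for a "more" teaser), a small stack-based pass may be enough to close whatever tags the cut left open. This is a sketch under that assumption, not a validator: it drops a tag cut in half at the end, skips void elements such as `<br>`, ignores mismatched closing tags, and does not repair mis-nesting or truncated entities. `TagCloser` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TagCloser
{
    // Void elements never take a closing tag.
    static readonly HashSet<string> VoidTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "param", "source", "track", "wbr" };

    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" for a self-closing tag.
    static readonly Regex TagPattern = new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    public static string CloseOpenTags(string html)
    {
        // Drop a tag that was cut in half at the end, e.g. "<a hre".
        int lastLt = html.LastIndexOf('<');
        if (lastLt >= 0 && html.IndexOf('>', lastLt) < 0)
            html = html.Substring(0, lastLt);

        var open = new Stack<string>();
        foreach (Match m in TagPattern.Matches(html))
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool isClosing = m.Groups[1].Value == "/";
            bool selfClosing = m.Groups[3].Value == "/";
            if (VoidTags.Contains(name) || selfClosing) continue;

            if (!isClosing) open.Push(name);
            else if (open.Count > 0 && open.Peek() == name) open.Pop();
            // Any other closing tag is mismatched and simply ignored here.
        }

        // Close the still-open elements, innermost first.
        var sb = new StringBuilder(html);
        while (open.Count > 0) sb.Append("</").Append(open.Pop()).Append('>');
        return sb.ToString();
    }
}
```

For example, truncating to `<p>Hello <b>wor` and passing it through `CloseOpenTags` gives `<p>Hello <b>wor</b></p>`. For untrusted or arbitrary HTML, a real parser remains the safer choice.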
Here are a couple of validation programs from the World Wide Web Consortium, the W3C:
Windows: http://validator.w3.org/docs/install_win.html
UNIX / Linux: http://validator.w3.org/docs/install.html
You can also use their web services to validate your CSS, HTML, XML, XHTML, JavaScript and many other web technologies. The W3C is one of the bodies responsible for keeping the Internet highly interoperable and internet devices reasonably compatible with each other.
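As a sketch of calling the service directly (not the library linked above), the code below posts a document to the W3C's Nu HTML Checker with `out=json` and pulls the error messages out of the JSON report. It assumes the checker's JSON format of a top-level `messages` array whose entries have `type`, `message` and (usually) `lastLine`; check the service's documentation and usage policy before calling it from an automated job. `W3cCheck` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class W3cCheck
{
    // Sends an HTML document to the Nu HTML Checker and returns its raw JSON report.
    public static async Task<string> ValidateAsync(HttpClient client, string html)
    {
        var content = new StringContent(html, Encoding.UTF8, "text/html");
        using (var response = await client.PostAsync("https://validator.w3.org/nu/?out=json", content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Extracts the error messages (ignoring info/warning entries) from a JSON report.
    public static List<string> Errors(string json)
    {
        var errors = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement msg in doc.RootElement.GetProperty("messages").EnumerateArray())
            {
                if (msg.GetProperty("type").GetString() != "error") continue;
                string line = msg.TryGetProperty("lastLine", out JsonElement l) ? l.GetInt32().ToString() : "?";
                errors.Add($"line {line}: {msg.GetProperty("message").GetString()}");
            }
        }
        return errors;
    }
}
```

Splitting the network call from the report parsing keeps the parsing testable without hitting the service.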
I need to write a system to generate HTML email from a data model -
I was going to create a templating system to turn the model into an HTML representation using HTML 'fragments' stored in an XML template. But it occurs to me that it might be better to use ASP or ASP.NET than to write my own templating system.
What I am wondering is whether/how it would be possible to use ASP (maybe ASP.NET MVC?) to return an HTML string; I wouldn't be running on a web server, or in response to an HTTP request.
I have not done any ASP or ASP.NET yet; my experience of ASP stretches to 'Create new project' in Visual Studio, but maybe now is a good time to learn!
Thank You!
The standard ASP.NET view engine, ASP.NET Web Forms, is very difficult to use in this way, as it is pretty tied to the HttpContext and really doesn't want to give you a string back; it would rather stream into the HttpResponse. So you'd generally need IIS set up to get it to work.
XSLT (as you are thinking) is a pretty decent option. So is your own template-replacement scheme, if things are simple enough. If things are more complex, some other options include:
The Spark View Engine
The new ASP.NET Razor View Engine.
Either of those should let you get a string out of a template without too much trouble.
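For the XSLT option, a minimal sketch with `XslCompiledTransform` is below: it renders an XML data model to an HTML string entirely in-process, with no web server or HttpContext involved. The `order` model and the stylesheet in the usage example are invented for illustration, and `EmailRenderer` is a hypothetical name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class EmailRenderer
{
    // Applies an XSLT stylesheet to an XML data model and returns the result as a string.
    public static string Render(string modelXml, string stylesheet)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(xslReader);
        }

        using (var modelReader = XmlReader.Create(new StringReader(modelXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(modelReader, null, output); // null = no XSLT parameters
            return output.ToString();
        }
    }
}
```

With a stylesheet whose template for `/order` is `<p>Hello <xsl:value-of select="customer"/>, ...</p>` and `<xsl:output method="html"/>`, the model `<order><customer>Alice</customer>...</order>` renders to a plain HTML paragraph ready to drop into an email body. The compiled transform can be cached and reused for every email.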
The simplest way is to make an .aspx page that renders the email and then read it on the server using WebClient or an HttpWebRequest.
string email;
using (var client = new System.Net.WebClient())
    email = client.DownloadString(UrlOfPage); // UrlOfPage: URL of the .aspx page that renders the email
There are other ways to capture the output, and I am sure you can find articles about them by searching on Google, but from personal experience this is the simplest way to go.
Also beware of the HTML/CSS limitations of many email clients; they don't render the way a browser does.