Convert Rtf to HTML [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
We have a crystal report that we need to send out as an e-mail, but the HTML generated from the crystal report is pretty much just plain ugly and causes issues with some e-mail clients. I wanted to export it as rich text and convert that to HTML if it's possible.
Any suggestions?

I would check out this tool on CodeProject RTFConverter. This guy gives a great breakdown of how the program works along with details of the conversion.
Writing Your Own RTF Converter

There is also a sample on the MSDN Code Samples gallery called Converting between RTF and HTML which allows you to convert between HTML, RTF and XAML.

Mike Stall posted the code for one he wrote in c# here :
https://learn.microsoft.com/en-us/archive/blogs/jmstall/writing-an-rtf-to-html-converter-posting-code-in-blogs

UPDATED:
I got home and tried the code below, and it does not work. For anyone wondering, the clipboard does not just magically convert stuff like I'd hoped. Rather, it allows an application to sort of "upload" a data object with a variety of paste formats, and then when you paste (which in my metaphor would be the "download"), the program being pasted into specifies its preferred format. I personally ended up using this code, which has been recommended previously, and it was enormously easy to use and very effective. After you have imported the code (in VStudio, Project -> Add Existing Files), you just go HTML to RTF like this:
return HtmlToRtfConverter.ConvertHtmlToRtf(myHtmlString);
or the opposite direction:
return RtfToHtmlConverter.ConvertRtfToHtml(myRtfString);
(below is my previous incorrect answer, in case anyone is interested in the chronology of this answer haha)
Most if not all of the above answers provide comprehensive, often Library-based solutions to the problem at hand.
I am away from my computer and thus cannot test the idea, but one alternative, cheap and vaguely hack-y method would be the following.
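private string HTMLFromRtf(string rtfString)
{
    // Put the RTF on the clipboard, then ask for it back as HTML.
    // As the update above explains, the clipboard does not convert between
    // formats, so this does not actually produce HTML.
    Clipboard.SetData(DataFormats.Rtf, rtfString);
    return (string)Clipboard.GetData(DataFormats.Html);
}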
Again, not totally sure if this would work, but just messing around with some html on my iPhone I suspect it would. Documentation is here. More in depth explanation/docs RE the getting and setting of data models in the clipboard can be found here.
(Yes I am fully aware I'm here years later, but I assume this question is one which some people still want answered).

If you don't mind getting your hands dirty, it isn't that difficult to write an RTF to HTML converter.
Writing a general purpose RTF->HTML converter would be somewhat complicated because you would need to deal with hundreds of RTF verbs. However, in your case you are only dealing with those verbs used specifically by Crystal Reports. I'll bet the standard RTF coding generated by Crystal doesn't vary much from report to report.
I wrote an RTF to HTML converter in C++, but it only deals with basic formatting like fonts, paragraph alignments, etc. My translator basically strips out any specialized formatting that it isn't prepared to deal with. It took about 400 lines of C++. It basically scans the text for RTF tags and replaces them with equivalent HTML tags. RTF tags that aren't in my list are simply stripped out. A regex function is really helpful when writing such a converter.
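To give a feel for the scan-and-replace approach described above, here is a minimal C# sketch. It is not the C++ converter mentioned in this answer; the class name `RtfSketch` and the small mapping table are assumptions made for illustration, and it deliberately ignores escaped braces, Unicode escapes, tables and everything else a real report might contain.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class RtfSketch
{
    // Hypothetical mapping table; extend it with the verbs your reports actually emit.
    private static readonly Dictionary<string, string> TagMap = new Dictionary<string, string>
    {
        { "b", "<b>" }, { "b0", "</b>" },
        { "i", "<i>" }, { "i0", "</i>" },
        { "ul", "<u>" }, { "ulnone", "</u>" },
        { "par", "<br/>" },
    };

    public static string ToHtml(string rtf)
    {
        // Encode <, > and & first; RTF control words never contain them.
        string text = WebUtility.HtmlEncode(rtf);

        // Drop header groups (one level of nesting covers a typical font table).
        text = Regex.Replace(text,
            @"\{\\(?:fonttbl|colortbl|stylesheet|info)[^{}]*(?:\{[^{}]*\}[^{}]*)*\}", "");

        // A control word is a backslash, letters, an optional number,
        // and one optional delimiting space. Unknown words are stripped.
        text = Regex.Replace(text, @"\\([a-z]+)(-?\d+)? ?", m =>
        {
            string key = m.Groups[1].Value + m.Groups[2].Value;
            return TagMap.TryGetValue(key, out string html) ? html : "";
        });

        // Remaining braces are group delimiters with no HTML equivalent here.
        return text.Replace("{", "").Replace("}", "").Trim();
    }
}
```

Anything not in the table is simply dropped, which mirrors the "strip what you can't handle" behaviour described above.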

I think you can load it in a Word document object by using .NET office programmability support and Visual Studio tools for office.
And then use the document instance to re-save as an HTML document.
I am not sure how but I believe it is possible entirely in .NET without any 3rd party library.

I am not aware of any libraries to do this (but I am sure there are many that can) but if you can already create HTML from the crystal report why not use XSLT to clean up the markup?

You can try to upload it to google docs, and download it as HTML.

Related

Difficulty of switching from HTML only e-mail to Text only e-mail in .net C#? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am working with a consulting group on a program which currently uses a .net C# script to send e-mails in HTML format at regular intervals.
Aside from being in HTML format, the e-mail itself is just text with some tags and contains less than a page of text.
I would like the consultant to change this to text format replacing the tags with line feed/carriage returns. I have been told that this is a four hour job but that seems excessive to me.
When I look online at a page such as this http://www.mattvanandel.com/771/c-sending-an-email/ it would seem the change could be completed in less than 4 hours including recompiling the .net code into a DLL, testing and uploading the code to a server.
Not all developers are created equal, but assuming that the .NET developer is experienced enough to warrant a $250 per hour salary, does this seem reasonable? If it is something less than 4 hours (i.e. more like 4 min), can someone tell me what might have to be done to make the modification? From what I can see, it's likely 2 lines of code that need to be modified (i.e. the body string and the IsBodyHtml statement). What else may I be missing?

Depending on what kind of testing is required to verify that the system is stable after the change, 4 hours may or may not be excessive.
A simple-looking change in a tightly coupled system may have massive implications and risk. On the other hand, in a loosely coupled system the risk should be minimal.
So the question is, why 4 hours? If it were me, I'd request a breakdown of what the 4 hours represents. You are, after all, the customer, and if you need a cost breakdown I'd suggest you're within your purview to request it.
However, I'd suggest that you ask in a non-confrontational way (i.e. don't jump in with all guns blazing), as there may well be serious implications that the developer knows about but you don't. Maybe just ask a simple 'what is involved in implementing this change?'.
And don't feel you have to accept the first answer given; if you are dissatisfied, you should request further clarification from the developer.

It all depends on how the code is written - and on that we can only speculate currently. It may be that they use a really complex 3rd party tool - in which case it might take four hours.
However, if it is done using System.Net.Mail then it could be as simple as setting the IsBodyHtml property on a MailMessage to false, which is a four-second job.
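For reference, a plain-text message built with System.Net.Mail might look like the sketch below. The class name `MailSketch` is made up for illustration, and nothing here is taken from the consultant's actual code; the only point is that `IsBodyHtml` is false for a text/plain body.

```csharp
using System.Net.Mail;

public static class MailSketch
{
    // Builds (but does not send) a plain-text message.
    public static MailMessage BuildPlainText(string from, string to, string subject, string body)
    {
        return new MailMessage(from, to)
        {
            Subject = subject,
            Body = body,
            IsBodyHtml = false // false means the body is sent as plain text
        };
    }
}
```

Sending would still go through whatever `SmtpClient` configuration the existing script already uses.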

Changing that 'IsBodyHtml' property would make it send text, but you would also need to modify the text to insert the line feeds. On static text this is not terribly difficult, but you need to consider where a line feed is proper (what in the HTML has "block" layout and what is simply inline styled). Also, you do not mention whether the text is dynamic or static, which adds complications if it IS dynamically generated.
You pay for time, but also for knowledge. I get someone else to fix things on my car, not because I can't, but because they are better at it and have tools I might not have.
Just from a time spent perspective:
Get knowledge/use knowledge already present
Estimate time to communicate with you
Design the change
Code the change
Deploy the change
Test the change/functional test
Solicit feedback on the change/acceptance test (from you?)

There is only one property, "IsBodyHtml", on the MailMessage class in .NET to switch between HTML and text message types.
So you can check for yourself how big the job is, apart from removing the HTML tags and publishing the updated DLL on the server.

The mechanics of switching the code itself is as simple as you say above, replacing the HTML body string with the new string and changing the IsBodyHtml property. (Assuming the code uses the built in .NET Framework mailing components).
Remember though, that text based emails will remove all formatting, so you won't be able to have font colours, images, hyperlinks or anything else in the content except as plain text.
If you really want to cut the estimate down, get someone internal to edit the text and all the developer will have to do is switch 2 lines of code and then test/deploy.
I can't comment on the time required to test/deploy as that's entirely dependent on your system.
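If the body is generated dynamically, the tag-to-line-feed step discussed in these answers could be sketched roughly as follows. This is an assumption-laden illustration (the class name `HtmlToTextSketch` and the list of block-level tags are mine), and a regex pass like this only suits simple, well-formed markup such as a short e-mail body.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToTextSketch
{
    public static string ToPlainText(string html)
    {
        // <br> and the end of common block-level elements become line breaks.
        string text = Regex.Replace(html,
            @"<br\s*/?>|</(?:p|div|h[1-6]|li|tr)>", "\r\n", RegexOptions.IgnoreCase);

        // Any remaining tag is treated as inline styling and dropped.
        text = Regex.Replace(text, @"<[^>]+>", "");

        // Turn entities such as &amp; back into characters.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

Deciding which tags count as "block" is exactly the judgment call mentioned above, so the list in the first pattern would need adjusting to the real template.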

Rules/guidelines for documenting C# code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am a relatively new developer and have been assigned the task of documenting code written by an advanced C# developer. My boss told me to look through it, and to document it so that it would be easier to modify and update as needed.
My question is: Is there a standard type of Documentation/Comment structure I should follow? My boss made it sound like everyone knew exactly how to document the code to a certain standard so that anyone could understand it.
I am also curious if anyone has a good method for figuring out unfamiliar code or function uncertainty. Any help would be greatly appreciated.

The standard seems to be XML Doc (MSDN Technet article here).
You can use /// at the beginning of each line of documentation comments. There are standard XML style elements for documenting your code; each should follow the standard <element>Content</element> usage. Here are some of the elements:
<c> Used to differentiate code font from normal text
<c>class Foo</c>
<code>
<example>
<exception>
<para> Used to control formatting of documentation output.
<para>The <c>Foo</c> class...</para>
<param>
<paramref> Used to refer to a previously described <param>
If <paramref name="myFoo" /> is <c>null</c> the method will...
<remarks>
<returns>
<see> Creates a cross-ref to another topic.
The <see cref="System.String" /> <paramref name="someString"/> represents...
<summary> A description (summary) of the code you're documenting.

Sounds like you really did end up getting the short straw.
Unfortunately I think you've stumbled on one of the more controversial subjects of software development in general. Comments can be seen as extremely helpful where necessary, and unnecessary cruft when used wrongly. You'll have to be careful and decide quite carefully what goes where.
As far as commenting practice, it's usually down to the corporation or the developer. A few common rules I like to use are:
Comment logic that isn't clear (and consider a refactor)
Only Xml-Doc methods / properties that could be questioned (or, if you need to give a more detailed overview)
If your comments exceed the length of the containing method / class, you might want to think about comment verbosity, or even consider a refactor.
Try and imagine a new developer coming across this code. What questions would they ask?
It sounds like your boss is referring to commenting logic (most probably so that you can start understanding it) and using xml-doc comments.
If you haven't used xml-doc comments before, check out this link which should give you a little guidance on use and where appropriate.
If your workload is looking a little heavy (i.e. lots of code to comment), I have some good news for you - there's an excellent plugin for Visual Studio that may help you out with xml-doc comments. GhostDoc can make xml-doc commenting of methods / classes etc. much easier (but remember to change the default placeholder text it inserts in there!)
Remember, you may want to check with your boss on just what parts of the code he wants documented before you go on a ghostdoc spree.

It's a bit of a worry that the original programmer didn't bother to do one of the most important parts of his job. However, there are lots of terrible "good" programmers out there, so this isn't really all that unusual.
However, getting you to document the code is also a pretty good training mechanism - you have to read and understand the code before you can write down what it does, and as well as gaining knowledge of the systems, you will undoubtedly pick up a few tips and tricks from the good (and bad!) things the other programmer has done.
To help get your documentation done quickly and consistently, you might like to try my add-in for Visual Studio, AtomineerUtils Pro Documentation. This will help with the boring grunt work of creating and updating the comments, make sure the comments are fully formed and in sync with the code, and let you concentrate on the code itself.
As to working out how the code works...
Hopefully the class, method, parameter and variable names will be descriptive. This should give you a pretty good starting point. You can then take one method or class at a time and determine if you believe that the code within it delivers what you think the naming implies. If there are unit tests then these will give a good indication of what the programmer expected the code to do (or handle). Regardless, try to write some (more) unit tests for the code, because thinking of special cases that might break the code, and working out why the code fails some of your tests, will give you a good understanding of what it does and how it does it. Then you can augment the basic documentation you've written with the more useful details (can this parameter be null? what range of values is legal? What will the return value be if you pass in a blank string? etc)
This can be daunting, but if you start with the little classes and methods first (e.g. that Name property that just returns a name string) you will gain familiarity with the surrounding code and be able to gradually work your way up to the more complex classes and methods.
Once you have basic code documentation written for the classes, you should then be in a position to write external overview documentation that describes how the system as a whole functions. And then you'll be ready to work on that part of the codebase because you'll understand how it all fits together.
I'd recommend using XML documentation (see the other answers) as this is immediately picked up by Visual Studio and used for intellisense help. Then anyone writing code that calls your classes will get help in tooltips as they type the code. This is such a major bonus when working with a team or a large codebase, but many companies/programmers just don't realise what they've been missing, banging their (undocumented) rocks together in the dark ages :-)
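As a small illustration of the XML documentation comments recommended in these answers, here is a documented method. `NameFormatter` and its parameters are hypothetical and exist only to show the common elements in place.

```csharp
using System;

/// <summary>
/// Formats display names. <c>NameFormatter</c> is a hypothetical example class.
/// </summary>
public static class NameFormatter
{
    /// <summary>Builds a display name such as <c>Doe, Jane</c>.</summary>
    /// <param name="firstName">Given name; must not be <c>null</c>.</param>
    /// <param name="lastName">Family name; must not be <c>null</c>.</param>
    /// <returns>The name in "last, first" order.</returns>
    /// <exception cref="ArgumentNullException">
    /// Thrown if <paramref name="firstName"/> or <paramref name="lastName"/> is <c>null</c>.
    /// </exception>
    /// <remarks>Uses <see cref="string.Format(string, object, object)"/> internally.</remarks>
    /// <example>
    /// <code>
    /// string s = NameFormatter.Format("Jane", "Doe"); // "Doe, Jane"
    /// </code>
    /// </example>
    public static string Format(string firstName, string lastName)
    {
        if (firstName == null) throw new ArgumentNullException(nameof(firstName));
        if (lastName == null) throw new ArgumentNullException(nameof(lastName));
        return string.Format("{0}, {1}", lastName, firstName);
    }
}
```

The summary, parameter and return text is what shows up in the IntelliSense tooltips mentioned above.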

I suspect your boss is referring to the following XML Documentation Comments.
XML Documentation Comments (C# Programming Guide)

It might be worth asking your boss if he has any examples of code that is already documented so you can see first-hand what he is after.
Mark Needham has written a few blog posts about reading/documenting code (see Archive for the ‘Reading Code’ Category).
I remember reading Reading Code: Rhino Mocks some time ago that talks about diagramming the code to help keep track of where you are and to 'map out' what's going on.
Hope that helps - good luck!

Looking for a tool to quickly test C# format strings [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I am constantly forgetting what the special little codes are for formatting .NET strings. Either through ToString() or using String.Format(). Alignment, padding, month vs. minute (month is uppercase M?), abbreviation vs. full word, etc. I can never remember.
I have the same problem with regexes, but luckily there's Expresso to help me out. It's awesome.
Is there a tool like Expresso for experimenting with formatted strings on standard types like DateTime and float and so on?

PowerShell works great for testing format strings. From PowerShell you can load your assembly and work with the objects and methods you want to test. You could also just create a string on the command line and test out different formatting options.
You can use the static method from the string class:
$teststring = 'Currency - {0:c}. And a date - {1:ddd d MMM}. And a plain string - {2}'
[string]::Format($teststring, 160.45, (Get-Date), 'Test String')
Or PowerShell has a built in format operator
$teststring = 'Currency - {0:c}. And a date - {1:ddd d MMM}. And a plain string - {2}'
$teststring -f 160.45, (Get-Date), 'Test String'

I just found this:
http://rextester.com/
Simply paste in your format string, and run the code.
It would also be simple enough to create a windows or console project that does exactly that.
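The core of such a do-it-yourself project can be very small. The sketch below is one possible shape, with `FormatProbe` being a made-up name; it pins the culture to the invariant culture so the same format string gives the same result on every machine, which is an assumption you may want to change.

```csharp
using System;
using System.Globalization;

public static class FormatProbe
{
    // Applies one format string to a value using a fixed culture,
    // so results are repeatable regardless of regional settings.
    public static string Apply(IFormattable value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }
}
```

For example, `FormatProbe.Apply(new DateTime(2012, 1, 6, 14, 5, 0), "yyyy-MM-dd HH:mm")` gives `2012-01-06 14:05` (uppercase `MM` is month, lowercase `mm` is minute), and `FormatProbe.Apply(1234.5, "N2")` gives `1,234.50`.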

Snippet Compiler is a great tool in general for quick small app testing. Instead of cluttering your Visual Studio with a million ConsoleApplication79 projects, just use this. I have it and use it constantly.

LinqPad is a great tool that handles this sort of thing brilliantly, even though it's tangential to its primary function (of troubleshooting Linq syntax).
Just enter the expression with the language selector set to "C# Expression" (or "VB.net Expression") and the database set to "None." For example:
String.Format("{0:d}-{1:d}", new DateTime(2012, 1, 6), null)
When you press Run, you'll get the result:
1/6/2012-

http://www.sellsbrothers.com/tools/#FormatDesigner

You could use the Snippy plugin for Reflector to run little code snippets.
Looks like the link is dead - just use LinqPad!

Just another simple utility, available on MSDN: http://go.microsoft.com/fwlink/?LinkId=209564, description is:
an application that enables you to apply format strings to either numeric or date and time values and displays the result string.
But you need to compile it by yourself.

How do I print a check? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I need to write a .NET library for printing checks. Nothing fancy: you pass in the data, out comes the printed check. What's the best way to do this?
Constraints: The format of the check.

A lot of people are using report generators for this. It's a bit overkill, but crystal reports will certainly do the job.
Other than that, this is a basic question about formatting printed output. Is that your intention?
Check out the printdocument class and you can do this yourself:
http://msdn.microsoft.com/en-us/magazine/cc188767.aspx
If you're printing checks remotely (ie, you need to provide a check on the website that the user can print out) then using PDF is the easiest and most certain way to accomplish that, but be careful of the security implications.
-Adam

Wow... that takes me back! In the old days printers were dot matrix and cheques were a continuous feed. I suppose nowadays cheques are preprinted single sheets and are printed with lasers/inkjets. Back then we'd just write plain ASCII to the printer and send printer-specific control/escape sequences for any specific formatting needs (picking the font size, line spacing, and page sizes).
Now I would likely try generating a PDF and then submitting that file for printing. It ought to be possible to do this with a plain text file too... though that's getting pretty close to old school. The report generator suggestion by Adam is a pretty good idea too.
Generally with cheque printing it is a lot of trial and error to get the formatting right. Printing on plain paper and holding it and a preprinted cheque up to the window is an easy way to check positioning without burning through tons of cheques.
One thing to note though is whether or not there is a requirement to track the control numbers preprinted on the cheques (aka cheque number). Auditors sometimes require this and it is also a reasonable guard against fraud (accounting for every preprinted cheque is not a terrible idea). To do this you need to handle reprinting, and marking individual cheques/cheque runs as "spoiled". You also need a manual process to collect and store spoiled cheques (for the auditors). On the whole it's a giant pain to get this right and can take more time than you might imagine.

Unless you're really ambitious, you order pre-printed checks and look at the check template. Fill in the blanks and there you are.

Since the format would be fairly fixed, I bet you could create a Word doc that holds the format and then programmatically insert the correct information and print it.
EDIT
Wow, pretty anti-MS, eh? You can use the full power of Word to visually set the format for the cheque, and there are libraries to modify Word docs in .NET, so I don't see why this isn't a slick solution.

What's the best way of parsing strings? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
We've got a scenario that requires us to parse lots of e-mail (plain text), each e-mail 'type' is the result of a script being run against various platforms. Some are tab delimited, some are space delimited, some we simply don't know yet.
We'll need to support more 'formats' in the future too.
Do we go for a solution using:
Regex
Simply string searching (using string.IndexOf etc)
Lex/ Yacc
Other
The overall solution will be developed in C# 2.0 (hopefully 3.5)

Regex.
Regex can solve almost everything except for world peace. Well maybe world peace too.

The three solutions you stated each cover very different needs.
Manual parsing (simple text search) is the most flexible and the most adaptable, however, it very quickly becomes a real pain in the ass as the parsing required is more complicated.
Regexes are a middle ground, and probably your best bet here. They are powerful, yet flexible, as you can add more logic yourself in the code that calls the different regexes. The main drawback here would be speed.
Lex/Yacc is really only adapted to very complicated, predictable syntaxes and lacks a lot of post compile flexibility. You can't easily change parser in mid parsing, well actually you can but it's just too heavy and you'd be better using regex instead.
I know this is a cliché answer, it all really comes down to what your exact needs are, but from what you said, I would personally probably go with a bag of regex.
As an alternative, as Vaibhav pointed out, if several different situations can arise and you can easily detect which one is coming, you could make a plugin system that chooses the right algorithm. Those algorithms could all be very different, one using Lex/Yacc for the specialized cases and another using IndexOf and regex for simpler cases.

You probably should have a pluggable system regardless of which type of string parsing you use. So, this system calls upon the right 'plugin' depending on the type of email to parse it.

You must architect your solution to be updatable, so that you can handle unknown situations when they crop up. Create an interface for parsers that contains not only methods for parsing the emails and returning results in a standard format, but also for examining the email to determine if the parser will execute.
Within your configuration, identify the type of parser you wish to use, set its configuration options, and the configuration for the identifiers which determine if a parser will act or not. Name the parsers by assembly qualified name so that the types can be instantiated at runtime even if there aren't static links to their assemblies.
Identifiers can implement an interface as well, so you can create different types that check for different things. For instance, you might create a regex identifier, which parses the email for a specific pattern. Make sure to make as much information available to the identifier, so that it can make decisions on things like from addresses as well as the content of the email.
When your known parsers can't handle a job, create a new DLL with types that implement the parser and identifier interfaces that can handle the job and drop them in your bin directory.
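A rough sketch of the design described in this answer is shown below. All names (`IEmailParser`, `TabDelimitedParser`, `ParserRegistry`) are invented for illustration, the identifier and the parser are folded into one interface for brevity, and the syntax is newer than C# 2.0, so treat it as an outline rather than a drop-in implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract: each parser both recognizes and parses one e-mail format.
public interface IEmailParser
{
    bool CanParse(string email);
    IDictionary<string, string> Parse(string email);
}

// Example plugin: treats each line as "name<TAB>value".
public sealed class TabDelimitedParser : IEmailParser
{
    public bool CanParse(string email) => email.Contains("\t");

    public IDictionary<string, string> Parse(string email)
    {
        var fields = new Dictionary<string, string>();
        foreach (string line in email.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = line.TrimEnd('\r').Split('\t');
            if (parts.Length >= 2) fields[parts[0]] = parts[1];
        }
        return fields;
    }
}

public static class ParserRegistry
{
    // Instantiates a parser from its assembly-qualified type name (e.g. read from config).
    public static IEmailParser Create(string assemblyQualifiedName)
    {
        Type type = Type.GetType(assemblyQualifiedName, true);
        return (IEmailParser)Activator.CreateInstance(type);
    }

    // Returns the first parser that claims the e-mail, or null if none does.
    public static IEmailParser Find(IEnumerable<IEmailParser> parsers, string email)
    {
        return parsers.FirstOrDefault(p => p.CanParse(email));
    }
}
```

Supporting a new format then means adding another `IEmailParser` implementation in a new DLL and listing its type name in the configuration.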

It depends on what you're parsing. For anything beyond what Regex can handle, I've been using ANTLR. Before you jump into recursive descent parsing for the first time, I would research how they work, before attempting to use a framework like this one. If you subscribe to MSDN Magazine, check the Feb 2008 issue where they have an article on writing one from scratch.
Once you get the understanding, learning ANTLR will be a ton easier. There are other frameworks out there, but ANTLR seems to have the most community support and public documentation. The author has also published The Definitive ANTLR Reference: Building Domain-Specific Languages.

Regex would probably be your best bet: tried and proven. Plus, a regular expression can be compiled.

Your best bet is RegEx because it provides a much greater degree of flexibility than any of the other options.
While you could use IndexOf to handle some things, you may quickly find yourself writing code that looks like:
if(s.IndexOf("search1")>-1 || s.IndexOf("search2")>-1 ||...
That can be handled in one RegEx statement. Plus, there are a lot of places like RegExLib.com where you can find folks who have shared regular expressions to solve problems.
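As a sketch of what that single statement might look like, the chained IndexOf checks collapse into one alternation. The class name `KeywordMatcher` and the third search term are placeholders; `RegexOptions.Compiled` is optional and only pays off when the pattern is reused many times.

```csharp
using System.Text.RegularExpressions;

public static class KeywordMatcher
{
    // One alternation replaces the chain of IndexOf calls.
    private static readonly Regex Pattern =
        new Regex("search1|search2|search3", RegexOptions.Compiled);

    public static bool ContainsAny(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

If the search terms come from user input or configuration, run each through `Regex.Escape` before joining them with `|`.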

@Coincoin has covered the bases; I just want to add that with regex it's particularly easy to end up with hard-to-read, hard-to-maintain code. Regex is a powerful and very compact language, so that's how it often goes.
Using whitespace and comments within the regex can go a long way to make it easier to maintain regexes. Eric Gunnerson turned me on to this idea. Here's an example.
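The example originally linked here is not reproduced, so the following is my own sketch of the idea rather than Eric Gunnerson's code. It relies on `RegexOptions.IgnorePatternWhitespace`, which makes .NET ignore unescaped whitespace in the pattern and treat `#` as the start of a comment; the `name = value` line format and the `KeyValueLine` class are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;

public static class KeyValueLine
{
    private static readonly Regex Line = new Regex(@"
        ^\s*
        (?<name>[A-Za-z]\w*)    # field name
        \s*[:=]\s*              # separator: colon or equals sign
        (?<value>.+?)           # field value, as short as possible
        \s*$                    # ignore trailing whitespace
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns true and fills name/value when the line looks like 'name = value'.
    public static bool TryParse(string line, out string name, out string value)
    {
        Match m = Line.Match(line);
        name = m.Success ? m.Groups["name"].Value : null;
        value = m.Success ? m.Groups["value"].Value : null;
        return m.Success;
    }
}
```

The same pattern written on one line would be `^\s*(?<name>[A-Za-z]\w*)\s*[:=]\s*(?<value>.+?)\s*$`, which is a lot harder to revisit six months later.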

Use PCRE. All other answers are just 2nd Best.

With as little information as you provided, I would choose Regex.
But what kind of information you want to parse and what you want to do with it may change the decision, maybe to Lex/Yacc.
But it looks like you've already made your mind up with String search :)
