I am trying to convert RTF to plain text in a C# program. I figured out how to do it, but it isn't very clean. It uses RichTextBox, which I'm not a huge fan of:
using (System.Windows.Forms.RichTextBox rtfBox = new System.Windows.Forms.RichTextBox())
{
    rtfBox.Rtf = cTrans.NoteDescription;
    tItem.ProcedureShortDescription = rtfBox.Text;
}
I was wondering if there is a better way to accomplish this. Perhaps using RichEditDocumentServer? I could not find much info on it, though, and was wondering if I could get some help with it. My thought was:
var documentServer = new RichEditDocumentServer();
documentServer.Document.RtfText = cTrans.NoteDescription;
tItem.ProcedureShortDescription = documentServer.Document.Text;
I did some more digging and this works. I figured I'd post it here since I couldn't see it answered anywhere on the site; I'm not sure if that is proper protocol.
I ended up putting it in a helper class so it can be called if needed again:
using DevExpress.XtraRichEdit;

namespace ABELSoft.Dental.Interface.Helper
{
    public class RtfToText
    {
        public static string convert(string rtfText)
        {
            // RichEditDocumentServer is IDisposable, so dispose it when done.
            using (var documentServer = new RichEditDocumentServer())
            {
                // Load the RTF, then read back the plain-text representation.
                documentServer.Document.RtfText = rtfText;
                return documentServer.Document.Text;
            }
        }
    }
}
This is how I called it:
tItem.ProcedureShortDescription = RtfToText.convert(cTrans.NoteDescription);
I'm trying to make a C# program that gets a line from a website and uses it.
Unfortunately, I don't know the full line on the site. I only know it starts with "steam://joinlobby/730/", and what comes after "/730/" is always different.
So I need help getting the full line that comes after it.
What I've got:
using System.Net;
//...
public void Main()
{
    WebClient web = new WebClient();
    // Here is the site that I want to download and read text from.
    string result = web.DownloadString("http://steamcommunity.com/id/peppahtank");
    if (result.Contains("steam://joinlobby/730/"))
    {
        // Get the part after /730/
    }
}
I can tell you that it always ends with "xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxx"
so: steam://joinlobby/730/xxxxxxxxx/xxxxxxxx.
What's to prevent you from just splitting the string on "/730/"?
result.Split(new[] { "/730/" }, StringSplitOptions.None)[1]
https://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx
The easiest method for this particular case would be to take the known prefix and then just skip that many characters:
const string Prefix = @"steam://joinlobby/730/";
//...
if (result.StartsWith(Prefix))
{
    var otherPart = result.Substring(Prefix.Length);
    // TODO: Process other part
}
Make sure your result is not null and begins with steam://joinlobby/730/:
if (!string.IsNullOrWhiteSpace(result) && result.StartsWith("steam://joinlobby/730/"))
{
    string rest = result.Substring("steam://joinlobby/730/".Length);
}
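If the link can appear anywhere in the page rather than at the start of the string, a regular expression may be more robust. Here is a minimal sketch; the \w+ character classes are an assumption based on the "xxxx/xxxx" shape described in the question:
using System;
using System.Net;
using System.Text.RegularExpressions;

class LobbyLinkFinder
{
    static void Main()
    {
        var web = new WebClient();
        string result = web.DownloadString("http://steamcommunity.com/id/peppahtank");

        // Match "steam://joinlobby/730/" followed by two slash-separated parts.
        var match = Regex.Match(result, @"steam://joinlobby/730/(\w+)/(\w+)");
        if (match.Success)
        {
            string lobbyPart = match.Groups[1].Value; // first xxxx block
            string ownerPart = match.Groups[2].Value; // second xxxx block
            Console.WriteLine(lobbyPart + "/" + ownerPart);
        }
    }
}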
I'm using TuesPechkin (the C# wrapper of wkhtmltopdf) and have it generating PDF files from HTML.
However, I would like to set the --disable-smart-shrinking option, which is listed in the wkhtmltopdf documentation as a PageOption.
How can I do that?
public sealed class PdfConverter
{
    static readonly PdfConverter instance = new PdfConverter();
    private IConverter converter;

    static PdfConverter()
    {
    }

    PdfConverter()
    {
        // Keep the converter somewhere static, or as a singleton instance!
        // Do NOT run this code more than once in the application lifecycle!
        this.converter = new ThreadSafeConverter(
            new RemotingToolset<PdfToolset>(
                new Win32EmbeddedDeployment(
                    new TempFolderDeployment())));
    }

    public static PdfConverter Instance
    {
        get { return instance; }
    }

    public byte[] ConvertHtmlToPdf(string html)
    {
        var document = new HtmlToPdfDocument
        {
            Objects = { new ObjectSettings { HtmlText = html } }
            // Where are PageOptions? That's where --disable-smart-shrinking is
        };
        return converter.Convert(document);
    }
}
The --disable-smart-shrinking option does not exist in the API -- well, it kind of does, but in the form of its opposite sibling: --enable-smart-shrinking.
That property is available in the TuesPechkin API as WebSettings.EnableIntelligentShrinking as seen in the TuesPechkin source code. It was named that way in TuesPechkin because that is how it is named in wkhtmltopdf's API as seen in the wkhtmltopdf source code.
You can also see there that the default value is true (from wkhtmltopdf), so if you set WebSettings.EnableIntelligentShrinking to false you should get the result you're aiming for.
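For example, a minimal sketch based on the snippet above (property and type names as they appear in the TuesPechkin source; check your version):
var document = new HtmlToPdfDocument
{
    Objects =
    {
        new ObjectSettings
        {
            HtmlText = html,
            // wkhtmltopdf defaults to smart shrinking being enabled, so
            // turning it off is the equivalent of --disable-smart-shrinking.
            WebSettings = new WebSettings
            {
                EnableIntelligentShrinking = false
            }
        }
    }
};
return converter.Convert(document);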
It seems this functionality hasn't been implemented in TuesPechkin. I can't find it here, where most of the page options are located.
I guess the author forgot to implement the option, so it's probably best to request the feature here. Or you can also add the feature yourself. :)
I tried to get the Tag value by using:
var vSAD = sel.VirtualSourceAxisDistance.Data;
I also tried var vSAD = dcm.FindAll("300A030A");, and it only returned one number (it is supposed to have 2).
Then I tried to read the elements and save them to another DICOM file, and found that for the VR=FL, VM=2 case only one number showed up in the new file.
How can I fix this to get both numbers?
Does it mean that when I use var dcm = DICOMFileReader.Read(openFileDialog1.FileName); it already returns only one number?
I saw in the FloatingPointSingle.cs file:
public class FloatingPointSingle : AbstractElement<float?>
{
    public FloatingPointSingle() { }

    public FloatingPointSingle(Tag tag, float? data)
    {
        Tag = tag;
        Data = data;
        VR = Enums.VR.FloatingPointSingle;
    }
}
I didn't realize the FL VM could be more than one. I just looked at the DICOM specification, though, and realized that it is possible. It is actually an easy fix. Could you post a link to a sample (anonymized) DICOM file that contains such a value, and I will patch the core framework.
FYI: To patch yourself, you would need to change the FloatingPointSingle to:
public class FloatingPointSingle : AbstractElement<float[]>
{
    public FloatingPointSingle() { }

    public FloatingPointSingle(Tag tag, float[] data)
    {
        Tag = tag;
        Data = data;
        VR = Enums.VR.FloatingPointSingle;
    }
}
Then, in the LittleEndianReader.ReadSinglePrecision() and BigEndianReader.ReadSinglePrecision() methods, you will need to change the logic to read concatenated floating-point numbers (there is no delimiter; the values are packed back to back, 4 bytes each).
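A minimal sketch of what the little-endian reader change could look like; the method signature is hypothetical, since the actual reader classes in the framework may be shaped differently:
// Reads length/4 consecutive little-endian IEEE 754 singles (VR = FL, VM > 1).
public static float[] ReadSinglePrecision(BinaryReader reader, int length)
{
    var values = new float[length / 4];
    for (int i = 0; i < values.Length; i++)
    {
        values[i] = reader.ReadSingle(); // 4 bytes per value, no delimiter
    }
    return values;
}
The big-endian variant would additionally need to reverse the 4 bytes of each value before converting.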
I'm trying to use Word to automatically correct some text that is not in English. The problem is that when I use the SpellCheck function, the "Spelling and Grammar" dialog box pops up and waits for user input, and I want the text to be corrected automatically. So my question is: how do I solve this?
using System.Collections.Generic;
using Microsoft.Office.Interop.Word;
using Word = Microsoft.Office.Interop.Word;
using TobyCL.ro.toby.StringOperations;

namespace namespace.ro.toby
{
    class WordProofing : IProof
    {
        private readonly Word.Application _wordApp;
        private readonly Word.Document _wordDoc;
        private static object _oEndOfDoc = "\\endofdoc";

        public WordProofing()
        {
            _wordApp = new Word.Application { Visible = false };
            _wordDoc = _wordApp.Documents.Add();
        }

        public void Close()
        {
            object obj = Word.WdSaveOptions.wdDoNotSaveChanges;
            _wordDoc.Close(ref obj);
            _wordApp.Quit(ref obj);
        }

        #region Implementation of IProof

        public string Proof(string proofText)
        {
            Range wRng = _wordDoc.Bookmarks.get_Item(ref _oEndOfDoc).Range;
            wRng.Text = proofText;
            _wordDoc.CheckSpelling(IgnoreUppercase: true, AlwaysSuggest: false);
            string str = wRng.Text;
            wRng.Text = "";
            return str;
        }

        #endregion
    }
}
I wrote this code a few days ago and it worked. The problem is that I uninstalled the proofing tools to run some tests, and now I keep getting that dialog, so I'm thinking that maybe I have to set some Word setting, or that I've changed something in my code without knowing. Any help would be greatly appreciated.
I am using Microsoft Office Word 2010
For whoever might be interested, this is the way I managed to solve it, but it really takes a lot of time, so any improvements or new ideas are welcome.
using Microsoft.Office.Interop.Word;

class WordProofing
{
    private Application _wordApp;
    private readonly Document _wordDoc;
    private static object _oEndOfDoc = "\\endofdoc";

    public WordProofing()
    {
        _wordApp = new Application { Visible = false };
        _wordDoc = _wordApp.Documents.Add();
    }

    public void Close()
    {
        _wordDoc.Close(WdSaveOptions.wdDoNotSaveChanges);
        _wordApp.Quit();
    }

    public string Proof(string proofText)
    {
        Range wRng = _wordDoc.Bookmarks.get_Item(ref _oEndOfDoc).Range;
        wRng.Text = proofText;

        // Replace each spelling error with Word's first suggestion.
        ProofreadingErrors spellingErrors = wRng.SpellingErrors;
        foreach (Range spellingError in spellingErrors)
        {
            SpellingSuggestions spellingSuggestions =
                _wordApp.GetSpellingSuggestions(spellingError.Text, IgnoreUppercase: true);
            foreach (SpellingSuggestion spellingSuggestion in spellingSuggestions)
            {
                spellingError.Text = spellingSuggestion.Name;
                break; // take only the first suggestion
            }
        }

        string str = wRng.Text;
        wRng.Text = "";
        return str;
    }
}
Which MS Word version are you using?
By default the spell checker will show you the dialog box. To disable the dialog box, there are two ways that I know of.
1) Using code, automatically add the correction as an AutoCorrect entry. In VBA it is something like this:
AutoCorrect.Entries.Add Name:="AdSAD", Value:="Assad"
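The C# interop equivalent would look something like this (a sketch; AutoCorrectEntries.Add takes the misspelled word and its replacement):
// Add an AutoCorrect entry: "AdSAD" will be replaced with "Assad" as you type.
_wordApp.AutoCorrect.Entries.Add("AdSAD", "Assad");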
2) Or use the menu option. Please refer to this link.
Topic: Automatically correct spelling with words from the main dictionary
Link: http://office.microsoft.com/en-us/word-help/automatically-correct-spelling-with-words-from-the-main-dictionary-HA010174790.aspx
Do let me know if this is not what you want.
I need to create an HTML parser that, given a blog URL, returns a list of all the posts on the page.
I.e. if a page has 10 posts, it should return a list of 10 divs, where each div contains an h1 and a p.
I can't use the blog's RSS feed, because I need to know exactly how the page looks to the user, whether it has any ads, images, etc. Also, some blogs have just a summary of the content while the feed has it all, and vice versa.
Anyway, I've made a parser that downloads a blog's feed and searches the HTML for similar content. It works very well for some blogs, but not for others.
I don't think I can make a parser that works for 100% of the blogs it parses, but I want to make it as good as possible.
What would be the best approach? Look for tags whose id attribute equals "post" or "content"? Look for p tags?
Thanks in advance for any help!
I don't think you will be successful at that. You might be able to parse one blog, but if the blog engine changes anything, it won't work any more. I also don't think you'll be able to write a generic parser. You might even be partially successful, but it's going to be a short-lived success, because everything is so error prone in this context. If you need content, you should go with RSS. If you need to store (simply store) how it looks, you can also do that. But parsing by the way it looks? I don't see concrete success in that.
"Best possible" turns out to be "best reasonable," and you get to define what is reasonable. You can get a very large number of blogs by looking at how common blogging tools (WordPress, LiveJournal, etc.) generate their pages, and code specially for each one.
The general case turns out to be a very hard problem because every blogging tool has its own format. You might be able to infer things using "standard" identifiers like "post", "content", etc., but it's doubtful.
You'll also have difficulty with ads. A lot of ads are generated with JavaScript. So downloading the page will give you just the JavaScript code rather than the HTML that gets generated. If you really want to identify the ads, you'll have to identify the JavaScript code that generates them. Or, your program will have to execute the JavaScript to create the final DOM. And then you're faced with a problem similar to that above: figuring out if some particular bit of HTML is an ad.
There are heuristic methods that are somewhat successful. Check out Identifying a Page's Primary Content for answers to a similar question.
Use the HTML Agility Pack. It is an HTML parser made for this.
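For example, a minimal sketch; the div selector with a "post" class is an assumption for illustration, since every blog engine uses its own markup:
using System;
using HtmlAgilityPack;

class PostExtractor
{
    static void Main()
    {
        // Load and parse the page directly from the URL.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://example-blog.com");

        // XPath for divs whose class contains "post"; adjust per blog engine.
        var posts = doc.DocumentNode.SelectNodes("//div[contains(@class, 'post')]");
        if (posts == null)
            return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode post in posts)
        {
            HtmlNode title = post.SelectSingleNode(".//h1");
            HtmlNode body = post.SelectSingleNode(".//p");
            Console.WriteLine(title != null ? title.InnerText : "(no title)");
            Console.WriteLine(body != null ? body.InnerText : "(no excerpt)");
        }
    }
}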
I just did something like this for our company's blog, which uses WordPress. This works well for us because our WordPress blog hasn't changed in years, but the others are right that if your HTML changes a lot, parsing becomes a cumbersome solution.
Here is what I recommend:
Using NuGet, install RestSharp and HtmlAgilityPack. Then download Fizzler and include those references in your project (http://code.google.com/p/fizzler/downloads/list).
Here is some sample code I used to implement the blog's search on my site.
using System;
using System.Collections.Generic;
using Fizzler.Systems.HtmlAgilityPack;
using RestSharp;
using RestSharp.Contrib;

namespace BlogSearch
{
    public class BlogSearcher
    {
        const string Site = "http://yourblog.com";

        public static List<SearchResult> Get(string searchTerms, int count = 10)
        {
            var searchResults = new List<SearchResult>();
            var client = new RestSharp.RestClient(Site);

            // Note: 10 is the page size for the search results.
            var pages = (int)Math.Ceiling((double)count / 10);
            for (int page = 1; page <= pages; page++)
            {
                var request = new RestSharp.RestRequest
                {
                    Method = Method.GET,
                    // The part after .com/
                    Resource = "page/" + page
                };
                // Your search params here
                request.AddParameter("s", HttpUtility.UrlEncode(searchTerms));
                var res = client.Execute(request);
                searchResults.AddRange(ParseHtml(res.Content));
            }
            return searchResults;
        }

        public static List<SearchResult> ParseHtml(string html)
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(html);

            // Fizzler adds CSS-selector support on top of HtmlAgilityPack.
            var results = doc.DocumentNode.QuerySelectorAll("#content-main > div");
            var searchResults = new List<SearchResult>();
            foreach (var node in results)
            {
                bool add = false;
                var sr = new SearchResult();

                var a = node.QuerySelector(".posttitle > h2 > a");
                if (a != null)
                {
                    add = true;
                    sr.Title = a.InnerText;
                    sr.Link = a.Attributes["href"].Value;
                }

                var p = node.QuerySelector(".entry > p");
                if (p != null)
                {
                    add = true;
                    sr.Excerpt = p.InnerText;
                }

                if (add)
                    searchResults.Add(sr);
            }
            return searchResults;
        }
    }

    public class SearchResult
    {
        public string Title { get; set; }
        public string Link { get; set; }
        public string Excerpt { get; set; }
    }
}
Good luck,
Eric