How to search the text in WebBrowser Control - c#

Hi, we have a requirement to preview documents, similar to the
Windows 7/Vista "Preview Pane", and to implement search
functionality in WinForms. For that, I have loaded MS Office
documents (Word documents, Excel sheets and PowerPoint documents)
into a WebBrowser control in HTML format. Now we need to implement
search in that WebBrowser control. We have implemented it using the
following code:
public bool FindNext(string text, WebBrowser webBrowser1)
{
    IHTMLDocument2 doc = webBrowser1.Document.DomDocument as IHTMLDocument2;
    IHTMLSelectionObject sel = doc.selection;
    IHTMLTxtRange rng = sel.createRange() as IHTMLTxtRange;
    rng.collapse(false); // collapse the current selection so we start from the end of the previous range
    if (rng.findText(text, 1000000, 0))
    {
        rng.select();
        return true;
    }
    // No further match after the current selection: call FindFirst (not shown here).
    FindFirst(text, webBrowser1);
    return false;
}
This code works fine for Word documents and finds all occurrences of
the search string. For Excel and PowerPoint documents, however, it
does not work properly: it finds only the first occurrence and misses
the others. While debugging I found that, for Word documents, the
"IHTMLDocument2" object holds the HTML content in both "innerHTML"
and "innerText". For Excel, the text is present only in "innerHTML":
the document uses frames, the sheets are referenced as local temporary
.html files, and "innerText" is null.
Please suggest a way to search the text in a WebBrowser control that displays HTML converted from
Excel or PowerPoint documents.
If you have any questions, please feel free to ask.
Thank you.
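Since the question notes that the Excel output is a frameset whose sheets live in separate .html files, one direction worth trying is to search each frame's own document rather than only the top-level one. The following is a minimal sketch under that assumption, not a confirmed fix. FrameSearch and FindInFrames are hypothetical names, and access to a frame's document may be refused depending on the security zone of the temporary files.

```csharp
using System;
using System.Windows.Forms;
using mshtml; // COM reference: Microsoft HTML Object Library

public static class FrameSearch
{
    // Hypothetical helper: selects the first match of 'text' in the given
    // window's document or, failing that, in any nested frame.
    public static bool FindInFrames(string text, HtmlWindow window)
    {
        if (window == null) return false;
        try
        {
            HtmlDocument managedDoc = window.Document;
            IHTMLDocument2 doc = managedDoc != null
                ? managedDoc.DomDocument as IHTMLDocument2 : null;
            // A frameset page has no <body>, so this cast yields null there.
            IHTMLBodyElement body = doc != null ? doc.body as IHTMLBodyElement : null;
            if (body != null)
            {
                IHTMLTxtRange range = body.createTextRange();
                if (range != null && range.findText(text, 1000000, 0))
                {
                    range.select();
                    return true;
                }
            }
            HtmlWindowCollection frames = window.Frames;
            if (frames != null)
            {
                for (int i = 0; i < frames.Count; i++)
                {
                    if (FindInFrames(text, frames[i])) return true;
                }
            }
        }
        catch (UnauthorizedAccessException)
        {
            // The frame's document is not accessible from this context.
        }
        return false;
    }
}
```

It would be called once the page has finished loading, for example as FrameSearch.FindInFrames("total", webBrowser1.Document.Window). The sketch only locates the first match in each frame; a "find next" would additionally need to remember the frame and range of the previous hit.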

Related

C# Paste HTML to Excel or PowerPoint

How to paste HTML ( tables ) code into Excel or PowerPoint?
I've overcome some issues with pasting HTML into Excel and PowerPoint, and noticed that a lot of people are asking about it.
I'd like to share my research and the solution I worked out.
Let's say we have an HTML string named html and we would like to get it into Excel. We do the following:
Clipboard.SetText(html);
We copy our HTML into the clipboard. On paste, the clipboard content is turned into a real table or image/chart from the input file.
System.Threading.Thread.Sleep(2000);
Let's wait two seconds to have a preview.
sheet.Range(cellmap).PasteSpecial();
Now we paste the content into the target range, which is defined by cellmap.
System.Threading.Thread.Sleep(1000);
Let's wait a second to see the output.
sheet.UsedRange.Copy(Missing.Value);
Now, in order to copy the table image into PowerPoint, we must work with UsedRange.Copy, because it copies the area of the sheet that is in use.
To make sure that we paste it into the correct PowerPoint slide:
foreach (PowerPoint.Slide slide in presentation.Slides)
{
    foreach (PowerPoint.Shape pptshape in slide.Shapes)
    {
        if(<your condition satisfies>)
        {
            slide.Select(); // some position in any slide
            pptshape.Delete(); // delete old content that was in that slide
            ppApp.ActiveWindow.View.PasteSpecial(); // paste the Excel content
        }
    }
}
Of course there are other solutions, like making an image out of the HTML code and pasting that, which was my initial idea.
Another post referring to that approach:
Showing HTML in PowerPoint

I want to select all and copy it to clipboard

I have a WebBrowser displaying text.
If I copy it to the clipboard, it copies all the HTML tags too, and I don't want that.
I want to be able to select all and then copy to the clipboard.
I want to copy the text and its formatting to the clipboard.
When I highlight the text myself, click copy and then paste, it's perfect, just how I want it.
But when I use this code to copy just the document text, I get the HTML tags too.
This is how I copy to the clipboard:
void CopyCellText()
{
    Clipboard.Clear();
    if (webBrowser1 != null)
    {
        Clipboard.SetText(webBrowser1.DocumentText.ToString().Trim());
    }
}
To select all and copy to the clipboard:
webBrowser1.Document.ExecCommand("SelectAll", true, null);
webBrowser1.Document.ExecCommand("Copy", true, null);
You won't see the HTML tags, but you keep all their formatting.
Do you mean you want to convert your HTML code to text and copy it to the clipboard? You will need HTML Agility Pack. Check this page for an easy guide.
Check this code snippet: http://www.dreamincode.net/code/snippet1921.htm. It would be better to strip the tags from the string using a regex.
I think the reason you are getting the HTML tags is that webBrowser1.DocumentText returns the entire content of the HTML document itself, which includes all the generated HTML.
A quick search gave me the following:
Retrieving Selected Text from Webbrowser control in .net(C#)
Get all text from WebBrowser control
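If only the visible text is needed, without tags and without formatting, the rendered text of the body can be read instead of DocumentText. A minimal sketch, assuming a WinForms WebBrowser whose document has already loaded; BrowserClipboard and CopyPlainText are hypothetical names:

```csharp
using System.Windows.Forms;

public static class BrowserClipboard
{
    // Hypothetical helper: copies the rendered text of the page (no HTML tags,
    // no formatting) to the clipboard. Must run on an STA (UI) thread.
    public static void CopyPlainText(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        string text = (doc != null && doc.Body != null) ? doc.Body.InnerText : null;
        text = (text == null) ? string.Empty : text.Trim();

        // Clipboard.SetText throws on an empty string, so skip that case.
        if (text.Length > 0)
        {
            Clipboard.SetText(text);
        }
    }
}
```

This complements the ExecCommand approach above: ExecCommand keeps the formatting, while InnerText yields plain text only.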

How do I get the equivalent of CTRL-A / CTRL-C in a WPF WebBrowser

I'm new to WPF and also C# so I'll try to be as specific as possible so you'll understand.
What am I trying to do?
I have a WPF Page with a WebBrowser control on it. I am navigating to a specific URL which displays perfectly in the control. Now, I would like to programmatically select all and copy the content of the webpage to my clipboard.
What have I tried
dynamic doc = webbrowser1.Document;
var htmlText = doc.documentElement.InnerText;
This, however, removes some formatting, such as empty table columns, so the result is not the same data as CTRL-A / CTRL-C.
I have also tried the above with InnerHTML and that gives me the HTML code. When I then paste that into an empty notepad and save it as .html file, externally open in IE and perform the CTRL-A / CTRL-C it gives me the desired result.
Any idea how to get the EXACT same result through code?!
Use the following code:
dynamic document = browser.Document;
document.ExecCommand("SelectAll", true, null);
document.ExecCommand("Copy", false, null);

Embedded WebBrowser in Windows Form C# project

I have a form with an embedded web browser control on it. I am currently using WebBrowser and use it like so:
webBrowser1.Navigate("about:blank");
HtmlDocument doc = this.webBrowser1.Document;
doc.Write(string.Empty);
String htmlContent = GetHTML();
doc.Write(htmlContent);
This writes the HTML correctly to the web browser control BUT it never clears the existing data and it just appends, so I end up with N web pages stacked on top of each other.
Is this the best control to use? If so why is it not clearing existing data?
You need to use:
HtmlDocument doc = this.webBrowser1.Document.OpenNew(true);
Now the contents of the document will be cleared before writing.
All calls to Write should be preceded by a call to OpenNew, which will clear the current document and all of its variables. Your calls to Write will create a new HTML document in its place. To change only a specific portion of the document, obtain the appropriate HtmlElement and set its InnerHtml property.
Yes, it is.
You should be able to call the Clear method if you need to clear contents.
Check this article for in-depth details and sample code:
http://www.codeproject.com/KB/miscctrl/simplebrowserformfc.aspx
Call HtmlDocument.OpenNew between pages:
OpenNew will clear the previous loaded document, including any associated state, such as variables. It will not cause navigation events in WebBrowser to be raised.
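Putting the OpenNew advice together, replacing the page content could look roughly like the sketch below. BrowserContent and ShowHtml are hypothetical names, and the fallback to DocumentText for a control that has not loaded anything yet is an assumption, not part of the answers above.

```csharp
using System.Windows.Forms;

public static class BrowserContent
{
    // Hypothetical helper: replaces whatever the control currently shows
    // with the given HTML instead of appending to it.
    public static void ShowHtml(WebBrowser browser, string html)
    {
        if (browser.Document == null)
        {
            // Nothing has been loaded yet, so there is no document to reopen.
            browser.DocumentText = html;
            return;
        }

        // OpenNew(true) discards the current document before Write is called.
        HtmlDocument doc = browser.Document.OpenNew(true);
        doc.Write(html);
    }
}
```

With the question's code this would be called as BrowserContent.ShowHtml(webBrowser1, GetHTML()) each time a new page should be displayed.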

Broken tables in RichTextBox control (word wrap) [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why isn’t the richtextbox displaying this table properly?
We are having problems with the Windows.Forms.RichTextBox control in Visual Studio 2008.
We are trying to display text supplied as an RTF file by a 3rd party in a windows forms application (.NET 3.5). In this RTF text file there are tables, which contain text that spans multiple lines. The RTF file displays correctly when opened with either WordPad or Word 2003.
However, when we load the RTF file into the RichTextBox control, or copy & paste the whole text (including the table) into the control, the table does not display correctly - the cells are only single line, without wrapping.
Here are links to images showing the exact problem:
Correctly displayed in WordPad
Incorrectly displayed in RichTextBox control
I have googled for solutions and 3rd party .net RTF controls without success. I have found this exact problem asked on another forum without an answer (in fact that's where the link to the images come from) so I'm hoping stack overflow does better ;-)
My preferred solution would be to use code or a 3rd party control that can correctly render the RTF. However, I suspect the problem is that the RichTextBox control only supports a subset of the full RTF spec, so another option would be to modify the RTF directly to remove the unsupported control codes or otherwise fix the RTF file itself (in which case any information as to what control codes need to be removed or modified would be a huge help).
The Rich Text box from .NET is extremely buggy.
In RTF, the way a table is defined is actually quite different from what you could expect if you are used to HTML.
HTML:
<table>
<tr>
<td>Mycell</td>
</tr>
</table>
In RTF, a table is simply a series of paragraphs with control words defining rows, cells, borders. There is no group tag for the start/end of a table.
RTF:
\trowd\trgraph \cellx1000 Mycell \cell\row\pard\par
If you want to add a paragraph inside a cell, you use \par and the control \intbl (in table) to indicate the paragraph is inside the table.
.NET RTB can handle only a very small subset of RTF control words and doesn't support the vast majority of available commands. By the looks of things, \intbl is part of the long long list of control words it doesn't support, and if it actually parses \par at that point, the display is trashed.
Unfortunately, I don't have a solution for that but I hope the small explanation above helps you make some sense of the problem.
Don't put too much faith on my RTF sample. It works, but it's absolutely bare-bones. You can download the RTF specifications from Microsoft's website:
Word 2007 RTF specs.
Can you use the old COM control instead of the new .NET control, or do you require a "pure" .NET solution?
In other words, go into the Visual Studio toolbox, right click, choose "Choose Items", look in the COM Components tab and check Microsoft Rich Textbox Control 6.0.
Answering my own question here, but only due to the help from Joel and sylverdrag...
The short answer is that both the .Net and underlying COM RichTextBox do not support word wrap in tables. I ended up knocking up a test application and using both the COM and .Net RichTextBox controls and they both exhibited the same (broken) behaviour.
I also downloaded the RTF spec from the link supplied by sylverdrag and, after tinkering with hand-made RTF documents in MS Word and RichTextEdit controls, I can confirm that RichTextBox does not correctly support the \intbl control word, which is required for word wrap in tables.
There appear to be three possible solutions:
1. Use TX Text Control. I have confirmed this works using a trial version but it is expensive - prices start at US$549 per developer.
2. Use an embedded MS Word instance as discussed on Code Project. Note that the code example provided on Code Project didn't work out of the box but I did get it working with Office 2003 & VS 2008. After much mucking around we hit an unexpected show stopper - we want the document to be read-only so we Protect() the document. While this works, when a user tries to edit the document the MS Word "Protect Document" side bar pops out from the right hand side of the control. We can't live with this and I was not able to turn it off (and from googling it looks like I'm not alone).
3. Give up on RTF and use HTML instead and then render the document in a WebBrowser control instead of a RichTextEdit control. That is the option we are taking as it turns out the source document is available in either format.
Step 1, Use the old COM Microsoft Rich Textbox Control 6.0;
Step 2, Make a copy of Windows\System32\MsftEdit.dll and then rename it to riched20.dll;
Step 3, Copy riched20.dll to your app folder such as bin\Debug.
This works fine; the table displays correctly.
Wordpad is generally a very thin wrapper over the rich edit control, so if it appears properly there then Windows should be able to handle it.
Perhaps you're instantiating the wrong version of the rich edit control? There have been many, and Windows continues to supply the older ones for backwards compatibility. http://msdn.microsoft.com/en-us/library/bb787873(VS.85).aspx
Just create a new Control. It works fine for me.
using System;
using System.ComponentModel;
using System.Windows.Forms;
using System.Runtime.InteropServices;

public class RichTextBox5 : RichTextBox {
    private static IntPtr moduleHandle;

    protected override CreateParams CreateParams {
        get {
            // Load Msftedit.dll once so that the RichEdit50W window class is registered.
            if (moduleHandle == IntPtr.Zero) {
                moduleHandle = LoadLibrary("msftedit.dll");
                if ((long)moduleHandle < 0x20) throw new Win32Exception(Marshal.GetLastWin32Error(), "Could not load Msftedit.dll");
            }
            CreateParams createParams = base.CreateParams;
            createParams.ClassName = "RichEdit50W";
            if (this.Multiline) {
                if (((this.ScrollBars & RichTextBoxScrollBars.Horizontal) != RichTextBoxScrollBars.None) && !base.WordWrap) {
                    createParams.Style |= 0x100000;
                    if ((this.ScrollBars & ((RichTextBoxScrollBars)0x10)) != RichTextBoxScrollBars.None) {
                        createParams.Style |= 0x2000;
                    }
                }
                if ((this.ScrollBars & RichTextBoxScrollBars.Vertical) != RichTextBoxScrollBars.None) {
                    createParams.Style |= 0x200000;
                    if ((this.ScrollBars & ((RichTextBoxScrollBars)0x10)) != RichTextBoxScrollBars.None) {
                        createParams.Style |= 0x2000;
                    }
                }
            }
            if ((BorderStyle.FixedSingle == base.BorderStyle) && ((createParams.Style & 0x800000) != 0)) {
                createParams.Style &= -8388609;
                createParams.ExStyle |= 0x200;
            }
            return createParams;
        }
    }

    // P/Invoke declarations
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    private static extern IntPtr LoadLibrary(string path);
}
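A short usage sketch for the control above, assuming the RichTextBox5 class is compiled into the same project. RtfPreviewForm and the file path are hypothetical and only serve as an illustration:

```csharp
using System.Windows.Forms;

// Hypothetical host form that shows an RTF file in the RichTextBox5 control.
public class RtfPreviewForm : Form
{
    public RtfPreviewForm(string rtfPath)
    {
        RichTextBox5 box = new RichTextBox5();
        box.Dock = DockStyle.Fill;
        box.ReadOnly = true;
        Controls.Add(box);

        // Load the third-party RTF document, tables included.
        box.LoadFile(rtfPath, RichTextBoxStreamType.RichText);
    }
}
```

It could be started with Application.Run(new RtfPreviewForm(@"C:\temp\sample.rtf")); whether the tables then wrap correctly depends on the Msftedit.dll version installed on the machine.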
This is not an issue with the RichTextBox control provided in .NET. Some rich text rules (RTF syntax) were changed in the newer version of MS Office (2007). However, the component used in .NET has not been updated to cater for the new rules, so the issue occurs.
Anand
