Saving a WebBrowser URL to XML and retrieving it

I am trying to save the URL from my WebBrowser control to an XML file, but certain characters prevent the save.
When I open a simple URL like this:
www.saypeople.com
It saves successfully. However, when I want to save a page URL like this:
http://scholar.google.com.pk/scholar?as_q=filetype:pdf +transistor+ AND&num=10&btnG=Search+Scholar&as_epq=&as_oq=unknown+unclear&as_eq=&as_occt=any&as_sauthors=+ &as_publication=+ &as_ylo=&as_yhi=&as_sdt=1.&as_sdtp=on&as_sdtf=&as_sdts=5&hl=en
The save fails.
After checking a lot of things, I have discovered that my code fails to save only when the URL contains either of the two characters & or <.
Please help me out.
Here is my code...
public static DialogResult Show(string Title, String url)
{
MsgBox = new addfav();
MsgBox.textBox1.Text = Title;
MsgBox.textBox2.Text = url;
MsgBox.ShowDialog();
return result;
}
const string dataxml = "data.xml";
private void button1_Click(object sender, EventArgs e)
{
//textBox2.Text containing webpage url
//textBox1.Text containing webpage title
try
{
XmlTextReader reader = new XmlTextReader(dataxml);
XmlDocument doc = new XmlDocument();
doc.Load(reader);
reader.Close();
XmlNode currNode;
XmlDocumentFragment docFrag = doc.CreateDocumentFragment();
docFrag.InnerXml = "<fav>" + "<Title>" + textBox1.Text + "</Title>" + "<url>"+ textBox2.Text + "</url>" + "</fav>";
// insert the availability node into the document
currNode = doc.DocumentElement;
currNode.InsertAfter(docFrag, currNode.LastChild);
//save the output to a file
doc.Save(dataxml);
this.DialogResult = DialogResult.OK;
MessageBox.Show("Sucessfully Added");
}
catch (Exception ex)
{
Console.WriteLine("Exception: {0}", ex.ToString());
this.DialogResult = DialogResult.Cancel;
}
MsgBox.Close();
}
Also, how can I retrieve a URL by searching for a specific title in the XML, given entries like these?
<fav>
<Title>hello</Title>
<url><![CDATA[http://scholar.google.com.pk/scholar?as_q=filetype:pdf +hello+ AND&num=10&btnG=Search+Scholar&as_epq=&as_oq=unknown+unclear&as_eq=&as_occt=any&as_sauthors=+ &as_publication=+ &as_ylo=&as_yhi=&as_sdt=1.&as_sdtp=on&as_sdtf=&as_sdts=5&hl=en]]></url>
</fav>
<fav>
<Title>toad</Title>
<url><![CDATA[http://www.sciencedaily.com/search/?keyword=toad+ AND unknown OR unclear]]></url>
</fav>
I want to find the entry titled toad and store its URL in a string. Please help me out.
Thanks.

Wrap the URL in a CDATA section like:
<![CDATA[THE URL CONTENT]]>
Your problems result from the fact that you cannot use & and < directly as XML data, as they have special meanings in XML: & starts an XML entity, < starts an XML tag. So when you need to add & and < as values, it's easiest to use a CDATA section.
EDIT
You may try the following:
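XmlDocumentFragment docFrag = doc.CreateDocumentFragment();
// Build the whole fragment first: InnerXml is parsed on every assignment,
// so assigning an incomplete string such as "<fav>" throws an XmlException.
string favXml = "<fav>";
favXml += String.Format("<Title>{0}</Title>", textBox1.Text);
favXml += String.Format("<url><![CDATA[{0}]]></url>", textBox2.Text);
favXml += "</fav>";
docFrag.InnerXml = favXml;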
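As a variation on the CDATA idea, the nodes can be created through the XmlDocument API instead of concatenating strings. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes the favourites are appended directly under the document's root element, as in the question's code.

```csharp
using System.Xml;

public static class CDataFavorite
{
    // Hypothetical helper: appends <fav><Title>..</Title><url><![CDATA[..]]></url></fav>
    // to the root element of an already loaded document.
    public static XmlElement AppendFavorite(XmlDocument doc, string title, string url)
    {
        XmlElement fav = doc.CreateElement("fav");

        XmlElement titleNode = doc.CreateElement("Title");
        // Text nodes are escaped automatically when the document is saved.
        titleNode.AppendChild(doc.CreateTextNode(title));

        XmlElement urlNode = doc.CreateElement("url");
        // A CDATA section keeps & and < exactly as typed.
        urlNode.AppendChild(doc.CreateCDataSection(url));

        fav.AppendChild(titleNode);
        fav.AppendChild(urlNode);
        doc.DocumentElement.AppendChild(fav);
        return fav;
    }
}
```

In the question's button handler this would be called as CDataFavorite.AppendFavorite(doc, textBox1.Text, textBox2.Text) before doc.Save(dataxml).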

You could use HttpUtility.HtmlEncode(url).

Your problem is here:
docFrag.InnerXml = "<fav>" + "<Title>" + textBox1.Text + "</Title>"
+ "<url>"+ textBox2.Text + "</url>" + "</fav>";
The characters <, > and & that caused your problems are markup in XML. InnerXml does not escape markup, so those characters are written as they are, which results in an invalid XML fragment. To add the URL, use InnerText instead; it escapes those characters.

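The InnerText suggestion above could look roughly like the sketch below. The helper name is hypothetical, and it assumes the new fav element belongs directly under the root element of the loaded document.

```csharp
using System.Xml;

public static class FavoriteWriter
{
    // Hypothetical helper: builds the <fav> element node by node so that
    // InnerText takes care of escaping &, < and > in the title and URL.
    public static void AddFavorite(XmlDocument doc, string title, string url)
    {
        XmlElement fav = doc.CreateElement("fav");

        XmlElement titleNode = doc.CreateElement("Title");
        titleNode.InnerText = title;

        XmlElement urlNode = doc.CreateElement("url");
        urlNode.InnerText = url; // stored as text, serialized with &amp; etc.

        fav.AppendChild(titleNode);
        fav.AppendChild(urlNode);
        doc.DocumentElement.AppendChild(fav);
    }
}
```

Reading urlNode.InnerText back later returns the original, unescaped URL.
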
To navigate through an XML file, you can use an XPath navigator, as shown here.
XPathDocument xpathDoc = new XPathDocument([location of the file]);
XPathNavigator Navigator = xpathDoc.CreateNavigator();
// "//" matches fav elements anywhere below the document root
String url_nav = "//fav/url/text()";
XPathNodeIterator url_iterator = Navigator.Select(url_nav);
// The iterator starts before the first match, so call MoveNext() before reading Current
if (url_iterator.MoveNext())
{
String URL_value = url_iterator.Current.Value;
}
If the file is too heavily nested, go for XML serialization.
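For the second part of the question (finding the URL that belongs to a given title), one possible sketch is shown below. The class and method names are hypothetical. It compares titles in C# rather than embedding the title in the XPath string, which avoids quoting problems when a title contains an apostrophe.

```csharp
using System.Xml;

public static class FavoriteReader
{
    // Hypothetical helper: returns the URL of the first <fav> whose <Title>
    // matches exactly, or null when no such entry exists.
    public static string FindUrlByTitle(XmlDocument doc, string title)
    {
        foreach (XmlNode fav in doc.SelectNodes("//fav"))
        {
            XmlNode titleNode = fav.SelectSingleNode("Title");
            XmlNode urlNode = fav.SelectSingleNode("url");
            if (titleNode != null && urlNode != null && titleNode.InnerText == title)
            {
                // InnerText returns the content of a CDATA section as plain text.
                return urlNode.InnerText;
            }
        }
        return null;
    }
}
```

With the file from the question loaded via doc.Load(dataxml), FindUrlByTitle(doc, "toad") would return the URL stored in the toad entry.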

Related

Creating an xml parsing function in C#

I have an XML document that's obtained from a web service; I'm using an HttpClient to fetch it. This is what the XML looks like:
<respuesta>
<entrada>
<rut>7059099</rut>
<dv>9</dv>
</entrada>
<status>
<code>OK</code>
<descrip>Persona tiene ficha, ok</descrip>
</status>
<ficha>
<folio>3204525</folio>
<ptje>7714</ptje>
<fec_aplic>20080714</fec_aplic>
<num_integ>2</num_integ>
<comuna>08205</comuna>
<parentesco>1</parentesco>
<fec_puntaje>20070101</fec_puntaje>
<personas>
<persona>
<run>7059099</run>
<dv>9</dv>
<nombres>JOSE SANTOS</nombres>
<ape1>ONATE</ape1>
<ape2>FERNANDEZ</ape2>
<fec_nac>19521101</fec_nac>
<sexo>M</sexo>
<parentesco>1</parentesco>
</persona>
<persona>
<run>8353907</run>
<dv>0</dv>
<nombres>JUANA DEL TRANSITO</nombres>
<ape1>MEDINA</ape1>
<ape2>ROA</ape2>
<fec_nac>19560815</fec_nac>
<sexo>F</sexo>
<parentesco>2</parentesco>
</persona>
</personas>
</ficha>
I'm trying to write a function that can parse this. Right now (just to test my understanding of the language, since I'm new to it) I only need it to find the VALUE inside a "rut" tag, the first one, or something like that. More precisely, I need to find a value inside the XML and return it so I can show it in a label on my .aspx page. The code of my parsing function looks like this:
public static String parseXml(String xmlStr, String tag)
{
String valor;
using (XmlReader r = XmlReader.Create(new StringReader(xmlStr)))
{
try
{
r.ReadToFollowing(tag);
r.MoveToContent();
valor = r.Value;
}
catch (Exception ex)
{
throw new Exception(ex.Message, ex.InnerException);
}
}
return valor;
}
This code is based on an example I found on YouTube, made by people from Microsoft, where they "explain" how to use the parser.
This function is called from inside one of the HttpClient tasks, shown here:
protected void rutBTN_Click(object sender, EventArgs e)
{
if (rutTB.Text != "")
{
HttpClient client = new HttpClient();
String xmlString = "";
String text = "";
var byteArray = Encoding.ASCII.GetBytes("*******:*******"); //WebService's server authentication
client.BaseAddress = new Uri("http://wschsol.mideplan.cl");
var par = "mod_perl/xml/fps-by-rut?rut=" + rutTB.Text;
client.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));
client.GetAsync(par).ContinueWith(
(requestTask) =>
{
HttpResponseMessage resp = requestTask.Result;
try
{
resp.EnsureSuccessStatusCode();
XmlDocument xmlResp = new XmlDocument();
requestTask.Result.Content.ReadAsStreamAsync().ContinueWith(
(streamTask) =>
{
xmlResp.Load(streamTask.Result);
text = xmlResp.InnerXml.ToString();
xmlString = parseXml(text, "rut"); //HERE I'm calling the parsing function, and i'm passing the whole innerXml to it, and the string "rut", so it searches for this tag.
Console.WriteLine("BP");
}).Wait();
}
catch (Exception ex)
{
throw new Exception(ex.Message, ex.InnerException);
}
}).Wait();
testLBL.Text = xmlString; // Finally, THIS is the label where I want the "rut" tag's value to be shown.
testLBL.Visible = true;
}
else
{
testLBL.Text = "You must enter an RUT number";
testLBL.Visible = true;
}
}
The problem is that when I put breakpoints in the parsing function, I can see that it correctly receives the InnerXml as a string, but it does not find the tag called "rut", or rather it does not find anything at all, since it returns an empty string ("").
I know this may not be the correct way to parse an XmlDocument, so if someone can help me out I'd be really thankful.
EDIT:
OK, so I won't ask for a tutorial or the like (I requested that to avoid asking beginner questions). But please, instead of just answering "you'd better do it like this", I'd appreciate it if you could explain "THIS is what you're doing wrong and THAT'S why your code isn't working", and THEN tell me how you would do it instead.
Thanks in advance!

As you only want to retrieve a single field value, I would recommend using XPath.
Basically, you create an XPathNavigator from an XPathDocument or XmlDocument and then use SelectSingleNode to get the content of the rut node:
XPathNavigator navigator = xmlResp.CreateNavigator();
// SelectSingleNode returns an XPathNavigator (or null when nothing matches)
XPathNavigator rutNode = navigator.SelectSingleNode("/respuesta/entrada/rut");
string rut = rutNode.Value;
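As for why the original parseXml returns an empty string: ReadToFollowing(tag) leaves the reader positioned on the element's start tag, MoveToContent() does not move away from an element node, and the Value of an element node is always empty because the text lives in a child text node. A minimal corrected sketch of the XmlReader approach follows; the class and method names are hypothetical, and it assumes the target element contains only text.

```csharp
using System.IO;
using System.Xml;

public static class XmlTagReader
{
    // Hypothetical replacement for parseXml: returns the text of the first
    // element named `tag`, or an empty string when the element is not found.
    public static string ReadFirstTagValue(string xmlStr, string tag)
    {
        using (XmlReader r = XmlReader.Create(new StringReader(xmlStr)))
        {
            // ReadToFollowing returns false when no matching element exists.
            if (r.ReadToFollowing(tag))
            {
                // Reads the element's text content and moves past its end tag.
                return r.ReadElementContentAsString();
            }
            return string.Empty;
        }
    }
}
```

Calling ReadFirstTagValue(text, "rut") in place of parseXml(text, "rut") keeps the rest of the click handler unchanged.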

How to display hierarchical XML from a DataSet

I am trying to generate XML using a [WebMethod].
Here is my webmethod in MyMethod.asmx.cs:
[WebMethod]
public string GetTransactionList(string AccNo)
{
TransactionRepository transRep = new TransactionRepository();
IList<Transaction> listTrans = transRep.GetTransactionList(AccNo);
IList<Parameter> listPara = transRep.GetAllParameter();
DataTable tblTrans = CommonDatatableMethods.ConvertToDataTable<Transaction>(listTrans, true);
tblTrans.TableName = "transaction";
tblTrans.Columns.Remove("SequenceNo");
tblTrans.Columns.Remove("ID");
/** Note: Require insert into dataset to gain custom root name**/
DataSet dsTrans = new DataSet("Transactions");
dsTrans.Tables.Add(tblTrans);
StringWriter sw = new StringWriter();
tblTrans.WriteXml(sw);
if (listTrans.Count <= 0)
{
string emptydata = string.Empty;
emptydata = "<Transactions><transaction>No record for " + AccNo + " within " + listPara[0].ParameterYears + " years data.</transaction></Transactions>";
return emptydata;
}
else
{
/** Note: Need to remove all the schemas in meta tag**/
return sw.ToString().Replace(" xsi:type=\"xs:string\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"", "").Replace(" xml:space=\"preserve\"", "");
}
}
But when I click the Invoke button on my web service, the XML is displayed on a single line like this:
<Transactions><transaction><AccNo>ABC15279182719</AccNo><FirstName>AHMAD ALI</FirstName></transaction></Transactions>
What I want is something hierarchical like this:
<Transactions>
<transaction>
<AccNo>ABC15279182719</AccNo>
<FirstName>AHMAD ALI</FirstName>
</transaction>
</Transactions>
How do I get output like that?
Thank You!

As sa_ddam213 said, your web method is working fine and returns the string value.
If you need to format the output, paste it into an XML file in the Visual Studio editor and press Ctrl+K, Ctrl+D to format the document.
--SJ
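If the indentation should be part of the returned string itself, one option is to re-parse the XML with LINQ to XML, whose ToString() indents by default. This is only a sketch under that assumption; the class and method names are hypothetical, and the line breaks are purely cosmetic for any XML consumer.

```csharp
using System.Xml.Linq;

public static class XmlPrettyPrinter
{
    // Hypothetical helper: re-serializes a single-line XML string with
    // one element per line, using XDocument's default indentation.
    public static string Indent(string xml)
    {
        return XDocument.Parse(xml).ToString();
    }
}
```

In the web method this could wrap the final return value, for example return XmlPrettyPrinter.Indent(cleanedXml), where cleanedXml stands for the string produced by the existing Replace calls.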

fetch google map data from external website

Is it possible to read or parse Google Maps data (e.g. http://www.ukairquality.net/MicrosoftMapFast2.aspx ) from an external website?
I just want to get data from this and store this data in my sql...
private void button1_Click(object sender, EventArgs e)
{
XmlDocument XmlDoc = new XmlDocument();
XmlNamespaceManager XmlNs = new XmlNamespaceManager(XmlDoc.NameTable);
XmlNs.AddNamespace("def", "http://earth.google.com/kml/2.0");
string url = "http://www.ukairquality.net/MicrosoftMapFast2.aspx ";
XmlDoc.Load(url);
//XmlDoc.Save(MapPath(@"~\xml\test.xml"));
XmlNodeList Nodes = XmlDoc.SelectNodes("//def:coordinates", XmlNs);
foreach (XmlNode Node in Nodes)
{
// Response.Write returns void, so append the text to the text box directly
textBox1.Text += Node.InnerText + Environment.NewLine;
}
}

Sure, you can load data from other websites, but the URL you give is not XML. You'll get an error if you try to parse it as XML.
If you want to get the data from this UK site, then you'll need to look at their javascript and use string functions to get the parts you want.
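To make the "not XML" failure explicit rather than letting it surface as an unhandled exception, the downloaded text can be probed first. This is a minimal sketch with hypothetical names; it only checks well-formedness and says nothing about how the site in question actually structures its data.

```csharp
using System.Xml;

public static class XmlProbe
{
    // Hypothetical helper: returns true and the parsed document when the
    // content is well-formed XML, false (and null) otherwise.
    public static bool TryLoadXml(string content, out XmlDocument doc)
    {
        doc = new XmlDocument();
        try
        {
            doc.LoadXml(content);
            return true;
        }
        catch (XmlException)
        {
            // Typical for ordinary HTML pages, e.g. unclosed <br> tags.
            doc = null;
            return false;
        }
    }
}
```

When TryLoadXml returns false, the content has to be handled as plain text (for example with string functions, as suggested above) instead of XPath.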

The Code return error A column named 'link' already belongs

public partial class Default : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
try
{
DataSet ds = new DataSet();
ds.ReadXml(@"http://tecnologia.ig.com.br/rss.xml");
XmlDocument doc = new XmlDocument();
XmlUrlResolver resolver = new XmlUrlResolver();
resolver.Credentials = new System.Net.NetworkCredential("bruno", "*****");//intentionally hiding real password from stackoverflow
doc.XmlResolver = resolver;
foreach (DataRow dr in ds.Tables["item"].Rows)
{
Response.Write("Item TITLE: " + dr["title"].ToString() + "<br />");
Response.Write("Descrição : " + dr["description"].ToString() + "<br />");
Response.Write("Data de Publicação: " + dr["pubDate"].ToString() + "<br />");
}
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
}
}
When I execute the code, the system returns two errors:
1 - A column named 'link' already belongs to this DataTable: cannot set a nested table name to the same name.
2 - The remote server returned an error: (407) Proxy Authentication Required.
Thanks!

Here is one approach for reading an RSS feed:
Create a WebRequest and a WebResponse object:
WebRequest request = WebRequest.Create("your url");
WebResponse response = request.GetResponse();
Create an XML document and load it with the stream from the response object:
Stream rssStream = response.GetResponseStream();
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(rssStream);
Retrieve the matching XML nodes from the XmlDocument as an XmlNodeList:
XmlNodeList xmlNodeList = xmlDoc.SelectNodes("your XPath expression");
Now you can loop over the RSS feed items to get what you want:
for (int i = 0; i < xmlNodeList.Count; i++)
{
XmlNode xmlNode;
xmlNode = xmlNodeList.Item(i).SelectSingleNode("ProductName");
//xmlNode.InnerText;
}
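Put together for an RSS 2.0 feed, where items live under rss/channel/item, the steps above can be sketched as follows. The class and method names are hypothetical, and the method takes the feed as a string so it can be tried without a network connection; in the real page the string would come from the response stream.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RssTitles
{
    // Hypothetical helper: returns the <title> text of every <item> in an
    // RSS 2.0 document. Repeated child elements such as <link> are not a
    // problem here because no DataTable is involved.
    public static List<string> ReadItemTitles(string rssXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);

        List<string> titles = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("/rss/channel/item"))
        {
            XmlNode title = item.SelectSingleNode("title");
            if (title != null)
            {
                titles.Add(title.InnerText);
            }
        }
        return titles;
    }
}
```

The same loop can read description and pubDate by adding further SelectSingleNode calls per item.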

The 2nd problem:
You probably have a corporate proxy, try using this in the web.config:
<system.net>
<defaultProxy useDefaultCredentials="true" />
</system.net>
As for the 1st problem:
http://forums.asp.net/t/1220157.aspx/1
The issue is likely that XML allows duplicate nodes while a DataTable does not allow duplicate columns, so you cannot parse the feed directly into a DataTable.
I would suggest doing some research into parsing and using XML data before going too far down the DataTable route.

C# WebBrowser control not applying css

I have a project that I am working on in VS2005. I have added a WebBrowser control, and I load a basic empty page into it:
private const string _basicHtmlForm = "<html> "
+ "<head> "
+ "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'/> "
+ "<title>Test document</title> "
+ "<script type='text/javascript'> "
+ "function ShowAlert(message) { "
+ " alert(message); "
+ "} "
+ "</script> "
+ "</head> "
+ "<body><div id='mainDiv'> "
+ "</div></body> "
+ "</html> ";
private string _defaultFont = "font-family: Arial; font-size:10pt;";
private void LoadWebForm()
{
try
{
_webBrowser.DocumentText = _basicHtmlForm;
}
catch(Exception ex)
{
MessageBox.Show(ex.Message);
}
}
and then add various elements via the DOM (using _webBrowser.Document.CreateElement). I am also loading a CSS file:
private void AddStyles()
{
try
{
mshtml.HTMLDocument currentDocument = (mshtml.HTMLDocument) _webBrowser.Document.DomDocument;
mshtml.IHTMLStyleSheet styleSheet = currentDocument.createStyleSheet("", 0);
TextReader reader = new StreamReader(Path.Combine(Path.GetDirectoryName(Application.ExecutablePath),"basic.css"));
string style = reader.ReadToEnd();
styleSheet.cssText = style;
}
catch(Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Here is the css page contents:
body {
background-color: #DDDDDD;
}
.categoryDiv {
background-color: #999999;
}
.categoryTable {
width:599px; background-color:#BBBBBB;
}
#mainDiv {
overflow:auto; width:600px;
}
The style sheet loads successfully, but the only elements on the page that are affected are the ones that are initially in the page (body and mainDiv). I have also tried including the CSS in a <style> element in the head section, but it still only affects the elements that are there when the page is created.
So my question is: does anyone have any idea why the CSS is not being applied to elements that are created after the page is loaded? I have also tried not applying the CSS until after all of my elements are added, but the results don't change.

I made a slight modification to your AddStyles() method and it works for me.
Where are you calling it from? I called it from "_webBrowser_DocumentCompleted".
I have to point out that I am calling AddStyles after I modify the DOM.
private void AddStyles()
{
try
{
if (_webBrowser.Document != null)
{
IHTMLDocument2 currentDocument = (IHTMLDocument2)_webBrowser.Document.DomDocument;
int length = currentDocument.styleSheets.length;
IHTMLStyleSheet styleSheet = currentDocument.createStyleSheet(@"", length + 1);
//length = currentDocument.styleSheets.length;
//styleSheet.addRule("body", "background-color:blue");
TextReader reader = new StreamReader(Path.Combine(Path.GetDirectoryName(Application.ExecutablePath), "basic.css"));
string style = reader.ReadToEnd();
styleSheet.cssText = style;
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Here is my DocumentCompleted handler (I added some styles to basic.css for testing):
private void _webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElement element = _webBrowser.Document.CreateElement("p");
element.InnerText = "Hello World1";
_webBrowser.Document.Body.AppendChild(element);
HtmlElement divTag = _webBrowser.Document.CreateElement("div");
divTag.SetAttribute("class", "categoryDiv");
divTag.InnerHtml = "<p>Hello World2</p>";
_webBrowser.Document.Body.AppendChild(divTag);
HtmlElement divTag2 = _webBrowser.Document.CreateElement("div");
divTag2.SetAttribute("id", "mainDiv2");
divTag2.InnerHtml = "<p>Hello World3</p>";
_webBrowser.Document.Body.AppendChild(divTag2);
AddStyles();
}
This is what I get (modified the style to make it as ugly as a single human being can hope to make it :D ):

One solution is to inspect the HTML before setting DocumentText and inject the CSS on the client side. I don't set the control's Url property; instead I get the HTML via WebClient and then set DocumentText. Setting DocumentText (or, in your case, Document) after you manipulate the DOM might get it to re-render properly.
private const string CSS_960 = @"960.css";
private const string SCRIPT_FMT = @"<style TYPE=""text/css"">{0}</style>";
private const string HEADER_END = @"</head>";
public void SetDocumentText(string value)
{
this.Url = null; // can't have both URL and DocText
this.Navigate("About:blank");
string css = null;
string html = value;
// check for known CSS file links and inject the resourced versions
if(html.Contains(CSS_960))
{
css = GetEmbeddedResourceString(CSS_960);
html = html.Insert(html.IndexOf(HEADER_END), string.Format(SCRIPT_FMT,css));
}
if (Document != null) {
Document.Write(string.Empty);
}
DocumentText = html;
}

It is hard to say without a link to the page.
Usually, though, the best approach for style-related work is to have the CSS already in the page and, in your C# code, only add ids or classes to elements so that the styles take effect.

I have found that generated tags with a class attribute do not get their styles applied.
This is my workaround, which runs after the document has been generated:
public static class WebBrowserExtensions
{
public static void Redraw(this WebBrowser browser)
{
string temp = Path.GetTempFileName();
File.WriteAllText(temp, browser.Document.Body.Parent.OuterHtml,
Encoding.GetEncoding(browser.Document.Encoding));
browser.Url = new Uri(temp);
}
}

I use a similar control instead of WebBrowser. I load an HTML page with "default" style rules and change the rules from within the program.
(Drawback: maintenance. When I need to add a rule, I also need to change it in code.)
' ----------------------------------------------------------------------
Public Sub mcFontOrColorsChanged(ByVal isRefresh As Boolean)
' ----------------------------------------------------------------------
' Notify whichever is concerned:
Dim doc As mshtml.HTMLDocument = Me.Document
If (doc.styleSheets Is Nothing) Then Return
If (doc.styleSheets.length = 0) Then Return
Dim docStyleSheet As mshtml.IHTMLStyleSheet = CType(doc.styleSheets.item(0), mshtml.IHTMLStyleSheet)
Dim docStyleRules As mshtml.HTMLStyleSheetRulesCollection = CType(docStyleSheet.rules, mshtml.HTMLStyleSheetRulesCollection)
' Note: the following is needed seperately from 'Case "BODY"
Dim docBody As mshtml.HTMLBodyClass = CType(doc.body, mshtml.HTMLBodyClass)
If Not (docBody Is Nothing) Then
docBody.style.backgroundColor = colStrTextBg
End If
Dim i As Integer
Dim maxI As Integer = docStyleRules.length - 1
For i = 0 To maxI
Select Case (docStyleRules.item(i).selectorText)
Case "BODY"
docStyleRules.item(i).style.fontFamily = fName ' "Times New Roman" | "Verdana" | "courier new" | "comic sans ms" | "Arial"
Case "P.myStyle1"
docStyleRules.item(i).style.fontSize = fontSize.ToString & "pt"
Case "TD.myStyle2" ' do nothing
Case ".myStyle3"
docStyleRules.item(i).style.fontSize = fontSizePath.ToString & "pt"
docStyleRules.item(i).style.color = colStrTextFg
docStyleRules.item(i).style.backgroundColor = colStrTextBg
Case Else
Debug.WriteLine("Rule " & i.ToString & " " & docStyleRules.item(i).selectorText)
End Select
Next i
If (isRefresh) Then
Me.myRefresh(curNode)
End If
End Sub

It could be that styles are only applied to the objects that EXIST at the time the page is loaded. Just because you add a node to the DOM tree doesn't mean that all of its attributes can be manipulated and rendered inside the browser.
The methods above seem to use an approach that reloads the page (DOM), which suggests that this may be the case.
In short, refresh the page after you've added an element.

It sounds as though phq has experienced this. I think the way I would approach is add a reference to jquery to your html document (from the start).
Then, inside the page, create a JavaScript function that accepts the element id and the name of the class to apply. Inside the function, use jQuery to dynamically apply the class in question or to modify the CSS directly. For example, use jQuery's .addClass or .css functions to modify the element.
From there, in your C# code, after you add the element dynamically invoke this javascript as described by Rick Strahl here: http://www.west-wind.com/Weblog/posts/493536.aspx
