I originally come from PHP and I have just started with ASP/.NET. I am aware that a direct equivalent of include("filename.php"); does not exist, but that's not exactly what I want to achieve.
I am attempting to create a header file which I can use on every page. I have read from many sources that making a user control is the solution. After writing all of the necessary code to make this work, I am stuck at a point where the element on the page has not actually been created yet when I call .InnerHtml. For example:
breadcrumbContainer.InnerHtml = "testing text";
The above code does not work when called from header.ascx.cs, even though there is a div on the page with runat="server" and the correct ID.
I am trying to find out if there is an easier way to resolve this problem. I have been told that I should avoid master pages (even though I don't know if they are relevant in this situation). Should I create some sort of method which generates the HTML for the header, so that I can easily call it on every page? Are there any other solutions I haven't thought of?
If there are any good articles which clearly explain this problem, I would love the links. I have literally searched hundreds of pages on the web and found nothing that is giving me a clear understanding of how to resolve this problem.
Master pages can be relevant and very helpful in this case. Check them out!
When someone says "don't do something", always ask why. Do not take such advice at face value. That's exactly how phantom fears are spread, and thousands of developers end up treating some programmer's pet peeve as an absolute rule! Besides, asking "why?" will strengthen your own understanding of the issue at hand, as well as the more senior developer's.
From the link:
ASP.NET master pages allow you to create a consistent layout for the pages in your application. A single master page defines the look and feel and standard behavior that you want for all of the pages (or a group of pages) in your application. You can then create individual content pages that contain the content you want to display. When users request the content pages, they merge with the master page to produce output that combines the layout of the master page with the content from the content page.
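To make this concrete, here is a minimal C# sketch of how a content page can still reach the header from code-behind once the header lives in a master page. This is my own illustration, not code from the question: it assumes the master page declares <div id="breadcrumbContainer" runat="server"> in its markup, and the content page name is made up.

using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

public partial class ProductsPage : Page   // hypothetical content page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Master refers to the page's master page; FindControl looks up
        // the server-side div declared in the master page's markup.
        var breadcrumb = Master.FindControl("breadcrumbContainer") as HtmlGenericControl;
        if (breadcrumb != null)
        {
            breadcrumb.InnerHtml = "Home &gt; Products";
        }
    }
}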
I ran my application through the Veracode tool and I am struggling with some of the issues it reports.
One of the issues I face is Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS) (CWE ID 80).
This happens in many screens in my application.
In particular, it is flagged on the following line:
NewDivButton.Style["display"] = SearchParameters.NewDivButtonVisibility;
Does anyone have any suggestion on how to fix this issue?
Welcome, Manikandan. The best answer to this question depends on the language/framework you're using, so it would help if you could share that.
One thing to note, is that there are many things you could do that would make the warning "go away", but wouldn't make your app any more secure. For that reason, it's best to understand the core of the problem, and then apply the standard fix for the language/framework you're working in. If in doubt, check with a security professional.
In general, XSS is a set of issues where you (potentially) render user input as part of your output.
In this example, suppose I send you a link that says yoursite.com?NewDivButtonVisibility="><script>SendYourPrivateInfoSomewhereBad();</script>
If you click a link like this, and the site blindly inserts the script into the page, it could steal data.
The best protection is often to validate input, only allowing known-valid input through.
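For the line flagged above, a minimal sketch of the allow-list idea might look like this. NewDivButton and SearchParameters come from the question; the set of allowed values and the fallback are my own assumptions, so adapt them to what the page actually needs.

// Only known-good CSS display values are let through; anything else
// falls back to a safe default instead of being echoed to the page.
var allowedDisplayValues = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "none", "block", "inline", "inline-block"
};

string requested = SearchParameters.NewDivButtonVisibility;
NewDivButton.Style["display"] =
    allowedDisplayValues.Contains(requested) ? requested : "none";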
Another common approach is to HTML-encode the unknown value before displaying it. However, more care is needed depending on where the output is rendered (e.g. if it already sits within a script tag).
There's much more general information on this type of issue here: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html#cross-site-scripting-prevention-cheat-sheet
I am trying to build this feature, and I'm really stuck.
I have two applications that run on the same domain, and I need to have one application load pages from the other one inside its own (the first application's) master page.
I have full control of the code of both sides, of course.
I have tried using HttpRequest and HttpResponse, and I have tried using WebBrowser. Both work great as long as the pages are static (plain HTML). However,
those pages are actually dynamic: the user needs to press server-side buttons (postbacks) and generally uses the session, viewstate, and/or cookies.
Because of that, HttpRequest and WebBrowser fail me, as they do not cause postbacks, and therefore those server-side controls do not work. What's more, if I try to "fake" a postback by saving the ViewState after each response and then resending it with the next request, after a few (3-4) attempts the original page returns a "The state information is invalid for this page and might be corrupted" error, even if I use
EnableViewStateMac="false" EnableSessionState="True" EnableEventValidation="false" ValidateRequest="false" ViewStateEncryptionMode="Never"
So... any ideas how can I solve this issue?
Thanks in advance
What is the main desire here?
Wrap one site's content in another without any architecture changes?
ANSWER: Iframe
Have a single submit button submit from two sites?
ANSWER: Not a good idea. You might be able to kludge this by creating a scraper and parser, but it would only be cool as an "I can do it" trophy. Better to rearchitect the solution. But assuming you really want to do this, you will have to parse the result from the embedded site and redirect the submit to the new site. That site will then take the values and submit the form to the first site and wait for the result, which it will scrape to give a response to the user. It is actually quite a bit more complex, as you have to parse the HTML DOM (easier if all of the HTML is XHTML compliant, of course) to figure out what to intercept.
Caveat: Any changes to the embedded site can blow up your code, so the persons who maintain the first site must be aware of this artificially created dependency so they don't change anything that might cause problems. Ouch, that sounds brittle! ;-)
Other?
If using an iFrame does not work, then I would look at the business problem and draw up an ideal architecture to solve it, which might mean making the functionality of the embedded site available via a web service for the second site.
I have a web application project running .NET 4.0. I have plenty of .aspx pages, and now I would like to add a block of script code to the header of every .aspx page, for example Google Analytics.
I know one solution is to add it to every single page, but I would like to know whether there is any other way to do this without modifying every single .aspx page.
*My header is not runat server
I have an idea, but I'm not sure whether it will work:
Get the page class in Global.asax
Get the output stream from the page class.
Insert the Google Analytics code in the HTML header.
I couldn't get the Page.Response in Global.asax; I tried in Application_PostRequestHandlerExecute and also Application_EndRequest. Does anyone know whether this works and how to do it?
Thanks.
Use master pages. This is the ASP.NET way of putting the same content on multiple pages without repeating yourself.
All of our aspx pages code-behind classes inherit from the same base class, which allows us to inject standard client side elements (controls, script, etc) into every page using a single point of control.
Our design was implemented before the advent of master pages, but while it could possibly be converted to a master-page design, we have found this implementation to be extremely flexible and responsive to changing needs.
For example, we have two completely separate application designs (different skin, some different behavior) that are based on the same code base and page sets. To support this, we were able to dynamically swap out banners and other UI and script elements with simple modifications to the base class, without having to duplicate every page.
Unfortunately, if you want the script to be in the head element, you will need to ensure that every page's head is marked as runat="server".
Our base class itself inherits from Page, so it can intercept all of the page's events and act on them either instead of or in addition to the inheriting classes (we actually have internal overrideable methods that inheritors should use instead of the page events in order to ensure order of execution).
This is our (VB) code for adding script to the header (in the Page's LoadComplete method):
' sbScript is a StringBuilder that contains all of the JavaScript we want to place in the header
Me.Page.Header.Controls.Add(New LiteralControl(sbScript.ToString))
If it is not possible to change the heads to runat server, you could look into ClientScriptManager method RegisterClientScriptBlock which places the script at the top of the page.
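For comparison, here is a rough C# sketch of the same base-class idea. The class name, placeholder snippet, and fallback behavior are my own assumptions: it injects into the head when <head runat="server"> is available, and otherwise falls back to RegisterClientScriptBlock, which emits the script near the top of the form rather than in the head.

using System;
using System.Web.UI;

public class BasePage : Page   // hypothetical shared base class
{
    // Placeholder; the real Google Analytics snippet would go here.
    private const string AnalyticsScript =
        "<script type=\"text/javascript\">/* analytics code */</script>";

    protected override void OnLoadComplete(EventArgs e)
    {
        base.OnLoadComplete(e);

        if (Header != null)
        {
            // Works only when the page declares <head runat="server">.
            Header.Controls.Add(new LiteralControl(AnalyticsScript));
        }
        else
        {
            // Fallback: places the script near the top of the <form> instead.
            ClientScript.RegisterClientScriptBlock(
                GetType(), "analytics", AnalyticsScript, false);
        }
    }
}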
You can create a base page containing the custom code (such as Google Analytics) in the header, and have the other pages inherit from it. That facilitates two things:
1) If you ever want to change the custom code, you only have to do it in one place
2) No repetitive code, hence more maintainable
I am trying to do the same thing on a legacy app that we're trying to decommission. I need to display a popup on all the old pages to nag users to update their bookmarks to use the new sites, without forcing them to stop using the legacy site (yet). It is not worth the time to convert the site to run on a master page when I can just plop in a popup script, since this whole thing is getting retired soon. The main new site uses a master page, which obviously simplifies things there.
I have this line in a file that has some various constants in it.
Public Shared ReadOnly RetirementNagScript As String = "<script type='text/javascript'> alert('[app name] is being retired and will be shut down [in the near future]. Please update your bookmarks and references to the following URL: [some URL]'); </script>"
Then I am inserting it in Global.asax, in Application_PostAcquireRequestState:
Response.Write(Globals.RetirementNagScript)
Hopefully this is useful to you; I still need to be able to present a clickable URL to the user that way, on each page of the legacy site, and JS alert doesn't do that for me.
Does anyone know of any open source code for contextualization via JS (JavaScript) or ASP.NET? That is, contextualization of content: determining "what" the content is?
It's an interesting area and I can't seem to find any previous projects on it.
I'd really appreciate any help.
Presumably you are looking to build something like a search engine that can find a relevant document in a sea of nondescript documents which do not contain any metadata, only their textual content.
Computers are notoriously bad at this kind of categorization, for the same reasons that they can identify spelling errors but not grammar errors. It's a pattern-matching problem that relies on human context to determine the correct solution.
Google is good at this because it relies on human behaviors to create relevance (like how many links from other sites a page has).
The closest thing I can think of that will do what you want (without actually attaching genuine metadata to each document by hand) is full text search. The Wikipedia article has several references to software that does this.
Depending on what you want to do, it may be easier to mine your page for context after the content has been rendered. That way you can be sure you are working with the content the user actually sees. Here is a post about a jQuery plugin that highlights target words on an HTML page.
Here are some other plugins you might want to review:
quickSearch plugin
QuickSilver Search plugin
I'm looking for an algorithm (or some other technique) to read the actual content of news articles on websites and ignore anything else on the page. In a nutshell, I'm reading an RSS feed programmatically from Google News. I'm interested in scraping the actual content of the underlying articles. On my first attempt, I took the URLs from the RSS feed and simply followed them and scraped the HTML from those pages. This very clearly resulted in a lot of "noise", whether it be HTML tags, headers, navigation, etc.: basically all the information that is unrelated to the actual content of the article.
Now, I understand this is an extremely difficult problem to solve, it would theoretically involve writing a parser for every website out there. What I'm interested in is an algorithm (I'd even settle for an idea) on how to maximize the actual content that I see when I download the article and minimize the amount of noise.
A couple of additional notes:
Scraping the HTML is simply the first attempt I tried. I'm not sold that this is the best way to do things.
I don't want to write a parser for every website I come across, I need the unpredictability of accepting whatever Google provides through the RSS feed.
I know whatever algorithm I end up with is not going to be perfect, but I'm interested in a best possible solution.
Any ideas?
As long as you've accepted the fact that whatever you try is going to be very sketchy given your requirements, I'd recommend you look into Bayesian filtering. This technique has proven to be very effective in filtering spam out of email.
When reading news outside of my RSS reader, I often use Readability to filter out everything but the meat of the article. It is Javascript-based so the technique would not directly apply to your problem, but the algorithm has a high success rate in my experience and is worth a look. Hope this helps.
Take a look at templatemaker (Google code homepage). The basic idea is that you request a few different pages from the same site, then mark down what elements are common across the set of pages. From there you can figure out where the dynamic content is.
Try running diff on two pages from the same site to get an idea of how it works. The parts of the page that are different are the places where there is dynamic (interesting) content.
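As a rough C# illustration of that idea (my own sketch, not templatemaker's actual API): treat lines that appear in both pages as template boilerplate, and keep only the lines unique to one page as the dynamic content.

using System;
using System.Collections.Generic;
using System.Linq;

static class TemplateDiff
{
    // Lines that also occur in the other page are assumed to be template
    // boilerplate; lines unique to pageA are likely its dynamic content.
    public static List<string> DynamicLines(string pageA, string pageB)
    {
        var otherLines = new HashSet<string>(
            pageB.Split('\n').Select(l => l.Trim()));

        return pageA.Split('\n')
                    .Select(l => l.Trim())
                    .Where(l => l.Length > 0 && !otherLines.Contains(l))
                    .ToList();
    }
}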
Here's what I would do after I checked the robots.txt file to make sure it's fine to scrape the article and parsed the document as an XML tree:
Make sure the article is not broken into many pages. If it is, 'print view', 'single page' or 'mobile view' links may help to bring it to single page. Of course, don't bother if you only want the beginning of the article.
Find the main content frame. To do that, I would count the amount of information in every tag. What we're looking for is a node that is big but consists of many small subnodes (see the sketch after this list).
Now I would try to filter out any noise inside the content frame. Well, the websites I read don't put any crap there, only useful images, but you do need to kill anything that has inline javascript and any external links.
Optionally, flatten that into plain text (that is, go into the tree and open all elements; block elements create a new paragraph).
Guess the header. It's usually something with h1, h2 or at least big font size, but you can simplify life by assuming that it somehow resembles the page title.
Finally, find the authors (something with names and email), the copyright notice (try metadata or the word copyright) and the site name. Assemble these somewhere together with the link to the original and state clearly that it's probably fair use (or whatever legal doctrine you feel applies to you).
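For the "find the main content frame" step above, a naive C# sketch of the scoring heuristic could look like the following. It assumes HtmlAgilityPack (or any DOM library) is available; the scoring formula and the candidate tag names are my own guesses, not a proven recipe.

using System;
using System.Linq;
using HtmlAgilityPack;

static class ContentFrameFinder
{
    // Score = text length weighted by the text-to-markup ratio, so large
    // nodes full of actual text beat large nodes full of tags.
    static double Score(HtmlNode node)
    {
        double text = node.InnerText.Length;
        double markup = Math.Max(node.OuterHtml.Length, 1);
        return text * (text / markup);
    }

    public static HtmlNode FindMainContent(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()
                  .Where(n => n.NodeType == HtmlNodeType.Element
                              && (n.Name == "div" || n.Name == "article" || n.Name == "td"))
                  .OrderByDescending(Score)
                  .FirstOrDefault();
    }
}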
There is an almost perfect tool for this job, Boilerpipe.
In fact it has its own tag here, boilerpipe, though it's little used. Here's the description right from the tag wiki:
The boilerpipe library for Java provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The source is all there in the project if you just want to learn the algorithms and techniques, but in fact somebody has already ported it to C# which is quite possibly perfect for your needs: NBoilerpipe.
BTE (Body Text Extraction) is a Python module that finds the portion of a document with the highest ratio of text to tags on a page.
http://www.aidanf.net/archive/software/bte-body-text-extraction
It's a nice, simple way of getting real text out of a website.
Here's my (probably naive) plan for how to approach this:
Assuming the RSS feed contains the opening words of the article, you could use these to locate the start of the article in the DOM. Walk back up the DOM a little (first parent DIV? first non-inline container element?) and snip. That should be the article.
Assuming you can get the document as XML (HtmlAgilityPack can help here), you could (for instance) grab all descendant text from <p> elements with the following Linq2Xml:
// Concatenate the text of every <p> element, separated by blank lines.
string articleText = document
    .Descendants(XName.Get("p", "http://www.w3.org/1999/xhtml"))
    .Select(p => p
        .DescendantNodes()
        .Where(n => n.NodeType == XmlNodeType.Text)
        .Select(t => t.ToString()))
    .Where(c => c.Any())
    .Select(c => c.Aggregate((a, b) => a + b))
    .Aggregate((a, b) => a + "\r\n\r\n" + b);
We successfully used this formula for scraping, but it seems like the terrain you have to cross is considerably more inhospitable.
Obviously not a whole solution, but instead of trying to find the relevant content, it might be easier to disqualify non-relevant content. You could classify certain types of noises and work on coming up with smaller solutions that eliminate them. You could have advertisement filters, navigation filters, etc.
I think the larger question is: do you need one solution that works on a wide range of content, or are you willing to create a framework that you can extend and implement on a site-by-site basis? On top of that, how often do you expect the underlying data sources to change (i.e. their volatility)?
You might want to look at Latent Dirichlet Allocation which is an IR technique to generate topics from text data that you have. This should help you reduce noise and get some precise information on what the page is about.