Simple solution to create tree-like structure in html - c#

I am trying to create a tree-like structure for a webpage. In my case a user has a set of projects, each project has a set of tasks, tasks can have sub-tasks, and sub-tasks can have their own sub-tasks. So, basically, I am looking for an elegant solution to display this tree-like structure using HTML elements in an aesthetically pleasing manner. Any tips or ideas would be appreciated.

A good place to start would be to write some pseudo-code. Most people on this site won't give you an entire solution. I have almost no knowledge of HTML, but in every other programming language the first place to start would be pseudo-code. It also sounds like you need to create some objects and structure your code around them.

I would leave this as a comment but I'm lacking a couple of rep points. You may want to focus more on the JavaScript part of building the tree; the HTML is the easy part. Do some searching on jQuery treeview controls and see what you come back with. Here's an example:
http://www.jstree.com/
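If it helps, here's a rough server-side sketch in C# (my own, not from the jsTree docs), using a hypothetical TaskNode class for the project/task/sub-task hierarchy and rendering it recursively into the nested UL/LI markup that jsTree, or plain CSS, can style and collapse:

using System.Collections.Generic;
using System.Text;
using System.Web; // HttpUtility.HtmlEncode

// Hypothetical node type: a task that can contain sub-tasks (projects work the same way).
public class TaskNode
{
    public string Title;
    public List<TaskNode> Children = new List<TaskNode>();
}

public static class TreeRenderer
{
    // Recursively emits <li>title<ul>...children...</ul></li> for one node.
    public static void Render(TaskNode node, StringBuilder sb)
    {
        sb.Append("<li>").Append(HttpUtility.HtmlEncode(node.Title));
        if (node.Children.Count > 0)
        {
            sb.Append("<ul>");
            foreach (var child in node.Children)
                Render(child, sb);
            sb.Append("</ul>");
        }
        sb.Append("</li>");
    }
}

Wrap the root items in a single <ul> and hand the resulting markup to jsTree (it can build its tree straight from nested lists), or just style the lists with CSS if you don't need collapsing.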

Follow the link here.
It gives a very good explanation, and you can customize it according to your needs.

Related

Loading content from DB to aspx page based on URL

I'm currently trying to create a small-scale CMS for my personal website, and I thought I'd try to build some sort of page layout from a basic .aspx file with some placeholders, then load content based on the URL, without using query strings such as ?pageid=1.
I'm trying to wrap my head around how this can be achieved without getting errors of a physical file not existing when I e.g. type in http://mywebsite.com/projects/w8apps/clock.
I've read a lot about BLOBs and storing files as binary data in the database, but I haven't come across a blog that points in the direction of using a so-called page layout and loading content based on the URL instead of a query string.
I'm not asking for a solution, just some hints - blogs mostly - which can point me in the right direction and help me achieve this goal.
To deal with loading a page with a URL that is more friendly, rather than ?page_id=1, you may want to have a look at this article about URL Rewriting and URL Mapping.
http://www.codeproject.com/Articles/18318/URL-Mapping-URL-Rewriting-Search-Engine-Friendly-U
Hope you can find a way of fitting this kind of code into your application!
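If you end up staying on Web Forms rather than moving to MVC, ASP.NET routing (System.Web.Routing, .NET 4 and later) is a related alternative to the rewriting approach in that article. A rough sketch, where the route pattern and the PageLayout.aspx name are just my made-up example:

// Global.asax.cs - register a friendly route that maps to a single layout page.
using System;
using System.Web.Routing;

public class Global : System.Web.HttpApplication
{
    void Application_Start(object sender, EventArgs e)
    {
        // "projects/w8apps/clock" -> ~/PageLayout.aspx with section/app/page route values.
        RouteTable.Routes.MapPageRoute(
            "ContentRoute",            // route name (arbitrary)
            "{section}/{app}/{page}",  // friendly URL pattern
            "~/PageLayout.aspx");      // the one physical page that renders everything
    }
}

Inside PageLayout.aspx.cs you can then read Page.RouteData.Values["page"] (and the other segments) and pull the matching content row from the database.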
Your question is too broad, but here are a couple of hints that will point you in the right direction.
Create clear specs before you start working on this. Do you really need URLs like http://mywebsite.com/projects/w8apps/clock? If so, check out ASP.NET MVC, since it has the best support for this.
Storing binary files in the database doesn't have much to do with this. You first need to think about what your tables will look like, and that depends on what you are trying to achieve.
I'd suggest you install an open-source CMS and analyze it first; you'll probably find a lot of better ideas that way. Just go to CodePlex and search for CMS.

How can I make an application in c# collect data from a website?

First of all, I hope my question doesn't bother you. I really need to get an idea of how I can accomplish this, but unfortunately I'm really a beginner; I'm crawling when it comes to programming. I'm struggling to learn it the best way I can, and I'll be thankful for any help you give me.
Here's the task: I was asked to find a way to collect some data from a website using a C# application. This will be done every day, in order to update the data we'll use to calculate a financial index.
I know my question might sound vague; even telling me how I can be more precise will help. I know I seem desperate, but putting all the personal issues aside, my scholarship kind of depends on it.
Thanks in advance! (Please don't mind the bad English; I'm Brazilian and my English might not be that good yet.)
First, your English is fine. In fact, I thought you were a native speaker until you said otherwise.
The term you're looking for is 'site scraping'. Have a look at this question: Options for HTML scraping?. The second answer points to the Html Agility Pack, a library you can use.
Now, there are two possibilities here. The first is you have to parse the HTML and scrape your data out of it. This is more computationally intensive and depends on the layout of the page. If they change the way the site looks, it could break the scraper.
The second possibility is they provide some XML or JSON web service you can consume. In this case you aren't scraping anything, but are rather using a true data feed. If the layout of the site changes, you will not break. Whether your target site supports this form of data feed is up to the site.
If I understand your question, you're being asked to do some Web Scraping, where you 1) download the contents of a web page and 2) try to parse data from that content.
For step #1, you should look into using a WebClient object in C# to download the HTML from the web page. You can give a WebClient object the URL you want to download the content from and obtain a String containing the content (probably HTML) of the URL.
How you go about doing step #2 depends on what content is present at the web site. If you know of certain patterns you're looking for in the HTML, you can search the HTML string using various methods. A more general solution for parsing HTML data is the Html Agility Pack, which lets you handle the HTML as a tree structure (DOM).
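As a rough sketch of both steps (the URL and the "quote" class name are placeholders, and the Html Agility Pack is assumed to be referenced):

using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

class Scraper
{
    static void Main()
    {
        // Step 1: download the raw HTML of the page as a string.
        string html;
        using (var client = new WebClient())
            html = client.DownloadString("http://example.com/prices"); // placeholder URL

        // Step 2: load it into a DOM-like tree and walk it.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var cells = doc.DocumentNode
                       .Descendants("td")
                       .Where(td => td.GetAttributeValue("class", "") == "quote"); // made-up class
        foreach (var td in cells)
            Console.WriteLine(td.InnerText.Trim());
    }
}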
Use the WebClient class to get the page.
Turn the HTML into XML.
Use XPath to select the data you are interested in.
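Sketched out, one way to follow that recipe is to let the Html Agility Pack play the "HTML into XML" role and then query the tree with XPath (the URL and the XPath expression below are placeholders):

using System;
using System.Net;
using HtmlAgilityPack;

class XPathScrape
{
    static void Main()
    {
        string html;
        using (var client = new WebClient())
            html = client.DownloadString("http://example.com/report"); // placeholder URL

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of messy, non-well-formed real-world HTML

        // XPath over the parsed tree; adjust the expression to the page's actual layout.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='figure']/span");
        if (nodes == null) return; // SelectNodes returns null when nothing matches
        foreach (var node in nodes)
            Console.WriteLine(node.InnerText.Trim());
    }
}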
Ok, this is a pretty straightforward app design, and a lot of the code exists that you can reuse. Since you're a beginner, I'll break down into steps of what you need to do and recommend approaches.
1) You will use classes from System.Net to pull the web pages (WebClient being the easiest to use). You will want to have this part of the program run on a timer if you can (using the scheduled jobs feature of the OS) and have it just pull the pages and drop them in a folder (there's a sketch of this step after the list).
2) You have a second job which runs separately, pulling unread files from that folder, parsing them (using the Html Agility Pack library is best) and then storing them in an index of some kind (Lucene is best for that)
3) You have a front end application of some sort (web or desktop) which queries that index for the information you're looking for.
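Here's a bare-bones sketch of step 1, assuming the URLs to pull live in a plain text file and the raw HTML gets dropped into a folder for the second job to pick up (all paths and file names here are made up):

using System;
using System.IO;
using System.Net;

class FetchJob
{
    static void Main()
    {
        // Placeholder locations - point these wherever makes sense for you.
        var urls = File.ReadAllLines(@"C:\scrape\urls.txt");
        Directory.CreateDirectory(@"C:\scrape\inbox");

        using (var client = new WebClient())
        {
            int i = 0;
            foreach (var url in urls)
            {
                // Timestamp + counter in the name so the parser job can tell runs apart.
                string name = string.Format("{0:yyyyMMdd_HHmmss}_{1}_{2}.html",
                                            DateTime.UtcNow, new Uri(url).Host, i++);
                File.WriteAllText(Path.Combine(@"C:\scrape\inbox", name),
                                  client.DownloadString(url));
            }
        }
    }
}

Run it from Task Scheduler and the second job only ever has to care about files in the inbox folder.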

Detecting hyperlinks in WPF RichTextBox

Hey folks, I'm wanting to write some rudimentary support for detecting hyperlinks in a WPF RichTextBox control. My plan is to use a regex to identify any links and then manually replace them with real hyperlink objects.
However the part I am having trouble with is getting the correct textpointers, etc. once I find a link. For example, I can flatten the entire document to a text string and find links, but once I do that how can I get the proper pointer to the block that needs url-ifying?
Perhaps a better approach would be to iterate over blocks in the document, assuming a url would not span multiple blocks, however even then I have very little experience working with the RichTextBox/FlowDocument object model so any pointers (pun intended) would be helpful. Thanks!
I think you might find this useful:
http://blogs.msdn.com/b/prajakta/archive/2006/10/17/autp-detecting-hyperlinks-in-richtextbox-part-i.aspx
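In case the link goes stale, the gist of the technique (a sketch of my own, not the code from that post) is: walk the document's text runs, regex for URLs, convert the match offsets back into TextPointers, and wrap each range in a Hyperlink. Collect the matches first and apply them afterwards, because inserting Hyperlinks restructures the runs you would otherwise still be walking:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Windows.Documents;

static class LinkDetector
{
    static readonly Regex UrlRegex = new Regex(@"https?://[^\s]+"); // deliberately simplistic

    public static void Urlify(FlowDocument document)
    {
        // Pass 1: walk the text runs and remember where the URLs are.
        var found = new List<Tuple<TextPointer, TextPointer, string>>();
        var pointer = document.ContentStart;
        while (pointer != null)
        {
            if (pointer.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text)
            {
                string run = pointer.GetTextInRun(LogicalDirection.Forward);
                foreach (Match m in UrlRegex.Matches(run))
                {
                    found.Add(Tuple.Create(
                        pointer.GetPositionAtOffset(m.Index),
                        pointer.GetPositionAtOffset(m.Index + m.Length),
                        m.Value));
                }
            }
            pointer = pointer.GetNextContextPosition(LogicalDirection.Forward);
        }

        // Pass 2: wrap each range in a Hyperlink after the walk is finished.
        foreach (var hit in found)
        {
            var link = new Hyperlink(hit.Item1, hit.Item2); // wraps the range in-place
            link.NavigateUri = new Uri(hit.Item3);
        }
    }
}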

Algorithm for reading the actual content of news articles and ignoring "noise" on the page?

I'm looking for an algorithm (or some other technique) to read the actual content of news articles on websites and ignore anything else on the page. In a nutshell, I'm reading an RSS feed programmatically from Google News. I'm interested in scraping the actual content of the underlying articles. On my first attempt I have the URLs from the RSS feed and I simply follow them and scrape the HTML from that page. This very clearly resulted in a lot of "noise", whether it be HTML tags, headers, navigation, etc. Basically all the information that is unrelated to the actual content of the article.
Now, I understand this is an extremely difficult problem to solve, it would theoretically involve writing a parser for every website out there. What I'm interested in is an algorithm (I'd even settle for an idea) on how to maximize the actual content that I see when I download the article and minimize the amount of noise.
A couple of additional notes:
Scraping the HTML is simply the first attempt I tried. I'm not sold that this is the best way to do things.
I don't want to write a parser for every website I come across; I need to handle the unpredictability of accepting whatever Google provides through the RSS feed.
I know whatever algorithm I end up with is not going to be perfect, but I'm interested in a best possible solution.
Any ideas?
As long as you've accepted the fact that whatever you try is going to be very sketchy given your requirements, I'd recommend you look into Bayesian filtering. This technique has proven to be very effective in filtering spam out of email.
When reading news outside of my RSS reader, I often use Readability to filter out everything but the meat of the article. It is Javascript-based so the technique would not directly apply to your problem, but the algorithm has a high success rate in my experience and is worth a look. Hope this helps.
Take a look at templatemaker (Google code homepage). The basic idea is that you request a few different pages from the same site, then mark down what elements are common across the set of pages. From there you can figure out where the dynamic content is.
Try running diff on two pages from the same site to get an idea of how it works. The parts of the page that are different are the places where there is dynamic (interesting) content.
Here's what I would do after I've checked the robots.txt file to make sure it's fine to scrape the article and parsed the document as an XML tree:
Make sure the article is not broken into many pages. If it is, 'print view', 'single page' or 'mobile view' links may help to bring it to single page. Of course, don't bother if you only want the beginning of the article.
Find the main content frame. To do that, I would count the amount of information in every tag. What we're looking for is a node that is big but consists of many small subnodes (there's a rough sketch of this in code after the list).
Now I would try to filter out any noise inside the content frame. Well, the websites I read don't put any crap there, only useful images, but you do need to kill anything that has inline javascript and any external links.
Optionally, flatten that into plain text (that is, go into the tree and open all elements; block elements create a new paragraph).
Guess the header. It's usually something with h1, h2 or at least big font size, but you can simplify life by assuming that it somehow resembles the page title.
Finally, find the authors (something with names and emails), the copyright notice (try the metadata or the word copyright) and the site name. Assemble these somewhere together with the link to the original and state clearly that it's probably fair use (or whatever legal doctrine you feel applies to you).
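To make step 2 a bit more concrete, here's a toy C# version of the "count the information in every tag" idea using the Html Agility Pack; the 0.6 cut-off and the scoring rule are arbitrary and only for illustration:

using System.Linq;
using HtmlAgilityPack;

static class ContentFrameFinder
{
    // "Information" in a subtree = total length of its visible text.
    static int TextLength(HtmlNode node)
    {
        if (node.Name == "script" || node.Name == "style")
            return 0; // ignore script/style noise
        if (node.NodeType == HtmlNodeType.Text)
            return node.InnerText.Trim().Length;
        return node.ChildNodes.Sum(TextLength);
    }

    // Descend while a single child still dominates; stop when the text is spread
    // over many small subnodes - that node is our guess for the content frame.
    public static HtmlNode FindMainContent(HtmlDocument doc)
    {
        var candidate = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;
        while (true)
        {
            int candidateText = TextLength(candidate);
            var best = candidate.ChildNodes
                                .Where(c => c.NodeType == HtmlNodeType.Element)
                                .OrderByDescending(TextLength)
                                .FirstOrDefault();
            if (best == null || TextLength(best) < 0.6 * candidateText)
                return candidate;
            candidate = best;
        }
    }
}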
There is an almost perfect tool for this job, Boilerpipe.
In fact it has its own tag here, boilerpipe, though it's little used. Here's the description right from the tag wiki:
The boilerpipe library for Java provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The source is all there in the project if you just want to learn the algorithms and techniques, but in fact somebody has already ported it to C# which is quite possibly perfect for your needs: NBoilerpipe.
BTE (Body Text Extraction) is a Python module that finds the portion of a document with the highest ratio of text to tags on a page.
http://www.aidanf.net/archive/software/bte-body-text-extraction
It's a nice, simple way of getting real text out of a website.
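The idea translates to other languages too; here's a toy C# version of the same text-versus-tags idea (mine, not a port of the module): score each token +1 for a word and -1 for a tag, then take the contiguous stretch with the highest score, which on most article pages is the body text:

using System.Linq;
using System.Text.RegularExpressions;

static class BodyTextExtraction
{
    public static string Extract(string html)
    {
        // Tokenise the page into tags and words.
        var tokens = Regex.Matches(html, @"<[^>]+>|[^<\s]+")
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();

        // Maximum-subarray scan: words score +1, tags score -1.
        int bestStart = 0, bestEnd = -1, bestScore = 0;
        int start = 0, score = 0;
        for (int i = 0; i < tokens.Length; i++)
        {
            score += tokens[i].StartsWith("<") ? -1 : 1;
            if (score > bestScore) { bestScore = score; bestStart = start; bestEnd = i; }
            if (score < 0) { score = 0; start = i + 1; }
        }

        // Keep only the words inside the winning window.
        return string.Join(" ",
            tokens.Skip(bestStart)
                  .Take(bestEnd - bestStart + 1)
                  .Where(t => !t.StartsWith("<")));
    }
}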
Here's my (probably naive) plan for how to approach this:
Assuming the RSS feed contains the opening words of the article, you could use these to locate the start of the article in the DOM. Walk back up the DOM a little (first parent DIV? first non-inline container element?) and snip. That should be the article.
Assuming you can get the document as XML (HtmlAgilityPack can help here), you could (for instance) grab all the descendant text from <p> elements with the following Linq2Xml:
// Requires System.Xml.Linq (XDocument, XName) and System.Xml (XmlNodeType).
string articleText = document
    .Descendants(XName.Get("p", "http://www.w3.org/1999/xhtml"))
    .Select(p => p
        .DescendantNodes()
        .Where(n => n.NodeType == XmlNodeType.Text)
        .Select(t => t.ToString()))                 // the text fragments of one <p>
    .Where(c => c.Any())                            // drop empty paragraphs
    .Select(c => c.Aggregate((a, b) => a + b))      // join fragments within a paragraph
    .Aggregate((a, b) => a + "\r\n\r\n" + b);       // blank line between paragraphs
We successfully used this formula for scraping, but it seems like the terrain you have to cross is considerably more inhospitable.
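One note on wiring it up (my assumption, not part of the snippet above): document there is a System.Xml.Linq XDocument, so the page needs to be well-formed before you can query it. The Html Agility Pack can do that clean-up, for example:

var hap = new HtmlAgilityPack.HtmlDocument();
hap.OptionOutputAsXml = true;   // make Save() emit XML-compatible markup
hap.LoadHtml(rawHtml);          // rawHtml = the downloaded page
var sw = new System.IO.StringWriter();
hap.Save(sw);
var document = System.Xml.Linq.XDocument.Parse(sw.ToString());

Also note that the query assumes the XHTML namespace; if the cleaned-up document has no namespace on its elements, use Descendants("p") instead of XName.Get("p", "http://www.w3.org/1999/xhtml").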
Obviously not a whole solution, but instead of trying to find the relevant content, it might be easier to disqualify non-relevant content. You could classify certain types of noises and work on coming up with smaller solutions that eliminate them. You could have advertisement filters, navigation filters, etc.
I think the larger question is: do you need one solution that works across a wide range of content, or are you willing to create a framework that you can extend and implement on a site-by-site basis? On top of that, how often do you expect the underlying data sources to change (i.e. their volatility)?
You might want to look at Latent Dirichlet Allocation which is an IR technique to generate topics from text data that you have. This should help you reduce noise and get some precise information on what the page is about.

Multiline ddl Custom Control

One of the guys I work with needs a custom control that would work like a multiline DDL (drop-down list), since such a thing does not exist as far as we have been able to discover.
Does anyone have any ideas, or has anyone created such a thing before?
We have a couple of ideas, but they involve too much database usage.
We'd prefer that it be FREE!!!
We use a custom modified version of suckerfish at work. DB performance isn't an issue for us because we cache the control.
The control renders out nested UL/LIs either for all nodes in the web.sitemap or for a certain set of pages pulled from the DB. We then use jQuery to do all the cool javascript stuff. Because it uses such basic HTML, it's pretty easy to have multi-line or wrapped long items once you style it with CSS.
Have a look at EasyListBox. I used it on a project and, while a bit quirky at first, it got the job done.
I'm not sure exactly what you mean by multi-line, but if it is selecting multiple elements in a drop down list, see this demo.
If it's showing elements that wrap multiple lines in a drop down, see this demo. You can put a break in the HTML to achieve what you might be looking for. I've used this control in this manner before, so I can confirm it works.
Good luck.
