C# - Show the differences when comparing strings

In my ASP.NET project, I have two strings (actually, they are stored in a Session object, and I call .ToString() on them).
This project is part of the free Japanese language exercises on my website (Italian only for now, so I won't link/spam).
For now I do an if (original == inputted.ToLower()), but I would like to compare the strings and highlight the differences on the screen,
like this:
original: hiroyashi
written by user: hiroyoshi
I was thinking of comparing the two strings, saving the differences in another variable with HTML tags, and then showing it in a Literal control... but what if there are many differences, or the input is shorter? How would I do that?
It looks like this needs a huge amount of coding... or not?

I seem to remember someone asking this not too long ago, and essentially they were pointed at difference engines.
A quick search on codeplex brings up:
http://www.codeplex.com/site/search?projectSearchText=diff
May be worth a hunt through some of those that come up - you may be able to plug something into your existing code?
Cheers,
Terry

John Resig wrote a JavaScript diff algorithm, but he has removed the page explaining what it does from his site. It's still available through the Google cache though. Apologies if linking that is bad, John. It should do what you want. Someone else took it, tweaked it and put an article up about it here, complete with a test page.
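To give an idea of how little code a character-level diff can take, here is a minimal sketch. It is written in Python for brevity and uses the standard library's difflib rather than John Resig's algorithm; the function name highlight_diff and the choice of <del>/<ins> tags are assumptions made for illustration. The same opcode-walking idea should carry over to whichever diff library you plug into the C# project.

```python
import difflib
import html


def highlight_diff(original, typed):
    """Return HTML marking what differs between the two strings.

    Text only in `original` is wrapped in <del>, text only in `typed`
    is wrapped in <ins>. Tag names are an illustrative choice.
    """
    matcher = difflib.SequenceMatcher(None, original, typed, autojunk=False)
    parts = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(original[a1:a2]))
            continue
        # "replace", "delete" and "insert" all reduce to these two cases.
        if a1 < a2:
            parts.append("<del>" + html.escape(original[a1:a2]) + "</del>")
        if b1 < b2:
            parts.append("<ins>" + html.escape(typed[b1:b2]) + "</ins>")
    return "".join(parts)


print(highlight_diff("hiroyashi", "hiroyoshi"))
```

Because the matcher works on ranges, a shorter or longer input needs no special handling: the missing or extra part simply comes back as one deleted or inserted run.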

I am not sure if this would be helpful, but this is how I would do it:
I would use a hashmap and store all the words of the original, separated by spaces, in it.
Then I would check each word of the input against that map.
You can add HTML tags or whatever if they are different.
There is bound to be a performance issue here with a large dictionary of words.
The coding itself would not be long though.
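A rough sketch of that word-lookup idea, in Python for brevity (the function name mark_unknown_words and the <b> tag are assumptions for illustration). Note that it only checks whether each typed word occurs somewhere in the original; it ignores word order and says nothing about words the user left out.

```python
import html


def mark_unknown_words(original, typed):
    """Wrap every typed word that does not occur in the original in <b>."""
    known = set(original.lower().split())  # hash-based lookup of original words
    marked = []
    for word in typed.split():
        safe = html.escape(word)
        if word.lower() in known:
            marked.append(safe)
        else:
            marked.append("<b>" + safe + "</b>")
    return " ".join(marked)


print(mark_unknown_words("the quick brown fox", "the quick red fox"))
```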

Related

Any tips on how I would go about extracting Pandora likes and putting them on a spreadsheet? (C++/C#)

I'm fairly new to coding and I want a project to work on that could help me advance my skills. I'm not sure what language would be best for this sort of undertaking, but I would definitely prefer to use C++ or C#.
For the first part of the program I would like to take all my Pandora likes and put them on a spreadsheet, with the song name in one column and the artist in the other. I don't see the formatting being too hard once I actually have the data I need, but I'm not really sure how to communicate with a server at all at this point in time. I'm guessing I probably won't be able to grab a raw list of likes, so I'm thinking my best course of action will be to first expand the likes list all the way, and then read the text on the screen or in the source code.
For the first step, expanding my likes, I found the HTML source code that actually does this:
<div class="show_more tracklike" data-nextLikeStartIndex="0" data-nextThumbStartIndex="5">Show more</div>
I'm not sure if this is something I can work with, but I was thinking that if I could set data-nextThumbStartIndex="5" to be equal to the number of likes minus 5 (the amount it shows by default), it would be fairly easy to expand the list. If not, I would probably have to click the "Show more" link repeatedly until I have all the likes on the page.
For the next step, getting the data I want, I think my best option would be to just grab the text that I physically see on the screen and worry about filtering and manipulating the data afterwards. The other option is looking at the source code, where I actually found the pieces of code in which the info I want is stored. If I could retrieve the page's source code, I think it would be relatively easy to pick out the data I actually want from it.
So yeah, that's about it. I know I'm pretty much a beginner at the moment and what I'm saying is probably wrong and/or much more complicated than I think, but I'm a pretty quick learner, and at the very least, if someone could point me in the right direction on how to communicate with a server, that would be much appreciated.
This question is quite "wide" (and I have absolutely no knowledge of Pandora itself - can't access it from where I live).
In general, there are several different ways to solve this type of problem:
Screen scraping - basically access the website as if you were a web browser, and from the HTML string that comes back, dig out the information you need. The problem here is that the data is not very suitable for "machine reading", as it often has no distinct points for the "reader" to find the relevant information, and it's difficult to sort the data from the "chaff".
AJAX API - "Asynchronous JavaScript and XML", where the provider of the website has an interface that the web browser uses to fetch certain data. If you "pretend" to be the web browser and request the same type of information, you get the same data back. You are relying on the website to have such an interface, but if it exists, the data is generally in a "more suitable form to be machine read" (typically XML, but not always).
JSON API - "JavaScript Object Notation" is a similar solution to AJAX - like XML, JSON is a "human and machine readable format".
The latter two are definitely preferable, as the data coming back is meant for machine reading. The drawback is that you need to have "server side cooperation". The good thing here is that Pandora does have a JSON API. The bad thing is that it seems to be hard to use... Here's one discussion on the subject:
Making JSON calls to Unoffical Pandora API
The main principle here is that you send some stuff to the webserver, and receive a reply with the requested information. Exactly how this is done depends on the language/programming environment. A popular C++ solution is libcurl.
There is a Ruby Client here, using the JSON interface
https://github.com/nixme/pandora_client
A C# implementation to interface with Pandora is here:
http://pandoraunleashed.googlecode.com/svn/trunk/PandoraUnleashed/Pandora.cs
Unfortunately, I can't find any direct reference to "listing likes".
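For the screen-scraping route, digging values out of fetched HTML can look roughly like the sketch below. It is in Python for brevity and uses only the standard library's html.parser; the class and function names are hypothetical, and the only input it assumes is the "Show more" div quoted in the question. Fetching the page itself (with whatever login Pandora requires) is not shown. One detail to be aware of: this parser lower-cases attribute names.

```python
from html.parser import HTMLParser


class ShowMoreParser(HTMLParser):
    """Collects the data-* attributes of the div with class "show_more"."""

    def __init__(self):
        super().__init__()
        self.data_attrs = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "show_more" in classes:
            # Attribute names arrive lower-cased from HTMLParser.
            self.data_attrs = {
                name: value
                for name, value in attr_map.items()
                if name.startswith("data-")
            }


def read_show_more(page_html):
    parser = ShowMoreParser()
    parser.feed(page_html)
    return parser.data_attrs


snippet = ('<div class="show_more tracklike" data-nextLikeStartIndex="0" '
           'data-nextThumbStartIndex="5">Show more</div>')
print(read_show_more(snippet))
```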

Parse numbers from large text, possibly without regex (performance critical)

I'm extremely familiar with regex, so before you all start answering with variations of \d+:
I want to know if there are alternatives to regex for parsing numbers out of a large text file.
I'm parsing through tons of huge files and need to do some group/location analysis on the positions of keywords. I'm now at the point where I need to start finding groups of numbers nested closely to my content of interest as well. I want to avoid regex if at all possible because this needs to be a speedy process.
It is possible to take chunks of a file and inspect them for the numbers of interest. That, however, would require more work and add hard-coded limits for searching (I'd like to avoid this).
I'm open to any suggestions.
UPDATE
Sorry for the lack of sample data. For HIPAA reasons I'd rather not even consider scrambling the text and posting it.
A great substitute would be the HTML source of any stackoverflow.com question page. Imagine I needed to grab the reputation (score) of all people that posted an answer to a question. This also means that the comma (,) is needed as well. I can't remove the html to simplify the content because I'm using some density analysis to weed out unrelated content. Removing the HTML would mix content too close together.
Unless the file is some sort of SGML, I don't know of any method (which is not to say there isn't one, I just don't know of one).
However, that's not to say that you can't create your own parser; you could eliminate some of the overhead of the .NET regex library by writing something that only finds runs of digits.
Fundamentally, I guess that's all any library would do at the most basic level.
It might help if you could post a sample of the sort of data you'll be processing.
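A hand-written scanner of that kind can be quite short. The sketch below is in Python for brevity (the function name find_numbers is hypothetical) and makes one pass over the text, yielding each run of ASCII digits together with its start position. As an assumption based on the reputation example in the question, a comma is kept only when it sits between two digits. Whether this actually beats a compiled regex would need to be measured on real data.

```python
def find_numbers(text):
    """Yield (start_index, number_text) for each run of digits in text.

    A comma is treated as part of the number only when a digit follows it,
    so "12,345" is one match but the comma in "12, 345" ends the first one.
    """
    i, n = 0, len(text)
    while i < n:
        if "0" <= text[i] <= "9":
            start = i
            i += 1
            while i < n:
                c = text[i]
                if "0" <= c <= "9":
                    i += 1
                elif c == "," and i + 1 < n and "0" <= text[i + 1] <= "9":
                    i += 1
                else:
                    break
            yield start, text[start:i]
        else:
            i += 1


sample = '<span class="rep">12,345</span> and 7'
for position, number in find_numbers(sample):
    print(position, number)
```

Keeping the start index with every match is what makes the later position/density analysis possible without re-scanning the file.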

Sum 2 numbers in a text field

I am using Telerik controls in an ASP.Net application for invoice entry.
I'm looking for the ability to enter multiple amounts in one numeric field and add them together, just like in QuickBooks.
Keystrokes:
=0.12+3.45 followed by TAB or ENTER
adds the values to get 3.57 and jumps to the next field.
Does anyone have any ideas?
You have three choices here:
Write your own method to parse the formula (more complicated than it sounds, even if you think it sounds complicated).
Use a pre-written math parser. This is a decent one.
Use eval().
NOTE: Should you choose option #3, exercise extreme caution and do your research first. eval() can leave your application open to all sorts of nasty code injection.
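If the field only ever needs addition and subtraction, as in the =0.12+3.45 example, the first option stays small. The following is a minimal sketch under that assumption, written in Python for brevity; the function name evaluate_sum is hypothetical, and multiplication, division and parentheses are deliberately not handled. It uses decimal arithmetic so that invoice amounts are not subject to binary floating-point rounding, and it never calls eval().

```python
from decimal import Decimal, InvalidOperation


def evaluate_sum(entry):
    """Evaluate entries such as "=0.12+3.45" that use only + and -."""
    text = entry.strip()
    if text.startswith("="):
        text = text[1:]

    total = Decimal("0")
    sign = 1
    number = ""
    # The trailing "+" is a sentinel that flushes the last number.
    for ch in text + "+":
        if ch in "+-":
            if number:
                try:
                    total += sign * Decimal(number)
                except InvalidOperation:
                    raise ValueError("not a number: " + number)
                number = ""
                sign = 1 if ch == "+" else -1
            elif ch == "-":
                sign = -sign  # unary minus, e.g. "1+-2"
        elif ch.isdigit() or ch == ".":
            number += ch
        elif not ch.isspace():
            raise ValueError("unexpected character: " + ch)
    return total


print(evaluate_sum("=0.12+3.45"))
```

Anything outside digits, the decimal point, +, - and whitespace is rejected with a ValueError, which is what keeps this safer than handing user input to eval().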
You'll need to build a lexical parser for in-fix (rather than prefix) notation. There should be plenty of examples around - it's a pretty standard concept used on a lot of degree courses.
As to whether it's a good idea to do this (I assume client-side)...

How to detect abusive comments on my website?

I have a website where I have given users the opportunity to share their status. How can I detect whether any abusive or slang words are used, so that I can block such comments?
Is there any library or trick to detect such kind of comments in .NET?
It is not a trick; use a dictionary of bad words, and add some logic to handle "bad words" that show up in innocent places. Add the ability for users to post complaints about miscorrections made by your logic (so you can fine-tune it) and that's it.
Implementation is pretty easy, and as for a dictionary of "bad words" - either look one up, or write your own.
(I used to collect bad words from customer complaints on a chat service - after a year it was almost bulletproof.)
This is actually quite difficult to automate and do accurately without unintended side effects. You can maintain a dictionary of bad words, and use regular expressions to replace occurrences of those bad words. Please see my answer to the following question for example code, plus some of the issues:
Replace Bad words using Regex
Automated approaches have a number of shortcomings: false positives, missing bad words that are not in the dictionary, and minor variations of bad words that are not detected. Involvement from users can be used to bolster this or as an alternative approach, e.g. SO has the ability to flag comments, and moderators can delete or censor them.
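As a rough illustration of the dictionary-plus-regex approach (not the code from the linked answer), here is a minimal sketch in Python. The word list contains mild placeholders, and the names BAD_WORDS, contains_bad_word and censor are assumptions for illustration. Matching on word boundaries avoids one class of false positive, where a listed word appears inside a harmless longer word, but it does nothing about misspellings or deliberate variations.

```python
import re

# Placeholder entries; a real list would be loaded from a file or database.
BAD_WORDS = ["darn", "heck"]

# \b restricts matches to whole words, so "checkered" does not match "heck".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in BAD_WORDS) + r")\b",
    re.IGNORECASE,
)


def contains_bad_word(comment):
    """True if any listed word occurs as a whole word in the comment."""
    return PATTERN.search(comment) is not None


def censor(comment):
    """Replace each listed word with asterisks of the same length."""
    return PATTERN.sub(lambda match: "*" * len(match.group(0)), comment)


print(contains_bad_word("What the HECK"))
print(censor("Well, darn it"))
```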
There are some bad word lists around which you can download and use.
eg. http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/
The best thing to do is to start with a small list and add to it based on the real comments made on your site. You can put a report link on the comments so other visitors can notify you if there are bad comments made.

Filter out common words for search query

Are there any easy ways to implement filtering a user's input (possibly a question) by extracting the meaningful data in the query?
I basically want to filter out any noise words so I can send a 'clean' query to Google's search api.
Um, won't Google do this for you? Send all those dirty, filthy words to Google and let them clean them up for you.
Jeff talked about "stop words" in one of the previous Stack Overflow podcasts. You might try searching for that phrase on Google. The Wikipedia page seems to have some overview and pointers to options.
http://en.wikipedia.org/wiki/Stop_words
You can try removing the top X most common English words, but you will always run into trouble with a naive approach like this.
This is because common English words can have special significance in the realm of Computer Science (or other areas). A recent SO podcast (#32) mentions this very issue.
I used the stop words approach when implementing a basic search engine and it worked fine.
Try a sample list like the one here
Based on feedback from your users, you can modify your stop word list accordingly.
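A stop-word filter of the kind described in these answers can be sketched in a few lines. The example is in Python for brevity; the tiny STOP_WORDS set and the function name clean_query are assumptions for illustration, not a recommended list. Only trailing sentence punctuation is stripped, so that terms such as "C#" survive.

```python
# A deliberately small, illustrative list; extend it based on user feedback.
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "how",
              "to", "in", "of", "for", "do", "i"}


def clean_query(query):
    """Drop stop words from a query, keeping the remaining words in order."""
    kept = []
    for token in query.split():
        word = token.rstrip("?!.,")  # keep symbols inside terms like "C#"
        if word and word.lower() not in STOP_WORDS:
            kept.append(word)
    return " ".join(kept)


print(clean_query("What is the best way to parse XML in C#?"))
```

Since common English words can carry special meaning in some domains, it may be worth keeping a second list of protected terms that are never removed.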
