I am trying to detect whether a visitor is human or not. I just had an idea, though I'm not sure if it will work: what if I store a cookie in the visitor's browser and retrieve it while they are browsing my site? If I successfully retrieve the cookie, is that a good technique for detecting bots and spiders?
A well-designed bot or spider can certainly store -- and send you back -- whatever cookies you're sending. So, no, this technique won't help one bit.
Browsers are just code. Bots are just code. Code can do anything you program it to, and that includes handling cookies.
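To illustrate the point (a minimal sketch, not any particular bot's code): in C#, an HttpClient with a CookieContainer round-trips your cookies automatically, exactly like a browser would.

    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class CookieAwareBot
    {
        static async Task Main()
        {
            var handler = new HttpClientHandler
            {
                CookieContainer = new CookieContainer(),   // stores any Set-Cookie headers
                UseCookies = true
            };
            using (var client = new HttpClient(handler))
            {
                // First request: the server sets its "I am human" cookie.
                await client.GetAsync("https://example.com/");

                // Second request: the cookie is sent back, just like a browser would.
                await client.GetAsync("https://example.com/some-page");
            }
        }
    }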
Bots, spammers and the like work on the principle of low-hanging fruit. They're after as many sites or users as they can get with as little effort as possible. Thus they go after popular packages like phpBB and vBulletin because getting into those will get them into a lot of sites.
By the same token, they won't spend a lot of effort to get into your site if that effort pays off only on your site (unless your site happens to be Facebook or the like). So the best defense against malicious activity of this kind is simply to be different, in such a way that an off-the-shelf script won't work on your site.
But an "I am human" cookie isn't the answer.
No, as Alex says, this won't work; the typical process is to use a robots.txt file to ask well-behaved crawlers to stay away. Beyond that, you can start to inspect the User-Agent string (but this can be spoofed). Any more work than this and you're into CAPTCHA territory.
What are you actually trying to avoid?
You should take a look at the information in the actual HTTP headers and how .NET exposes it to you; the extra information you have about the visitor hitting your website is there. Take a look at what Firefox sends by installing the Live HTTP Headers plugin and visiting your own site. At the page level, the Request.Headers property exposes this information (I don't know if it's the same in ASP.NET MVC, though). The important header for what you want is User-Agent. It can be altered, obviously, but the major crawlers will let you know who they are by sending a unique User-Agent that identifies them. The same goes for the major browsers.
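For example, a rough sketch of inspecting the User-Agent in ASP.NET (the crawler tokens below are illustrative, not an authoritative list):

    using System;
    using System.Web;

    public static class CrawlerDetector
    {
        // Well-known tokens that major crawlers put in their User-Agent strings.
        private static readonly string[] KnownCrawlerTokens =
            { "Googlebot", "Bingbot", "Slurp", "DuckDuckBot" };

        public static bool LooksLikeCrawler(HttpRequest request)
        {
            string userAgent = request.Headers["User-Agent"] ?? string.Empty;

            foreach (var token in KnownCrawlerTokens)
            {
                if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
                    return true;   // self-identified crawler (remember: UA can be spoofed)
            }
            return false;
        }
    }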
I wrote a bot that works with cookies and JavaScript. The easiest form of bot/spam prevention is to use the NoBot component in the Ajax Control Toolkit.
http://www.asp.net/AJAX/AjaxControlToolkit/Samples/NoBot/NoBot.aspx
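For reference, a rough code-behind check might look like this, assuming a NoBot control declared in the page markup (the control ID, page class, and event handler names are mine, not from the toolkit samples):

    // Assumes a NoBot control declared in the .aspx markup, e.g.:
    //   <ajaxToolkit:NoBot ID="PageNoBot" runat="server" />
    using AjaxControlToolkit;

    public partial class ContactPage : System.Web.UI.Page
    {
        protected void SubmitButton_Click(object sender, System.EventArgs e)
        {
            NoBotState state;
            if (!PageNoBot.IsValid(out state))
            {
                // state reports which heuristic failed (e.g. the form was posted
                // back faster than a human could fill it in). Treat as a bot.
                return;
            }
            // Normal, human submission path goes here.
        }
    }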
I would like to integrate Google Analytics experiments into our website.
My situation: we have a solution for feature toggles that also allows A/B testing. The features are stored in the database and have a percentage that defines how many users will see each feature. We also store the features in a cookie, so a user sees the same view when they refresh the page.
Now I want to use the JavaScript API to track the state of each feature (see the Google Experiments documentation). In my understanding, I have to send a request to Google whenever an experiment is used, and I also have to tell Google the ID of the chosen alternative. It must happen on the right page to correlate the experiment with the right page view. The problem is AJAX requests, which might also check the split-tested features; in that case it is hard to say which features were used for the current page.
I see 3 options:
Track all experiments, even those not used on the current page (I think this doesn't make a lot of sense).
Build a tool/configuration that defines which experiments and feature toggles are used on each page (very easy to make mistakes here).
Track the experiments from server-side code (but I don't know how GA would connect these calls to a page view; see the sketch below).
What is the official guideline for ajax requests and google analytics experiments?
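For reference, here is roughly what I imagine option 3 would look like, using the Measurement Protocol's xid/xvar experiment parameters. The property ID is a placeholder, and I assume the cid has to match the client-side client ID so GA can tie the hit to a page view:

    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;

    public class ExperimentTracker
    {
        private static readonly HttpClient Http = new HttpClient();

        public static Task TrackAsync(string clientId, string page,
                                      string experimentId, int variation)
        {
            var payload = new Dictionary<string, string>
            {
                ["v"]    = "1",                  // Measurement Protocol version
                ["tid"]  = "UA-XXXXX-Y",         // placeholder property ID
                ["cid"]  = clientId,             // must match the client-side client ID
                ["t"]    = "pageview",
                ["dp"]   = page,                 // page the experiment ran on
                ["xid"]  = experimentId,         // experiment ID from GA
                ["xvar"] = variation.ToString()  // which variation was served
            };
            return Http.PostAsync("https://www.google-analytics.com/collect",
                                  new FormUrlEncodedContent(payload));
        }
    }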
I would like to do the following; what would be the best way? A general answer is also fine.
I would like to intercept an HTTP request on the client end and alter some HTML content. For example, when I go to CNN.com, rather than an article titled "Two LA Dogs Marry", it should say "Ridiculous Title Blocked".
It should be seamless enough that even a site's secure certificate isn't disturbed.
I am using C#.
Thanks!
UPDATE: Thank you all for your answers!
You can do this with Privoxy via their filter files. Their fun filter is a good example of exactly the sort of substitutions you want to do.
To replace "Two LA Dogs Marry" with "Ridiculous Title Blocked" on cnn.com your action file would look something like this:
{ +filter{ridiculous-title-censor} }
.cnn.com
and your filter file would look like
# FILTER: ridiculous-title-censor Remove ridiculous titles
# This keeps CNN from getting too ridiculous
#
s/Two LA Dogs Marry/Ridiculous Title Blocked/ig
A local HTTP proxy is possible and is the most generic approach. For example, you can use Fiddler to see if this works for you; it supports modifying requests and responses in addition to passively watching traffic.
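If you'd rather embed the same idea in your own C# program, FiddlerCore (the library behind Fiddler) can do it. A minimal sketch, assuming the classic FiddlerCore API:

    using System;
    using Fiddler;

    class Program
    {
        static void Main()
        {
            FiddlerApplication.BeforeResponse += session =>
            {
                if (session.hostname.EndsWith("cnn.com") &&
                    session.oResponse.MIMEType == "text/html")
                {
                    session.utilDecodeResponse();   // un-gzip so the body is editable
                    session.utilReplaceInResponse("Two LA Dogs Marry",
                                                  "Ridiculous Title Blocked");
                }
            };

            // Register as the system proxy on port 8877; the last flag enables
            // HTTPS decryption (requires trusting Fiddler's root certificate).
            FiddlerApplication.Startup(8877, true, true);
            Console.WriteLine("Proxy running; press Enter to stop.");
            Console.ReadLine();
            FiddlerApplication.Shutdown();
        }
    }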
Another option, if you're using Firefox, is Greasemonkey scripts. Here's an example script which changes the main headline on cnn.com.
If you're not familiar with the JavaScript coding needed to write Greasemonkey scripts, you can use the Platypus add-on to edit the page in place and automatically generate a script file.
You could set up a proxy with HttpListener, but if you want to do it right, you'll need a program that works at a lower level:
Open 2 TCP Ports (80 & 443) and actively listen for incoming connections.
Once a connection is received:
Go out and make the request on behalf of the requester
Retrieve HTTP Response
Inspect and change the HTTP Response (where appropriate)
Perhaps modify headers (where appropriate)
Forward on the response to the requester
I'd start with a simple proxy that just forwards all requests and returns all responses; once that is in place, you can start inspecting and rewriting the responses. That is a good place to start.
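Here's a rough sketch of that starting point, plain HTTP only; HTTPS needs the raw-socket CONNECT handling described above, and a real proxy would stream bodies, pass through more headers, and handle error responses:

    using System;
    using System.Net;
    using System.Text;

    class MiniProxy
    {
        static void Main()
        {
            var listener = new HttpListener();
            listener.Prefixes.Add("http://*:8080/");   // wildcard host; run as admin
            listener.Start();
            Console.WriteLine("Proxy listening on port 8080 (plain HTTP only)");

            while (true)
            {
                HttpListenerContext ctx = listener.GetContext();

                // When a browser talks to a proxy, the request line carries the
                // absolute URL, so RawUrl is the full destination address.
                var upstream = (HttpWebRequest)WebRequest.Create(ctx.Request.RawUrl);
                upstream.Method = ctx.Request.HttpMethod;
                upstream.UserAgent = ctx.Request.UserAgent;

                using (var response = (HttpWebResponse)upstream.GetResponse())
                using (var reader = new System.IO.StreamReader(response.GetResponseStream()))
                {
                    // Inspect and change the response body before forwarding it.
                    string body = reader.ReadToEnd();
                    body = body.Replace("Two LA Dogs Marry", "Ridiculous Title Blocked");

                    byte[] buffer = Encoding.UTF8.GetBytes(body);
                    ctx.Response.StatusCode = (int)response.StatusCode;
                    ctx.Response.ContentType = response.ContentType;
                    ctx.Response.ContentLength64 = buffer.Length;
                    ctx.Response.OutputStream.Write(buffer, 0, buffer.Length);
                }
                ctx.Response.Close();
            }
        }
    }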
Such an approach is the least efficient method of doing what you want to achieve.
If this is a client-side application, the client may disable it and thus render it useless.
It is also hard to maintain and requires more complex programming to ensure that it works with SSL.
If using a browser plugin or toolbar, it would need to be made for a specific browser.
If using a listening server to intercept the HTTP request, this adds complexity and difficulty when the content is encrypted, plus unnecessary overhead.
If using a local proxy (meaning the client's browser needs to point at a local proxy service): maybe the most effective client-side method, but it still has the disadvantages mentioned above (hard to maintain, etc.).
I believe that what you are looking to do is completely reinventing the wheel.
The fact that you have offered a bounty suggests that you do need to do this in C# and client-side, but 'censoring bad things' means you need to prohibit content, and any client-side method would ultimately give the client the power to remove that limitation.
Personally, I have had great success with Squid and its content adaptation features.
This means that the clients need a controlled Internet source: if they are all on a LAN sharing a common Internet gateway, this is easily feasible if you have a spare server to act as a proxy.
I recommend you get a small Linux box running something simple like Ubuntu Server Edition, then add Squid. The net is full of tutorials, but implementation has become easy enough to do even without them.
I may have gone completely off-topic, but I hope I could assist.
You can come to China ^_^
Censorship like this is everywhere; you don't have to implement your own.
OK, that was a joke. The real answer: you can implement browser plugins for this kind of task, or put a routing filter (GFW-like) on the router.
Taken from here.
It can be done via a Layered Service Provider on Windows.
From Wikipedia:
"A Layered Service Provider (LSP) is a feature of the Microsoft Windows Winsock 2 Service Provider Interface (SPI). A Layered Service Provider is a DLL that uses Winsock APIs to insert itself into the TCP/IP stack. Once in the stack, a Layered Service Provider can intercept and modify inbound and outbound Internet traffic. It allows processing all the TCP/IP traffic taking place between the Internet and the applications that are accessing the Internet (such as a web browser, the email client, etc). "
Ad Muncher, for example, intercepts and rewrites HTTP content to remove ads. Another suggestion is to find an open-source ad-blocking program and see how they've implemented it.
Are you saying you want to intercept this for your own site?
In any case, it would need to be done in JavaScript or jQuery, as C# is not a client-side language.
Or you could code a toolbar, or maybe a simple Chrome add-on; it's really easy, but it's not C#.
You can search for libraries that monitor browsing through a proxy, like this one:
http://httpproxynet.codeplex.com/
The same concept is used (in Java) by this project:
http://www.charlesproxy.com/
Sounds interesting, good luck :)
A long time ago I implemented this feature for IE using Pluggable MIME Filters, so after searching for it in C# here on Stack Overflow, I found this post that should help you get started.
Hope this is useful for you.
Asynchronous Pluggable Protocols can be used for this type of thing, although, as stated here in INFO: Implementing HTTP-like Asynchronous Pluggable Protocols: "For various reasons, Microsoft neither supports nor recommends that you replace or wrap the default HTTP protocol."
Could you please refer me to an example of an ASP.NET application that allows the owner to collect a fee for viewing certain pages' content?
I have no idea where to start. Besides the technical aspect of the question, I don't know where one would get a server to install such an application on. Can any computer work as one?
Thanks, and sorry if my question is too naive.
You need to look at payment gateways. They provide you with an API you can call to process payments. A concrete example would depend on the payment gateway you're using, as all of them are different.
I'd have a look at Stripe: it is quite developer-friendly (you won't have to trawl through loads of docs to find out what you want to do) and the rates aren't too bad. Compare that to SagePay, which is much harder to figure out and whose docs are out of date.
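As a rough illustration (not a complete integration), a charge with Stripe's official .NET library might look like this; the key, amount, and flow details are placeholders you'd check against Stripe's current docs:

    using Stripe;

    public class PaymentExample
    {
        public static void ChargeForPageAccess()
        {
            StripeConfiguration.ApiKey = "sk_test_your_key_here";   // placeholder test key

            // Create a PaymentIntent for a $5.00 page-access fee (amounts are in cents).
            var options = new PaymentIntentCreateOptions
            {
                Amount = 500,
                Currency = "usd",
                Description = "Access to premium page content",
            };
            var service = new PaymentIntentService();
            PaymentIntent intent = service.Create(options);

            // Hand intent.ClientSecret to your page's checkout form;
            // unlock the content once the payment succeeds.
        }
    }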
I'm considering writing my own tool for tracking visitors/sales, as Google Analytics and others are just not comprehensive enough in the data department. They have nice GUIs, but if you have SQL skills those GUIs are unnecessary.
I'm wondering what the best approach is to do this.
I could simply log the IP, etc. to a text file and then have an async service run in the background to dump it into the DB. Or maybe that's overkill and I can just put it straight into the DB. But one DB write per web request seems like a poor choice where scalability is concerned. Thoughts?
As a sidenote: it is possible to capture the referring URL for any incoming traffic, right? So if a visitor came from a forum post or something, I can track that actual URL?
It just seems that this is a very standard requirement and I don't want to go reinventing the wheel.
As always, thanks for the insight SOF.
The answer to this question mentions the open-source Google Analytics alternative Piwik. It's not C#, but you might get some ideas from looking at the implementation.
For a .NET solution, I would recommend reading Matt Berseth's Visit/PageView Analysis Services Cube blog posts (and the earlier posts and examples, since they aren't easy to find on his site).
I'm not sure if he ever posted the server-side code (although you will find his openurchin.js linked in his HTML), but you will find most of the concepts explained. You could probably get something working pretty quickly by following his instructions.
I don't think you'd want to write to a text file: locking issues might arise. I'd go for INSERTs into a database table; if the table grows too big, you can always 'roll up' the results periodically and purge old records.
As for the referrer URL, you can definitely grab that from the HTTP headers (assuming it has been sent by the client and not stripped off by proxies or strict AV software settings).
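If you go the database route, a minimal sketch of logging one page view per request in ASP.NET might look like this (the table name and columns are made up for illustration):

    using System;
    using System.Data.SqlClient;
    using System.Web;

    public static class VisitLogger
    {
        public static void LogPageView(HttpRequest request, string connectionString)
        {
            // UrlReferrer is null when the header is missing or stripped by a proxy.
            string referrer = request.UrlReferrer != null
                ? request.UrlReferrer.ToString()
                : null;

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO PageViews (Url, Referrer, Ip, UserAgent, ViewedAt) " +
                "VALUES (@url, @referrer, @ip, @ua, @at)", conn))
            {
                cmd.Parameters.AddWithValue("@url", request.RawUrl);
                cmd.Parameters.AddWithValue("@referrer", (object)referrer ?? DBNull.Value);
                cmd.Parameters.AddWithValue("@ip", request.UserHostAddress);
                cmd.Parameters.AddWithValue("@ua", (object)request.UserAgent ?? DBNull.Value);
                cmd.Parameters.AddWithValue("@at", DateTime.UtcNow);

                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }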
BTW, keep in mind that Google Analytics adds a lot of value to the stats: it geocodes IP addresses to show results by location (country/city) and by ISP/IP owner. Their JavaScript does Flash detection and segments the User-Agent into useful 'browser categories', and it also detects other user settings like operating system and screen resolution. That's some non-trivial coding you would have to do to achieve the same level of reporting, not to mention the data and calculations needed for entry and exit pages, returning visits, unique visitors, time spent on site, etc.
There is a Google Analytics API that you might want to check out, too.
Have you looked at Log Parser to parse the IIS logs?
I wouldn't have thought writing to a text file would be more efficient than writing to a database; quite the opposite, in fact. You would have to lock the text file while writing to avoid concurrency problems, and that would probably have more of an impact than writing to a database (which is designed for exactly this kind of scenario).
I'd also be wary of reinventing the wheel. I'm not at all clear what you think a bespoke hit logger could do better than Google Analytics, which is extremely comprehensive. Believe me, I've been down that road and written my own, and Analytics made it quite redundant.
I have an ASP.NET 3.5 application hosted on IIS 7.0. I'm looking for a comprehensive system to monitor traffic, down to at least the page level. Does .NET have any specific tools, is it better to write my own, or what systems/software is freely available to use?
Thanks
Use Google Analytics. It's a small piece of JavaScript code inserted just before the closing </body> tag. It's based on the Urchin analytics tracking software, which Google bought; they've been doing this for a long, long time.
As long as your site is referenced using a fully qualified domain name, Google Analytics can track what you need. It's got lots of flexibility in its filter mechanism as well (it lets you rewrite URLs based on query string parameters, etc.).
It has LOTS of functionality, is well thought out, and has a pretty good API if you need to track things other than clicks.
If you have access to the IIS logs, you can use a log analyzer to interpret the data. One example is the free AWStats analyzer:
http://awstats.sourceforge.net/
An alternative (and one I recommend) is Google Analytics (http://www.google.com/analytics). This relies on you embedding a small chunk of JavaScript in each page you want tracked; Google then does the grunt work for you, presenting the results on an attractive, Flash-rich site.
I'd suggest trying both and seeing which suits your needs. I'd definitely recommend against rolling your own system, as the above solutions are very mature and capable. Best of luck!
You'll need a client-side/JavaScript tracking service (such as Google Analytics, though there are other good free alternatives) because it runs even when the user clicks the back button and the previous page (on your site) is loaded from the browser cache rather than from the server. IIS won't "see" that reload, since no request is made to it.