Intercept an HTTP request at the browser end to alter some HTML content - C#

I would like to do the following. What would be the best way? A general answer will also be fine.
I would like to intercept an HTTP request at the client end to alter some HTML content. For example, when I go to CNN.com, rather than an article displaying "Two LA Dogs marry", it should say "Ridiculous title blocked".
It should be seamless, so that even a secure certificate won't be disturbed.
I am using C#.
Thanks!
UPDATE: Thank you all for your answers!

You can do this with Privoxy via their filter files. Their fun filter is a good example of exactly the sort of substitutions you want to do.
To replace "Two LA Dogs Marry" with "Ridiculous Title Blocked" on cnn.com your action file would look something like this:
{ +filter{ridiculous-title-censor} }
.cnn.com
and your filter file would look like
FILTER: ridiculous-title-censor Remove ridiculous titles
# This keeps CNN from getting too ridiculous
#
s/Two LA Dogs Marry/Ridiculous Title Blocked/ig
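The s/.../.../ig line is a Perl-style substitution: i makes the match case-insensitive and g replaces every match on the page. If you end up doing the rewriting in your own C# code instead (for example inside one of the proxies suggested in other answers), a rough equivalent might look like the sketch below. TitleFilter and Apply are hypothetical names, and the sketch assumes you already have the decoded HTML as a string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the Privoxy job
//   s/Two LA Dogs Marry/Ridiculous Title Blocked/ig
public static class TitleFilter
{
    // Regex.Escape makes the title match literally; IgnoreCase plays the role of the "i" flag.
    private static readonly Regex Ridiculous = new Regex(
        Regex.Escape("Two LA Dogs Marry"),
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Regex.Replace replaces every match, which is what the "g" flag asks for.
    public static string Apply(string html) =>
        Ridiculous.Replace(html, "Ridiculous Title Blocked");
}
```

Usage: TitleFilter.Apply(html) returns the page with every occurrence of the title replaced, regardless of its capitalization.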

A local HTTP proxy is possible and is the most generic approach.
For example, you can use Fiddler to see if it works for you. It supports modifying requests/responses in addition to simply watching traffic.
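To see why a proxy can alter plain pages but not HTTPS pages out of the box, it helps to look at the first line a browser sends to a proxy. The sketch below assumes a hand-written local proxy (the ProxyTarget record and ProxyRequestLine.Parse are hypothetical names) and only decodes that line: plain HTTP requests carry the full URL, while HTTPS requests arrive as CONNECT host:port followed by an encrypted tunnel. Fiddler handles the HTTPS case by decrypting traffic with its own root certificate, which has to be trusted on the machine; that is why rewriting secure pages without browser warnings requires installing such a certificate.

```csharp
using System;

// Hypothetical type: where the browser wants the proxy to connect.
public sealed record ProxyTarget(string Method, string Host, int Port, bool IsTunnel);

public static class ProxyRequestLine
{
    public static ProxyTarget Parse(string requestLine)
    {
        string[] parts = requestLine.Split(' ');
        if (parts.Length != 3 || !parts[2].StartsWith("HTTP/", StringComparison.Ordinal))
            throw new FormatException("Not an HTTP request line: " + requestLine);

        string method = parts[0];
        if (method == "CONNECT")
        {
            // HTTPS: "CONNECT host:port HTTP/1.1". The proxy only learns host and port;
            // everything after this is a TLS tunnel it can relay but not read.
            int colon = parts[1].LastIndexOf(':');
            if (colon <= 0) throw new FormatException("CONNECT target must be host:port");
            return new ProxyTarget(method, parts[1].Substring(0, colon),
                                   int.Parse(parts[1].Substring(colon + 1)), IsTunnel: true);
        }

        // Plain HTTP: the browser sends the absolute URL, e.g. "GET http://host/path HTTP/1.1".
        // Uri fills in the default port (80 for http) when the URL has none.
        var uri = new Uri(parts[1], UriKind.Absolute);
        return new ProxyTarget(method, uri.Host, uri.Port, IsTunnel: false);
    }
}
```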

Another option if you're using Firefox is to use Greasemonkey scripts. Here's an example script which changes the main headline on cnn.com
If you're not familiar with the JavaScript coding needed to write Greasemonkey scripts, you can use the Platypus add-on to edit the page in place and automatically generate a script file.

You could set up a proxy with HttpListener. But if you want to do it right, I think you'll need a lower-level program:
Open two TCP ports (80 and 443) and actively listen for incoming connections.
Once a connection is received:
Go out and make the request on behalf of the requester
Retrieve HTTP Response
Inspect and change the HTTP Response (where appropriate)
Perhaps modifying headers (where appropriate)
Forward on the response to the requester
I'd start with a simple proxy that just forwards all requests and returns back all responses. Once that is in place you can start inspecting the responses.
That is a good place to start.
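For the "inspect and change" and "modify headers" steps, the main trap is that editing the HTML changes its length, so a Content-Length header copied from the server no longer matches. Below is a minimal sketch of that step only, assuming the proxy has already read the whole response, removed any chunked transfer encoding and gzip/deflate compression, and stored the headers in a case-insensitive dictionary. ResponseRewriter.Rewrite is a hypothetical name, and the UTF-8 assumption should be replaced by the page's actual charset in real use.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for the "inspect and change the response" and "modify headers" steps.
// Assumes a de-chunked, decompressed body and a header dictionary created with
// StringComparer.OrdinalIgnoreCase (HTTP header names are case-insensitive).
public static class ResponseRewriter
{
    public static byte[] Rewrite(IDictionary<string, string> headers, byte[] body, Func<string, string> edit)
    {
        // Only HTML is edited; images, scripts, etc. are forwarded byte-for-byte.
        if (!headers.TryGetValue("Content-Type", out var contentType) ||
            !contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
        {
            return body;
        }

        // Assumption: UTF-8. A real proxy should honour the charset parameter of Content-Type.
        string html = Encoding.UTF8.GetString(body);
        byte[] rewritten = Encoding.UTF8.GetBytes(edit(html));

        // The body length has probably changed, so the old Content-Length must be replaced;
        // otherwise the browser truncates the page or waits for bytes that never arrive.
        headers["Content-Length"] = rewritten.Length.ToString();
        return rewritten;
    }
}
```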

Such an approach is the least efficient method of doing what you want to achieve.
If this is a client side application, the client may disable it and thus render it useless.
It is also hard to maintain and requires more complex programming to ensure that it works with SSL.
If using a browser plugin, or toolbar, it would need to be made for a specific browser.
If using a listening server to intercept the HTTP request, this adds complexity and difficulty when the content is encrypted, as well as unnecessary overhead.
If using a local proxy (meaning that the client's browser needs to point to a local proxy service), this is perhaps the most effective client-side method, but it still has the disadvantages mentioned above (hard to maintain, etc.).
I believe that what you are looking to do is completely reinventing the wheel.
The fact that you have offered a bounty suggests that you really do need to do this in C# and on the client side, but 'censoring bad things' means you need to prohibit content, and any client-side method would eventually give the client the power to remove this limitation.
Personally, I have had great success with Squid and its content adaptation features.
This means that the clients need a controlled Internet source: if they are all on a LAN sharing a common Internet gateway, this is easily feasible if you have a spare server to act as a proxy.
I recommend you get a small Linux box running Ubuntu Server Edition, then add Squid. The net is full of tutorials, but the setup has become easy enough to do even without them.
I may have gone completely off-topic, but I hope I could assist.

You can come to China ^_^
Censorship like this is everywhere; you don't have to implement your own.
OK, that is a joke. The answer is that you can implement browser plugins for this kind of task, or maybe you need to implement a routing filter (GFW-like) on the router.

Taken from here.
It can be done via a Layered Service Provider on Windows.
From Wikipedia:
"A Layered Service Provider (LSP) is a feature of the Microsoft Windows Winsock 2 Service Provider Interface (SPI). A Layered Service Provider is a DLL that uses Winsock APIs to insert itself into the TCP/IP stack. Once in the stack, a Layered Service Provider can intercept and modify inbound and outbound Internet traffic. It allows processing all the TCP/IP traffic taking place between the Internet and the applications that are accessing the Internet (such as a web browser, the email client, etc). "
AdMuncher, for example, intercepts HTTP traffic and rewrites the page code to block ads. Another suggestion is to find an open-source ad-blocking program and see how it is implemented.

Are you saying you want to intercept this for your own site?
In any case, it needs to be done in JavaScript or jQuery, as C# is not a client-side language.

Or you can code a toolbar, or maybe a simple Chrome add-on; it's really easy, but it's not C#.
You can search for libraries that monitor browsing through a proxy, like this one:
http://httpproxynet.codeplex.com/
The same concept is used by this Java project:
http://www.charlesproxy.com/
Sounds interesting, good luck :)

A long time ago I implemented this feature for IE using Pluggable MIME Filters. After searching for a C# version here on Stack Overflow, I found this post, which should help you get started.
Hope this is useful for you.

Asynchronous Pluggable Protocols can be used for this type of thing. Although, as stated here INFO: Implementing HTTP-like Asynchronous Pluggable Protocols: "For various reasons, Microsoft neither supports nor recommends that you replace or wrap the default HTTP protocol."

Related

I'm making a JavaScript extension for a site. I'd like to use Java or C# instead. Is it possible?

This is hard for me to explain and even harder for me to visualize how I'd do it, since I don't know the bounds of communication with websites with Java and C#, so if I banter a lot/make no sense in the process of describing this, I apologize.
Basically, I'm making a 'bot' for www.plug.dj. This bot is able to do things on command like kick users, ban users, send chat messages, delete chat messages, say random things, etc. As of right now, it's powered by a simple one-file JavaScript code with a ton of listeners and callbacks using the Plug.dj API to handle them. This is ALL engineered by JS -- on the back-end, I think Plug.dj is powered by Python, I could be wrong.
Anyway, what I would LIKE to do is create this bot in a language other than JS. It's really basic and not super powerful, and there are things like communicating with databases that I'd like to implement but that aren't possible/convenient with JS. I just want to know if this is possible, and if so, where should I start looking?
I'd prefer a language like Java or C#. If there's any more info you need to know in order to answer this, let me know, please! I'd like to start working on this, I think it'd be fun to learn how to communicate with websites with Java/C#/whatever.
If the bot JavaScript runs on "their" server, then there is no simple solution. They are providing a mechanism for running "your" JavaScript on their server, but the chances are that they don't support other languages. (And the only way to find out would be to ask "them".) Assuming that the answer is "no", you would need to investigate whether you can implement your bot's functionality in client-side code, e.g. a custom client that you implement from scratch in Java or C# or whatever. That's a big "if" ... because it will only work if they expose the server functionality you need in their external APIs.
OTOH, if the bot javascript runs on "your" server, then you should be able to change it to support other languages. (It wouldn't necessarily be easy though ...)
My advice would be to take a deep breath ... and stick with Javascript. We all have to use languages that we don't think are "fun".
I honestly would just leave it in javascript if it is something you need to have run in the client.
If you need to make database calls, you can introduce a web services layer in between against which you can make AJAX calls which interact with databases.
I think your perception of JavaScript as basic and not very powerful is not accurate. There are very complex apps built today in just JavaScript and HTML5. You might just need to start looking at things like Backbone.js, Underscore.js, and similar libraries that provide more advanced code-organization functionality.
If, however, you are looking at building something that individual clients will not have installed in their browsers, but that would instead interact with the website as an automated admin, then you can certainly build your own web service in whatever language you like to interact with their APIs and perform admin tasks.
If they provide a JavaScript library that runs client side, it seems likely that it will be communicating with the server over HTTP. Therefore, it should be possible for you to analyse the library and the calls it makes to reverse engineer the server API (which would be the HTTP calls) and re-implement it in the language of your choice.
Looking at the code of bot.js:
https://github.com/backus/Plug.DJ-Javascript-Chatbot/blob/master/bin/bot.js
it seems everything comes down to calls against their API object, e.g. API.getDJs(), API.getWaitList(), etc. If you can determine how this API object works, then you might be able to reverse engineer and re-implement it.
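If the captured traffic turns out to be ordinary HTTP requests, re-implementing one call in C# could look roughly like the sketch below. Everything specific in it is a placeholder: the base address, the api/chat path and the {"message": ...} JSON shape are hypothetical and are not plug.dj's real API, so they would have to be replaced with what the browser's network log actually shows (including any session cookies the real calls need).

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical client replaying one HTTP call observed in the browser's network log.
// The "api/chat" path and the {"message": ...} body are PLACEHOLDERS, not a real site's API.
public sealed class ChatApiClient
{
    private readonly HttpClient _http;
    private readonly Uri _baseAddress;

    public ChatApiClient(HttpClient http, Uri baseAddress)
    {
        _http = http;
        _baseAddress = baseAddress;
    }

    // Building the request separately makes it easy to compare with the captured one.
    public static HttpRequestMessage BuildSendChat(Uri baseAddress, string message)
    {
        string json = JsonSerializer.Serialize(new { message });
        return new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, "api/chat"))
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json"),
        };
    }

    public async Task SendChatAsync(string message)
    {
        using HttpRequestMessage request = BuildSendChat(_baseAddress, message);
        using HttpResponseMessage response = await _http.SendAsync(request);
        response.EnsureSuccessStatusCode(); // surface server-side rejections as exceptions
    }
}
```

Once a call like this works, database access and other server-side logic can live in the same C# program instead of in the browser.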

webbrowser wrapper

Is there any wrapper that would allow me to access and modify raw request data (like the headers, body, and cookies) directly from the WebBrowser object in a WinForms application using C#?
The only thing I've seen out there which would let you interrogate what the browser is doing is Fiddler, which has an API. You might want to check that out, but it's not something that I would personally consider using to ship inside my production software unless I had a real solid requirement for it.
You can also try to implement your own Asynchronous Pluggable Protocol. So you'll be able to access and modify requests and then forward them to the destination using, for example, HttpWebRequest or raw sockets.
Some links you might consider useful to get started with:
A Simple protocol to view aspx pages without IIS implemented in C#
Internet Explorer Asynchronous Protocol Library
Although, using Asynchronous Pluggable Protocol in this case still looks like a hack (at least for me).
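As a rough sketch of the "modify the request, then forward it" part, here it is with HttpClient's HttpRequestMessage, since HttpWebRequest is marked obsolete in .NET 6 and later. It assumes the intercepted headers are already available as name/value pairs (how you get them depends on the interception method), and RequestForwarder.Build and its rewrite callback are hypothetical names. The one non-obvious rule it encodes is that hop-by-hop headers such as Connection describe the browser-side connection and should not be copied upstream.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: rebuild an intercepted request, letting the caller edit headers.
public static class RequestForwarder
{
    // Hop-by-hop headers (the list in RFC 2616, section 13.5.1, plus the non-standard
    // Proxy-Connection) apply to one connection only. Headers named inside the Connection
    // header should be dropped too; that is omitted here for brevity.
    private static readonly HashSet<string> HopByHop = new(StringComparer.OrdinalIgnoreCase)
    {
        "Connection", "Keep-Alive", "Proxy-Connection", "Proxy-Authenticate",
        "Proxy-Authorization", "TE", "Trailer", "Transfer-Encoding", "Upgrade",
    };

    // rewrite(name, value) returns the value to send upstream, or null to drop the header.
    public static HttpRequestMessage Build(
        string method, Uri target,
        IEnumerable<KeyValuePair<string, string>> headers,
        Func<string, string, string?> rewrite)
    {
        var request = new HttpRequestMessage(new HttpMethod(method), target);
        foreach (KeyValuePair<string, string> header in headers)
        {
            if (HopByHop.Contains(header.Key)) continue;
            string? value = rewrite(header.Key, header.Value);
            // Content headers (Content-Type, ...) are rejected here; they belong on request.Content.
            if (value != null) request.Headers.TryAddWithoutValidation(header.Key, value);
        }
        return request;
    }
}
```

The returned message can then be sent with HttpClient.SendAsync, and the response copied back to whatever intercepted the original request.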
OK... So I've spent the last 4 days scouring the net for a solid solution to needing functionality like this. Mainly, being able to access the raw outbound request, and the raw inbound response associated with the webBrowser control in .Net. The results are absolutely disappointing. Why is this such a huge deal? Why can't MSFT just fix that control and add properties for the rawRequest and RawResponse? If you look you'll find developers trying to hack their way around this for the last 5-10 years. And nobody has come up with a solution? Really? WTF???
"WebBrowser.CreateSink Method"
Huge can of worms. Going here will break your mind.
"The most complete C# Webbrowser wrapper control"
On Win7, trying to regsvr32 the required old-ass ATL DLL from 2006 results in an error.
"Subclass the WebBrowser control"
Won't work to gain access to the actual request/response HTTP packet data.
...sigh

monitor incoming/outgoing http traffic with .net

I want to write a small application that can:
1. monitor URLs requested via a web browser and/or
2. monitor incoming http responses on the local machine
I have been doing some Googling, but I am not finding any clear answers. I am thinking of System.Net.Sockets.TcpListener and messing around with it, but I am under the impression that it is either not what I'm looking for or can't handle both things.
I don't need a detailed step by step explanation. Just a small overview would be helpful (if this is even possible) such as (what classes to use, what events to subscribe to, any additional details needed to instantiate necessary objects) I can google the details.
Why not just extend Fiddler to suit your needs?
You may want to use a utility like Wireshark (graphical, but not scriptable from C#, AFAIK). Wireshark uses a library called WinPcap, which does the actual monitoring of the TCP streams on the computer. Decoding of the HTTP headers will need to be done in your application (this is what Wireshark/tcpdump/WinDump do for you). You will need to use P/Invoke to call the WinPcap DLL. You can find library wrappers by performing a web search for "WinPcap PInvoke".
This question sounds very similar to the following:
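Once WinPcap (or a wrapper around it) hands you a TCP payload, the HTTP decoding needed just to log URLs is mostly string work. The sketch below assumes a single captured packet that contains the whole request head of a plain-HTTP request; HttpPayload.TryGetUrl is a hypothetical name, and requests split across several packets or sent over HTTPS are out of scope (HTTPS payloads are encrypted and contain no readable request line).

```csharp
using System;
using System.Text;

// Hypothetical helper: rebuild the URL of a plain-HTTP request from one captured TCP payload.
public static class HttpPayload
{
    public static string? TryGetUrl(byte[] payload)
    {
        // An HTTP/1.x request head is ASCII text: request line, header lines, then a blank line.
        string text = Encoding.ASCII.GetString(payload);
        int headEnd = text.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        if (headEnd < 0) return null; // head incomplete or split across packets

        string[] lines = text.Substring(0, headEnd).Split("\r\n");
        string[] requestLine = lines[0].Split(' ');
        if (requestLine.Length != 3 || !requestLine[2].StartsWith("HTTP/", StringComparison.Ordinal))
            return null; // not a request (e.g. a response or a body fragment)

        string target = requestLine[1];
        if (target.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
            return target; // absolute-form, as sent to a proxy

        // origin-form ("/path"): the host comes from the Host header.
        foreach (string line in lines)
        {
            if (line.StartsWith("Host:", StringComparison.OrdinalIgnoreCase))
                return "http://" + line.Substring(5).Trim() + target;
        }
        return null;
    }
}
```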
Is there any way in .NET to programmatically listen to HTTP traffic?
EDIT: Also, Capturing HTTP requests may help

Monitoring outgoing internet traffic

Is there a way to monitor Internet traffic programmatically? I would like to log the pages users are visiting on the Internet. Can this be achieved with .NET code, or is there a 3rd-party .NET component that could be used to retrieve the data?
Information about Internet traffic must be stored to a database so I cannot use a plugin or something for IE. We are also looking to include this code into our existing product so we cannot use a 3rd party product that cannot be redistributed.
It would be cool if this thing could monitor traffic for all browsers but monitoring IE traffic might also be sufficient.
Setting up a sniffer is doable via the WinPCap library, which has several projects wrapping it to .NET:
Link
http://www.codeproject.com/KB/IP/dotnetwinpcap.aspx
http://coolthingoftheday.blogspot.com/2006/02/sharppcap-net-winpcap-wrapper.html
and probably some others as well, just a matter of Googling.
You would need to build some software that acts as a proxy. You should start by looking at programs like "Fiddler" to understand the concepts and what you need to implement.
If you want my professional opinion, you should go to Server Fault and ask for opinions on a low-cost Internet proxy solution. Writing this thing yourself, while challenging and fun, will not make good business sense.
If you want to monitor all traffic from everyone on your network, consider a product like Microsoft's Internet Security and Acceleration (ISA) Server.
While this is probably overkill for what you want, the point is that you need a way to have all traffic go through a single point (a proxy server) where the traffic can be logged. Since all traffic goes through this one point, users can't avoid detection by using an alternate browser etc.
You can also try Pcap.Net to easily use WinPcap in .NET.
It is a wrapper for WinPcap written in C++/CLI and C# and includes a packet interpretation and creation framework.
Like JD already mentioned, you should take a look at Fiddler. With this proxy you are able to monitor all the web traffic (if your browser is configured to use Fiddler as its proxy).
If you don't want Fiddler as a standalone application and need its functionality within your own app, you should take a look at FiddlerCore. It is a .NET assembly which encapsulates the core functionality of Fiddler.
If you need a more raw way to sniff into the data (maybe you don't want to [or can't] configure the proxy of the client) the answer from Vinko will help you further.

Capturing HTTP requests

Is there a way to monitor and capture all outgoing HTTP requests from a machine using C#?
I need a browser independent way of logging visited URLs.
I use fiddler ( http://www.fiddler2.com )
You may want to use existing network-capture libraries like pcap or WinPcap to do so.
Rewriting all the necessary stuff yourself would be quite time-consuming.
Link to Pcap
Link to WinPcap
Edit: I just saw that someone also wrote C# bindings for WinPcap: SharpPcap
Sounds like you're after some kind of "packet sniffing" utility.
Here's a couple of links to articles on the Code Project site for packet sniffers (with downloadable source code) written in C#:
Packet Capture and Analayzer
A Network Sniffer in C#
If you're just after capturing visited URLs, these utilities may be overkill; you'll be able to extract the URL from your HTTP packets and discard the rest. However, if you also wish to capture all packet information, these utilities will help.
You're probably going to save lots of time and effort with some kind of proxy setup. A decent local-machine solution would be Fiddler (requires Windows), or something like a Squid server for a networked solution.
I hope Wireshark works for you.
It's free and cross-platform.
Also,
"Wireshark is the world's foremost network protocol analyzer, and is the de facto (and often de jure) standard across many industries and educational institutions"
Have a look:
Wireshark-Wikipedia
Also, writing a simple http proxy for this purpose in C# is a trivial task.
