Using C# to efficiently pull data from a webpage with changing sourcecode?

I have already put together code that uses the System.Net.WebClient class to pull the source of a webpage, which I then run a string search on to get specific information. This works fine in itself, but the page source changes every few seconds, and I would like the data I have received to change accordingly. I understand that I could simply run this process in a loop, but my current code takes a full 2.7 seconds to complete, and I would like to avoid that lag. I also want to avoid spamming the webpage with requests if possible. I was thinking about a StreamReader that stays open, so that multiple requests wouldn't have to be sent, but I wasn't sure how to go about this...
So to sum it up: is there a way to pull updating information from a website using the System.Net namespace that is both fast and avoids spamming the website with requests?

I am afraid that the HTTP protocol is not well suited to your real-time refresh requirement. With plain HTTP, the only way to find out whether the data changed on the server, and to get the fresh data, is to poll with requests at regular intervals.
WebSocket, for example, is better suited to such scenarios. Of course, the data provider must implement it so that clients can subscribe to the live feed.
There's also another way to implement this over HTTP. It uses an iframe to implement long polling. Here's an example. The idea is that the server uses chunked transfer encoding and keeps sending data down the open connection. The client subscribes to this stream and is notified of changes occurring on the server. Once again, the server side must implement it before you, as a client, can take advantage of it.
If all the server provides is data in an HTML page, you are doomed to screen scraping: hammering the server with HTTP requests until your IP address gets blacklisted and denied access.
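For the polling case, here is a minimal sketch (not the answerer's code) of keeping both the request rate and the per-request cost down: reuse one HttpClient so the connection stays open between requests, send a conditional GET, and report a value only when it actually changes. It assumes the server returns an ETag header and honors If-None-Match, which many dynamically generated pages do not; in that case every poll simply downloads the full page. The names PagePoller, ExtractBetween and PollAsync, and the marker strings, are made up for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public static class PagePoller
{
    // One shared HttpClient reuses the underlying connection (keep-alive),
    // so later requests skip the connection setup cost of the first one.
    private static readonly HttpClient Client = new HttpClient();

    // Plain string search: returns the text between two markers, or null if either is missing.
    public static string ExtractBetween(string html, string start, string end)
    {
        int from = html.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return null;
        from += start.Length;
        int to = html.IndexOf(end, from, StringComparison.Ordinal);
        return to < 0 ? null : html.Substring(from, to - from);
    }

    // Polls url until cancelled and calls onValue only when the extracted value changes.
    // Cancellation surfaces as an OperationCanceledException from SendAsync or Task.Delay.
    public static async Task PollAsync(string url, TimeSpan interval, string start, string end,
                                       Action<string> onValue, CancellationToken ct)
    {
        EntityTagHeaderValue etag = null; // assumption: the server sends ETag headers
        string last = null;

        while (!ct.IsCancellationRequested)
        {
            using (var request = new HttpRequestMessage(HttpMethod.Get, url))
            {
                if (etag != null) request.Headers.IfNoneMatch.Add(etag);

                using (HttpResponseMessage response = await Client.SendAsync(request, ct))
                {
                    // 304 Not Modified means the page is unchanged and no body was sent.
                    if (response.StatusCode != HttpStatusCode.NotModified && response.IsSuccessStatusCode)
                    {
                        etag = response.Headers.ETag;
                        string html = await response.Content.ReadAsStringAsync();
                        string value = ExtractBetween(html, start, end);
                        if (value != null && value != last)
                        {
                            last = value;
                            onValue(value);
                        }
                    }
                }
            }

            // A fixed pause between requests keeps the load on the site bounded.
            await Task.Delay(interval, ct);
        }
    }
}
```

A call might look like PagePoller.PollAsync(url, TimeSpan.FromSeconds(5), "<span id=\"price\">", "</span>", Console.WriteLine, token), where the URL, interval and markers are placeholders. This is still polling; it only makes each poll cheaper and less frequent.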

Related

Approach for cross platform chat application with back-end in c#

I want to create a cross-platform chat app with a back end in C#.
I searched for an approach and found that I can do it with HTTP requests to handlers on my server, using the responses accordingly.
So far I have made handlers that can add users, log in, and send and receive messages, using a database for storage.
Now I am making the Android client, and to get a user's messages I need to make HTTP requests at a fixed interval (3 seconds).
I feel this is not a good approach. I am making this app for a target audience of nearly 30,000. They would be able to chat one-on-one in a single session.
I just want to know whether I am going in the right direction or whether there are far better ways to build the back end of a chat app.
I have heard about WCF, but I am not clear on what approach I should take. Please guide me on approaches for chat applications.
Edit
An example of roughly how a well-known chat app such as WhatsApp or Facebook Messenger works would be a great help.
Thanks.

You could do it with HTTP, but I'd suggest using TCP instead. There's a very solid base for a C#-based TCP server on CodeReview right here, which outlines how to deal with Socket objects and how to handle connections properly.
The main perk of going about it this way is that you can connect your client to the server, and the client can be written in virtually any language; it doesn't have to be C#. As long as the language supports sockets, you'll be fine.
On top of that, the client can listen to the server, which removes the need to poll for new messages every couple of seconds. The client socket receives data when the server sends it, and you can handle it right away, in near real time. If you poll the server over HTTP every 3 seconds, say, you will always have a delay in your chat service, which is something I think you will want to avoid.
See the code sample on CodeReview I linked above, and read up on how sockets work in C# and on what TCP guarantees (whatever is sent will arrive at the other side in the same order, but not necessarily in one packet, etc.). I'm pretty confident you'll be able to make an excellent chat app if you put it all to good use.
Edit: I just noticed the WCF tag on your post. I'd personally steer clear of it for this specific project since you want to achieve cross-platform support; try going as low-level as you possibly can for that.
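Because TCP preserves byte order but not message boundaries, a chat protocol on a raw socket needs some framing of its own. One common option is a length prefix; the sketch below is an illustration of that idea, not the CodeReview sample mentioned above. The class name, the 4-byte big-endian header and the 1 MiB limit are all assumptions chosen for the example. It works on any Stream, so it can be used with a NetworkStream from a connected socket.

```csharp
using System;
using System.IO;
using System.Text;

public static class MessageFraming
{
    // Hypothetical upper bound so a corrupt header cannot trigger a huge allocation.
    private const int MaxMessageBytes = 1 << 20;

    // Writes a 4-byte big-endian length followed by the UTF-8 payload.
    public static void WriteMessage(Stream stream, string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text);
        byte[] header =
        {
            (byte)(payload.Length >> 24), (byte)(payload.Length >> 16),
            (byte)(payload.Length >> 8), (byte)payload.Length
        };
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads one complete message, or returns null if the stream ended cleanly before a new one.
    public static string ReadMessage(Stream stream)
    {
        byte[] header = ReadExactly(stream, 4);
        if (header == null) return null;

        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0 || length > MaxMessageBytes)
            throw new InvalidDataException("Invalid message length: " + length);

        byte[] payload = ReadExactly(stream, length);
        if (payload == null)
            throw new EndOfStreamException("Stream ended before the message body.");
        return Encoding.UTF8.GetString(payload);
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes arrive.
    // Returns null if the stream ends before any byte of a non-empty read was received.
    private static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
            {
                if (offset == 0) return null;
                throw new EndOfStreamException("Stream ended in the middle of a message.");
            }
            offset += read;
        }
        return buffer;
    }
}
```

The same framing is easy to reproduce in Java or Kotlin on the Android side, which is what keeps the client language-independent.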

php soap client and c# soap server

I am currently working on a project where I think using SOAP would be a good idea, but I can't work out how to make it work the way I need.
I have a C# console application called ConsoleApp, which will also have a PHP web interface. What I'm thinking of is having the PHP web interface control ConsoleApp in some way: I click a button on the web interface, which sends a SOAP request to a SOAP service; the SOAP service passes the information on to ConsoleApp, and the result is returned to the SOAP service and then back to PHP.
This seems like it would need two separate SOAP services, one for PHP to interface with and one within ConsoleApp, but that doesn't sound right. I think I might be misunderstanding the purpose of SOAP.
How can this be achieved? Thanks for any help you can provide.
UPDATE
As requested I thought I'd add a bit more information on what I am trying to achieve.
The console app acts as an email server: it sends out the emails it is given, and if an email can't be sent it retries a couple of times before the email goes into a failed state.
The web interface will provide a status of what the email server is doing, i.e. how many emails are incoming, how many are yet to be processed, how many have been sent and how many have failed.
From the web page you will be able to shut down or restart the email server, or put one of the failed emails back into the queue to be processed.
The idea is that when the user adds a failed email back into the queue, the web interface sends a SOAP message that the console app receives. The console app adds the email back into the queue, logs the event in its log file, and increments the counter it uses to keep track of emails that need to be processed. Once this is done, it should send a response back to the web interface to say whether the email was successfully added back into the queue or whether it failed for some reason.
I don't really want to keep polling the database every so many seconds, as there could be a large number of emails being processed, so polling would put a large load on the MySQL server. That is why I thought of SOAP: the email server would only need to do something when it receives a SOAP request.
Thanks for any help.

Every web service is going to need a client (in your case PHP) and a server (ConsoleApp). Even though there are two endpoints, it is still one web service. Your PHP will send a SOAP request which ConsoleApp will receive, process and respond to with a SOAP response.
So when someone clicks the button on the web page, you can use JavaScript to build and send the SOAP envelope in the browser. The alternative is to POST the values to a PHP page that will build and send the SOAP.
I have to admit, though, that your scenario sounds a bit unusual. I personally haven't heard of web pages talking directly to console apps. Web pages usually talk to web servers, and the servers are usually the ones issuing atypical requests, like your request to ConsoleApp. It is technically possible, but I think it is going to be harder than you are expecting.
Personally, I would ditch SOAP in favor of a much simpler and more scalable solution. Assuming you have access to a database, I would have the PHP create a record in the database when the user clicks the button. ConsoleApp would then poll the database every X seconds to look for new records. When it finds a new record, it processes it.
This has the benefit of being simple (database access is almost always easier than SOAP) and scalable (you could easily run an arbitrary number of ConsoleApps to process all of the incoming requests if you are expecting heavy loads). Also, neither the PHP page nor the ConsoleApp has a direct dependency on the other, so each individual component is less likely to cause a failure in the whole system.

Push content to client after page is loaded

I wonder if the following is possible:
I have a dll that I have referenced in my web site. This dll makes a remote socket connection. The socket connection is waiting in the background and the dll reports back data through events after the socket has received some data.
The connection is opened during page load.
Now I would like, for example, to update a label on the page when new data has arrived.
I am not sure how this would work. I assume that I could set some kind of timer on the page that updates a control, but that does not seem "optimal", as I already call the code-behind through my events. "Optimally", the UpdatePanel or whatever updates the interface would wait for events and update when events have occurred, not based on time.
My question is - is this possible?

You can use techniques called "long polling" or "web sockets".
There are libraries, like SignalR, that can help you.
They generally use web sockets when the client's browser supports them and long polling when it does not.
Using these libraries you can "push" commands/data from the server to the client's browser, just as you want.

In the plain HTTP request/response model the server cannot push to the client; that is how the web works.
There are two possible ways to complete your task:
1. Put a timer on the page and make requests to the server to see whether something has changed.
2. Open a long-polling connection between client and server, like Facebook does, which listens for events and gets data from the server. It can be, for example, XMPP or any other protocol.

You should think about turning this around; rather than initiate the call from the code behind asynchronously and then update the client, you should deliver your original page to the client.
Once the page has been loaded client side, you can make an AJAX call to the server to retrieve the data you want - and display a little "I'm loading" symbol while this happens.
Page Methods are ideal for this.
The classic ASP.NET web page is supposed to last for the duration of the user's request. When a postback happens, it effectively rebuilds the state of the call from the user's viewstate, session and any other state you've maintained. It never cares if the user goes away, for example (although it might feel a bit lonely).
Having it hang around for longer is problematic in a number of ways; you can, however, implement client callbacks. Although the server initiates these, the client manages the lifecycle, so it's analogous to page methods.

CF app two way communications with server

Users in the field with PDAs will generate messages and send them to the server; users at the server end will generate messages which need to be sent to the PDAs.
Messages are between the app and server code; they are not 100% user-entered data. I.e., we'll capture some data in a form, add GPS location, time, date and such, and send that to the server.
The server may send us messages like updates to database records used in the PDA app, messages for the user, etc.
For messages from the PDA to the server, that's easy. The PDA initiates a call to the server and passes data. Presently I'm using web services at the server end and "add new web reference" and the associated code on the PDA.
I'm coming unstuck trying to get messages from the server to the PDA in a timely fashion. In some instances receiving the message quickly is important.
If the server had a message for a particular PDA, it would be great for the PDA to receive it within a few seconds of it being available. So polling once a minute is out; polling once a second will generate a lot of traffic and maybe drain the PDA battery some?
This post is the same question as mine and suggests http long polling:
Windows Mobile 6.0/6.5 - Push Notification
I've looked into WCF callbacks and they appear to be exactly what I want; however, they are unavailable for the Compact Framework.
This next post isn't for CF but raises issues of service availability:
To poll or not to poll (in a web services context)
In my context I'll have 500-700 devices wanting to communicate with a small number of web services (between 2 and 5).
That's a lot of long poll requests to keep open.
Are sockets the way to go? Again, that's a lot of connections.
I've also read about methods using Exchange or Gmail; I'm really hesitant to go down those paths.
Most of the posts I've found here and on Google are a few years old; something may have come up since then?
What's the best way to handle 500-700 PDA CF devices wanting near-instant communication from a server, whilst maintaining battery life? Tall request, I'm sure.

Socket communication seems like the easiest approach. You say you're using webservices for client-server comms, and that is essentially done behind the scenes by the server (webservice) opening a socket and listening for packets arriving, then responding to those packets.
You want to take the same approach in reverse, so each client opens a socket on its machine and waits for traffic to arrive. The client will basically need to poll its own socket (which doesn't incur any network traffic). The client will also need to communicate its IP address and port to the server so that when the server needs to communicate back to the client it has a means of reaching it. The server will then use socket-based comms (as opposed to web services) to send messages out as required. The server can just open a socket, send the message, then close the socket again. No need to have lots of permanently open sockets.
There are potential catches, though, if the client is roaming around and hopping between networks. If this is the case then it's likely that the IP address will be changing (and the client will need to open a new socket and pass the new IP address and port to the server). It also increases the chances that the server will fail to communicate with the client.
Sounds like an interesting project. Good luck!

Ages ago, the CF team built an application called the "Lunch Launcher" which was based on WCF store-and-forward messaging. David Kline did a nice series on it (here is the last one, which has a TOC for all earlier articles).
There's an on-demand webcast on MSDN given by Jim Wilson that gives an outline of store-and-forward, and the code from that webcast is available here.
This might do what you want, though it has some dependencies (e.g. Exchange) and some inherent limitations (e.g. no built-in delivery confirmation).

OK, after further looking I may be closer to what I want, which I think is a form of HTTP long poll anyway.
This article here - http://www.codeproject.com/KB/IP/socketsincsharp.aspx - shows how to have a listener on a socket. So I do this on the server side.
The client side then opens a socket to the server at this port and sends its device ID.
The server code first checks to see if there is a response for that device. If there is, it responds.
If not, it either polls itself or subscribes to some event, then returns when it's got data.
I could put timeout code in place on the server side if needed.
Blocking on the client end I'm not worried about, because it's a background thread and no data is the same as blocking at the app level; as for CPU and battery life, I'm not sure.
I know what I've written is fairly broad, but is this a strategy worth exploring?
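The server-side half of that strategy (answer at once if a message is queued for the device, otherwise wait until one arrives or a timeout passes) could be sketched roughly as below. This is an illustration under assumptions, not tested against the Compact Framework: it targets the server on the full .NET Framework 4.5 or later, the socket handling is left out, and DeviceMailboxes, Post and WaitForMessageAsync are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class DeviceMailboxes
{
    // One queue per device plus a semaphore that counts the messages waiting in it.
    private sealed class Mailbox
    {
        public readonly ConcurrentQueue<string> Messages = new ConcurrentQueue<string>();
        public readonly SemaphoreSlim Signal = new SemaphoreSlim(0);
    }

    private readonly ConcurrentDictionary<string, Mailbox> _boxes =
        new ConcurrentDictionary<string, Mailbox>();

    // Called by server code that has something to send to a device.
    public void Post(string deviceId, string message)
    {
        Mailbox box = _boxes.GetOrAdd(deviceId, _ => new Mailbox());
        box.Messages.Enqueue(message);
        box.Signal.Release(); // wakes a pending wait, or is remembered for the next one
    }

    // Called when a device connects and sends its ID. Completes immediately if a
    // message is already queued, otherwise when one is posted; returns null on timeout.
    public async Task<string> WaitForMessageAsync(string deviceId, TimeSpan timeout)
    {
        Mailbox box = _boxes.GetOrAdd(deviceId, _ => new Mailbox());
        if (await box.Signal.WaitAsync(timeout) && box.Messages.TryDequeue(out string message))
            return message;
        return null;
    }
}
```

Because the wait is asynchronous, an idle connection does not hold a server thread, which matters with 500-700 devices parked at once. On a null result the server would send an empty reply (or simply close) and the device would reconnect, which is the usual long-poll cycle.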

Problems Sending Large Volume of Emails using ASP.Net

I'm having an issue sending large volumes of emails out from an ASP.Net application. I won't post the code, but instead explain what's going on. The code should send emails to 4000 recipients but seems to stall at 385/387.
The code creates the content for the email in a string.
It then selects a list of email addresses to send to.
Looping through the data via a DataReader, it picks out each email address and sends an email.
The email sending is done by a separate method which can handle failures and returns its outcome.
As each record is sent I produce an XML node in an XML document to log each specific attempt to send.
The loop seems to end prematurely and the XML document is saved to disk.
Now I know the code works. I have run it locally using the same SMTP machine and it worked fine with 500 records. Granted there was less content, but I can't see how that would make any difference.
I don't think the page itself times out, but even if it did, I was sure .Net would continue processing the page, even if the user saw a page time out error.
Any suggestions are appreciated, because I'm pretty stumped.

You're sending lots of emails. During the span of a single request? IIS will kill a request if it takes longer than a certain (configurable) amount of time.
You need to use a separate process for work like this, whether that's a Timer you start from within global.asax, a Thread that checks for a list of emails in a database or the app_data directory, a service you send a request to via WCF, or some combination of these.

The way I've handled this in the past is to queue the emails into a SQL Server table and then launch another thread to actually process/send the emails. Another aspx utility page can give me the status of the queue or restart the processing.
I also highly recommend that you use an existing, legit, third-party mailing service for your SMTP server if you are sending mail out to the general public. Otherwise you run the risk of your ISP shutting off your mail access or (worse) your own server being blacklisted.
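A rough sketch of the queue-plus-worker idea follows. It is not the code described above: an in-memory BlockingCollection stands in for the SQL Server table, and the actual sending is passed in as a delegate, so EmailQueue, trySend and maxAttempts are all hypothetical names. A real version would persist the queue, since anything held only in memory is lost when the application pool recycles.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class EmailQueue
{
    // In-memory stand-in for the database table of pending emails.
    private readonly BlockingCollection<string> _pending = new BlockingCollection<string>();
    private readonly ConcurrentQueue<string> _failed = new ConcurrentQueue<string>();
    private readonly Func<string, bool> _trySend; // returns true when the email was accepted
    private readonly int _maxAttempts;
    private int _sentCount;

    public EmailQueue(Func<string, bool> trySend, int maxAttempts)
    {
        _trySend = trySend;
        _maxAttempts = maxAttempts;
    }

    // Status values that a utility page could display.
    public int PendingCount { get { return _pending.Count; } }
    public int SentCount { get { return Volatile.Read(ref _sentCount); } }
    public IReadOnlyCollection<string> Failed { get { return _failed.ToArray(); } }

    public void Enqueue(string address) { _pending.Add(address); }

    // Signals that no more addresses will be added, so the worker can finish.
    public void CompleteAdding() { _pending.CompleteAdding(); }

    // Runs the send loop on its own long-running thread, outside any page request.
    public Task StartWorker()
    {
        return Task.Factory.StartNew(ProcessAll, TaskCreationOptions.LongRunning);
    }

    private void ProcessAll()
    {
        // Blocks while the queue is empty; ends once CompleteAdding was called and it is drained.
        foreach (string address in _pending.GetConsumingEnumerable())
        {
            bool delivered = false;
            for (int attempt = 1; attempt <= _maxAttempts && !delivered; attempt++)
            {
                try { delivered = _trySend(address); }
                catch (Exception) { delivered = false; } // one bad send must not stop the loop
            }

            if (delivered) Interlocked.Increment(ref _sentCount);
            else _failed.Enqueue(address);
        }
    }
}
```

The page request then only has to enqueue the 4000 addresses and return; progress is read from PendingCount, SentCount and Failed rather than from the request that started the run.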

If the web server has a timeout setting, it will kill the page if it runs too long.

I recommend you check the value of HttpServerUtility.ScriptTimeout - if this is set then when a script has run for that length of time, it will be shut down.
Something you could do to help is go completely old-school - combine some Response.Writes with a few Response.Flush to send some data back to the client browser, and this tends to keep the script alive (certainly worked on an old ASP.NET 1.1 site we had).
Also, you need to take into account when this script is being run. The server may well have been configured to perform an application reset (by default this is set to every 29 hours in IIS); if your server is set to something like 24 hours and this coincides with the time your script is run, you could be seeing that too. The fact that the script is logging its response probably rules that out, though, unless your XML document is badly formed?
All that being said, I'd go with Will's answer of using a separate process (not just a thread hosted by the site) or, as Bryan said, go with a proper mailing service, which will help you with things like bounce-backs, click tracking, reporting, open counts, and so on.
