How can I get the loading time and volume of a website? - c#

I'm working on a program that should measure the loading time and volume of a website that I give as input.
The code below returns just the response time of the website, but I want the total loading time and the total volume of all items (pictures, JavaScript, HTML, etc.).
public string Loading_Time(string url)
{
    var stopwatch = new Stopwatch();
    using (var client = new WebClient())
    {
        client.Credentials = CredentialCache.DefaultCredentials;
        stopwatch.Start();
        // Downloads only the HTML document itself, not the resources it references.
        string result = client.DownloadString(url);
        stopwatch.Stop();
    }
    // ElapsedMilliseconds is the full elapsed time; Elapsed.Milliseconds would only
    // return the milliseconds component (0-999) of the elapsed time.
    return stopwatch.ElapsedMilliseconds.ToString();
}
How can I achieve that?

This is going to be a little bit tough. Start by using something like HtmlAgilityPack or a similar library to parse the HTML returned by your original request (don't try to parse HTML yourself!).
Scan through the parsed object representation of the HTML and decide what you want to measure the size of. Typically this will be:
Includes, such as CSS or JavaScript files
Images in IMG and BUTTON elements, as well as background images
The difficulty is that images are often specified in a CSS stylesheet, so are you going to parse every CSS file to pick those up too?
For the original HTML request you can obtain the byte size of the downloaded string. Start with this number as your "volume".
Now make a separate request for each JS, CSS, image, etc. file in the same way. All you're interested in is the byte size of each download, which is readily available when you make an HTTP request. Add each item's byte size to the total.
When you're finished you will have the total byte size for all artifacts of that web page.
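For illustration, here is a rough sketch along those lines. It assumes the HtmlAgilityPack NuGet package, only looks at img/script/link references (so CSS-referenced images are not counted), and simply sums the downloaded byte counts; wrap it in a Stopwatch, as in the Loading_Time method above, if you also want the total loading time.
using System;
using System.Linq;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static long Measure_Volume(string url)
{
    using (var client = new WebClient())
    {
        client.Credentials = CredentialCache.DefaultCredentials;

        // The HTML document itself is the starting point for the total volume.
        byte[] html = client.DownloadData(url);
        long totalBytes = html.LongLength;

        var doc = new HtmlDocument();
        doc.LoadHtml(Encoding.UTF8.GetString(html)); // assumes UTF-8 markup

        // Collect candidate resource URLs from img/script/link tags.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src] | //script[@src] | //link[@href]");
        if (nodes == null)
            return totalBytes;

        var baseUri = new Uri(url);
        var resourceUrls = nodes
            .Select(n => n.GetAttributeValue("src", n.GetAttributeValue("href", "")))
            .Where(u => !string.IsNullOrEmpty(u))
            .Select(u => { Uri abs; return Uri.TryCreate(baseUri, u, out abs) ? abs : null; })
            .Where(u => u != null && (u.Scheme == Uri.UriSchemeHttp || u.Scheme == Uri.UriSchemeHttps))
            .Distinct();

        foreach (var resource in resourceUrls)
        {
            try
            {
                // Download each referenced file and add its byte size to the total.
                totalBytes += client.DownloadData(resource).LongLength;
            }
            catch (WebException)
            {
                // Skip resources that fail to download (404s, blocked hosts, etc.).
            }
        }

        return totalBytes;
    }
}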

Related

How to show progress of a zip download without the zip file existing beforehand, but knowing the file sizes

I have logic that downloads a group of files as a zip. The issue is there is no progress indicator, so the user doesn't know how far along the download is.
This zip file doesn't exist beforehand; the user selects the files they want to download, and then I use the SharpZipLib NuGet package to create the zip
and stream it to the response.
It seems I need to set the Content-Length header for the browser to show a total-size progress indicator. The issue I'm having is that
this value has to be exact: if it's too low or too high by even 1 byte, the file doesn't download properly. I can get an approximate
final size by adding all the file sizes together and setting the compression level to zero, but I don't see a way to calculate the final zip size exactly.
I had hoped I could just overestimate the final size a bit and the browser would allow that, but it doesn't; the file isn't downloaded properly, so you can't open it.
Here are some possible solutions I've come up with, but they have their own issues.
1 - I can create the zip on the server first and then stream it, so I know the exact size and can set Content-Length. The issue with this
is that the user has to wait for all the files to be streamed to the web server and the zip to be created before I can start streaming it to them. While this is going on, the user won't even see the download as having started. It also increases memory usage on the web server, since the entire zip file has to be held in memory.
2 - I can build my own progress UI. I would use the combined file sizes to get a rough estimate of the final size and then, as the files are streamed, push progress updates to the user via SignalR.
3 - I can show the user the total file size before the download begins, so they at least have a way to judge for themselves how far along it is. But the browser gives no indication of progress, so if they switch away and later look at the browser's download progress, there will be nothing to go on.
These all have their own drawbacks. Is there a better way to do this, ideally so it's all handled by the browser?
Below is my ZipFilesToResponse method. It uses some objects that aren't shown here for simplicity's sake. It also streams the files from Azure Blob Storage.
public void ZipFilesToResponse(HttpResponseBase response, IEnumerable<Tuple<string,string>> filePathNames, string zipFileName)
{
    using (var zipOutputStream = new ZipOutputStream(response.OutputStream))
    {
        zipOutputStream.SetLevel(0); // 0 = store only, 9 = best compression
        response.BufferOutput = false;
        response.AddHeader("Content-Disposition", "attachment; filename=" + zipFileName);
        response.ContentType = "application/octet-stream";
        Dictionary<string, long> sizeDictionary = new Dictionary<string, long>();
        long totalSize = 0;
        foreach (var file in filePathNames)
        {
            long size = GetBlobProperties(file.Item1).Length;
            totalSize += size;
            sizeDictionary.Add(file.Item1, size);
        }
        // The zipped download breaks if we don't have the exact content length,
        // and it isn't necessarily the total length of the contents.
        // I don't see a simple way to get it set correctly without downloading the entire file to the server first,
        // so for now we won't include a content length.
        //response.AddHeader("Content-Length",totalSize.ToString());
        foreach (var file in filePathNames)
        {
            long size = sizeDictionary[file.Item1];
            var entry = new ZipEntry(file.Item2)
            {
                DateTime = DateTime.Now,
                Size = size
            };
            zipOutputStream.PutNextEntry(entry);
            Container.GetBlockBlobReference(file.Item1).DownloadToStream(zipOutputStream);
            response.Flush();
            if (!response.IsClientConnected)
            {
                break;
            }
        }
        zipOutputStream.Finish();
        zipOutputStream.Close();
    }
    response.End();
}
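For illustration, here is a rough, untested sketch of option 1 above: build the whole zip into a MemoryStream first so the Content-Length is exact, at the cost of server memory and a delayed start (the drawbacks already noted). It reuses the same SharpZipLib types and the Container helper from the code above.
public void ZipFilesToResponseBuffered(HttpResponseBase response, IEnumerable<Tuple<string, string>> filePathNames, string zipFileName)
{
    using (var buffer = new MemoryStream())
    {
        // IsStreamOwner = false keeps the MemoryStream open after the zip stream is disposed.
        using (var zipOutputStream = new ZipOutputStream(buffer) { IsStreamOwner = false })
        {
            zipOutputStream.SetLevel(0);
            foreach (var file in filePathNames)
            {
                zipOutputStream.PutNextEntry(new ZipEntry(file.Item2) { DateTime = DateTime.Now });
                Container.GetBlockBlobReference(file.Item1).DownloadToStream(zipOutputStream);
            }
            zipOutputStream.Finish();
        }

        response.BufferOutput = false;
        response.AddHeader("Content-Disposition", "attachment; filename=" + zipFileName);
        response.ContentType = "application/octet-stream";
        // The zip is now complete, so this length is exact and the browser can show real progress.
        response.AddHeader("Content-Length", buffer.Length.ToString());

        buffer.Position = 0;
        buffer.CopyTo(response.OutputStream);
        response.Flush();
    }
    response.End();
}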

Can't get favicon from instagram

I need to retrieve the favicon of Instagram. My program can already parse the HTML and retrieve the URL of the icon: http://d36xtkk24g8jdx.cloudfront.net/bluebar/d1f7ba7/images/ico/apple-touch-icon-precomposed.png.
But I can't read this icon from my program, because Instagram appears to put some extra bytes at the beginning, the middle, and the end of this file when my program tries to download it:
var wc = new WebClient();
var iconBytes = wc.DownloadData(@"http://d36xtkk24g8jdx.cloudfront.net/bluebar/d1f7ba7/images/ico/apple-touch-icon-precomposed.png");
var converter = new ImageConverter();
var image = (Image)converter.ConvertFrom(iconBytes); // Crash here 'parameter is invalid'
I tried saving the PNG file directly from the web browser. Then I analyzed its content and came to the conclusion that the byte array WebClient returns is almost identical, but it contains 15 extra bytes at the beginning, 8 extra bytes at the end, and 5 extra bytes in the middle of the array. I can easily strip this 'salt' from the beginning and the end based on knowledge of the PNG format, but I have no idea how to remove the garbage from the middle of the array.
Could you please help me figure out how to download and process the Instagram favicon?
This Google service will return the image:
http://www.google.com/s2/favicons?domain=www.instagram.com
In general, the pattern is:
http://www.google.com/s2/favicons?domain=<domain>
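For example, a minimal sketch against that service, reusing WebClient and ImageConverter from the snippet above (the domain is just the one from the question):
var wc = new WebClient();
var iconBytes = wc.DownloadData("http://www.google.com/s2/favicons?domain=www.instagram.com");
var converter = new ImageConverter();
var image = (Image)converter.ConvertFrom(iconBytes); // small PNG favicon returned by Google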

Read only the title and/or META tag of HTML file, without loading complete HTML file

Scenario:
I need to parse millions of HTML files/pages (as fast as I can) and read only the Title and/or META part of each, then dump that to a database.
What I am doing is using the System.Net.WebClient class's DownloadString(url_path) to download and then saving it to the database with LINQ to SQL.
But this DownloadString function gives me the complete HTML source; I only need the Title and META tag parts.
Any ideas on how to download only that much content?
I think you can open a stream for this URL and read just the first N bytes. I can't tell you the exact number, but you can set it to a reasonable value that will usually cover the title and the description.
HttpWebRequest fileToDownload = (HttpWebRequest)WebRequest.Create("YourURL");
using (WebResponse fileDownloadResponse = fileToDownload.GetResponse())
{
    using (Stream fileStream = fileDownloadResponse.GetResponseStream())
    using (StreamReader fileStreamReader = new StreamReader(fileStream))
    {
        // Read only the first Number characters of the response body.
        // Note: on a network stream, Read may return fewer characters than requested.
        char[] x = new char[Number];
        int charsRead = fileStreamReader.Read(x, 0, Number);
        // Build the string directly from the buffer instead of concatenating char by char.
        string data = new string(x, 0, charsRead);
    }
}
I suspect that WebClient will try to download the whole page first, in which case you'd probably want a raw client socket. Send the appropriate HTTP request manually (since you're using raw sockets), start reading the response (which will not arrive all at once), and kill the connection when you've read enough. However, the rest will probably already have been sent from the server and be winging its way to your PC whether you want it or not, so you might not save much bandwidth, if any.
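If you do go the raw-socket route, here is a rough sketch, assuming plain HTTP on port 80 (the host, path, and byte limit are placeholders); note that the first bytes read will include the status line and headers before the markup:
using System.Net.Sockets;
using System.Text;

public static string ReadPageStart(string host, string path, int maxBytes)
{
    using (var client = new TcpClient(host, 80))
    using (NetworkStream stream = client.GetStream())
    {
        // Hand-rolled HTTP request, since we are below the WebClient/HttpWebRequest level.
        byte[] request = Encoding.ASCII.GetBytes(
            "GET " + path + " HTTP/1.1\r\n" +
            "Host: " + host + "\r\n" +
            "Connection: close\r\n\r\n");
        stream.Write(request, 0, request.Length);

        var buffer = new byte[maxBytes];
        int total = 0;
        int read;
        // Keep reading until the buffer is full or the server stops sending.
        while (total < maxBytes && (read = stream.Read(buffer, total, maxBytes - total)) > 0)
        {
            total += read;
        }
        // Disposing the TcpClient here abandons whatever else the server was going to send.
        return Encoding.UTF8.GetString(buffer, 0, total);
    }
}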
Depending on what you want it for, many half-decent websites have a custom 404 page which is a lot simpler than a known page. Whether that has the information you're after is another matter.
You can use the verb "HEAD" in an HttpWebRequest to return only the response headers (not the HTML elements). To get the title or META elements themselves you'll need to download the page and parse out the metadata you want.
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "HEAD";
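Completing that snippet into a runnable sketch (the URL is a placeholder); note that only headers come back, so it cannot give you the title itself:
var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
request.Method = "HEAD";
using (var response = (HttpWebResponse)request.GetResponse())
{
    // Only the headers are transferred; the body is never downloaded.
    Console.WriteLine(response.ContentType);   // e.g. text/html; charset=utf-8
    Console.WriteLine(response.ContentLength); // may be -1 if the server doesn't send it
}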

Grabbing Images from a webpage quickly

I was wondering if someone could give me some guidance here. I'd like to be able to programmatically get every image on a webpage as quickly as possible. This is what I'm currently doing (note that clear is a WebBrowser control):
if (clear.ReadyState == WebBrowserReadyState.Complete)
{
    doc = (IHTMLDocument2)clear.Document.DomDocument;
    sobj = doc.selection;
    body = doc.body as HTMLBody;
    sobj.clear();
    range = body.createControlRange() as IHTMLControlRange;
    for (int j = 0; j < clear.Document.Images.Count; j++)
    {
        img = (IHTMLControlElement)clear.Document.Images[j].DomElement;
        HtmlElement ele = clear.Document.Images[j];
        string test = ele.OuterHtml;
        string test2 = ele.InnerHtml;
        range.add(img);
        range.select();
        range.execCommand("Copy", false, null);
        Image image = Clipboard.GetImage();
        if (image != null)
        {
            temp = new Bitmap(image);
            Clipboard.Clear();
            ......Rest of code ...........
        }
    }
}
However, I find this can be slow for a lot of images, and additionally it hijacks my clipboard. I was wondering if there is a better way?
I suggest using HttpWebRequest and HttpWebResponse. In your comment you asked about efficiency/speed.
From the standpoint of data being transferred, using HttpWebRequest will at worst be the same as using a browser control, and almost certainly much better. When you (or a browser) make a request to a web server, you initially get only the markup for the page itself. This markup may include image references, objects like Flash, and resources (like scripts and CSS files) that are referenced but not actually included in the page itself. A web browser will then proceed to request all the associated resources needed to render the page, but with HttpWebRequest you can request only the things you actually want (the images).
From the standpoint of the resources or processing power required to extract entities from a page, there is no comparison: using a browser control is far more resource intensive than scanning an HttpWebResponse. Scanning data with C# code is extremely fast. Rendering a web page involves JavaScript, graphics rendering, CSS parsing, layout, caching, and so on; it's a pretty intensive operation, actually. Using a browser under programmatic control, this quickly becomes apparent: I doubt you could process more than a page every second or so.
On the other hand, a C# program dealing directly with a web server (with no rendering engine involved) could probably handle dozens if not hundreds of pages per second. For all practical purposes, you'd really be limited only by the response time of the server and your internet connection.
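For instance, a minimal sketch of fetching a single image directly with HttpWebRequest/HttpWebResponse (URL and file name are placeholders):
var request = (HttpWebRequest)WebRequest.Create("http://example.com/images/logo.png");
using (var response = (HttpWebResponse)request.GetResponse())
using (var responseStream = response.GetResponseStream())
using (var file = File.Create("logo.png"))
{
    // Copy the image bytes straight to disk; no rendering engine involved.
    responseStream.CopyTo(file);
}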
There are multiple approaches here.
If it's a one time thing, just browse to the site and select File > Save Page As... and let the browser save all the images locally for you.
If it's a recurring thing there are lots of different ways.
buy a program that does this. I'm sure there are hundreds of implementations.
use the HTML Agility Pack to grab the page and compile a list of all the images you want, then spin up a thread for each image that downloads and saves it (a rough sketch follows this answer). You might limit the number of threads depending on various factors like your (and the site's) bandwidth and local disk speed. Note that some sites place arbitrary limits on the number of concurrent requests per connection they will handle; depending on the site this might be as few as 3.
This is by no means conclusive. There are lots of other ways. I probably wouldn't do it through a WebBrowser control though. That code looks brittle.
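As a rough sketch of the HTML Agility Pack approach from the list above (assuming the HtmlAgilityPack NuGet package and a hypothetical output folder; concurrency is capped at 3 to respect per-connection limits):
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static void DownloadPageImages(string pageUrl, string outputFolder)
{
    // Fetch and parse the markup once.
    var doc = new HtmlWeb().Load(pageUrl);
    var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
    if (nodes == null)
        return;

    var baseUri = new Uri(pageUrl);
    var imageUrls = nodes
        .Select(n => n.GetAttributeValue("src", ""))
        .Where(src => !string.IsNullOrEmpty(src))
        .Select(src => { Uri abs; return Uri.TryCreate(baseUri, src, out abs) ? abs : null; })
        .Where(u => u != null)
        .Distinct()
        .ToList();

    Directory.CreateDirectory(outputFolder);

    // Cap concurrency; some sites only allow a few simultaneous requests per connection.
    Parallel.ForEach(imageUrls, new ParallelOptions { MaxDegreeOfParallelism = 3 }, imageUrl =>
    {
        try
        {
            string fileName = Path.GetFileName(imageUrl.LocalPath);
            if (string.IsNullOrEmpty(fileName))
                fileName = Guid.NewGuid().ToString("N") + ".img";
            using (var client = new WebClient())
            {
                client.DownloadFile(imageUrl, Path.Combine(outputFolder, fileName));
            }
        }
        catch (WebException)
        {
            // Skip images that fail to download; the rest continue.
        }
    });
}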

How do external counters get unique visitors?

How do external counters track unique visitors via an image?
I'd also like to get the referrer if possible.
Something like img="http://www.somecounterdomain.com/count.php?page=83599".
I'm using ASP.NET, C#.
I'm aware that a user can "cheat", but I would like to keep that possibility to a minimum.
An additional difficulty is that I need to track an external server and can't implement C# code there.
All I can do is embed a counter image or something similar,
so I'm trying to use a generated image.
Thanks for any answers.
Basically what you need to do is the following.
1- Create either an .ashx or an .aspx page. Assuming you go with .aspx and call it StatServer.aspx, the Page_Load function will read the query string and write the data to a database (you will see the querystring in step 2). If you want, you can return an image which can be rendered. Some rough code will look something like this:
private void Page_Load(object sender, EventArgs e)
{
    WriteQueryStringInformationToDB(Request.QueryString);
    Image image = LoadYourImageHere();
    using (MemoryStream stream = new MemoryStream())
    {
        base.Response.Clear();
        base.Response.ContentType = "image/png";
        image.Save(stream, ImageFormat.Png);
        stream.WriteTo(base.Response.OutputStream);
        base.Response.End();
    }
}
2- This is the magic: you create a small .js file. In this file you have a function, let's call it mystats(), which gathers the client-side information and makes a call to the URL hosting the page you created in step 1. The client-side information, like screen size, referrer, etc., is all passed on the querystring. One important thing to include in the function is an ID indicating which counter you are updating; that way you can use your counter on multiple sites. A very simple .js might look something like this (note: untested):
function mystats(id)
{
    // Base URL including the ID of the counter
    var url = "http://yourdomainorservername/statserver.aspx?id=" + id;
    // Add the referrer to the url querystring
    url += "&r=" + escape(document.referrer);
    // Add screen width + height
    url += "&w=" + screen.width + "&h=" + screen.height;
    document.write('<img src="' + url + '" border=0 alt="Site statistics">');
}
3- On the web pages where you want the counter, you add a script block that includes the .js file from your server and calls the mystats function. The function writes out an img tag, which causes the browser to send a request to your server; your server in turn updates the DB and returns the image stream to display.
Getting the referrer is easy, and for counting unique visitors you'll need to set and check cookies.
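For the unique-visitor part, here is a minimal sketch of a cookie check that could sit alongside the Page_Load above (the cookie name and expiry are arbitrary). The referrer itself simply arrives in the "r" querystring value that the JavaScript appends.
private bool IsUniqueVisitor()
{
    // A returning visitor already carries our cookie, so don't count them again.
    if (Request.Cookies["stat_visitor"] != null)
        return false;

    // First visit: hand out a long-lived cookie so later hits aren't counted as unique.
    var cookie = new HttpCookie("stat_visitor", Guid.NewGuid().ToString())
    {
        Expires = DateTime.Now.AddYears(1)
    };
    Response.Cookies.Add(cookie);
    return true;
}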
