I am using an asp:Image control to display images from different URLs. I set ImageUrl to the address of an image on a remote website (e.g. http://www.google.com/favicon.ico). The question is: how can I tell whether the image exists, i.e. that the URL of the image is valid?
You can check whether a URI is well-formed using the Uri.TryCreate method.
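For example (a minimal sketch; note that this only validates the format of the string, not that anything actually exists at that address):
Uri imageUri;
bool isWellFormed = Uri.TryCreate("http://www.google.com/favicon.ico",
                                  UriKind.Absolute, out imageUri);
// isWellFormed only tells you the string parses as an absolute URI;
// finding out whether the image really exists requires an HTTP request
// (see the HEAD-request example further down).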
You shouldn't be checking whether the image exists in your ASP.Net application. It is the job of the browser to download the image. You can add javascript to allow the browser to replace the missing image with a default image, as described in this question.
You can't do this with the asp:Image control alone. However, with a little extra work you could use an ASHX handler to make a programmatic HTTP request for the image (e.g. passing the image URL on the querystring). If the request succeeds, you can then stream the image to the response.
If the HttpRequest returns a 404 status, then you can serve another predefined image instead.
However, this is like using a sledgehammer to crack a nut and should not be used extensively throughout a site as it could potentially create a significant load - essentially, you are asking the server (not the user's browser) to download the image. Also it could be a potential XSS security risk if not implemented carefully.
It would be fine for individual cases, specifically when you actually need to retain the requested image locally. Any images requested should be written to disk so that future requests can serve the previously retained images.
Obviously, Javascript is a solution too but I mention the above as a possibility, depending on the requirements.
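For illustration, a rough sketch of such a handler (the handler name, the url querystring parameter and the fallback image path are my own assumptions, not anything from the original post):
using System.Net;
using System.Web;

// Hypothetical ImageProxy.ashx: proxies a remote image and serves a
// predefined local image when the remote request fails (e.g. 404).
public class ImageProxy : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string url = context.Request.QueryString["url"];
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                context.Response.ContentType = response.ContentType;
                stream.CopyTo(context.Response.OutputStream);
            }
        }
        catch (WebException)
        {
            // Remote image missing or unreachable: serve the fallback instead
            context.Response.ContentType = "image/png";
            context.Response.WriteFile(context.Server.MapPath("~/images/missing.png"));
        }
    }

    public bool IsReusable { get { return true; } }
}
The asp:Image would then point at the handler, e.g. ImageUrl="~/ImageProxy.ashx?url=http://www.google.com/favicon.ico" (and, per the caveats above, you would want to validate the url parameter and cache results to disk).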
Another approach: subclass WebClient so it issues a HEAD request, which fetches only the headers and no content:
class MyClient : WebClient
{
    public bool HeadOnly { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        if (HeadOnly && req.Method == "GET")
        {
            // Swap the GET for a HEAD so only the headers come back
            req.Method = "HEAD";
        }
        return req;
    }
}

using (var client = new MyClient())
{
    client.HeadOnly = true;
    // fine, no content downloaded
    string s1 = client.DownloadString("http://google.com");
    // throws a WebException (404)
    string s2 = client.DownloadString("http://google.com/silly");
}
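Applied to the original question, you could wrap this in a small helper along these lines (a sketch; the method name and fallback handling are mine):
private static bool RemoteImageExists(string url)
{
    try
    {
        using (var client = new MyClient { HeadOnly = true })
        {
            client.DownloadString(url); // HEAD request, no body downloaded
            return true;
        }
    }
    catch (WebException)
    {
        // 404 (or any other HTTP/network failure) ends up here
        return false;
    }
}

// e.g. myImage.ImageUrl = RemoteImageExists(url) ? url : "~/images/missing.png";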
I have written a C# application to crawl websites.
Now I have a problem:
I can't tell whether a given URL leads to a file or to a webpage.
How can I solve this without sending a request to the URL?
You can't without sending a request, as a Uniform Resource Locator is not comparable to a file system path. For instance, while the following URL ends with .jpg, it is clearly not a picture:
google.com/search?q=asd.jpg
Here is how, if you decide to change your mind:
public bool IsFileContent(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "HEAD"; // headers only, no body

    using (var response = request.GetResponse())
    {
        // ContentType may carry a charset suffix, e.g. "text/html; charset=utf-8"
        string contentType = response.ContentType.Split(';')[0].Trim();
        switch (contentType)
        {
            case "image/jpeg": return true;
            case "text/plain": return true;
            case "text/html": return false;
            default: // TODO: add more cases as needed
                throw new ArgumentOutOfRangeException();
        }
    }
}
What you are asking to do is literally impossible. URLs do not 'lead to files or web pages.' They are routed to request handlers. A request handler can return an HTML response or a file download or other types of responses. Some extensions such as ".html" or ".pdf" imply what the type of response should be. But a URL could have an extension that doesn't indicate the response type, or (as on this very page) no extension at all.
You cannot determine the response type of an HTTP request from the URL alone.
Without sending any request, the only thing I can think of is to check for a file extension at the end of the URL. This won't give you a 100% success rate, because a URL can serve a file without ending in an extension. That said, it is common practice to have a file URL end with the filename and its extension. A sketch of that check follows below.
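Something along these lines would do it (a minimal sketch; the extension list is just an example):
using System;
using System.IO;
using System.Linq;

public static class UrlHeuristics
{
    private static readonly string[] FileExtensions = { ".jpg", ".png", ".gif", ".pdf", ".zip" };

    public static bool LooksLikeFileUrl(string url)
    {
        // Uri.AbsolutePath drops the query string, so "?q=asd.jpg" doesn't fool the check
        string extension = Path.GetExtension(new Uri(url).AbsolutePath);
        return FileExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase);
    }
}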
I am using WebClient to retrieve a website. I decided to set If-Modified-Since because if the website hasn't changed, I don't want to get it again:
var c = new WebClient();
c.Headers[HttpRequestHeader.IfModifiedSince] = Last_refreshed.ToUniversalTime().ToString("r");
Where Last_refreshed is a variable in which I store the time I've last seen the website.
But when I run this, I get a WebException with the text:
The 'If-Modified-Since' header must be modified using the appropriate property or method.
Parameter name: name
Turns out the API docs mention this:
In addition, some other headers are also restricted when using a WebClient object. These restricted headers include, but are not limited to the following:
Accept
Connection
Content-Length
Expect (when the value is set to "100-continue")
If-Modified-Since
Range
Transfer-Encoding
The HttpWebRequest class has properties for setting some of the above headers. If it is important for an application to set these headers, then the HttpWebRequest class should be used instead of the WebRequest class.
So does this mean there's no way to set them from WebClient? Why not? What's wrong with specifying If-Modified-Since in a normal HTTP GET?
I know I can just use HttpWebRequest, but I don't want to because it's too much work (have to do a bunch of casting, can't just get the content as a string).
Also, I know Cannot set some HTTP headers when using System.Net.WebRequest is related, but it doesn't actually answer my question.
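For comparison, the HttpWebRequest route the asker wants to avoid looks roughly like this (a sketch using the same Last_refreshed variable; note that a 304 surfaces as a WebException):
var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
request.IfModifiedSince = Last_refreshed.ToUniversalTime();
try
{
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string content = reader.ReadToEnd(); // page changed, process it
    }
}
catch (WebException ex)
{
    var response = ex.Response as HttpWebResponse;
    if (response == null || response.StatusCode != HttpStatusCode.NotModified)
        throw;
    // 304 Not Modified: keep the cached copy
}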
As unwieldy as it may be, I have opted to subclass WebClient in order to add the functionality in a way that mimics the way WebClient typically works (in which headers are consumed by / reset after each use):
public class ApiWebClient : WebClient {
    public DateTime? IfModifiedSince { get; set; }

    protected override WebRequest GetWebRequest(Uri address) {
        var webRequest = base.GetWebRequest(address);
        var httpWebRequest = webRequest as HttpWebRequest;
        if (httpWebRequest != null) {
            if (IfModifiedSince != null) {
                httpWebRequest.IfModifiedSince = IfModifiedSince.Value;
                IfModifiedSince = null;
            }
            // Handle other headers or properties here
        }
        return webRequest;
    }
}
This has the advantage of not having to write boilerplate for the standard operations that WebClient provides, while still offering some of the flexibility of using WebRequest.
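Usage then looks much like plain WebClient (the URL here is a placeholder):
using (var client = new ApiWebClient()) {
    client.IfModifiedSince = Last_refreshed; // consumed by the next request, then reset
    string html = client.DownloadString("http://example.com/");
    // a 304 Not Modified still surfaces as a WebException, just as with plain WebClient
}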
I wrote code to get the camera feed in JavaScript using getUserMedia in HTML5. Now what I want is to send that image to my C# API. I already asked a related question, Sending base64 string to c# server, but had no luck. So I just want to ask: what other ways are there to send an image from HTML5 and JavaScript to my C# server?
Found an interesting article on this one over at Ode To Code. It shows how to write both the JavaScript and C# code to handle posting the content of an image captured from an HTML5 video element (from the previous blog post) to a C# ASP.NET server. His code is not too difficult to follow. I'd do the Regex a little differently, but it should work for you.
You can 'capture' the current content of a video object by drawing it to an HTML canvas, then convert the content of the canvas to a data: URI that you can post to your C# application.
The fun part is converting that data: URI back into an image, which the Ode To Code article shows you how to do. What you do with it after that is up to you. The O2C code saves it to disk, but you could run it through a MemoryStream and load it into memory using System.Drawing.Image.FromStream or similar.
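As a rough sketch of that last step (assuming the posted value is a data: URI like the one toDataURL('image/jpeg') produces, and a hypothetical save path):
// value is the posted string, e.g. "data:image/jpeg;base64,/9j/4AAQ..."
string base64 = value.Substring(value.IndexOf(',') + 1); // strip the "data:image/jpeg;base64," prefix
byte[] bytes = Convert.FromBase64String(base64);

using (var stream = new MemoryStream(bytes))
using (var image = System.Drawing.Image.FromStream(stream))
{
    // Now it's an in-memory Image; save it, resize it, etc.
    image.Save(HttpContext.Current.Server.MapPath("~/App_Data/capture.jpg"));
}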
Second answer, combining the CORS issue, tested locally.
Tested C# code (attribute and ApiController):
[AttributeUsage(AttributeTargets.Method)]
public class AllowReferrerAttribute : ActionFilterAttribute
{
    /// <summary>Update headers with CORS 'Access-Control-Allow-Origin' as required</summary>
    /// <param name="actionContext">Context of action to work with</param>
    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        var ctx = (System.Web.HttpContextWrapper)actionContext.Request.Properties["MS_HttpContext"];
        var referrer = ctx.Request.UrlReferrer;
        if (referrer != null)
        {
            string refhost = referrer.Host;
            string thishost = ctx.Request.Url.Host;
            if (refhost != thishost)
                ctx.Response.AddHeader("Access-Control-Allow-Origin", string.Format("{0}://{1}", referrer.Scheme, referrer.Authority));
        }
        base.OnActionExecuting(actionContext);
    }
}

public class TestController : ApiController
{
    [AllowReferrer]
    public void Post([FromBody]string value)
    {
        if (value == null)
            throw new HttpResponseException(HttpStatusCode.BadRequest);
        if (value.Length > 100000)
            throw new HttpResponseException(HttpStatusCode.Forbidden);
    }
}
JavaScript code to invoke:
function imgToBase64(img)
{
    var canvas = document.createElement('CANVAS');
    var ctx = canvas.getContext('2d');
    canvas.height = img.height;
    canvas.width = img.width;
    ctx.drawImage(img, 0, 0);
    var dataURL = canvas.toDataURL('image/jpeg');
    $(canvas).remove();
    return dataURL;
}
$.ajax({ url: 'http://localhost:63905/api/Test', type: 'POST', data: "=" + imgToBase64($('img')[0]), crossDomain: true })
To test I copied the above code into the console of a browser with a foreign page (on one of my servers) loaded, where the first image on the page was a nice big one. Data arrives as expected in the value parameter, and returns a 403 error because it's over 100,000 bytes in size.
The tricks are:
You have to have a [FromBody] parameter in your Post handler.
The data passed to the $.ajax call must be a string with an = character at the start. This is just the way things work with ApiController and [FromBody]. The = character will be stripped.
The AllowReferrer attribute will stop your JavaScript code from giving errors when you try to do an AJAX POST to a different site, but that's all. It just stops you reporting back to your JavaScript as to what you did with the data it posted. The POST will generally still be processed by your controller.
If the site you're on fails, it's probably because of cross-site protection headers such as x-xss-protection: 1; mode=block in the response that served the image. A lot of sites now set these headers as cheap protection against simple attacks. This will cause an error along the lines of:
SecurityError: Failed to execute 'toDataURL' on 'HTMLCanvasElement': Tainted canvases may not be exported.
If you get that error then you're up against a CORS issue and there's not much you can do about it. If you're working with your own code then it's not a problem, it's when you're trying to grab stuff from other sites that CORS is an issue.
And no, I'm not going to even start telling you how to break CORS. That's not what Stack Overflow is for.
Your first thought will probably be that this is not possible because of cross-site scripting restrictions. But I'm trying to access this content from an application that hosts a WebBrowser control, not from JavaScript code in a site.
I understand it is not possible, and should not be possible by non-hacky means, to access this content from JavaScript, because that would be a big security issue. But it makes no sense to have this restriction in an application that hosts a WebBrowser. If I wanted to steal my application users' Facebook information, I could just call Navigate("facebook.com") and do whatever I want with it. This is an application that hosts a WebBrowser, not a webpage.
Also, if you go to any webpage in Google Chrome that contains an iframe whose source is in another domain, right-click its content and click Inspect Element, it will show you the content. Even simpler, if you navigate to any webpage that contains an iframe from another domain, you will see its content. If you can see it in the WebBrowser, then you should be able to access it programmatically, because it has to be somewhere in memory.
Is there any way, not through the DOM objects (they seem to be based on the same engine as JavaScript and are therefore bound by the same cross-site restrictions), but through some lower-level objects such as MSHTML or SHDocVw, to access this text?
Can this be useful for you?
foreach (HtmlElement elm in webBrowser1.Document.GetElementsByTagName("iframe"))
{
    string src = elm.GetAttribute("src");
    if (!string.IsNullOrEmpty(src))
    {
        // Resolve a relative iframe src against the current document's URL
        Uri frameUri = new Uri(webBrowser1.Document.Url, src);
        using (var client = new System.Net.WebClient())
        {
            string content = client.DownloadString(frameUri); // or use HttpWebRequest
            MessageBox.Show(content);
        }
    }
}
Do you just need a way to request content from code?
// 'webRequest' here is the answerer's own object describing the request to make
// (URL, user agent, method, optional body bytes) -- substitute your own values.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(webRequest.URL);
request.UserAgent = webRequest.UserAgent;
request.ContentType = webRequest.ContentType;
request.Method = webRequest.Method;

if (webRequest.BytesToWrite != null && webRequest.BytesToWrite.Length > 0)
{
    using (Stream oStream = request.GetRequestStream())
    {
        oStream.Write(webRequest.BytesToWrite, 0, webRequest.BytesToWrite.Length);
    }
}

// Send the request and get a response
using (HttpWebResponse resp = (HttpWebResponse)request.GetResponse())
// Read the response
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    // Return the response body to the caller
    string returnedValue = sr.ReadToEnd();
    return returnedValue;
}
In one of my applications I'm using the WebClient class to download files from a web server. Depending on the web server, the application sometimes downloads millions of documents. It seems that when there are a lot of documents, WebClient doesn't scale up well performance-wise.
It also seems that WebClient doesn't immediately close the connection it opened to the web server, even after it has successfully downloaded the particular document.
I would like to know what other alternatives I have.
Update:
I also noticed that WebClient performs the authentication handshake for each and every download. I was expecting to see this handshake once, since my application only communicates with a single web server. Shouldn't subsequent calls of the WebClient reuse the authentication session?
Update: My application also calls some web service methods, and for those web service calls the authentication session does seem to be reused. I'm using WCF to communicate with the web service.
I think you can still use WebClient. However, you are better off using a using block as good practice. This will make sure that the object is closed and disposed of:
using(WebClient client = new WebClient()) {
// Use client
}
I bet you are running into the default limit of 2 connections per server. Try running this code at the beginning of your program:
// The default limit is 2 concurrent connections per server; raise it before making requests
System.Net.ServicePointManager.DefaultConnectionLimit = 100;
I have noticed the same behaviour with the session in another project I was working on. To solve this "problem" I used a static CookieContainer (since the client's session is identified by a value saved in a cookie).
public static class SomeStatics
{
    private static CookieContainer _cookieContainer;

    public static CookieContainer CookieContainer
    {
        get
        {
            if (_cookieContainer == null)
            {
                _cookieContainer = new CookieContainer();
            }
            return _cookieContainer;
        }
    }
}

public class CookieAwareWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = SomeStatics.CookieContainer;
            (request as HttpWebRequest).KeepAlive = false;
        }
        return request;
    }
}
// now the code that will download the file
using (WebClient client = new CookieAwareWebClient())
{
    client.DownloadFile("http://address.com/somefile.pdf", @"c:\temp\savedfile.pdf");
}
The code is just an example, inspired by Using CookieContainer with WebClient class and C# get rid of Connection header in WebClient.
The above code will close your connection immediately after the file is downloaded, and it will reuse the authentication session.
WebClient is probably the best option. It doesn't close the connection straight away for a reason: so it can use the same connection again, without having to open a new one. If you find that it's not reusing the connection as expected, that's usually because you're not Close()ing the response from the previous request:
var request = WebRequest.Create("...");
// populate parameters
var response = request.GetResponse();
// process response
response.Close(); // <-- make sure you don't forget this!