Quick Download HTML Source in C# - c#

I am trying to download the HTML source of a single website (https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/) in C#.
The issue is that it takes 10 seconds to download a 30 KB HTML page source. The Internet connection is not the issue, as I am able to download 10 MB files in this program instantly.
I have tried each of the following, both in a separate thread and in the main thread. Each still takes 10-12 seconds to download.
1)
using (var httpClient = new HttpClient())
{
using (var request = new HttpRequestMessage(new HttpMethod("GET"), url))
{
var response = await httpClient.SendAsync(request);
}
}
2)
using (var client = new System.Net.WebClient())
{
client.Proxy = null;
response = client.DownloadString(url);
}
3)
using (var client = new System.Net.WebClient())
{
client.Proxy = GlobalProxySelection.GetEmptyWebProxy();
response = client.DownloadString(url);
}
4)
WebRequest.DefaultWebProxy = null;
using (var client = new System.Net.WebClient())
{
response = client.DownloadString(url);
}
5)
var client = new WebClient();
response = client.DownloadString(url);
6)
var client = new WebClient();
client.DownloadFile(url, filepath);
7)
System.Net.WebClient myWebClient = new System.Net.WebClient();
WebProxy myProxy = new WebProxy();
myProxy.IsBypassed(new Uri(url));
myWebClient.Proxy = myProxy;
response = myWebClient.DownloadString(url);
8)
using var client = new HttpClient();
var content = await client.GetStringAsync(url);
9)
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
I want a faster way to do this in C#.
Any information or help you can provide is much appreciated.

I know that this is dated, but I think I found the cause, as I've encountered this at other sites. If you look at the response cookies, you will find one named ak_bmsc. That cookie shows that the site is running the Akamai Bot Manager, which offers bot protection and therefore blocks requests that 'look' suspicious.
To get a quick response from the host, you need the right request settings. In this case:
Headers:
Host: (their host data) www.faa.gov
Accept: (something like:) */*
Cookies:
AkamaiEdge = true
Example:
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;
class Program
{
private static readonly HttpClient _client = new HttpClient();
private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";
static async Task Main(string[] args)
{
var sw = Stopwatch.StartNew();
using (var request = new HttpRequestMessage(HttpMethod.Get,_url))
{
request.Headers.Add("Host", "www.faa.gov");
request.Headers.Add("Accept", "*/*");
request.Headers.Add("Cookie", "AkamaiEdge=true");
Console.WriteLine(await _client.SendAsync(request));
}
Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
}
}
This takes 896 ms for me.
By the way, you shouldn't put HttpClient in a using block for each request. It is disposable, but it is designed to be created once and reused rather than disposed after every call.

This question has stumped everyone I have asked. I have found a solution that I am going to stick with.
This solution does what I need in 0.5 seconds on average. From what I can tell, it will only work on Windows. If the user does not have "CURL", I fall back to the old way, which takes 10 seconds to get what I need.
The solution creates a batch file in a temporary directory, calls that batch file to "CURL" the website, and then writes the output of CURL to a .txt file in the temp directory.
private static void CreateBatchFile()
{
string filePath = $"{tempPath}\\tempBat.bat";
string writeMe = "cd \"%temp%\\ProgramTempDir\"\n" +
"curl \"https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/\">FAA_NASR.txt";
File.WriteAllText(filePath, writeMe);
}
private static void ExecuteCommand()
{
int ExitCode;
ProcessStartInfo ProcessInfo;
Process Process;
ProcessInfo = new ProcessStartInfo("cmd.exe", "/c " + $"{tempPath}\\tempBat.bat");
ProcessInfo.CreateNoWindow = true;
ProcessInfo.UseShellExecute = false;
Process = Process.Start(ProcessInfo);
Process.WaitForExit();
ExitCode = Process.ExitCode;
Process.Close();
}
private static void GetResponse()
{
string response;
string url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";
CreateBatchFile();
ExecuteCommand();
if (File.Exists($"{tempPath}\\FAA_NASR.txt") && File.ReadAllText($"{tempPath}\\FAA_NASR.txt").Length > 10)
{
response = File.ReadAllText($"{tempPath}\\FAA_NASR.txt");
}
else
{
// If we get here the user does not have Curl, OR Curl returned a file that is not longer than 10 Characters.
using (var client = new System.Net.WebClient())
{
client.Proxy = null;
response = client.DownloadString(url);
}
}
}
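As a possible simplification of the approach above, curl can be started directly and its standard output read in-process, which avoids both the batch file and the temporary .txt file. This is only a minimal sketch: the class and method names (CurlFetcher, TryFetchWithCurl) are hypothetical, and it assumes a curl executable is available on the PATH. It returns null when curl is missing or fails, so the caller can fall back to WebClient as in GetResponse().

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

static class CurlFetcher
{
    // Hypothetical helper: runs curl for the given URL and returns its standard
    // output, or null if curl could not be started or exited with an error.
    public static string TryFetchWithCurl(string url)
    {
        var info = new ProcessStartInfo("curl")
        {
            Arguments = "--silent \"" + url + "\"",
            RedirectStandardOutput = true, // capture the page instead of writing a file
            UseShellExecute = false,       // required for output redirection
            CreateNoWindow = true
        };

        try
        {
            using (var process = Process.Start(info))
            {
                // Read before WaitForExit so a full output pipe cannot block curl.
                string output = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                return process.ExitCode == 0 ? output : null;
            }
        }
        catch (Win32Exception)
        {
            // curl is not installed or not on the PATH.
            return null;
        }
    }
}
```

A caller would use it as `string html = CurlFetcher.TryFetchWithCurl(url) ?? new System.Net.WebClient().DownloadString(url);`, keeping the slow path only as a fallback.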

Related

How to download file from API using Post Method

I have an API that uses the POST method. I can download the file from this API with the Postman tool, but I would like to know how to download the file from C# code. I have tried the code below, but it fails because the POST method is not allowed for downloading the file.
Code:-
using (var client = new WebClient())
{
client.Headers.Add("X-Cleartax-Auth-Token", ConfigurationManager.AppSettings["auth-token"]);
client.Headers[HttpRequestHeader.ContentType] = "application/json";
string url = ConfigurationManager.AppSettings["host"] + ConfigurationManager.AppSettings["taxable_entities"] + "/ewaybill/download?print_type=detailed";
TransId Id = new TransId()
{
id = TblHeader.Rows[0]["id"].ToString()
};
List<string> ids = new List<string>();
ids.Add(TblHeader.Rows[0]["id"].ToString());
string DATA = JsonConvert.SerializeObject(ids, Newtonsoft.Json.Formatting.Indented);
string res = client.UploadString(url, "POST",DATA);
client.DownloadFile(url, ConfigurationManager.AppSettings["InvoicePath"].ToString() + CboGatePassNo.EditValue.ToString().Replace("/", "-") + ".pdf");
}
Postman Tool:-
URL : https://ewbbackend-preprodpub-http.internal.cleartax.co/gst/v0.1/taxable_entities/1c74ddd2-6383-4f4b-a7a5-007ddd08f9ea/ewaybill/download?print_type=detailed
Header :-
Content-Type : application/json
X-Cleartax-Auth-Token :b1f57327-96db-4829-97cf-2f3a59a3a548
Body :-
[
"GLD24449"
]
using (WebClient client = new WebClient())
{
client.Headers.Add("X-Cleartax-Auth-Token", ConfigurationManager.AppSettings["auth-token"]);
client.Headers[HttpRequestHeader.ContentType] = "application/json";
string url = ConfigurationManager.AppSettings["host"] + ConfigurationManager.AppSettings["taxable_entities"] + "/ewaybill/download?print_type=detailed";
client.Encoding = Encoding.UTF8;
//var data = "[\"GLD24449\"]";
var data = UTF8Encoding.UTF8.GetBytes(TblHeader.Rows[0]["id"].ToString());
byte[] r = client.UploadData(url, data);
using (var stream = System.IO.File.Create("FilePath"))
{
stream.Write(r, 0, r.Length);
}
}
Try this. Remember to change the file path. Since the data you posted is not valid JSON, I decided to post the data this way.
I think it's straightforward, but instead of using WebClient you can use HttpClient, which is better.
For downloading with HttpClient, see: Download file with WebClient or HttpClient?
For a comparison between HttpClient and WebClient, see: Deciding between HttpClient and WebClient
Example Using WebClient
public static void Main(string[] args)
{
string path = @"download.pdf";
// Delete the file if it exists.
if (File.Exists(path))
{
File.Delete(path);
}
var uri = new Uri("https://ewbbackend-preprodpub-http.internal.cleartax.co/gst/v0.1/taxable_entities/1c74ddd2-6383-4f4b-a7a5-007ddd08f9ea/ewaybill/download?print_type=detailed");
WebClient client = new WebClient();
client.Headers[HttpRequestHeader.ContentType] = "application/json";
client.Headers.Add("X-Cleartax-Auth-Token", "b1f57327-96db-4829-97cf-2f3a59a3a548");
client.Encoding = Encoding.UTF8;
var data = UTF8Encoding.UTF8.GetBytes("[\"GLD24449\"]");
byte[] r = client.UploadData(uri, data);
using (var stream = System.IO.File.Create(path))
{
stream.Write(r, 0, r.Length);
}
}
Here is the sample code using HttpClient; don't forget to change the path.
public class Program
{
public static async Task Main(string[] args)
{
string path = @"download.pdf";
// Delete the file if it exists.
if (File.Exists(path))
{
File.Delete(path);
}
var uri = new Uri("https://ewbbackend-preprodpub-http.internal.cleartax.co/gst/v0.1/taxable_entities/1c74ddd2-6383-4f4b-a7a5-007ddd08f9ea/ewaybill/download?print_type=detailed");
HttpClient client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, uri)
{
Content = new StringContent("[\"GLD24449\"]", Encoding.UTF8, "application/json")
};
request.Headers.Add("X-Cleartax-Auth-Token", "b1f57327-96db-4829-97cf-2f3a59a3a548");
var response = await client.SendAsync(request);
if (response.IsSuccessStatusCode)
{
using (FileStream fs = File.Create(path))
{
await response.Content.CopyToAsync(fs);
}
}
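else
{
// The request failed; report the status instead of silently ignoring it.
Console.WriteLine($"Download failed: {(int)response.StatusCode} {response.ReasonPhrase}");
}
}
}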

How do I prevent httpwebrequest opening a new tcp connection for each PUT request?

Whenever I have to PUT a JSON string to a server, I launch a new thread that runs the code below, which lives inside a class. It works fine, but a new TCP connection is opened for each request. When I checked the ServicePoint hash code, it's the same for each request.
When I looked in TCPView, I could not find those connections. I think that's because each one is opened and closed within ~50 ms.
So, 2 questions -
1. Is it an issue if I leave it like this? A new request will be raised every second from the client.
2. How do I reuse the same TCP connection? What if I set ServicePoint.KeepAlive to true?
public void SendRequest()
{
string sOutput="";
try
{
HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(_uri);
myWebRequest.Timeout = Timeout;
myWebRequest.ReadWriteTimeout = Timeout;
myWebRequest.ContentType = "application/json";
myWebRequest.Method = "PUT";
myWebRequest.Proxy = WebRequest.GetSystemWebProxy();
ServicePointManager.CheckCertificateRevocationList = true;
using (StreamWriter myStreamWriter = new StreamWriter(myWebRequest.GetRequestStream()))
{
myStreamWriter.Write(_json);
}
using (HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse())
{
using (StreamReader myStreamReader = new StreamReader(myWebResponse.GetResponseStream()))
{
sOutput = myStreamReader.ReadToEnd();
sOutput = sOutput.Length == 0 ? myWebResponse.StatusDescription : sOutput;
ServicePoint currentServicePoint = myWebRequest.ServicePoint;
sOutput = currentServicePoint.GetHashCode().ToString();
currentServicePoint.ConnectionLimit = 5;
}
}
}
catch (Exception Ex)
{
sOutput = Ex.Message;
}
finally
{
callback?.Invoke(sOutput);
}
}
And here is how I launch the thread -
HTTPClass hTTPClass = new HTTPClass(cuURI, json, 5000, new MyCallback(ResultCallBack));
Thread t = new Thread(new ThreadStart(hTTPClass.SendRequest));
t.Start();
Here is the code after switching to HttpClient -
static HttpClient client = new HttpClient();
public async Task Write()
{
await WriteAsync(cuURI, json);
}
private async Task WriteAsync(Uri uri, string json)
{
StringContent content = new StringContent(json,Encoding.UTF8,"application/json");
await client.PutAsync(uri, content);
}
The Wireshark trace shows a new connection for every request.
The client is setting the FIN flag on its own, and the server is not sending a FIN from its side. What is happening is that I see a lot of connections in the TIME_WAIT state on the server side.

Bot works locally but not in Azure

I'm working on a chat bot that helps users of an on-premises SharePoint network upload a file. The bot works locally but returns code 500 when tested in Azure.
I'm using the CSOM library to navigate the site tree and the SharePoint _api to get all site collections. I have done some tests, and I don't think CSOM causes this bug; rather, it is the NetworkCredential that doesn't work in Azure.
So can I use credentials in Azure?
I know that the problem comes from this function:
public void GetAllSiteCollections(string url)
{
HttpWebRequest endpointRequest = (HttpWebRequest)HttpWebRequest.Create(url + "/_api/search/query?querytext='contentclass:sts_site'&trimduplicates=false&rowlimit=100");
endpointRequest.Method = "GET";
endpointRequest.Accept = "application/json;odata=verbose";
NetworkCredential cred = new NetworkCredential(Login, Mdp, DomaineUser);
endpointRequest.Credentials = cred;
HttpWebResponse endpointResponse = (HttpWebResponse)endpointRequest.GetResponse();
WebResponse webResponse = endpointRequest.GetResponse();
Stream webStream = webResponse.GetResponseStream();
StreamReader responseReader = new StreamReader(webStream);
string response = responseReader.ReadToEnd();
JObject jobj = JObject.Parse(response);
for (int ind = 0; ind < jobj["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]["Table"]["Rows"]["results"].Count(); ind++)
{
string urlCollection = jobj["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]["Table"]["Rows"]["results"][ind]["Cells"]["results"][6]["Value"].ToString();
string nomCollection = jobj["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]["Table"]["Rows"]["results"][ind]["Cells"]["results"][3]["Value"].ToString();
if (urlCollection.Contains("myLocalDomain/sites/") == true)
{
string[] split = urlCollection.Split('/');
ClientCtx = new ClientContext(Domaine + "/sites/" + split[4]);
using (ClientCtx = new ClientContext(ClientCtx.Url))
{
ClientCtx.Credentials = new NetworkCredential(Login, Mdp, DomaineUser);
Web rootWeb = ClientCtx.Site.RootWeb;
ClientCtx.Load(rootWeb);
BasePermissions bp = new BasePermissions();
bp.Set(PermissionKind.AddListItems);
ClientResult<bool> viewListItems = rootWeb.DoesUserHavePermissions(bp);
ClientCtx.ExecuteQuery();
if (viewListItems.Value)
{
ListDesSiteCollections.Add(nomCollection, split[4]);
}
}
}
}
responseReader.Close();
}
When I check the logs at http://botName.azurewebsites.net/api/messages I get the response "The requested resource does not support http method 'GET'"

JSON format is being returned with missing data using C# (HTTPWEBREQUEST)

I'm working with JSON and C# (HttpWebRequest). I have an application that downloads JSON from a REST API, but the downloaded JSON is missing some data: it appears to be cut off, which leaves it with an invalid structure. Another piece of software that does the same thing as the one I'm developing does not have this problem, so I'm sure something is wrong or missing in my code. Here is my code:
var httpWebRequest = (HttpWebRequest)WebRequest.Create("MyURL");
httpWebRequest.ContentType = "application/json";
httpWebRequest.Method = "GET";
string authInfo = "user" + ":" + "pass";
authInfo = Convert.ToBase64String(Encoding.Default.GetBytes(authInfo));
httpWebRequest.Headers["Authorization"] = "Basic " + authInfo;
// Create the HttpContent for the form to be posted.
var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
using (var sr = new StreamReader(httpResponse.GetResponseStream(), Encoding.UTF8))
{
StreamWriter sw = new StreamWriter(@"C:\test\Stores.txt");
sw.Write(sr.ReadToEnd());
}
You can try this. It works in my code.
public static async Task MethodName()
{
using (HttpClientHandler handler = new HttpClientHandler() { UseCookies = false })
{
using (HttpClient httpClient = new HttpClient(handler))
{
httpClient.DefaultRequestHeaders.Authorization = Program.getAuthenticationHeader();
string filterQuery = Program.getURI().ToString();
using (HttpResponseMessage httpResponse = await httpClient.GetAsync(filterQuery).ConfigureAwait(false))
{
var streamContent = await httpResponse.Content.ReadAsStreamAsync();
FileStream fs = new FileStream(@"C:\test\Stores.Json", FileMode.Create);
streamContent.CopyTo(fs);
streamContent.Close();
fs.Close();
}
}
}
}
This can be an issue with your HTTP request (GET).
Step 1 - If you have working software that uses the API, use Fiddler to analyse the HTTP GET request it sends. You need to check the header info as well.
Step 2 - Compare that HTTP request with the HttpWebRequest you have created. There may be missing parameters, headers, and so on.
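One more thing worth checking in the question's code, independent of the request itself: the StreamWriter there is never flushed or disposed. StreamWriter buffers its output, so if it is not closed, the tail of the data may never reach the file, which would look like truncated JSON. The sketch below illustrates the fix without any network access; the MemoryStream is only a stand-in for the HTTP response stream, and the temp-file path is an assumption made for the example.

```csharp
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        // Stand-in for httpResponse.GetResponseStream() (assumption for illustration).
        byte[] payload = Encoding.UTF8.GetBytes(new string('x', 100000));
        string path = Path.Combine(Path.GetTempPath(), "Stores.txt");

        using (var source = new MemoryStream(payload))
        using (var sr = new StreamReader(source, Encoding.UTF8))
        using (var sw = new StreamWriter(path))
        {
            sw.Write(sr.ReadToEnd());
        } // Disposing sw flushes its buffer and closes the file.

        // Compare the number of bytes written with the number received.
        Console.WriteLine("Received: {0}, written: {1}", payload.Length, new FileInfo(path).Length);
    }
}
```

In the original code, wrapping the StreamWriter in a using block (or calling sw.Close() after the write) would have the same effect.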

one connection multiple requests Rest Service

How can one issue multiple requests using the same service?
I have created a static httpWebRequest:
private static HttpWebRequest request;
//private static StreamReader streamReader;
//private StreamWriter streamWriter;
public CentralRestService2(LogFile log)
{
if (request == null)
{
request = (HttpWebRequest)WebRequest.Create("service address");
request.Method = "POST";
request.Accept = "*/*";
request.ContentType = "application/json";
request.Headers["Authorization"] = "username and password";
request.KeepAlive = true;
}
using (var streamWriter = new StreamWriter(request.GetRequestStream()))
{
string body = new JavaScriptSerializer().Serialize(emailRequest);
streamWriter.Write(body);
}
using (var streamReader = new StreamReader(response.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
}
response.Close();
}
I get error messages including that the stream cannot be written to. The connection has unexpectedly closed. I can't seem to find the answer anywhere!
Don't create a static HttpWebRequest.
A request does not represent a connection; it uses ServicePointManager, which manages the underlying TCP connections for you.
Just create a new instance of HttpWebRequest every time you need to send a request.
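As a rough sketch of that advice, the request setup from the question can be moved into a method that builds a fresh HttpWebRequest on every call. The class name, method name, and parameters here (RestClientSketch, PostJson, serviceAddress, authorization, jsonBody) are hypothetical placeholders; the header values mirror the ones in the question.

```csharp
using System.IO;
using System.Net;

static class RestClientSketch
{
    // Hypothetical helper: sends one POST and returns the response body.
    // A new HttpWebRequest is created per call; connection reuse is left
    // to ServicePointManager.
    public static string PostJson(string serviceAddress, string authorization, string jsonBody)
    {
        var request = (HttpWebRequest)WebRequest.Create(serviceAddress);
        request.Method = "POST";
        request.Accept = "*/*";
        request.ContentType = "application/json";
        request.Headers["Authorization"] = authorization;
        request.KeepAlive = true;

        // Write the body; disposing the writer completes the request stream.
        using (var writer = new StreamWriter(request.GetRequestStream()))
        {
            writer.Write(jsonBody);
        }

        // Read the whole response, then release it so the connection can be reused.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Each call is then independent, so no stale request stream from a previous call can be written to.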
UPDATE:
If you want to create a client for your service, you should use HttpClient instead:
// add reference to System.Net.Http and System.Net.Http.Formatting
using System.Net.Http;
// with a handler you can configure the client and its behavior
var handler = new HttpClientHandler();
var httpClient = new HttpClient(handler);
httpClient.BaseAddress = new Uri("service address");
httpClient.DefaultRequestHeaders.Add("Authorization", "username and password");
var response = await httpClient.PostAsJsonAsync(httpClient.BaseAddress, emailRequest);
