Wait for page load before downloading with WebClient - c#

I have several URLs stored in a text file, each of them is a link leading to a Facebook emoji, like https://www.facebook.com/images/emoji.php/v5/u75/1/16/1f618.png
I'm trying to download these images and store them on my disk. I'm using WebClient with DownloadFileAsync, something like
using (var client = new WebClient())
{
client.DownloadFileAsync(imgURL, imgName);
}
My problem is that even if the number of URLs is small, say 10, some of the images download fine while others produce a "file is corrupt" error when opened. So I thought I needed to wait until each file had finished downloading, and added a DownloadFileCompleted handler, like this:
using System;
using System.ComponentModel;
using System.Collections.Generic;
using System.Linq;
using System.Net;
class Program
{
static Queue<string> q;
static void Main(string[] args)
{
q = new Queue<string>(new[] {
"https://www.facebook.com/images/emoji.php/v5/u51/1/16/1f603.png",
"https://www.facebook.com/images/emoji.php/v5/ud2/1/16/1f604.png",
"https://www.facebook.com/images/emoji.php/v5/ud4/1/16/1f606.png",
"https://www.facebook.com/images/emoji.php/v5/u57/1/16/1f609.png",
"https://www.facebook.com/images/emoji.php/v5/u7f/1/16/1f60a.png",
"https://www.facebook.com/images/emoji.php/v5/ufb/1/16/263a.png",
"https://www.facebook.com/images/emoji.php/v5/u81/1/16/1f60c.png",
"https://www.facebook.com/images/emoji.php/v5/u2/1/16/1f60d.png",
"https://www.facebook.com/images/emoji.php/v5/u75/1/16/1f618.png",
"https://www.facebook.com/images/emoji.php/v5/u1e/1/16/1f61a.png"
});
DownloadItem();
Console.WriteLine("Hit return after 'finished' has appeared...");
Console.ReadLine();
}
private static void DownloadItem()
{
if (q.Any())
{
var uri = new Uri(q.Dequeue());
var file = uri.Segments.Last();
var webClient = new WebClient();
webClient.DownloadFileCompleted += DownloadFileCompleted;
webClient.DownloadFileAsync(uri, file);
}
else
{
Console.WriteLine("finished");
}
}
private static void DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
DownloadItem();
}
}
It didn't help and I decided to look closer into the files that are corrupted.
It appeared that the files that were corrupted were not actually image files, but HTML pages, which either had some redirection JavaScript code to an image or were full HTML pages saying that my browser was not supported.
So my question is: how do I actually wait until an image file has fully loaded and is ready to be downloaded?
EDIT I have also tried to remove the using statement, but that did not help either.

Nothing's being corrupted by your download - it's simply Facebook deciding (sometimes, which is odd) that it doesn't want to serve the image to your client.
It looks like it's the lack of a user agent that causes the problem. All you need to do is specify the user agent, and that looks like it fixes it:
webClient.Headers.Add(HttpRequestHeader.UserAgent,
"Mozilla/5.0 (compatible; http://example.org/)");
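Since the "corrupt" files turned out to be HTML pages, it can also help to check what actually came back before saving it. The following is a minimal sketch, not part of the original answer: the class and method names (EmojiDownloader, LooksLikePng, DownloadPng) are hypothetical, and it assumes every expected file is a PNG, as in the question's URLs.

```csharp
using System;
using System.IO;
using System.Net;

static class EmojiDownloader
{
    // The fixed 8-byte signature that starts every PNG file.
    static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns true when the data starts with the PNG signature.
    public static bool LooksLikePng(byte[] data)
    {
        if (data == null || data.Length < PngSignature.Length) return false;
        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (data[i] != PngSignature[i]) return false;
        }
        return true;
    }

    // Downloads one image with a user agent set and saves it only if it looks like a PNG.
    // Returns false when the server sent something else (for example an HTML page).
    public static bool DownloadPng(Uri uri, string path)
    {
        using (var client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent,
                "Mozilla/5.0 (compatible; http://example.org/)");
            byte[] data = client.DownloadData(uri);
            if (!LooksLikePng(data)) return false;
            File.WriteAllBytes(path, data);
            return true;
        }
    }
}
```

A false return from DownloadPng is a hint that the request was refused or redirected rather than that the download was cut short.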

Related

Wikipedia user agent problem when downloading images

I am trying to download about 250 images from wikipedia with a c# .net console application.
After downloading 3 I get this error.
System.Net.WebException: 'The remote server returned an error: (403) Forbidden. Please comply with the User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy. '
I have read their User-Agent_policy page and added a user agent that complies with what they say. (to the best of my ability, I'm not a web-dev)
They say: make it descriptive, include the word "bot" if it's a bot, and include contact details in parentheses, all of which I have done.
I am also waiting 5 seconds between images. I just really, really don't want to download them by hand in my browser.
static void DownloadImages()
{
var files = Directory.GetFiles(@"C:\projects\CarnivoraData", "*", SearchOption.AllDirectories);
var client = new WebClient();
client.Headers.Add("User-Agent", "bot by <My Name> (<My email address>) I am downloading an image of each carnivoran once (less than 300 images) for educational purposes");
foreach (var path in files)
{
//Console.WriteLine(path);
//Console.WriteLine(File.ReadAllText(path));
AnimalData data = JsonSerializer.Deserialize<AnimalData>(File.ReadAllText(path));
client.DownloadFile("https:" + data.Imageurl, @"C:\projects\CarnivoraImages\" + data.Name + Path.GetExtension(data.Imageurl));
System.Threading.Thread.Sleep(5000);
}
}
Any suggestions?
OK, I got this to work. I think the key was using HttpClient to download the files instead of WebClient, and using DefaultRequestHeaders.UserAgent.ParseAdd:
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("<My Name>/1.0 (<My Email>) bot");
I didn't even bother waiting between images, and downloaded them all in about a minute.
Also, as a bonus, here's how to download a file using HttpClient (it's a lot messier than WebClient!):
static async Task GetFile(HttpClient httpClient,string filepath, string url)
{
using (var stream = await httpClient.GetStreamAsync(new Uri(url)))
{
using (var fileStream = new FileStream(filepath, FileMode.CreateNew))
{
await stream.CopyToAsync(fileStream);
}
}
}
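Putting the two pieces together, a loop over many URLs might look like the sketch below. It is only an illustration under stated assumptions: the names ImageFetcher, CreateClient and DownloadAllAsync are hypothetical, the user agent string is a placeholder to replace with your own details, and it uses FileMode.Create (overwrite) where the code above uses FileMode.CreateNew.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class ImageFetcher
{
    // Creates one HttpClient, to be reused for every request, with a descriptive user agent.
    public static HttpClient CreateClient(string userAgent)
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd(userAgent);
        return client;
    }

    // Downloads each URL in turn into targetDir, naming each file after the last URL segment.
    public static async Task DownloadAllAsync(HttpClient client, IEnumerable<string> urls, string targetDir)
    {
        foreach (var url in urls)
        {
            var uri = new Uri(url);
            var path = Path.Combine(targetDir, Path.GetFileName(uri.LocalPath));
            using (var response = await client.GetAsync(uri))
            {
                // Throws HttpRequestException on 403 and other non-success status codes,
                // so a refused request never ends up on disk looking like an image.
                response.EnsureSuccessStatusCode();
                using (var file = new FileStream(path, FileMode.Create))
                {
                    await response.Content.CopyToAsync(file);
                }
            }
        }
    }
}
```

Usage would be along the lines of `var client = ImageFetcher.CreateClient("ExampleBot/1.0 (someone@example.org) bot");` followed by `await ImageFetcher.DownloadAllAsync(client, urls, targetDir);`.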

Cannot access file after downloading it with Webclient

I have downloaded a zip file using this code from a web server:
client.DownloadFileAsync(url, savePath);
Then, in another method, during the same session I try and extract the file with this method:
ZipFile.ExtractToDirectory(zipPath, extractDir);
This throws the error:
System.IO.IOException: 'The process cannot access the file
'C:\ProgramData\ZipFile.zip' because it is being used by another process.'
If I restart the program then unzip the file (without redownloading it) it extracts without any problem.
This doesn't make much sense to me because the Webclient client is located in another method and hence should be destroyed...
There is nothing else accessing that file other than the 2 lines of code provided above.
Is there any way to free the file?
You need to extract the files only after the download has completed. To do this, handle the DownloadFileCompleted event of WebClient:
private void DownloadPackageFromServer(string downloadLink)
{
ClearTempFolder();
var address = new Uri(Constants.BaseUrl + downloadLink);
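// No using block here: disposing the client right after DownloadFileAsync returns
// would tear it down while the download may still be running.
var wc = new WebClient();
_downloadLink = downloadLink;
wc.DownloadFileCompleted += Wc_DownloadFileCompleted;
wc.DownloadFileAsync(address, GetLocalFilePath(downloadLink));
}
private void Wc_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
// The download has finished, so the client can be released now.
((WebClient)sender).Dispose();
// Only unzip if the download actually succeeded.
if (e.Error == null && !e.Cancelled)
{
UnZipDownloadedPackage(_downloadLink);
}
}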
private void UnZipDownloadedPackage(string downloadLink)
{
var fileName = GetLocalFilePath(downloadLink);
ZipFile.ExtractToDirectory(fileName, Constants.TemporaryMusicFolderPath);
}
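On .NET 4.5 or later, the same ordering can be expressed with await instead of an event handler. This is a sketch rather than the answer's method: PackageDownloader and DownloadAndExtractAsync are hypothetical names, and on .NET Framework ZipFile additionally needs a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System;
using System.IO.Compression;
using System.Net;
using System.Threading.Tasks;

static class PackageDownloader
{
    // Downloads a zip file and extracts it only after the download task has completed.
    public static async Task DownloadAndExtractAsync(Uri address, string zipPath, string extractDir)
    {
        using (var client = new WebClient())
        {
            // Execution continues past this line only once the whole file has been written.
            await client.DownloadFileTaskAsync(address, zipPath);
        }
        // The client is disposed and the download is finished, so the file is free to open.
        ZipFile.ExtractToDirectory(zipPath, extractDir);
    }
}
```

Because the extraction sits after the await, there is no window in which it can run while the download is still writing to the file.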

WebClient DownloadFile not working when download docx file with canvas from VSTS

When I download a .docx file that contains a 'Drawing Canvas' from VSTS with WebClient, the downloaded .docx document is broken.
By broken, I mean that the Word document cannot be opened manually, and we get this error message: "The file is corrupt and cannot be opened".
This only happens if the Word file contains a canvas and is downloaded from VSTS.
If I download from TFS2017, or if the .docx file does not contain a canvas, then everything works.
At first I thought the issue was related to encoding, so I tested all the encodings that I found inside WebClient.
None of the encoding changes resolved the issue.
I also tried changing the implementation so that, instead of using the DownloadFile method, I downloaded an array of bytes and generated the Word document from those bytes.
With that change, we have the same issue as before.
This is code example:
static void Main(string[] args)
{
var tfsUri = new Uri("https://.../");
var projectCollection = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(tfsUri);
var workItemStore = projectCollection.GetService<WorkItemStore>();
var workItem = workItemStore.GetWorkItem(2);
projectCollection.EnsureAuthenticated();
var credentials = workItemStore.TeamProjectCollection.Credentials;
var fileName = "D:\\test_folder\\files\\System.Description.docx";
var uri = workItem.Attachments[0].Uri;
using (var request = new WebClient() { Credentials = credentials })
{
request.DownloadFile(uri, fileName);
}
}
Thank you for your help if you have any idea.
This issue is not caused by the canvas in the docx file. The file would be corrupted even if there were only text in your docx file, as long as you download it from VSTS with your code.
The issue here is that authentication to VSTS differs from TFS, so the WebClient download request is actually getting a 401 because it doesn't have the required permission to download the file. Update your code to the following and then try again:
using System;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.WorkItemTracking.Client;
using Microsoft.TeamFoundation.WorkItemTracking.Proxy;
using System.IO;
namespace GetAdmin
{
class Program
{
static void Main(string[] args)
{
TfsTeamProjectCollection ttpc = new TfsTeamProjectCollection(new Uri("https://xxx.visualstudio.com/"));
ttpc.EnsureAuthenticated();
WorkItemStore wistore = ttpc.GetService<WorkItemStore>();
WorkItem wi = wistore.GetWorkItem(111);
WorkItemServer wiserver = ttpc.GetService<WorkItemServer>();
string tmppath = wiserver.DownloadFile(wi.Attachments[0].Id);
string filename = @"D:\test\test.docx";
File.Copy(tmppath,filename);
}
}
}

WebClient using file (file in use error)

I'm new to WinForms/C#/VB.NET and all and am trying to put together a simple application which downloads an MP3 file and edits its ID3 tags. This is what I've come up with so far :
Uri link = new System.Uri("URL");
wc.DownloadFileAsync(link, @"C:/music.mp3");
handle.WaitOne();
var file = TagLib.File.Create(@"C:/music.mp3");
file.Tag.Title = "Title";
file.Save();
The top section downloads the file with a pre-defined WebClient, but when I try to open the file in the first line of the second half, I run into this error: "The process cannot access the file 'C:\music.mp3' because it is being used by another process.", which I'm guessing is due to the WebClient.
Any ideas on how to fix this? Thanks.
If using WebClient.DownloadFileAsync you should subscribe to the DownloadFileCompleted event and perform the remainder of your processing from that event.
Quick and dirty:
WebClient wc = new WebClient();
wc.DownloadFileCompleted += completedHandler;
Uri link = new System.Uri("URL");
wc.DownloadFileAsync(link, @"C:/music.mp3");
//handle.WaitOne(); // dunno what this is doing in here.
void completedHandler(object sender, AsyncCompletedEventArgs e) {
var file = TagLib.File.Create(@"C:/music.mp3");
file.Tag.Title = "Title";
file.Save();
}

Wait until file is downloaded from URL via webClient

I am struggling to download an Excel file of a few MB from a URL and then work with it. I'm using VS2010, so I can't use the await keyword.
My code follows:
using (WebClient webClient = new WebClient())
{
// setting Windows Authentication
webClient.UseDefaultCredentials = true;
// event fired ExcelToCsv after file is downloaded
webClient.DownloadFileCompleted += (sender, e) => ExcelToCsv(fileName);
// start download
webClient.DownloadFileAsync(new Uri("http://serverx/something/Export.ashx"), exportPath);
}
The line in ExcelToCsv() method
using (FileStream stream = new FileStream(filePath, FileMode.Open))
Throws me an error:
System.IO.IOException: The process cannot access the file because it
is being used by another process.
I tried webClient.DownloadFile() alone, without an event, but it throws the same error. The same error is thrown if I do not dispose the client, too. What can I do?
A temporary workaround might be the Sleep() method, but it's not bulletproof.
Thank you
EDIT:
I tried a second approach with standard event handling, but I have a mistake in the code:
using (WebClient webClient = new WebClient())
{
// configure webClient to use Windows Authentication
webClient.UseDefaultCredentials = true;
// <--- I HAVE CONVERT ASYNC ERROR IN THIS LINE
webClient.DownloadFileCompleted += new DownloadDataCompletedEventHandler(HandleDownloadDataCompleted);
// start the download
webClient.DownloadFile(new Uri("http://czprga2001/Logio_ZelenyKyblik/Export.ashx"), TempDirectory + PSFileName);
}
public delegate void DownloadDataCompletedEventHandler(string fileName);
public event DownloadDataCompletedEventHandler DownloadDataCompleted;
static void HandleDownloadDataCompleted(string fileName)
{
ExcelToCsv(fileName);
}
EDIT: approach 3
I tried this code
while (true)
{
if (isFileLocked(downloadedFile))
{
System.Threading.Thread.Sleep(5000); //wait 5s
ExcelToCsv(fileName);
break;
}
}
and it seems that the file never becomes accessible :/ I don't get it.
Try to use DownloadFile instead of DownloadFileAsync, as you do in Edit 1, like this:
string filename=Path.Combine(TempDirectory, PSFileName);
using (WebClient webClient = new WebClient())
{
// configure webClient to use Windows Authentication
webClient.UseDefaultCredentials = true;
// start the download
webClient.DownloadFile(new Uri("http://czprga2001/Logio_ZelenyKyblik/Export.ashx"), filename);
}
ExcelToCsv(filename); //No need to create the event handler if it is not async
From your example it seems that you do not need an asynchronous download, so use the synchronous one and avoid the related problems you are seeing here.
Also use Path.Combine to combine parts of a path, such as a folder and a file name.
There is also a chance that the file is locked by something else; use the "Find Handle or DLL" feature of Sysinternals Process Explorer to check.
Store the downloaded file on a local disk to rule out problems with the network.
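The question's third approach calls an isFileLocked helper that is not shown. A common way to write such a check is sketched below, as an assumption about what the helper might look like rather than the asker's actual code; FileLockCheck and IsFileLocked are hypothetical names. Note that the result is only a snapshot: another process can still lock the file between the check and the next open.

```csharp
using System;
using System.IO;

static class FileLockCheck
{
    // Returns true if the file cannot currently be opened for exclusive access.
    public static bool IsFileLocked(string path)
    {
        try
        {
            // FileShare.None asks for exclusive access, which fails while another handle is open.
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return false;
            }
        }
        catch (IOException)
        {
            // Also reached when the file does not exist, since FileNotFoundException is an IOException.
            return true;
        }
    }
}
```

With a helper like this, the waiting loop would sleep while the file is locked and call ExcelToCsv only once it is not, for example `while (FileLockCheck.IsFileLocked(path)) System.Threading.Thread.Sleep(500);`. The loop in approach 3 does the opposite: it processes the file only when the check reports that it is locked.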
