Download files from Google Books - c#

I'm writing a very simple application that downloads files from the internet. I have the URLs and the file names to save them under in arrays, but my code doesn't work:
for (int i = 1; i < links.Length; i++)
{
Uri uri = new Uri(links[i]);
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(uri);
webRequest.Method = "GET";
HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
Stream responseStream = webResponse.GetResponseStream();
StreamReader responseStreamReader = new StreamReader(responseStream);
String result = responseStreamReader.ReadToEnd();
StreamWriter w = new StreamWriter(savepath + names[i]);
w.Write(result);
w.Close();
break;
}
Example URL:
http://books.google.pl/books?id=yOz1ePt39WQC&pg=PA2&img=1&zoom=3&hl=pl&sig=ACfU3U0MDQtXGU_3YVqGvcsDiWLLcKh0KA&w=800&gbd=1
Example name:
002.png
The files are supposed to be saved as PNG images, but instead I get something that begins with
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Second question: how can I detect an HTTP 404 error when trying to download?
EDIT:
My mistake: my links were incorrect. After replacing &amp; with & they are correct.
Example link (corrected):
http://books.google.pl/books?id=yOz1ePt39WQC&pg=PA2&img=1&zoom=3&hl=pl&sig=ACfU3U0MDQtXGU_3YVqGvcsDiWLLcKh0KA&w=800&gbd=1
Despite that, I still can't download the PNGs correctly. They don't open, but at least they are no longer HTML pages.
I suspect that saving them as a string is not a good idea, but I don't know how else to do it. Maybe using byte[] or something?
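To see why writing the response through a string breaks the images, here is a small sketch (the class and method names are hypothetical). It decodes PNG-style bytes with StreamReader and re-encodes them with StreamWriter, the same path the loop above takes, and compares that with a plain byte copy. Bytes that are not valid UTF-8, such as the 0x89 that starts every PNG file, come back as the replacement character, so the saved file no longer matches what the server sent.

```csharp
using System.IO;

public static class BinaryVsText
{
    // Mimics the question's approach: decode the bytes as text with
    // StreamReader (UTF-8 by default), then write the text back out
    // with StreamWriter (UTF-8 without BOM by default).
    public static byte[] TextRoundTrip(byte[] data)
    {
        string text;
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            text = reader.ReadToEnd();
        }

        var output = new MemoryStream();
        using (var writer = new StreamWriter(output))
        {
            writer.Write(text);
        }
        // ToArray still works after the MemoryStream has been closed.
        return output.ToArray();
    }

    // The byte-oriented alternative: copy the stream as-is, no decoding.
    public static byte[] ByteCopy(byte[] data)
    {
        var output = new MemoryStream();
        using (var input = new MemoryStream(data))
        {
            input.CopyTo(output);
        }
        return output.ToArray();
    }
}
```

In the download loop that means copying GetResponseStream() straight into a FileStream with CopyTo (or writing a byte[] with File.WriteAllBytes) instead of using ReadToEnd and StreamWriter.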

Have you tried WebClient.DownloadFile?
string url = "http://books.google.pl/books?id=yOz1ePt39WQC&pg=PA2&img=1&zoom=3&hl=pl&sig=ACfU3U0MDQtXGU_3YVqGvcsDiWLLcKh0KA&w=800&gbd=1";
string file = "002.png";
using (WebClient wc = new WebClient()) // requires using System.Net;
{
    wc.DownloadFile(url, file);
}
This saves the image as 002.png in the current working directory (normally the application's directory).
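The answer above doesn't cover the second question (detecting a 404). A minimal sketch, assuming you stay with WebRequest as in the question: a 404 surfaces as a WebException thrown by GetResponse(), and its Response property holds the HttpWebResponse with the status code. Downloader.TryDownload is a hypothetical helper name; it also saves the body as raw bytes rather than through a string.

```csharp
using System.IO;
using System.Net;

public static class Downloader
{
    // Downloads url to path as raw bytes. Returns false if the server
    // answers 404 Not Found; any other failure is rethrown unchanged.
    public static bool TryDownload(string url, string path)
    {
        WebRequest request = WebRequest.Create(url);
        try
        {
            using (WebResponse response = request.GetResponse())
            using (Stream body = response.GetResponseStream())
            using (FileStream file = File.Create(path))
            {
                body.CopyTo(file); // no text decoding, so images stay intact
            }
            return true;
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse http
                                      && http.StatusCode == HttpStatusCode.NotFound)
        {
            http.Close(); // release the error response
            return false;
        }
    }
}
```

In the question's loop this could be called as `if (!Downloader.TryDownload(links[i], savepath + names[i])) Console.WriteLine("404: " + links[i]);`. Only 404 is turned into `false`; other HTTP errors and network failures still throw, which is a design choice you may want to change. Because the file is created only after GetResponse succeeds, a 404 leaves no empty file behind.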

Related

How to open a webpage in C# without using the WebBrowser class

I want to know how to open a webpage in C# without using the WebBrowser class. This is my first time using C#. I tried the code below, but it did not work. Can anyone help?
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("http://google.com");
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
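If you simply want to open a website without doing anything else with it, you can do something like this to open it in the default browser:
string url = "http://google.com";
// UseShellExecute = true is required on .NET Core / .NET 5+, where it defaults to false;
// on .NET Framework it is already the default.
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(url) { UseShellExecute = true });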

Getting the HTML code of a webpage

I am trying to get the HTML code of a webpage from its URL. I have written the following code, and it works, but when I compare the resulting string, it doesn't match the code I see in Google Chrome's Inspect tool. I am not an HTML guru, but it seems to be different.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://fantasy.premierleague.com/a/leagues/standings/517292/classic");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(response.CharacterSet));
string PageScript = stream.ReadToEnd();
The resulting HTML is here: https://ideone.com/DXzfKy
I am using these two lines to set the security protocol:
ServicePointManager.Expect100Continue = true;
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
If someone can tell me what I am looking at and what might be wrong, I would be grateful.
All you need to do is create an instance of WebClient, use it to open a stream on the URI, wrap that stream in a StreamReader, and read it out as plain text:
using (WebClient client = new WebClient())
using (Stream dataFromPage = client.OpenRead(new Uri("https://ideone.com/DXzfKy")))
using (StreamReader reader = new StreamReader(dataFromPage))
{
    string htmlContent = reader.ReadToEnd();
}
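If the goal is just the page text, WebClient.DownloadString performs the OpenRead/StreamReader/ReadToEnd steps in one call. A sketch with a hypothetical FetchText helper; using UTF-8 as the fallback encoding is an assumption about the target site.

```csharp
using System.Net;
using System.Text;

public static class PageFetcher
{
    // One-call alternative to OpenRead + StreamReader + ReadToEnd.
    public static string FetchText(string url)
    {
        using (var client = new WebClient())
        {
            // Fallback used to decode the body when the response does not
            // name a charset (UTF-8 is an assumption here).
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

Usage would be `string html = PageFetcher.FetchText("https://fantasy.premierleague.com/a/leagues/standings/517292/classic");`. Like the StreamReader version, this returns the raw HTML the server sends, not the DOM that Chrome's Inspect tool shows after scripts have run.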

Error when getting data from an HTTP request with POST

I followed the instructions in "HTTP request with post" to get the audio file from this site: http://www.tudienabc.com/phat-am-tieng-nhat. (The site lets you enter an English or Japanese word, phrase, or sentence and generates an audio file; after the postback, line 129 of the HTML contains a link like "/pronunciations/audio?file_name=1431134309.002.mp3&file_type=mp3".)
However, the audio file I get from my own application is not the same as the one generated by the website. The MP3 generated by the website plays at www.tudienabc.com/pronunciations/ (for example: www.tudienabc.com/pronunciations/audio?file_name=1431141268.9947.mp3&file_type=mp3), but the one generated from my application does not play (for example: www.tudienabc.com/pronunciations/audio?file_name=1431141475.4908.mp3&file_type=mp3).
So, what is wrong? And how do I get the exact audio file?
Here is my code:
var request = (HttpWebRequest)WebRequest.Create("http://www.tudienabc.com/phat-am-tieng-nhat");
var postData = "_method=POST&data[Pronun][text]=hello&data[Pronun][type]=3";
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
var response = (HttpWebResponse)request.GetResponse();
var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
int m = responseString.IndexOf("pronunciations/audio?file_name=")+"pronunciations/audio?file_name=".Length;
int n = responseString.IndexOf("&file_type=mp3");
string filename = responseString.Substring(m, n - m);
return filename;
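Independently of why the MP3 won't play, the hand-built postData above is converted with Encoding.ASCII, which replaces every non-ASCII character (for example Japanese input) with '?'. Percent-encoding each key and value as UTF-8 first avoids that. A sketch with a hypothetical FormBody.Build helper; this is probably not the cause of the unplayable audio, just a latent bug in the request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Builds an application/x-www-form-urlencoded body. Uri.EscapeDataString
    // percent-encodes the UTF-8 bytes of non-ASCII text and also escapes
    // reserved characters such as '[' and ']', so the result is pure ASCII.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields) =>
        string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
}
```

Because the encoded body contains only ASCII, `var data = Encoding.ASCII.GetBytes(FormBody.Build(fields));` is then safe.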
Thank you,
Their website generates the audio with JavaScript (ECMAScript):
<script>
var wait = new waitGenerateAudio(
'#progress_audio_placeholder',
'/pronunciations/checkFinish/1431151184.739',
'aGVsbG8gZnlyeWU=',
'/pronunciations/audio?file_name=1431151184.739.mp3&file_type=mp3',
'T?o file l?i'
);
</script>
You will need to be able to execute that JavaScript for the audio file to be created. Check out the questions "C# httpwebrequest and javascript" or "WebClient runs javascript" for ways to use a headless browser.
I also suggest looking into a more versatile text-to-speech library:
https://gist.github.com/alotaiba/1728771
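The audio URL, and the checkFinish URL passed to waitGenerateAudio, can be pulled out of the script block quoted in this answer with a regular expression instead of IndexOf/Substring. A sketch, assuming the markup keeps the quoted-argument shape shown there; AudioLinkParser is a hypothetical name. It only locates the URLs and does not make the site generate the audio.

```csharp
using System.Text.RegularExpressions;

public static class AudioLinkParser
{
    // Quoted audio argument, e.g. '/pronunciations/audio?file_name=X.mp3&file_type=mp3'.
    // Group 1 is the whole path; the "name" group is just the file name.
    private static readonly Regex AudioArg = new Regex(
        @"'(/pronunciations/audio\?file_name=(?<name>[^&']+)&file_type=mp3)'");

    // Quoted checkFinish argument, e.g. '/pronunciations/checkFinish/1431151184.739'.
    private static readonly Regex CheckFinishArg = new Regex(
        @"'(/pronunciations/checkFinish/[^']+)'");

    // Each method returns null when its pattern is not found.
    public static string AudioPath(string html)
    {
        Match m = AudioArg.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string FileName(string html)
    {
        Match m = AudioArg.Match(html);
        return m.Success ? m.Groups["name"].Value : null;
    }

    public static string CheckFinishPath(string html)
    {
        Match m = CheckFinishArg.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```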

How to get PDF from http request/response stream

I have a URL like this: "https://site.com/cgi-bin/somescript.pl?file=12345.pdf&type=application/pdf". When I go to this URL, it displays a PDF in my browser in an iframe. Is it possible to save this PDF? I am thinking of getting the HTTP response stream and dumping it into a file. Please advise. Thanks.
Something like this should work:
const string FILE_PATH = "C:\\foo.pdf";
const string DOWNLOADER_URI = "https://site.com/cgi-bin/somescript.pl?file=12345.pdf&type=application/pdf";
var httpRequest = (HttpWebRequest)WebRequest.Create(DOWNLOADER_URI);
using (var httpResponse = httpRequest.GetResponse())
using (var responseStream = httpResponse.GetResponseStream())
using (var writeStream = File.Create(FILE_PATH)) // File.Create truncates an existing file; File.OpenWrite does not
{
    responseStream.CopyTo(writeStream); // copies raw bytes, no text decoding
}
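As the first question shows, a download URL can return an HTML page instead of the file you expected. A quick sanity check after saving is to look at the first bytes: PDF files normally begin with the header "%PDF-". A sketch with a hypothetical LooksLikePdf helper; some readers tolerate junk before the header, so treat this as a heuristic rather than validation.

```csharp
using System.IO;
using System.Text;

public static class PdfCheck
{
    // Returns true if the file starts with the ASCII header "%PDF-".
    public static bool LooksLikePdf(string path)
    {
        var header = new byte[5];
        using (var file = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = file.Read(header, read, header.Length - read);
                if (n == 0) return false; // file is shorter than the header
                read += n;
            }
        }
        return Encoding.ASCII.GetString(header) == "%PDF-";
    }
}
```

For example, `if (!PdfCheck.LooksLikePdf(FILE_PATH)) { /* the server probably sent an HTML page instead */ }`.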

Problem pulling data from website in .NET and C#

I have written a web scraping program to go to a list of pages and write all the html to a file. The problem is that when I pull a block of text some of the characters get written as '�'. How do I pull those characters into my text file? Here is my code:
string baseUri = String.Format("http://www.rogersmushrooms.com/gallery/loadimage.asp?did={0}&blockName={1}", id.ToString(), name.Trim());
// our third request is for the actual webpage after the login.
HttpWebRequest request =
(HttpWebRequest)WebRequest.Create(baseUri);
request.Method = "GET";
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
//get the response object, so that we may get the session cookie.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
// and read the response
string page = reader.ReadToEnd();
StreamWriter SW;
string filename = string.Format("{0}.txt", id.ToString());
SW = File.AppendText("C:\\Share\\" + filename);
SW.Write(page);
reader.Close();
response.Close();
You're saving a page named loadimage to a text file. Are you sure that's really all text?
Either way, you can save yourself a lot of code by using System.Net.WebClient.DownloadFile().
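You need to specify the encoding in this line:
StreamReader reader = new StreamReader(response.GetResponseStream());
Specify it explicitly (UTF-8 here; it should match what the page actually uses), like so:
StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);
The StreamWriter can be given an encoding the same way. File.AppendText("C:\\Share\\" + filename) always writes UTF-8, so use the StreamWriter constructor if you need something else:
StreamWriter SW = new StreamWriter("C:\\Share\\" + filename, true, System.Text.Encoding.UTF8);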
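Rather than hard-coding UTF-8, the reader can use the charset the server declares and fall back to UTF-8 only when none is given or it isn't recognised. A sketch with a hypothetical CharsetHelper.FromCharset helper; feeding it HttpWebResponse.CharacterSet is an assumption about where the charset comes from.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Maps a charset name (e.g. HttpWebResponse.CharacterSet) to an Encoding,
    // falling back to UTF-8 when it is missing or not supported.
    public static Encoding FromCharset(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers send the name quoted, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
        {
            return Encoding.UTF8;
        }
    }
}
```

Usage: `new StreamReader(response.GetResponseStream(), CharsetHelper.FromCharset(response.CharacterSet))`, then write the file with an explicit encoding, e.g. `File.AppendAllText(path, page, Encoding.UTF8)`.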
