Read webpage - avoid diamond / question mark for non-standard chars

I'm trying to read a webpage that contains a registered trademark symbol in the content, i.e. ®. However, when I use QuickWatch and look at sb in the example below, I see a diamond with a question mark instead of ®. The same issue occurs if I serialize sb and display it in another webpage via JavaScript. Is this just how this character appears in my QuickWatch window, or am I reading/decoding the page incorrectly? The code is as follows:
const int bufSize = 4096;
const int maxBytesToGet = 5000000;
byte[] buf = new byte[bufSize];
StringBuilder sb = new StringBuilder(bufSize);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (Stream responseStream = response.GetResponseStream())
{
while ((bytesToGet = responseStream.Read(buf, 0, buf.Length)) != 0)
{
sb.Append(Encoding.UTF8.GetString(buf, 0, bytesToGet));
if (sb.Length > maxBytesToGet) break;
}
}
}

You're assuming the response is UTF-8. You need to look at the response headers to see what the encoding actually is. It's also easier to use a StreamReader than Encoding.GetString.
string responseText;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (Stream responseStream = response.GetResponseStream())
{
var encoding = Encoding.GetEncoding(response.CharacterSet);
using(var reader = new StreamReader(responseStream, encoding))
{
responseText = reader.ReadToEnd();
}
}
}
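
To see where the diamond comes from, here is a minimal sketch (not the original poster's code) that uses in-memory bytes in place of a real response. The names EncodingSketch, ResolveEncoding, DecodePerChunk and DecodeWithReader are made up for illustration. The replacement character U+FFFD (the diamond with a question mark) shows up both when bytes are decoded with the wrong charset and when a multi-byte UTF-8 sequence is split across two GetString calls; a StreamReader avoids the second case because it keeps decoder state between reads. Falling back to UTF-8 for a missing or unrecognized charset name is an assumption of this sketch, not framework behavior.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: map a charset name (e.g. response.CharacterSet) to an Encoding.
    // Assumption: fall back to UTF-8 when the name is missing or not recognized.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
        {
            return Encoding.UTF8;
        }
    }

    // Decodes every chunk independently, the way the loop in the question does.
    // A multi-byte character that straddles two chunks is decoded as U+FFFD.
    public static string DecodePerChunk(byte[] data, int chunkSize, Encoding encoding)
    {
        var sb = new StringBuilder();
        var buf = new byte[chunkSize];
        using (var stream = new MemoryStream(data))
        {
            int read;
            while ((read = stream.Read(buf, 0, buf.Length)) != 0)
                sb.Append(encoding.GetString(buf, 0, read));
        }
        return sb.ToString();
    }

    // Decodes through a StreamReader, which carries partial characters over to the next read.
    public static string DecodeWithReader(byte[] data, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(data), encoding))
            return reader.ReadToEnd();
    }
}
```

For example, the single byte 0xAE is ® in ISO-8859-1 but is not valid UTF-8 on its own, so Encoding.UTF8.GetString turns it into U+FFFD, while ResolveEncoding("ISO-8859-1").GetString returns ®. Likewise, DecodePerChunk over the UTF-8 bytes of "A®" with a chunk size of 2 splits the two-byte ® and yields replacement characters, whereas DecodeWithReader returns the text intact.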

Related

I need to append multiple filestreams to the same pdf file

I have a function that converts ZPL (Zebra label) data into PDF format and saves the file. Instead of overwriting the file each time, I would like to append the file stream to the file if it exists, and write a new file if it does not.
I've tried opening a new file stream with FileMode.Append, but that did not seem to make a difference.
private static void SaveLabel(string label, string labelDir, string caseNumber)
{
var zpl = Encoding.UTF8.GetBytes(label);
var fileName = $@"{labelDir}\{caseNumber}.pdf";
// adjust print density (8dpmm), label width (4 inches), label height (6 inches), and label index (0) as necessary
var request = (HttpWebRequest)WebRequest.Create("http://api.labelary.com/v1/printers/8dpmm/labels/4x6/0/");
request.Method = "POST";
request.Accept = "application/pdf"; // omit this line to get PNG images back
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = zpl.Length;
var requestStream = request.GetRequestStream();
requestStream.Write(zpl, 0, zpl.Length);
requestStream.Close();
try
{
var response = (HttpWebResponse)request.GetResponse();
var responseStream = response.GetResponseStream();
if (!File.Exists(fileName))
File.Create(fileName);
using (var fileStream = File.Open(fileName, FileMode.Append))
{
responseStream?.CopyTo(fileStream);
responseStream?.Close();
fileStream.Close();
}
}
catch (WebException e)
{
Console.WriteLine(@"Error: {0}", e.Status);
}
}

I first check whether the destination file already exists (meaning the shipment has more than one label). If it does not, I save the response as usual. If it does, I copy the existing file to a temporary PDF and write the current response stream to a second temporary PDF.
I then delete the destination file, freeing its name for the combined PDF, and use PDFSharp (from the suggested link) to copy the pages of both temporary files into a new document saved under the original file name. CopyPages is a helper that is not shown here. This allows labels to keep being appended to the file regardless of how many package labels are generated.
private static void SaveLabel(string label, string labelDir, string caseNumber)
{
var zpl = Encoding.UTF8.GetBytes(label);
var destFileName = $@"{labelDir}\{caseNumber}.pdf";
// adjust print density (8dpmm), label width (4 inches), label height (6 inches), and label index (0) as necessary
var request = (HttpWebRequest)WebRequest.Create("http://api.labelary.com/v1/printers/8dpmm/labels/4x6/0/");
request.Method = "POST";
request.Accept = "application/pdf"; // omit this line to get PNG images back
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = zpl.Length;
var requestStream = request.GetRequestStream();
requestStream.Write(zpl, 0, zpl.Length);
requestStream.Close();
try
{
var response = (HttpWebResponse)request.GetResponse();
var responseStream = response.GetResponseStream();
if (File.Exists(destFileName))
{
var oldStream = File.OpenRead(destFileName);
var oldFileName = $@"{labelDir}\{caseNumber}-1.pdf";
using (var fileStream = File.Open(oldFileName, FileMode.Create))
{
oldStream.CopyTo(fileStream);
oldStream.Close();
fileStream.Close();
}
var newFileName = $@"{labelDir}\{caseNumber}-2.pdf";
using (var fileStream = File.Open(newFileName, FileMode.Create))
{
responseStream?.CopyTo(fileStream);
responseStream?.Close();
fileStream.Close();
}
File.Delete(destFileName);
using (var pdfOne = PdfReader.Open(oldFileName, PdfDocumentOpenMode.Import))
{
using (var pdfTwo = PdfReader.Open(newFileName, PdfDocumentOpenMode.Import))
{
using (var outPdf = new PdfDocument())
{
CopyPages(pdfOne, outPdf);
CopyPages(pdfTwo, outPdf);
outPdf.Save(destFileName);
}
}
}
File.Delete(oldFileName);
File.Delete(newFileName);
}
else
{
using (var fileStream = File.Open(destFileName, FileMode.Create))
{
responseStream?.CopyTo(fileStream);
responseStream?.Close();
fileStream.Close();
}
}
}
catch (WebException e)
{
Console.WriteLine(@"Error: {0}", e.Status);
}
}

Convert Code from .NetFramework 3.5 to Xamarin PCL

I have the following code working fine in a project. I can reference it from Android but not from UWP, which is why I want to move the code into my PCL project.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;
namespace NetScraperLibrary
{
public class NetScraper
{
public NetScraper()
{
}
public ScrapedPage GetPage(string uri)
{
string strWebPage = "";
// create request
System.Net.WebRequest objRequest = System.Net.HttpWebRequest.Create(uri);
// get response
System.Net.HttpWebResponse objResponse;
objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse();
// get correct charset and encoding from the server's header
string Charset = objResponse.CharacterSet;
Encoding encoding = Encoding.GetEncoding(Charset);
// read response into memory stream
MemoryStream memoryStream;
using (Stream responseStream = objResponse.GetResponseStream())
{
memoryStream = new MemoryStream();
byte[] buffer = new byte[1024];
int byteCount;
do
{
byteCount = responseStream.Read(buffer, 0, buffer.Length);
memoryStream.Write(buffer, 0, byteCount);
} while (byteCount > 0);
}
// set stream position to beginning
memoryStream.Seek(0, SeekOrigin.Begin);
StreamReader sr = new StreamReader(memoryStream, encoding);
strWebPage = sr.ReadToEnd();
// Check real charset meta-tag in HTML
int CharsetStart = strWebPage.IndexOf("charset=");
if (CharsetStart > 0)
{
CharsetStart += 8;
int CharsetEnd = strWebPage.IndexOfAny(new[] { ' ', '\"', ';' }, CharsetStart);
string RealCharset =
strWebPage.Substring(CharsetStart, CharsetEnd - CharsetStart);
// real charset meta-tag in HTML differs from supplied server header???
if (RealCharset != Charset)
{
// get correct encoding
Encoding CorrectEncoding = Encoding.GetEncoding(RealCharset);
// reset stream position to beginning
memoryStream.Seek(0, SeekOrigin.Begin);
// reread response stream with the correct encoding
StreamReader sr2 = new StreamReader(memoryStream, CorrectEncoding);
strWebPage = sr2.ReadToEnd();
// Close and clean up the StreamReader
sr2.Close();
}
}
// dispose the first stream reader object
sr.Close();
ScrapedPage page = new ScrapedPage(uri, strWebPage);//sb.ToString());
return page;
}
}
}
The main problem is that objResponse.CharacterSet does not exist in the PCL, which seems strange.
The StreamReader Close() calls (sr2.Close(); and sr.Close();) are also unavailable, and finally the web response can no longer be requested synchronously. Can anybody help with these missing pieces for the conversion? I have not been able to figure it out.
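
One possible direction, sketched under assumptions rather than tested against a specific PCL profile: if HttpClient is available to the portable project (for example through the System.Net.Http package, which is an assumption here), the charset can be read from the Content-Type header of the response, the request is awaited instead of called synchronously, and using blocks take the place of the Close() calls. The names PortableScraperSketch, Decode and GetPageAsync are hypothetical, and the UTF-8 fallback is a choice made for this sketch. Which charset names Encoding.GetEncoding accepts also depends on the target platform.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class PortableScraperSketch
{
    private static readonly HttpClient Client = new HttpClient();

    // Decodes the body with the given charset name.
    // Assumption: use UTF-8 when the name is missing or not supported on this platform.
    public static string Decode(byte[] body, string charset)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrWhiteSpace(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset.Trim().Trim('"'));
            }
            catch (ArgumentException)
            {
                // keep the UTF-8 fallback
            }
        }
        return encoding.GetString(body, 0, body.Length);
    }

    // Asynchronous replacement for the synchronous GetResponse() call.
    public static async Task<string> GetPageAsync(string uri)
    {
        using (HttpResponseMessage response = await Client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            byte[] body = await response.Content.ReadAsByteArrayAsync();
            // The charset parameter of Content-Type plays the role of HttpWebResponse.CharacterSet.
            string charset = response.Content.Headers.ContentType?.CharSet;
            return Decode(body, charset);
        }
    }
}
```

Because the whole body is kept as a byte array, the meta-tag check from the original code could still be applied afterwards by calling Decode a second time with the charset found in the HTML.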

using http response how to save the pdf files

I've written the following code to get the content from a web page and save it to disk.
If the web page is in HTML format, I'm able to save it.
If the web page is in PDF format, I'm unable to save it: when I open the saved file, only blank pages appear.
How can I save PDF files from the response?
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(Url);
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream);
webContent = reader.ReadToEnd();
StreamWriter sw = new StreamWriter(FileName);
sw.WriteLine(webContent);
sw.Close();
Please help me ASAP.

StreamReader.ReadToEnd() returns a string. PDF files are binary, and contain data that is not string-friendly. You need to read it into a byte array, and write the byte array to disk. Even better, use a smaller byte array as a buffer and read in small chunks.
You can also simplify the whole thing by just using WebClient:
using (var wc = new System.Net.WebClient())
{
wc.DownloadFile(Url, FileName);
}
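
The difference between the two approaches can be reproduced without a network call. The following is a small sketch with made-up names (BinaryCopySketch, CopyAsText, CopyAsBytes) that uses in-memory streams as stand-ins for the response and the file. CopyAsText mirrors the StreamReader/StreamWriter round trip from the question, and CopyAsBytes copies the raw bytes with Stream.CopyTo.

```csharp
using System.IO;

public static class BinaryCopySketch
{
    // Round-trips the bytes through StreamReader/StreamWriter, as the question's code does.
    // Bytes that are not valid text in the reader's encoding are replaced during decoding.
    public static byte[] CopyAsText(byte[] source)
    {
        using (var input = new MemoryStream(source))
        using (var output = new MemoryStream())
        {
            using (var reader = new StreamReader(input))
            using (var writer = new StreamWriter(output))
            {
                writer.Write(reader.ReadToEnd());
            } // disposing the writer flushes it to output

            // MemoryStream.ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    // Copies the bytes as-is, without any text decoding.
    public static byte[] CopyAsBytes(byte[] source)
    {
        using (var input = new MemoryStream(source))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

With an input such as { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE, 0x00, 0x80 } ("%PDF" followed by a few non-text bytes), CopyAsBytes returns the same eight bytes, while CopyAsText returns a different, longer sequence because the invalid bytes were replaced before being written back.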

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(Url);
WebResponse response = request.GetResponse();
using (Stream stream = response.GetResponseStream())
using (FileStream fs = new FileStream(FileName, FileMode.Create, FileAccess.Write, FileShare.None))
{
stream.BlockCopy(fs);
}
...
public static class StreamHelper
{
public static void Copy(Stream source, Stream target, int blockSize)
{
int read;
byte[] buffer = new byte[blockSize];
while ((read = source.Read(buffer, 0, blockSize)) > 0)
{
target.Write(buffer, 0, read);
}
}
public static void BlockCopy(this Stream source, Stream target, int blockSize = 65536)
{
Copy(source, target, blockSize);
}
}

Convert to Stream from a Url

I am trying to get a Stream from a URL, but I am not sure whether I am doing it correctly.
protected Stream GetStream(String gazouUrl)
{
Stream rtn = null;
HttpWebRequest aRequest = (HttpWebRequest)WebRequest.Create(gazouUrl);
HttpWebResponse aResponse = (HttpWebResponse)aRequest.GetResponse();
using (StreamReader sReader = new StreamReader(aResponse.GetResponseStream(), System.Text.Encoding.Default))
{
rtn = sReader.BaseStream;
}
return rtn;
}
Am I on the right track?

I ended up writing a shorter version that uses WebClient instead of the old HttpWebRequest code:
private static Stream GetStreamFromUrl(string url)
{
byte[] imageData = null;
using (var wc = new System.Net.WebClient())
imageData = wc.DownloadData(url);
return new MemoryStream(imageData);
}

You don't need to create a StreamReader there. Just return aResponse.GetResponseStream();. The caller of that method will also need to call Dispose on the stream when it's done.
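
The reason the version in the question fails can be shown with a short sketch (hypothetical names, with a MemoryStream standing in for the response stream): disposing a StreamReader also closes the stream it wraps, so the stream handed back from inside the using block is already closed by the time the caller receives it.

```csharp
using System.IO;
using System.Text;

public static class ReaderDisposeSketch
{
    // Mirrors the pattern in the question: the reader is disposed before the stream is returned.
    public static Stream ReturnBaseStreamFromUsing(Stream source)
    {
        Stream result;
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            result = reader.BaseStream;
        } // disposing the reader also closes source
        return result;
    }

    // Returns the stream untouched; the caller owns it and is responsible for disposing it.
    public static Stream ReturnStreamDirectly(Stream source)
    {
        return source;
    }
}
```

Calling ReturnBaseStreamFromUsing on a fresh MemoryStream gives back a stream whose CanRead is false, while ReturnStreamDirectly gives back one that is still readable and should be wrapped in a using block by the caller.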

The current answer is missing an example of how to use GetResponseStream().
Here is an example:
// Creates an HttpWebRequest with the specified URL.
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
// Sends the HttpWebRequest and waits for the response.
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
// Gets the stream associated with the response.
Stream receiveStream = myHttpWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
// Pipes the stream to a higher level stream reader with the required encoding format.
StreamReader readStream = new StreamReader( receiveStream, encode );
Console.WriteLine("\r\nResponse stream received.");
Char[] read = new Char[256];
// Reads 256 characters at a time.
int count = readStream.Read( read, 0, 256 );
Console.WriteLine("HTML...\r\n");
while (count > 0)
{
// Dumps the 256 characters on a string and displays the string to the console.
String str = new String(read, 0, count);
Console.Write(str);
count = readStream.Read(read, 0, 256);
}
Console.WriteLine("");
// Releases the resources of the response.
myHttpWebResponse.Close();
// Releases the resources of the Stream.
readStream.Close();
For more details see - https://learn.microsoft.com/en-us/dotnet/api/system.net.httpwebresponse.getresponsestream?view=net-5.0

How to use WebResponse to Download .wmv file

I'm using the following code to grab a wmv file through a WebResponse. I'm using a thread to call this function:
static void GetPage(object data)
{
// Cast the object to a ThreadInfo
ThreadInfo ti = (ThreadInfo)data;
// Request the URL
WebResponse wr = WebRequest.Create(ti.url).GetResponse();
// Display the value for the Content-Length header
Console.WriteLine(ti.url + ": " + wr.Headers["Content-Length"]);
string toBeSaved = @"C:\Users\Kevin\Downloads\TempFiles" + wr.ResponseUri.PathAndQuery;
StreamWriter streamWriter = new StreamWriter(toBeSaved);
MemoryStream m = new MemoryStream();
Stream receiveStream = wr.GetResponseStream();
using (StreamReader sr = new StreamReader(receiveStream))
{
while (sr.Peek() >= 0)
{
m.WriteByte((byte)sr.Read());
}
streamWriter.Write(sr.ReadToEnd());
sr.Close();
wr.Close();
}
streamWriter.Flush();
streamWriter.Close();
// streamReader.Close();
// Let the parent thread know the process is done
ti.are.Set();
wr.Close();
}
The file seems to download just fine, but Windows Media Viewer cannot open the file properly. Some silly error about not being able to support the file type.
What incredibly easy thing am I missing?

You just need to download it as binary instead of text. Here's a method that should do the trick for you.
public void DownloadFile(string url, string toLocalPath)
{
byte[] result = null;
byte[] buffer = new byte[4097];
WebRequest wr = WebRequest.Create(url);
WebResponse response = wr.GetResponse();
Stream responseStream = response.GetResponseStream();
MemoryStream memoryStream = new MemoryStream();
int count = 0;
do {
count = responseStream.Read(buffer, 0, buffer.Length);
memoryStream.Write(buffer, 0, count);
if (count == 0) {
break;
}
}
while (true);
result = memoryStream.ToArray();
// FileMode.Create truncates an existing file so no stale bytes remain after the new content
FileStream fs = new FileStream(toLocalPath, FileMode.Create, FileAccess.Write);
fs.Write(result, 0, result.Length);
fs.Close();
memoryStream.Close();
responseStream.Close();
}

I do not understand why you are filling MemoryStream m one byte at a time, but then writing the sr to the file. At that point, I believe the sr is empty, and MemoryStream m is never used.
Below is some code I wrote to do a similar task. It gets a WebResponse in 32K chunks at a time, and dumps it directly to a file.
public void GetStream()
{
// ASSUME: String URL is set to a valid URL.
// ASSUME: String Storage is set to valid filename.
Stream response = WebRequest.Create(URL).GetResponse().GetResponseStream();
using (FileStream fs = File.Create(Storage))
{
Byte[] buffer = new Byte[32*1024];
int read = response.Read(buffer,0,buffer.Length);
while (read > 0)
{
fs.Write(buffer,0,read);
read = response.Read(buffer,0,buffer.Length);
}
}
// NOTE: Various Flush and Close of streams and storage not shown here.
}

You are using a StreamReader and a StreamWriter to transfer your stream, but those classes are for handling text. Your file is binary and chances are that sequences of CR, LF and CR LF may get clobbered when you transfer the data. How NUL characters are handled I have no idea.
