I am creating a small dictionary application with an additional option to use Google Translate. Here is the problem: when I receive the response from Google and show it in a textbox, I see strange symbols.
Here is the code of the method that queries Google:
public string TranslateText(string inputText, string languagePair)
{
string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", inputText, languagePair);
WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
// Get translated text
string result = webClient.DownloadString(url);
result = result.Substring(result.IndexOf("<span title=\"") + "<span title=\"".Length);
result = result.Substring(result.IndexOf(">") + 1);
result = result.Substring(0, result.IndexOf("</span>"));
return result.Trim();
}
...and I call this method like this (after the translate button is clicked):
string resultText;
string inputText = tbInputWord.Text.ToString();
if (inputText != null && inputText.Trim() != "")
{
ExtendedGoogleTranslate urlTranslate = new ExtendedGoogleTranslate();
resultText = urlTranslate.TranslateText(inputText, "en|bg");
tbOutputWord.Text = resultText;
}
So I am translating from English (en) to Bulgarian (bg) and setting the WebClient encoding to UTF-8, so I think I am missing something in the calling code, such as decoding resultText before putting it into the tbOutputWord textbox. I know that the code itself works, because if I translate from English to French (for example) it shows the correct result.
For some reason, Google doesn't respect the ie=UTF8 query parameter. We need to add some headers to the request so that a UTF-8 response is returned:
WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
webClient.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0");
webClient.Headers.Add(HttpRequestHeader.AcceptCharset, "UTF-8");
I am trying to get the html content from a url that has Persian characters in it such as:
http://example.com/%D8%B7%D8%B1%D8%A7%D8%AD%DB%8C-%D9%88%D8%A8-%D8%B3%D8%A7%DB%8C%D8%AA-%D8%A2%D8%AA%D9%84%DB%8C%D9%87/website/Atelier
I am using this code:
using (WebClient client = new WebClient())
{
client.Encoding = Encoding.UTF8;
string data = client.DownloadString(urlTextWithPersianCharacters);
}
When the URL looks like this, I get unreadable characters and symbols. The same code works fine with other websites that have English URLs and Persian content.
Edit: both answers worked fine now that I am testing other websites. The problem is with one specific website whose content I am trying to get. Can the website block these kinds of requests? Or does it maybe use another encoding?
What do you suggest I do?
Try converting your URL string to a Uri:
Uri uri = new Uri("http://example.com/%D8%B7%D8%B1%D8%A7%D8%AD%DB%8C-%D9%88%D8%A8-%D8%B3%D8%A7%DB%8C%D8%AA-%D8%A2%D8%AA%D9%84%DB%8C%D9%87/website/Atelier");
using (WebClient client = new WebClient())
{
client.Encoding = Encoding.UTF8;
string data = client.DownloadString(uri);
}
The default System.Text.UTF8Encoding class only performs direct binary decoding of the UTF-8 format. In your example, you are attempting to decode a URL that uses "URL encoding".
URL encoding (percent-encoding) is when special characters are written into a URL as hex byte values with % signs as markers.
To solve this issue, you will need to decode the URL into a UTF-8 string.
The System.Uri.UnescapeDataString() method should be able to do this for you.
string url = "http://example.com/%D8%B7%D8%B1%D8%A7%D8%AD%DB%8C-%D9%88%D8%A8-%D8%B3%D8%A7%DB%8C%D8%AA-%D8%A2%D8%AA%D9%84%DB%8C%D9%87/website/Atelier";
string result = Uri.UnescapeDataString(url);
In this example, result contains: http://example.com/طراحی-وب-سایت-آتلیه/website/Atelier
Edit: I did some research and found that WebClient and WebRequest differ in how they handle character encoding (see the linked article).
Try switching from WebClient to WebRequest and see if that resolves your encoding problem.
There are many methods for encoding a URL. Try them and see which one fits your needs:
string testString = "http://test# space 123/text?var=val&another=two";
Console.WriteLine("UrlEncode: " + System.Web.HttpUtility.UrlEncode(testString));
Console.WriteLine("EscapeUriString: " + Uri.EscapeUriString(testString));
Console.WriteLine("EscapeDataString: " + Uri.EscapeDataString(testString));
Console.WriteLine("EscapeDataReplace: " + Uri.EscapeDataString(testString).Replace("%20", "+"));
Console.WriteLine("HtmlEncode: " + System.Web.HttpUtility.HtmlEncode(testString));
Console.WriteLine("UrlPathEncode: " + System.Web.HttpUtility.UrlPathEncode(testString));
// .NET 4.0+ (WebUtility.UrlEncode needs .NET 4.5+)
Console.WriteLine("WebUtility.HtmlEncode: " + WebUtility.HtmlEncode(testString));
Console.WriteLine("WebUtility.UrlEncode: " + WebUtility.UrlEncode(testString));
I want to read the response from the URI, modify it by replacing all S's with X's, and return that string to the client.
Below is my code, but the replace is not working.
I downloaded the "response" string to check, and there are lots of S characters.
Any idea why this is not working, or how I can manipulate the string?
try
{
// open and read from the supplied URI
stream = webClient.OpenRead(uri);
reader = new StreamReader(stream);
response = reader.ReadToEnd();
response.Replace('S', 'X');
webClient.DownloadFile(uri, "C://Users//MyPC//Desktop//a.txt");
}
Thanks..
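String.Replace does not change the string it is called on; it returns a new string, so you have to use the return value. You can use webClient.DownloadString(uri)
like this:
string str = webClient.DownloadString(uri).Replace('S', 'X');
File.WriteAllText(@"C:\Users\MyPC\Desktop\a.txt", str);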
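To illustrate why the call in the question had no visible effect, here is a minimal sketch without any network access. The class and method names (ReplaceDemo, ReplaceS) are made up for this example and do not come from the question.

```csharp
using System;

public static class ReplaceDemo
{
    // Returns a copy of the input with every 'S' replaced by 'X'.
    public static string ReplaceS(string response)
    {
        // Replace never modifies `response`; it returns a new string,
        // so the result has to be captured or returned.
        return response.Replace('S', 'X');
    }

    public static void Main()
    {
        string response = "SOS Status";

        response.Replace('S', 'X');              // result discarded, `response` is unchanged
        Console.WriteLine(response);             // SOS Status

        Console.WriteLine(ReplaceS(response));   // XOX Xtatus
    }
}
```

Note that Replace('S', 'X') is case-sensitive, so lowercase s characters are left alone.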
I'm using HttpClient to POST MultipartFormDataContent to a Java web application. I'm uploading several StringContents and one file which I add as a StreamContent using MultipartFormDataContent.Add(HttpContent content, String name, String fileName) using the method HttpClient.PostAsync(String, HttpContent).
This works fine, except when I provide a fileName that contains German umlauts (I haven't tested other non-ASCII characters yet). In this case, fileName is being base64-encoded. The result for a file named 99 2 LD 353 Temp Äüöß-1.txt
looks like this:
__utf-8_B_VGVtcCDvv73vv73vv73vv71cOTkgMiBMRCAzNTMgVGVtcCDvv73vv73vv73vv70tMS50eHQ___
The Java server shows this encoded file name in its UI, which confuses the users. I cannot make any server-side changes.
How do I disable this behavior? Any help would be highly appreciated.
Thanks in advance!
I just ran into the same limitation as StrezzOr, as the server I was consuming didn't respect the filename* standard.
I converted the filename to a byte array of its UTF-8 representation, and then reassembled those bytes, one char per byte, into a "simple" (non-UTF-8) string.
This code creates a content stream and adds it to a multipart content:
FileStream fs = File.OpenRead(_fullPath);
StreamContent streamContent = new StreamContent(fs);
streamContent.Headers.Add("Content-Type", "application/octet-stream");
String headerValue = "form-data; name=\"Filedata\"; filename=\"" + _Filename + "\"";
byte[] bytes = Encoding.UTF8.GetBytes(headerValue);
headerValue="";
foreach (byte b in bytes)
{
headerValue += (Char)b;
}
streamContent.Headers.Add("Content-Disposition", headerValue);
multipart.Add(streamContent, "Filedata", _Filename);
This is working with Spanish accents.
Hope this helps.
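As far as I can tell, the byte-by-byte cast loop above is equivalent to decoding the UTF-8 bytes as ISO-8859-1 (Latin-1), because that encoding maps byte n to the character U+00nn. A minimal sketch of the same trick follows; the names HeaderBytes and Utf8BytesAsChars are hypothetical, and whether a given server accepts such a header is an assumption you would have to verify.

```csharp
using System;
using System.Text;

public static class HeaderBytes
{
    // Maps each UTF-8 byte of `value` to the char with the same numeric code (0-255).
    public static string Utf8BytesAsChars(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        // ISO-8859-1 decodes every byte to the code point of the same value,
        // which matches the (Char)b cast in the loop above.
        return Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        // "ñ" is C3 B1 in UTF-8, so the result is the two chars U+00C3 and U+00B1.
        string converted = Utf8BytesAsChars("ñ");
        foreach (char c in converted)
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

When the header is later written to the wire one byte per char, the server receives the raw UTF-8 bytes of the original filename.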
I recently ran into this issue and used the following workaround:
On the server side:
private static readonly Regex _regexEncodedFileName = new Regex(@"^=\?utf-8\?B\?([a-zA-Z0-9/+]+={0,2})\?=$");
private static string TryToGetOriginalFileName(string fileNameInput) {
Match match = _regexEncodedFileName.Match(fileNameInput);
if (match.Success && match.Groups.Count > 1) {
string base64 = match.Groups[1].Value;
try {
byte[] data = Convert.FromBase64String(base64);
return Encoding.UTF8.GetString(data);
}
catch (Exception) {
//ignored
return fileNameInput;
}
}
return fileNameInput;
}
And then use this function like this:
string correctedFileName = TryToGetOriginalFileName(fileRequest.FileName);
It works.
In order to pass non-ASCII characters in the filename attribute of the Content-Disposition header, it is necessary to use the filename* attribute instead of the regular filename. See spec here.
To do this with HttpClient you can do the following:
var streamcontent = new StreamContent(stream);
streamcontent.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") {
FileNameStar = "99 2 LD 353 Temp Äüöß-1.txt"
};
multipartContent.Add(streamcontent);
The header will then end up looking like this,
Content-Disposition: attachment; filename*=utf-8''99%202%20LD%20353%20Temp%20%C3%84%C3%BC%C3%B6%C3%9F-1.txt
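To show where the value in that header comes from, here is a rough sketch that builds a filename* value by hand: the charset, an empty language tag between the two apostrophes, and the percent-encoded UTF-8 bytes of the name. The names FileNameStarDemo and BuildFileNameStar are made up for this example, and Uri.EscapeDataString is only an approximation of the encoding rules in the spec; it is not necessarily what ContentDispositionHeaderValue does internally.

```csharp
using System;

public static class FileNameStarDemo
{
    // Builds a filename* value: charset, empty language tag, percent-encoded name.
    public static string BuildFileNameStar(string fileName)
    {
        // EscapeDataString percent-encodes the UTF-8 bytes of everything
        // except unreserved characters such as letters, digits, '-' and '.'.
        return "utf-8''" + Uri.EscapeDataString(fileName);
    }

    public static void Main()
    {
        string value = BuildFileNameStar("99 2 LD 353 Temp Äüöß-1.txt");
        Console.WriteLine("Content-Disposition: attachment; filename*=" + value);
    }
}
```

For the file name from the question, this should reproduce the header line shown above.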
I finally gave up and solved the task using HttpWebRequest instead of HttpClient. I had to build headers and content manually, but this allowed me to ignore the standards for sending non-ASCII filenames. I ended up cramming unencoded UTF-8 filenames into the filename header, which was the only way the server would accept my request.
I am developing a news-app for Windows 8 (in C#, XAML). Unfortunately I encountered a strange error after downloading a JSON-Feed (validated with http://jsonlint.com/) asynchronously. The download succeeds and then I want to parse the result: var items = Windows.Data.JsonArray.Parse(result);.
When I run the code I get the following error:
Invalid character at position 0. and Invalid JSON string.
Json.JsonArray is a new Library from Microsoft. I also tried Newtonsoft's JSON-library with the same errors. What am I doing wrong?
This is the full code:
// Retrieve recipe data from Azure
var client = new HttpClient();
client.MaxResponseContentBufferSize = 1024*1024; // Read up to 1 MB of data
var response = await client.GetAsync(new Uri("http://contosorecipes8.blob.core.windows.net/AzureRecipes"));
var result = await response.Content.ReadAsStringAsync();
// Parse the JSON recipe data
var recipes = JsonArray.Parse(result.Substring(1, result.Length - 1));
This code snippet is from a Microsoft Hands-On Lab (Contoso CookBook). I also tried it without the "[" and "]" in the source (with no effect)...
Thank you!
I was able to download and parse the result fine using this:
static async Task<JsonValue> DownloadJsonAsync(string url)
{
var client = new HttpClient();
client.MaxResponseContentBufferSize = 1024 * 1024;
var data = await client.GetByteArrayAsync(new Uri(url));
var encoding = Encoding.UTF8;
var preamble = encoding.GetPreamble();
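// Skip the BOM only if the data really starts with it (needs using System.Linq;)
var offset = data.Take(preamble.Length).SequenceEqual(preamble) ? preamble.Length : 0;
var content = encoding.GetString(data, offset, data.Length - offset);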
var result = JsonValue.Parse(content);
return result;
}
Apparently the BOM in the response wasn't handled correctly, which left a '\xfeff' character at the beginning of the string and broke the parser. After stripping off the preamble, the content parses fine. Parsing it as-is throws a FormatException with the message: Encountered unexpected character 'ï'.
I was able to run your code after a small modification. The byte order mark of the UTF-8 string seems to trigger a problem with JsonArray.Parse() from Windows.Data.Json.
A way to solve it without handling the encoding yourself is to replace or remove the BOM character after ReadAsStringAsync(), e.g.
result = result.Replace('\xfeff', ' ');
or better
if (result.Length > 0 && result[0] == '\xfeff'){
result = result.Remove(0, 1);
}
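As a variation on the snippet above, the BOM check can be wrapped in a small helper. This is only a sketch: the names BomHelper and StripBom are hypothetical, and it assumes the BOM arrives as the single char U+FEFF at the start of the already decoded string.

```csharp
using System;

public static class BomHelper
{
    // Removes a leading byte order mark (U+FEFF) if present; otherwise returns the text unchanged.
    public static string StripBom(string text)
    {
        // TrimStart(char) compares chars ordinally, so it is not affected
        // by culture-sensitive comparison rules for invisible characters.
        return text.TrimStart('\uFEFF');
    }

    public static void Main()
    {
        string withBom = "\uFEFF[1,2,3]";
        Console.WriteLine(StripBom(withBom));        // [1,2,3]
        Console.WriteLine(StripBom("[1,2,3]"));      // [1,2,3]
    }
}
```

The cleaned string can then be passed to the JSON parser as before.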
I am developing an application for downloading MCX data from a website. It would be good if I could create the application myself and use it.
The website has a date picker in which I want to select the date programmatically, click the Go button, and then click "View in Excel". When I click that, the site downloads a file with the data for that particular date. You can see what I mean at this link:
http://www.mcxindia.com/sitepages/bhavcopy.aspx
I would greatly appreciate it if someone could help me.
Thanks in advance.
using System.Net;
WebClient webClient = new WebClient();
webClient.DownloadFile("http://mysite.com/myfile.txt", @"c:\myfile.txt");
If the file is large, you should use the asynchronous method instead.
See this code example: http://www.csharp-examples.net/download-files/
You'll need to post your data to the server with your client request, as explained by @Peter.
This is an ASP.NET page, and therefore it requires that you send some data on postback in order to complete the callback.
Using Google, I was able to find this as a proof of concept.
The following is a snippet I wrote in LINQPad to test it out:
void Main()
{
WebClient webClient = new WebClient();
byte[] b = webClient.DownloadData("http://www.mcxindia.com/sitepages/BhavCopyDateWise.aspx");
string s = System.Text.Encoding.UTF8.GetString(b);
var __EVENTVALIDATION = ExtractVariable(s, "__EVENTVALIDATION");
__EVENTVALIDATION.Dump();
var forms = new NameValueCollection();
forms["__EVENTTARGET"] = "btnLink_Excel";
forms["__EVENTARGUMENT"] = "";
forms["__VIEWSTATE"] = ExtractVariable(s, "__VIEWSTATE");
// UploadValues URL-encodes each value itself, so pass the date unencoded
forms["mTbdate"] = "11/15/2011";
forms["__EVENTVALIDATION"] = __EVENTVALIDATION;
webClient.Headers.Set(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
var responseData = webClient.UploadValues(@"http://www.mcxindia.com/sitepages/BhavCopyDateWise.aspx", "POST", forms);
System.IO.File.WriteAllBytes(@"c:\11152011.csv", responseData);
}
private static string ExtractVariable(string s, string valueName)
{
string tokenStart = valueName + "\" value=\"";
string tokenEnd = "\" />";
int start = s.IndexOf(tokenStart) + tokenStart.Length;
int length = s.IndexOf(tokenEnd, start) - start;
return s.Substring(start, length);
}
There are many ways to download a file using WebClient.
Read this first:
http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx
If you want to send additional information, you can use WebClient.Headers,
or post form values like this:
using System.Collections.Specialized;
using System.Net;
WebClient webClient = new WebClient();
var forms = new NameValueCollection();
forms["token"] = "abc123";
var responseData = webClient.UploadValues(@"http://blabla.com/download/?name=abc.exe", "POST", forms);
// WriteAllBytes needs both the target path and the downloaded bytes
System.IO.File.WriteAllBytes(@"D:\abc.exe", responseData);