Problem With Cyrillic Characters as URL Parameter - c#

I'm trying to translate some text by sending a GET request to https://translate.googleapis.com/ from a C# application.
The request should be formatted as follows:
"/translate_a/single?client=gtx&sl=BG&tl=EN&dt=t&q=Здравей Свят!"
where sl= is the source language, tl= is the target language and q= is the text to be translated.
The response is a JSON array with the translated text and other details.
The problem is that when I try to translate from Bulgarian to English, the result comes back garbled, like: "Р-РґСЂР ° РІРμР№ РЎРІСЏС,!"
There is no problem when I translate from English to Bulgarian (no Cyrillic in the URL), so my guess is that the problem is in the request.
Also, whenever I send the request directly from the browser, the result is properly translated text.
How I'm doing it:
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Net.Http;
using System.Web;
class Program
{
static void Main(string[] args)
{
string ApiUrl = "https://translate.googleapis.com/translate_a/single?client=gtx&sl={0}&tl={1}&dt=t&q={2}";
string targetLang = "en";
string sourceLang = "bg";
string text = "Здравей Свят!";
text = HttpUtility.UrlPathEncode(text);
string url = string.Format(ApiUrl, sourceLang, targetLang, text);
using (var client = new HttpClient())
{
var result = client.GetStringAsync(url).Result;
var jRes = (JArray)JsonConvert.DeserializeObject(result);
var translatedText = jRes[0][0][0].ToString();
var originalText = jRes[0][0][1].ToString();
var sourceLanguage = jRes[2].ToString();
}
}
}
Any suggestion will be appreciated.

Thanks to this comment I have managed to receive a properly formatted response.
The problem was that I was not using two important parameters in the URL:
ie=UTF-8
oe=UTF-8
The URL should look like this:
https://translate.googleapis.com/translate_a/single?client=gtx&sl=BG&tl=EN&dt=t&q=Здравей%20Свят!&ie=UTF-8&oe=UTF-8
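A minimal sketch of how the corrected URL could be assembled in C#. The class and method names (TranslateUrlBuilder, BuildTranslateUrl) are hypothetical, and Uri.EscapeDataString is swapped in for the UrlPathEncode call from the question so that the text is percent-encoded as UTF-8 before it is placed in the query string:

```csharp
using System;

public static class TranslateUrlBuilder
{
    // Hypothetical helper: fills in the URL template from the question,
    // with the ie/oe parameters appended so both sides agree on UTF-8.
    public static string BuildTranslateUrl(string sourceLang, string targetLang, string text)
    {
        const string apiUrl =
            "https://translate.googleapis.com/translate_a/single" +
            "?client=gtx&sl={0}&tl={1}&dt=t&q={2}&ie=UTF-8&oe=UTF-8";

        // EscapeDataString percent-encodes each value as UTF-8 bytes,
        // which is what ie=UTF-8 tells the server to expect.
        return string.Format(
            apiUrl,
            Uri.EscapeDataString(sourceLang),
            Uri.EscapeDataString(targetLang),
            Uri.EscapeDataString(text));
    }
}
```

Calling TranslateUrlBuilder.BuildTranslateUrl("bg", "en", "Здравей Свят!") would then give a URL that can be passed to client.GetStringAsync as in the original code.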

Related

Custom string from TextBox in webrequest

I am trying to send an HttpWebRequest with the following body:
string body = "{\"prompt\": \"MyText\",\"n\": 2,\"size\": \"256x256\",\"response_format\":\"b64_json\"}";
The request works perfectly with this body, but every time I try to replace "MyText" with text from a TextBox, I get a 400 error from the server.
I tried this (returns error 400):
string body = "{\"prompt\":" +textBox1.Text+",\"n\": 2,\"size\": \"256x256\",\"response_format\":\"b64_json\"}";
Any ideas?
Building JSON manually is not recommended: it is easy to introduce syntax errors through missing or extra quotes, braces, etc., especially when dealing with complex objects or arrays. In your concatenated version the TextBox value is no longer wrapped in quotes, so the body is not valid JSON.
Use a library such as System.Text.Json or Newtonsoft.Json for the JSON serialization instead. This is safer and easier than building the string manually.
using System.Text.Json;
var obj = new
{
prompt = textBox1.Text,
n = 2,
size = "256x256",
response_format = "b64_json"
};
string body = JsonSerializer.Serialize(obj);
Or, with Newtonsoft.Json:
using Newtonsoft.Json;
string body = JsonConvert.SerializeObject(obj);
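To illustrate why the serializer is safer, here is a small sketch, assuming System.Text.Json is available. The names RequestBodyBuilder, BuildBody and ReadPrompt are hypothetical. It shows that user text containing quotes still produces a body that parses back to the same prompt:

```csharp
using System.Text.Json;

public static class RequestBodyBuilder
{
    // Hypothetical helper: builds the request body from arbitrary user text.
    public static string BuildBody(string prompt)
    {
        var payload = new
        {
            prompt, // the serializer quotes and escapes this value
            n = 2,
            size = "256x256",
            response_format = "b64_json"
        };
        return JsonSerializer.Serialize(payload);
    }

    // Parses the body again to confirm the prompt survived intact.
    public static string ReadPrompt(string body)
    {
        using (JsonDocument doc = JsonDocument.Parse(body))
        {
            return doc.RootElement.GetProperty("prompt").GetString();
        }
    }
}
```

For example, RequestBodyBuilder.BuildBody("a \"red\" car") yields valid JSON, whereas concatenating the same text by hand would leave unescaped quotes in the body.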

Need to extract a specific url from source code in c# console

I'm making a bot that needs to display images from page links that are fed in by users. The only way I see of doing this is getting the href value from the page source.
using (WebClient client = new WebClient())
{
string htmlCode = client.DownloadString("url that is input by the user");
Console.WriteLine(htmlCode);
Console.ReadKey();
}
is the current code that gets the source of a URL. If it helps, this query targets the card pages on the duelmaster wiki, so the page layout is identical. I guess what I'm trying to ask is: how do I get that href out of the entire source code file?
You can use a regex to extract href data from a string.
Regular expression:
href[\s]*=[\s]*\"(.*?)[\s]*\"
C# Code
Include namespaces
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
Updated Code
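static async Task Main()
{
Console.WriteLine("Enter the URL you want to extract data from");
string userInput = Console.ReadLine();
Console.WriteLine("Downloading page...");
// Pass the user's URL to the download method and wait for it to finish
await DownloadPageAsync(userInput);
Console.ReadLine();
}
// Returns Task rather than void so the caller can await it and observe exceptions
static async Task DownloadPageAsync(string requestUrl)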
{
// ... Use HttpClient instead of webclient
using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = await client.GetAsync(requestUrl))
using (HttpContent content = response.Content)
{
string mydata = await content.ReadAsStringAsync();
Regex regex = new Regex("href[\\s]*=[\\s]*\"(.*?)[\\s]*\\\"");
foreach (Match htmlPath in regex.Matches(mydata))
{
// Here you can write your custom logic
Console.WriteLine(htmlPath.Groups[1].Value);
}
}
}
Code explanation
Regex regex = new Regex("href[\\s]*=[\\s]*\"(.*?)[\\s]*\\\"");
This line creates a Regex object with the given regular expression.
You can find an explanation of the regex here after pasting in the given regular expression.
foreach (Match htmlPath in regex.Matches(mydata))
{
This line iterates through all the matches the regex finds in the given string.
Console.WriteLine(htmlPath.Groups[1].Value);
Notice the (.*?) in the regex: it is a capture group.
The line above gives you the contents of that group, in your case the data inside the href quotes.
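The capture-group behaviour can be tried without any network access. The following is a minimal sketch using the same pattern on an in-memory string. HrefExtractor and ExtractHrefs are hypothetical names, and the IgnoreCase option is an addition that is not part of the answer's regex:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Same pattern as in the answer; \s* tolerates spaces around '=' and
    // before the closing quote, and (.*?) lazily captures the href value.
    private static readonly Regex HrefRegex =
        new Regex("href\\s*=\\s*\"(.*?)\\s*\"", RegexOptions.IgnoreCase);

    // Returns the contents of capture group 1 for every match in the HTML.
    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefRegex.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Note that this pattern only matches double-quoted attribute values. For pages that use single quotes or unquoted attributes, an HTML parser is likely to be more reliable than a regex.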

FormUrlEncode returns unexpected result encoding the º(masculine ordinal indicator)

I have a program that communicates with an external HTTP server to request a first, second, etc. value (1º, 2º, 3º, 4º, ...).
I have an issue in C# with the º character.
Here is some example code:
var testdata=new Dictionary<string,string>{
{"val","º"},
{"val1","\xBA"},
{"val2","\u00BA"},
};
var content = new FormUrlEncodedContent(testdata);
var cont = content.ReadAsStringAsync().GetAwaiter().GetResult();
the result is:
val=%C2%BA&val1=%C2%BA&val2=%C2%BA
I tested the communication with the server using curl and the Firefox console,
and the result should be:
val=%BA&val1=%BA&val2=%BA
Somehow the extra %C2 in C# doesn't work with the server.
How can I fix or escape the º correctly?
This issue comes from the default encoding used by FormUrlEncodedContent, which is UTF-8, while your server expects ISO-8859-1.
Here is a workaround, but you will (unfortunately) need to add System.Web to your project:
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Web;
// This is an implementation of FormUrlEncodedContent with `ISO-8859-1`
public class FormIso8859Encoder : ByteArrayContent
{
public FormIso8859Encoder(IEnumerable<KeyValuePair<string, string>> nameValueCollection)
: base(FormDataToByteArray(nameValueCollection))
{
Headers.Add("Content-Type", "application/x-www-form-urlencoded");
}
private static byte[] FormDataToByteArray(IEnumerable<KeyValuePair<string, string>> nameValueCollection)
{
StringBuilder sb = new StringBuilder();
foreach (var nameValue in nameValueCollection)
{
if (sb.Length > 0)
sb.Append('&');
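// Encode the key as well so reserved characters in names cannot break the body
sb.Append(HttpUtility.UrlEncode(nameValue.Key, Encoding.GetEncoding("iso-8859-1")));
sb.Append('=');
// Here is the major change
sb.Append(HttpUtility.UrlEncode(nameValue.Value, Encoding.GetEncoding("iso-8859-1") ));
}
// The percent-encoded string is pure ASCII, so encode it explicitly instead of relying on the system default
return Encoding.ASCII.GetBytes(sb.ToString());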
}
}
Then
var testdata=new Dictionary<string,string>{
{"val","º"},
{"val1","\xBA"},
{"val2","\u00BA"},
};
var content = new FormIso8859Encoder(testdata);
var cont = content.ReadAsStringAsync().GetAwaiter().GetResult();
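This provides the following output (HttpUtility.UrlEncode writes hex digits in lowercase, which is equivalent to the uppercase form in percent-encoding):
val=%ba&val1=%ba&val2=%ba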
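If adding System.Web is undesirable, the same idea can be sketched with only System.Text: convert the value to ISO-8859-1 bytes and percent-encode every byte that is not a safe form character. Latin1FormEncoder and Encode are hypothetical names, and the set of characters left unescaped here is an assumption modelled on common form encoding, not taken from the answer:

```csharp
using System.Text;

public static class Latin1FormEncoder
{
    // Percent-encodes a value using ISO-8859-1 bytes instead of UTF-8.
    public static string Encode(string value)
    {
        // Characters that do not exist in Latin-1 become '?' via the
        // default encoder fallback.
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(value);

        var sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            char c = (char)b;
            bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') ||
                        c == '-' || c == '_' || c == '.' || c == '*';

            if (safe)
                sb.Append(c);
            else if (c == ' ')
                sb.Append('+'); // form encoding represents a space as '+'
            else
                sb.Append('%').Append(b.ToString("X2")); // uppercase hex, e.g. %BA
        }
        return sb.ToString();
    }
}
```

With this helper, Latin1FormEncoder.Encode("º") produces the single-byte form %BA that the server in the question expects, rather than the two-byte UTF-8 form %C2%BA.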
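The Unicode character for ° (degree sign) is \u00B0, which is a different character from the º (masculine ordinal indicator, \u00BA) used in the question. You can find more information on working with Unicode in C# here.
All Unicode characters can be found here.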

How to search a downloaded string of a website?

I have downloaded the string and found the index but am not able to get the text which I am searching for. Here is my code:
System.Net.WebClient client = new System.Net.WebClient();
string downloadedString = client.DownloadString("http://www.gmail.com");
int ss = downloadedString.IndexOf("fun");
string mm = downloadedString.Substring(ss);
textBox1.Text = mm;
Try the following:
if (downloadedString.Contains("fun"))
{
// Process...
}
Visiting www.gmail.com will perform 3 redirects. Try the following URL instead:
https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2
Also, consider using a proper HTML Parser like the HTML Agility Pack.
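For the substring part of the question, a minimal sketch of a guarded search might look like the following. TextSearch and ExtractFrom are hypothetical names. IndexOf returns -1 when the term is missing, and passing that to Substring throws, so the index is checked first and the length is capped at the end of the string:

```csharp
using System;

public static class TextSearch
{
    // Returns up to `length` characters starting at the first occurrence
    // of `term`, or null when the term does not occur in `source`.
    public static string ExtractFrom(string source, string term, int length)
    {
        int index = source.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return null; // term not found; Substring(-1) would throw
        }

        // Never read past the end of the downloaded text.
        int available = Math.Min(length, source.Length - index);
        return source.Substring(index, available);
    }
}
```

A call such as TextSearch.ExtractFrom(downloadedString, "fun", 100) would then return the term plus some trailing context instead of everything up to the end of the page.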

Equivalent of Python code in C#?

I really have to ask this question, as I do not know Python.
The following are a few lines taken from this place. I would appreciate it if someone could guide me in translating them to C#.
#Step 1: Get a session key
servercontent = myhttp.request(baseurl + '/services/auth/login', 'POST',
headers={}, body=urllib.urlencode({'username':username, 'password':password}))[1]
sessionkey = minidom.parseString(servercontent).getElementsByTagName('sessionKey')[0].childNodes[0].nodeValue
print "====>sessionkey: %s <====" % sessionkey
I can't translate it to C#, but I can explain what this code does:
Login to baseurl + '/services/auth/login' using the username and password provided.
Read the contents of that URL.
Parse the content for the first <sessionKey> tag, and read the value of its first child node.
Here's a quick-n-dirty translation:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Web;
using System.Xml.Linq;
// ...
var client = new WebClient();
var parameters = new Dictionary<string, string>
{
{ "username", username },
{ "password", password }
};
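// UploadString sends a POST by default; declare the body as form data
client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
var result = client.UploadString(String.Format("{0}/services/auth/login", BaseUrl), UrlEncode(parameters));
var doc = XDocument.Parse(result); // parse the response body as XML (LINQ to XML)
var key = doc.Descendants("sessionKey").Single().Value; // get the one-and-only <sessionKey> element.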
Console.WriteLine("====>sessionkey: {0} <====", key);
// ...
// Utility function:
private static string UrlEncode(IDictionary<string, string> parameters)
{
var sb = new StringBuilder();
foreach(var val in parameters)
{
// add each parameter to the query string, url-encoding the value.
sb.AppendFormat("{0}={1}&", val.Key, HttpUtility.UrlEncode(val.Value));
}
sb.Remove(sb.Length - 1, 1); // remove last '&'
return sb.ToString();
}
This code checks that the response has exactly one sessionKey element; Single() throws an exception if there are 0 or more than 1. Then it prints the key.
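The XML step can be tried on its own with a small sketch. SessionKeyParser and ExtractSessionKey are hypothetical names, the sample response in the usage note is made up for illustration, and First() is used here instead of Single() to mirror the [0] index in the Python original. The sketch also assumes the element is not in an XML namespace:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SessionKeyParser
{
    // Parses an XML response body and returns the text of the first
    // <sessionKey> element found anywhere in the document.
    public static string ExtractSessionKey(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Descendants searches the whole tree, like getElementsByTagName;
        // First() throws InvalidOperationException if no element exists.
        return doc.Descendants("sessionKey").First().Value;
    }
}
```

For instance, SessionKeyParser.ExtractSessionKey("<response><sessionKey>abc123</sessionKey></response>") returns the string abc123.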
