Is there a way to use https://translate.google.co.za/ in code?
Maybe by making use of Encoding, WebClient and Uri, but I'm not sure of the correct way to do this.
In code I can get the translate-to language and the translate-from language, as well as the content, but how can I incorporate those parameters into the URL and then display the end result?
Please help.
Code attempt:
UnicodeEncoding tmpEncoding = new UnicodeEncoding();
string url = String.Format("http://translate.google.co.za/#{0}/{1}/{2}", languageFrom, languageTo, content);
WebClient tmpClient = new WebClient();
tmpClient.Encoding = System.Text.Encoding.ASCII;
string result = tmpEncoding.GetString(tmpClient.DownloadData(url));
The result it gives me is a list of Chinese or Japanese characters. I don't know what I'm doing wrong. Maybe the encoding?
You can use the official Google Translate API for this
Take note that it will cost money to translate. Also take a look at other translation APIs that can be used from .NET.
I did some searching for you: the Bing Translator service is a free API for a maximum of 2M characters a month; beyond that you have to pay for it. It also has a nice SDK to go with it.
I found an answer courtesy of Rick Strahl's Web Log (http://weblog.west-wind.com/posts/2011/Aug/06/Translating-with-Google-Translate-without-API-and-C-Code)
Although I didn't use the JavaScriptSerializer, it gave me what I wanted, in the form of (\"content\").
So just a bit of string manipulation and I'm golden.
EDIT:
I ended up using the serializer after all, as the other way didn't preserve the special characters that form some words; i.e., French words would lose their accented characters. Instead, each one came out as a question mark inside a white diamond (the Unicode replacement character).
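For anyone landing here later, this is a minimal sketch of the approach from that blog post as I understand it. The translate_a/t endpoint is undocumented and unofficial; its URL, parameters and response format may well have changed, so treat everything here as an assumption to verify rather than a supported API:
using System.Net;
using System.Text;
using System.Web;
using System.Web.Script.Serialization;

public static class GoogleTranslateClient
{
    public static string TranslateText(string text, string fromLang, string toLang)
    {
        // Undocumented AJAX endpoint used by the blog post; not an official API.
        string url = string.Format(
            "http://translate.google.com/translate_a/t?client=j&text={0}&sl={1}&tl={2}",
            HttpUtility.UrlEncode(text), fromLang, toLang);

        var client = new WebClient();
        client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0"); // endpoint rejects requests without a user agent
        client.Encoding = Encoding.UTF8; // UTF-8, so accented characters survive intact

        string json = client.DownloadString(url);

        // Deserializing (rather than slicing the raw string) also decodes
        // \uXXXX escapes, which is what fixes the "white diamond" characters.
        // Assumes, as described above, that the response is a single JSON string.
        return new JavaScriptSerializer().Deserialize<string>(json);
    }
}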
Related
In my ASP.NET WebForms application I have a simple route:
routes.MapPageRoute("RouteSearchSimple", "search/{SearchText}", "~/SearchTicket.aspx");
As "SearchText" param I need to use cyrillic words, so to create Url I use:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
GetRouteUrl automatically encodes the searchText value, and as a result
url = /search/%D1%82%D0%B5%D0%BA%D1%81%D1%82
but I need -> /search/текст
How is it possible to get that from the Page.GetRouteUrl function?
Thanks a lot!
Actually, I believe Alexei Levenkov is close to the answer. Ultimately, a URL may only contain ASCII characters, so anything beyond alphanumeric characters will be URL encoded (even things like spaces).
Now, to your point, there are browsers out there that will display non-ASCII characters, but that is up to the implementation of the browser (behind the scenes, it is still performing the encoding). GetRouteUrl, however, will return the ASCII-encoded form every time because that is a requirement for URLs.
(As an aside, that "some 8 year old document" defines URLs. It was written by Tim Berners-Lee. He has had a bit of an impact on the Internet.)
Update
And because you got me interested, I did a bit more research. It looks as though Internationalized Domain Names do exist. However, from what I understand from the article, underneath the covers ToASCII or ToUnicode are applied to the names. More can be read in this spec: RFC 3490. So, again, you're still at the same point. More discussion can be found in this Stack Overflow question.
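For illustration, .NET exposes that ToASCII/ToUnicode mapping through System.Globalization.IdnMapping; a minimal sketch (the Cyrillic domain below is just an example I picked):
using System;
using System.Globalization;

class IdnDemo
{
    static void Main()
    {
        var idn = new IdnMapping();
        // Unicode host -> the Punycode ("xn--...") form that actually goes on the wire.
        string ascii = idn.GetAscii("пример.рф");
        // ...and back again.
        string unicode = idn.GetUnicode(ascii);
        Console.WriteLine("{0} -> {1}", unicode, ascii);
    }
}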
Ok, guys, thank you for the replies, they helped a lot. The simple answer is: it's impossible to do that with the Page.GetRouteUrl() function. It's very strange that it wasn't designed to leave encoding/decoding of params to developers, the way Request.Params or .QueryString do, or at least to offer an alternate routing function where developers could control that.
One way I found is getting Url from RouteTable and parse it manually, in my case it would be like:
string url = (System.Web.Routing.RouteTable.Routes["RouteSearchSimple"] as System.Web.Routing.Route).Url.Replace("{SearchText}", "текст");
or the simplest way is just creating the URL via string concatenation:
string param = "текст";
string url = "/search/" + param;
which is what I already did, but in that case you will need to change the code in every place it appears if you ever change your route URL; it is therefore better to create a static function like GetSearchUrl(string searchText) in one place (see the sketch below).
And it works like a charm: the URLs look human-readable and I can read the params via RouteData.Values.
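That helper could look something like this (a minimal sketch; GetSearchUrl and the class name are made up, and it assumes the route stays registered as "RouteSearchSimple"):
public static class SearchUrls
{
    // Builds the search URL from the registered route, without
    // percent-encoding the parameter, so Cyrillic text stays readable.
    public static string GetSearchUrl(string searchText)
    {
        var route = (System.Web.Routing.Route)System.Web.Routing.RouteTable.Routes["RouteSearchSimple"];
        return "/" + route.Url.Replace("{SearchText}", searchText);
    }
}
If the route URL ever changes, only this one method needs updating.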
The simplest solution is to decode with the UrlDecode method:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
string decodedUrl = Server.UrlDecode(url); // => /search/текст
EDIT 1: Perhaps I wasn't very clear earlier. For the following scenario, I'd like to know the best/standard method.
I have a .NET 4 web application in which, for various reasons, I need to send unique links to our customers (like password resets, invitations, account verifications, etc.).
The link structure will typically be mysite/some-action?key=some-unique-value
What should I do to generate the "some-unique-value" part? Whatever the method, it shouldn't break my URL.
I found some questions on SO that came close to my need but couldn't quite nail it.
Also let me know if there is a better/standard way to implement this kind of feature. Thanks.
Assuming you get a byte array, you can convert it to hex using:
BitConverter.ToString(bytes);
You might want to use a hash algorithm such as SHA1 instead of encryption.
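Putting those pieces together, a minimal sketch of generating such a key (NewLinkKey is an illustrative name, and the 16-byte size is my assumption; this uses cryptographically random bytes rather than a hash, which avoids needing an input to hash):
using System;
using System.Security.Cryptography;

public static class LinkKeys
{
    public static string NewLinkKey()
    {
        // 16 cryptographically random bytes = 128 bits of entropy,
        // which is plenty for a one-time link key.
        byte[] bytes = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        // Hex output contains only [0-9A-F], so it cannot break the URL.
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}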
You can try to encrypt your query string parameters; here is a good explanation.
Use Server.UrlEncode for encoding and Server.UrlDecode for decoding:
Dim Url As String = "something.aspx?"
Url &= "key=" & Server.UrlEncode("someUniqueValue")
EDIT: You don't have to decode the URL string at the server, as it is automatically decoded by ASP.NET; decoding it a second time may cause problems, especially if your original URL includes a '+', which would be decoded to a space.
We have a high security application and we want to allow users to enter URLs that other users will see.
This introduces a high risk of XSS attacks: a user could potentially enter JavaScript that another user ends up executing. Since we hold sensitive data, it's essential that this never happens.
What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?
Any advice on dealing with redirections (a "this link goes outside our site" message on a warning page before following the link, for instance)?
Is there an argument for not supporting user entered links at all?
Clarification:
Basically our users want to input:
stackoverflow.com
And have it output to another user as a link:
<a href="http://stackoverflow.com">stackoverflow.com</a>
What I really worry about is them using this in an XSS hack, i.e. they input:
javascript:alert('hacked!');
So other users get this link:
<a href="javascript:alert('hacked!');">stackoverflow.com</a>
My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.
You'd be amazed how many sites you can break with this trick; HTML is even worse. If they know how to deal with links, do they also know to sanitise <iframe>, <img> and clever CSS references?
I'm working in a high-security environment: a single XSS hack could result in very high losses for us. I'm happy that I could produce a regex (or use one of the excellent suggestions so far) that would exclude everything I can think of, but would that be enough?
If you think URLs can't contain code, think again!
https://owasp.org/www-community/xss-filter-evasion-cheatsheet
Read that, and weep.
Here's how we do it on Stack Overflow:
/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
return Regex.Replace(url, #"[^-A-Za-z0-9+&##/%?=~_|!:,.;\(\)]", "");
}
The process of rendering a link "safe" should go through three or four steps:
Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
Clean the link up: regexes are a good start; make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're only rendering the links as references to other information, you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', append 'http://' to the start. This lets you create usable links from incomplete input, as a user would type it into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in (see the sketch after this list).
Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
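A rough sketch of steps 2 and 3 above (the helper name is made up; step 4's blacklist/malware check is service-specific and omitted, so treat this as a starting point, not a complete defence):
using System;

public static class LinkCleaner
{
    public static string CleanLink(string input)
    {
        // Step 2: throw the string away if it could close the href attribute.
        if (input.Contains("\""))
            return null;

        // Step 2 (continued): force the protocol if the portion before the
        // first colon is not http or https.
        int colon = input.IndexOf(':');
        string scheme = colon < 0 ? "" : input.Substring(0, colon).ToLowerInvariant();
        if (scheme != "http" && scheme != "https")
            input = "http://" + input;

        // Step 3: require a well-formed absolute URL; reject everything else.
        Uri uri;
        return Uri.TryCreate(input, UriKind.Absolute, out uri) ? uri.AbsoluteUri : null;
    }
}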
Use a library, such as OWASP-ESAPI API:
PHP - http://code.google.com/p/owasp-esapi-php/
Java - http://code.google.com/p/owasp-esapi-java/
.NET - http://code.google.com/p/owasp-esapi-dotnet/
Python - http://code.google.com/p/owasp-esapi-python/
Read the following:
https://www.golemtechnologies.com/articles/prevent-xss#how-to-prevent-cross-site-scripting
https://www.owasp.org/
http://www.secbytes.com/blog/?p=253
For example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );
Another example is to use a built-in function. PHP's filter_var function is an example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);
Note that filter_var with FILTER_SANITIZE_URL only strips characters that are illegal in URLs; it does not validate the scheme, so a javascript: URL passes through untouched. Using the OWASP ESAPI Sanitizer is probably the best option.
Still another example is the code from WordPress:
http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L2561
Additionally, since there is no way of knowing where the URL links (i.e., it might be a valid URL, but the contents of the URL could be mischievous), Google has a safe browsing API you can call:
https://developers.google.com/safe-browsing/lookup_guide
Rolling your own regex for sanitization is problematic for several reasons:
Unless you are Jon Skeet, the code will have errors.
Existing APIs have many hours of review and testing behind them.
Existing URL-validation APIs consider internationalization.
Existing APIs will be kept up-to-date with emerging standards.
Other issues to consider:
What schemes do you permit (are file:/// and telnet:// acceptable)?
What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?
Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
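A minimal sketch of that combination in C# (RenderUserLink is an illustrative name, as is the exact choice of allowed schemes):
using System;
using System.Web;

public static class UserLinks
{
    public static string RenderUserLink(string userUrl)
    {
        // Whitelist: only absolute http/https/mailto URIs get rendered as links.
        Uri uri;
        bool allowed = Uri.TryCreate(userUrl, UriKind.Absolute, out uri)
                       && (uri.Scheme == Uri.UriSchemeHttp
                           || uri.Scheme == Uri.UriSchemeHttps
                           || uri.Scheme == Uri.UriSchemeMailto);
        if (!allowed)
            return HttpUtility.HtmlEncode(userUrl); // show as plain text, not a link

        // Attribute-encode inside href="..." so the value cannot break out of it.
        return string.Format("<a href=\"{0}\">{1}</a>",
                             HttpUtility.HtmlAttributeEncode(uri.AbsoluteUri),
                             HttpUtility.HtmlEncode(uri.AbsoluteUri));
    }
}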
You don't specify the language of your application, so I will presume ASP.NET; for that you can use the Microsoft Anti-Cross Site Scripting Library.
It is very easy to use; all you need is an include, and that is it :)
While you're on the topic, why not give Design Guidelines for Secure Web Applications a read?
If you're in another language: if there is a library for ASP.NET, something similar should be available for other languages as well (PHP, Python, RoR, etc.).
For Pythonistas, try Scrapy's w3lib.
OWASP ESAPI pre-dates Python 2.7 and is archived on the now-defunct Google Code.
How about not displaying them as a link? Just use the text.
Combined with a warning to proceed at your own risk, that may be enough.
Addition: see also Should I sanitize HTML markup for a hosted CMS? for a discussion on sanitizing user input.
There is a library for JavaScript that solves this problem:
https://github.com/braintree/sanitize-url
Try it =)
In my project, written in JavaScript, I use this regex as a white list:
url.match(/^((https?|ftp):\/\/|\.{0,2}\/)/)
The only limitation is that you need to put ./ in front of files in the same directory, but I think I can live with that.
Using regular expressions to prevent XSS vulnerabilities gets complicated, and thus hard to maintain over time, while it can still leave some vulnerabilities behind. URL validation with a regular expression is helpful in some scenarios, but is better not mixed with vulnerability checks.
A workable solution is probably a combination of an encoder like AntiXssEncoder.UrlEncode for encoding the query portion of the URL and UriBuilder for the rest:
public sealed class AntiXssUrlEncoder
{
public string EncodeUri(Uri uri, bool isEncoded = false)
{
// Encode the query portion of the URL to prevent XSS attacks, if it is not already encoded; otherwise let UriBuilder take care of it.
var encodedQuery = isEncoded ? uri.Query.TrimStart('?') : AntiXssEncoder.UrlEncode(uri.Query.TrimStart('?'));
var encodedUri = new UriBuilder
{
Scheme = uri.Scheme,
Host = uri.Host,
Path = uri.AbsolutePath,
Query = encodedQuery.Trim(),
Fragment = uri.Fragment
};
if (uri.Port != 80 && uri.Port != 443)
{
encodedUri.Port = uri.Port;
}
return encodedUri.ToString();
}
public static string Encode(string uri)
{
var baseUri = new Uri(uri);
var antiXssUrlEncoder = new AntiXssUrlEncoder();
return antiXssUrlEncoder.EncodeUri(baseUri);
}
}
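Usage might then look like this (the example.com URL is made up):
// Only the query portion is AntiXss-encoded; scheme, host and path pass through unchanged.
string safe = AntiXssUrlEncoder.Encode("http://example.com/search?q=test&lang=en");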
You may need to add whitelisting to exclude some characters from encoding; that can be helpful for particular sites.
HTML-encoding the page that renders the URL is another thing you may need to consider.
BTW, please note that encoding a URL may interfere with Web Parameter Tampering protections, so the encoded link may appear not to work as expected.
Also, you need to be careful about double encoding.
P.S. AntiXssEncoder.UrlEncode would have been better named AntiXssEncoder.EncodeForUrl, to be more descriptive: basically, it encodes a string for use in a URL; it does not encode a given URL and return a usable URL.
You could hex-encode the entire URL and send it to your server; that way the client would not understand the content at first glance. After reading the content, you could decode the URL and send it to the browser.
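A minimal sketch of that hex round-trip (names are illustrative; note this is obfuscation rather than a real security control):
using System;
using System.Text;

public static class HexUrl
{
    public static string Encode(string url)
    {
        // Each byte becomes two hex digits.
        return BitConverter.ToString(Encoding.UTF8.GetBytes(url)).Replace("-", "");
    }

    public static string Decode(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return Encoding.UTF8.GetString(bytes);
    }
}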
Allowing a URL and allowing JavaScript are 2 different things.
I have a program that generates some data and saves it as XML. Unfortunately, for my purposes I can't save it in the newer XML version that allows characters like 0x1F, so I need to eliminate this character from my XML. All I have been able to find that seems to do this is http://benjchristensen.com/2008/02/07/how-to-strip-invalid-xml-characters/, but I don't know JavaScript and would like a script I can understand. I do know basic C#, but am not great at it. Anyway, what would be the easiest way to filter this character? I think this is a good question for the online community anyway, as finding a working method in C# via Google proves to be challenging.
From this post: How can you strip non-ASCII characters from a string? (in C#)
Adjusting it for your case:
string s = File.ReadAllText(filepath);
// Strip all C0 control characters (0x00-0x1F). Note this also removes
// tab, CR and LF, which are legal in XML 1.0; narrow the range if you need to keep them.
s = Regex.Replace(s, @"[\u0000-\u001F]", string.Empty);
File.WriteAllText(newFilepath, s);
Then test the new file. Don't overwrite the old until you know if this works or not.
In my web app, my parameters can contain all sorts of crazy characters (Russian characters, slashes, spaces, etc.) and therefore can't always be represented as-is in a URL.
Sending them on their merry way works in about 50% of cases. Some things, like spaces, are already encoded somewhere (I'm guessing in Html.BuildUrlFromExpression). Other things, though (like "/" and "*"), are not.
Now I don't know what to do anymore because if I encode them myself, my encoding will get partially encoded again and end up wrong. If I don't encode them, some characters will not get through.
What I did was manually .replace() the characters I had problems with.
This is of course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the MVC framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?
Parameters should be escaped using Uri.EscapeDataString:
string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
Uri.EscapeDataString("adlknad /?? lkm#"),
Uri.EscapeDataString(" qeio103 8182"));
Console.WriteLine(url);
Uri uri = new Uri(url);
string[] options = uri.Query.Split('?','&');
foreach (string option in options)
{
string[] parts = option.Split('=');
if (parts.Length == 2)
{
Console.WriteLine("{0} = {1}",parts[0],
Uri.UnescapeDataString(parts[1]));
}
}
As others have mentioned, if you encode your string first you avoid the issue.
The MVC framework encodes the characters it knows it needs to encode, but leaves those that are valid URL characters (e.g. & % ? * /). This is because they are valid URL characters, although they are special characters in a URL and might not achieve the result you are after.
Try using the Microsoft Anti-Cross Site Scripting Library. It contains several Encode methods that encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded URL just fine; however, if you need to manually decode the URL, use Uri.UnescapeDataString.
Hope that helps.
Escaping of forward slashes and dots in the path part of a URL is prohibited for security reasons (although it works in Mono).
Html.BuildUrlFromExpression needs to be fixed then; I would submit this upstream to the MVC project. Alternatively, encode the string before passing it to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be handling the decoding of the URL string beforehand; the utility methods may need some more advanced encoding/decoding for alternative path characters, and to decode on your behalf on the way out.
I've seen similar posts on this. To me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". What's worse is that the called function needs to decode its input parameters. Yuk.
If there is any overlap between the characters encoded by BuildUrlFromExpression() and the characters encoded by the caller (who, I think, might fairly just encode any non-alphanumeric character for simplicity's sake), then you have the potential for nasty bugs.
Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now - I didn't realize the question was specific to MVC. Looks like a limitation of that part of the MVC framework: BuildUrlFromExpression is doing some URL encoding, but it knows that it also needs some of that punctuation as part of the framework URLs.
Also, unfortunately, URL encoding isn't idempotent, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
Wouldn't that be nice. Then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into Base64, or something else that is invariant under URL encoding.
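For example, here is a sketch of a Base64 variant that is invariant under URL encoding; the '+', '/' and '=' of standard Base64 are exactly the characters UrlEncode would touch, so they are swapped out (names are illustrative):
using System;
using System.Text;

public static class UrlSafeToken
{
    public static string Encode(string value)
    {
        string b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
        // '-' and '_' survive URL encoding unchanged; padding is dropped.
        return b64.Replace('+', '-').Replace('/', '_').TrimEnd('=');
    }

    public static string Decode(string token)
    {
        string b64 = token.Replace('-', '+').Replace('_', '/');
        switch (b64.Length % 4) // restore the stripped padding
        {
            case 2: b64 += "=="; break;
            case 3: b64 += "="; break;
        }
        return Encoding.UTF8.GetString(Convert.FromBase64String(b64));
    }
}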
Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.