How to decode Javascript Unicode into C# strings

For example, the JSON callback we get on a Google autosearch:
window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:1,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:"[\x22owasp\x22,[[\x22owasp\x22,0,\x220\x22],[\x22owasp\\u003Cb\\u003E top 10\\u003C\\/b\\u003E\x22,0,\x221\x22],[\x22owasp\\u003Cb\\u003E top 10 2011\\u003C\\/b\\u003E\x22,0,\x222\x22],[\x22owasp\\u003Cb\\u003E zap\\u003C\\/b\\u003E\x22,0,\x223\x22]],{\x22j\x22:\x2217\x22}]"});window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:0,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:""});
More specifically, how to go from:
"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22"
to
"te\u003Cb\u003Esco\u003C\/b\u003E",0,"0"
to
"te<b>sco</b>"
Note that the System.Web UrlDecode and HtmlDecode are not able to handle this.
Interestingly, the AntiXss almost does the reverse, since it can go from:
"te<b>sco</b>"
To
te\00003Cb\00003Esco\00003C\00002Fb\00003E
Security angle
These decodings have a number of security implications, since the decoded values will be rendered by the browser. For example, suppose that in JavaScript/jQuery we have a variable holding this payload:
var xss = "te\u003Cscript\u003Ealert\u002812\u0029\u003C\u002Fscript\u003E"
The script will be triggered if the variable is assigned to a div's HTML:
$("#header").html(xss)
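On the C# side, one way to reduce this risk is to HTML-encode a decoded value before it is written into markup. The following is only a minimal sketch: it assumes the decoded text ends up in an HTML element body, and the variable names are illustrative.

```csharp
using System;
using System.Net;

// The payload above after its \u escapes have been decoded.
string decoded = "te<script>alert(12)</script>";

// WebUtility.HtmlEncode turns markup characters into entities,
// so the browser displays the text instead of executing it.
string safe = WebUtility.HtmlEncode(decoded);

Console.WriteLine(safe); // te&lt;script&gt;alert(12)&lt;/script&gt;
```

Other contexts (attribute values, inline script, URLs) need their own encoders, so this covers only the element-body case.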

The \x.. escapes are unusual, but the \u escapes are fine.
According to previous answer:
string str = @"P\u003e\u003cp\u003e Notes \u003cstrong\u003e Разработчик: \u003c/STRONG\u003e \u003cbr /\u003eЕсли игра Безразлично";
Regex regex = new Regex(@"\\u([0-9a-z]{4})",RegexOptions.IgnoreCase);
str = regex.Replace(str, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value , System.Globalization.NumberStyles.HexNumber)));

It appears that "\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22" is hex encoded. Nothing decodes this string out of the box, but the following should work:
var regex = new Regex(@"\\x([a-fA-F0-9]{2})");
var replaced = regex.Replace(input, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)));
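The two regex answers can be combined into a single pass that handles \xNN, \uNNNN and the simple escapes \\, \/, \" and \'. This is a sketch, not a complete JavaScript string parser: the helper name DecodeJsEscapes is made up for illustration, and other escapes such as \n are deliberately left untouched. Running it twice reproduces the two steps from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Matches \xNN, \uNNNN, or a backslash followed by one of \ / " '
var jsEscape = new Regex(@"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|([\\/""']))");

// Hypothetical helper: performs one decoding pass over the input.
string DecodeJsEscapes(string input) =>
    jsEscape.Replace(input, m =>
    {
        if (m.Groups[3].Success)
            return m.Groups[3].Value; // \\ -> \, \/ -> /, \" -> ", \' -> '

        string hex = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
        // A (char) cast keeps lone surrogate values such as \uD83D intact,
        // whereas char.ConvertFromUtf32 throws for them.
        return ((char)int.Parse(hex, NumberStyles.HexNumber)).ToString();
    });

string raw = @"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22";
string pass1 = DecodeJsEscapes(raw);   // undoes the outer JavaScript string literal
string pass2 = DecodeJsEscapes(pass1); // undoes the escapes of the inner JSON text

Console.WriteLine(pass1); // "te\u003Cb\u003Esco\u003C\/b\u003E",0,"0"
Console.WriteLine(pass2); // "te<b>sco</b>",0,"0"
```

If the inner text is valid JSON, handing the result of the first pass to a JSON parser is likely more robust than a second regex pass.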


Regex trying to get just package name from `az.accounts.2.10.4.nupkg`

I am trying to get the package name from the file name using C# and Regex. This is my attempt so far which works, but I am wondering if is there a more elegant way.
Given for example, az.accounts.2.10.4.nupkg I want to get az.accounts
My attempt:
var filename = Path.GetFileNameWithoutExtension(nupkgPackagePath);
var nupkgPackageGetModulePath = Regex.Matches(filename, @"[^\d]+").First().Value.TrimEnd('.');
Test cases:
$ ls *.nupkg
PowerShellGet.nupkg az.iothub.2.7.4.nupkg
az.9.2.0.nupkg az.keyvault.4.9.1.nupkg
az.accounts.2.10.4.nupkg az.kusto.2.1.0.nupkg
az.advisor.2.0.0.nupkg az.logicapp.1.5.0.nupkg
az.aks.5.1.0.nupkg az.machinelearning.1.1.3.nupkg
az.analysisservices.1.1.4.nupkg az.maintenance.1.2.1.nupkg
az.apimanagement.4.0.1.nupkg az.managedserviceidentity.1.1.0.nupkg
az.appconfiguration.1.2.0.nupkg az.managedservices.3.0.0.nupkg
az.applicationinsights.2.2.0.nupkg az.marketplaceordering.2.0.0.nupkg
az.attestation.2.0.0.nupkg az.media.1.1.1.nupkg
az.automation.1.8.0.nupkg az.migrate.2.1.0.nupkg
az.batch.3.2.1.nupkg az.monitor.4.3.0.nupkg
az.billing.2.0.0.nupkg az.mysql.1.1.0.nupkg
az.cdn.2.1.0.nupkg az.network.5.2.0.nupkg
az.cloudservice.1.1.0.nupkg az.notificationhubs.1.1.1.nupkg
az.cognitiveservices.1.12.0.nupkg az.operationalinsights.3.2.0.nupkg
az.compute.5.2.0.nupkg az.policyinsights.1.5.1.nupkg
az.confidentialledger.1.0.0.nupkg az.postgresql.1.1.0.nupkg
az.containerinstance.3.1.0.nupkg az.powerbiembedded.1.2.0.nupkg
az.containerregistry.3.0.0.nupkg az.privatedns.1.0.3.nupkg
az.cosmosdb.1.9.0.nupkg az.recoveryservices.6.1.2.nupkg
az.databoxedge.1.1.0.nupkg az.rediscache.1.6.0.nupkg
az.databricks.1.4.0.nupkg az.redisenterprisecache.1.1.0.nupkg
az.datafactory.1.16.11.nupkg az.relay.1.0.3.nupkg
az.datalakeanalytics.1.0.2.nupkg az.resourcemover.1.1.0.nupkg
az.datalakestore.1.3.0.nupkg az.resources.6.5.0.nupkg
az.dataprotection.1.0.1.nupkg az.security.1.3.0.nupkg
az.datashare.1.0.1.nupkg az.securityinsights.3.0.0.nupkg
az.deploymentmanager.1.1.0.nupkg az.servicebus.2.1.0.nupkg
az.desktopvirtualization.3.1.1.nupkg az.servicefabric.3.1.0.nupkg
az.devtestlabs.1.0.2.nupkg az.signalr.1.5.0.nupkg
az.dns.1.1.2.nupkg az.sql.4.1.0.nupkg
az.eventgrid.1.5.0.nupkg az.sqlvirtualmachine.1.1.0.nupkg
az.eventhub.3.2.0.nupkg az.stackhci.1.4.0.nupkg
az.frontdoor.1.9.0.nupkg az.storage.5.2.0.nupkg
az.functions.4.0.6.nupkg az.storagesync.1.7.0.nupkg
az.hdinsight.5.0.1.nupkg az.streamanalytics.2.0.0.nupkg
az.healthcareapis.2.0.0.nupkg az.support.1.0.0.nupkg
You can try something like this:
string text = "az.streamanalytics.2.0.0.nupkg";
var result = Regex
.Match(text, @"(?<name>[a-zA-Z0-9.]+?)(\.[0-9]+)*\.nupkg$")
.Groups["name"]
.Value;
Pattern explained:
(?<name>[a-zA-Z0-9.]+?) - letters, digits and dots, as few as possible
(so that the version part is not matched)
(\.[0-9]+)* - zero or more version part: . followed by digits
\.nupkg - .nupkg
$ - end of string
^[^.]*\.[^.]*
You can test it out at https://regex101.com/
using System.Text.RegularExpressions;
// ...
string filename = "az.accounts.2.10.4.nupkg";
string pattern = @"^[^.]*\.[^.]*";
string nupkgPackageGetModulePath = Regex.Match(filename, pattern).Value;
// nupkgPackageGetModulePath is now "az.accounts"
You've got two different input formats
<PackageName>.nupkg
<PackageName>.<Major>.<Minor>.<Patch>.nupkg
Your current attempt:
Regex.Matches(fileName, @"[^\d]+").First().Value.TrimEnd('.')
This actually doesn't work for an input of "PowerShellGet.nupkg". To explain how this code works:
Starting at the beginning of the string, find the first non-digit character, and greedily include all other consecutive non-digit characters. This is the "matched text"
If the matched text ends with a period, take off that period.
This works fine if your input has a number in it, but "PowerShellGet.nupkg" doesn't, hence nupkgPackageGetModulePath in your code example will be the full file name not "PowerShellGet".
This will also be a huge problem if the package name itself contains a digit. How about "runtime.opensuse.13.2-x64.runtime.native.System.Security.Cryptography.OpenSsl.4.3.3.nupkg", or (and I can't believe this is actually a package) "2.2.0.0.nupkg".
It's not a good idea to find the first non-digit. Instead, work with the expected format of nuget packages.
Using string.Split:
Split the input by periods. If there's two elements in the resulting array, it's the first format and return the first element of the array. If there's at least 5 elements in the array, it's the second format. Otherwise, the format is unknown.
private static string GetPackageName(string packageFileName)
{
    var segments = packageFileName.Split('.');
    return segments.Length switch
    {
        2 => segments[0],
        >= 5 => string.Join(".", segments[..^4]),
        _ => throw new Exception("Unknown what you want done here")
    };
}
segments[..^4] is a handy way to get all the element(s) before the major version.
https://dotnetfiddle.net/Ok6jbq
Using Regex:
Again, because you've got two different formats you've got to account for both so this gets a bit more complicated.
([\S]+?)(?:\.\d+\.\d+\.\d+)?\.nupkg
The middle section ((?:\.\d+\.\d+\.\d+)?) is a non-capture group (starts with ?:) which is optional (suffixed with ?).
Capture group 1 will have the package name.
https://regexr.com/74mgf
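To check the pattern against some of the file names discussed above, a small harness along these lines can be used. It is a sketch with two assumptions that are not in the original answer: the pattern is anchored with ^ and $ so that the whole file name must match, and the helper returns null when the name does not fit either format.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, anchored to the whole file name (an assumption).
var nupkgName = new Regex(@"^(\S+?)(?:\.\d+\.\d+\.\d+)?\.nupkg$");

// Returns the package name, or null if the file name fits neither format.
string GetPackageName(string fileName)
{
    Match m = nupkgName.Match(fileName);
    return m.Success ? m.Groups[1].Value : null;
}

string[] samples =
{
    "PowerShellGet.nupkg",
    "az.9.2.0.nupkg",
    "az.accounts.2.10.4.nupkg",
    "runtime.opensuse.13.2-x64.runtime.native.System.Security.Cryptography.OpenSsl.4.3.3.nupkg",
    "2.2.0.0.nupkg",
};

foreach (string file in samples)
    Console.WriteLine($"{file} -> {GetPackageName(file)}");
```

The lazy \S+? keeps the name as short as possible, so the optional three-part version group gets the chance to consume the trailing version before .nupkg.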

Replacing html content in a string

I have a string that has html contents such as:
string myMessage = "Please the website for more information (<a class=\"link\" href=\"http://www.africau.edu/images/default/sample.pdf\" target=_blank\" id=\"urlLink\"> easy details given</a>)";
What I need in the end is:
string myMessage = "Please the website for more information http://www.africau.edu/images/default/sample.pdf easy details given";
I can do this replacing each string as myMessage = myMessage.Replace("string to replace", ""); but then I have to take in each string and replace it will empty. Could there be a better solution?
If I understand you correctly, you have a larger text with multiple occurrences of "<a ....>" and you want to replace each entire tag with just the URL given in its href.
I'm not sure this makes it much easier for you, but you could use Regex.Matches, for example:
var myMessage = "Please the website for more information (<a class=\"link\" href=\"http://www.africau.edu/images/default/sample.pdf\" target=_blank\" id=\"urlLink\"> easy details given</a>)";
var matches = Regex.Matches(myMessage, "(.+?)<a.+?href=\"(.+?)\".+?<\\/a>(.+?)");
var strBuilder = new StringBuilder();
foreach (Match match in matches)
{
    var groups = match.Groups;
    strBuilder.Append(groups[1]) // Please the website for more information (
        .Append(groups[2]) // http://www.africau.edu/images/default/sample.pdf
        .Append(groups[3]); // )
}
Debug.Log(strBuilder.ToString());
So what does this do?
(.+?) will create a group for everything before the first encounter of the following <a => groups[1]
<a.+?href=" matches everything starting with <a and ending with href=" => ignored
(.+?) will create a group for everything between href=" and the next " (so the URL) => groups[2]
".+?<\/a> matches everything from the " until the next </a> => ignored
(.+?) will create a group for everything after the </a> => groups[3]
and groups[0] is the entire match.
so finally we just want to combine
groups[1] + groups[2] + groups[3]
but in a loop so we find possibly multiple matches within the same string and it is simply more efficient to use a StringBuilder for that.
Result
Please the website for more information (http://www.africau.edu/images/default/sample.pdf)
You can easily adjust this to, e.g., also remove the ( ) or include the text between the tags, but I figured this makes the most sense for now.
I personally don't like to rely on the string format always being what I expect as this can lead to errors down the road.
Instead, I offer two ways I can think of doing this:
Use regular expressions:
string myMessage = "Please the website for more information (<a class=\"link\" href=\"http://www.africau.edu/images/default/sample.pdf\" target=_blank\" id=\"urlLink\"> easy details given</a>)";
var capturePattern = @"(.+)\(<a .*href.*?=""(.*?)"".*>(.*)</a>\)";
var regex = new Regex(capturePattern);
var captures = regex.Match(myMessage);
var newString = $"{captures.Groups[1]}{captures.Groups[2]}{captures.Groups[3]}";
Console.WriteLine(myMessage);
Console.WriteLine(newString);
Output:
Please the website for more information (<a class="link" href="http://www.africau.edu/images/default/sample.pdf" target=_blank" id="urlLink"> easy details given</a>)
Please the website for more information http://www.africau.edu/images/default/sample.pdf easy details given
Of course, regular expressions are only as good as the cases you can think of/test. I wrote this up quickly just to illustrate so make sure to verify for other string variations.
The other way is using HTMLAgilityPack:
string myMessage = "Please the website for more information (<a class=\"link\" href=\"http://www.africau.edu/images/default/sample.pdf\" target=_blank\" id=\"urlLink\"> easy details given</a>)";
var doc = new HtmlDocument();
doc.LoadHtml(myMessage);
var prefix = doc.DocumentNode.ChildNodes[0].InnerText;
var url = doc.DocumentNode.SelectNodes("//a[@href]").First().GetAttributeValue("href", string.Empty);
var suffix= doc.DocumentNode.ChildNodes[1].InnerText + doc.DocumentNode.ChildNodes[2].InnerText;
var newString = $"{prefix}{url}{suffix}";
Console.WriteLine(myMessage);
Console.WriteLine(newString);
Output:
Please the website for more information (<a class="link" href="http://www.africau.edu/images/default/sample.pdf" target=_blank" id="urlLink"> easy details given</a>)
Please the website for more information (http://www.africau.edu/images/default/sample.pdf easy details given)
Notice this method preserves the parenthesis around the link. This is because from the agility pack's perspective, the first parenthesis is part of the text of the node. You can always remove them with a quick replace.
This method adds a dependency but this library is very mature and has been around for a long time.
It goes without saying that for both methods you should add error-handling checks for unexpected conditions.
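When the message may contain several links, Regex.Replace can do the substitution in one call instead of stitching groups together by hand. The sketch below assumes each href value is double-quoted and that an optional pair of parentheses directly around the tag should be dropped; the pattern is illustrative and not a general HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

string myMessage = "Please the website for more information (<a class=\"link\" href=\"http://www.africau.edu/images/default/sample.pdf\" target=_blank\" id=\"urlLink\"> easy details given</a>)";

// Group 1: the href value. Group 2: the text between <a ...> and </a>.
// The optional \( and \) swallow parentheses wrapped directly around the tag.
var anchor = new Regex(
    @"\(?<a\b[^>]*href=""([^""]*)""[^>]*>(.*?)</a>\)?",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Every anchor in the input is replaced by its URL followed by its link text.
string result = anchor.Replace(myMessage, "$1$2");

Console.WriteLine(result);
// Please the website for more information http://www.africau.edu/images/default/sample.pdf easy details given
```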

find string using c#?

I am trying to find a string in the string below.
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
by using http://example.com/TIGS/SIM/Lists string. How can I get Team Discussion word from it?
Sometimes the strings will be
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
I need `Team Discussion`
http://example.com/TIGS/ALIF/Lists/Artifical Lift Discussion Forum 2/DispForm.aspx?ID=8
I need `Artifical Lift Discussion Forum 2`
If you're always following that pattern, I recommend @Justin's answer. However, if you want a more robust method, you can always couple the System.Uri and Path.GetDirectoryName methods, then perform a String.Split. Like this example:
String url = @"http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
System.Uri uri = new System.Uri(url);
String dir = Path.GetDirectoryName(uri.AbsolutePath);
String[] parts = dir.Split(new[]{ Path.DirectorySeparatorChar });
Console.WriteLine(parts[parts.Length - 1]);
The only major problem, however, is you're going to wind up with a path that's been "encoded" (i.e. your space is now going to be represented by a %20)
This solution will get you the last directory of your URL regardless of how many directories are in your URL.
string[] arr = s.Split('/');
string lastPart = arr[arr.Length - 2];
You could combine this solution into one line, however it would require splitting the string twice, once for the values, the second for the length.
If you wanted to see a regular expression example:
string input = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
string given = "http://example.com/TIGS/SIM/Lists";
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(System.Text.RegularExpressions.Regex.Escape(given) + @"\/(.+)\/");
System.Text.RegularExpressions.Match match = regex.Match(input);
Console.WriteLine(match.Groups[1]); // Team Discussion
Here's a simple approach, assuming that your URL always has the same number of slashes before the part you want:
var value = url.Split(new[]{'/'}, StringSplitOptions.RemoveEmptyEntries)[5];
Here is another solution that provides the following advantages:
Does not require the use of regular expressions.
Does not require a certain 'count' of slashes to be present (indexing based on a specific number). I consider this a key benefit because it makes the code less likely to fail if some part of the URL changes. Ultimately it is best to base your parsing logic on the part of the text's structure you consider least likely to change.
This method, however, DOES rely on the following assumptions, which I consider to be the least likely to change:
URL must have "/Lists/" right before target text.
URL must have "/" right after target text.
Basically, I just split the string twice, using text that I expect to be surrounding the area I am interested in.
String urlToSearch = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx";
String result = "";
// First, get everything after "/Lists/"
string[] temp1 = urlToSearch.Split(new String[] { "/Lists/" }, StringSplitOptions.RemoveEmptyEntries);
if (temp1.Length > 1)
{
    // Next, get everything before the first "/"
    string[] temp2 = temp1[1].Split(new String[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
    result = temp2[0];
}
Your answer will then be stored in the 'result' variable.
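The same idea, "take whatever follows Lists", can also be expressed with Uri.Segments, which avoids manual string splitting and copes with the %20 encoding mentioned earlier. This is a sketch: the helper name SegmentAfter is hypothetical, and it assumes the marker appears as a complete path segment.

```csharp
using System;

// Hypothetical helper: returns the decoded path segment that follows `marker`,
// or null when the marker is missing or is the last segment.
string SegmentAfter(string url, string marker)
{
    // For the first sample URL, Segments is:
    // "/", "TIGS/", "SIM/", "Lists/", "Team%20Discussion/", "DispForm.aspx"
    string[] segments = new Uri(url).Segments;
    for (int i = 0; i < segments.Length - 1; i++)
    {
        if (string.Equals(segments[i].TrimEnd('/'), marker, StringComparison.OrdinalIgnoreCase))
            return Uri.UnescapeDataString(segments[i + 1].TrimEnd('/'));
    }
    return null;
}

Console.WriteLine(SegmentAfter("http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779", "Lists"));
// Team Discussion
Console.WriteLine(SegmentAfter("http://example.com/TIGS/ALIF/Lists/Artifical Lift Discussion Forum 2/DispForm.aspx?ID=8", "Lists"));
// Artifical Lift Discussion Forum 2
```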

Find/parse server-side <?abc?>-like tags in html document

I guess I need some regex help. I want to find all tags like <?abc?> so that I can replace it with whatever the results are for the code ran inside. I just need help regexing the tag/code string, not parsing the code inside :p.
<b><?abc print 'test' ?></b> would result in <b>test</b>
Edit: Not specifically but in general, matching (<?[chars] (code group) ?>)
This will build up a new copy of the string source, replacing <?abc code?> with the result of process(code)
Regex abcTagRegex = new Regex(@"\<\?abc(?<code>.*?)\?>");
StringBuilder newSource = new StringBuilder();
int curPos = 0;
foreach (Match abcTagMatch in abcTagRegex.Matches(source)) {
    string code = abcTagMatch.Groups["code"].Value;
    string result = process(code);
    // Copy the text between the previous tag and this one (Substring takes a length, not an end index)
    newSource.Append(source.Substring(curPos, abcTagMatch.Index - curPos));
    newSource.Append(result);
    curPos = abcTagMatch.Index + abcTagMatch.Length;
}
newSource.Append(source.Substring(curPos));
source = newSource.ToString();
N.B. I've not been able to test this code, so some of the functions may be slightly the wrong name, or there may be some off-by-one errors.
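The copy-and-append loop can also be written with Regex.Replace and a match evaluator, which handles the position bookkeeping itself. In this sketch Process is a hypothetical stand-in for whatever evaluates the embedded code; it only understands print '...' so that the example from the question can be run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the real evaluator: understands only print '...'
string Process(string code)
{
    Match m = Regex.Match(code, @"^print\s+'(.*)'$");
    return m.Success ? m.Groups[1].Value : string.Empty;
}

// Replaces every <?abc ... ?> tag with the result of Process(code).
string Render(string source) =>
    Regex.Replace(
        source,
        @"<\?abc\s(?<code>.*?)\?>",
        m => Process(m.Groups["code"].Value.Trim()),
        RegexOptions.Singleline); // lets code inside a tag span several lines

Console.WriteLine(Render("<b><?abc print 'test' ?></b>")); // <b>test</b>
```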
var regex = new Regex(@"<\?(\w+) (\w+) (.+?)\?>");
This will take this source
<b><?abc print 'test' ?></b>
and break it up like this:
Value: <?abc print 'test' ?>
SubMatch: abc
SubMatch: print
SubMatch: 'test'
These can then be sent to a method that can handle it differently depending on what the parts are.
If you need more advanced syntax handling you need to go beyond regex I believe.
I designed a template engine using Antlr but that's way more complex ;)
var exp = new Regex(@"<\?abc print '(.+)' \?>");
str = exp.Replace(str, "$1");
Something like this should do the trick. Change the regexes however you see fit.

How do I replace all the spaces with %20 in C#?

I want to make a string into a URL using C#. There must be something in the .NET framework that should help, right?
Another way of doing this is using Uri.EscapeUriString(stringToEscape).
I believe you're looking for HttpServerUtility.UrlEncode.
System.Web.HttpUtility.UrlEncode(string url)
I found System.Web.HttpUtility.UrlPathEncode(string str) useful.
It replaces spaces with %20 and not with +.
To properly escape spaces as well as the rest of the special characters, use System.Uri.EscapeDataString(string stringToEscape).
As commented on the accepted answer, the HttpServerUtility.UrlEncode method replaces spaces with + instead of %20.
Use one of these two methods instead: Uri.EscapeUriString() or Uri.EscapeDataString()
Sample code:
HttpUtility.UrlEncode("https://mywebsite.com/api/get me this file.jpg")
//Output: "https%3a%2f%2fmywebsite.com%2fapi%2fget+me+this+file.jpg"
Uri.EscapeUriString("https://mywebsite.com/api/get me this file.jpg");
//Output: "https://mywebsite.com/api/get%20me%20this%20file.jpg"
Uri.EscapeDataString("https://mywebsite.com/api/get me this file.jpg");
//Output: "https%3A%2F%2Fmywebsite.com%2Fapi%2Fget%20me%20this%20file.jpg"
//When your url has a query string:
Uri.EscapeUriString("https://mywebsite.com/api/get?id=123&name=get me this file.jpg");
//Output: "https://mywebsite.com/api/get?id=123&name=get%20me%20this%20file.jpg"
Uri.EscapeDataString("https://mywebsite.com/api/get?id=123&name=get me this file.jpg");
//Output: "https%3A%2F%2Fmywebsite.com%2Fapi%2Fget%3Fid%3D123%26name%3Dget%20me%20this%20file.jpg"
I needed to do this too, found this question from years ago but question title and text don't quite match up, and using Uri.EscapeDataString or UrlEncode (don't use that one please!) doesn't usually make sense unless we are talking about passing URLs as parameters to other URLs.
(For example, passing a callback URL when doing open ID authentication, Azure AD, etc.)
Hoping this is more pragmatic answer to the question: I want to make a string into a URL using C#, there must be something in the .NET framework that should help, right?
Yes - two functions are helpful for making URL strings in C#
String.Format for formatting the URL
Uri.EscapeDataString for escaping any parameters in the URL
This code
String.Format("https://site/app/?q={0}&redirectUrl={1}",
Uri.EscapeDataString("search for cats"),
Uri.EscapeDataString("https://mysite/myapp/?state=from idp"))
produces this result
https://site/app/?q=search%20for%20cats&redirectUrl=https%3A%2F%2Fmysite%2Fmyapp%2F%3Fstate%3Dfrom%20idp
Which can be safely copied and pasted into a browser's address bar, or the href attribute of an HTML A tag, or used with curl, or encoded into a QR code, etc.
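For more than a couple of parameters, the same escaping can be applied in a loop rather than through positional String.Format arguments. A minimal sketch follows; the helper name BuildUrl is hypothetical, and it assumes the base URL does not already contain a query string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: appends escaped name=value pairs to a base URL.
string BuildUrl(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
{
    var pairs = parameters.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));
    return baseUrl + "?" + string.Join("&", pairs);
}

string url = BuildUrl("https://site/app/", new[]
{
    new KeyValuePair<string, string>("q", "search for cats"),
    new KeyValuePair<string, string>("redirectUrl", "https://mysite/myapp/?state=from idp"),
});

Console.WriteLine(url);
// https://site/app/?q=search%20for%20cats&redirectUrl=https%3A%2F%2Fmysite%2Fmyapp%2F%3Fstate%3Dfrom%20idp
```

Escaping each name and value separately keeps the ? and & that structure the outer URL intact while neutralising the same characters inside the values.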
Use HttpServerUtility.UrlEncode
HttpUtility.UrlDecode works for me:
var str = "name=John%20Doe";
var str2 = HttpUtility.UrlDecode(str);
// str2 is now "name=John Doe"
HttpUtility.UrlEncode Method (String)
The code below replaces each run of repeated spaces with a single %20 sequence.
Example:
Input is:
Code by Hitesh Jain
Output:
Code%20by%20Hitesh%20Jain
Code
static void Main(string[] args)
{
    Console.WriteLine("Enter a string");
    string str = Console.ReadLine();
    string replacedStr = null;
    // This loop collapses each run of repeated spaces into a single space
    for (int i = 0; i < str.Length - 1; i++)
    {
        if (!(Convert.ToString(str[i]) == " " &&
            Convert.ToString(str[i + 1]) == " "))
        {
            replacedStr = replacedStr + str[i];
        }
    }
    replacedStr = replacedStr + str[str.Length-1]; // Append last character
    replacedStr = replacedStr.Replace(" ", "%20");
    Console.WriteLine(replacedStr);
    Console.ReadLine();
}
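The same collapse-then-encode behaviour can be sketched in one regular-expression call. This only mirrors the loop above, with a made-up helper name; it touches spaces alone and is not a general URL encoder.

```csharp
using System;
using System.Text.RegularExpressions;

// Replaces every run of one or more spaces with a single %20.
string EncodeSpaces(string input) => Regex.Replace(input, " +", "%20");

Console.WriteLine(EncodeSpaces("Code   by Hitesh  Jain")); // Code%20by%20Hitesh%20Jain
```

Unlike the loop, this version also accepts an empty string without throwing.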
HttpServerUtility.HtmlEncode
From the documentation:
String TestString = "This is a <Test String>.";
String EncodedString = Server.HtmlEncode(TestString);
But this actually encodes HTML, not URLs. Instead use UrlEncode(TestString).
