Decoding Base64urlUInt-encoded value - c#

What I am generally trying to do, is to validate an id_token value obtained from an OpenID Connect provider (e.g. Google). The token is signed with the RSA algorithm and the public key is read from the Discovery document (the jwks_uri parameter). For example, Google keys are available here in the JWK format:
{
kty: "RSA",
alg: "RS256",
use: "sig",
kid: "38d516cbe31d4345819b786d4d227e3075df02fc",
n: "4fQxF6dFabDqsz9a9-XgVhDaadTBO4yBZkpUyUKrS98ZtpKIQRMLoph3bK9Cua828wwDZ9HHhUxOcbcUiNDUbubtsDz1AirWpCVRRauxRdRInejbGSqHMbg1bxWYfquKKQwF7WnrrSbgdInUZPv5xcHEjQ6q_Kbcsts1Nnc__8YRdmIGrtdTAcm1Ga8LfwroeyiF-2xn0mtWDnU7rblQI4qaXCwM8Zm-lUrpSUkO6E1RTJ1L0vRx8ieyLLOBzJNwxpIBNFolMK8-DYXDSX0SdR7gslInKCn8Ihd9mpI2QBuT-KFUi88t8TW4LsoWHAwlgXCRGP5cYB4r30NQ1wMiuQ",
e: "AQAB"
}
I am going to use the RSACryptoServiceProvider class for decoding the signature. To initialize it, I have to provide RSAParameters with the Modulus and Exponent values. These values are read from the above JWK as n and e correspondingly. According to the specification, these values are Base64urlUInt-encoded values:
The representation of a positive or zero integer value as the
base64url encoding of the value's unsigned big-endian representation
as an octet sequence. The octet sequence MUST utilize the minimum
number of octets needed to represent the value. Zero is represented
as BASE64URL(single zero-valued octet), which is "AA".
So, my question is how to decode these values to put them to RSAParameters? I tried decoding them as a common Base64url string (Convert.FromBase64String(modulusRaw)), but this obviously does not work and generates this error:
The input is not a valid Base-64 string as it contains a non-base 64
character, more than two padding characters, or an illegal character
among the padding characters.

RFC 7515 defines base64url encoding like this:
Base64 encoding using the URL- and filename-safe character set
defined in Section 5 of RFC 4648, with all trailing '='
characters omitted (as permitted by Section 3.2) and without the
inclusion of any line breaks, whitespace, or other additional
characters. Note that the base64url encoding of the empty octet
sequence is the empty string. (See Appendix C for notes on
implementing base64url encoding without padding.)
RFC 4648 defines "Base 64 Encoding with URL and Filename Safe Alphabet" as regular base64, but:
The padding may be omitted (as it is here)
Using - instead of + and _ instead of /
So to use regular Convert.FromBase64String, you just need to reverse that process:
static byte[] FromBase64Url(string base64Url)
{
string padded = base64Url.Length % 4 == 0
? base64Url : base64Url + "====".Substring(base64Url.Length % 4);
string base64 = padded.Replace("_", "/")
.Replace("-", "+");
return Convert.FromBase64String(base64);
}
It's possible that this code already exists somewhere in the framework, but I'm not aware of it.

Who ever comes here from Java: there are two methods in java.util.Base64:
getDecoder()
getUrlDecoder()
As you probably assumed: taking the second one does all the chars replacements for you already.

Related

What is the difference between Convert.ToBase64String(byte[]) and HttpServerUtility.UrlTokenEncode(byte[])?

I'm trying to remove a dependence on System.Web.dll from a Web API project, but have stumbled on a call to HttpServerUtility.UrlTokenEncode(byte[] input) (and its corresponding decode method) that I don't know what to replace with to ensure backwards compatibility. The documentation says that this method
Encodes a byte array into its equivalent string representation using base 64 digits, which is usable for transmission on the URL.
I tried substituting with Convert.ToBase64String(byte[] input) (and its corresponding decode method), which is very similarly described in the docs:
Converts an array of 8-bit unsigned integers to its equivalent string representation that is encoded with base-64 digits.
However, they don't seem to be entirely equivalent; when using Convert.FromBase64String(string input) to decode a string encoded with HttpServerUtility, I get an exception stating
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
What is the difference between these two conversion utilities? What's the correct way to remove this dependence on System.Web.HttpServerUtility?
Some users have suggested that this is a duplicate of this one, but I disagree. That question is about base-64-encoding a string in a url-safe manner in general, but I need to reproduce the exact behavior of HttpServerUtility but without a dependency on System.Web.
I took DGibbs on their word and Used the Source. It turns out the following happens in the HttpServerUtility methods:
Encoding to Base64
Use System.Convert to convert the input to Base64.
Replace + by - and / by _. Example: Foo+bar/=== becomes Foo-bar_===.
Replace any number of = at the end of the string, with an integer denoting how many they were. Example: Foo-bar_=== becomes Foo-bar_3.
Decoding from Base64
Replace the digit at the end of the string by the same number of = signs. Example: Foo-bar_3 becomes Foo-bar_===.
Replace - by + and _ by /. Example: Foo-bar_=== becomes Foo+bar/===.
Use System.Convert to decode the preprocessed input from Base64.
HttpServerUtility.UrlTokenEncode(byte[] input) will encode a URL safe Base64 string. In Base64 +, / and = characters are valid, but they are not URL safe, this method will replace these characters whereas the Convert.ToBase64String(byte[] input) will not. You can probably drop the reference and do it yourself.
Usually, '+' is replaced with '-' and '/' with '_' padding '=' is just removed.
Accepted answer here gives a code example: How to achieve Base64 URL safe encoding in C#?

ASCII Code Of Characters

In C# I need to get the ASCII code of some characters.
So I convert the char To byte Or int, then print the result.
String sample="A";
int AsciiInt = sample[0];
byte AsciiByte = (byte)sample[0];
For characters with ASCII code 128 and less, I get the right answer.
But for characters greater than 128 I get irrelevant answers!
I am sure all characters are less than 0xFF.
Also I have Tested System.Text.Encoding and got the same results.
For example: I get 172 For a char with actual byte value of 129!
Actually ASCII characters Like ƒ , ‡ , ‹ , “ , ¥ , © , Ï , ³ , · , ½ , » , Á Each character takes 1 byte and goes up to more than 193.
I Guess There is An Unicode Equivalent for Them and .Net Return That Because Interprets Strings As Unicode!
What If SomeOne Needs To Access The Actual Value of a byte , Whether It is a valid Known ASCII Character Or Not!!!
But For Characters Upper Than 128 I get Irrelevant answers
No you don't. You get the bottom 8 bits of the UTF-16 code unit corresponding to the char.
Now if your text were all ASCII, that would be fine - because ASCII only goes up to 127 anyway. It sounds like you're actually expecting the representation in some other encoding - so you need to work out which encoding that is, at which point you can use:
Encoding encoding = ...;
byte[] bytes = encoding.GetBytes(sample);
// Now extract the bytes you want. Note that a character may be represented by more than
// one byte.
If you're essentially looking for an encoding which treats bytes 0 to 255 respectively as U+0000 to U+00FF respectively, you should use ISO-8859-1, which you can access using Encoding.GetEncoding(28591).
You can't just ignore the issue of encoding. There is no inherent mapping between bytes and characters - that's defined by the encoding.
If I use your example of 131, on my system, this produces â. However, since you're obviously on an arabic system, you most likely have Windows-1256 encoding, which produces ƒ for 131.
In other words, if you need to use the correct encoding when converting characters to bytes and vice versa. In your case,
var sample = "ƒ";
var byteValue = Encoding.GetEncoding("windows-1256").GetBytes(sample)[0];
Which produces 131, as you seem to expect. Most importantly, this will work on all computers - if you want to have this system locale-specific, Encoding.Default can also work for you.
The only reason your method seems to work for bytes under 128 is that in UTF-8, the characters correspond to the ASCII standard mapping. However, you're misusing the term ASCII - it really only refers to these 7-bit characters. What you're calling ASCII is actually an extended 8-bit charset - all characters with the 8-bit set are charset-dependent.
We're no longer in a world when you can assume your application will only run on computers with the same locale you have - .NET is designed for this, which is why all strings are unicode. At the very least, read this http://www.joelonsoftware.com/articles/Unicode.html for an explanation of how encodings work, and to get rid of some of the serious and dangerous misconceptions you seem to have.

Algorithm to code URL [duplicate]

This question already has answers here:
URL Encoding using C#
(14 answers)
Closed 9 years ago.
is there some algorithm in C# to encode url with symbols that can correct display in web-browser?
something like Base64.
The Standard (RFC 3986 aka STD 66) lays it out for you. In particular, §2 and 2.1:
2. Characters
The URI syntax provides a method of encoding data, presumably for the
sake of identifying a resource, as a sequence of characters. The URI
characters are, in turn, frequently encoded as octets for transport
or presentation. This specification does not mandate any particular
character encoding for mapping between URI characters and the octets
used to store or transmit those characters. When a URI appears in a
protocol element, the character encoding is defined by that protocol;
without such a definition, a URI is assumed to be in the same
character encoding as the surrounding text.
The ABNF notation defines its terminal values to be non-negative
integers (codepoints) based on the US-ASCII coded character set
[ASCII]. Because a URI is a sequence of characters, we must invert
that relation in order to understand the URI syntax. Therefore, the
integer values used by the ABNF must be mapped back to their
corresponding characters via US-ASCII in order to complete the syntax
rules.
A URI is composed from a limited set of characters consisting of
digits, letters, and a few graphic symbols. A reserved subset of
those characters may be used to delimit syntax components within a
URI while the remaining characters, including both the unreserved set
and those reserved characters not acting as delimiters, define each
component's identifying data.
2.1. Percent-Encoding
A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component. A percent-encoded octet is encoded as a character
triplet, consisting of the percent character "%" followed by the two
hexadecimal digits representing that octet's numeric value. For
example, "%20" is the percent-encoding for the binary octet
"00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
character (SP). Section 2.4 describes when percent-encoding and
decoding is applied.
pct-encoded = "%" HEXDIG HEXDIG
The uppercase hexadecimal digits 'A' through 'F' are equivalent to
the lowercase digits 'a' through 'f', respectively. If two URIs
differ only in the case of hexadecimal digits used in percent-encoded
octets, they are equivalent. For consistency, URI producers and
normalizers should use uppercase hexadecimal digits for all percent-
encodings.
In general, the only characters that may freely be represented in a URL without being percent-encoded are
The unreserved characters. These are the US-ASCII (7-bit) characters
A-Z
a-z
0-9
-._~
The reserved characters ... when in use as within their role in the grammar of a URL and its scheme. These reserved characters are:
:/?#[]#!$&'()*+,;=
Any other characters, per the standard must be properly percent-encoded.
Further note that a URL may only contains characters drawn from the US-ASCII character set (0x00-0x7F): If your URL contains characters outside that range of codepoints, those characters will need to be suitably encoded for representation in US-ASCII (e.g., via HTML/XML entity references). Further, you application is responsible for interpreting such.

How do I create a string with a surrogate pair inside of it?

I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair which will actually cause the string reversal to fail. How does one actually go about creating a string with a surrogate pair in it so that I can see the failure myself?
The simplest way is to use \U######## where the U is capital, and the # denote exactly eight hexadecimal digits. If the value exceeds 0000FFFF hexadecimal, a surrogate pair will be needed:
string myString = "In the game of mahjong \U0001F01C denotes the Four of circles";
You can check myString.Length to see that the one Unicode character occupies two .NET Char values. Note that the char type has a couple of static methods that will help you determine if a char is a part of a surrogate pair.
If you use a .NET language that does not have something like the \U######## escape sequence, you can use the method ConvertFromUtf32, for example:
string fourCircles = char.ConvertFromUtf32(0x1F01C);
Addition: If your C# source file has an encoding that allows all Unicode characters, like UTF-8, you can just put the charater directly in the file (by copy-paste). For example:
string myString = "In the game of mahjong 🀜 denotes the Four of circles";
The character is UTF-8 encoded in the source file (in my example) but will be UTF-16 encoded (surrogate pairs) when the application runs and the string is in memory.
(Not sure if Stack Overflow software handles my mahjong character correctly. Try clicking "edit" to this answer and copy-paste from the text there, if the "funny" character is not here.)
The term "surrogate pair" refers to a means of encoding Unicode characters with high code-points in the UTF-16 encoding scheme (see this page for more information);
In the Unicode character encoding, characters are mapped to values between 0x000000 and 0x10FFFF. Internally, a UTF-16 encoding scheme is used to store strings of Unicode text in which two-byte (16-bit) code sequences are considered. Since two bytes can only contain the range of characters from 0x0000 to 0xFFFF, some additional complexity is used to store values above this range (0x010000 to 0x10FFFF).
This is done using pairs of code points known as surrogates. The surrogate characters are classified in two distinct ranges known as low surrogates and high surrogates, depending on whether they are allowed at the start or the end of the two-code sequence.
Try this yourself:
String surrogate = "abc" + Char.ConvertFromUtf32(Int32.Parse("2A601", NumberStyles.HexNumber)) + "def";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
or this, if you want to stick with the blog example:
String surrogate = "Les Mise" + Char.ConvertFromUtf32(Int32.Parse("0301", NumberStyles.HexNumber)) + "rables";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
nnd then check the string values with the debugger. Jon Skeet is damn right... strings and dates seem easy but they are absolutely NOT.

C# How to encrypt string in which result are letters or numbers only without any other character?

I use some code to encrypt & decrypt string in C# but i want a good one that can generate encrypted string that contain only letters or numbers not any other ( + , / , ...)
Is there good one for that ?
You could use any encryption algorithm, then encode the result. Once you have binary data, you can push it out to any textual format. The result of an encryption algorithm is going to be a series of bytes, anyhow, so any textual representation is simply an encoding.
Hexadecimal would be fairly large, depending on your encrypted data. Base64 would almost encode it the way you want, except for the / and + symbols. Base32 would probably be the way to go, because it is A-Z, 2-7 and = for padding.
If you want to custom tailor your own encoding scheme, that is also an option, and would be very easy to implement. For example, you could take Base32, and replace the padding with 8, then you'd have just A-Z, 2-8.

Categories