Get a guid to encode using big-endian formatting C#

I have an unusual situation whereby I have an existing MySQL database that uses binary(16) primary keys; these are the basis for the UUIDs used in an existing API.
My problem is that I now want to add a replacement API written with dotnet core, and I'm running into a problem with encoding that has been explained here.
Specifically, the Guid struct in dotnet uses a mixed-endian format that produces a different string to the existing api. This isn't acceptable for obvious reasons.
So my question is this: is there an elegant way to force the Guid struct to encode entirely with the big-endian format?
If there isn't I can just write a terrible hack, but I thought I'd check with the collective intelligence of the SO community first!

Nope; as far as I'm aware there's no inbuilt way to get this. And yes, Guid has what I can only call "crazy-endian" implementation currently. You'd need to get the Guid-ordered bits (either via unsafe or Guid.ToByteArray) and then order them manually, figuring out which chunks to reverse - it isn't a simple Array.Reverse(). So: very manual, I'm afraid. I suggest using a guid like
00010203-0405-0607-0809-0a0b0c0d0e0f
to debug it; this gives you (as I suspect you are aware):
03-02-01-00-05-04-07-06-08-09-0A-0B-0C-0D-0E-0F
so:
reverse 4
reverse 2
reverse 2
straight 8
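The four steps above can be sketched as a small helper. This is only a minimal sketch, not a built-in API: the class name GuidByteOrder and the method ToBigEndianBytes are made up for illustration, and it relies on nothing beyond Guid.ToByteArray and Array.Reverse.

```csharp
using System;

public static class GuidByteOrder
{
    // Hypothetical helper: returns the 16 bytes in the same order as the
    // hex digits appear in the Guid's string form (big-endian throughout).
    public static byte[] ToBigEndianBytes(Guid guid)
    {
        byte[] bytes = guid.ToByteArray(); // mixed-endian layout
        Array.Reverse(bytes, 0, 4);        // reverse 4
        Array.Reverse(bytes, 4, 2);        // reverse 2
        Array.Reverse(bytes, 6, 2);        // reverse 2
        return bytes;                      // straight 8: the rest is untouched
    }
}
```

With the debugging guid 00010203-0405-0607-0809-0a0b0c0d0e0f this yields the bytes 00 through 0F in ascending order. Each reversal is its own inverse, so applying the same three reversals to bytes read from the database before calling new Guid(bytes) goes the other way.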

As of 2021 there still isn't a built-in way to convert a System.Guid to a MySQL compatible big endian string in C#.
Here's the extension we came up with when we encountered this exact C# mixed-endian Guid problem at work:
public static string ToStringBigEndian(this Guid guid)
{
// allocate enough bytes to store Guid ASCII string
Span<byte> result = stackalloc byte[36];
// set all bytes to 0xFF (to be able to distinguish them from real data)
result.Fill(0xFF);
// get bytes from guid
Span<byte> buffer = stackalloc byte[16];
_ = guid.TryWriteBytes(buffer);
int skip = 0;
// iterate over guid bytes
for (int i = 0; i < buffer.Length; i++)
{
// in the Guid string a '-' delimiter comes right before the hex digits of bytes 4, 6, 8 and 10.
// --> leave space for those delimiters
if (i is 4 or 6 or 8 or 10)
{
skip++;
}
// split every byte into its high and low nibbles, one output byte each (skipping '-' delimiter positions)
result[(2 * i) + skip] = (byte)(buffer[i] >> 0x4);
result[(2 * i) + 1 + skip] = (byte)(buffer[i] & 0x0Fu);
}
// iterate over precomputed byte array.
// values 0x0 to 0xF are final hex values, but must be mapped to ASCII characters.
// value 0xFF is to be mapped to '-' delimiter character.
for (int i = 0; i < result.Length; i++)
{
// map bytes to ASCII values (a-f will be lowercase)
ref byte b = ref result[i];
b = b switch
{
0xFF => 0x2D, // Map 0xFF to '-' character
< 0xA => (byte)(b + 0x30u), // Map 0x0 - 0x9 to '0' - '9'
_ => (byte)(b + 0x57u) // Map 0xA - 0xF to 'a' - 'f'
};
}
// get string from ASCII encoded guid byte array
return Encoding.ASCII.GetString(result);
}
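It's a bit lengthy, but apart from the string it returns it makes no heap allocations, so it should be fast :) Note that it formats the Guid's bytes in the order TryWriteBytes produces them, so it gives the big-endian string when the Guid was created from the big-endian database bytes, e.g. with new Guid(bytes).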

Related

C# change the first 32bit Int of a GUID

I have a GUID which I created with Guid.NewGuid(). Now I want to replace the first 32 bits of it with a specific 32-bit integer while keeping the rest as they are.
Is there a function to do this?

You can use ToByteArray() function and then the Guid constructor.
byte[] buffer = Guid.NewGuid().ToByteArray();
buffer[0] = 0;
buffer[1] = 0;
buffer[2] = 0;
buffer[3] = 0;
Guid guid = new Guid(buffer);

Since the Guid struct has a constructor that takes a byte array and can return its current bytes, it's actually quite easy:
//Create a random, new guid
Guid guid = Guid.NewGuid();
Console.WriteLine(guid);
//The original bytes
byte[] guidBytes = guid.ToByteArray();
//Your custom bytes
byte[] first4Bytes = BitConverter.GetBytes((UInt32) 0815);
//Overwrite the first 4 Bytes
Array.Copy(first4Bytes, guidBytes, 4);
//Create new guid based on current values
Guid guid2 = new Guid(guidBytes);
Console.WriteLine(guid2);
Keep in mind, however, that the order of bytes returned from BitConverter depends on your processor architecture (BitConverter.IsLittleEndian), and that the number of possible Guid values decreases by a factor of 2^32 if you use the same number every time (which, depending on your application, might not be as bad as it sounds, since you have 2^128 to begin with).
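To sidestep the architecture dependence mentioned above, the four bytes can be written in an explicit byte order. The following is a sketch under assumptions: it needs a runtime that ships System.Buffers.Binary.BinaryPrimitives (.NET Core 2.1 or later), the names GuidPrefix and WithFirst32Bits are invented for this example, and it assumes the documented Guid byte layout in which the first four bytes hold the first group in little-endian order.

```csharp
using System;
using System.Buffers.Binary;

public static class GuidPrefix
{
    // Hypothetical helper: replaces the first 32 bits of a Guid so that
    // 'value' shows up as the first group of the Guid's string form.
    public static Guid WithFirst32Bits(Guid source, uint value)
    {
        byte[] bytes = source.ToByteArray();
        // The first four bytes of the array store the first group little-endian,
        // so write the value in that order regardless of the CPU.
        BinaryPrimitives.WriteUInt32LittleEndian(bytes, value);
        return new Guid(bytes);
    }
}
```

For example, GuidPrefix.WithFirst32Bits(Guid.Empty, 0x0815) gives 00000815-0000-0000-0000-000000000000.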

The question is about replacing bits, but if someone wants to replace first characters of guid directly, this can be done by converting it to string, replacing characters in string and converting back. Note that replaced characters should be valid in hex, i.e. numbers 0 - 9 or letters a - f.
var uniqueGuid = Guid.NewGuid();
var uniqueGuidStr = "1234" + uniqueGuid.ToString().Substring(4);
var modifiedUniqueGuid = Guid.Parse(uniqueGuidStr);

Array of chars in hex format to integer?

I have an API which returns a byte[] over the network which represents information about a device.
It is in format 15ab1234cd\r\n where the first 2 characters are a HEX representation of the amount of data in the message.
I am aware I can convert this to a string via ASCIIEncoding.ASCII.GetString, and then use Convert.ToInt32(string.Substring(0, 2), 16) to achieve this. However the whole thing stays a byte array throughout the life of the whole program I am writing, and I don't want to convert to a string just for the purpose of getting the packet length.
Any suggestions of converting array of chars in hex format to an int in C#?

There is no .NET-provided function that does it. Converting the first 2 bytes to a string with Encoding.GetString is very readable (though possibly not the most performant):
var hexValue = ASCIIEncoding.ASCII.GetString(byteData, 0, 2);
var intValue = Convert.ToInt32(hexValue, 16);
You can easily write the conversion code yourself: map the '0'-'9' and 'a'-'f' / 'A'-'F' ranges to their corresponding integer values and combine them.
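A straightforward version of that mapping might look like the sketch below. The names HexAscii, DigitValue and Parse are made up for illustration; the code assumes the bytes are ASCII and throws if it meets anything that is not a hex digit.

```csharp
using System;

public static class HexAscii
{
    // Hypothetical helper: value of one ASCII hex digit, or -1 if it is not one.
    public static int DigitValue(byte c)
    {
        if (c >= (byte)'0' && c <= (byte)'9') return c - '0';
        if (c >= (byte)'a' && c <= (byte)'f') return c - 'a' + 10;
        if (c >= (byte)'A' && c <= (byte)'F') return c - 'A' + 10;
        return -1;
    }

    // Reads 'length' hex digits starting at 'start' without creating a string.
    public static int Parse(byte[] data, int start, int length)
    {
        int result = 0;
        for (int i = start; i < start + length; i++)
        {
            int digit = DigitValue(data[i]);
            if (digit < 0) throw new FormatException("Not a hex digit at index " + i);
            result = (result << 4) | digit; // shift in four bits per digit
        }
        return result;
    }
}
```

For a packet starting with the ASCII bytes of 15ab1234cd, HexAscii.Parse(packet, 0, 2) returns 21 (0x15).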
Here is a one-statement conversion, strictly for entertainment purposes. The resulting lambda (the part before ((byte)'0',(byte)'A') in the sample) takes 2 byte arguments, assumes them to be ASCII characters, and converts them into an integer.
((Func<Func<char,int>, Func<byte, byte, int>>)
(charToInt=> (c, c1)=>
charToInt(char.ToUpper((char)c)) * 16 + charToInt(char.ToUpper((char)c1))))
((Func<char, int>)(
c => c >= '0' && c <='9' ? c-'0' : c >='A' && c <= 'F' ? c - 'A' + 10 : 0))
((byte)'0',(byte)'A')

If you know the first two values are valid hexadecimal characters (0-9, A-F, a-f), it is possible to convert them to an integer value using bitwise operators.
int GetIntFromHexBytes(byte[] s, int start, int length)
{
int ret = 0;
for (int i = start; i < start+length; i++)
{
ret <<= 4;
ret |= (byte)((s[i] & 0x0f) + ((s[i] & 0x40) >> 6) * 9);
}
return ret;
}
(This works because c & 0x0f returns the 4 least significant bits, which range from 0-9 for the characters '0'-'9' and from 1-6 for both uppercase and lowercase hex letters ('a'-'f' and 'A'-'F'). s[i] & 0x40 is 0 for numeric characters and 0x40 for alphabetic characters; shifting right by six bits gives 0 for numeric characters and 1 for alphabetic characters. Multiplying that bit by 9 adds a bias of 9 for alphabetic characters, mapping A-F and a-f from 1-6 to 10-15.)
Given the byte array:
byte[] b = { (byte)'7', (byte)'f', (byte)'1', (byte)'c' };
Calling GetIntFromHexBytes(b, 0, 2) will return 127 (0x7f), the first two bytes of the array, as required.
As a caution: this approach does no input validation. A check can be added in the loop if needed to ensure that the input bytes are valid hex characters.

C to C# Bytearray + hex

I'm currently trying to get this C code converted into C#.
Since I'm not really familiar with C I'd really appreciate your help!
static unsigned char byte_table[2080] = {0};
First off, a byte array gets declared but never filled, which I'm okay with
BYTE* packet = //bytes come in here from a file
int unknownVal = 0;
int unknown_field0 = *(DWORD *)(packet + 0x08);
do
{
*((BYTE *)packet + i) ^= byte_table[(i + unknownVal) & 0x7FF];
++i;
}
while (i <= packet[0]);
But down here.. I really have no idea how to translate this into C#
BYTE = byte[] right?
DWORD = double?
but how can (packet + 0x08) be translated? How can I add a hex to a bytearray? Oo
I'd be happy about anything that helps! :)

In C, initializing an array with {0} sets every element to zero, so the whole table starts out zeroed.
That bottom loop can be rewritten in a simpler, C# friendly fashion.
byte[] packet = arrayofcharsfromfile;
int field = packet[8]+(packet[9]<<8)+(packet[10]<<16)+(packet[11]<<24); //Assuming 32 bit little endian integer
int unknownval = 0;
int i = 0;
do //Why waste the newline? I don't know. Conventions are silly!
{
packet[i] ^= byte_table[(i+unknownval) & 0x7FF];
} while( ++i <= packet[0] );
field is set by taking the four bytes including and following index 8 and generating a 32 bit int from them.
In C, you can cast pointers to other types, as is done in your provided snippet. What they're doing is taking an array of bytes (each one 1/4 the size of a DWORD, which is a 32-bit unsigned integer rather than a double) and adding 8 to the pointer, which advances it by 8 bytes (since each element is a byte wide), and then treating that pointer as a DWORD pointer. In simpler terms, they're turning the byte array into a DWORD array and then taking index 2, as 8/4=2.
You can simulate this behavior in a safe fashion by stringing the bytes together with bitshifting and addition, as I demonstrated above. It's not as efficient and isn't as pretty, but it accomplishes the same thing, and in a platform agnostic way too. Not all platforms are little endian.
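As an alternative to the manual shifts, the same read can be written with a library call that states the byte order explicitly. This is a sketch under assumptions: it needs System.Buffers.Binary.BinaryPrimitives (.NET Core 2.1 or later), the names PacketDecoder, ByteTable, ReadField and XorInPlace are invented for the example, and the table is left all zeroes because the question never shows it being filled.

```csharp
using System;
using System.Buffers.Binary;

public static class PacketDecoder
{
    // Stand-in for the C byte_table; C# arrays start out zeroed, like {0} in C.
    private static readonly byte[] ByteTable = new byte[2080];

    // Equivalent of *(DWORD *)(packet + 0x08), assuming little-endian data.
    public static uint ReadField(byte[] packet)
    {
        return BinaryPrimitives.ReadUInt32LittleEndian(packet.AsSpan(8, 4));
    }

    // Mirrors the C do/while loop; like the original it trusts packet[0]
    // to be a valid last index and does no extra bounds checking.
    public static void XorInPlace(byte[] packet, int unknownVal)
    {
        int i = 0;
        do
        {
            packet[i] ^= ByteTable[(i + unknownVal) & 0x7FF];
            ++i;
        }
        while (i <= packet[0]);
    }
}
```

Reading the field as uint matches DWORD's unsigned 32-bit range; cast to int if the surrounding code expects a signed value.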

Convert int32 to string in base 16

I'm currently trying to convert a .NET JSON Encoder to NETMF but have hit a problem with Convert.ToString() as there isn't such thing in NETMF.
The original line of the encoder looks like this:
Convert.ToString(codepoint, 16);
And after looking at the documentation for Convert.ToString(Int32, Int32), it says it converts an Int32 into its string representation in base 2, 8, 10 or 16, taking the int as the first parameter and the base as the second.
What are some low level code of how to do this or how would I go about doing this?
As you can see from the code, I only need conversion from an Int32 to base 16.
EDIT
Ah, the encoder also then wants to do:
PadLeft(4, '0');
on the string, is this just adding 4 '0' + '0' + '0' + '0' to the start of the string?

If you mean you want to change a 32-bit integer value into a string which shows the value in hexadecimal:
string hex = intValue.ToString("x");
For variations, please see Stack Overflow question Convert a number into the hex value in .NET.
Disclaimer: I'm not sure if this function exists in NETMF, but it is so fundamental that I think it should.

Here’s some sample code for converting an integer to hexadecimal (base 16):
int num = 48764; // assign your number
// Generate hexadecimal number in reverse.
var sb = new StringBuilder();
do
{
sb.Append(hexChars[num & 15]);
num >>= 4;
}
while (num > 0);
// Pad with leading 0s for a minimum length of 4 characters.
while (sb.Length < 4)
sb.Append('0');
// Reverse string and get result.
char[] chars = new char[sb.Length];
sb.CopyTo(0, chars, 0, sb.Length);
Array.Reverse(chars);
string result = new string(chars);
PadLeft(4, '0') prepends leading 0s to the string to ensure a minimum length of 4 characters.
The hexChars value lookup may be trivially defined as a string:
internal static readonly string hexChars = "0123456789ABCDEF";
Edit: Replacing StringBuilder with List<char>:
// Generate hexadecimal number in reverse.
List<char> builder = new List<char>();
do
{
builder.Add(hexChars[num & 15]);
num >>= 4;
}
while (num > 0);
// Pad with leading 0s for a minimum length of 4 characters.
while (builder.Count < 4)
builder.Add('0');
// Reverse string and get result.
char[] chars = new char[builder.Count];
for (int i = 0; i < builder.Count; ++i)
chars[i] = builder[builder.Count - i - 1];
string result = new string(chars);
Note: Refer to the “Hexadecimal Number Output” section of Expert .NET Micro Framework for a discussion of this conversion.
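The digit loop and the padding can also be folded into one method that fills a char array from the right, which avoids the separate reverse step. This is only a sketch: the names HexFormat and ToHex are hypothetical, it uses lowercase digits, and whether every call is available on NETMF is not something verified here.

```csharp
using System;

public static class HexFormat
{
    private const string HexChars = "0123456789abcdef";

    // Hypothetical helper: hex string of 'value', left-padded with '0'
    // to at least 'minLength' characters.
    public static string ToHex(int value, int minLength)
    {
        uint remaining = (uint)value;            // treat the bits as unsigned
        char[] buffer = new char[Math.Max(8, minLength)];
        int pos = buffer.Length;                 // fill from the right
        do
        {
            buffer[--pos] = HexChars[(int)(remaining & 0xF)];
            remaining >>= 4;
        }
        while (remaining != 0);
        while (buffer.Length - pos < minLength)  // same effect as PadLeft(minLength, '0')
        {
            buffer[--pos] = '0';
        }
        return new string(buffer, pos, buffer.Length - pos);
    }
}
```

HexFormat.ToHex(48764, 4) returns "be7c" and HexFormat.ToHex(31, 4) returns "001f", so the padding only adds as many zeroes as are needed to reach four characters.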

Best way to shorten UTF8 string based on byte length

A recent project called for importing data into an Oracle database. The program that will do this is a C# .Net 3.5 app and I'm using the Oracle.DataAccess connection library to handle the actual inserting.
I ran into a problem where I'd receive this error message when inserting a particular field:
ORA-12899 Value too large for column X
I used Field.Substring(0, MaxLength); but still got the error (though not for every record).
Finally I saw what should have been obvious, my string was in ANSI and the field was UTF8. Its length is defined in bytes, not characters.
This gets me to my question. What is the best way to trim my string to fit the MaxLength?
My substring code works by character length. Is there a simple C# function that can trim a UTF8 string intelligently by byte length (i.e. not hack off half a character)?

I think we can do better than naively counting the total length of a string with each addition. LINQ is cool, but it can accidentally encourage inefficient code. What if I wanted the first 80,000 bytes of a giant UTF string? That's a lot of unnecessary counting. "I've got 1 byte. Now I've got 2. Now I've got 13... Now I have 52,384..."
That's silly. Most of the time, at least in l'anglais, we can cut exactly on that nth byte. Even in another language, we're less than 6 bytes away from a good cutting point.
So I'm going to start from @Oren's suggestion, which is to key off of the leading bit of a UTF8 char value. Let's start by cutting right at the n+1th byte, and use Oren's trick to figure out if we need to cut a few bytes earlier.
Three possibilities
If the first byte after the cut has a 0 in the leading bit, I know I'm cutting precisely before a single byte (conventional ASCII) character, and can cut cleanly.
If I have a 11 following the cut, the next byte after the cut is the start of a multi-byte character, so that's a good place to cut too!
If I have a 10, however, I know I'm in the middle of a multi-byte character, and need to go back to check to see where it really starts.
That is, though I want to cut the string after the nth byte, if that n+1th byte comes in the middle of a multi-byte character, cutting would create an invalid UTF8 value. I need to back up until I get to one that starts with 11 and cut just before it.
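The three cases boil down to one test: step back while the byte at the cut position has the form 10xxxxxx. A compact sketch of just that decision, written with binary literals (C# 7.0 or later) and hypothetical names Utf8Cut, IsContinuation and FindCutPoint, could look like this:

```csharp
public static class Utf8Cut
{
    // True for bytes of the form 10xxxxxx, i.e. the middle of a multi-byte character.
    public static bool IsContinuation(byte b)
    {
        return (b & 0b1100_0000) == 0b1000_0000;
    }

    // Returns how many leading bytes can be kept without splitting a character.
    public static int FindCutPoint(byte[] utf8, int maxBytes)
    {
        if (maxBytes >= utf8.Length) return utf8.Length; // nothing to cut
        if (maxBytes <= 0) return 0;
        int cut = maxBytes;
        // utf8[cut] is the first byte that would be dropped; if it is a
        // continuation byte, back up to the lead byte of that character.
        while (cut > 0 && IsContinuation(utf8[cut]))
        {
            cut--;
        }
        return cut;
    }
}
```

The trimmed string is then Encoding.UTF8.GetString(bytes, 0, Utf8Cut.FindCutPoint(bytes, n)). The sketch assumes the input is valid UTF-8, as produced by Encoding.UTF8.GetBytes.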
Code
Notes: I'm using stuff like Convert.ToByte("11000000", 2) so that it's easy to tell what bits I'm masking (a little more about bit masking here). In a nutshell, I'm &ing to return what's in the byte's first two bits and bringing back 0s for the rest. Then I check the XX from XX000000 to see if it's 10 or 11, where appropriate.
C# 7.0 added binary literals (for example 0b11000000), which is cool, but we'll keep using this kludge for now to illustrate what's going on.
The PadLeft is just because I'm overly OCD about output to the Console.
So here's a function that'll cut you down to a string that's n bytes long, or the greatest number less than n that ends with a "complete" UTF8 character.
public static string CutToUTF8Length(string str, int byteLength)
{
byte[] byteArray = Encoding.UTF8.GetBytes(str);
string returnValue = string.Empty;
if (byteArray.Length > byteLength)
{
int bytePointer = byteLength;
// Check high bit to see if we're [potentially] in the middle of a multi-byte char
if (bytePointer >= 0
&& (byteArray[bytePointer] & Convert.ToByte("10000000", 2)) > 0)
{
// If so, keep walking back until we have a byte starting with `11`,
// which means the first byte of a multi-byte UTF8 character.
while (bytePointer >= 0
&& Convert.ToByte("11000000", 2) != (byteArray[bytePointer] & Convert.ToByte("11000000", 2)))
{
bytePointer--;
}
}
// See if we walked all the way back to the start. If so, we're toast. Return empty string.
if (bytePointer > 0)
{
returnValue = Encoding.UTF8.GetString(byteArray, 0, bytePointer); // hat tip to @NealEhardt! Well played. ;^)
}
}
else
{
returnValue = str;
}
return returnValue;
}
I initially wrote this as a string extension. Just add back the this before string str to put it back into extension format, of course. I removed the this so that we could just slap the method into Program.cs in a simple console app to demonstrate.
Test and expected output
Here's a good test case, with the output it creates below, written to be the Main method in a simple console app's Program.cs.
static void Main(string[] args)
{
string testValue = "12345“”67890”";
for (int i = 0; i < 15; i++)
{
string cutValue = Program.CutToUTF8Length(testValue, i);
Console.WriteLine(i.ToString().PadLeft(2) +
": " + Encoding.UTF8.GetByteCount(cutValue).ToString().PadLeft(2) +
":: " + cutValue);
}
Console.WriteLine();
Console.WriteLine();
foreach (byte b in Encoding.UTF8.GetBytes(testValue))
{
Console.WriteLine(b.ToString().PadLeft(3) + " " + (char)b);
}
Console.WriteLine("Return to end.");
Console.ReadLine();
}
Output follows. Notice that the "smart quotes" in testValue are three bytes long in UTF8 (though when we write the chars to the console in ASCII, it outputs dumb quotes). Also note the ?s output for the second and third bytes of each smart quote in the output.
The first five characters of our testValue are single bytes in UTF8, so 0-5 byte values should be 0-5 characters. Then we have a three-byte smart quote, which can't be included in its entirety until 5 + 3 bytes. Sure enough, we see that pop out at the call for 8. Our next smart quote pops out at 8 + 3 = 11, and then we're back to single byte characters through 14.
0: 0::
1: 1:: 1
2: 2:: 12
3: 3:: 123
4: 4:: 1234
5: 5:: 12345
6: 5:: 12345
7: 5:: 12345
8: 8:: 12345"
9: 8:: 12345"
10: 8:: 12345"
11: 11:: 12345""
12: 12:: 12345""6
13: 13:: 12345""67
14: 14:: 12345""678
49 1
50 2
51 3
52 4
53 5
226 â
128 ?
156 ?
226 â
128 ?
157 ?
54 6
55 7
56 8
57 9
48 0
226 â
128 ?
157 ?
Return to end.
So that's kind of fun, and I'm in just before the question's five year anniversary. Though Oren's description of the bits had a small error, that's exactly the trick you want to use. Thanks for the question; neat.

Here are two possible solutions: a LINQ one-liner processing the input left to right and a traditional for-loop processing the input from right to left. Which processing direction is faster depends on the string length, the allowed byte length, and the number and distribution of multibyte characters, so it is hard to give a general suggestion. The decision between LINQ and traditional code is probably a matter of taste (or maybe speed).
If speed matters, one could think about just accumulating the byte length of each character until reaching the maximum length instead of calculating the byte length of the whole string in each iteration. But I am not sure if this will work because I don't know UTF-8 encoding well enough. I could theoretically imagine that the byte length of a string does not equal the sum of the byte lengths of all characters.
public static String LimitByteLength(String input, Int32 maxLength)
{
return new String(input
.TakeWhile((c, i) =>
Encoding.UTF8.GetByteCount(input.Substring(0, i + 1)) <= maxLength)
.ToArray());
}
public static String LimitByteLength2(String input, Int32 maxLength)
{
for (Int32 i = input.Length - 1; i >= 0; i--)
{
if (Encoding.UTF8.GetByteCount(input.Substring(0, i + 1)) <= maxLength)
{
return input.Substring(0, i + 1);
}
}
return String.Empty;
}
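The accumulate-as-you-go idea mentioned above can be made to work by summing over Unicode scalar values rather than over individual char values, since the two halves of a surrogate pair do not encode on their own. The sketch below assumes .NET Core 3.0 or later (for System.Text.Rune and string.EnumerateRunes); the names Utf8Limit and LimitByRunes are invented for this example.

```csharp
using System.Text;

public static class Utf8Limit
{
    // Hypothetical helper: longest prefix of 'input' whose UTF-8 encoding
    // fits into 'maxBytes', found in a single left-to-right pass.
    public static string LimitByRunes(string input, int maxBytes)
    {
        int byteCount = 0; // UTF-8 bytes accepted so far
        int charCount = 0; // UTF-16 chars accepted so far
        foreach (Rune rune in input.EnumerateRunes())
        {
            if (byteCount + rune.Utf8SequenceLength > maxBytes)
            {
                break; // the next scalar value would not fit
            }
            byteCount += rune.Utf8SequenceLength;
            charCount += rune.Utf16SequenceLength;
        }
        return input.Substring(0, charCount);
    }
}
```

Each scalar value is measured once, so the work is proportional to the length of the kept prefix instead of re-encoding a growing substring on every step.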

Shorter version of ruffin's answer. Takes advantage of the design of UTF8:
public static string LimitUtf8ByteCount(this string s, int n)
{
// quick test (we probably won't be trimming most of the time)
if (Encoding.UTF8.GetByteCount(s) <= n)
return s;
// get the bytes
var a = Encoding.UTF8.GetBytes(s);
// if we are in the middle of a character (highest two bits are 10)
if (n > 0 && ( a[n]&0xC0 ) == 0x80)
{
// remove all bytes whose two highest bits are 10
// and one more (start of multi-byte sequence - highest bits should be 11)
while (--n > 0 && ( a[n]&0xC0 ) == 0x80)
;
}
// convert back to string (with the limit adjusted)
return Encoding.UTF8.GetString(a, 0, n);
}

All of the other answers appear to miss the fact that this functionality is already built into .NET, in the Encoder class. For bonus points, this approach will also work for other encodings.
public static string LimitByteLength(string message, int maxLength)
{
if (string.IsNullOrEmpty(message) || Encoding.UTF8.GetByteCount(message) <= maxLength)
{
return message;
}
var encoder = Encoding.UTF8.GetEncoder();
byte[] buffer = new byte[maxLength];
char[] messageChars = message.ToCharArray();
encoder.Convert(
chars: messageChars,
charIndex: 0,
charCount: messageChars.Length,
bytes: buffer,
byteIndex: 0,
byteCount: buffer.Length,
flush: false,
charsUsed: out int charsUsed,
bytesUsed: out int bytesUsed,
completed: out bool completed);
// I don't think we can return message.Substring(0, charsUsed)
// as that's the number of UTF-16 chars, not the number of codepoints
// (think about surrogate pairs). Therefore I think we need to
// actually convert bytes back into a new string
return Encoding.UTF8.GetString(buffer, 0, bytesUsed);
}
If you're using .NET Standard 2.1+, you can simplify it a bit:
public static string LimitByteLength(string message, int maxLength)
{
if (string.IsNullOrEmpty(message) || Encoding.UTF8.GetByteCount(message) <= maxLength)
{
return message;
}
var encoder = Encoding.UTF8.GetEncoder();
byte[] buffer = new byte[maxLength];
encoder.Convert(message.AsSpan(), buffer.AsSpan(), false, out _, out int bytesUsed, out _);
return Encoding.UTF8.GetString(buffer, 0, bytesUsed);
}
None of the other answers account for extended grapheme clusters, such as 👩🏽‍🚒. This is composed of 4 Unicode scalars (👩, 🏽, a zero-width joiner, and 🚒), so you need knowledge of the Unicode standard to avoid splitting it in the middle and producing 👩 or 👩🏽.
In .NET 5 onwards, you can write this as:
public static string LimitByteLength(string message, int maxLength)
{
if (string.IsNullOrEmpty(message) || Encoding.UTF8.GetByteCount(message) <= maxLength)
{
return message;
}
var enumerator = StringInfo.GetTextElementEnumerator(message);
var result = new StringBuilder();
int lengthBytes = 0;
while (enumerator.MoveNext())
{
lengthBytes += Encoding.UTF8.GetByteCount(enumerator.GetTextElement());
if (lengthBytes <= maxLength)
{
result.Append(enumerator.GetTextElement());
}
}
return result.ToString();
}
(This same code runs on earlier versions of .NET, but due to a bug it won't produce the correct result before .NET 5).

If a UTF-8 byte has a zero-valued high order bit, it's the beginning of a character (a complete single-byte one). If its high order bit is 1, it's part of a multi-byte character: the first byte starts with 11, and every byte in the 'middle' of the character starts with 10. The ability to detect the beginning of a character was an explicit design goal of UTF-8.
Check out the Description section of the Wikipedia article on UTF-8 for more detail.

Is there a reason that you need the database column to be declared in terms of bytes? That's the default, but it's not a particularly useful default if the database character set is variable width. I'd strongly prefer declaring the column in terms of characters.
CREATE TABLE length_example (
col1 VARCHAR2( 10 BYTE ),
col2 VARCHAR2( 10 CHAR )
);
This will create a table where COL1 will store 10 bytes of data and col2 will store 10 characters worth of data. Character length semantics make far more sense in a UTF8 database.
Assuming you want all the tables you create to use character length semantics by default, you can set the initialization parameter NLS_LENGTH_SEMANTICS to CHAR. At that point, any tables you create will default to using character length semantics rather than byte length semantics if you don't specify CHAR or BYTE in the field length.

Following Oren Trutner's comment, here are two more solutions to the problem:
Here we count the number of bytes to remove from the end of the string, one trailing character at a time, so we don't evaluate the entire string in every iteration.
string str = "朣楢琴执执 瑩浻牡楧硰执执獧浻牡楧敬瑦 瀰 絸朣杢执獧扻捡杫潲湵 潣";
int maxBytesLength = 30;
var bytesArr = Encoding.UTF8.GetBytes(str);
int bytesToRemove = 0;
int lastIndexInString = str.Length -1;
while(bytesArr.Length - bytesToRemove > maxBytesLength)
{
bytesToRemove += Encoding.UTF8.GetByteCount(new char[] {str[lastIndexInString]} );
--lastIndexInString;
}
string trimmedString = Encoding.UTF8.GetString(bytesArr,0,bytesArr.Length - bytesToRemove);
//Encoding.UTF8.GetByteCount(trimmedString); // get the actual length, will be <= maxBytesLength
And an even more efficient (and maintainable) solution:
Get the string from the byte array according to the desired length and cut the last character, because it might be corrupted.
string str = "朣楢琴执执 瑩浻牡楧硰执执獧浻牡楧敬瑦 瀰 絸朣杢执獧扻捡杫潲湵 潣";
int maxBytesLength = 30;
string trimmedWithDirtyLastChar = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(str),0,maxBytesLength);
string trimmedString = trimmedWithDirtyLastChar.Substring(0,trimmedWithDirtyLastChar.Length - 1);
The only downside with the second solution is that we might cut a perfectly fine last character, but we are already cutting the string, so it might fit with the requirements.
Thanks to Shhade who thought about the second solution

This is another solution based on binary search:
public string LimitToUTF8ByteLength(string text, int size)
{
if (size <= 0)
{
return string.Empty;
}
int maxLength = text.Length;
int minLength = 0;
int length = maxLength;
while (maxLength >= minLength)
{
length = (maxLength + minLength) / 2;
int byteLength = Encoding.UTF8.GetByteCount(text.Substring(0, length));
if (byteLength > size)
{
maxLength = length - 1;
}
else if (byteLength < size)
{
minLength = length + 1;
}
else
{
return text.Substring(0, length);
}
}
// Round down the result
string result = text.Substring(0, length);
if (size >= Encoding.UTF8.GetByteCount(result))
{
return result;
}
else
{
return text.Substring(0, length - 1);
}
}

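public static string LimitByteLength3(string input, Int32 maxLength)
{
string result = input;
int byteCount = Encoding.UTF8.GetByteCount(input);
if (byteCount > maxLength)
{
var byteArray = Encoding.UTF8.GetBytes(input);
// Note: if maxLength falls inside a multi-byte character, the decoder
// replaces the incomplete sequence with U+FFFD at the end of the result.
result = Encoding.UTF8.GetString(byteArray, 0, maxLength);
}
return result;
}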
