Really destroy a string in C#

I am writing an API that accepts a standard string for a username and password for compatibility. I know standard strings are not ideal; my API already uses the SecureString class for this purpose, and the summaries above my methods warn the API user about this. However, since the API may be used in an environment where SecureString is not available, I have written a function to really destroy a string as soon as my SecureString extension methods have converted the standard string to a SecureString.
public static void CrunchString(ref string str) {
int l = str.Length;
unsafe {
fixed (char* c = str) {
for (int i = 0; i < l; ++i) {
c[i] = (char)0x00;
}
}
}
str = null;
}
Is this the right way to go about it, or is there a better solution? Are there any foreseeable consequences of destroying the string in place like this?
The aim here is to destroy the unsecured string as early as possible and to thoroughly remove it from normal memory.

Related

What is the encoding of the string returned from Marshal.PtrToStringAnsi?

I'm implementing a custom marshaler to pass UTF-8 strings between native and managed code.
[ComVisible(true)]
public class UTF8StringMarshaler : ICustomMarshaler
{
private static ICustomMarshaler marshalerInstance = new UTF8StringMarshaler();
public static ICustomMarshaler GetInstance(string optionalCookie)
{
return marshalerInstance;
}
public void CleanUpManagedData(object ManagedObj)
{
//Managed Data will be deleted by the garbage collector
}
public void CleanUpNativeData(IntPtr pNativeData)
{
Marshal.FreeCoTaskMem(pNativeData);
}
public int GetNativeDataSize()
{
//Not used in our case
return -1;
}
public IntPtr MarshalManagedToNative(object ManagedObj)
{
if (ManagedObj == null || ManagedObj as string == null)
return IntPtr.Zero;
if (!(ManagedObj is string))
throw new MarshalDirectiveException("UTF8StringMarshaler can only be used on String.");
UTF8Encoding utf8Encoder = new UTF8Encoding();
string utf8string = ManagedObj as string;
byte[] stringBuffer = utf8Encoder.GetBytes(utf8string);
IntPtr buffer = Marshal.AllocCoTaskMem(stringBuffer.Length + 1);
Marshal.Copy(stringBuffer, 0, buffer, stringBuffer.Length);
Marshal.WriteByte(buffer + stringBuffer.Length, 0);
return buffer;
}
public unsafe object MarshalNativeToManaged(IntPtr pNativeData)
{
if (pNativeData == IntPtr.Zero)
return null;
string temp = null;
UTF8Encoding utf8Encoder = new UTF8Encoding(true, true);
byte* buffer = (byte*)pNativeData;
while (*buffer != 0)
{
buffer++;
}
int length = (int)(buffer - (byte*)pNativeData);
byte[] stringbuffer = new byte[length];
Marshal.Copy(pNativeData, stringbuffer, 0, length);
try
{
temp = utf8Encoder.GetString(stringbuffer);
}
catch (EncoderFallbackException e)
{
Console.WriteLine("Encoding Exception type {0}, Error {1}", e.GetType().Name, e.Message);
}
return temp;
}
}
This implementation works except when the C# string comes from the Marshal.PtrToStringAnsi function.
So in the MarshalNativeToManaged function, I need to verify whether a string coming from Marshal.PtrToStringAnsi has the right encoding.
According to the Microsoft documentation, Marshal.PtrToStringAnsi widens each ANSI character to Unicode:
Copies all characters up to the first null character from an unmanaged ANSI string to a managed String, and widens each ANSI character to Unicode.
So the question is, what is the Encoding of the string from Marshal.PtrToStringAnsi function?
Is there a simpler way to verify if the string is from that function?
what is the Encoding of the string from Marshal.PtrToStringAnsi function?
There is no one "ANSI" encoding. It is whatever the current code page of your system is. It will depend on the user's locale settings. This should correspond to the CharSet enum:
Ansi: Marshal strings as multiple-byte character strings: the system default Windows (ANSI) code page on Windows, and UTF-8 on Unix.
Note the special handling on Unix though (and on, I presume, Linux).
Is there a simpler way to verify if the string is from that function?
That seems to me to be a completely different question from what appears to be the main one. In particular: knowing what encoding the function will use when converting from "ANSI" to UTF-16 (the internal text encoding used by .NET) doesn't seem to me to lead to a way to "verify if the string is from that function". Once you have a C# string object, it's already been encoded as UTF-16. It could have originated from practically any encoding.
It's also not clear from your question what you mean by "works except when the C# string is from Marshal.PtrToStringAnsi function". That is, in what way precisely does it not work under that scenario? Your marshaler appears to be responsible for nothing more than passing UTF-8 bytes to or from the native code. Given a C# string object, it should never matter how that string was created. It is now a string of UTF-16 characters, which can be reliably re-encoded as UTF-8. If there's a problem with "ANSI" text, that problem occurred before your marshaler got involved. Your marshaler shouldn't have to concern itself with that.
Finally: why not just use Encoding.UTF8 instead of instantiating a new UTF8Encoding object on every marshaling operation? At the very least, you should be caching the object, but since GetBytes() and GetString() work the same for any instance of UTF8Encoding, really you should just use the one that .NET has already created for you, and let .NET deal with caching the object.
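As a minimal sketch of that last suggestion, the two conversions could be written against the shared Encoding.UTF8 instance. The class and method names here (Utf8MarshalSketch, ToNativeUtf8, FromNativeUtf8) are hypothetical and not part of the question's marshaler. One difference worth keeping in mind: Encoding.UTF8 substitutes U+FFFD for invalid byte sequences rather than throwing, whereas the question's new UTF8Encoding(true, true) throws, so the shared instance is only a drop-in replacement if that behavior is acceptable.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8MarshalSketch
{
    // Encodes a managed string into a null-terminated UTF-8 buffer.
    // The caller is responsible for freeing it with Marshal.FreeCoTaskMem.
    public static IntPtr ToNativeUtf8(string s)
    {
        if (s == null)
            return IntPtr.Zero;

        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr buffer = Marshal.AllocCoTaskMem(bytes.Length + 1);
        Marshal.Copy(bytes, 0, buffer, bytes.Length);
        Marshal.WriteByte(buffer, bytes.Length, 0); // terminating null byte
        return buffer;
    }

    // Decodes a null-terminated UTF-8 buffer into a managed string.
    public static string FromNativeUtf8(IntPtr p)
    {
        if (p == IntPtr.Zero)
            return null;

        // Find the terminating null byte.
        int length = 0;
        while (Marshal.ReadByte(p, length) != 0)
            length++;

        byte[] bytes = new byte[length];
        Marshal.Copy(p, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

A round trip such as FromNativeUtf8(ToNativeUtf8("héllo")) should give back the original string regardless of how that string was first created, which is the point made above: once the text is a C# string, its original encoding no longer matters.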

Efficiency of static constant list initialization in C# vs C++ static arrays

I apologize in advance. My domain is mostly C (and C++). I'm trying to write something similar in C#. Let me explain with code.
In C++, I can use large static arrays that are processed during compile-time and stored in a read-only section of the PE file. For instance:
typedef struct _MY_ASSOC{
const char* name;
unsigned int value;
}MY_ASSOC, *LPMY_ASSOC;
bool GetValueForName(const char* pName, unsigned int* pnOutValue = nullptr)
{
bool bResult = false;
unsigned int nValue = 0;
static const MY_ASSOC all_assoc[] = {
{"name1", 123},
{"name2", 213},
{"name3", 1433},
//... more to follow
{"nameN", 12837},
};
for(size_t i = 0; i < _countof(all_assoc); i++)
{
if(strcmp(all_assoc[i].name, pName) == 0)
{
nValue = all_assoc[i].value;
bResult = true;
break;
}
}
if(pnOutValue)
*pnOutValue = nValue;
return bResult;
}
In the example above, the initialization of static const MY_ASSOC all_assoc is never called at run-time. It is entirely processed during the compile-time.
Now if I write something similar in C#:
public struct NameValue
{
public string name;
public uint value;
}
private static readonly NameValue[] g_arrNV_Assoc = new NameValue[] {
new NameValue() { name = "name1", value = 123 },
new NameValue() { name = "name2", value = 213 },
new NameValue() { name = "name3", value = 1433 },
// ... more to follow
new NameValue() { name = "nameN", value = 12837 },
};
public static bool GetValueForName(string name, out uint nOutValue)
{
foreach (NameValue nv in g_arrNV_Assoc)
{
if (name == nv.name)
{
nOutValue = nv.value;
return true;
}
}
nOutValue = 0;
return false;
}
The line private static readonly NameValue[] g_arrNV_Assoc has to be called once during the host class initialization, and it is done for every single element in that array!
So my question -- can I somehow optimize it so that the data stored in g_arrNV_Assoc array is stored in the PE section and not initialized at run-time?
PS. I hope I'm clear for the .NET folks with my terminology.
Your terminology is clear enough; "large static array" is fine.
There is nothing you can really do to make it more efficient out of the box.
It will load once, initially (at different times depending on the version of .NET and on whether you have a static constructor). In any case, it will be loaded before you first access it.
Even if you created it empty with just the predetermined size, the CLR would still initialize each element to its default value, and you would then have to buffer-copy your data over somehow, which in turn would have to be loaded from a file.
The questions, though, are:
How much overhead does loading the static array of structs actually have, compared to what you are doing in C?
Does it matter when in the application's lifecycle it is loaded?
And if this is too much overhead (which I assume you have already determined), what other options are available outside the box?
You could pre-allocate a chunk of unmanaged memory, read and copy the bytes in from somewhere, and then access it using pointers.
You could also build this into a standard DLL and P/Invoke it just like any other unmanaged DLL. However, I'm not sure you would get much of a free lunch here, as there is overhead in marshaling these sorts of calls and in loading your DLL.
If your question is only academic, these are really your only options. However, if this is an actual performance problem, you will need to benchmark it and figure out what is suitable for you.
Anyway, I don't profess to know everything; maybe someone else has a better idea or more information. Good luck.

Using Regex in a MultiThreaded Environment

I have a constructor that uses several Regex objects which are static readonly in a class called RegexLib (mainly because this project uses a whole lot of Regex patterns that need to be used all over the place).
When the user adds some files to the application, this constructor gets called once for each file (run asynchronously across several threads). I've attached the relevant function that the constructor calls below.
private void GetSymbolsFromLines()
{
for (int i = 0; i < Lines.Length; i++)
{
string line = Lines[i];
if (RegexLib.InstString.IsMatch(line))
{
int instString = i;
int userdataString = 0;
for (int j = i; j < Lines.Length; j++)
{
if (RegexLib.UserdataString.IsMatch(Lines[j]))
{
userdataString = j;
break;
}
else if (Lines[j].Contains("userdata"))
{
break;
}
}
if (userdataString != 0)
{
_symbols.Add(new Symbol(RegexLib.InstString.Match(Lines[instString]),
RegexLib.UserdataString.Match(Lines[userdataString])));
}
}
}
}
The Regex objects are all fairly similar to these and have been tested using Regex Hero.
public static readonly Regex AliasFromUserdata = new Regex(@"text_alias=(?<AliasName>\w+).*?value=(?<AliasValue>(.*?))\^(?=(?:text_alias|\""))");
public static readonly Regex UpdateFromUserdata = new Regex("FOX_VAR=.*?attr=(?<AttributeType>.+?)\\^(?<AttributePropertyString>.*?)\\^(?:(?=(?:FOX_VAR|END_FOXV)))");
For some reason, the use of Regex seems to cause some issues in this multithreaded environment and a dig into the documentation revealed that this could be because:
However, result objects (Match and MatchCollection) returned by Regex should be used on a single thread.
So my question is: is there an easy way to use Regex across multiple threads while still keeping the instances inside a library class for organisational reasons?
The only likely solution I can think of short of moving the Regex declaration closer to use is to clone the objects before use, but this seems like it could be quite slow.
For reference, here is the Worker Function that runs concurrently on 4 different threads.
private void FoxFileConvWorker(ConcurrentQueue<string> queue,QueueProgressData qpd)
{
string[] extensions = {".fdf", ".m1", ".g"};
while (!queue.IsEmpty)
{
string file;
if (queue.TryDequeue(out file))
{
if (extensions.Any(extension => Path.GetExtension(file) == extension))
{
try
{
_jobGraphics.Add(new Graphic(file));
IncrementProgress(qpd);
}
catch (Exception e)
{
ThreadSafeControlMethods.SetText(qpd.LblStatus, "Non-Fatal Error");
WriteLog(e, $"Creating Graphic DOM for {file}");
#if DEBUG
throw;
#endif
}
}
}
}
}
Can you modify your static Regex class? If so, you can use a factory instead of static properties:
static class RegexLib
{
public static Regex CreateInstString()
{
return new Regex("YourRegex");
}
public static Regex CreateUserdataString()
{
return new Regex("YourOtherRegex");
}
[..]
}
This way, your Regex instances will not be shared among threads.
You could also use dependency injection, but that means a lot of refactoring in your code.
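A minimal sketch of how the factory approach might look in a worker, assuming a queue-based setup similar to the one in the question. The names RegexFactorySketch, CreateWordRegex and CountWords and the pattern \w+ are hypothetical stand-ins for the question's real patterns. Each worker creates its own Regex once, and the MatchCollection it produces never leaves that thread. Note that the documentation passage quoted in the question restricts only the result objects to a single thread; the same documentation describes the Regex class itself as thread safe and immutable, so the per-worker instance is a precaution rather than a requirement.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example class, for illustration only.
public static class RegexFactorySketch
{
    // Factory method: every call returns a fresh Regex instance.
    public static Regex CreateWordRegex()
    {
        return new Regex(@"\w+");
    }

    // Counts word matches across all lines using several workers.
    public static int CountWords(IEnumerable<string> lines, int workerCount)
    {
        var queue = new ConcurrentQueue<string>(lines);
        int total = 0;
        var workers = new Task[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Run(() =>
            {
                // One Regex per worker, created once rather than per line.
                Regex regex = CreateWordRegex();

                while (queue.TryDequeue(out string line))
                {
                    // The MatchCollection is created and consumed on this thread only.
                    int count = regex.Matches(line).Count;

                    // Shared state is updated atomically.
                    Interlocked.Add(ref total, count);
                }
            });
        }

        Task.WaitAll(workers);
        return total;
    }
}
```

The shared counter is updated with Interlocked.Add because any state written by several workers needs its own synchronization, whichever way the Regex instances are organised.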

classes and threading

I have the following code:
public class Search
{
StringBuilder sb = new StringBuilder();
string[] myparams;
public void Start()
{
//Start search threads
for (int i = 0; i < 50; i++)
{
tasks.Add(Task.Factory.StartNew(() =>
{
string text1 = GetFirstRequest(url, myparams);
string text2 = GetSecondRequest(url, myparams);
}, ct, TaskCreationOptions.LongRunning, TaskScheduler.Default));
}
}
private string GetFirstRequest(string url, string[] myparams)
{
//Use stringbuilder to build the complete url with params
//Use webrequest, response and stream to return the url contents
}
private string GetSecondRequest(string url, string[] myparams)
{
//Similar to GetFirstRequest
}
}
For my main form I call:
Search search = new Search();
search.Start();
As you can see from the code above, individual threads are created. However, each thread is calling the same private functions in the Search class in order to access the url.
Is the code thread-safe? Is it better to place the private functions into a separate class and create a class for each thread?
Without seeing the actual code for GetFirstRequest and GetSecondRequest, we can't tell - but the fact that you've got an instance variable of type StringBuilder makes me skeptical. StringBuilder itself isn't thread-safe, and if you're modifying a single object in multiple threads I doubt that you'll get the result you want anyway.
If you're using StringBuilder to build a complete URL, why not just create that StringBuilder in each method? If you don't need to change any of the state of your object, you'll be a long way towards being thread-safe.
Also note that your method has a params parameter but could also access the params instance variable (which would need a different name anyway as params is a keyword in C#). Do you really need that duplication? Why not just use the instance variable from the method?
It feels like this class can be made thread-safe, but almost certainly isn't yet. You need to design it to be thread-safe - which means either avoiding any state mutation, or using appropriate locking. (The former approach is usually cleaner where it's possible.)
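A minimal sketch of the "create the StringBuilder in each method" idea, under the assumption that each entry in the parameter array is already an encoded "key=value" pair. The class name SearchSketch and the method BuildUrl are hypothetical; the question does not show how the URL is actually assembled.

```csharp
using System.Text;

// Hypothetical example class, for illustration only.
public static class SearchSketch
{
    // Builds a URL from a base address and pre-encoded "key=value" pairs.
    // The StringBuilder is a local variable, so no state is shared between
    // threads and the method can be called concurrently without locking.
    public static string BuildUrl(string baseUrl, string[] parameters)
    {
        var sb = new StringBuilder(baseUrl);

        for (int i = 0; i < parameters.Length; i++)
        {
            sb.Append(i == 0 ? '?' : '&'); // first pair starts the query string
            sb.Append(parameters[i]);
        }

        return sb.ToString();
    }
}
```

Because the method touches only its arguments and a local builder, fifty tasks can call it at once as long as none of them modifies the parameter array while the others are reading it.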

Understanding of .NET internal StringBuilderCache class configuration

When I was looking at decompiled .NET assemblies to see some internals, I noticed an interesting StringBuilderCache class used by multiple framework methods:
internal static class StringBuilderCache
{
[ThreadStatic]
private static StringBuilder CachedInstance;
private const int MAX_BUILDER_SIZE = 360;
public static StringBuilder Acquire(int capacity = 16)
{
if (capacity <= 360)
{
StringBuilder cachedInstance = StringBuilderCache.CachedInstance;
if (cachedInstance != null && capacity <= cachedInstance.Capacity)
{
StringBuilderCache.CachedInstance = null;
cachedInstance.Clear();
return cachedInstance;
}
}
return new StringBuilder(capacity);
}
public static void Release(StringBuilder sb)
{
if (sb.Capacity <= 360)
{
StringBuilderCache.CachedInstance = sb;
}
}
public static string GetStringAndRelease(StringBuilder sb)
{
string result = sb.ToString();
StringBuilderCache.Release(sb);
return result;
}
}
We can find an example of its usage in the string.Format method:
public static string Format(IFormatProvider provider, string format, params object[] args)
{
...
StringBuilder stringBuilder = StringBuilderCache.Acquire(format.Length + args.Length * 8);
stringBuilder.AppendFormat(provider, format, args);
return StringBuilderCache.GetStringAndRelease(stringBuilder);
}
While it is quite clever, and I will certainly remember this caching pattern, I wonder why MAX_BUILDER_SIZE is so small. Wouldn't it be better to set it to, let's say, 2 kB? It would avoid creating bigger StringBuilder instances at the cost of very little memory overhead.
It is a per-thread cache, so a low number is expected. It is best to use the Reference Source for questions like this; you'll see the comments as well, which look like this (edited to fit):
// The value 360 was chosen in discussion with performance experts as a
// compromise between using as litle memory (per thread) as possible and
// still covering a large part of short-lived StringBuilder creations on
// the startup path of VS designers.
private const int MAX_BUILDER_SIZE = 360;
"VS designers" is a wee bit puzzling. Well, not really, surely this work was done to optimize Visual Studio. Neelie Kroes would have a field day and the EU would have another billion dollars if she would find out :)
Most strings built are probably small, so using a relatively small buffer size will cover most of the operations while not using up too much memory. Consider that there is a thread pool with possibly many threads being created. If every one of them would take up to 2kB for a cached buffer, it would add up to some amount of memory.
