I have been trying to assign/append a string to a blob column in an output buffer, in a C# script that takes a number of input rows and concatenates them where the related id is identical, moving on to the next output row where the id is new. I seem to be running into a problem that may just be an intermediate step I don't know about.
I'm using this:
Output0Buffer.compAlert.AddBlobData(Encoding.Unicode.GetBytes(alert), alert.Length);
To assign the alert string to the NTEXT column compAlert.
The theory, and what I can see from previous answers, is that this will add the string alert to said NTEXT column. The issue I'm coming across is that this only adds the first character of that string. As near as I can tell, if GetBytes is fed a string, it should iterate over that string and convert everything. I appear to be missing something that all the other answers recommending Encoding.Unicode.GetBytes() take for granted.
Based on the official documentation, the count argument in the public void AddBlobData (byte[] data, int count) method refers to:
The number of bytes of binary data to be appended.
You should use Encoding.Unicode.GetBytes(alert).Length instead of alert.Length.
Output0Buffer.compAlert.AddBlobData(Encoding.Unicode.GetBytes(alert), Encoding.Unicode.GetBytes(alert).Length);
Or simply use:
Output0Buffer.compAlert.AddBlobData(Encoding.Unicode.GetBytes(alert));
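To avoid encoding the string twice, you can also store the byte array in a local first. A minimal sketch, assuming the same compAlert column and alert variable as above:

// Encode once, then pass the same array together with its byte count.
// alert.Length counts characters, and Encoding.Unicode (UTF-16) uses
// 2 bytes per character, so passing alert.Length under-counts the
// bytes and truncates the data.
byte[] bytes = Encoding.Unicode.GetBytes(alert);
Output0Buffer.compAlert.AddBlobData(bytes, bytes.Length);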
I'm developing a PDF file viewer. A PDF file stores its characters in bytes, and a PDF file can be several megabytes. Using strings in this scenario is a bad idea, because the storage space of a string cannot be reused for another string. Therefore I store these PDF bytes in a char array. When reading the next big PDF file, I can reuse the char array.
Now I need to support a search functionality, so that the user can find a certain text in this huge file. When searching, I usually don't want to have to enter proper upper and lower case letters; I might not even remember the correct casing, meaning the search should succeed regardless of casing. When using
string.IndexOf(String, StringComparison)
one can choose InvariantCultureIgnoreCase to get both upper and lower case matches.
However, converting the megabyte char array into an equally big string is a bad idea.
Unfortunately, IndexOf for an Array is not helpful:
public static int IndexOf<T> (T[] array, T value);
This only allows searching for a single char in a char array, and it does not support IgnoreCase either, which obviously wouldn't make sense for other arrays, like an integer array.
So the question is:
Which method can be used from DotNet to search a string in a character array.
Please read this before marking this question as a duplicate
I am aware that there are already similar questions regarding searching. But the ones I have seen all convert the character array in one way or another into a string, which I definitely do not want.
Also note that many of those solutions don't support ignoring the casing. The solution should also handle exotic Unicode characters correctly.
And last but not least, best would be an existing method from DotNet.
I came to the conclusion that I needed to implement my own IndexOf method for character arrays. However, programming that proved rather challenging, so I checked in the DotNet source code how string.IndexOf does it.
It's a bit confusing, because one method calls another, which calls another, each doing not much. Finally, one arrives at:
public unsafe int IndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value,
CompareOptions options = CompareOptions.None)
Lo and behold, that was exactly the functionality I was looking for, because it is very easy to convert a char[] into a ReadOnlySpan<char>. This method belongs to the CompareInfo class. To call it, one has to write something like this:
var index = CultureInfo.InvariantCulture.CompareInfo.IndexOf(bigCharArray,
searchString, CompareOptions.IgnoreCase);
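Put together as a minimal, self-contained sketch (bigCharArray and searchString are placeholder data; the ReadOnlySpan<char> overload of CompareInfo.IndexOf requires .NET 5 or later):

using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        char[] bigCharArray = "The quick brown Fox".ToCharArray();
        string searchString = "fox";

        // char[] converts implicitly to ReadOnlySpan<char>, so no
        // string allocation is needed for the big buffer.
        int index = CultureInfo.InvariantCulture.CompareInfo.IndexOf(
            bigCharArray, searchString, CompareOptions.IgnoreCase);

        Console.WriteLine(index); // 16
    }
}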
I'm venturing into networking using C#, and I'm trying to create a clean way to send my packets. Right now, though, I'm not going to worry about all of the packets-enclosed-in-special-characters stuff that I've been reading about; instead, the packet is a three-digit number prefixed to the data passed to the client. For example, a data string may be the following.
LoginPacket is packet 000.
LoginPacket Data would be "000Username~Password"
I've tried to tidy this up so I could write things in a cleaner manner, and tried something like this
SendPacket(000, new string[] { "data", "parameters" });
However, when sending it, the integer 000 is instantly converted to zero.
Is there a way around this, or would I be better off storing it all in a string, such as
SendPacket(new string[] { "000", "data", "params" });
When you convert the number to text, you need to specify the number of digits. The numbers 000 and 0 are both zero; the strings "000" and "0", however, are different strings.
Use
n.ToString("000");
to ensure you get three digits.
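A minimal sketch of the difference (command is a hypothetical packet id):

int command = 000;                          // the literal 000 is just the int 0
Console.WriteLine(command);                 // prints 0
Console.WriteLine(command.ToString("000")); // prints 000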
I would suggest you go with a Command Type followed by a Length followed by the Payload
Then your payloads can share a similar structure, so the login command (0) would use a structure that begins with a length byte, followed by a username, followed by a second length byte, and finally followed by a password.
For example:
0155Dan-o8password (command 0, payload length 15, then a length byte 5 followed by "Dan-o" and a length byte 8 followed by "password")
Remember that this all comes over the wire as a byte array, so you read the first 4 bytes (an Int32: the Command Type).
Then read the next Int32 to figure out the length of the payload; that's how many bytes you will read in your third read.
Now that you know the Command is login you can implement login-specific reading.
In addition I would suggest you create some extension methods to make this easier.
like: Stream.ReadByte, Stream.ReadInt32, Stream.ReadInt64, Stream.ReadString(length)
Then some application-specific extensions.. like Stream.ReadLogin
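A minimal sketch of such extensions, assuming a length-prefixed wire format, little-endian integers, and UTF-8 strings (the names mirror the suggestion above but are not an existing API):

using System;
using System.IO;
using System.Text;

static class StreamReadExtensions
{
    // Read exactly 'count' bytes, or throw if the stream ends early.
    public static byte[] ReadExactly(this Stream s, int count)
    {
        var buffer = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = s.Read(buffer, read, count - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return buffer;
    }

    public static int ReadInt32(this Stream s) =>
        BitConverter.ToInt32(s.ReadExactly(4), 0);

    public static long ReadInt64(this Stream s) =>
        BitConverter.ToInt64(s.ReadExactly(8), 0);

    public static string ReadString(this Stream s, int length) =>
        Encoding.UTF8.GetString(s.ReadExactly(length));
}

Reading a login packet then becomes stream.ReadInt32() for the command, stream.ReadInt32() for the payload length, and stream.ReadString(length) for each field.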
I'm trying to read a value of type REG_RESOURCE_LIST from the registry, but without success.
The specific value I'm trying to read is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\{YourNetworkInterface}\{GUID}\Control\AllocConfig.
You can find this value by going to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI and searching for a key whose subkey (the {GUID} part in the path) has a value named Class with the data Net.
Or you can just search for it.
The strange thing is, when I open the Control key through code and call GetValueNames(), I get the AllocConfig value name, but when calling GetValue("AllocConfig") I get null (the value is not actually null).
Also, in Process Monitor, I see that when GetValue("AllocConfig") is called the result is Buffer Overflow.
Any help will be appreciated, thanks.
To get a REG_RESOURCE_LIST, you need to use RegQueryValueEx(). The value you should expect in the lpType out parameter is 8 (see here). The data you get back in the lpData out parameter is a CM_RESOURCE_LIST.
Call RegQueryValueEx first to get the size of the list, allocate a buffer of that size, call RegQueryValueEx again to fill in the buffer, and cast the buffer pointer to PCM_RESOURCE_LIST. The CM_RESOURCE_LIST documentation linked above tells you how to iterate over the list and extract the contents.
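A minimal P/Invoke sketch of that two-call pattern, assuming a .NET version where RegistryKey exposes a SafeRegistryHandle through its Handle property (error handling omitted; {YourNetworkInterface} and {GUID} must be replaced with real key names):

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32;
using Microsoft.Win32.SafeHandles;

class Program
{
    [DllImport("advapi32.dll", CharSet = CharSet.Unicode)]
    static extern int RegQueryValueEx(
        SafeRegistryHandle hKey, string lpValueName, IntPtr lpReserved,
        out uint lpType, byte[] lpData, ref uint lpcbData);

    static void Main()
    {
        const string path =
            @"SYSTEM\CurrentControlSet\Enum\PCI\{YourNetworkInterface}\{GUID}\Control";
        using (var key = Registry.LocalMachine.OpenSubKey(path))
        {
            // First call: pass null data to ask only for the required size.
            uint size = 0;
            RegQueryValueEx(key.Handle, "AllocConfig", IntPtr.Zero,
                out uint type, null, ref size);

            // Second call: fill the buffer. type should come back as 8
            // (REG_RESOURCE_LIST) and the bytes form a CM_RESOURCE_LIST.
            var data = new byte[size];
            RegQueryValueEx(key.Handle, "AllocConfig", IntPtr.Zero,
                out type, data, ref size);
        }
    }
}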
http://blogs.microsoft.co.il/ischen/2007/12/04/querying-device-hardware-resources-from-the-windows-registry-using-c/
The linked C# project properly decodes the REG_RESOURCE_LIST data structure from the registry, which is based on the CM_RESOURCE_LIST structure: a list containing many CM_FULL_RESOURCE_DESCRIPTORs.
I have a structure that I am converting to a byte array of length 37, then to a string from that.
I am writing a very basic activation type library, and this string will be passed between people. So I want to shorten it from length 37 to something more manageable to type.
Right now:
Convert the structure to a byte array,
Convert the byte array to a base 64 string (which is still too long).
What is a good way to shorten this string, yet still maintain the data stored in it?
Thanks.
In the general case, going from an arbitrary byte[] to a string requires more data, since we assume we want to avoid non-printable characters. The only way to reduce it is to compress before the base-whatever (you can get a little higher than base-64, but not much - and it certainly isn't any more "friendly") - but compression won't really kick in for such a short size. Basically, you can't do that. You are trying to fit a quart in a pint pot, and that doesn't work.
You may have to rethink your requirements. Perhaps save the BLOB internally, and issue a shorter token (maybe 10 chars, maybe a guid) that is a key to the actual BLOB.
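A sketch of the token idea, with an in-memory dictionary standing in for wherever you would actually store the BLOBs (TokenStore is a hypothetical helper):

using System;
using System.Collections.Generic;

class TokenStore
{
    readonly Dictionary<string, byte[]> blobs = new Dictionary<string, byte[]>();

    // Issue a short random token and keep the real payload against it.
    public string Save(byte[] blob)
    {
        string token = Guid.NewGuid().ToString("N").Substring(0, 10);
        blobs[token] = blob;
        return token;
    }

    public byte[] Load(string token) => blobs[token];
}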
Data compression may be a possibility to check out, but you can't just compress a 40-byte message to 6 bytes (for example).
If the space of possible strings/types is limited, map them to a list (information coding).
I don't know of anything better than base-64 if you actually have to pass the value around and if users have to type it in.
If you have a central data store they can all access, you could just give them the ID of the row where you saved it. This of course depends on how "secret" this data needs to be.
But I suspect that if you're trying to use this for activation, you need them to have an actual value.
How will the string be passed? Can you expect users to perhaps just copy/paste? Maybe some time spent on clearing up superfluous line breaks that come from an email reader or even your "Copy from here" and "Copy to here" lines might bear more fruit!
Can the characters in your string have non-printable chars? If so, you don't need to base64-encode the bytes; you can simply create the string from them (saving 33%):
string str = new string(Array.ConvertAll(byteArray, b => (char)b));
Also, are the values in the byte array restricted somehow? If they fall into a certain range (i.e., not all of the 256 possible values), you can consider stuffing two of each in each character of the string.
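For instance, if every byte value fits in 4 bits (0-15), a hypothetical packing could look like this (a sketch only; the resulting characters are not guaranteed to be printable, so the output may still need a printable encoding):

// Assumes every byte value is in the range 0-15.
static string Pack(byte[] data)
{
    var chars = new char[(data.Length + 1) / 2];
    for (int i = 0; i < data.Length; i++)
        chars[i / 2] = (char)(chars[i / 2] | (data[i] << ((i % 2) * 4)));
    return new string(chars);
}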
If you really have 37 bytes of non-redundant information, then you are out of luck. Compression may help in some cases, but if this is an activation key, I would recommend having keys of the same length (and compression will not enforce this).
If this code is going to be passed over e-mail, then I see no problem in having an even larger key. Another option might be to insert hyphens every 5-or-so characters, to break it into smaller chunks (e.g. XXXXX-XXXXX-XXXXX-XXXXX-XXXXX).
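As a minimal sketch, a hypothetical helper that chunks an encoded key with hyphens (Chunk is illustrative, not an existing API):

using System;
using System.Linq;

static class KeyFormat
{
    // Insert a hyphen after every 'size' characters: "ABCDEFGHIJ" -> "ABCDE-FGHIJ".
    public static string Chunk(string key, int size = 5) =>
        string.Join("-",
            Enumerable.Range(0, (key.Length + size - 1) / size)
                      .Select(i => key.Substring(i * size,
                                                 Math.Min(size, key.Length - i * size))));
}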
Use a 160-bit hash and hope for no collisions? It would be much shorter. If you can use a look-up table, just use a 128- or even 64-bit incremental value. Much, much shorter than your 37 bytes.
I have a very large CSV file (Millions of records)
I have developed a smart search algorithm to locate specific line ranges in the file to avoid parsing the whole file.
Now I am facing a trickier issue : I am only interested in the content of a specific column.
Is there a smart way to avoid looping line by line through a 200 MB file and retrieve only the content of a specific column?
I'd use an existing library, as codeulike has suggested, and for a very good reason why, read this article:
Stop Rolling Your Own CSV Parser!
You mean get every value from every row for a specific column?
You're probably going to have to visit every row to do that.
This C# CSV Reading library is very quick so you might be able to use it:
LumenWorks.Framework.IO.Csv by Sebastien Lorien
Unless all CSV fields have a fixed width (so that even an empty field still occupies n bytes of blank space between the separators surrounding it), no.
If yes
Then each row, in turn, also has a fixed length, and therefore you can skip straight to the first value for that column; once you've read it, you immediately advance to the next row's value for the same field, without having to read any intermediate values.
I think this is pretty simple - but I'm on a roll at the moment (and at lunch), so I'm going to finish it anyway :)
To do this, we first want to know how long each row is in characters (adjust for bytes according to Unicode, UTF8 etc):
row_len = sum(widths[0..n-1]) + n-1 + row_sep_length
Where n is the total number of columns on each row - this is a constant for the whole file. We add an extra n-1 to it to account for the separators between column values.
And row_sep_length is the length of the separator between two rows - usually a newline, or potentially a [carriage-return & line-feed] pair.
The value for a column row[r]col[i] will be offset characters from the start of row[r], where offset is defined as:
offset = i > 0 ? sum(widths[0..i-1]) + i : 0;
//or sum of widths of all columns before col[i]
//plus one character for each separator between adjacent columns
And then, assuming you've read the whole column value, up to the next separator, the offset to the starting character for next column value row[r+1]col[i] is calculated by subtracting the width of your column from the row length. This is yet another constant for the file:
next-field-offset = row_len - widths[i];
//widths[i] is the width of the field you are actually reading.
Throughout, i is zero-based in this pseudocode, as is the indexing of the vectors/arrays.
To read, then, you first advance the file pointer by offset characters - taking you to the first value you want. You read the value (taking you to the next separator) and then simply advance the file pointer by next-field-offset characters. If you reach EOF at this point, you're done.
I might have missed a character either way in this - so if it's applicable - do check it!
This only works if you can guarantee that all field values - even nulls - for all rows will be the same length, that the separators are always the same length, and that all row separators are the same length. If not, then this approach won't work.
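Here's a minimal sketch of that seek-and-read loop, assuming single-byte characters, single-character column separators, and a known widths array (the file name, widths, and column index are all illustrative):

using System;
using System.IO;
using System.Text;

class FixedWidthColumnReader
{
    static void Main()
    {
        int[] widths = { 8, 12, 6 };      // assumed fixed column widths
        int col = 1;                      // zero-based index of the wanted column
        int rowSepLength = 2;             // e.g. a carriage-return & line-feed pair
        int n = widths.Length;

        int rowLen = 0;
        foreach (int w in widths) rowLen += w;
        rowLen += (n - 1) + rowSepLength; // column separators + row separator

        int offset = 0;                   // chars from row start to col's first char
        for (int i = 0; i < col; i++) offset += widths[i] + 1;

        using (var fs = new FileStream("data.csv", FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[widths[col]];
            fs.Seek(offset, SeekOrigin.Begin);            // jump to the first value
            while (fs.Read(buffer, 0, buffer.Length) == buffer.Length)
            {
                Console.WriteLine(Encoding.ASCII.GetString(buffer));
                // Skip the rest of this row plus the offset into the next one,
                // i.e. the next-field-offset of row_len - widths[i] from above.
                fs.Seek(rowLen - widths[col], SeekOrigin.Current);
            }
        }
    }
}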
If not
You'll have to do it the slow way - find the column in each line and do whatever it is you need to do.
If you're doing a significant amount of work on the column value each time, one optimisation would be to first pull all the column values out into a list (initialised with a known capacity), batching at 100,000 at a time or something like that, and then iterate through those.
If you keep each loop focused on a single task, that should be more efficient than one big loop.
Equally, once you've batched 100,000 column values, you could use Parallel LINQ to distribute the second loop (but not the first, since there's no point parallelising reading from a file).
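A rough sketch of that batching shape, with ProcessValue standing in for the per-value work and a naive comma split (both are assumptions, not part of the original answer):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class BatchedColumnProcessing
{
    const int BatchSize = 100_000;

    static void ProcessValue(string value) { /* significant per-value work here */ }

    static void Main()
    {
        var batch = new List<string>(BatchSize);
        foreach (string line in File.ReadLines("data.csv"))
        {
            batch.Add(line.Split(',')[2]);   // naive: assumes no quoted commas
            if (batch.Count == BatchSize)
            {
                batch.AsParallel().ForAll(ProcessValue); // the second, parallel loop
                batch.Clear();
            }
        }
        batch.AsParallel().ForAll(ProcessValue);         // leftover partial batch
    }
}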
There are only shortcuts if you can pose specific limitations on the data.
For example, you can only read the file line by line if you know that there are no values in the file that contain line breaks. If you don't know this, you have to parse the file record by record as a stream, and each record ends where there is a line break that is not inside a value.
However, unless you know that each line takes up exactly the same number of bytes, there is no way to read the file other than line by line. A line break in a file is just another character (or pair of characters); there is no way to locate a line in a text file other than to read all the lines that come before it.
You can take similar shortcuts when reading a record if you can pose limitations on the fields in the records. If you know, for example, that the fields to the left of the one that you are interested in are all numerical, you can use a simpler parsing method to find the start of the field.
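For the record-by-record case, here is a minimal sketch of a reader that only ends a record at a line break outside double quotes (a simplification that ignores some CSV escaping rules):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CsvRecords
{
    // Yield one logical record at a time; a newline only ends a record
    // when we are not inside a double-quoted value.
    public static IEnumerable<string> Read(TextReader reader)
    {
        var sb = new StringBuilder();
        bool inQuotes = false;
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            char c = (char)ch;
            if (c == '"') inQuotes = !inQuotes; // an escaped "" toggles twice, no net change
            if (c == '\n' && !inQuotes)
            {
                yield return sb.ToString().TrimEnd('\r');
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }
        if (sb.Length > 0) yield return sb.ToString();
    }
}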