I have consecutively numbered entities that I want to persist with the Azure Table Service; however, the type of the RowKey column is problematic.
The number of the entity should be stored in the RowKey column, so I can query entities fast (PK = '..' && RowKey = 5), get newest entities (RowKey > 10) and query a certain set of entities (RowKey > 5 && RowKey < 10).
Since RowKey must be a string, less-than comparisons are problematic ("100" < "11").
I thought about prepending zeros to the numbers (so that "100" > "011"), but I can't predict the number of entities (and thus the number of zeros).
I know I could just create an integer column, but I would lose the performance advantage of the indexed RowKey column (plus I don't have any other information suitable for RowKey).
Did anyone have this problem before?
I had a similar problem, with the added caveat that I also wanted to support having the RowKey sorted in descending order. In my case I did not care about supporting trillions of possible values because I was correctly using the PartitionKey and also using scoping prefixes when needed to further segment the RowKey (like "scope-id" -> "12-8374").
In the end I settled on a specific implementation of the general approach suggested by enzi. I used a modified version of Base64 encoding, producing a four character string, which supports over 16 million values and can be sorted in ascending or descending order. Here is the code, which has been unit tested but lacks range checking/validation.
/// <summary>
/// Gets the four character string representation of the specified integer id.
/// </summary>
/// <param name="number">The number to convert</param>
/// <param name="ascending">Indicates whether the encoded number will be sorted ascending or descending</param>
/// <returns>The encoded string representation of the number</returns>
public static string NumberToId(int number, bool ascending = true)
{
if (!ascending)
number = 16777215 - number;
return new string(new[] {
SixBitToChar((byte)((number & 16515072) >> 18)),
SixBitToChar((byte)((number & 258048) >> 12)),
SixBitToChar((byte)((number & 4032) >> 6)),
SixBitToChar((byte)(number & 63)) });
}
/// <summary>
/// Gets the numeric identifier represented by the encoded string.
/// </summary>
/// <param name="id">The encoded string to convert</param>
/// <param name="ascending">Indicates whether the encoded number is sorted ascending or descending</param>
/// <returns>The decoded integer id</returns>
public static int IdToNumber(string id, bool ascending = true)
{
var number = ((int)CharToSixBit(id[0]) << 18) | ((int)CharToSixBit(id[1]) << 12) | ((int)CharToSixBit(id[2]) << 6) | (int)CharToSixBit(id[3]);
return ascending ? number : -1 * (number - 16777215);
}
/// <summary>
/// Converts the specified byte (representing 6 bits) to the correct character representation.
/// </summary>
/// <param name="b">The bits to convert</param>
/// <returns>The encoded character value</returns>
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
static char SixBitToChar(byte b)
{
if (b == 0)
return '!';
if (b == 1)
return '$';
if (b < 12)
return (char)((int)b - 2 + (int)'0');
if (b < 38)
return (char)((int)b - 12 + (int)'A');
return (char)((int)b - 38 + (int)'a');
}
/// <summary>
/// Converts the specified encoded character into the corresponding bit representation.
/// </summary>
/// <param name="c">The encoded character to convert</param>
/// <returns>The bit representation of the character</returns>
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
static byte CharToSixBit(char c)
{
if (c == '!')
return 0;
if (c == '$')
return 1;
if (c <= '9')
return (byte)((int)c - (int)'0' + 2);
if (c <= 'Z')
return (byte)((int)c - (int)'A' + 12);
return (byte)((int)c - (int)'a' + 38);
}
You can just pass false to the ascending parameter to ensure the encoded value will sort in the opposite direction. I selected ! and $ to complete the Base64 set since they are valid for RowKey values. This algorithm can be easily amended to support additional characters, though I firmly believe that larger numbers do not make sense for RowKey values as table storage keys must be efficiently segmented. Here are some examples of output:
0 -> !!!! asc & zzzz desc
1000 -> !!Dc asc & zzkL desc
2000 -> !!TE asc & zzUj desc
3000 -> !!is asc & zzF5 desc
4000 -> !!yU asc & zz$T desc
5000 -> !$C6 asc & zylr desc
6000 -> !$Rk asc & zyWD desc
7000 -> !$hM asc & zyGb desc
8000 -> !$x! asc & zy0z desc
9000 -> !0Ac asc & zxnL desc
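Since the code above lacks range checking, here is a minimal sketch of how the same mapping could be written with validation. It uses a lookup string instead of the if-chains; the names SortableId, Encode and Decode are hypothetical, and it has only been checked against the sample outputs listed above.

```csharp
using System;

public static class SortableId
{
    // The 64 characters in ascending ordinal order: ! $ 0-9 A-Z a-z (same mapping as above).
    private const string Alphabet =
        "!$0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // 2^24 - 1, the largest value that four 6-bit characters can hold.
    public const int MaxValue = 16777215;

    public static string Encode(int number, bool ascending = true)
    {
        if (number < 0 || number > MaxValue)
            throw new ArgumentOutOfRangeException(nameof(number));
        if (!ascending)
            number = MaxValue - number;
        return new string(new[]
        {
            Alphabet[(number >> 18) & 63],
            Alphabet[(number >> 12) & 63],
            Alphabet[(number >> 6) & 63],
            Alphabet[number & 63]
        });
    }

    public static int Decode(string id, bool ascending = true)
    {
        if (id == null || id.Length != 4)
            throw new ArgumentException("Expected a four character id.", nameof(id));
        int number = 0;
        foreach (char c in id)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0)
                throw new ArgumentException("Unexpected character in id.", nameof(id));
            number = (number << 6) | digit; // append the next six bits
        }
        return ascending ? number : MaxValue - number;
    }
}
```

Because the alphabet is listed in ordinal order, string.CompareOrdinal on two ascending ids should agree with comparing the underlying numbers.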
I found an easier way, although the previous solution is more efficient (regarding key length).
Instead of using the whole alphabet, we can use just digits; the key is to make the length fixed (0000, 0001, 0002, ...):
public class ReadingEntity : TableEntity
{
public static string KeyLength = "000000000000000000000";
public ReadingEntity(string partitionId, int keyId)
{
this.PartitionKey = partitionId;
this.RowKey = keyId.ToString(KeyLength);
}
public ReadingEntity()
{
}
}
public IList<ReadingEntity> Get(string partitionName,int date,int enddate)
{
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
// Create the CloudTable object that represents the "Record" table.
CloudTable table = tableClient.GetTableReference("Record");
// Construct the query for all entities in the given partition whose RowKey lies in [date, enddate).
TableQuery<ReadingEntity> query = new TableQuery<ReadingEntity>().Where(TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionName),
TableOperators.And,TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, enddate.ToString(ReadingEntity.KeyLength)), TableOperators.And,
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, date.ToString(ReadingEntity.KeyLength)))));
return table.ExecuteQuery(query).ToList();
}
Hope this helps.
You can post-append a GUID to an integer. That should help with sorts.
http://blogs.southworks.net/fboerr/2010/04/22/compsition-in-windows-azure-table-storage-choosing-the-row-key-and-simulating-startswith/
I solved this problem by creating a custom RowKey class that wraps around a String and provides an Increment method.
I can now define a range of valid characters (e.g. 0-9 + a-z + A-Z) and "count" within this range (e.g. az9 + 1 = aza, azZ + 1 = aA0). The advantage of this compared to using only numbers is that I have a far greater range of possible keys (62^n instead of 10^n).
I still have to define the length of the string beforehand and mustn't change it, but now I can store pretty much any number of entities while keeping the string itself much shorter. For example, with 10 digits I can store ~8*10^17 keys and with 20 digits ~7*10^35.
The number of valid characters can of course be increased further to use the number of digits even more effectively, but in my case the above range was sufficient and is still readable enough for debugging purposes.
I hope this answer helps others who run into the same problem.
EDIT: Just as a side note in case anyone wants to implement something similar: You will have to create custom character ranges and can't just count from 0 upwards, because there are illegal characters (e.g. /, \) between the numbers (0-9) and the lowercase letters.
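The answer does not include its code, so the following is only a rough sketch of what such an Increment method might look like. The class name RowKeyCounter and the exact alphabet string are assumptions; the character order follows the example in the answer (0-9, then a-z, then A-Z).

```csharp
using System;

public static class RowKeyCounter
{
    // Allowed characters in counting order, as in the answer's example (assumed).
    private const string Alphabet =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static string Increment(string key)
    {
        char[] chars = key.ToCharArray();
        // Work from the last character towards the first, like adding 1 by hand.
        for (int i = chars.Length - 1; i >= 0; i--)
        {
            int index = Alphabet.IndexOf(chars[i]);
            if (index < 0)
                throw new ArgumentException("Character outside the allowed range.", nameof(key));
            if (index < Alphabet.Length - 1)
            {
                chars[i] = Alphabet[index + 1];
                return new string(chars);
            }
            chars[i] = Alphabet[0]; // wrap around and carry into the next position
        }
        throw new OverflowException("The key has reached its maximum value for this length.");
    }
}
```

One thing to keep in mind: in an ordinal comparison 'A' sorts before 'a', so if the keys also have to sort in counting order inside the table, the alphabet would presumably need to be listed as 0-9, A-Z, a-z instead.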
I found a potential solution if you're using Linq to query against Azure Table Storage.
You add something like this to your model for the table...
public int ID
{
get
{
return int.Parse(RowKey);
}
}
And then you can do this in your Linq query...
.Where(e => e.ID > 1 && e.ID < 10);
With this technique you're not actually adding the "ID" column to the table since it has no "set" operation in it.
The one thing I'm unsure about is what's happening behind the scenes exactly. I want to know what the query to Azure Table Storage looks like in its final form, but I'm not sure how to find that out. I haven't been able to find that information when debugging and using quickwatch.
UPDATE
I still haven't figured out what's happening, but I have a strong feeling that this isn't very efficient. I'm thinking the way to go is to create a sortable string as the OP did. Then you can use the RowKey.CompareTo() function in your Linq where clause to filter by a range.
Implement the NextBiggerThan method that returns the nearest larger integer consisting of the digits of the given positive integer number. The method should return -1 if no such number exists.
Use iteration and selection statements. Extract digits using the remainder operator %. Don't use strings, collections or arrays.
At first, I thought it would be enough to swap the last 2 digits of the number. But now I see that it is crucial to use selection and iteration. Unfortunately, I have no clue how to correctly implement them here.
using System.Collections.Generic;
using System;
namespace NextBiggerTask
{
public static class NumberExtension
{
/// <summary>
/// Finds the nearest largest integer consisting of the digits of the given positive integer number; return -1 if no such number exists.
/// </summary>
/// <param name="number">Source number.</param>
/// <returns>
/// The nearest largest integer consisting of the digits of the given positive integer; return -1 if no such number exists.
/// </returns>
/// <exception cref="ArgumentException">Thrown when source number is less than 0.</exception>
public static int NextBiggerThan(int number)
{
int lastdigit = number % 10;
int lastdigits = number % 100;
int prelastdigit = (lastdigits - lastdigit) / 10;
int nearestnumber = number - lastdigits;
nearestnumber += 10 * lastdigit + prelastdigit;
return nearestnumber;
}
}
}
Let D be a list of the digits, in order, of number.
Let i be the smallest value such that D[i..] is non-increasing. (If i is 0, then there is no next value.)
Let j be the index of the smallest value in D[i..] that is larger than D[i-1].
Then your answer is D[..i-2]+D[j]+sorted_ascending(D[i..] with D[j] replaced by D[i-1])
Example:
number = 114232, so D=[1,1,4,2,3,2]
i is 4 (for [3,2])
j is also 4 (for the 3)
So the result is [1,1,4] + [3] + [2,2]
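The steps above can be sketched without strings, collections or arrays by treating place values (1, 10, 100, ...) as the indices. This is one possible implementation, not necessarily what the task author had in mind; the class name NextBiggerSketch is made up, and returning -1 when the result does not fit into an int is my own assumption.

```csharp
using System;

public static class NextBiggerSketch
{
    public static int NextBiggerThan(int number)
    {
        if (number < 0)
            throw new ArgumentException("Number must not be negative.", nameof(number));

        long n = number;

        // Walk from the right until a digit is smaller than its right neighbour: that is D[i-1].
        long pivotPlace = 0;
        long place = 10;
        long rest = n / 10;
        long previous = n % 10;
        while (rest > 0)
        {
            long digit = rest % 10;
            if (digit < previous)
            {
                pivotPlace = place;
                break;
            }
            previous = digit;
            rest /= 10;
            place *= 10;
        }
        if (pivotPlace == 0)
            return -1; // all digits are non-increasing, so no bigger arrangement exists

        long pivot = n / pivotPlace % 10;
        long prefix = n / (pivotPlace * 10);
        long suffix = n % pivotPlace;

        // Scanning from the right, the first suffix digit above the pivot is the smallest such digit: D[j].
        long swapPlace = 1;
        while (suffix / swapPlace % 10 <= pivot)
            swapPlace *= 10;
        long swapDigit = suffix / swapPlace % 10;
        suffix += (pivot - swapDigit) * swapPlace; // put the pivot where D[j] was

        // The suffix is still non-increasing, so reversing its digits sorts it ascending.
        long reversed = 0;
        for (long p = 1; p < pivotPlace; p *= 10)
        {
            reversed = reversed * 10 + suffix % 10;
            suffix /= 10;
        }

        long result = (prefix * 10 + swapDigit) * pivotPlace + reversed;
        return result > int.MaxValue ? -1 : (int)result; // assumption: overflow counts as "no such number"
    }
}
```

For the example above, NextBiggerThan(114232) gives 114322.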
I have two tables, both of them with a Phone column.
This column -NVARCHAR(30)- can have its stored data formatted in several different ways, from 0001 222 333 444 to 222 333 444 to (0001)-222333444 to - to even an empty string.
I would like to do a query using LINQ where the first three examples shown above would give a match, so I need to get rid of everything that's not a number and then get the last 9 digits out of the string. However, I haven't been able to do this using just one query, and instead I'm looping through each of my results and applying the phone number filters there. Is there any way this could be done with just the query?
String is essentially an IEnumerable<char>, so what you can do is:
var digits = s.Where(char.IsDigit);
Now, there's no real elegant way to take the last 9 digits: IEnumerable<> implies that there's no way to find out its length other than to iterate over it. What I can suggest is:
var digits = new string(s.Where(char.IsDigit).Reverse().Take(9).Reverse().ToArray());
Or you can get all fancy and write your own TakeLast() extension method:
public static IList<T> TakeLast<T>(this IEnumerable<T> enumerable, int n)
{
var queue = new Queue<T>(n);
foreach(var item in enumerable)
{
queue.Enqueue(item);
if(queue.Count > n)
queue.Dequeue();
}
return queue.ToList();
}
Which will greatly simplify your code:
var digits = new string(s.Where(char.IsDigit).TakeLast(9).ToArray());
Similar to Jonesopolis' answer, you can do the following to get the last N characters from a string:
var str = "0001 222 333 444";
var n = 9;
var digits = str.Where(char.IsDigit).ToArray();
var result = string.Concat(digits.Skip(Math.Max(0, digits.Length - n)));
This filters out the non-digit characters first and then skips everything except the last n digits. The skip count is based on the number of digits rather than on str.Length, because the original string also contains spaces and punctuation. If the string contains fewer than n digits, all of them are returned.
Assuming an EntityFramework mapped class like:
public class Person
{
public int Id { get; set; }
public string Phone { get; set; }
}
and an EF context that contains
public DbSet<Person> Persons
You can filter using something like
var ppl = context.Persons.Where(x => x.Phone.Replace(" ", "").EndsWith("222333444")).ToList();
This will produce SQL that looks like the following, satisfying your request for a filtering solution.
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Phone] AS [Phone]
FROM [dbo].[People] AS [Extent1]
WHERE REPLACE([Extent1].[Phone], N' ', N'') LIKE N'%222333444'
Then any formatting can be applied in a middle tier / model builder / automapper style solution.
How about a combination of Regex & LINQ :
Regex r1 = new Regex("[^0-9]");
Regex r2 = new Regex("(.{9})$");
var Last9Digits = PhoneNumbers.Select(PhoneNo => r2.Match(r1.Replace(PhoneNo, "")).Value);
/// <summary>
/// Get the last N characters of a string.
/// </summary>
public static string GetLast(this string source, int numberOfChars)
{
if (string.IsNullOrEmpty(source) || numberOfChars >= source.Length)
return source;
return source.Substring(source.Length - numberOfChars);
}
First of all, I'm not really sure if I have framed my question correctly, but what I'm looking for can be better explained by looking at the below visual representation:
I have a method which returns an int within the range of 0 and 360.
Now, for further manipulation, I would like to round it, or get the closest match from the numbers which are offset by 30. How can I achieve this? Also, is there a specific term for the function that I'm looking for?
You may also edit the question if you think it can be written better.
Best Regards,
Navik.
This should work for any list where the items are an equal distance apart (i.e. 30, 60, 90).
EDIT
I've updated the code to use AlexD's elegant solution so that it will work with lists of any 'step' value, and with any starting (or ending) value (i.e. it could start with a negative number, like: -20, -15, -10, -5, 0, 5, 10, 15, 20):
/// <summary>
/// Gets the value of the item in the list of
/// numbers that is closest to the given number
/// </summary>
/// <param name="number">Any number</param>
/// <param name="numbers">A list of numbers, sorted from lowest to highest,
/// where the difference between each item is the same</param>
/// <returns>The value of the list item closest to the given number</returns>
public static int GetClosestNumber(int number, List<int> numbers)
{
if (numbers == null) throw new ArgumentNullException("numbers");
if (numbers.Count == 0)
throw new
ArgumentException("There are no items to compare against.", "numbers");
if (numbers.Count == 1) return numbers[0]; // Short-circuit for single-item lists
var step = Math.Abs(numbers[1] - numbers[0]);
// Get closest number using a slight modification of AlexD's elegant solution
var closestNumber = (Math.Abs(number) + (step / 2)) / step *
step * (number < 0 ? -1 : 1);
// Ensure numbers is within min/max bounds of the list
return Math.Min(Math.Max(closestNumber, numbers[0]), numbers[numbers.Count - 1]);
}
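As far as I can tell, the rounding expression in GetClosestNumber relies on the list values being multiples of the step (30, 60, 90 or -20, -15, -10). If the sequence starts at some other offset (say 5, 35, 65), a variant that measures from the first value might be needed. A small sketch, where StepRounding, RoundToStep and the first parameter are hypothetical names; the operation itself is usually described as rounding to the nearest multiple.

```csharp
using System;

public static class StepRounding
{
    // Rounds value to the nearest member of the sequence first, first + step, first + 2 * step, ...
    public static int RoundToStep(int value, int step, int first = 0)
    {
        if (step <= 0)
            throw new ArgumentOutOfRangeException(nameof(step));
        int offset = value - first;
        // Exact midpoints are rounded away from 'first' (an arbitrary choice for this sketch).
        int steps = (int)Math.Round((double)offset / step, MidpointRounding.AwayFromZero);
        return first + steps * step;
    }
}
```

For the 0 to 360 case in the question, RoundToStep(angle, 30) would be enough, since the sequence starts at 0.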
I'm going to parse a position-based file from a legacy system. Each column in the file has a fixed width and each row can be at most 80 chars long. The problem is that you don't know how long a row is. Sometimes only the first five columns are filled in, and sometimes all columns are used.
If I KNEW that all 80 chars were used, then I could simply do this:
^\s*
(?<a>\w{3})
(?<b>[ \d]{2})
(?<c>[ 0-9a-fA-F]{2})
(?<d>.{20})
...
But the problem with this is that if the last columns are missing, the row will not match. The last column can even have fewer chars than the maximum for that column.
See example
Text to match a b c d
"AQM45A3A text " => AQM 45 A3 "A text " //group d has 9 chars instead of 20
"AQM45F5" => AQM 45 F5 //group d is missing
"AQM4" => AQM 4 //group b has 1 char instead of 2
"AQM4 ASome Text" => AQM 4 A "Some Text" //group b and c only uses one char, but fill up the gap with space
"AQM4FSome Text" => No match, group b should have two numbers, but it is only one.
"COM*A comment" => Comments do not match (all comments are prefixed with COM*)
" " => Empty lines do not match
How should I design the Regular Expression to match this?
Edit 1
In this example, EACH row that I want to parse, is starting with AQM
Column a is always starting at position 0
Column b is always starting at position 3
Column c is always starting at position 5
Column d is always starting at position 7
If a column is not using all its space, it is filled up with spaces
Only the last column that is used can be trimmed
Edit 2
To make it clearer, I enclose here some examples of how the data might look, and the definition of the columns (note that the examples I mentioned earlier in the question were heavily simplified)
I'm not sure a regexp is the right thing to use here. If I understand your structure, you want something like
if (length >= 8)
    d = everything 8th column on
    remove field d
else
    d = empty
if (length >= 6)
    c = everything 6th column on
    remove field c
else
    c = empty
etc. Maybe a regexp can do it, but it will probably be rather contrived.
Try using a ? after the groups which might not be there. Then, if some group is missing, you would still have a match.
Edit n, after Sguazz answer
I would use
(?<a>AQM)(?<b>[ \d]{2})?(?<c>[ 0-9a-fA-F]{2})?(?<d>.{0,20})?
or even a + instead of the {0,20} for the last group, if it could be that there are more than 20 chars.
Edit n+1,
Better like this?
(?<a>\w{3})(?<b>\d[ \d])(?<c>[0-9a-fA-F][ 0-9a-fA-F])(?<d>.+)
So, just to rephrase: in your example you have a sequence of characters, and you know that the first 3 belong to group A, the following 2 belong to group B, then 2 to group C and 20 to group D, but there might not be this many elements.
Try with:
(?<a>\w{0,3})(?<b>[ \d]{0,2})(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})
Basically these numbers are now an upper limit of the group as opposed to a fixed size.
EDIT, to reflect your last comment: if you know that all your relevant rows start with 'AQM', you can replace group A with (?<a>AQM)
ANOTHER EDIT: Let's try with this instead.
(?<a>AQM)(?<b>[ \d]{2}|[ \d]$)(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})
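To see how that last pattern behaves on the sample rows from the question, something like the following could be used. The leading ^ is my own addition (an assumption, so that matching always starts at column 0), and RowPatternDemo and Describe are made-up names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RowPatternDemo
{
    // The last pattern from the answer, with ^ added so that group a must start at position 0.
    private static readonly Regex RowPattern = new Regex(
        @"^(?<a>AQM)(?<b>[ \d]{2}|[ \d]$)(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})");

    // Returns the four groups joined by '|', or "no match" if the row is rejected.
    public static string Describe(string row)
    {
        Match m = RowPattern.Match(row);
        if (!m.Success)
            return "no match";
        return string.Join("|",
            m.Groups["a"].Value, m.Groups["b"].Value,
            m.Groups["c"].Value, m.Groups["d"].Value);
    }
}
```

With this, "AQM45F5" yields AQM|45|F5| and "AQM4" yields AQM|4||, while "AQM4FSome Text" and "COM*A comment" are rejected. The second alternative in group b is what allows a single digit, but only when the row ends right after it.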
Perhaps you could use a function like this one to break the string into its column values. It doesn't parse comment strings and is able to handle strings that are shorter than 80 characters. It doesn't validate the contents of the columns though. Maybe you can do that when you use the values.
/// <summary>
/// Break a data row into a collection of strings based on the expected column widths.
/// </summary>
/// <param name="input">The width delimited input data to break into sub strings.</param>
/// <returns>
/// An empty collection if the input string is empty or a comment.
/// A collection of the width delimited values contained in the input string otherwise.
/// </returns>
private static IEnumerable<string> ParseRow(string input) {
const string COMMENT_PREFIX = "COM*";
var columnWidths = new int[] { 3, 2, 2, 3, 6, 14, 2, 2, 3, 2, 2, 10, 7, 7, 2, 1, 1, 2, 7, 1, 1 };
int inputCursor = 0;
int columnIndex = 0;
var parsedValues = new List<string>();
if (String.IsNullOrEmpty(input) || input.StartsWith(COMMENT_PREFIX) || input.Trim().Length == 0) {
return parsedValues;
}
while (inputCursor < input.Length && columnIndex < columnWidths.Length) {
//Make sure the column width never exceeds the bounds of the input string. This can happen if the input string doesn't end on the edge of a column.
int columnWidth = Math.Min(columnWidths[columnIndex++], input.Length - inputCursor);
string columnValue = input.Substring(inputCursor, columnWidth);
parsedValues.Add(columnValue);
inputCursor += columnWidth;
}
return parsedValues;
}
I have an application that saves this in the database:
FromLetter ToLetter
AAA AAZ
ABC MNL
What I need is to search for a value like AAC and get record 1 back, or for FBC and get record 2.
I need the same functionality if I store dates instead of letters; the query should work the same way.
I am using SQL Server and Entity Framework, any Idea how to do this?
Should be pretty straight forward. Here is a Linq to Entities solution, ignoring case:
Strings:
string yourValue = somevalue;
var result = (from r in db.ExampleTable
where String.Compare(yourValue, r.FromLetter, true) >= 0
&& String.Compare(yourValue, r.ToLetter, true) <= 0
select r).First();
Dates:
DateTime yourValue = somevalue;
var result = (from r in db.ExampleTable
where yourValue >= r.FromDate
&& yourValue <= r.ToDate
select r).First();
I think it would be much easier to represent the FromLetter and ToLetter attributes using an integer. Especially if the length of the string is always just 3 - you can simply encode the number as:
(((letter1 - 'A') * 26 + (letter2 - 'A')) * 26) + (letter3 - 'A')
This will give you a number from 0 to 26^3 - 1 that represents the triple and can be easily converted back to the string (using modulo and division, as when converting numbers between numeric bases). This number fits into Int32 comfortably (up to 6 letters).
Searching for a string within a specified range would then be a simple search for an integer within a numeric range (which is easy to do and efficient).
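A minimal sketch of that encoding, assuming the values only contain the letters A-Z and all have the same length (otherwise the numeric order would not match the string order). LetterCode, Encode and Decode are hypothetical names.

```csharp
using System;

public static class LetterCode
{
    // Interprets the letters as a base-26 number: A = 0, B = 1, ..., Z = 25.
    public static int Encode(string letters)
    {
        if (string.IsNullOrEmpty(letters) || letters.Length > 6)
            throw new ArgumentException("Expected 1 to 6 letters.", nameof(letters));
        int value = 0;
        foreach (char c in letters.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException("Only the letters A-Z are supported.", nameof(letters));
            value = value * 26 + (c - 'A');
        }
        return value;
    }

    // Converts the number back into a string of the given length using modulo and division.
    public static string Decode(int value, int length)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value));
        char[] chars = new char[length];
        for (int i = length - 1; i >= 0; i--)
        {
            chars[i] = (char)('A' + value % 26);
            value /= 26;
        }
        return new string(chars);
    }
}
```

With the sample data, AAA..AAZ becomes 0..25 and ABC..MNL becomes 28..8461, so looking up FBC (3408) is a plain integer range check against two int columns.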
Genius solution given by.... bunglestink
I wasted plenty of time in researching implementation of "between" clause for string in EF. This is helpful.