Parallel Querying Azure Storage - C#

I currently have a query which looks along the lines of:
TableQuery<CloudTableEntity> query = new TableQuery<CloudTableEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, PK));
foreach (CloudTableEntity entity in table.ExecuteQuery(query))
{
    //Logic
}
I have been researching the Parallel class; however, I cannot find any good code examples of how to use it. I want to be able to query thousands of partition keys, like
CloudTableEntity().Where(PartitionKey == "11" || PartitionKey == "22")
where I can have around 40,000 partition keys. Is there a good way to do this?

The following sample code will issue multiple partition key queries in parallel:
CloudTable table = tableClient.GetTableReference("xyztable");
List<string> pkList = new List<string>(); // Partition keys to query
pkList.Add("1");
pkList.Add("2");
pkList.Add("3");
Parallel.ForEach(
pkList,
//new ParallelOptions { MaxDegreeOfParallelism = 128 }, // optional: limit threads
pk => { ProcessQuery(table, pk); }
);
Where ProcessQuery is defined as:
static void ProcessQuery(CloudTable table, string pk)
{
    string pkFilter = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, pk);
    TableQuery<TableEntity> query = new TableQuery<TableEntity>().Where(pkFilter);
    var list = table.ExecuteQuery(query).ToList();
    foreach (TableEntity entity in list)
    {
        // Process Entities
    }
}
Note that ORing two partition keys in the same query as you listed above will result in a full table scan. To avoid a full table scan, execute individual queries with one partition key per query as the sample code above demonstrates.
For more details on query construction please see http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx

Using table.ExecuteQuerySegmentedAsync will provide better performance
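For example, a minimal async sketch (assuming the same CloudTable and pkList as above, plus System.Linq, System.Threading and System.Threading.Tasks; the method name ProcessQueryAsync is illustrative) that pages through one partition with ExecuteQuerySegmentedAsync and caps concurrency with a SemaphoreSlim:
static async Task ProcessQueryAsync(CloudTable table, string pk)
{
    string pkFilter = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, pk);
    TableQuery<TableEntity> query = new TableQuery<TableEntity>().Where(pkFilter);

    TableContinuationToken token = null;
    do
    {
        // Fetch one page (up to 1,000 entities) without blocking a thread pool thread.
        TableQuerySegment<TableEntity> segment = await table.ExecuteQuerySegmentedAsync(query, token);
        token = segment.ContinuationToken;
        foreach (TableEntity entity in segment.Results)
        {
            // Process Entities
        }
    } while (token != null);
}

// Fan out over the partition keys with bounded concurrency instead of Parallel.ForEach:
var throttle = new SemaphoreSlim(64); // cap the number of in-flight queries
var tasks = pkList.Select(async pk =>
{
    await throttle.WaitAsync();
    try { await ProcessQueryAsync(table, pk); }
    finally { throttle.Release(); }
});
await Task.WhenAll(tasks);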

Related

How to design a system that deals with incrementing registration number in MongoDB

How do I create a registration number in MongoDB?
I need an auto-incrementing value, but MongoDB doesn't have auto-increment by default (and there is a risk of failure under concurrency), so how do I do it?
E.g. current registration no: 1
Now when I insert a new record, it must be 1 + 1 = 2.
I do it like this; FindOneAndUpdate is atomic:
static public async Task<long> NextInt64Async(IMongoDatabase db, string seqCollection, long q = 1, CancellationToken cancel = default)
{
    long result = 1;
    BsonDocument r = await db.GetCollection<BsonDocument>("seq").FindOneAndUpdateAsync<BsonDocument>(
        filter: Builders<BsonDocument>.Filter.Eq("_id", seqCollection),
        update: Builders<BsonDocument>.Update.Inc("seq", q),
        options: new FindOneAndUpdateOptions<BsonDocument, BsonDocument>() { ReturnDocument = ReturnDocument.After, IsUpsert = true },
        cancellationToken: cancel
    );
    if (r != null)
        result = r["seq"].AsInt64;
    return result;
}
....
await collection.InsertOneAsync(new Person() { Id = await NextInt64Async(db, "person"), Name = "Person" + i });
Find a full example here:
https://github.com/iso8859/learn-mongodb-by-example/blob/main/dotnet/02%20-%20Intermediate/InsertLongId.cs
If you need to avoid gaps, you can use the following approach, which involves only atomic updates to a single document:
First, you pre-fill the invoice collection with a reasonable number of documents (if you expect 1000 invoices per day, you could create the documents for a year in advance), each of which has a unique, increasing and gap-less number, e.g.
[
{ _id: "abcde1", InvoiceId: 1, HasInvoice: false, Invoice: null },
{ _id: "abcde2", InvoiceId: 2, HasInvoice: false, Invoice: null },
{ _id: "abcde3", InvoiceId: 3, HasInvoice: false, Invoice: null },
...
]
There should be a unique index on InvoiceId and, for efficient querying/sorting during the updates, another one on HasInvoice and InvoiceId (a setup sketch follows below). You'd need to insert new documents if you are about to run out of prepared documents.
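A rough setup sketch with the C# driver, assuming an Invoice POCO with InvoiceId, HasInvoice and Invoice properties (the collection name and pre-fill count are illustrative):
var invoices = db.GetCollection<Invoice>("invoices");

// Unique index on InvoiceId, plus a compound index for the HasInvoice/InvoiceId lookup.
await invoices.Indexes.CreateManyAsync(new[]
{
    new CreateIndexModel<Invoice>(
        Builders<Invoice>.IndexKeys.Ascending(x => x.InvoiceId),
        new CreateIndexOptions { Unique = true }),
    new CreateIndexModel<Invoice>(
        Builders<Invoice>.IndexKeys.Ascending(x => x.HasInvoice).Ascending(x => x.InvoiceId))
});

// Pre-fill placeholder documents, e.g. a year's worth at 1000 invoices per day.
var placeholders = Enumerable.Range(1, 365000)
    .Select(i => new Invoice { InvoiceId = i, HasInvoice = false, Invoice = null });
await invoices.InsertManyAsync(placeholders);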
When creating an invoice, you perform a FindOneAndModify operation that gets the document with the lowest InvoiceId that does not have an Invoice yet. In the update, you assign the Invoice, e.g.:
var filter = Builders<Invoice>.Filter.Eq(x => x.HasInvoice, false);
var update = Builders<Invoice>.Update
    .Set(x => x.HasInvoice, true)
    .Set(x => x.Invoice, invoiceDetails);
var options = new FindOneAndUpdateOptions<Invoice>()
{
    Sort = Builders<Invoice>.Sort.Ascending(x => x.InvoiceId),
    ReturnDocument = ReturnDocument.After,
};
var updatedInvoice = await invoices.FindOneAndUpdateAsync(filter, update, options);
FindOneAndModify returns the updated document so that you can access the assigned invoice id afterwards.
Due to the atomic execution of FindAndModify there is no need for transactions; gaps are not possible because the lowest available InvoiceId is always the one assigned.
There is no built-in way to achieve this, but there are a few solutions for certain situations.
For example, if you're using Mongo Realm you can define a DB trigger; I recommend following this guide.
If you're using mongoose in your app, there are plugins like mongoose-auto-increment that do it for you.
They work by creating an additional collection that contains a counter to be used for every insert; however, this is not perfect, as your db is still vulnerable to manual updates and human error. It is still the only viable solution that doesn't require preprocessing of some sort; I also recommend creating a unique index on that field to at least guarantee uniqueness.

DocumentDB filter an array by an array

I have a document that looks essentially like this:
{
    "Name": "John Smith",
    "Value": "SomethingIneed",
    "Tags": ["Tag1", "Tag2", "Tag3"]
}
My goal is to write a query that finds all documents in my database whose Tags property contains all of the tags in a filter.
For example, in the case above, my query might be ["Tag1", "Tag3"]. I want all documents whose tags collection contains Tag1 AND Tag3.
I have done the following:
1. Tried an All/Contains-type LINQ query:
var tags = new List<string>() {"Test", "TestAccount"};
var req =
Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"))
.Where(x => x.Tags.All(y => tags.Contains(y)))
.ToList();
2. Created a user-defined function (I couldn't get this to work at all):
var tagString = "'Test', 'TestAccount'";
var req =
    Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"),
        $"Select c.Name, c.Email, c.id from c WHERE udf.containsAll([{tagString}], c.Tags)").ToList();
with containsAll defined as:
function arrayContainsAnotherArray(needle, haystack) {
    for (var i = 0; i < needle.length; i++) {
        if (haystack.indexOf(needle[i]) === -1)
            return false;
    }
    return true;
}
3. Used System.Linq.Dynamic to create a predicate from a string:
var query = new StringBuilder("ItemType = \"MyType\"");
if (search.CollectionValues.Any())
{
    foreach (var searchCollectionValue in search.CollectionValues)
    {
        query.Append($" and Collection.Contains(\"{searchCollectionValue}\")");
    }
}
Option 3 actually worked for me, but the query was very expensive (more than 2000 RUs on a collection of 10K documents) and I am getting throttled like crazy. For the first iteration of my application, the result set must be able to support 10K results. How can I best query for a large number of results with an array of filters?
Thanks.
The UDF could be made to work, but it would require a full table scan and so is not recommended unless combined with other highly selective criteria.
I believe the most performant (index-using) approach would be to split it into a series of AND statements. You could do this programmatically, building up your query string (being careful to fully escape any user-provided data for security reasons). So, the resulting query would look like:
SELECT *
FROM c
WHERE
ARRAY_CONTAINS(c.Tags, "Tag1") AND
ARRAY_CONTAINS(c.Tags, "Tag3")
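If the tag list only exists at run time, one way to build that query programmatically without hand-escaping values is a parameterized SqlQuerySpec; a sketch, reusing the Client, Contact type and collection URI from the question:
var tags = new List<string> { "Tag1", "Tag3" };

// Build "SELECT * FROM c WHERE ARRAY_CONTAINS(c.Tags, @t0) AND ARRAY_CONTAINS(c.Tags, @t1) ..."
var parameters = new SqlParameterCollection();
var clauses = new List<string>();
for (int i = 0; i < tags.Count; i++)
{
    string paramName = "@t" + i;
    clauses.Add($"ARRAY_CONTAINS(c.Tags, {paramName})");
    parameters.Add(new SqlParameter(paramName, tags[i]));
}
var querySpec = new SqlQuerySpec(
    "SELECT * FROM c WHERE " + string.Join(" AND ", clauses),
    parameters);

var results = Client.CreateDocumentQuery<Contact>(
        UriFactory.CreateDocumentCollectionUri("db", "collection"),
        querySpec)
    .ToList();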

Azure Tables: Get all row keys + specific column from entire table?

I have entities stored in one table that look like (truncated here):
public class Record : TableEntity
{
    // Table storage serializes public get/set properties, so use properties rather than fields.
    public double Version { get; set; }
    public string User { get; set; }
}
Basically, I want to retrieve all the RowKeys and only the Version column of the entire table (i.e. the entire first 2 columns). There is only one PartitionKey since there aren't too many entries. How can I programmatically retrieve these columns? I'm aware of query projection and have seen the example on the Azure site, but I don't know how to extend it for my purpose here.
Any help would be appreciated!
Please try the following code. Essentially you would need to specify the columns you wish to fetch in the SelectColumns property of your query. I have not specified any filter criteria because you mentioned that the table doesn't contain that many entities.
static void QueryProjectionExample()
{
    var cred = CloudStorageAccount.DevelopmentStorageAccount;
    var client = cred.CreateCloudTableClient();
    var table = client.GetTableReference("TableName");
    var query = new TableQuery<DynamicTableEntity>()
    {
        SelectColumns = new List<string>()
        {
            "RowKey", "Version"
        }
    };
    var queryOutput = table.ExecuteQuerySegmented<DynamicTableEntity>(query, null);
    var results = queryOutput.Results;
    foreach (var entity in results)
    {
        // Version is stored as a double, so read DoubleValue rather than StringValue.
        Console.WriteLine("RowKey = " + entity.RowKey + "; Version = " + entity.Properties["Version"].DoubleValue);
    }
}
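Note that passing a null continuation token as above returns only the first segment (at most 1,000 entities). To walk the whole table, you can loop on the token, roughly like this:
TableContinuationToken token = null;
do
{
    var segment = table.ExecuteQuerySegmented<DynamicTableEntity>(query, token);
    token = segment.ContinuationToken;
    foreach (var entity in segment.Results)
    {
        Console.WriteLine("RowKey = " + entity.RowKey + "; Version = " + entity.Properties["Version"].DoubleValue);
    }
} while (token != null);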

dynamic query to azure tables

I'm using azure table storage to store blog posts. Each blog post can have different tags.
So I'm going to have three different tables:
One which will store the blog posts.
One to store the tags.
One that will store the relation between the tags and posts.
So my question is the following: is it possible to create dynamic search queries? I do not know until run time how many tags I want to search for.
As I understand it, you can only query Azure Tables using LINQ. Or can I input a string query that I can change dynamically?
UPDATE
Here's some example data that's in the blog table
PartitionKey,RowKey,Timestamp,Content,FromUser,Tags
user1, 1, 2012-08-08 13:57:23, "Hello World", "root", "yellow,red"
blogTag table
PartitionKey,RowKey,Timestamp,TagId,TagName
"red", "red", 2012-08-08 11:40:29, 1, red
"yellow", "yellow", 2012-08-08 11:40:29, 2, yellow
relation table
PartitionKey,RowKey,Timestamp,DataId,TagId
1, 1, 2012-08-08 11:40:29, 1, 1
2, 1, 2012-08-08 13:57:23, 1, 2
One usage example of these tables is when I want to get all blog posts with a certain tag:
I have to query the tagId from the blogTag table.
Thereafter I need to search the relation table for the dataId.
Lastly I need to search the blog table for blog posts with that dataId.
I'm using LINQ to perform the query and it looks like the following:
CloudTableQuery<DataTag> tagIds = (from e in ctx2.CreateQuery<DataTag>("datatags")
where e.PartitionKey == tags
select e).AsTableServiceQuery<DataTag>();
I tried Gaurav Mantri's suggestion of using a filter, and it works. But I'm worried about how efficient that will be, and about the limitation of only 15 discrete comparisons being allowed.
You can simply build a where clause and pass it to the Where method, for example:
var whereClause = "(PartitionKey eq 'Key1') or (PartitionKey eq 'Key2')"; // use 'or' so entities with either key match
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("AccountDetails");
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("<TableName>");
table.CreateIfNotExists();
TableQuery<YourAzureTableEntity> query =
    new TableQuery<YourAzureTableEntity>()
        .Where(whereClause);
var list = table.ExecuteQuery(query).ToList();
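If the partition keys are only known at run time, a sketch of building the same filter string dynamically with TableQuery.CombineFilters (pks is an assumed List<string> of keys; keep in mind that ORing many keys can still cause a scan and that the service limits the number of comparisons per query):
// Build "(PartitionKey eq 'k1') or (PartitionKey eq 'k2') or ..." from a list of keys.
string combinedFilter = null;
foreach (string pk in pks)
{
    string condition = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, pk);
    combinedFilter = combinedFilter == null
        ? condition
        : TableQuery.CombineFilters(combinedFilter, TableOperators.Or, condition);
}

TableQuery<YourAzureTableEntity> dynamicQuery =
    new TableQuery<YourAzureTableEntity>().Where(combinedFilter);
var dynamicResults = table.ExecuteQuery(dynamicQuery).ToList();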
I am also facing exactly the same problem. I did find one solution, which I am pasting below:
public static IEnumerable<T> Get<T>(CloudStorageAccount storageAccount, string tableName, string filter)
{
    string tableEndpoint = storageAccount.TableEndpoint.AbsoluteUri;
    var tableServiceContext = new TableServiceContext(tableEndpoint, storageAccount.Credentials);
    string query = string.Format("{0}{1}()?$filter={2}", tableEndpoint, tableName, filter);
    var queryResponse = tableServiceContext.Execute<T>(new Uri(query)) as QueryOperationResponse<T>;
    return queryResponse.ToList();
}
Basically it utilizes DataServiceContext's Execute(Uri) method: http://msdn.microsoft.com/en-us/library/cc646700.aspx.
You would need to specify the filter condition as you would do if you're invoking the query functionality through REST API (e.g. PartitionKey eq 'mypk' and RowKey ge 'myrk').
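A possible usage sketch (MyEntity and MyTable are placeholder names for your entity type and table):
var rows = Get<MyEntity>(storageAccount, "MyTable", "PartitionKey eq 'mypk' and RowKey ge 'myrk'");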
Not sure if this is the best solution :) Looking forward to comments on this.
It is possible, but it may not be a good idea. Adding multiple query parameters like that always results in a table scan. That's probably OK in a small table, but if your tables are going to be large it will be very slow. For large tables, you're better off running a separate query for each key combination.
That said, you can build a dynamic query with some LINQ magic. Here is the helper class I've used for that:
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class LinqBuilder
{
    /// <summary>
    /// Build a LINQ Expression that roughly matches the SQL IN() operator
    /// </summary>
    /// <param name="filterColumnName">The name of the column to filter on</param>
    /// <param name="columnValues">The values to filter for</param>
    /// <returns>An expression that can be passed to the LINQ .Where() method</returns>
    public static Expression<Func<RowType, bool>> BuildListFilter<RowType, ColumnType>(string filterColumnName, IEnumerable<ColumnType> columnValues)
    {
        ParameterExpression rowParam = Expression.Parameter(typeof(RowType), "r");
        MemberExpression column = Expression.Property(rowParam, filterColumnName);
        BinaryExpression filter = null;
        foreach (ColumnType columnValue in columnValues)
        {
            BinaryExpression newFilterClause = Expression.Equal(column, Expression.Constant(columnValue));
            if (filter != null)
            {
                filter = Expression.Or(filter, newFilterClause);
            }
            else
            {
                filter = newFilterClause;
            }
        }
        return Expression.Lambda<Func<RowType, bool>>(filter, rowParam);
    }

    public static Expression<Func<RowType, bool>> BuildComparisonFilter<RowType, ColumnType>(string filterColumnName, Func<MemberExpression, BinaryExpression> buildComparison)
    {
        ParameterExpression rowParam = Expression.Parameter(typeof(RowType), "r");
        MemberExpression column = Expression.Property(rowParam, filterColumnName);
        BinaryExpression filter = buildComparison(column);
        return Expression.Lambda<Func<RowType, bool>>(filter, rowParam);
    }
}
You would use it something like this:
var whereClause = LinqBuilder.BuildListFilter<MyRow, string>(queryColumnName, columnValues);
CloudTableQuery<MyRow> query = (from r in tableServiceContext.CreateQuery<MyRow>("MyTable")
                                where r.PartitionKey == partitionKey
                                select r)
    .Where(whereClause)     // Add in our multiple where clauses
    .AsTableServiceQuery(); // Convert to table service query
var results = query.ToList();
Note also that the Table service enforces a maximum number of constraints per query. The documented maximum is 15 per query, but when I last tried this (which was some time ago) the actual maximum was 14.
Building something like this in table storage is quite cumbersome; it is akin to forcing a square peg into a round hole.
Instead, you could consider using Blob storage to store your blogs and Lucene.NET to implement your tag search. Lucene would also allow more complex searches like (Tag = "A" and Tag = "B" and Tag != "C") and, in addition, would allow searching over the blog text itself, if you so choose.
http://code.msdn.microsoft.com/windowsazure/Azure-Library-for-83562538

How do I order a SQL datasource of uniqueidentifiers in LINQ by an array of uniqueidentifiers

I have a string list (A) of IndividualProfileIds (GUIDs) that can be in any order (used for displaying personal profiles in a specific order based on user input); it is stored as a string because it is part of the CMS functionality.
I also have an ASP.NET C# Repeater that uses a LinqDataSource to query against the individual table. This repeater needs to use the ordered list (A) to display the results in the order specified.
This is what I am having problems with. Does anyone have any ideas?
list(A)
'CD44D9F9-DE88-4BBD-B7A2-41F7A9904DAC',
'7FF2D867-DE88-4549-B5C1-D3C321F8DB9B',
'3FC3DE3F-7ADE-44F1-B17D-23E037130907'
Datasource example
IndividualProfileId Name JobTitle EmailAddress IsEmployee
3FC3DE3F-7ADE-44F1-B17D-23E037130907 Joe Blo Director dsd#ad.com 1
CD44D9F9-DE88-4BBD-B7A2-41F7A9904DAC Maxy Dosh The Boss 1
98AB3AFD-4D4E-4BAF-91CE-A778EB29D959 some one a job 322#wewd.ocm 1
7FF2D867-DE88-4549-B5C1-D3C321F8DB9B Max Walsh CEO 1
There is a very simple (single-line) way of doing this, given that you get the employee results from the database first (so resultSetFromDatabase is just example data, you should have some LINQ query here that gets your results).
var a = new[] { "GUID1", "GUID2", "GUID3" };
var resultSetFromDatabase = new[]
{
    new { IndividualProfileId = "GUID3", Name = "Joe Blo" },
    new { IndividualProfileId = "GUID1", Name = "Maxy Dosh" },
    new { IndividualProfileId = "GUID4", Name = "some one" },
    new { IndividualProfileId = "GUID2", Name = "Max Walsh" }
};
var sortedResults = a.Join(resultSetFromDatabase, s => s, e => e.IndividualProfileId, (s, e) => e);
It's impossible to have the datasource get the results directly in the right order, unless you're willing to write some dedicated SQL stored procedure. The problem is that you'd have to tell the database the contents of a. Using LINQ this can only be done via Contains. And that doesn't guarantee any order in the result set.
Turn the list (A), which you stated is a string, into an actual list. For example, you could use listAsString.Split(',') and then remove the surrounding 's from each element. I'll assume the finished list is called list.
Query the database to retrieve the rows that you need, for example:
var data = db.Table.Where(row => list.Contains(row.IndividualProfileId));
From the data returned, create a dictionary keyed by the IndividualProfileId, for example:
var dic = data.ToDictionary(e => e.IndividualProfileId);
Iterate through the list and retrieve the dictionary entry for each item:
var results = list.Select(item => dic[item]).ToList();
Now results will have the records in the same order that the IDs were in list.
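Putting those steps together, a minimal sketch (assuming a LINQ to SQL context named db with an IndividualProfiles table and that the stored values parse as GUIDs; the names are illustrative):
// 1. Turn the stored string into a list of GUIDs.
List<Guid> ids = listAsString
    .Split(',')
    .Select(s => Guid.Parse(s.Trim().Trim('\'')))
    .ToList();

// 2. Fetch the matching rows in one round trip (Contains translates to an IN clause).
var data = db.IndividualProfiles
    .Where(row => ids.Contains(row.IndividualProfileId))
    .ToList();

// 3. Key the rows by ID, then 4. re-emit them in the order of the original list.
var byId = data.ToDictionary(e => e.IndividualProfileId);
var results = ids.Where(id => byId.ContainsKey(id)) // skip IDs with no matching row
                 .Select(id => byId[id])
                 .ToList();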
