Azure Table Storage continuation - C#

Microsoft sends diagnostic data to Azure Table Storage, and I'm trying to query that storage with the C# SDK and forward the data to another location for analytics. I can query just fine and pull hundreds of thousands of records, but the last continuation token I receive always returns a null response. Even after more data is written to the table, querying with that token still gives me back a null continuation token and no data.
Has anyone done anything like this? How can I keep "syncing" Azure table data if the continuation tokens I'm given appear to be broken?
public static List<PerfMonEntity> GetEventData(ref TableContinuationToken contToken)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["StorageConnectionString"]);
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable eventLogsTable = tableClient.GetTableReference("WADPerformanceCountersTable");

    TableQuery<PerfMonEntity> query = new TableQuery<PerfMonEntity>();
    var l = new List<PerfMonEntity>();

    var segment = eventLogsTable.ExecuteQuerySegmented(query, contToken ?? new TableContinuationToken());
    foreach (PerfMonEntity wadCounter in segment)
    {
        l.Add(wadCounter);
    }

    contToken = segment.ContinuationToken;
    if (contToken == null)
    {
        Console.WriteLine("contToken is NULL!");
        return null;
    }

    Console.WriteLine("partkey: {0}", contToken.NextPartitionKey ?? "");
    Console.WriteLine("rowkey: {0}", contToken.NextRowKey ?? "");

    return l;
}
-=-=-=-=-=-
while (num < loop)
{
    List<PerfMonEntity> eleList = AzurePerfTable.GetEventData(ref contToken);
    if (eleList != null)
        returnedList.AddRange(eleList);
    else
        num = loop;
    num += 1;
    if (contToken != null)
        AZContinuationToken.SetContToken(contToken);
    Console.WriteLine("returnedlistsize: {0}", returnedList.Count<PerfMonEntity>());
}

The continuation token is null when there is no more data to return. When it's non-null, it means that there are additional entities to return in the next page. You can check for null to determine when you've retrieved the last page and then exit the loop.
Try writing your logic along these lines:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable eventLogsTable = tableClient.GetTableReference("WADPerformanceCountersTable");
TableQuery query = new TableQuery();

Console.WriteLine("List perf counter results in pages:");
TableContinuationToken token = null;
do
{
    var segment = eventLogsTable.ExecuteQuerySegmented(query, token, null, null);
    foreach (var wadCounter in segment)
    {
        Console.WriteLine(wadCounter.PartitionKey);
        Console.WriteLine(wadCounter.RowKey);
        Console.WriteLine(wadCounter.Timestamp);
    }
    token = segment.ContinuationToken;
}
while (token != null);
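If you want to keep "syncing" after the token comes back null, the token itself can't help you any further; it only walks the result set that existed when the query started. A minimal sketch of one way to pick up newly written rows, assuming your diagnostics table uses the usual WAD partition-key convention of "0" followed by the tick count (lastProcessedUtc is a hypothetical variable holding the timestamp you last synced up to):

TableQuery<PerfMonEntity> deltaQuery = new TableQuery<PerfMonEntity>().Where(
    TableQuery.GenerateFilterCondition(
        "PartitionKey",
        QueryComparisons.GreaterThan,
        "0" + lastProcessedUtc.Ticks));

TableContinuationToken deltaToken = null;
do
{
    var segment = eventLogsTable.ExecuteQuerySegmented(deltaQuery, deltaToken);
    foreach (PerfMonEntity entity in segment)
    {
        // forward the entity to your analytics sink here
    }
    deltaToken = segment.ContinuationToken;
} while (deltaToken != null);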

Related

How do I iterate CloudBlobDirectory to copy the data?

I have written the code below to iterate through a Gen2 storage blob container:
CloudStorageAccount sourceAccount = CloudStorageAccount.Parse(sourceConnection);
CloudStorageAccount destAccount = CloudStorageAccount.Parse(destConnection);
CloudBlobClient sourceClient = sourceAccount.CreateCloudBlobClient();
CloudBlobClient destClient = destAccount.CreateCloudBlobClient();
CloudBlobContainer sourceBlobContainer = sourceClient.GetContainerReference(sourceContainer);

// Find all blobs that haven't changed since the specified date and time
IEnumerable<ICloudBlob> sourceBlobRefs = FindMatchingBlobsAsync(sourceBlobContainer, transferBlobsNotModifiedSince).Result;

private static async Task<IEnumerable<ICloudBlob>> FindMatchingBlobsAsync(CloudBlobContainer blobContainer, DateTime transferBlobsNotModifiedSince)
{
    List<ICloudBlob> blobList = new List<ICloudBlob>();
    BlobContinuationToken token = null;
    // Iterate through the blobs in the source container
    do
    {
        BlobResultSegment segment = await blobContainer.ListBlobsSegmentedAsync(prefix: "", currentToken: token);
        foreach (CloudBlobDirectory VARIABLE in segment.Results)
        {
            BlobResultSegment segment2 = await VARIABLE.ListBlobsSegmentedAsync(currentToken: token);
            foreach (CloudBlobDirectory VARIABLE2 in segment2.Results) // Bad coding
            {
                // how do I get children count?
            }
        }
    } while (token != null);
}
This iterates only 2 levels, not dynamically down to the innermost levels. My blobs are in the hierarchy below:
--Container
  --FolderA
    --FolderAA
      --FolderAA1
        --File1.txt
        --File2.txt
      --FolderAA2
        --File1.txt
        --File2.txt
      --FolderAA3
    --FolderAB
      --File8.txt
    --FolderAC
      --File9.txt
This hierarchy is dynamic.
How do I loop through it and copy the blob content?
Note: I do not want to use CLI commands for the copy, because I won't have any control once the copy has started.
Update
Found some samples here: https://csharp.hotexamples.com/examples/Microsoft.WindowsAzure.Storage.Blob/CloudBlobContainer/ListBlobsSegmented/php-cloudblobcontainer-listblobssegmented-method-examples.html
Please see the sample code below:
class Program
{
    static void Main(string[] args)
    {
        var storageAccount = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var client = storageAccount.CreateCloudBlobClient();
        var container = client.GetContainerReference("test");
        var blobs = FindMatchingBlobsAsync(container).GetAwaiter().GetResult();
        foreach (var blob in blobs)
        {
            Console.WriteLine(blob.Name);
        }
        Console.WriteLine("-------------------------------------");
        Console.WriteLine("List of all blobs fetched. Press any key to terminate the application.");
        Console.ReadKey();
    }

    private static async Task<IEnumerable<ICloudBlob>> FindMatchingBlobsAsync(CloudBlobContainer blobContainer)
    {
        List<ICloudBlob> blobList = new List<ICloudBlob>();
        BlobContinuationToken token = null;
        // Iterate through the blobs in the source container
        do
        {
            BlobResultSegment segment = await blobContainer.ListBlobsSegmentedAsync(prefix: "", useFlatBlobListing: true, BlobListingDetails.None, 5000, token, new BlobRequestOptions(), new OperationContext());
            token = segment.ContinuationToken;
            foreach (var item in segment.Results)
            {
                blobList.Add((ICloudBlob)item);
            }
        } while (token != null);
        return blobList;
    }
}
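The flat listing above returns every blob with its full virtual "folder" path in its name, so no recursion is needed to preserve the hierarchy when copying. A rough sketch of the copy step building on that sample (CopyBlobsAsync and destContainer are illustrative names, and the read SAS on the source container is an assumption so a cross-account, service-side copy can authenticate):

private static async Task CopyBlobsAsync(CloudBlobContainer sourceContainer, CloudBlobContainer destContainer)
{
    // Read-only SAS on the source container so the destination account can fetch each blob.
    string sourceSas = sourceContainer.GetSharedAccessSignature(new SharedAccessBlobPolicy
    {
        Permissions = SharedAccessBlobPermissions.Read,
        SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(4)
    });

    foreach (ICloudBlob sourceBlob in await FindMatchingBlobsAsync(sourceContainer))
    {
        // Blob names include the full virtual path (e.g. "FolderA/FolderAA/FolderAA1/File1.txt"),
        // so referencing the same name in the destination container preserves the hierarchy.
        CloudBlockBlob destBlob = destContainer.GetBlockBlobReference(sourceBlob.Name);
        await destBlob.StartCopyAsync(new Uri(sourceBlob.Uri.AbsoluteUri + sourceSas));
    }
}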

Unable to access storage account blob, getting BlobContainerValidationError error - C#

I'm trying to read data from blob storage for an IoT Hub device import job.
I can write to the blob file successfully, but reading from the blob gives me the exception below:
await registryManager.ImportDevicesAsync(containerSasUri, containerSasUri);
{"{\"Message\":\"ErrorCode:BlobContainerValidationError;Failed to read
devices blob from input container.\",\"ExceptionMessage\":\"Tracking
ID:6f06c1ce39f04494b929a2249ce069f2-G:9-TimeStamp:01/06/2019
11:57:23\"}"}
static string GetContainerSasUri(CloudBlobContainer container)
{
    // Set the expiry time and permissions for the container.
    // In this case no start time is specified, so the
    // shared access signature becomes valid immediately.
    var sasConstraints = new SharedAccessBlobPolicy();
    sasConstraints.SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24);
    sasConstraints.Permissions =
        SharedAccessBlobPermissions.Write |
        SharedAccessBlobPermissions.Read |
        SharedAccessBlobPermissions.Delete |
        SharedAccessBlobPermissions.Add |
        SharedAccessBlobPermissions.Create;

    // Generate the shared access signature on the container,
    // setting the constraints directly on the signature.
    string sasContainerToken = container.GetSharedAccessSignature(sasConstraints);

    // Return the URI string for the container,
    // including the SAS token.
    return container.Uri + sasContainerToken;
}
registryManager = RegistryManager.CreateFromConnectionString(connectionString);

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connection-string");
// Create a blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("test");
CloudBlockBlob blob = container.GetBlockBlobReference("demo123.txt");
var containerSasUri = GetContainerSasUri(container);

// Provision 1,000 more devices
serializedDevices = new List<string>();
for (var i = 0; i < 5; i++)
{
    // Create a new ExportImportDevice
    // CryptoKeyGenerator is in the Microsoft.Azure.Devices.Common namespace
    var deviceToAdd = new ExportImportDevice()
    {
        Id = i + "look",
        Status = DeviceStatus.Enabled,
        Authentication = new AuthenticationMechanism()
        {
            SymmetricKey = new SymmetricKey()
            {
                PrimaryKey = CryptoKeyGenerator.GenerateKey(32),
                SecondaryKey = CryptoKeyGenerator.GenerateKey(32)
            }
        },
        ImportMode = ImportMode.Create
    };

    // Add device to the list
    serializedDevices.Add(JsonConvert.SerializeObject(deviceToAdd));
}
var tt = serializedDevices;

// Write the list to the blob
var sb = new StringBuilder();
serializedDevices.ForEach(serializedDevice => sb.AppendLine(serializedDevice));

//await blob.DeleteIfExistsAsync();
using (CloudBlobStream stream = await blob.OpenWriteAsync())
{
    byte[] bytes = Encoding.UTF8.GetBytes(sb.ToString());
    for (var i = 0; i < bytes.Length; i += 500)
    {
        int length = Math.Min(bytes.Length - i, 500);
        await stream.WriteAsync(bytes, i, length);
    }
}

// Call import using the blob to add new devices
// Log information related to the job is written to the same container
// This normally takes 1 minute per 100 devices
JobProperties importJob =
    await registryManager.ImportDevicesAsync(containerSasUri, containerSasUri);
Try setting a start time on the SAS policy:
sasConstraints.SharedAccessStartTime = DateTimeOffset.UtcNow.AddMinutes(-5);
Update:
The default input blob name is devices.txt. In your implementation the input blob is named demo123.txt, so you have to change the call to:
await registryManager.ImportDevicesAsync(containerSasUri, containerSasUri, "demo123.txt");
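Putting both suggestions together, a rough sketch of the adjusted SAS policy and the import call (an illustration based on the question's GetContainerSasUri, not the exact code the service requires):

var sasConstraints = new SharedAccessBlobPolicy
{
    // Back-date the start time slightly to tolerate clock skew between client and service.
    SharedAccessStartTime = DateTimeOffset.UtcNow.AddMinutes(-5),
    SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(24),
    // Assumption: the import job needs Read on the input blob and Write/Delete to write its job log.
    Permissions = SharedAccessBlobPermissions.Read |
                  SharedAccessBlobPermissions.Write |
                  SharedAccessBlobPermissions.Delete
};
string containerSasUri = container.Uri + container.GetSharedAccessSignature(sasConstraints);

// Pass the blob name explicitly because it is not the default devices.txt.
JobProperties importJob =
    await registryManager.ImportDevicesAsync(containerSasUri, containerSasUri, "demo123.txt");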

ExecuteAsync() of Azure Table Storage failing to insert all the records

I am trying to insert 10,000 records into Azure Table Storage. I am using ExecuteAsync() to achieve it, but somehow only around 7,500 records are inserted and the rest are lost. I am purposely not using the await keyword because I don't want to wait for the result, just store the records in the table. Below is my code snippet.
private static async void ConfigureAzureStorageTable()
{
    CloudStorageAccount storageAccount =
        CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    TableResult result = new TableResult();
    CloudTable table = tableClient.GetTableReference("test");
    table.CreateIfNotExists();

    for (int i = 0; i < 10000; i++)
    {
        var verifyVariableEntityObject = new VerifyVariableEntity()
        {
            ConsumerId = String.Format("{0}", i),
            Score = String.Format("{0}", i * 2 + 2),
            PartitionKey = String.Format("{0}", i),
            RowKey = String.Format("{0}", i * 2 + 2)
        };
        TableOperation insertOperation = TableOperation.Insert(verifyVariableEntityObject);
        try
        {
            table.ExecuteAsync(insertOperation);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
Is anything incorrect with the usage of the method?
You still want to await table.ExecuteAsync(). That will mean that ConfigureAzureStorageTable() returns control to the caller at that point, which can continue executing.
The way you have it in the question, ConfigureAzureStorageTable() is going to continue past the call to table.ExecuteAsync() and exit, and things like table will go out of scope, while the table.ExecuteAsync() task is still not complete.
There are plenty of caveats about using async void on SO and elsewhere that you will also need to consider. You could just as easily make your method async Task and not await it in the caller immediately, but keep the returned Task around for clean termination, etc.
Edit: one addition - you almost certainly want to use ConfigureAwait(false) on your await there, as you don't appear to need to preserve any context. This blog post has some guidelines on that and async in general.
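To make that concrete, here is a minimal sketch of the shape described above (an illustration, not the poster's original code): the method returns a Task, the insert tasks are kept, and everything is awaited before the method completes.

private static async Task ConfigureAzureStorageTableAsync()
{
    CloudStorageAccount storageAccount =
        CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable table = tableClient.GetTableReference("test");
    table.CreateIfNotExists();

    var pendingInserts = new List<Task>();
    for (int i = 0; i < 10000; i++)
    {
        var entity = new VerifyVariableEntity
        {
            ConsumerId = String.Format("{0}", i),
            Score = String.Format("{0}", i * 2 + 2),
            PartitionKey = String.Format("{0}", i),
            RowKey = String.Format("{0}", i * 2 + 2)
        };
        // Start the insert but keep the task so nothing is lost when the loop ends.
        // In practice you would throttle or batch these; 10,000 parallel calls can hit connection limits.
        pendingInserts.Add(table.ExecuteAsync(TableOperation.Insert(entity)));
    }

    // Await all inserts; failures surface here instead of being silently dropped.
    await Task.WhenAll(pendingInserts).ConfigureAwait(false);
}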
Based on your requirement, I tested your scenario on my side using CloudTable.ExecuteAsync and CloudTable.ExecuteBatchAsync, and both worked. Here is my code snippet for inserting records into Azure Table Storage with batch operations; you can refer to it.
Program.cs Main
class Program
{
    static void Main(string[] args)
    {
        CloudStorageAccount storageAccount =
            CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
        CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
        TableResult result = new TableResult();
        CloudTable table = tableClient.GetTableReference("test");
        table.CreateIfNotExists();

        // Generate records to be inserted into Azure Table Storage
        var entities = Enumerable.Range(1, 10000).Select(i => new VerifyVariableEntity()
        {
            ConsumerId = String.Format("{0}", i),
            Score = String.Format("{0}", i * 2 + 2),
            PartitionKey = String.Format("{0}", i),
            RowKey = String.Format("{0}", i * 2 + 2)
        });

        // Group records by PartitionKey and prepare for executing batch operations
        var batches = TableBatchHelper<VerifyVariableEntity>.GetBatches(entities);

        // Execute batch operations in parallel
        Parallel.ForEach(batches, new ParallelOptions()
        {
            MaxDegreeOfParallelism = 5
        }, (batchOperation) =>
        {
            try
            {
                table.ExecuteBatch(batchOperation);
                Console.WriteLine("Writing {0} records", batchOperation.Count);
            }
            catch (Exception ex)
            {
                Console.WriteLine("ExecuteBatch threw an exception: " + ex.Message);
            }
        });

        Console.WriteLine("Done!");
        Console.WriteLine("Press any key to exit...");
        Console.ReadKey();
    }
}
TableBatchHelper.cs
public class TableBatchHelper<T> where T : ITableEntity
{
    const int batchMaxSize = 100;

    public static IEnumerable<TableBatchOperation> GetBatches(IEnumerable<T> items)
    {
        var list = new List<TableBatchOperation>();
        var partitionGroups = items.GroupBy(arg => arg.PartitionKey).ToArray();
        foreach (var group in partitionGroups)
        {
            T[] groupList = group.ToArray();
            int offSet = batchMaxSize;
            T[] entities = groupList.Take(offSet).ToArray();
            while (entities.Any())
            {
                var tableBatchOperation = new TableBatchOperation();
                foreach (var entity in entities)
                {
                    tableBatchOperation.Add(TableOperation.InsertOrReplace(entity));
                }
                list.Add(tableBatchOperation);
                entities = groupList.Skip(offSet).Take(batchMaxSize).ToArray();
                offSet += batchMaxSize;
            }
        }
        return list;
    }
}
Note: As mentioned in the official document about inserting a batch of entities:
A single batch operation can include up to 100 entities.
All entities in a single batch operation must have the same partition key.
In summary, please check whether this works on your side. You could also log the detailed exception in your console application and capture the HTTP traffic with Fiddler to catch the failing requests while inserting records into Azure Table Storage.
How about using a TableBatchOperation to run batches of N inserts at once?
private const int BatchSize = 100;

private static async void ConfigureAzureStorageTable()
{
    CloudStorageAccount storageAccount =
        CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    TableResult result = new TableResult();
    CloudTable table = tableClient.GetTableReference("test");
    table.CreateIfNotExists();

    var batchOperation = new TableBatchOperation();

    for (int i = 0; i < 10000; i++)
    {
        var verifyVariableEntityObject = new VerifyVariableEntity()
        {
            ConsumerId = String.Format("{0}", i),
            Score = String.Format("{0}", i * 2 + 2),
            PartitionKey = String.Format("{0}", i),
            RowKey = String.Format("{0}", i * 2 + 2)
        };
        TableOperation insertOperation = TableOperation.Insert(verifyVariableEntityObject);
        batchOperation.Add(insertOperation);

        if (batchOperation.Count >= BatchSize)
        {
            try
            {
                await table.ExecuteBatchAsync(batchOperation);
                batchOperation = new TableBatchOperation();
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    if (batchOperation.Count > 0)
    {
        try
        {
            await table.ExecuteBatchAsync(batchOperation);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
You can adjust BatchSize to what you need. Small disclaimer: I didn't try to run this, though it should work.
But I can't help wondering why your function is async void. That should be reserved for event handlers and similar cases where you cannot control the interface. In most cases you want to return a Task, because as it stands the caller cannot catch exceptions that occur in this function.
async void is not good practice unless it is an event handler.
https://msdn.microsoft.com/en-us/magazine/jj991977.aspx
If you plan to insert many records into Azure Table Storage, batch insert is your best bet.
https://msdn.microsoft.com/en-us/library/azure/microsoft.windowsazure.storage.table.tablebatchoperation.aspx
Keep in mind that it has a limit of 100 table operations per batch.
I had the same issue and fixed it by forcing ExecuteAsync to wait for the result before the method exits:
table.ExecuteAsync(insertOperation).GetAwaiter().GetResult()

How to get a list of all the blobs in a container in Azure?

I have the account name and account key of a storage account in Azure. I need to get a list of all the blobs in a container in that account. (The "$logs" container).
I am able to get the information of a specific blob using the CloudBlobClient class but can't figure out how to get a list of all the blobs within the $logs container.
There is a sample of how to list all of the blobs in a container at https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/#list-the-blobs-in-a-container:
// Retrieve the connection string for use with the application. The storage
// connection string is stored in an environment variable on the machine
// running the application called AZURE_STORAGE_CONNECTION_STRING. If the
// environment variable is created after the application is launched in a
// console or with Visual Studio, the shell or application needs to be closed
// and reloaded to take the environment variable into account.
string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");

// Create a BlobServiceClient object which will be used to create a container client
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

// Get the container client object
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("yourContainerName");

// List all blobs in the container
await foreach (BlobItem blobItem in containerClient.GetBlobsAsync())
{
    Console.WriteLine("\t" + blobItem.Name);
}
Here's the updated API call for WindowsAzure.Storage v9.0:
private static CloudBlobClient _blobClient = CloudStorageAccount.Parse("connectionstring").CreateCloudBlobClient();

public async Task<IEnumerable<CloudAppendBlob>> GetBlobs()
{
    var container = _blobClient.GetContainerReference("$logs");
    var blobs = new List<CloudAppendBlob>();
    BlobContinuationToken continuationToken = null;
    // Use maxResultsPerQuery to limit the number of results per query as desired.
    // `null` will have the query return the entire contents of the blob container.
    int? maxResultsPerQuery = null;
    do
    {
        var response = await container.ListBlobsSegmentedAsync(string.Empty, true, BlobListingDetails.None, maxResultsPerQuery, continuationToken, null, null);
        continuationToken = response.ContinuationToken;
        // yield return cannot be used in an async Task<IEnumerable<T>> method, so collect into a list instead
        blobs.AddRange(response.Results.OfType<CloudAppendBlob>());
    } while (continuationToken != null);
    return blobs;
}
Update for IAsyncEnumerable
IAsyncEnumerable is now available in .NET Standard 2.1 and .NET Core 3.0
private static CloudBlobClient _blobClient = CloudStorageAccount.Parse("connectionstring").CreateCloudBlobClient();

public async IAsyncEnumerable<CloudAppendBlob> GetBlobs()
{
    var container = _blobClient.GetContainerReference("$logs");
    BlobContinuationToken continuationToken = null;
    // Use maxResultsPerQuery to limit the number of results per query as desired.
    // `null` will have the query return the entire contents of the blob container.
    int? maxResultsPerQuery = null;
    do
    {
        var response = await container.ListBlobsSegmentedAsync(string.Empty, true, BlobListingDetails.None, maxResultsPerQuery, continuationToken, null, null);
        continuationToken = response.ContinuationToken;
        foreach (var blob in response.Results.OfType<CloudAppendBlob>())
        {
            yield return blob;
        }
    } while (continuationToken != null);
}
Using the new package Azure.Storage.Blobs
BlobServiceClient blobServiceClient = new BlobServiceClient("YourStorageConnectionString");
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("YourContainerName");
var blobs = containerClient.GetBlobs();
foreach (var item in blobs)
{
    Console.WriteLine(item.Name);
}
Since your container name is $logs, I think your blob type is append blob. Here's a method that gets all blobs and returns an IEnumerable:
private static CloudBlobClient _blobClient = CloudStorageAccount.Parse("connectionstring").CreateCloudBlobClient();

public IEnumerable<CloudAppendBlob> GetBlobs()
{
    var container = _blobClient.GetContainerReference("$logs");
    BlobContinuationToken continuationToken = null;
    do
    {
        var response = container.ListBlobsSegmented(string.Empty, true, BlobListingDetails.None, new int?(), continuationToken, null, null);
        continuationToken = response.ContinuationToken;
        foreach (var blob in response.Results.OfType<CloudAppendBlob>())
        {
            yield return blob;
        }
    } while (continuationToken != null);
}
The method can also be asynchronous; just use ListBlobsSegmentedAsync. One thing to note is that the useFlatBlobListing argument needs to be true, which means the listing returns a flat list of files as opposed to a hierarchical list.
Use ListBlobsSegmentedAsync which returns a segment of the total result set and a continuation token.
ref:https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet?tabs=windows
In Web API -> Swagger:
[HttpGet(nameof(GetFileList))]
public async Task<IActionResult> GetFileList()
{
    BlobServiceClient blobServiceClient = new BlobServiceClient(_configuration.GetValue<string>("BlobConnectionString"));
    BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(_configuration.GetValue<string>("BlobContainerName"));
    var blobs = containerClient.GetBlobs();
    return Ok(blobs);
}

Windows Azure - Cleaning Up The WADLogsTable

I've read conflicting information as to whether or not the WADLogsTable table used by the DiagnosticMonitor in Windows Azure will automatically prune old log entries.
I'm guessing it doesn't, and will instead grow forever - costing me money. :)
If that's the case, does anybody have a good code sample as to how to clear out old log entries from this table manually? Perhaps based on timestamp? I'd run this code from a worker role periodically.
The data in tables created by Windows Azure Diagnostics isn't deleted automatically.
However, Windows Azure PowerShell Cmdlets contain cmdlets specifically for this case.
PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>] [-To <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]
You need to specify the -ToUtc parameter, and all logs before that date will be deleted.
If the cleanup task needs to be performed on Azure within the worker role, the cmdlets' C# code can be reused. The PowerShell cmdlets are published under the permissive MS Public License.
Basically, only 3 files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.
An updated version of Chriseyre2000's function. It performs much better when you need to delete many thousands of records: it searches by PartitionKey and processes the deletes in chunks, step by step. And remember that the best choice is to run it close to the storage (in a cloud service).
public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");
    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        var l = dt.Ticks;
        string partitionKey = "0" + l;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        query.Select(new string[] {});
        var items = cloudTable.ExecuteQuery(query).ToList();

        const int chunkSize = 200;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = items.Count - index > chunkSize ? chunkSize : items.Count - index;
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }
            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
You could just do it based on the timestamp but that would be very inefficient since the whole table would need to be scanned. Here is a code sample that might help where the partition key is generated to prevent a "full" table scan. http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
Here is a solution that truncates based upon a timestamp (tested against SDK 2.0).
It does use a table scan to get the data, but if run, say, once per day it would not be too painful:
/// <summary>
/// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
/// </summary>
/// <param name="storageAccount"></param>
/// <param name="keepThreshold"></param>
public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    try
    {
        CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
        CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

        TableQuery query = new TableQuery();
        query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
        var items = cloudTable.ExecuteQuery(query).ToList();

        Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
        foreach (var entity in items)
        {
            TableOperation tableOperation = TableOperation.Delete(entity);
            if (!batches.ContainsKey(entity.PartitionKey))
            {
                batches.Add(entity.PartitionKey, new TableBatchOperation());
            }
            batches[entity.PartitionKey].Add(tableOperation);
        }

        foreach (var batch in batches.Values)
        {
            cloudTable.ExecuteBatch(batch);
        }
    }
    catch (Exception ex)
    {
        Trace.TraceError(string.Format("Truncate WADLogsTable exception {0}", ex), "Error");
    }
}
Here's my slightly different version of Chriseyre2000's solution, using asynchronous operations and PartitionKey querying. It's designed to run continuously within a worker role in my case. This one may be a bit easier on memory if you have a lot of entries to clean up.
static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a week's worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 );
                    if ( items.Count() == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );
                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );
                        // can have only 100 per batch
                        if ( batches[entity.PartitionKey].Count < 100 )
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}
To start the process, just call this from the OnStart method in your worker role.
// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();
If you don't care about any of the contents, just delete the table. Azure Diagnostics will just recreate it.
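For that route, a minimal sketch (assuming the same storageAccount object used in the other answers); note that deleting a table can take some time to complete server-side, so the recreation by diagnostics may be delayed:

CloudTable wadLogsTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");
// Drop the whole table; Azure Diagnostics recreates it the next time it transfers logs.
wadLogsTable.DeleteIfExists();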
A slightly updated version of Chriseyre2000's code:
using ExecuteQuerySegmented instead of ExecuteQuery
observing the TableBatchOperation limit of 100 operations
purging all the Azure diagnostics tables
public static void TruncateAllAzureTables(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    TruncateAzureTable(storageAccount, "WADLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADCrashDump", keepThreshold);
    TruncateAzureTable(storageAccount, "WADDiagnosticInfrastructureLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADPerformanceCountersTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADWindowsEventLogsTable", keepThreshold);
}

public static void TruncateAzureTable(CloudStorageAccount storageAccount, string aTableName, DateTime keepThreshold)
{
    const int maxOperationsInBatch = 100;
    var tableClient = storageAccount.CreateCloudTableClient();
    var cloudTable = tableClient.GetTableReference(aTableName);

    var query = new TableQuery { FilterString = $"Timestamp lt datetime'{keepThreshold:yyyy-MM-ddTHH:mm:ss}'" };
    TableContinuationToken continuationToken = null;
    do
    {
        var queryResult = cloudTable.ExecuteQuerySegmented(query, continuationToken);
        continuationToken = queryResult.ContinuationToken;
        var items = queryResult.ToList();

        var batches = new Dictionary<string, List<TableBatchOperation>>();
        foreach (var entity in items)
        {
            var tableOperation = TableOperation.Delete(entity);
            if (!batches.TryGetValue(entity.PartitionKey, out var batchOperationList))
            {
                batchOperationList = new List<TableBatchOperation>();
                batches.Add(entity.PartitionKey, batchOperationList);
            }

            var batchOperation = batchOperationList.FirstOrDefault(bo => bo.Count < maxOperationsInBatch);
            if (batchOperation == null)
            {
                batchOperation = new TableBatchOperation();
                batchOperationList.Add(batchOperation);
            }
            batchOperation.Add(tableOperation);
        }

        foreach (var batch in batches.Values.SelectMany(l => l))
        {
            cloudTable.ExecuteBatch(batch);
        }
    } while (continuationToken != null);
}
