Continuous Web Job with timer trigger and Blob trigger - c#

I have the following functions in the same web job console app that uses the azure jobs sdk and its extensions. The timed trigger queries an API end point for a file, does some additional work on it and then saves the file to the blob named blahinput. Now the second method "ProcessBlobMessage" is supposed to identify the new blob file in the blahinput and do something with it.
public static void ProcessBlobMessage([BlobTrigger("blahinput/{name}")] TextReader input,
    string name, [Blob("foooutput/{name}")] out string output)
{
    // do something
}
public static void QueryAnAPIEndPointToGetFile([TimerTrigger("* */1 * * * *")] TimerInfo timerInfo)
{
    // download a file and save it to the blob named blahinput
}
The problem is this: when I deploy the above web job as continuous, only the timer-triggered function seems to fire, while the function that is supposed to pick up the new file never gets triggered. Is it not possible to have two such triggers in the same web job?

From this article: How to use Azure blob storage with the WebJobs SDK
The WebJobs SDK scans log files to watch for new or changed blobs. This process is not real-time; a function might not get triggered until several minutes or longer after the blob is created. In addition, storage logs are created on a "best efforts" basis; there is no guarantee that all events will be captured. Under some conditions, logs might be missed. If the speed and reliability limitations of blob triggers are not acceptable for your application, the recommended method is to create a queue message when you create the blob, and use the QueueTrigger attribute instead of the BlobTrigger attribute on the function that processes the blob.

Until the new blob trigger strategy is released, BlobTriggers are not reliable. The trigger is based on Azure Storage Analytics logs, which are stored on a best-effort basis.
There is an ongoing GitHub issue about this, and there is also a PR regarding a new blob scanning strategy.
That being said, check that you are using the latest WebJobs SDK version, 1.1.1, because there was an issue in prior versions that could lead to problems with BlobTriggers.
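The queue-based workaround recommended in the quoted article could look roughly like the sketch below. It is only an illustration under assumptions: the queue name "blahinput-names" and the placeholder blob name are made up, the timer schedule is copied unchanged from the question, and the code needs the WebJobs SDK and its extensions package (with timers enabled on the host) to run.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public static class Functions
{
    // Timer function: after saving the blob, enqueue its name.
    // "blahinput-names" is a hypothetical queue name chosen for this sketch.
    public static void QueryAnAPIEndPointToGetFile(
        [TimerTrigger("* */1 * * * *")] TimerInfo timerInfo,
        [Queue("blahinput-names")] out string blobNameMessage)
    {
        string blobName = "example.txt"; // placeholder for the real file name

        // ... download the file and save it to blahinput/<blobName> here ...

        // The queue message carries only the blob name.
        blobNameMessage = blobName;
    }

    // Queue function: runs when a blob name arrives on the queue.
    // {queueTrigger} resolves to the text of the queue message.
    public static void ProcessBlobMessage(
        [QueueTrigger("blahinput-names")] string blobName,
        [Blob("blahinput/{queueTrigger}")] TextReader input,
        [Blob("foooutput/{queueTrigger}")] out string output)
    {
        // Stand-in for the real processing: copy the content through.
        output = input.ReadToEnd();
    }
}
```

With this arrangement the second function no longer depends on storage log scanning; it fires when the queue message becomes visible.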

Related

How can I receive an event from Azure Event Hub without using a blob storage container?

I am new to using Azure Event Hubs but I was wondering how I can receive events from the Event Hub without using a blob storage container. Would it be possible to set up event triggers to download the message data whenever a new message is posted (sent)? Would it make sense to use a function like the one below?
[FunctionName("EventHubTriggerCSharp")]
public void Run([EventHubTrigger("samples-workitems", Connection = "EventHubConnectionAppSetting")] string myEventHubMessage, ILogger log)
{
log.LogInformation($"C# function triggered to process a message: {myEventHubMessage}");
}
(Taken from https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs-trigger?tabs=in-process%2Cfunctionsv2%2Cextensionv5&pivots=programming-language-csharp)
Is this the right approach for this problem? Could someone walk me through on why this would/wouldn't work and what exactly is happening here? I couldn't find a better description on the docs itself. Thanks in advance.
Receiving events from the Event Hub without using a blob storage container is not possible. To capture event messages, we have to maintain a checkpoint store.
The default checkpoint store is Azure Blob Storage. Otherwise, we need to maintain a custom checkpoint manager.
The event processor that supports the Functions trigger needs storage in order to communicate with other instances.
A Function trigger cannot use any of the Event Hubs client library types that can consume messages without storing them first. It can be done by using the Event Grid schema.
That method also uses block blob storage, as the event schema shows:
"blobType": "BlockBlob"
I created a sample that sends event messages through the Event Hub trigger and was able to trigger it in the portal using blob storage.
You can also use EventProcessorHost.

Clearing history while debugging azure durable functions

Durable functions keep state in storage. This is what makes them work, but it is very troublesome while debugging and developing. I have a large number of runs which have not completed and which the system tries to run again when I start the process. Some of the runs have erroneous data, which causes exceptions, while others were terminated early because something did not work as expected.
I don't want to run all the old cases when starting my application in debug (running against my local storage account). How can I automatically clear all data so only new functions will trigger?
You can use the Azure Functions Core Tools to purge the orchestration instance state.
First, make sure the Azure Functions Core Tools are installed for your particular Azure Functions version. You can do this using the npm package manager. (Note that this is for Azure Functions version 3.)
npm install -g azure-functions-core-tools@3
Then open a command prompt in the root directory of your Azure Functions project. The Core Tools require the host.json file from your project to identify your orchestration instances.
You can use the following to look at all of the available actions:
func durable
You can then purge the instance history using the following:
func durable purge-history
There is now a VS Code extension that also has a 'Purge Durable Functions History' feature. Type 'Purge Durable Functions History' in your Command Palette and there you go. If you're not using VS Code, the same tool is available as a standalone service that you can either run locally or deploy to Azure.
You may call the PurgeInstanceHistoryAsync method with one of the following:
An orchestration instance ID
[FunctionName("PurgeInstanceHistory")]
public static Task Run(
[DurableClient] IDurableOrchestrationClient client,
[ManualTrigger] string instanceId)
{
return client.PurgeInstanceHistoryAsync(instanceId);
}
Time interval
[FunctionName("PurgeInstanceHistory")]
public static Task Run(
[DurableClient] IDurableOrchestrationClient client,
[TimerTrigger("0 0 12 * * *")]TimerInfo myTimer)
{
return client.PurgeInstanceHistoryAsync(
DateTime.MinValue,
DateTime.UtcNow.AddDays(-30),
new List<OrchestrationStatus>
{
OrchestrationStatus.Completed
});
}
Reference for code snippets above: https://learn.microsoft.com/en-gb/azure/azure-functions/durable/durable-functions-instance-management#purge-instance-history
For everyone else wondering just how on earth to do this:
Install the Microsoft Azure Storage Explorer.
Add a connection to Azure Storage, but choose Local storage emulator.
Use the defaults / click Next.
At this point, click on Local & Attached in the Explorer. Click on (Emulator Default Ports) (Key) -> Tables. Delete the task hub history table, and relaunch your application.
From this point, it's only a matter of dev time to figure out a way to do it programmatically.

Azure Functions: CosmosDBTrigger not triggering in Visual Studio

TL;DR: This example is not working for me in VS2017.
I have an Azure Cosmos DB and want to fire some logic when something adds or updates there. For that, CosmosDBTrigger should be great.
The tutorial demonstrates creating the trigger in the Azure Portal, and that works for me. However, doing the same thing in Visual Studio (15.5.4, the latest as of now) does not.
I use the default Azure Functions template, predefined Cosmos DB trigger and nearly default code:
[FunctionName("TestTrigger")]
public static void Run(
[CosmosDBTrigger("Database", "Collection", ConnectionStringSetting = "myCosmosDB")]
IReadOnlyList<Document> input,
TraceWriter log)
{
log.Info("Documents modified " + input.Count);
log.Info("First document Id " + input[0].Id);
}
App runs without errors but nothing happens when I actually do stuff in the database. So I cannot debug things and actually implement some required logic.
The connection string is specified in local.settings.json and is picked up. If I deliberately break it, the trigger throws runtime errors.
It all looks as if the connection string points to the wrong database. But it is exactly the same string, copy-pasted, that I have in the trigger made via the Azure Portal.
Where could I go wrong? What else can I check?
Based on your comment, you were running both the portal app and the local app at the same time against the same collection and the same lease collection.
That means both apps were competing with each other for locks (leases) on collection processing. The portal app won in your case and took the lease, so the local app sat there doing nothing.

Azure File System - Can I "Watch" or only poll?

I am an experienced windows C# developer, but new to the world of Azure, and so trying to figure out a "best practice" as I implement one or more Azure Cloud Services.
I have a number of (external, and outside of my control) sources that can all save files to a folder (or possibly a set of folders). In the current state of my system under Windows, I have a FileSystemWatcher set up to monitor a folder and raise an event when a file appears there.
In the world of Azure, what is the equivalent way to do this? Or is there?
I am aware I could create a timer (or sleep) to pass some time (say 30 seconds), and poll the folder, but I'm just not sure that's the "best" way in a cloud environment.
It is important to note that I have no control over the inputs - in other words the files are saved by an external device over which I have no control; so I can't, for example, push a message onto a queue when the file is saved, and respond to that message...
Although, in the end, that's the goal... So I intend to have a "Watcher" service which will (via events or polling) detect the presence of one or more files, and push a message onto the appropriate queue for the next step in my workflow to respond to.
It should be noted that I am using VS2015, and the latest Azure SDK stuff, so I'm not limited by anything legacy.
What I have so far is basically this (a snippet of a larger code base):
storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create a CloudFileClient object for credentialed access to File storage.
fileClient = storageAccount.CreateCloudFileClient();
// Obtain the file share name from the config file
string sharenameString = CloudConfigurationManager.GetSetting("NLRB.Scanning.FileSharename");
// Get a reference to the file share.
share = fileClient.GetShareReference(sharenameString);
// Ensure that the share exists.
if (share.Exists())
{
Trace.WriteLine("Share exists.");
// Get a reference to the root directory for the share.
rootDir = share.GetRootDirectoryReference();
//Here is where I want to start watching the folder represented by rootDir...
}
Thanks in advance.
If you're using an attached disk (or local scratch disk), the behavior would be like on any other Windows machine, so you'd just set up a file watcher accordingly with FileSystemWatcher and deal with callbacks as you normally would.
There's Azure File Service, which is SMB as-a-service and would support any actions you'd be able to do on a regular SMB volume on your local network.
There's Azure blob storage. These can not be watched. You'd have to poll for changes to, say, a blob container.
You could create a loop that polls the root directory periodically using the CloudFileDirectory.ListFilesAndDirectories method.
https://msdn.microsoft.com/en-us/library/dn723299.aspx
You could also write a small recursive method to call this in sub directories.
To detect differences, you can build up an in-memory hash map of all files and directories. If you want something like a persistent distributed cache, you can use e.g. Redis to keep this list of files and directories. Every time you poll, any file or directory that is not in your list is a new file or directory under the root.
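As a rough sketch of that in-memory detection step, the class below remembers every path it has seen and returns only the ones that are new in the current poll. The class name NewEntryDetector and the sample file names are invented for illustration, the listings are plain strings standing in for whatever the directory listing call returns, and the case-insensitive comparison is an assumption about how the share treats names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers every path seen so far and reports only new ones.
public sealed class NewEntryDetector
{
    // Assumption: names are compared case-insensitively.
    private readonly HashSet<string> _seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns the paths from this poll that were not present in any earlier poll.
    public IReadOnlyList<string> Detect(IEnumerable<string> currentListing)
    {
        var added = new List<string>();
        foreach (var path in currentListing)
        {
            // HashSet.Add returns false when the path is already known.
            if (_seen.Add(path))
            {
                added.Add(path);
            }
        }
        return added;
    }
}

public static class Program
{
    public static void Main()
    {
        var detector = new NewEntryDetector();

        // Stand-ins for two successive directory listings.
        var first = detector.Detect(new[] { "scan1.pdf", "scan2.pdf" });
        var second = detector.Detect(new[] { "scan1.pdf", "scan2.pdf", "scan3.pdf" });

        Console.WriteLine(string.Join(",", first));  // scan1.pdf,scan2.pdf
        Console.WriteLine(string.Join(",", second)); // scan3.pdf
    }
}
```

The first poll reports everything as new, so a real watcher would either accept that or seed the set from an initial listing before it starts queueing work.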
You could separate the responsibilities of detection and business logic, i.e. one worker role keeps polling the directory and writes the new files to a queue, and on the consumer end another worker role or web service does the processing with that information.
Azure Blob Storage pushes events through Azure Event Grid. Blob storage has two event types, Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted. So instead of long polling you can simply react to the created event.
See this link for more information:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview
I had a very similar requirement. I used BOX application. It has a Webhook feature for events occurring in Files or Folders: such as Add, Move, Delete etc..
There are also some newer alternatives with Azure Automation.
I'm pretty new to Azure too, and I'm actually investigating a file watcher type of thing. I'm considering something involving Azure Functions, because the blob storage bindings look like a way of triggering some code when a blob is created or updated. There's a way of specifying a pattern too: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob

Trigger WebJob at a particular time after record added to a database

I want to trigger an Azure WebJob 24 hours after I have added a record to a database using .NET. Obviously there will be multiple tasks for the WebJob to handle, each at its designated time. Is there a way (in the Azure library for .NET) in which I can schedule these tasks?
I am free to use message queues, but I want to avoid unnecessary polling by the WebJob for new messages.
If you want to trigger the execution of a WebJob 24 hours after a record insertion in a SQL database I would definitely use Azure Queues for this. So after you insert the record, just add a message to the queue.
To do this, you can use the initialVisibilityDelay parameter of the CloudQueue.AddMessage() method. In your case the message stays invisible for 24 hours and then becomes available for your WebJob to process. You don't have to schedule anything; just have a continuous WebJob running that listens to the queue.
Here's some sample code:
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

// Serializes the message and enqueues it so that it only becomes visible after visibilityDelay.
public void AddMessage<T>(T message, TimeSpan visibilityDelay)
{
    var serializedMessage = JsonConvert.SerializeObject(message);
    var queue = GetQueueReference();
    // Arguments: message, timeToLive (null = default), initialVisibilityDelay.
    queue.AddMessage(new CloudQueueMessage(serializedMessage), null, visibilityDelay);
}

private static CloudQueue GetQueueReference()
{
    var storageAccount = CloudStorageAccount.Parse("Insert connection string");
    var queueClient = storageAccount.CreateCloudQueueClient();
    var queueReference = queueClient.GetQueueReference("Insert Queue Name");
    queueReference.CreateIfNotExists();
    return queueReference;
}
Hope this helps
Since adding a record to the database is the trigger here, you can use the Azure Management Libraries to create an Azure Scheduler job that executes 24 hours after the record is inserted. Azure Scheduler jobs can do only three things: make HTTP requests, make HTTPS requests, or put a message in a queue. Since you do not want to poll queues, here are two options:
Deploy the existing WebJob as a Web API where each task is reachable by a unique URL, so that the scheduler job can execute the right HTTP/HTTPS request.
Create a new Web API which accepts requests (like a man in the middle) and programmatically runs the existing WebJob on demand, again using the Azure Management Libraries.
Please let me know if any of these strategies help.
Invoking a WebJob from your Website is not a good idea. Instead, you can add the WebJob code inside your Website and simply call that code. You can still easily use the WebJobs SDK from inside your Website.
https://github.com/Azure/azure-webjobs-sdk-samples
The reason we wouldn't recommend invoking the WebJob from your Website is that the invocation requires a secret you would rather not store on your Website (the deployment credentials).
Recommendation:
To keep the WebJob and Website code separate, the best thing to do is to communicate through a queue: the WebJob listens on the queue and the Website pushes requests to it.
