Azure File System - Can I "Watch" or only poll? - c#

I am an experienced Windows C# developer, but new to the world of Azure, so I am trying to figure out a "best practice" as I implement one or more Azure Cloud Services.
I have a number of (external, and outside of my control) sources that can all save files to a folder (or possibly a set of folders). In the current state of my system under Windows, I have a FileSystemWatcher set up to monitor a folder and raise an event when a file appears there.
In the world of Azure, what is the equivalent way to do this? Or is there?
I am aware I could create a timer (or sleep) to pass some time (say 30 seconds), and poll the folder, but I'm just not sure that's the "best" way in a cloud environment.
It is important to note that I have no control over the inputs - in other words the files are saved by an external device over which I have no control; so I can't, for example, push a message onto a queue when the file is saved, and respond to that message...
Although, in the end, that's the goal... So I intend to have a "Watcher" service which will (via events or polling) detect the presence of one or more files, and push a message onto the appropriate queue for the next step in my workflow to respond to.
It should be noted that I am using VS2015, and the latest Azure SDK stuff, so I'm not limited by anything legacy.
What I have so far is basically this (a snippet of a larger code base):
storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create a CloudFileClient object for credentialed access to File storage.
fileClient = storageAccount.CreateCloudFileClient();
// Obtain the file share name from the config file.
string sharenameString = CloudConfigurationManager.GetSetting("NLRB.Scanning.FileSharename");
// Get a reference to the file share.
share = fileClient.GetShareReference(sharenameString);
// Ensure that the share exists.
if (share.Exists())
{
    Trace.WriteLine("Share exists.");
    // Get a reference to the root directory for the share.
    rootDir = share.GetRootDirectoryReference();
    // Here is where I want to start watching the folder represented by rootDir...
}
Thanks in advance.

If you're using an attached disk (or local scratch disk), the behavior would be like on any other Windows machine, so you'd just set up a file watcher accordingly with FileSystemWatcher and deal with callbacks as you normally would.
There's Azure File Service, which is SMB as-a-service and would support any actions you'd be able to do on a regular SMB volume on your local network.
There's Azure blob storage. Blobs cannot be watched directly; you'd have to poll for changes to, say, a blob container.
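If you go the FileSystemWatcher route (an attached disk, or an Azure File share mounted via SMB), a minimal sketch might look like the following; the folder path is just a placeholder:

using System;
using System.IO;

class FolderWatcher
{
    static void Main()
    {
        // Placeholder path: a folder on an attached disk, or a mounted Azure File share (e.g. a mapped drive).
        var watcher = new FileSystemWatcher(@"D:\incoming")
        {
            EnableRaisingEvents = true
        };

        // Raised when a new file appears in the watched folder.
        watcher.Created += (sender, e) =>
        {
            Console.WriteLine("New file detected: " + e.FullPath);
            // e.g. push a message onto a queue here for the next step in the workflow.
        };

        Console.WriteLine("Watching... press Enter to exit.");
        Console.ReadLine();
    }
}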

You could create a loop that polls the root directory periodically using the CloudFileDirectory.ListFilesAndDirectories method.
https://msdn.microsoft.com/en-us/library/dn723299.aspx
You could also write a small recursive method to call this in subdirectories.
To detect differences you can build up an in-memory hash map of all files and directories. If you want something like a persistent distributed cache, you can use e.g. Redis to keep this list of files/directories. Every time you poll, any file or directory that is not already in your list is a new file or directory under the root.
You could also separate the responsibility of detection from the business logic, e.g. one worker role keeps polling the directory and writes the new files to a queue, and the consumer end (another worker role or a web service) does the processing with that information.
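As a rough sketch of that polling loop (continuing from the rootDir and storageAccount variables in the question, and using a hypothetical queue name):

using System;
using System.Collections.Generic;
using System.Threading;
using Microsoft.WindowsAzure.Storage.File;
using Microsoft.WindowsAzure.Storage.Queue;

// ...

var seenFiles = new HashSet<string>();
var queueClient = storageAccount.CreateCloudQueueClient();
var queue = queueClient.GetQueueReference("new-scan-files"); // hypothetical queue name
queue.CreateIfNotExists();

while (true)
{
    foreach (var item in rootDir.ListFilesAndDirectories())
    {
        var file = item as CloudFile;
        if (file != null && seenFiles.Add(file.Name))
        {
            // First time this file has been seen: hand it off to the next step in the workflow.
            queue.AddMessage(new CloudQueueMessage(file.Name));
        }
    }
    Thread.Sleep(TimeSpan.FromSeconds(30)); // poll interval
}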

Azure Blob Storage pushes events through Azure Event Grid. Blob storage has two event types, Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted. So instead of long polling you can simply react to the created event.
See this link for more information:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview
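For example, if you route the storage account's events to an Azure Function through an Event Grid subscription, a sketch of the handler could look like this (the function name is a placeholder, and the exact package/namespace for EventGridEvent depends on your SDK version):

using Microsoft.Azure.EventGrid.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class BlobCreatedHandler
{
    // Fires for each event delivered by the Event Grid subscription on the storage account.
    [FunctionName("BlobCreatedHandler")]
    public static void Run([EventGridTrigger] EventGridEvent eventGridEvent, ILogger log)
    {
        if (eventGridEvent.EventType == "Microsoft.Storage.BlobCreated")
        {
            // Subject looks like: /blobServices/default/containers/<container>/blobs/<name>
            log.LogInformation("Blob created: " + eventGridEvent.Subject);
            // e.g. enqueue a message here for the next step in the workflow.
        }
    }
}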

I had a very similar requirement. I used the Box application, which has a webhook feature for events occurring on files or folders, such as Add, Move, Delete, etc.
There are also some newer alternatives with Azure Automation.

I'm pretty new to Azure too, and I'm actually investigating a file-watcher type of thing myself. I'm considering something involving Azure Functions, which look like a way of triggering some code when a blob is created or updated. There's a way of specifying a name pattern too: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
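A minimal sketch of such a blob-triggered function (the container name "incoming" and the connection setting name are placeholders):

using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class IncomingFileWatcher
{
    // Runs whenever a blob is created or updated in the "incoming" container.
    [FunctionName("IncomingFileWatcher")]
    public static void Run(
        [BlobTrigger("incoming/{name}", Connection = "StorageConnectionString")] Stream blob,
        string name,
        ILogger log)
    {
        log.LogInformation("Blob detected: " + name + ", size: " + blob.Length + " bytes");
        // e.g. push a message onto a queue here for the next step in the workflow.
    }
}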

Related

Create and access a sqlite-DB on Azurestorage via storage connection string from different Application servers

I have a web app for different customer DBs which runs on several application servers (AS). The customers are each assigned to an instance on an AS.
For several of the customers, certain data is saved in additional SQLite DBs on the application servers themselves; when this kind of data is added, the web app tests whether the corresponding SQLite DB already exists on that AS and, if not, creates it using the following code:
dbFileName = "C:\\" + dbFileName;
SQLiteConnection.CreateFile(dbFileName);
using (System.Data.SQLite.SQLiteConnection con = new System.Data.SQLite.SQLiteConnection("data source=" + dbFileName))
{
using (System.Data.SQLite.SQLiteCommand com = new System.Data.SQLite.SQLiteCommand(con))
{
con.Open();
The problem is that if I assign the customer to an instance on another AS, the SQLite DB has to be created again, since that instance can't access the one on the other AS.
My idea was to create the SQLite DBs on some Azure storage where I could access them from every AS, but so far I haven't been able to access them via a SQLiteConnection.
I know my specific SAS (Shared Access Signature) and connection strings like the ones specified on https://www.connectionstrings.com/windows-azure/
but I'm not sure which part I should use for the SQLiteConnection.
Is it even possible?
The only examples of connections to Azure storage that I have found so far go via HTTP requests (like How to access Azure blob using SAS in C#), which doesn't help me. Can anybody show me a way to use this for my problem?
Please tell me if you need more information; I'm kind of bad at explaining things and at taking into account that many things aren't common knowledge...
You cannot use an Azure storage blob as a normal file system. The data source in a SQLite connection string has to be a file path or ':memory:'.
If you want to use Azure blob storage, as far as I know you can only mount it as a file system with blobfuse on a Linux OS, and even that is not 100% compatible with a normal file system.
Another choice is Azure Files, which supports the SMB protocol. You can mount it as a network drive and use it, as sketched below.
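A minimal sketch of that idea, assuming the share has already been mounted on the AS (the share name, drive letter, file name and table are all placeholders):

using System.Data.SQLite;

// The Azure File share mounted as a network drive on the AS, e.g.:
//   net use Z: \\<account>.file.core.windows.net\customerdbs /u:AZURE\<account> <storage-account-key>
// (share name "customerdbs" and drive letter Z: are hypothetical)
string dbFileName = @"Z:\customer123.sqlite";

if (!System.IO.File.Exists(dbFileName))
{
    SQLiteConnection.CreateFile(dbFileName);
}

using (var con = new SQLiteConnection("Data Source=" + dbFileName))
{
    con.Open();
    using (var com = new SQLiteCommand("CREATE TABLE IF NOT EXISTS Demo (Id INTEGER PRIMARY KEY)", con))
    {
        com.ExecuteNonQuery();
    }
}

Be aware that SQLite's own documentation warns about file-locking problems on network file systems, so sharing one database file between several application servers this way needs careful testing.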

Azure Functions V2 With Service Bus Trigger in Development Team

We have Azure Functions (V2) that have been created with the Service Bus Trigger.
[FunctionName("MyFunctionName")]
public static async Task Run(
    [ServiceBusTrigger("%MyQueueName%", Connection = "ServiceBusConnectionString")] byte[] messageBytes,
    TraceWriter log)
{
    // code to handle message
}
The queue name is defined in the local.settings.json file:
{
  "Values": {
    ...
    "MyQueueName": "local-name-of-my-queue-in-azure",
    ...
  }
}
This works quite well: when deployed, we can set the environment variables to dev-queue-name, live-queue-name, etc. for the various deployed environments that we have.
However, when more than one developer runs the app locally, the local function runners all connect to the same queue (local.settings.json is in source control, and needs to be so that we can properly maintain the environment variables), and it is random which developer's application will pick up and process a given message.
What we need is for each developer to have their own queue, but we do not want to remove the JSON config file from source control just so that each of us can maintain a different file locally (it contains other pieces of information that need updating).
How can we get each developer / computer running our application to use a unique queue name (one that is known, so that we can create the corresponding Service Bus queues in the cloud)?
You can override the setting value via environment variables. Settings specified as a system environment variable take precedence over values in the local.settings.json file. Just have each developer define an environment variable called MyQueueName.
Having said that, I think that committing local.settings.json to source control is generally not recommended. I suppose you also store your Service Bus connection string there, which means you are storing secrets in source control.
Note that the default .gitignore file for Azure Functions projects has it listed.
If you need it in source control, I would commit a version of local.settings.json with fake values for all variables, and then have each developer set the proper values locally and ignore the changes on commit (set assume-unchanged).
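A possible committed template along those lines (every value here is a placeholder; the real values would be set locally or via environment variables):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "ServiceBusConnectionString": "<set-locally-or-via-environment-variable>",
    "MyQueueName": "<developer-specific-queue-name>"
  }
}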

Initializing Service Fabric Actors using DataPackages

I am building a proof-of-concept application using Azure Service Fabric and would like to initialize a few 'demo' user actors in my cluster when it starts up. I've found a few brief articles that talk about loading data from a DataPackage, which show how to load the data itself, but nothing about how to create actors from this data.
Can this be done with DataPackages or is there a better way to accomplish this?
Data packages are just opaque directories containing whatever files you want for each deployment. Service Fabric doesn't load or process the data itself; you have to do all the heavy lifting, as only your code knows what the data means. For example, if you had a data package named "SvcData", the files in that package would be deployed during deployment. If you had a file StaticDataMaster.json in that directory, you'd be able to access it when your service runs (either in your actor, or somewhere else). For example:
// Get the data package.
var dataPkg = ServiceInitializationParameters.CodePackageActivationContext.GetDataPackageObject("SvcData");

// Fabric doesn't load the data; it just manages the package for you. The contents are opaque to Fabric.
var customDataFilePath = dataPkg.Path + @"\StaticDataMaster.json";

// TODO: read customDataFilePath, etc.
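To actually create the demo actors from that file, one option is to deserialize it and make a proxy call per entry. A rough sketch, where the DemoUser type, the IUserActor interface and the service URI are all hypothetical:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Client;
using Newtonsoft.Json;

// Hypothetical shape of the entries in StaticDataMaster.json.
public class DemoUser
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical actor interface; InitializeAsync is whatever seeding method your actor exposes.
public interface IUserActor : IActor
{
    Task InitializeAsync(string name);
}

public static class DemoUserSeeder
{
    // Call this once at startup, e.g. from RunAsync of a seeding service.
    public static async Task SeedAsync(string customDataFilePath)
    {
        var users = JsonConvert.DeserializeObject<List<DemoUser>>(File.ReadAllText(customDataFilePath));
        foreach (var user in users)
        {
            // Creating a proxy and calling a method on it activates the actor.
            var actor = ActorProxy.Create<IUserActor>(
                new ActorId(user.Id),
                new Uri("fabric:/MyApp/UserActorService")); // hypothetical application/service name
            await actor.InitializeAsync(user.Name);
        }
    }
}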

Continuous Web Job with timer trigger and Blob trigger

I have the following functions in the same WebJob console app, which uses the Azure WebJobs SDK and its extensions. The timer-triggered function queries an API endpoint for a file, does some additional work on it and then saves the file to the blob container named blahinput. The second method, ProcessBlobMessage, is supposed to identify the new blob file in blahinput and do something with it.
public static void ProcessBlobMessage(
    [BlobTrigger("blahinput/{name}")] TextReader input,
    string name,
    [Blob("foooutput/{name}")] out string output)
{
    // do something
}

public static void QueryAnAPIEndPointToGetFile([TimerTrigger("* */1 * * * *")] TimerInfo timerInfo)
{
    // download a file and save it to the blob container blahinput
}
The problem here is: when I deploy the above web job as continuous, only the timer-triggered function seems to get triggered, while the function that is supposed to identify the new file never fires. Is it not possible to have two such triggers in the same web job?
From this article: How to use Azure blob storage with the WebJobs SDK
The WebJobs SDK scans log files to watch for new or changed blobs. This process is not real-time; a function might not get triggered until several minutes or longer after the blob is created. In addition, storage logs are created on a "best efforts" basis; there is no guarantee that all events will be captured. Under some conditions, logs might be missed. If the speed and reliability limitations of blob triggers are not acceptable for your application, the recommended method is to create a queue message when you create the blob, and use the QueueTrigger attribute instead of the BlobTrigger attribute on the function that processes the blob.
Until the new blob scanning strategy is released, BlobTriggers are not reliable. The trigger is based on Azure Storage Analytics logs, which are written on a best-effort basis.
There is an ongoing GitHub issue about this, and there is also a PR regarding a new blob scanning strategy.
That said, check that you are using the latest WebJobs SDK (version 1.1.1), because there was an issue in prior versions that could lead to problems with BlobTriggers.
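As a rough sketch of the queue-based approach the quoted article recommends (the queue name blahinput-queue is just a placeholder, and this assumes the same WebJobs program as in the question):

// Timer function: after saving the downloaded file to the blahinput container (not shown),
// it enqueues the blob name so processing does not depend on blob log scanning.
public static void QueryAnAPIEndPointToGetFile(
    [TimerTrigger("* */1 * * * *")] TimerInfo timerInfo,
    [Queue("blahinput-queue")] out string queueMessage)
{
    string blobName = "downloaded-file.txt"; // placeholder: the name used when saving the blob
    // ... download the file and save it to blahinput under blobName ...
    queueMessage = blobName;
}

// Queue-triggered function: reacts to the message promptly and binds the blob by name.
public static void ProcessBlobMessage(
    [QueueTrigger("blahinput-queue")] string blobName,
    [Blob("blahinput/{queueTrigger}")] TextReader input,
    [Blob("foooutput/{queueTrigger}")] out string output)
{
    output = input.ReadToEnd(); // do something with the blob content
}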

Write a log file

I have a recent problem. I can upload files to my inetpub/wwwroot folder,
but I can't write a log file to that same folder...
I have all the permissions for the Network Service account. Everything is on my server.
// Get a reference to the web root directory.
DirectoryInfo di = new DirectoryInfo(HttpContext.Current.Server.MapPath("~/"));
// Get a reference to each file in that directory.
FileInfo[] fiArr = di.GetFiles();
string strLogText = di.FullName;
// Create a writer and open the file:
StreamWriter log;
if (!System.IO.File.Exists("C:\\inetpub\\wwwroot\\logfile.txt"))
{
    log = new StreamWriter("C:\\inetpub\\wwwroot\\logfile.txt");
}
else
{
    log = System.IO.File.AppendText("C:\\inetpub\\wwwroot\\logfile.txt");
}
// Write to the file:
log.WriteLine(DateTime.Now);
log.WriteLine(strLogText);
log.WriteLine();
// Close the stream:
log.Close();
The error is 'access is denied'!
It works locally, but on my server it doesn't. On the Inetpub folder, do I just need to allow writing for Network Service? That is strange, because I can upload files and writing is apparently already enabled.
Emged, in case of exceptions your code does not close the stream on the log file, and this is surely not good.
You should use a using statement around the stream, so that the stream is closed and disposed in any case, even when an exception is thrown.
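A minimal sketch of the same logging block with using (the path is the one from the question):

using (StreamWriter log = System.IO.File.AppendText("C:\\inetpub\\wwwroot\\logfile.txt"))
{
    // The stream is closed and disposed even if an exception is thrown while writing.
    log.WriteLine(DateTime.Now);
    log.WriteLine(strLogText);
    log.WriteLine();
}

Note that File.AppendText creates the file if it does not already exist, so the exists check in the original code is not needed.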
As Chris has suggested, I would absolutely opt for a logging framework, and I would also avoid writing to that wwwroot folder.
ELMAH, NLog and log4net are good and easy alternatives, far better than any custom logging like you are doing right now, and the big advantage of these libraries is that you can change the behaviour at runtime simply by editing the configuration file; no need to rebuild or redeploy anything.
My favourite is actually log4net; check these for a simple example of how to use it:
http://logging.apache.org/log4net/release/manual/configuration.html
Log4Net in App object?
Depending on the version of your server (Windows 2008 and above), that directory has additional protection against writes.
I'd highly recommend you look into ELMAH for your logging. It gives you a number of options, including in-memory or database-backed storage, and it collects a LOT of additional data you might want.
Further, opening up various physical directory locations for write access is a HUGE security no-no.
On the server, is the web app running under an application pool that has alternate credentials, other than the normal Network Service account? If you haven't done so already, try turning on auditing to see which user is trying to access the file.
