I'm streaming data into BigQuery with the .NET API, and I noticed in Process Explorer that new TCP/IP connections are created and torn down over and over again. Is it possible to reuse the connection and avoid the overhead of repeatedly establishing and closing connections?
public async Task InsertAsync(BaseBigQueryTable table, IList<IDictionary<string, object>> rowList,
    GetBqInsertIdFunction getInsert, CancellationToken ct)
{
    if (rowList.Count == 0)
    {
        return;
    }
    string tableId = table.TableId;
    IList<TableDataInsertAllRequest.RowsData> requestRows = rowList
        .Select(row => new TableDataInsertAllRequest.RowsData { Json = row, InsertId = getInsert(row) })
        .ToList();
    TableDataInsertAllRequest request = new TableDataInsertAllRequest { Rows = requestRows };
    bool needCreateTable = false;
    BigqueryService bqService = null;
    try
    {
        bqService = GetBigQueryService();
        TableDataInsertAllResponse response = await bqService.Tabledata
            .InsertAll(request, _account.ProjectId, table.DataSetId, tableId)
            .ExecuteAsync(ct);
        IList<TableDataInsertAllResponse.InsertErrorsData> insertErrors = response.InsertErrors;
        if (insertErrors != null && insertErrors.Count > 0)
        {
            // handling errors, removed for easier reading...
        }
    }
    catch
    {
        // ... removed for easier reading
    }
    finally
    {
        // the service (and its underlying HttpClient) is disposed after every insert
        if (bqService != null)
            bqService.Dispose();
    }
}
private BigqueryService GetBigQueryService()
{
    return new BigqueryService(new BaseClientService.Initializer
    {
        HttpClientInitializer = _credential,
        ApplicationName = _applicationName,
    });
}
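For reference, the service object wraps an HttpClient internally, so one way to keep connections alive (a sketch, not part of the original post; the _bqService caching field is new) would be to create the service once and reuse it across inserts, instead of disposing it in the finally block of every call:

private BigqueryService _bqService; // cached instance, reused across inserts

private BigqueryService GetBigQueryService()
{
    // creating the service once reuses its underlying HttpClient,
    // so TCP connections can be kept alive between requests
    return _bqService ?? (_bqService = new BigqueryService(new BaseClientService.Initializer
    {
        HttpClientInitializer = _credential,
        ApplicationName = _applicationName,
    }));
}

With this, the bqService.Dispose() call in the finally block above would have to go as well, since disposing the cached service closes its connections.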
**Follow up**
The answer given below seems to be the only way to reduce HTTP connections. However, I found that using batch requests on a large amount of live streaming data can hit some limits; see my other question on this: Google API BatchRequest: An established connection was aborted by the software in your host machine
The link below documents how to batch API calls together to reduce the number of HTTP connections your client has to make:
https://cloud.google.com/bigquery/batch
After the batch request is issued, you can get the response and parse out all the involved job IDs. Alternatively, you can preset a job ID in the batch request for each and every inner request. Note: you need to make sure those job IDs are unique.
After that you can check what is going on with each of these jobs via jobs.get: https://cloud.google.com/bigquery/docs/reference/v2/jobs/get
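As an illustration only (bqService, projectId, dataSetId, tableId, and the insertRequests collection are assumed to exist), here is a minimal sketch of queueing several requests into one BatchRequest from the Google .NET client library, so that they travel over a single HTTP connection:

var batch = new BatchRequest(bqService);
foreach (var insertRequest in insertRequests)
{
    batch.Queue<TableDataInsertAllResponse>(
        bqService.Tabledata.InsertAll(insertRequest, projectId, dataSetId, tableId),
        (response, error, index, message) =>
        {
            // each inner request gets its own callback with its own error
            if (error != null)
                Console.WriteLine($"Inner request {index} failed: {error.Message}");
        });
}
await batch.ExecuteAsync();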
Related
I'm trying to use the Microsoft.Bot.Connector.DirectLine .NET client to connect to my Direct Line Channel. My client application will have many conversations open at once (like 1000+).
What I'm trying to do is efficiently create a single Direct Line client object which can receive messages for all my conversations and NOT have a single client per conversation.
The code below is from:
https://learn.microsoft.com/en-us/azure/bot-service/bot-service-channel-directline-extension-net-client?view=azure-bot-service-4.0
The problem is that to create a new conversation I need to create a new client, which I think would eventually use up a lot of sockets. Does anyone know if I can create a single connection and then listen for multiple conversations?
Thanks
static async Task Main(string[] args)
{
    Console.WriteLine("What is your name:");
    var UserName = Console.ReadLine();

    var tokenClient = new DirectLineClient(
        new Uri(endpoint),
        new DirectLineClientCredentials(secret));
    var conversation = await tokenClient.Tokens.GenerateTokenForNewConversationAsync();

    var client = new DirectLineClient(
        new Uri(endpoint),
        new DirectLineClientCredentials(conversation.Token));

    await client.StreamingConversations.ConnectAsync(
        conversation.ConversationId,
        ReceiveActivities);

    var startConversation = await client.StreamingConversations.StartConversationAsync();
    var from = new ChannelAccount() { Id = startConversation.ConversationId, Name = UserName };

    var message = Console.ReadLine();
    while (message != "end")
    {
        try
        {
            var response = await client.StreamingConversations.PostActivityAsync(
                startConversation.ConversationId,
                new Activity()
                {
                    Type = "message",
                    Text = message,
                    From = from,
                    ChannelData = new Common.ChannelData() { FromNumber = "+17081234567" }
                });
        }
        catch (OperationException ex)
        {
            Console.WriteLine(
                $"OperationException when calling PostActivityAsync: ({ex.StatusCode})");
        }
        message = Console.ReadLine();
    }
    Console.ReadLine();
}

public static void ReceiveActivities(ActivitySet activitySet)
{
    if (activitySet != null)
    {
        foreach (var a in activitySet.Activities)
        {
            if (a.Type == ActivityTypes.Message && a.From.Id == "MyBotName")
            {
                Console.WriteLine($"<Bot>: {a.Text}");
            }
        }
    }
}
I think using the Direct Line streaming extensions would be problematic for your purposes. I'm guessing your custom SMS channel would itself be an app service. Since an app service can (and probably should, in your case) be scaled so that multiple instances are running simultaneously, suppose two SMS messages from the same conversation go to two instances of your channel. In addition to having each instance of your channel using many web sockets to talk to many bots, multiple instances of your channel may use duplicated web sockets to talk to the same bot. There's also the problem of each bot itself needing to support streaming extensions.
Rather than using Direct Line streaming extensions, you might consider using traditional Direct Line. This would involve receiving activities from the bots by polling a Direct Line endpoint, for example as sketched below.
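Here is a rough sketch of that polling approach, assuming one DirectLineClient can serve many conversations (the conversationIds list and the watermark dictionary are hypothetical bookkeeping, not part of the library):

// one client, many conversations: poll each conversation for new activities,
// remembering the watermark so only unseen activities come back next time
var client = new DirectLineClient(secret);
var watermarks = new Dictionary<string, string>();

foreach (var conversationId in conversationIds)
{
    watermarks.TryGetValue(conversationId, out var watermark);
    var activitySet = await client.Conversations.GetActivitiesAsync(conversationId, watermark);
    watermarks[conversationId] = activitySet.Watermark;

    foreach (var activity in activitySet.Activities)
    {
        Console.WriteLine($"<{activity.From.Id}>: {activity.Text}");
    }
}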
Since Direct Line is a channel itself that you'd be using on top of your own channel, you might also consider cutting out Direct Line altogether. That way you wouldn't have two channels between the user and the bot. You could send HTTP requests to each bot's endpoint directly, and the activities the bots would receive would contain the service URL for your channel, allowing your channel to receive messages from the bots.
I am using Azure Cosmos DB with a .NET Core 2.1 application, through the Gremlin driver. It works fine, but every few days it starts throwing socket exceptions on the server and we have to recycle the IIS pool. We average around 10,000 hits per day.
We are currently using the default gateway mode. Should we switch to direct mode, since it might be a firewall issue?
Here is the implementation:
private DocumentClient GetDocumentClient(CosmosDbConnectionOptions configuration)
{
    _documentClient = new DocumentClient(
        new Uri(configuration.Endpoint),
        configuration.AuthKey,
        new ConnectionPolicy());
    // create database if not exists (note: this call is not awaited)
    _documentClient.CreateDatabaseIfNotExistsAsync(new Database { Id = configuration.Database });
    return _documentClient;
}
and in Startup.cs:
services.AddSingleton(x => GetDocumentClient(cosmosDBConfig));
and here is how we communicate with Cosmos DB:
private DocumentClient _documentClient;
private DocumentCollection _documentCollection;
private CosmosDbConnectionOptions _cosmosDBConfig;

public DocumentCollectionFactory(DocumentClient documentClient, CosmosDbConnectionOptions cosmosDBConfig)
{
    _documentClient = documentClient;
    _cosmosDBConfig = cosmosDBConfig;
}

public async Task<DocumentCollection> GetProfileCollectionAsync()
{
    if (_documentCollection == null)
    {
        _documentCollection = await _documentClient.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri(_cosmosDBConfig.Database),
            new DocumentCollection { Id = _cosmosDBConfig.Collection },
            new RequestOptions { OfferThroughput = _cosmosDBConfig.Throughput });
    }
    return _documentCollection;
}
and then:
public async Task CreateProfile(Profile profile)
{
    var graphCollection = await _graphCollection.GetProfileCollectionAsync();
    var createQuery = GetCreateQuery(profile);
    IDocumentQuery<dynamic> query = _documentClient.CreateGremlinQuery<dynamic>(graphCollection, createQuery);
    if (query.HasMoreResults)
    {
        await query.ExecuteNextAsync();
    }
}
I'm assuming that for communication with Cosmos DB you are using HttpClient. The application should share a single instance of HttpClient.
Every time you make a connection after disposing an HttpClient, a bunch of connections are left behind in the TIME_WAIT state. This means the connection was closed on one side (by the OS) but is still "waiting for additional packets".
By default, Windows may hold a connection in this state for 240 seconds, and there is a limit to how quickly the OS can open new sockets. All of this may lead to a System.Net.Sockets.SocketException.
There are very good articles that explain in detail why and how this problem appears, digging into the TCP state diagram.
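A minimal sketch of that pattern (nothing Cosmos-specific, just the general shape):

public static class SharedHttp
{
    // one HttpClient for the whole application lifetime; it is thread-safe
    // for concurrent requests and pools connections internally
    public static readonly HttpClient Client = new HttpClient();
}

// usage elsewhere: var response = await SharedHttp.Client.GetAsync(someUrl);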
UPDATED
Possible solution.
You are using the default ConnectionPolicy object. That object has a property called IdleTcpConnectionTimeout which controls the amount of idle time after which unused connections are closed (this setting applies to direct/TCP connection mode). By default, idle connections are kept open indefinitely, and the value must be greater than or equal to 10 minutes.
So the code could look like:
private DocumentClient GetDocumentClient(CosmosDbConnectionOptions configuration)
{
    _documentClient = new DocumentClient(
        new Uri(configuration.Endpoint),
        configuration.AuthKey,
        new ConnectionPolicy()
        {
            IdleTcpConnectionTimeout = new TimeSpan(0, 0, 10, 0) // 10 minutes
        });
    // create database if not exists
    _documentClient.CreateDatabaseIfNotExistsAsync(new Database { Id = configuration.Database });
    return _documentClient;
}
Here is a link to ConnectionPolicy Class documentation
I want to get an alert when a service (Grafana or InfluxDB) on an Azure virtual machine (Ubuntu 16.04) has stopped. I'd like to use C# to connect to the VM and check the status of the grafana and influxdb services. Can anyone share a code sample that implements this?
Both services provide health endpoints that can be used to check their status from a remote server. There's no need to open a remote shell connection. In fact, it would be impossible to monitor large server farms if one had to SSH to each one.
In the simplest case, and ignoring networking issues, one can simply hit the health endpoints to check the status of both services. A rough implementation could look like this:
public async Task<bool> CheckBoth()
{
    var client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };
    const string grafanaHealthUrl = "https://myGrafanaURL/api/health";
    const string influxPingUrl = "https://myInfluxURL/ping";

    var (grafanaOK, grafanaError) = await CheckAsync(client, grafanaHealthUrl,
        HttpStatusCode.OK, "Grafana error");
    var (influxOK, influxError) = await CheckAsync(client, influxPingUrl,
        HttpStatusCode.NoContent, "InfluxDB error");

    if (!influxOK || !grafanaOK)
    {
        // Do something with the errors
        return false;
    }
    return true;
}

public async Task<(bool ok, string result)> CheckAsync(HttpClient client,
    string healthUrl,
    HttpStatusCode expected,
    string errorMessage)
{
    try
    {
        var status = await client.GetAsync(healthUrl);
        if (status.StatusCode != expected)
        {
            // Failure message, get it and log it
            var statusBody = await status.Content.ReadAsStringAsync();
            // Possibly log it ....
            return (ok: false, result: $"{errorMessage}: {statusBody}");
        }
    }
    catch (TaskCanceledException)
    {
        return (ok: false, result: $"{errorMessage}: Timeout");
    }
    return (ok: true, "");
}
Perhaps a better solution would be to use Azure Monitor to ping the health URLs periodically and send an alert if they are down.
Here is something you can use to connect to an Azure Linux VM over SSH in C# (the SshClient class comes from the SSH.NET library):
using (var client = new SshClient("my-vm.cloudapp.net", 22, "username", "password"))
{
    client.Connect();
    Console.WriteLine("it worked!");
    client.Disconnect();
    Console.ReadLine();
}
Usually SSH servers only allow public key auth or other two-factor auth. To allow password login, edit your /etc/ssh/sshd_config and uncomment the PasswordAuthentication line (then restart sshd):
# Change to no to disable tunnelled clear text passwords
PasswordAuthentication yes
Afterwards you can poll for the installed services, for example as shown below.
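Here is a sketch of polling a service over the same SSH session with SSH.NET's RunCommand (the unit name grafana-server is the usual one, but may differ on your VM):

using (var client = new SshClient("my-vm.cloudapp.net", 22, "username", "password"))
{
    client.Connect();
    // ask systemd whether the service is running; prints "active" or "inactive"
    var result = client.RunCommand("systemctl is-active grafana-server");
    Console.WriteLine($"grafana-server: {result.Result.Trim()}");
    client.Disconnect();
}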
As an alternative solution, you can deploy a REST API in your Linux VM that checks the status of your services, and then call it from C# with HttpClient.
Hope it helps
I'm trying to subscribe to real-time updates from Cloud Firestore in C# using Google.Cloud.Firestore.V1Beta1. I'm using the following code, which receives updates for a short time until the stream is closed. Has anyone got FirestoreClient.Listen to work?
// Create client
FirestoreClient firestoreClient = FirestoreClient.Create();

// Initialize streaming call, retrieving the stream object
FirestoreClient.ListenStream duplexStream = firestoreClient.Listen();

// Create task to do something with responses from server
Task responseHandlerTask = Task.Run(async () =>
{
    IAsyncEnumerator<ListenResponse> responseStream = duplexStream.ResponseStream;
    while (await responseStream.MoveNext())
    {
        ListenResponse response = responseStream.Current;
        Console.WriteLine(response);
    }
});

// Send requests to the server
var citiesPath = string.Format("projects/{0}/databases/{1}/documents/cities/CJThcwCipOtIEAm2tEMY", projectId, databaseId);

// Initialize a request
var dt = new DocumentsTarget { };
dt.Documents.Add(citiesPath);
ListenRequest request = new ListenRequest
{
    Database = new DatabaseRootName(projectId, databaseId).ToString(),
    AddTarget = new Target
    {
        Documents = dt
    }
};

// Stream a request to the server
await duplexStream.WriteAsync(request);

// Await the response handler.
// This will complete once all server responses have been processed.
Console.WriteLine("Awaiting responseHandlerTask");
await responseHandlerTask;
Edit 1:
I've tried setting the expiration explicitly to never expire, but still no luck: I get about 5 minutes in, then receive a RST_STREAM.
//Setup no expiration for the listen
CallSettings listenSettings = CallSettings.FromCallTiming(CallTiming.FromExpiration(Expiration.None));
// Initialize streaming call, retrieving the stream object
FirestoreClient.ListenStream duplexStream = firestoreClient.Listen(listenSettings);
Edit 2:
It seems like a bit of a kludge, but I found it works to keep track of the last resume token, catch the exception, then restart the request with that resume token. I've updated the code that makes the original request to take an optional resumeToken.
ListenRequest request = new ListenRequest
{
    Database = new DatabaseRootName(projectId, databaseId).ToString(),
    AddTarget = new Target
    {
        Documents = dt
    }
};

if (resumeToken != null)
{
    Console.WriteLine(string.Format("Resuming a listen with token {0}", resumeToken.ToBase64()));
    request.AddTarget.ResumeToken = resumeToken;
}

// Stream a request to the server
await duplexStream.WriteAsync(request);
It's not perfect, but I think it's the way Google implemented it in Node.js. It does result in an API call every 5 minutes, so there is some expense to it. Maybe that's why it works this way?
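For reference, here is a rough sketch of the retry loop around this (the TargetChange.ResumeToken field comes from the v1beta1 protos; BuildListenRequest is a hypothetical helper wrapping the request construction shown above):

ByteString resumeToken = null;
while (true)
{
    try
    {
        FirestoreClient.ListenStream stream = firestoreClient.Listen();
        await stream.WriteAsync(BuildListenRequest(resumeToken));
        IAsyncEnumerator<ListenResponse> responses = stream.ResponseStream;
        while (await responses.MoveNext())
        {
            ListenResponse response = responses.Current;
            // remember the latest resume token so a new listen can pick up where this one left off
            if (response.TargetChange != null && !response.TargetChange.ResumeToken.IsEmpty)
            {
                resumeToken = response.TargetChange.ResumeToken;
            }
        }
    }
    catch (Grpc.Core.RpcException)
    {
        // the stream was reset (e.g. RST_STREAM after ~5 minutes); loop around and resume
    }
}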
Thanks
Until Jon finishes the official support, you can use something I put together if you need it right away: https://github.com/cleversolutions/FirebaseDotNetRamblings/blob/master/FirebaseDocumentListener.cs. It's an extension method you can drop into your project and use like this:
// Create our database connection
FirestoreDb db = FirestoreDb.Create(projectId);

// Create a query
CollectionReference collection = db.Collection("cities");
Query qref = collection.Where("Capital", QueryOperator.Equal, true);

// Listen to realtime updates
FirebaseDocumentListener listener = qref.AddSnapshotListener();

// Listen to document changes
listener.DocumentChanged += (obj, e) =>
{
    var city = e.DocumentSnapshot.Deserialize<City>();
    Console.WriteLine(string.Format("City {0} Changed/Added with pop {1}", city.Name, city.Population));
};
I am currently trying to get a lot of data about video games out of Wikipedia using their public API. I've gotten some of the way: I can currently get all the page IDs I need with their associated article titles. But then I need to get their unique identifiers (Qxxxx, where the x's are numbers), and that takes quite a while... possibly because I have to make a single query for every title (there are 22031 of them), or because I don't understand Wikipedia queries.
So I thought, "Why not just make multiple queries at once?" and started working on that, but I've run into the issue in the title. The program runs for a while (usually 3-4 minutes), then about a minute passes, and then the application crashes with the error in the title. I think it's because my approach is just bad:
ConcurrentBag<Entry> entrybag = new ConcurrentBag<Entry>(entries);
Console.WriteLine("Getting Wikibase Item Ids...");
Parallel.ForEach<Entry>(entrybag, (entry) =>
{
    entry.WikibaseItemId = GetWikibaseItemId(entry).Result;
});
Here is the method that is called:
async static Task<String> GetWikibaseItemId(Entry entry)
{
    // note: a new HttpClient (and therefore a new connection) is created for every request
    using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
    {
        client.BaseAddress = new Uri("https://en.wikipedia.org/w/api.php");
        // String.Replace returns a new string, so the result must be assigned
        entry.Title = entry.Title.Replace("+", "Plus").Replace("&", "and");
        String queryString = "?action=query&prop=pageprops&ppprop=wikibase_item&format=json&redirects=1&titles=" + entry.Title;
        HttpResponseMessage response = await client.GetAsync(queryString);
        response.EnsureSuccessStatusCode();
        String result = await response.Content.ReadAsStringAsync();
        dynamic deserialized = JsonConvert.DeserializeObject(result);
        String data = deserialized.ToString();
        try
        {
            if (data.Contains("wikibase_item"))
            {
                return deserialized["query"]["pages"]["" + entry.PageId + ""]["pageprops"]["wikibase_item"].ToString();
            }
            else
            {
                return "NONE";
            }
        }
        catch (RuntimeBinderException)
        {
            return "NULL";
        }
        catch (Exception)
        {
            return "ERROR";
        }
    }
}
And just for good measure, here is the Entry class:
public class Entry
{
    public EntryCategory Category { get; set; }
    public int PageId { get; set; }
    public String Title { get; set; }
    public String WikibaseItemId { get; set; }
}
Could anyone perhaps help out? Do I just need to change how I query or something else?
Initiating roughly 22000 HTTP requests in parallel from one process is just too much. If your machine had unlimited resources and unlimited internet bandwidth, this would come close to a denial-of-service attack.
What you see is either TCP/IP port exhaustion or queue contention. To resolve it, process your array in smaller chunks: for example, fetch 10 items, process those in parallel, wait for them to finish, then fetch the next ten, and so on (see the sketch after the quote below).
Specifically, Wikimedia sites have a recommendation to process requests serially:
There is no hard and fast limit on read requests, but we ask that you be considerate and try not to take a site down. Most sysadmins reserve the right to unceremoniously block you if you do endanger the stability of their site.
If you make your requests in series rather than in parallel (i.e. wait for the one request to finish before sending a new request, such that you're never making more than one request at the same time), then you should definitely be fine.
Be sure to check their API terms of service to learn whether and how many parallel requests would be in compliance.
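A sketch of the chunked variant, reusing the asker's Entry class and GetWikibaseItemId method:

// process the entries ten at a time: run one chunk in parallel,
// wait for it to complete, then move on to the next chunk
static async Task ProcessInChunksAsync(IList<Entry> entries, int chunkSize = 10)
{
    for (int i = 0; i < entries.Count; i += chunkSize)
    {
        var tasks = entries
            .Skip(i)
            .Take(chunkSize)
            .Select(async entry => { entry.WikibaseItemId = await GetWikibaseItemId(entry); });
        await Task.WhenAll(tasks);
    }
}

Sharing a single HttpClient across requests, instead of creating one per request as the current code does, also helps avoid port exhaustion.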