Limiting number of calls using Yahoo YQL in C#

I'm a bit new to C# and I'm running into a problem with YQL limiting the number of calls to 10,000 an hour. I keep getting a temporary ban every time I try to run my app. I read that Yahoo has a limit of 10,000 calls per hour, but I'm a little confused about what exactly constitutes a "call." The code I'm using to get the XML from YQL is below:
public static string getXml(string sSymbol)
{
    // One HTTP request per symbol -- this is what burns through the hourly quota.
    XDocument doc = XDocument.Load("http://www.google.com/ig/api?stock=" + sSymbol);
    string xmlraw = doc.ToString();
    string xml = xmlraw.Replace("'", "");   // strip apostrophes before storing
    return xml;
}
Where sSymbol is a value returned from my SQL database. I have roughly 2,000 stocks in my database. I have also read that some people say the limit is 1,000 calls per hour, so I may have misunderstood what I was reading.
The question I guess is two-fold: what constitutes a call?
How can I avoid the rate limit if I want to download each of the 2,000 quotes every hour? Is it as simple as asking Yahoo for 200 quotes per Load and calling Load 10 times?

In this case a call is a request. If you want to make single-stock requests you need 2,000 calls. Fortunately you can make one call requesting more than one stock, just as you can with Yahoo. For example:
http://www.google.com/ig/api?stock=MSFT&stock=IBM
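A minimal sketch of batching symbols this way in C#, assuming the endpoint accepts repeated stock parameters as shown above (the 200-per-request chunk size is taken from the question's own suggestion, not a documented limit):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class QuoteFetcher
{
    // Builds one URL per chunk of symbols, e.g. ...?stock=MSFT&stock=IBM,
    // so 2,000 symbols at 200 per request becomes 10 calls instead of 2,000.
    public static IEnumerable<XDocument> GetQuotes(IEnumerable<string> symbols, int chunkSize = 200)
    {
        var list = symbols.ToList();
        for (int i = 0; i < list.Count; i += chunkSize)
        {
            var chunk = list.Skip(i).Take(chunkSize);
            string url = "http://www.google.com/ig/api?stock=" + string.Join("&stock=", chunk);
            yield return XDocument.Load(url);   // one HTTP request per chunk
        }
    }
}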

Related

Bigquery internalError when streaming data

I'm getting the following error while streaming data:
Google.Apis.Requests.RequestError
Internal Error [500]
Errors [
Message[Internal Error] Location[ - ] Reason[internalError] Domain[global]
]
My code:
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
    try
    {
        TabledataResource t = s.Tabledata;
        TableDataInsertAllRequest req = new TableDataInsertAllRequest()
        {
            Kind = "bigquery#tableDataInsertAllRequest",
            Rows = data
        };
        // projectId is a field on the containing class.
        TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
        if (response.InsertErrors != null)
        {
            // The response reports per-row insert errors.
            return true;
        }
    }
    catch (Exception)
    {
        throw;   // rethrow without resetting the stack trace
    }
    return false;
}
I'm streaming data constantly and many times a day I have this error. How can I fix this?
We've seen several problems:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows fail, not the whole payload)
some other error messages are non-descriptive and so vague that they don't help you; just retry.
We see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option is exponential backoff with retry; even the support team told us to do so. Also, the failure rate fits within the 99.9% uptime we have in the SLA, so there is no ground for objection.
There's something to keep in mind regarding the SLA: it's a very strictly defined structure, and the details are here. The 99.9% is uptime, which does not translate directly into a failure rate. What this means is that if BQ has a 30-minute downtime one month, and you do 10,000 inserts within that window but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-and-retry setup. Technically, you should experience on average about 1 failed insert per 1,000 if you insert throughout the month and have a proper retry mechanism in place.
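A minimal sketch of such a backoff-and-retry wrapper in C# (the attempt count and delays are arbitrary choices, not Google recommendations; the InsertAll call in the usage comment refers to the method from the question):

using System;
using System.Threading;

public static class Retry
{
    // Retries an action with exponential backoff: wait 1s, 2s, 4s, ... between attempts.
    public static T WithBackoff<T>(Func<T> action, int maxAttempts = 5)
    {
        int delayMs = 1000;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts)
                    throw;                 // give up after the last attempt
                Thread.Sleep(delayMs);     // back off before retrying
                delayMs *= 2;
            }
        }
    }
}

// Usage, wrapping the InsertAll method from the question:
// bool hadErrors = Retry.WithBackoff(() => InsertAll(service, datasetId, tableId, rows));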
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
About timing: since streaming has a limited payload size (see the Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
If the approach you've chosen takes hours, it does not scale and won't scale. You need to rethink the approach with asynchronous processes that can retry.
Processing IO-bound or CPU-bound tasks in the background is now a common practice in most web applications. There's plenty of software to help build background jobs, some based on a messaging system like Beanstalkd.
Basically, you need to distribute insert jobs across a closed network, prioritize them, and consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd lets you organize jobs into tubes, each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, say a JSON representation of the row. This was a killer feature for our use case. So we have an API which gets the rows and places them on a tube; this takes just a few milliseconds, so you can achieve a fast response time.
On the other side, you now have a bunch of jobs on some tubes. You need an agent. An agent/consumer can reserve a job.
It also helps you with job management and retries: when a job is successfully processed, a consumer can delete the job from the tube. In the case of failure, the consumer can bury the job. The job will not be pushed back onto the tube, but will be available for further inspection.
A consumer can also release a job; Beanstalkd will push the job back onto the tube and make it available to another client.
Beanstalkd clients can be found in most common languages, and a web interface can be useful for debugging.
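A rough sketch of that produce/reserve/delete/bury flow; IBeanstalkClient and its members here are hypothetical placeholders for whichever .NET Beanstalkd client you choose, not a real library API:

using System;

// Hypothetical client interface standing in for a real Beanstalkd library.
public interface IBeanstalkClient
{
    void Watch(string tube);               // select the tube to consume from
    void Put(string tube, string jobBody); // producer: enqueue a job (e.g. a JSON row)
    Job Reserve();                         // consumer: take the next job
    void Delete(Job job);                  // success: remove the job from the tube
    void Bury(Job job);                    // failure: park the job for later inspection
    void Release(Job job);                 // give the job back to the tube for another worker
}

public class Job { public string Body; }

public class InsertWorker
{
    // Consumer loop: reserve a row job, try to stream it to BigQuery,
    // delete on success, bury on failure so it can be inspected later.
    public void Run(IBeanstalkClient client, Func<string, bool> insertRow)
    {
        client.Watch("bigquery-rows");
        while (true)
        {
            Job job = client.Reserve();
            if (insertRow(job.Body))
                client.Delete(job);
            else
                client.Bury(job);
        }
    }
}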

Batch get items using Parse API times out

I have an array of Object Ids which I need to retrieve from Parse. The size of the array varies greatly, and sometimes there are duplicates. Up until now, I've been prototyping, so I would use
string[] objectIds = new [] { "xT6...
...WhereContainedIn("objectId", objectIds);
And this would work okay. In real life, though, the size of the objectId array above can reach in the hundreds, and the query returns "operation was slow and timed out". I really have two questions here:
1) There has to be a better way to retrieve an array of objects, if you know the object Ids, but I couldn't find it. Is WhereContainedIn() the only solution here?
2) Are there any guidelines for how/when queries will simply fail? The documentation only mentions a limit of 1000 items to be retrieved, and nothing about the query going in. If it turns out that this query has to be batched, that would be okay, but there are no guidelines for batching, either.
I have never used (or even heard of) Parse, but reading through the documentation I found this text about the limit; maybe it will help:
"You can limit the number of results by calling Limit. By default, results are limited to 100, but anything from 1 to 1000 is a valid limit:"
https://www.parse.com/docs/dotnet_guide#queries-constraints
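One way around the timeout, sketched below, is to split the objectId array into smaller batches and issue one WhereContainedIn query per batch (this assumes the Parse .NET SDK's ParseObject.GetQuery/WhereContainedIn/FindAsync calls; the 100-per-chunk size is a guess rather than a documented threshold):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Parse;

public static class BatchFetch
{
    // Fetches objects in chunks so no single WhereContainedIn query gets too large.
    public static async Task<List<ParseObject>> GetByIds(
        string className, IEnumerable<string> objectIds, int chunkSize = 100)
    {
        var ids = objectIds.Distinct().ToList();   // duplicates add nothing
        var results = new List<ParseObject>();
        for (int i = 0; i < ids.Count; i += chunkSize)
        {
            var chunk = ids.Skip(i).Take(chunkSize);
            var query = ParseObject.GetQuery(className)
                                   .WhereContainedIn("objectId", chunk);
            results.AddRange(await query.FindAsync());
        }
        return results;
    }
}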

RightFax C# through RFCOMAPILib - Attachments

I'm trying to send faxes through RightFax in an efficient manner.
My users need to fax PDFs, and even though the application is working fine, it is very slow for bulk sending (> 20 recipients, taking about 40 seconds per fax).
// Fax created
fax.Attachments.Add(@"C:\Test Attachments\Products.pdf", BoolType.False);
fax.Send();
RightFax has this concept of Library Documents, so what I thought we could do was store the PDF as a Library Document on the server and then reuse it, so there is no need to upload this PDF for n users.
I can create Library Documents without problems (I can retrieve them, etc.), but how do I add a PDF to this? (I have rights on the server.)
LibraryDocument doc2 = server.LibraryDocuments.Create;
doc2.Description = "Test Doc 1";
doc2.ID = "568"; // tried ints everything!
doc2.IsPublishedForWeb = BoolType.True;
doc2.PageCount = 2;
doc2.Save();
Also, once I have created a fax, the API gives you the option to "StoreAsNewLibraryDocument", which throws an exception when run: System.ArgumentException: Value does not fall within the expected range.
fax.StoreAsNewLibraryDocument("PRODUCTS","the products");
What matters for us is how to send, say, 500 faxes in the most efficient way possible using the API through RFCOMAPILib. I think that if we can reuse the attached PDF, it would greatly improve performance. Clearly, taking 40 seconds to send a fax is unacceptable when you have hundreds of recipients.
How do we send faxes with attachments in the most efficient mode through the API?
StoreAsNewLibraryDocument() is the only practical way to store LibraryDocuments using the RightFax COM API, but assuming you're not using a pre-existing LibraryDocument, you have to call the function immediately after sending the first fax, which will have a regular file (not LibraryDoc) attachment.
(Don't create a LibraryDoc object on the server yourself, as you do above - you'd only do that if you have an existing file on the server that isn't a LibraryDocument, and you want to make it into one. You'll probably never encounter such a scenario.)
The new LibraryDocument is then referenced (in subsequent fax attachments) by the ID string you specify as the first argument of StoreAsNewLibraryDocument(). If that ID isn't unique to the RightFax Server's LibraryDocuments collection, you'll get an error. (You could use StoreAsLibraryDocumentUpdate() instead, if you want to actually replace the file on the server.) Also, remember to always specify the AttachmentType.
In theory, this should be all you really have to do:
// First fax:
fax.Attachments.Add(@"C:\Test Attachments\Products.pdf", BoolType.False);
fax.Attachments.Item(1).AttachmentType = AttachmentType.aFile;
fax.Send();
fax.StoreAsNewLibraryDocument("PRODUCTS", "The Products");
server.LibraryDocuments("PRODUCTS").IsPublishedForWeb = BoolType.True;
// And for all subsequent faxes:
fax.Attachments.Add(server.LibraryDocuments("PRODUCTS"));
fax.Attachments.Item(1).AttachmentType = AttachmentType.aLibraryDocument;
fax.Send();
The reason I say "in theory" is because this doesn't always work. Sometimes when you call StoreAsNewLibraryDocument() you end up with a LibraryDoc with a PageCount of zero. This happens seemingly at random, and is probably due to a bug in RightFax, or possibly a server misconfiguration. So it's a very good idea to check for...
server.LibraryDocuments("PRODUCTS").PageCount == 0
...before you send any of the subsequent faxes, and if necessary retry until it works, or (if it won't) store the LibraryDoc some other way and give up on StoreAsNewLibraryDocument().
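A minimal sketch of that guard, reusing the objects from the snippet above (the recovery branch is left as a comment because the right fallback depends on your setup):

// Verify the stored LibraryDocument before relying on it for the bulk send.
if (server.LibraryDocuments("PRODUCTS").PageCount == 0)
{
    // The store silently failed: retry the store under a fresh ID (IDs must be
    // unique), or fall back to attaching the local PDF for every fax.
}
else
{
    // Safe to reuse the LibraryDocument for all remaining faxes.
    fax.Attachments.Add(server.LibraryDocuments("PRODUCTS"));
    fax.Attachments.Item(1).AttachmentType = AttachmentType.aLibraryDocument;
    fax.Send();
}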
Whereas, if you don't have that problem, you can usually send a mass fax in about 1/10th of the time it takes when you attach (and upload) the local file each time.
If someone from OpenText/RightFax reads this and can explain why StoreAsNewLibraryDocument() sometimes results in zero-page faxes, an additional answer about that would be appreciated quite a bit!

real time stock quotes, StreamReader performance optimization

I am working on a program that extracts real-time quotes for 900+ stocks from a website. I use HttpWebRequest to send an HTTP request to the site, store the response in a stream, and open it using the following code:
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream ();
StreamReader reader = new StreamReader(stream);
The received HTML is large (5,000+ lines), so it takes a long time to parse it and extract the price. For 900 files, parsing and extracting take about 6 minutes, which my boss isn't happy with; he told me he wants the whole process done in TWO minutes.
I've identified that the part of the program that takes most of the time is the parsing and extracting. I've tried to optimize the code to make it faster; the following is what I have now after some optimization:
// skip lines at the top
for(int i=0;i<1500;++i)
reader.ReadLine();
// read the line that contains the price
string theLine = reader.ReadLine();
// ... extract the price from the line
Now it takes about 4 minutes to process all the files, but there is still a significant gap from what my boss is expecting. So I am wondering, is there any other way I can further speed up the parsing and extracting and have everything done within 2 minutes?
I was doing HTML screen scraping for a while with stock quotes, but I found that Yahoo offers a great, simple web service that is much better than loading websites.
http://www.gummy-stuff.org/Yahoo-data.htm
With this service you can request up to 100 stock quotes in a single request and it returns a csv formatted response with one line for every symbol. You can set what columns you want returned in the query string of the request. I built a small program that would query the service once a day for every stock in the stock market to get prices. It seemed to work well for me and was way faster than hitting websites for the data.
An example querystring would be
http://finance.yahoo.com/d/quotes.csv?s=GE&f=nkqwxyr1l9t5p4
Which returns text of
"GENERAL ELEC CO",32.98,"Jun 26","21.30 - 32.98","NYSE",2.66,"Jul 25",28.55,"Jul 3","-0.21%"
for(int i=0;i<1500;++i)
reader.ReadLine();
This in particular is not good. ReadLine reads the whole line and allocates a string that nobody uses: extra work for the GC. Read byte-by-byte and watch for the line-ending bytes 0x0D 0x0A (\r\n).
Then don't use StreamReader at all! It adds overhead; read from the stream directly.
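A sketch of that byte-level approach, counting line feeds straight off the response stream so nothing is allocated for the skipped lines (whether it actually beats StreamReader is something you would have to measure):

using System.IO;
using System.Text;

public static class LineSkipper
{
    // Reads raw bytes and discards everything before the target line, so no strings
    // are allocated for the lines we don't care about. Assumes ASCII content.
    public static string ReadLineAfterSkipping(Stream stream, int linesToSkip)
    {
        int seen = 0, b;
        while (seen < linesToSkip && (b = stream.ReadByte()) != -1)
        {
            if (b == 0x0A) seen++;                 // \n ends a line
        }
        var sb = new StringBuilder();
        while ((b = stream.ReadByte()) != -1 && b != 0x0A)
        {
            if (b != 0x0D) sb.Append((char)b);     // skip \r
        }
        return sb.ToString();
    }
}

// Usage, matching the 1500-line offset from the question:
// string theLine = LineSkipper.ReadLineAfterSkipping(stream, 1500);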
Hard to see how this is possible; StreamReader is blindingly fast compared to HttpWebRequest. Some basic assumptions: say you are downloading 900 files with 5,000 lines of 100 chars each in 6 minutes. That means you need to download 900 x 5000 x 100 = 450 megabytes. In 6 minutes, that requires a bandwidth of 450E6 / 6 / 60 * 8 = 10 Mbps.
What do you have? 10 Mbps is about typical for high-speed Internet service, although you need a server that can sustain this. To get it down to 2 minutes, you'll need to upgrade your service to 30 Mbps. Your boss can fix that.
About the speed improvement you saw: watch out for the cache.
If you really need to have real-time data fast then you should subscribe to the data feeds rather than scrape them off a site.
Alternatively, isn't there some token that you can search for to find the field/data pair(s) you need?
4 minutes sounds ridiculously long for reading in 900 files.

Techniques to make autocomplete on website more responsive

In my website's advanced search screen there are about 15 fields that need an autocomplete field.
Their content is all depending on each other's value (so if one is filled in, the other's content will change depending on the first's value).
Most of the fields have a huge amount of possibilities (1000's of entries at least).
Currently I make an AJAX call if the user stops typing for half a second. This AJAX call makes a quick call to my Lucene index and returns a bunch of JSON objects. The method itself is really fast, but it's the connection and the transfer of data that is too slow.
If I look at other sites (say Facebook), their autocomplete is instant. I figure they put the possible values in their HTML, so they don't have to do a round trip. But I fear that with the amount of data I'm handling, this is not an option.
Any ideas?
Return only the top X results.
Get some trends about what users are picking, and order based on that, preferably automatically.
Cache results for every URL & keystroke combination, so that you don't have to round-trip if you've already fetched the result before. Share this cache with all autocompletes that use the same URL & keystroke combination.
Of course, enable gzip compression for the JSON, and ensure you're setting your cache headers to cache for some time. The time depends on your rate of change of autocomplete responses.
Optimize the JSON to send down the bare minimum. Don't send down anything you don't need.
Are you returning ALL results for the possibilities or just the top 10 as JSON objects?
I notice a lot of people send large numbers of results back to the screen but then only show the first few. By sending back a small number of results, you can reduce the data transfer.
Return the top "X" results, rather than the whole list, to cut back on the number of options? You might also want to try and put in some trending to track what users pick from the list so you can try and make the top "X" the most used/most relvant. You could always return your most relevant list first, then return the full list if they are still struggling.
In addition to limiting the result set to a top X, consider enabling caching on the responses of the AJAX requests (which means using GET and keeping the URL simple).
It's amazing how often users will backspace and then end up retyping exactly the same content. By allowing public and server-side caching you could also speed up the overall round-trip time.
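For example, in classic ASP.NET you could mark the autocomplete GET response as publicly cacheable (a sketch; the 60-second lifetime and the "q" parameter name are arbitrary assumptions):

using System;
using System.Web;

public static class AutocompleteCaching
{
    // Call before writing the JSON response of the autocomplete GET endpoint.
    public static void MarkCacheable(HttpResponse response, int seconds = 60)
    {
        response.Cache.SetCacheability(HttpCacheability.Public);        // allow browser + proxy caching
        response.Cache.SetExpires(DateTime.UtcNow.AddSeconds(seconds));
        response.Cache.SetMaxAge(TimeSpan.FromSeconds(seconds));
        response.Cache.VaryByParams["q"] = true;   // cache per query value ("q" is an assumed parameter name)
    }
}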
Cache the results in System.Web.Cache
Use a Lucene cache
Use GET not POST as IE caches this
Only grab a subset of results (10 as people suggest)
Try a decent 3rd party autocomplete widget like the YUI one
Returning the top-N entries is a good approach. But if you want/have to return all the data, I would try and limit the data being sent and the JSON object itself.
For instance:
"This Here Company With a Long Name" becomes "This Here Company..." (you put the dots in the name client side--again; transfer a minimum of data).
And as far as the JSON object goes:
{n: "This Here Company", v: "1"}
... Where "n" would be the name and "v" would be the value.
