How to upload data to CloudSearch programmatically in .NET / C#

I am attempting to use CloudSearch in lieu of SQL-based full-text indexing, but I have had little luck so far. The API documentation is just horrendous, with almost no examples and no mention of using the SDK to do it. All they provide are some shoddy command-line scripts.
My use case is that I am decompiling an ALD file and need to store the resulting text data up there. The only documented methods involve using the command line or the web console, which won't do, seeing as I have tens of thousands of documents to manage. Surely there is a way I can pass it an index and some text data via the C# SDK.

You are right: there is not much sample code, and I am not a C# programmer, so I won't try to write it out in full. But to point you in the right direction, it seems you just need to instantiate an UploadDocumentsRequest object, populate its Documents property, and pass it to AmazonCloudSearchDomainClient.UploadDocuments.
The upload API is documented here: http://docs.aws.amazon.com/sdkfornet/v3/apidocs/Index.html
The request object is documented here: http://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/CloudSearchDomain/TCloudSearchDomainUploadDocumentsRequest.html
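I can't test this myself, but based on those docs a minimal sketch might look like the following. The endpoint URL, document id, and field names are placeholders; CloudSearch expects the payload as a JSON batch of add/delete operations:

using System;
using System.IO;
using System.Text;
using Amazon.CloudSearchDomain;
using Amazon.CloudSearchDomain.Model;

class Program
{
    static void Main()
    {
        // Each search domain has its own document endpoint - this URL is a placeholder.
        var config = new AmazonCloudSearchDomainConfig
        {
            ServiceURL = "https://doc-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com"
        };
        var client = new AmazonCloudSearchDomainClient(config);

        // CloudSearch takes a JSON batch of add/delete operations.
        string batch = @"[{""type"": ""add"", ""id"": ""doc1"",
            ""fields"": {""title"": ""Example"", ""body"": ""Decompiled text here""}}]";

        var request = new UploadDocumentsRequest
        {
            ContentType = ContentType.ApplicationJson,
            Documents = new MemoryStream(Encoding.UTF8.GetBytes(batch))
        };

        UploadDocumentsResponse response = client.UploadDocuments(request);
        Console.WriteLine(response.Status);
    }
}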

I ended up using the Comb wrapper to handle the uploading, as it handles everything pretty neatly in .NET. I'm fairly sure it uses the methods described by dotcomly under the hood.

Raster Band Calculations in C# using gdal_calc

I have two raster files in JP2 format. I need to combine the two and perform a calculation on the bands. Is there any way to do this in .NET and C#? Most references I can find use GDAL's gdal_calc utility from Python.
I have tried the Gdal.Core and Gdal.Core.WindowsRuntime packages, but I don't see any wrapper for the Calculate call. Has anyone attempted this before, and, if so, how did you manage to make the call, or what library did you use?
Thanks,
In C#, as far as I know, you have to do it manually: open both datasets, get the bands you need, run your calculation on them, then create a new output file and write each piece of new data to a different band.
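In case it's useful, here is a rough, untested sketch of that manual approach using the C# bindings. The file names and the band arithmetic are placeholders, and whether JP2 opens depends on the drivers in your GDAL build:

using OSGeo.GDAL;

class BandCalc
{
    static void Main()
    {
        Gdal.AllRegister();

        // Hypothetical file names; JP2 support depends on your GDAL build.
        Dataset dsA = Gdal.Open("a.jp2", Access.GA_ReadOnly);
        Dataset dsB = Gdal.Open("b.jp2", Access.GA_ReadOnly);
        int width = dsA.RasterXSize;
        int height = dsA.RasterYSize;

        // Read band 1 of each dataset into memory (block-wise reading
        // is advisable for very large rasters).
        double[] a = new double[width * height];
        double[] b = new double[width * height];
        dsA.GetRasterBand(1).ReadRaster(0, 0, width, height, a, width, height, 0, 0);
        dsB.GetRasterBand(1).ReadRaster(0, 0, width, height, b, width, height, 0, 0);

        // The actual calculation - a normalized difference, as an example.
        double[] result = new double[width * height];
        for (int i = 0; i < result.Length; i++)
        {
            double sum = a[i] + b[i];
            result[i] = sum == 0 ? 0 : (a[i] - b[i]) / sum;
        }

        // Write the result to a new single-band GeoTIFF, copying the
        // georeferencing from the first input.
        Driver gtiff = Gdal.GetDriverByName("GTiff");
        Dataset dsOut = gtiff.Create("out.tif", width, height, 1, DataType.GDT_Float32, null);
        double[] geoTransform = new double[6];
        dsA.GetGeoTransform(geoTransform);
        dsOut.SetGeoTransform(geoTransform);
        dsOut.SetProjection(dsA.GetProjection());
        dsOut.GetRasterBand(1).WriteRaster(0, 0, width, height, result, width, height, 0, 0);
        dsOut.FlushCache();
    }
}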
There are some examples in the GDAL/OGR In CSharp page here:
https://trac.osgeo.org/gdal/browser/trunk/gdal/swig/csharp/apps
For rasters you'll want to read GDALReadDirect.cs and GDALDatasetRasterIO.cs carefully.
If what you want to do really does have a simpler solution in Python, I would do that instead.
GIS Stack Exchange is a good place to ask questions on these topics.

Sending control keys with DDeltaSolution UIDeskAutomationSpy

I have been using DDeltaSolution's UIDeskAutomationSpy to enhance some of my Coded UI testing, originally based on the MS Coded UI Test (CUIT) framework.
However, there is very limited documentation, and even after using dotPeek to inspect the internals of the UIDeskAutomationSpy exe and associated DLL, I can't see how to send control keys (Shift/Control/Alt) to a component.
There are two relevant methods
SendKeys()
SimulateSendKeys()
but both just take a string as input.
I've even gone as far as considering Mono.Cecil to modify the binaries (is this possible?), but that is a desperate measure. Does anyone know any better, or know of any better documentation?
This is a surprisingly powerful tool, but no-one seems to have heard about it.
I'm not positive, but if I were you I'd try the standard .NET SendKeys control-key strings, since UIDeskAutomationSpy is built on .NET. Let me know if it works!
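For reference, here is the standard System.Windows.Forms.SendKeys syntax; whether UIDeskAutomationSpy's SendKeys()/SimulateSendKeys() honour the same conventions is an assumption you'd need to verify:

using System.Windows.Forms;

class Demo
{
    static void SendShortcuts()
    {
        // In System.Windows.Forms.SendKeys syntax:
        //   +  = Shift,   ^  = Ctrl,   %  = Alt
        SendKeys.SendWait("^c");        // Ctrl+C
        SendKeys.SendWait("+{END}");    // Shift+End
        SendKeys.SendWait("%{F4}");     // Alt+F4
        SendKeys.SendWait("^+{HOME}");  // Ctrl+Shift+Home
    }
}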

Any tips on how I would go about extracting Pandora likes and putting them in a spreadsheet? (C++/C#)

I'm fairly new to coding and want a project to work on that could help me advance my skills. I'm not sure what language would be best for this sort of undertaking, but I would definitely prefer to use C++ or C#.
For the first part of the program, I would basically like to take all my Pandora likes and put them in a spreadsheet, with the song name in one column and the artist in the other. I don't see the formatting being too hard once I actually get the data I need, but I'm not really sure how to communicate with a server at this point in time. I'm guessing I won't be able to grab a raw list of likes, so I'm thinking my best course of action will be to first expand the likes list all the way, and then read the text on the screen or in the source code.
For the first step, expanding my likes, I found the HTML element that actually does this:
<div class="show_more tracklike" data-nextLikeStartIndex="0" data-nextThumbStartIndex="5">Show more</div>
Not sure if this is something I can work with, but I was thinking that if I could set data-nextThumbStartIndex="5" to be equal to the number of likes minus 5 (the amount it shows by default), it would be fairly easy to expand the list. If not, I would probably have to click the "show more" link repeatedly until I have all the likes on the page.
For the next step, getting the data I want, I think my best option would be to grab the text that I physically see on the screen and worry about filtering and manipulating the data afterwards. The other option is looking at the source code, where I actually found the pieces of markup the info I want is stored in. If I could retrieve the page's source code, I think it would be relatively easy to pick out the data I actually want.
So yeah, that's about it. I know I'm pretty new at this, and what I'm saying is probably wrong and/or much more complicated than I think, but I'm a quick learner, and at the very least, if someone could point me in the right direction for communicating with a server, that would be much appreciated.
This question is quite broad (and I have absolutely no knowledge of Pandora itself - I can't access it from where I live).
In general, there are several different ways to solve this type of problem:
Screen scraping - basically access the website as if you were a web browser, and from the HTML string that comes back, dig out the information you need. The problem here is that the data is not very suitable for machine reading, as it often has no distinct markers for the "reader" to locate the relevant information, and it's difficult to separate the data from the chaff.
AJAX API - "Asynchronous JavaScript and XML", where the provider of the website exposes an interface for fetching certain data into the web browser - which you can also use, of course, by "pretending" to be the web browser and requesting the same kind of information. You are relying on the website having such an interface, but if it exists, the data generally comes in a form more suitable for machine reading (typically XML, but not always).
JSON API - "JavaScript Object Notation" is a similar solution to AJAX; like XML, JSON is a format that is both human and machine readable.
The latter two are definitely preferable, as the data coming back is meant for machine reading. The drawback is that you need to have "server side cooperation". The good thing here is that Pandora does have a JSON API. The bad thing is that it seems to be hard to use... Here's one discussion on the subject:
Making JSON calls to Unoffical Pandora API
The main principle here is that you send some stuff to the webserver, and receive a reply with the requested information. Exactly how this is done depends on the language/programming environment. A popular C++ solution is libcurl.
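On the C# side (since the question allows it), the role libcurl plays in C++ is covered by WebClient or HttpWebRequest. A minimal sketch - the URL is a placeholder, and this ignores any login/cookie handling Pandora may require:

using System;
using System.Net;

class Fetcher
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Some sites refuse requests that lack a browser-like User-Agent.
            client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0";

            // Placeholder URL - the real likes page (and any required
            // login/cookies) would need to be worked out first.
            string html = client.DownloadString("https://www.pandora.com/your-likes-page");

            // From here you would parse the HTML (or JSON) for song/artist pairs.
            Console.WriteLine(html.Length);
        }
    }
}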
There is a Ruby client here, using the JSON interface:
https://github.com/nixme/pandora_client
A C# implementation to interface with Pandora is here:
http://pandoraunleashed.googlecode.com/svn/trunk/PandoraUnleashed/Pandora.cs
Unfortunately, I can't find any direct reference to "listing likes".

SQL 2008: returning data rows as JSON?

I think this question is like clay pigeon shooting... "pull... bang!"... shot down... but nevertheless, I believe it's worth asking.
Lots of JS frameworks etc. use JSON these days, and for good reason, I know. The classic question is where in the pipeline to transform the data to JSON.
I understand that at some point you have to convert the data to JSON, be it in the data access layer (I am looking at JSON.NET), or, I believe, via the serialization methods .NET 4.x provides.
So the question is:
Is it really a bad idea to contemplate a SQL function to output as JSON?
Qualifier:
I understand that trying to output thousands of rows like that isn't a good idea - in fact, not really a good idea for web apps either way unless you really have to.
For my requirement, I need possibly 100 rows at a time...
The answer really is: it depends.
If your application is a small one that doesn't receive much use, then by all means do it in the database. The thing to bear in mind, though, is: what happens when your application is being used by 10x as many users in 12 months' time?
If it makes it quick, simple and easy to implement JSON encoding in your stored procedures, rather than in your web code and allows you to get your app out and in use, then that's clearly the way to go. That said, it really doesn't take that much work to do it "properly" with solutions that have been suggested in other answers.
The long and short of it is, take the solution that best fits your current needs, whilst thinking about the impact it'll have if you need to change it in the future.
This is why [WebMethod] (WebMethodAttribute) exists.
Best to load the data into the program and then return it as JSON.
.NET 4 has support for returning JSON; I did it as part of one ASP.NET MVC site and it was fairly simple and straightforward.
I recommend moving the transformation out of SQL Server.
I agree with the other respondents that this is better done in your application code. However... it is theoretically possible using SQL Server's ability to host CLR assemblies in the database via the CREATE ASSEMBLY syntax. The choice is really yours. You could create an assembly to do the translation in .NET, register that assembly with SQL Server, and then use its method(s) to serialize to JSON as return values from your stored procedures...
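Purely for illustration, a SQLCLR function along those lines might look like the sketch below. Note that SQL Server only loads a restricted set of framework assemblies, so the JSON here is built by hand rather than with a serializer, and the escaping is deliberately simplistic:

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class JsonFunctions
{
    // Deploy the compiled assembly with CREATE ASSEMBLY, then bind this
    // method with CREATE FUNCTION ... AS EXTERNAL NAME.
    [SqlFunction]
    public static SqlString PairToJson(SqlString name, SqlString value)
    {
        return new SqlString(
            "{\"" + Escape(name.IsNull ? "" : name.Value) + "\":\"" +
            Escape(value.IsNull ? "" : value.Value) + "\"}");
    }

    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    }
}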
Better to load it using your standard data access technique and then convert to JSON. You can then use it in standard objects in .NET as well as in your client-side JavaScript.
If you're using ASP.NET MVC, you serialize your results in your controllers and return a JsonResult; there's a method, Controller.Json(), that does this for you. If you're using WebForms, an HTTP handler and the JavaScriptSerializer class would be the way to go.
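For instance, in ASP.NET MVC (the row data below is a stand-in for your actual data access call):

using System.Collections.Generic;
using System.Web.Mvc;

public class RowsController : Controller
{
    // GET /Rows/Index - returns the rows as JSON.
    public JsonResult Index()
    {
        // Stand-in for your actual data access call.
        var rows = new List<object>
        {
            new { Id = 1, Name = "first" },
            new { Id = 2, Name = "second" }
        };

        // AllowGet is required when serving JSON in response to a GET.
        return Json(rows, JsonRequestBehavior.AllowGet);
    }
}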
Hey thanks for all the responses.. it still amazes me how many people out there have the time to help.
All very good points, and certainly confirmed my feeling of letting the app/layer do the conversion work - as the glue between the actual data and frontend. I guess I haven't kept up too much with MVC or SQL-2008, and so was unsure if there were some nuggets worth tracking down.
As it worked out (following some links posted here, and further fishing) I have opted to do the following for the time being (stuck back using .NET 3.5 and no MVC right now..):
Getting the SQL data as a DataTable/DataReader
Using a simple DataTable-to-collection (dictionary) conversion to get a serializable list
Because right now I am using an ASHX page to act as the broker to the JavaScript (i.e. via a jQuery AJAX call), within my ASHX page I have:
context.Response.ContentType = "application/json";
System.Web.Script.Serialization.JavaScriptSerializer json = new System.Web.Script.Serialization.JavaScriptSerializer();
I can then call json.Serialize(...) on the object to be returned.
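Put together, a minimal version of that handler might look like this (GetRows() is a stand-in for the DataTable-to-collection step above):

using System.Web;
using System.Web.Script.Serialization;

public class DataHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        var json = new JavaScriptSerializer();
        context.Response.ContentType = "application/json";
        context.Response.Write(json.Serialize(GetRows()));
    }

    public bool IsReusable
    {
        get { return false; }
    }

    // Stand-in for the DataTable/DataReader-to-collection conversion.
    private static object GetRows()
    {
        return new[] { new { Id = 1, Name = "example" } };
    }
}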
Might seem a bit backward, but it works fine.. and the main caveat is that it is not ever returning huge amounts of data at a time.
Once again, thanks for all the responses!

Fastest PDF->text library for .NET project

I'm trying to create an application which will basically be a catalogue of my PDF collection. We are talking about 15-20GB containing tens of thousands of PDFs. I am also planning to include a full-text search mechanism. I will be using Lucene.NET for search (actually NHibernate.Search) and a library for PDF-to-text conversion. Which would be the best choice? I was considering these:
PDFBox
pdftotext (from xpdf) via c# wrapper
iTextSharp
Edit: Another good option seems to be using iFilters. How well (speed/quality) would they (Foxit/Adobe) perform in comparison to these libraries?
Commercial libraries are probably out of the question, as it is my private project and I don't really have a budget for commercial solutions - although PDFTextStream looks really nice.
From what I've read, pdftotext is a lot faster than PDFBox. How does iTextSharp perform in comparison to pdftotext? Or maybe someone can recommend other good solutions?
If it is for a private project, is this going to be an ongoing conversion process? E.g. after you've converted the 15-20GB, are you going to keep converting?
The reason I ask is that I'm trying to work out whether speed is your primary concern. If it were me, for example, converting a library of books, my primary concern would be the quality of the conversion, not the speed. I could always leave the conversion running overnight or over a weekend if necessary!
The desktop version of Foxit's PDF IFilter is free:
http://www.foxitsoftware.com/pdf/ifilter/
It will automatically do the indexing and searching, but perhaps their index is available for you to use as well. If you are planning to use it in an application you sell or distribute, then I guess it won't be a good choice, but if it's just for yourself, then it might work.
The Foxit code is at the core of my company's PDF Reader/Text Extraction library, which wouldn't be appropriate for your project, but I can vouch for the speed and quality of the results of the underlying Foxit engine.
I guess any library will do, but do you really want to scan all 20GB of files at search time?
For full-text search, it is best to create a database - something like SQLite or any other local database on the client machine - read each PDF, convert it to plain text, and store that text in the database when the PDF is first added.
Your database table can simply be as follows:
Table: PDFFiles
PDFFileID
PDFFilePath
PDFTitle
PDFAuthor
PDFKeywords
PDFFullText....
You can then search this table whenever you need to. This way your searches will be extremely fast regardless of PDF type, and the PDF-to-text conversion is only needed when a PDF is added to your collection or modified.
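As an illustration of that workflow - assuming iTextSharp for the extraction and System.Data.SQLite for storage, with the table from the sketch above and error handling omitted:

using System.Data.SQLite;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

class PdfIndexer
{
    // Extracts the plain text of one PDF and stores it in the PDFFiles table.
    static void IndexPdf(SQLiteConnection conn, string path)
    {
        var text = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
        }
        finally
        {
            reader.Close();
        }

        using (var cmd = new SQLiteCommand(
            "INSERT INTO PDFFiles (PDFFilePath, PDFFullText) VALUES (@path, @text)", conn))
        {
            cmd.Parameters.AddWithValue("@path", path);
            cmd.Parameters.AddWithValue("@text", text.ToString());
            cmd.ExecuteNonQuery();
        }
    }
}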
