I want to use Google's Freebase data. I need a large amount of general data, for example all IDs of songs or films. I downloaded the data dump (.gz file). What is the best way to parse the file and extract the data I need? I am using .NET (C#).
There are a couple of .NET libraries that can read the RDF format of the dumps:
SemWeb.NET
dotNetRDF
The data dumps are also formatted as tab-separated values, so you should be able to use any CSV parser (configured to split on tabs) to parse each line as a triple.
Make sure that you read through the developer docs on how the data dumps are formatted. Basically, each line forms a triple that has a subject, predicate and object. To get all the data about films you'll be looking for triples that have a predicate that starts with /film/.
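A minimal C# sketch of that filtering, assuming each decompressed line is subject, predicate and object separated by tabs as described above (check the dump docs for the exact layout of the file you downloaded). It streams the .gz file through GZipStream, so the whole dump never has to fit in memory. FreebaseDumpReader, OpenGzip and FilterTriples are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class FreebaseDumpReader
{
    // Opens a gzip-compressed dump and returns a reader over the decompressed text.
    // Disposing the returned reader also closes the GZipStream and the file.
    public static TextReader OpenGzip(string path) =>
        new StreamReader(new GZipStream(File.OpenRead(path), CompressionMode.Decompress));

    // Reads one line at a time and yields the (subject, predicate, object) triples
    // whose predicate starts with the given prefix, e.g. "/film/".
    // Assumption: lines are tab-separated; lines with fewer than three fields are skipped.
    public static IEnumerable<(string Subject, string Predicate, string Object)> FilterTriples(
        TextReader reader, string predicatePrefix)
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split into at most three parts so tabs inside the object value survive.
            string[] parts = line.Split('\t', 3);
            if (parts.Length < 3) continue;
            if (parts[1].StartsWith(predicatePrefix, StringComparison.Ordinal))
                yield return (parts[0], parts[1], parts[2]);
        }
    }
}
```

Usage: open the dump with OpenGzip("your-dump-file.gz") inside a using statement and pass the reader to FilterTriples(reader, "/film/"); the file name is a placeholder.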
I'm creating a scheduling app that takes two MS Project .mpp files (a master and an updated version), converts the data into SQLite tables, compares them, displays the results, and lets you write the changes you make back to the master file. I had problems with Microsoft Interop because I don't own Microsoft Project. Is MPXJ a viable solution? The documentation I've read on it doesn't have many examples. If so, how would I read a file in and write it back? We're using MS Project 2016.
I know nothing about MPXJ, so apologies if I overlook a more straightforward answer. Another way of looking at your problem is that you want to:
1. Parse an MS Project file (and then the second one) and store the results in memory
2. Do some data manipulation and calculations of the in-memory project data
3. Put that data into a database
I think you're stuck at step 1 because, without MS Project, you lack a parser; correct? There are other ways to parse a Project file. The simplest may be to have your users first save the files in a more open format (e.g. XML) from their own copies of MS Project. Failing that, there are certainly libraries that can parse a Project file. Try taking a look at GanttProject, https://sourceforge.net/projects/ganttproject/ . Since it is open source, you could use its parser as a starting point; I'm not a license expert, but you may even be able to reuse its code.
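If your users can save the files as XML from MS Project, steps 1 and 2 need nothing beyond the .NET base class library. The sketch below assumes the MSPDI XML layout that MS Project writes (a Project root in the http://schemas.microsoft.com/project namespace containing Tasks/Task elements with UID, Name, Start and Finish children), reads only those few fields, and keys tasks by UID so the master and updated files can be compared. ProjectTask, ProjectXmlReader, ReadTasks and Diff are hypothetical names; this does not use MPXJ.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical record holding the few task fields this sketch reads.
public record ProjectTask(int Uid, string Name, string Start, string Finish);

public static class ProjectXmlReader
{
    // Assumed namespace of MS Project's XML (MSPDI) format.
    private static readonly XNamespace Ns = "http://schemas.microsoft.com/project";

    // Step 1: load every Project/Tasks/Task element into memory, keyed by UID.
    // Missing Name/Start/Finish elements become empty strings.
    public static Dictionary<int, ProjectTask> ReadTasks(XDocument doc) =>
        doc.Root?.Element(Ns + "Tasks")?
            .Elements(Ns + "Task")
            .Select(t => new ProjectTask(
                (int)t.Element(Ns + "UID")!,
                (string?)t.Element(Ns + "Name") ?? "",
                (string?)t.Element(Ns + "Start") ?? "",
                (string?)t.Element(Ns + "Finish") ?? ""))
            .ToDictionary(t => t.Uid)
        ?? new Dictionary<int, ProjectTask>();

    // Step 2: describe tasks that were added, removed or changed in the updated file.
    public static List<string> Diff(Dictionary<int, ProjectTask> master, Dictionary<int, ProjectTask> updated)
    {
        var changes = new List<string>();
        foreach (var (uid, task) in updated)
        {
            if (!master.TryGetValue(uid, out var old)) changes.Add($"added {uid}");
            else if (old != task) changes.Add($"changed {uid}"); // record equality compares all fields
        }
        foreach (int uid in master.Keys)
            if (!updated.ContainsKey(uid)) changes.Add($"removed {uid}");
        return changes;
    }
}
```

The resulting dictionaries and change list are plain in-memory objects, so step 3 (writing them to SQLite) can use whatever data access library you already have.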
Good luck!
I'm doing a school project where I need hard data on Twitch viewership. I found a great site, stats.twitchapps.com, that has all of this data in charts.
However, I need the data in Excel format to do statistical analysis on it for my class.
I have some background in C# programming. I've been trying to scrape the data using C# and Json.NET, but I'm not having much luck.
Here is the PHP page that contains the chart: view-source:http://stats.twitchapps.com/categories.php
Can anyone point me in the right direction on how I might go about this?
The graph's source data can be found here. It's an array of arrays of the form [JavaScript timestamp (milliseconds since the Unix epoch), viewers]. I'm sure you'll find a way to get this into Excel (e.g. convert each inner array to a line of CSV, which Excel can import), although I'm not really sure why anyone would want to work with this amount of data in Excel... Good luck with your project.
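A sketch of that conversion in C#, assuming the series has already been downloaded as JSON text of the form [[timestamp, viewers], ...]. It uses System.Text.Json (built into .NET Core 3.0 and later) instead of Json.NET, and turns the millisecond timestamps into UTC date strings that Excel can parse. ChartToCsv and Convert are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.Json;

public static class ChartToCsv
{
    // Converts a JSON array of [jsTimestampMs, viewers] pairs into CSV text
    // with a header row, one data row per point.
    public static string Convert(string json)
    {
        var csv = new StringBuilder("timestamp_utc,viewers\n");
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement point in doc.RootElement.EnumerateArray())
        {
            // JavaScript timestamps are milliseconds since the Unix epoch.
            long ms = (long)point[0].GetDouble();
            double viewers = point[1].GetDouble();
            string time = DateTimeOffset.FromUnixTimeMilliseconds(ms)
                .UtcDateTime.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
            // Invariant culture keeps '.' as the decimal separator in every locale.
            csv.Append(time).Append(',')
               .Append(viewers.ToString(CultureInfo.InvariantCulture)).Append('\n');
        }
        return csv.ToString();
    }
}
```

Writing the result with File.WriteAllText to a .csv file gives something Excel can open directly.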
First, I recommend visiting this website. From Excel you can export the data to CSV, which can be loaded with jQuery's $.get() method. The next step is to parse your data into the correct format (the data array in a series object) and render the chart.
I have a CSV file of data which I'm attempting to use in my Windows Phone 7 application.
I'm trying to figure out how to open and consume the data in my app. I've seen some examples that use LINQ for this, but they use OleDB to open the CSV file.
Note: I want to deploy the data with my app instead of using a web service, because that is cleaner and there isn't much data. If there is no way to do it in code, is there perhaps a way to convert it to XML?
There's nothing wrong with using lightweight CSV input files.
You can use StreamReader.ReadLine to read one line at a time,
and String.Split to break the comma-separated values into usable elements. For example:
using System.Diagnostics;

string csv = "abc,123,def,456";
string[] elements = csv.Split(',');
foreach (string s in elements)
    Debug.WriteLine(s); // writes each value on its own line to the debug output
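Putting the two calls together, here is a minimal sketch that reads every line of CSV text and splits it on commas. It takes a TextReader so it works with a StreamReader over the file you deploy with your app (how you open that stream on Windows Phone is not shown here). SimpleCsv and ReadAll are hypothetical names, and like String.Split itself it does not handle quoted fields.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SimpleCsv
{
    // Reads lines with ReadLine and splits each one on commas.
    // Blank lines are skipped; quoted fields containing commas are NOT supported.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```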
At the time of this answer there was still no third-party CSV-parsing library for Windows Phone apps.
If you need to parse CSV with quotes, see this answer.
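For the quoted case, here is one possible approach (not the linked answer's code): a line splitter that follows the common CSV convention described in RFC 4180. Fields may be wrapped in double quotes, commas inside quotes are kept, and a doubled quote inside a quoted field stands for one literal quote. QuotedCsv and SplitLine are hypothetical names, and fields containing line breaks are not supported.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedCsv
{
    // Splits a single CSV line into fields, honouring double-quoted fields.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // "" inside quotes is an escaped quote; a single " closes the field.
                    if (i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; }
                    else inQuotes = false;
                }
                else current.Append(c);             // commas are literal inside quotes
            }
            else if (c == '"') inQuotes = true;     // opening quote
            else if (c == ',') { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(c);
        }
        fields.Add(current.ToString());             // last field (possibly empty)
        return fields;
    }
}
```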
I would like to parse AutoCAD's MText entity and extract the raw text. I see a pattern in the way the text is formatted. If this has already been solved, then I would not need to reinvent the wheel. I have searched online, but have not found sufficient information.
I am searching for any links or references on this subject.
Edit:
To further clarify, we are using the ODA (Open Design Alliance) libraries to access the DWG files. I am not familiar with this library. Another developer uses it to extract information from the files, including MText entities. I am then given a file containing the formatted MText strings, which is what I am working with in C#.
Questions:
I asked the other developer whether the ODA library provides a way to extract the unformatted raw text. His response was that it does, but that doing so would also cause the entity to be written back to the DWG file. I want the raw text without affecting the original DWG file. Does ODA provide a way of extracting the raw text without altering the file?
I am interested in any documentation on the formatting rules of MText, so that I can consider writing a parser myself if necessary.
Is there anything out there to convert MText to RTF? I realize that RTF would not completely satisfy all formatting rules, but this could provide a satisfactory means of displaying the formatted text in a WinForms app. Given RTF I could also obtain the raw text.
This forum thread includes a VB program that strips the control characters from MText. The code shows what should be done to strip each control character, so it should be straightforward to write something similar in C#.
Additionally, the format codes are documented in the AutoCAD documentation.
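A rough C# sketch of such a stripper. The codes it handles are assumptions based on AutoCAD's documented MText format codes: \P paragraph break, \~ non-breaking space, the \L/\l, \O/\o and \K/\k toggles, the \f, \F, \H, \W, \Q, \T, \A, \C, \c and \p codes whose argument ends in a semicolon, \S stacked text, and braces for grouping. It is not a port of the VB program in the thread and will not cover every case. MTextStripper and Strip are hypothetical names.

```csharp
using System.Text;

public static class MTextStripper
{
    // Assumed codes whose argument runs up to a ';' (font, height, width,
    // oblique angle, tracking, alignment, colour, paragraph settings).
    private const string CodesWithArgument = "fFHWQTACcp";
    // Assumed on/off toggles with no argument (underline, overline, strike-through).
    private const string ToggleCodes = "LlOoKk";

    // Returns the text with format codes removed and \P turned into '\n'.
    public static string Strip(string mtext)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < mtext.Length; i++)
        {
            char c = mtext[i];
            if (c == '{' || c == '}') continue;             // grouping braces carry no text
            if (c != '\\' || i + 1 >= mtext.Length)
            {
                sb.Append(c);                                // ordinary character (or trailing '\')
                continue;
            }

            char code = mtext[++i];                          // character after the backslash
            if (code == 'P') sb.Append('\n');                // paragraph break
            else if (code == '~') sb.Append(' ');            // non-breaking space
            else if (ToggleCodes.IndexOf(code) >= 0) { }     // formatting toggle: drop it
            else if (code == 'S' || CodesWithArgument.IndexOf(code) >= 0)
            {
                int end = mtext.IndexOf(';', i + 1);         // argument ends at the ';'
                if (end < 0) end = mtext.Length;
                if (code == 'S')                             // stacked text such as \S1/2; or \S1^2;
                    sb.Append(mtext.Substring(i + 1, end - i - 1).Replace('^', '/').Replace('#', '/'));
                i = end;                                     // skip the argument and the ';'
            }
            else sb.Append(code);                            // \\, \{, \} (and unknown codes): keep the char
        }
        return sb.ToString();
    }
}
```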
If you are using C# and the AutoCAD .NET API, the Text property of the MText object provides the raw text:
using Autodesk.AutoCAD.DatabaseServices;

MText mt;
// ...
string rawText = mt.Text; // text without the formatting codes
If you want the formatting as well, the solution is different.
If you are parsing an AutoCAD file without AutoCAD, you need to specify what file type you are parsing. However, this question is basically a subset of the following questions:
Are there any libraries for parsing AutoCAD files?
Open source cad drawing (dwg) library in C#
.Net CAD component that can read/write dxf/ dwg files
Reading .DXF files
For DWG, the basic options are Open Design Alliance and AutoCAD RealDWG.
If this doesn't help, please provide more details as to exactly what you are trying to do.
If you are using C#, give the netDXF library a try.
I think the code would look something like this:
using netDxf;

// Load the drawing selected in the file dialog
DxfDocument dxf = DxfDocument.Load(openFileDialog1.FileName);
// Extract the raw text (without formatting codes) of the first MText entity
string rawText = dxf.MTexts[0].PlainText();
Currently we are saving files (PDF, DOC) into the database as BLOB fields. I would like to retrieve the raw text of each file so that I can use it for hit highlighting and other functions.
Does anyone know of a simple way to parse the files and save the raw text when they are saved, either via SQL or .NET code? I have found that Adobe has a filtdump utility that converts a PDF to text, but it seems to be a command-line tool, and I don't see a way to use it with a file stream. And what would the extractor be for Office documents and other file types?
-or-
Is there a way to pull out the raw text from the SQL Full text index, without using 3rd party filters?
Note: I am trying to build a .NET and MS SQL solution without having to use a third-party tool such as Lucene.
If it isn't absolutely necessary to stream directly from SQL Server into your app, the hard part is parsing the PDF or DOC file formats.
The iTextSharp library will give you access to the innards of a PDF file:
http://itextsharp.sourceforge.net/
Here's a commercial product that claims to parse Word docs:
Aspose.Words
Edited to add:
I think you're also asking if there are ways to make SQL Server Full-text Indexing do the work for you by adding IFilters. This sounds like a good idea. I haven't done this myself, but MS has apparently supported a Word filter for a long time, and now Adobe has released a (free) PDF filter. There's a lot of information here:
Filter Central
10 Ways to Optimize SQL Server Full-text Indexing
SQL Server Full Text Search: Language Features - a little out of date but easy to understand.
The SQL Server Full-Text Search feature uses IFilters to extract plain text from PDF or Office file formats. You can install IFilters on your server, or, if your code runs on the same machine as SQL Server, you already have them.
Here is an article which shows how to use IFilters from .NET: http://www.codeproject.com/KB/cs/IFilter.aspx
From your C# application, you could open the .doc file, save it as text, and put both the text and the .doc document into the database.
If you are using SQL Server 2008, you could consider using the new FILESTREAM feature.
Your data is stored in a varbinary(max) column, but you can also access the raw data via a regular Win32 handle.
Here's some sample code showing how to get the handle.
I had this same issue... I solved it by adding the following to my application:
EPocalipse.IFilter.dll (for everything *but* Office 2007 documents, due to 64-bit Windows issues)
Open XML SDK 2.0 (for Office 2007 documents)
I use these to grab the plain text and then store it in the database alongside the binary data. Keep in mind that I'm certainly not an expert, so there may be a better way to do this, but it works for everything except "Quick Save" pre-2007 Word documents, which apparently can't be read by IFilters. I just have my users resave the document if that error occurs, and everything works fine.
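For Office 2007 (.docx) files the text lives in XML inside a zip package, so a rough alternative to the Open XML SDK is to read it with System.IO.Compression and LINQ to XML. This is a swapped-in technique, not the code the answer describes. It assumes the main document part is at word/document.xml (where Word normally puts it), writes one line per w:p paragraph, and ignores tabs, breaks and text boxes. DocxText and Extract are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by the w: elements in document.xml.
    private static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extracts plain text from a .docx stream (e.g. a BLOB read from the database).
    // The caller's stream is left open.
    public static string Extract(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry? entry = zip.GetEntry("word/document.xml"); // assumed part location
        if (entry == null) return "";
        using Stream xml = entry.Open();
        XDocument doc = XDocument.Load(xml);
        var text = new StringBuilder();
        // Concatenate the w:t text runs of each paragraph, one paragraph per line.
        foreach (XElement p in doc.Descendants(W + "p"))
            text.Append(string.Concat(p.Descendants(W + "t").Select(t => t.Value))).Append('\n');
        return text.ToString();
    }
}
```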
Let me know if you'd like some sample code... I would post it right now, but it's a bit long.