I am looking for an LDIF parser for C#. I am trying to parse an LDIF file so that I can check that objects don't exist before adding them; adding them when they already exist (using ntdsSchemaAdd) causes an entry in the error logs.
A quick web search revealed http://wiki.github.com/skradel/Zetetic.Ldap/, which provides a .NET API.
From the page:
Zetetic.Ldap is a .NET library for .NET 2 and above, which makes it easier to work with directory servers (like Active Directory, ADAM, Red Hat Directory Server, and others). Some of the key features of Zetetic.Ldap are:
1. LDIF file parsing and generation – Read and write the file format used for moving data around between directory systems
2. LDAP Entry-oriented API with change tracking – Create and modify directory objects in a more natural way
3. LDAP Schema interrogation – Quick programmatic access to the kinds of objects and fields your directory server understands. Learn if an attribute is a string, a number, a date, etc., without lots of manual research and re-parsing
4. LDIF Pivoter – Turn an LDIF file into a (comma- or tab-delimited) flat file for analysis or loading into systems that don't speak LDIF
We built the Zetetic.Ldap library to make directory projects and programming faster and easier, and release it here in the hopes that others will find it useful too. As far as we know, this is the only .NET library that really understands the LDIF specification.
Download link: http://github.com/downloads/skradel/Zetetic.Ldap/Zetetic.Ldap_20090831.zip
I would parse it myself.
If you look at the LDIF RFC for the EBNF, you'll see that it's not a very complex grammar.
I've reliably parsed a large amount of LDIF with regexes before, though your mileage may vary.
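To give a feel for it, here's a minimal sketch of a regex-based parser. It assumes simple `attr: value` records separated by blank lines, and it ignores the parts of RFC 2849 that complicate real LDIF (base64 `attr::` values and continuation lines folded with a leading space):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    // Minimal regex-based LDIF parsing sketch. Handles plain "attr: value"
    // records separated by blank lines; no base64 values, no folded lines.
    class LdifSketch
    {
        static readonly Regex AttrLine = new Regex(@"^(?<name>[A-Za-z][\w;-]*):\s*(?<value>.*)$");

        public static IEnumerable<Dictionary<string, List<string>>> Parse(TextReader reader)
        {
            var entry = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0)               // blank line ends an entry
                {
                    if (entry.Count > 0) yield return entry;
                    entry = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
                    continue;
                }
                if (line.StartsWith("#")) continue; // comment line

                var m = AttrLine.Match(line);
                if (!m.Success) continue;           // skip anything we don't understand

                var name = m.Groups["name"].Value;
                if (!entry.TryGetValue(name, out var values))
                    entry[name] = values = new List<string>();
                values.Add(m.Groups["value"].Value);
            }
            if (entry.Count > 0) yield return entry;
        }
    }

You could then check each entry's `dn` against the directory before adding it, which avoids the error-log entries from ntdsSchemaAdd.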
I am currently working on a new program that converts Excel files to XBRL, using C# (or any programming language).
From what I've read on the net, XBRL is built on XML Schema, and it's where financial systems are heading: all these institutions want a shared standard language.
There is ready-to-use software for this on the market, but our current client requires us to write a program just for them.
Has anyone here in the forum written a program similar to what I need?
XBRL has their own (lengthy) format specification, which you can find here. Excel isn't innately compatible with that, unless you have a specific template or usage pattern in mind. There have been a number of "Excel to XBRL" scripts, but it appears as if they're all just a custom Excel document with a script that can generate the compliant XBRL or iXBRL document.
It's likely that you're going to need to map the requirements of the customer to the XBRL output they need, and I'm not aware of a .NET library that does that as it stands. I hate to deliver the bad news, but you'll need to get familiar with the nitty gritty requirements of the XBRL format specification, understand the XBRL Taxonomy the client wants to use, and build the code from that point forward.
That being said, there are a number of libraries to de/serialize XML in .NET. Start with the System.Xml and System.Xml.Schema namespaces to define your XBRL schema and generate/read your XML files.
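As a rough illustration of the serialization side only: the element shapes below are hypothetical placeholders, not a real XBRL taxonomy. A real instance document needs the contexts, units, and linkbases defined by the taxonomy your client uses.

    using System;
    using System.IO;
    using System.Xml.Serialization;

    // Hypothetical, heavily simplified "fact" shape for illustration only.
    [XmlRoot("xbrl", Namespace = "http://www.xbrl.org/2003/instance")]
    public class XbrlInstance
    {
        [XmlElement("fact")]
        public Fact[] Facts { get; set; }
    }

    public class Fact
    {
        [XmlAttribute("name")]
        public string Name { get; set; }

        [XmlAttribute("contextRef")]
        public string ContextRef { get; set; }

        [XmlText]
        public string Value { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var doc = new XbrlInstance
            {
                Facts = new[]
                {
                    new Fact { Name = "Assets", ContextRef = "FY2023", Value = "1000000" }
                }
            };

            // XmlSerializer writes the object graph out as schema-shaped XML.
            var serializer = new XmlSerializer(typeof(XbrlInstance));
            using (var writer = new StreamWriter("instance.xml"))
                serializer.Serialize(writer, doc);
        }
    }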
You could also look into AutoMapper to convert tab delimited/csv/excel data into your objects (and vice versa). Specifically, look into the "Flattening" functionality there.
I want to write a C# program to store some books totaling about 5,000 pages. But there are a few important issues where I need your help and advice:
The ability to search all of the books' content is one of the most important and challenging features of the app. The time needed to search for a word should be roughly comparable to the time required to search for a word in Microsoft Word or a PDF document of the same size.
What method should I employ for storing the books so that better approaches to searching the content are available? A relational DB, MongoDB, CouchDB, etc.: which one is preferred?
If a database is used, what kind of schema and indexing is required and important?
Which method, algorithm, or library is best for searching the whole content of the books? Is it possible to use Lucene or Solr in a standalone Windows app, or would a traditional searching method be better?
The program should be customizable so that publishers can add their own book content. How can I handle this feature (can I use XML)?
Users should be able to add one or more lines of the content to a favorites list. What is the best way to deal with this?
I think Solr will be able to meet most of these requirements. For #1, you can easily develop a schema in Solr to hold various information in different formats. Solr's Admin UI has an Analysis tab that will help you greatly in developing your schema, because it lets you test your changes on the fly with different types of data. It is a huge time saver because you don't have to create and index a bunch of test content just to try things out. Additionally, if the contents of the books are in binary formats, you can use Apache Tika to perform text extraction. Solr also has a number of other bells and whistles you may find helpful, such as highlighting and query spelling suggestions.
For #2, Solr supports updates to content via JSON documents sent to your collection's update handler. It also supports atomic updates, which you may find useful. In your case you may need some kind of security layer on top of Solr to prevent publishers from modifying each other's content, but you will most likely run into that issue regardless of which solution you choose.
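For example, indexing a page might look roughly like this. This is a sketch assuming a local Solr instance with a collection named "books"; the URL and field names are placeholders for whatever schema you define:

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    // Sketch: post a JSON document to Solr's update handler.
    // "books", the field names, and localhost:8983 are assumptions.
    class SolrUpdateSketch
    {
        static async Task Main()
        {
            using var client = new HttpClient();
            var json = @"[{ ""id"": ""book-1-page-42"",
                            ""title"": ""Example Book"",
                            ""content"": ""Full text of page 42..."" }]";

            // commit=true makes the document searchable immediately.
            var response = await client.PostAsync(
                "http://localhost:8983/solr/books/update?commit=true",
                new StringContent(json, Encoding.UTF8, "application/json"));

            response.EnsureSuccessStatusCode();
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }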
For #3, I am not sure what you are really looking for here. I think that for content search and retrieval you will find Solr a good fit. For general user information storage and etc, you may need a different tool, since that is kind of outside of scope of what Solr is supposed to do.
Hope it helps.
I have a site which is akin to SVN, but without the version control. Users can upload and download to Projects, where each Project has a directory (with subdirs and files) on the server. What I'd like to do is attach further information to files, like who uploaded them, how many times they've been downloaded, and so on. Is there a way to do this with FileInfo, or should I store this in a table that associates the metadata with an absolute path or something? That way sounds dodgy and error-prone :\
It is possible to append data to arbitrary files with NTFS (the default Windows filesystem, which I'm assuming you're using). You'd use alternate data streams. Microsoft uses this for extended metadata like author and summary information in Office documents.
Really, though, the database approach is reasonable, widely used, and much less error-prone, in my opinion. It's not really a good idea to be modifying the original file unless you're actually changing its content.
As Michael Petrotta points out, alternate data streams are a nifty idea. Here's a C# tutorial with code. Really though, a database is the way to go. SQL Compact and SQLite are fairly low-impact and straightforward to use.
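For the alternate-data-stream route, here's a minimal sketch. It assumes modern .NET on NTFS; on .NET Framework the path validation rejects the colon syntax, so there you'd need to P/Invoke CreateFile instead. The file path and stream name are placeholders:

    using System;
    using System.IO;

    // Sketch: tag a file with metadata via an NTFS alternate data stream.
    // The "file:streamname" syntax works with the ordinary file APIs on
    // modern .NET; NTFS only.
    class AdsSketch
    {
        static void Main()
        {
            const string file = @"C:\Projects\demo\report.docx";       // placeholder
            const string metaStream = file + ":uploader.txt";          // hypothetical stream name

            File.WriteAllText(metaStream, "uploadedBy=domain\\alice;downloads=3");
            Console.WriteLine(File.ReadAllText(metaStream));           // the main file's content is untouched
        }
    }

Keep in mind that the stream silently disappears if the file is copied to a non-NTFS volume or sent over most transports, which is another point in favor of the database approach.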
I'm trying to create an application which will be basically a catalogue of my PDF collection. We are talking about 15-20GBs containing tens of thousands of PDFs. I am also planning to include a full-text search mechanism. I will be using Lucene.NET for search (actually, NHibernate.Search), and a library for PDF->text conversion. Which would be the best choice? I was considering these:
PDFBox
pdftotext (from xpdf) via c# wrapper
iTextSharp
Edit: Another good option seems to be using IFilters. How well would they (Foxit/Adobe) perform, in speed and quality, compared to these libraries?
Commercial libraries are probably out of the question, as it is my private project and I don't really have a budget for commercial solutions - although PDFTextStream looks really nice.
From what I've read, pdftotext is a lot faster than PDFBox. How does iTextSharp perform in comparison to pdftotext? Or maybe someone can recommend other good solutions?
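For context, the extraction loop I'd be benchmarking with iTextSharp 5.x would look roughly like this (a sketch using the iTextSharp.text.pdf.parser API):

    using System;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Rough sketch: extract plain text from every page of a PDF with iTextSharp 5.x.
    class ExtractSketch
    {
        static string ExtractText(string path)
        {
            var sb = new StringBuilder();
            var reader = new PdfReader(path);
            try
            {
                for (int page = 1; page <= reader.NumberOfPages; page++)
                    sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            finally
            {
                reader.Close();
            }
            return sb.ToString();
        }

        static void Main(string[] args)
        {
            Console.WriteLine(ExtractText(args[0]));
        }
    }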
If it is for a private project, is this going to be an ongoing conversion process? E.g., after you've converted the 15-20 GB, are you going to keep converting?
The reason I ask is because I'm trying to work out whether speed is your primary issue. If it were me, for example, converting a library of books, my primary concern would be the quality of the conversion, not the speed. I could always leave the conversion over-night/-weekend if necessary!
The desktop version of Foxit's PDF IFilter is free
http://www.foxitsoftware.com/pdf/ifilter/
It will automatically do the indexing and searching, but perhaps their index is available for you to use as well. If you are planning to use it in an application you sell or distribute, then I guess it won't be a good choice, but if it's just for yourself, then it might work.
The Foxit code is at the core of my company's PDF Reader/Text Extraction library, which wouldn't be appropriate for your project, but I can vouch for the speed and quality of the results of the underlying Foxit engine.
I guess any library is fine, but do you want to scan all 20 GB of files at search time?
For full-text search, it's best to create a local database on the client machine (something like SQLite), read each PDF, convert it to plain text, and store that text in the database when the PDF is first added.
Your database can simply be as follows:
Table: PDFFiles
PDFFileID
PDFFilePath
PDFTitle
PDFAuthor
PDFKeywords
PDFFullText....
and you can search this table when you need to. This way your search will be extremely fast regardless of the type of PDF, and the PDF-to-database conversion is only needed when a PDF is added to your collection or modified.
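Here's a sketch of that idea using SQLite's FTS5 full-text index via the Microsoft.Data.Sqlite package (my assumption; the answer doesn't prescribe an engine, but FTS5 gives you indexed word search rather than slow LIKE scans). The column names follow the schema above; FTS5 tables have an implicit rowid that plays the role of PDFFileID:

    using System;
    using Microsoft.Data.Sqlite;

    // Sketch: full-text index over extracted PDF text with SQLite FTS5.
    // The PDF-to-text step is assumed to happen elsewhere.
    class PdfCatalogSketch
    {
        static void Main()
        {
            using var conn = new SqliteConnection("Data Source=pdfcatalog.db");
            conn.Open();

            var create = conn.CreateCommand();
            create.CommandText =
                @"CREATE VIRTUAL TABLE IF NOT EXISTS PDFFiles
                  USING fts5(PDFFilePath, PDFTitle, PDFAuthor, PDFKeywords, PDFFullText);";
            create.ExecuteNonQuery();

            var insert = conn.CreateCommand();
            insert.CommandText =
                @"INSERT INTO PDFFiles (PDFFilePath, PDFTitle, PDFAuthor, PDFKeywords, PDFFullText)
                  VALUES ($path, $title, $author, $keywords, $text);";
            insert.Parameters.AddWithValue("$path", @"C:\pdfs\example.pdf");   // placeholder values
            insert.Parameters.AddWithValue("$title", "Example");
            insert.Parameters.AddWithValue("$author", "Unknown");
            insert.Parameters.AddWithValue("$keywords", "demo");
            insert.Parameters.AddWithValue("$text", "extracted full text goes here");
            insert.ExecuteNonQuery();

            // MATCH uses the full-text index; terms are ANDed by default.
            var search = conn.CreateCommand();
            search.CommandText = "SELECT PDFFilePath FROM PDFFiles WHERE PDFFiles MATCH $q;";
            search.Parameters.AddWithValue("$q", "full text");
            using var r = search.ExecuteReader();
            while (r.Read())
                Console.WriteLine(r.GetString(0));
        }
    }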
My application has historically used an ini file, stored on the same file server as the data it consumes, for per-user settings, so that the settings roam when a user logs on from multiple computers. To do this we had a file that looked like:
[domain\username1]
value1=foo
value2=bar
[domain\username2]
value1=foo
value2=baz
For this release we're trying to migrate away from ini files, due to limitations in the Win32 ini read/write functions, without having to write a custom ini file parser.
I've looked at app.config and user settings files and neither appear to be suitable. The former needs to be in the same folder as the executable, and the latter doesn't provide any means to create new values at runtime.
Is there a built in option I'm missing, or is my best path to write a preferences class of my own and use the framework's XML serialization to write it out?
I have found that the fastest way here is to just create an XML file that does what you want, then use XSD.exe to create a class and serialize the data. It is fast, takes only a few lines of code, and works quite well.
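A minimal sketch of the result (hand-written here rather than XSD.exe-generated, and the file-server path is a placeholder):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Serialization;

    // Hypothetical preferences shape; XSD.exe would generate something
    // equivalent from a sample file (xsd settings.xml, then xsd settings.xsd /classes).
    public class UserSettings
    {
        public string UserName { get; set; }

        [XmlElement("Setting")]
        public List<Setting> Settings { get; set; } = new List<Setting>();
    }

    public class Setting
    {
        [XmlAttribute] public string Name { get; set; }
        [XmlAttribute] public string Value { get; set; }
    }

    class SettingsDemo
    {
        static void Main()
        {
            var settings = new UserSettings
            {
                UserName = @"domain\username1",
                Settings = { new Setting { Name = "value1", Value = "foo" },
                             new Setting { Name = "value2", Value = "bar" } }
            };

            var serializer = new XmlSerializer(typeof(UserSettings));

            // One file per user on the file server keeps the roaming behavior.
            using (var w = new StreamWriter(@"\\fileserver\settings\username1.xml"))
                serializer.Serialize(w, settings);

            using (var r = new StreamReader(@"\\fileserver\settings\username1.xml"))
                settings = (UserSettings)serializer.Deserialize(r);
        }
    }

New values at runtime are just new Setting entries, which sidesteps the limitation you hit with user settings files.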
Have you checked out or heard of Nini, a third-party ini handler? I found it quite easy to use and simple for reading from and writing to ini files.
For your purposes, it would mean very few changes to your code.
The conversion from ini to another format needs to be weighed up: the code impact, the ease of programming, and so on. (As a nitpicky aside, changing the code to use XML may be easy, but app.config-style XML is limiting in that you cannot easily write to it at runtime.) What the benefit would be of ripping out the ini code and replacing it with XML is a question you have to decide.
There may well be a knock-on effect, such as having to change and adapt the code. But for the foreseeable future, while ini is a bit outdated and old, it is still in use. I cannot see Microsoft dropping ini API support, as it is very much alive behind the scenes for driver installation: think of the inf files used to specify where drivers go and how they are installed. The format is here to stay, as driver manufacturers have adopted it as the de facto standard for driver distribution.
Hope this helps,
Best regards,
Tom.