How do you suggest I approach this unique problem? - c#

I have a website where I allow businesses to register the products they sell, one at a time. A consumer can then go online, search for a product, and receive a list of all the shops where it's currently sold.
Although they can upload one product at a time, I want to allow businesses to mass upload things they offer.
I was thinking of using an Excel spreadsheet: have them download a template, then upload the filled-in sheet.
Others have suggested telling them to create a CSV file, but that is counter-intuitive in my honest opinion. Most likely a secretary will be creating the product sheets and she won't have a clue about what a CSV is.
What is the best way to approach this?

Well, it partly depends on the businesses. If they are medium or large businesses, they'd probably rather submit the data via a webservice anyway - then they don't have to get a human involved at all, after the initial development. They can write an application to periodically suck information from their (inevitable) database of products, and post to your web service.
If you're talking about very small companies without their own IT departments, that's less feasible, and either Excel or CSV would be a better approach. (As Caladain says, it's pretty simple to export to CSV... but you should try from a number of different spreadsheet programs as they may well have different subtleties in their export format. Things like text encoding will be important as well.)
But here's a novel idea... how about you ask some sample companies what they would like you to do? Presumably you have some companies in mind already - if you don't, it's potentially going to be pretty hard to make sure you're really building the right thing.
Find out how they already store their product list, and how they'd want to upload it to you. Then consider how difficult that would be, and possibly go back to them with something which is almost as easy for them, but a lot easier for you to implement, etc.

While I personally don't like Excel very much, it seems to be the best accepted format to do such things (involving a manual process).
My experience is that CSV breaks easily, for instance it uses the regional settings to determine the separator which can cause incompatibilities on either the client or the server side. Also, many people just save the file in any Excel format because they just don't know the difference.
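To illustrate the separator problem, here is a minimal sketch of how an upload handler might cope with both comma- and semicolon-separated exports. It is written in Python only to keep it short; the function name read_rows, the list of candidate separators, and the assumption that the file is UTF-8 (with or without a byte-order mark) are all mine, not something from the question. It guesses the separator from the header row, which will fail if a header itself contains one of the candidates.

```python
import csv
import io


def read_rows(raw: bytes) -> list:
    """Decode an uploaded CSV and split it using a guessed separator."""
    # "utf-8-sig" drops the byte-order mark some spreadsheet exports prepend.
    # Assumption: the upload is UTF-8; other encodings would raise here.
    text = raw.decode("utf-8-sig")
    first_line = text.splitlines()[0] if text else ""
    # Pick whichever candidate separator occurs most often in the header row.
    # Ties (including "none found") fall back to the comma.
    delimiter = max(",;\t", key=first_line.count)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

Guessing from the header alone is deliberately crude. Asking the uploader to confirm a preview of the first few parsed rows is a cheap safeguard on top of it.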
Creating the files can be pretty easily done with some XSLT (e.g. create XMLSS format files, which are "XML Spreadsheet 2003" format).
You may also want to have a look at the Excel Data Reader on Codeplex for parsing the files.

Reading in an Excel file is actually pretty easy with ODBC. Tutorial on it.

Related

What schema, database, searching libraries are good for storing thousands of book pages in c# app

I want to write a C# program to store some books with a total of about 5000 pages. There are a few important issues on which I need your help and advice:
1. The ability to search all of the books’ content is one of the most important and challenging features of the app. The time that is needed to search a word should be about the time required to search a word in Microsoft Word or a PDF doc (with the same size) or more.
   - What method should I employ for storing the books so that more suitable approaches to searching the content would be in hand? Relational DB, MongoDB, CouchDB, etc.: which one is preferred?
   - If I use a database, what kind of schema and indexing is required and important?
   - Which method, algorithm, or library is better for searching the whole content of the books? Is it possible to use Lucene or Solr in a standalone Windows app, or would a traditional searching method be better?
2. The program should be customized in such a way that the publisher would be able to add their own book contents. How can I handle this feature (can I use XML)?
3. The users should be able to add one or more lines from the contents to their favorite list. What is the best way to deal with this?
I think Solr will be able to meet most of these requirements. For #1, you can easily develop schema in Solr to hold various information in different formats. Solr's Admin UI has an Analysis tab that will help you greatly in developing your schema because it allows you to test your changes on the fly with different types of data. It is a huge time saver because you don't have to create a bunch of test content and index it in to test it. Additionally, if the contents of the books are in binary format you can use Apache Tika to perform text extraction. Solr also has a number of other bells and whistles that you may find helpful, such as highlighting and user query spell suggestion.
For #2, Solr will support updates to content via JSON files that can be sent to the update handler for your collection. It also supports atomic updates which you may find useful. It seems that in your case, you may need some kind of a security solution to sit on top of Solr to prevent publishers from modifying each other's content, however you will most likely run into this issue regardless of the type of solution you will use.
For #3, I am not sure what you are really looking for here. I think that for content search and retrieval you will find Solr a good fit. For general user information storage, etc., you may need a different tool, since that is outside the scope of what Solr is supposed to do.
Hope it helps.

Making data in an XML not able to be edited via a text editor

I'm currently following a tutorial series for a Tile Engine which uses XML files to store conversations between NPCs. A topic it doesn't appear to cover (I have only quickly glanced through the subsequent videos) is how to prevent the user from either altering or knowing in advance what the NPC is going to say by opening the XML file easily with a generic text editor.
The 2nd point of being able to read future conversations is not a real issue but something I wanted to think about, so if that's hard to implement I am not too fussed at this point.
How would I go about making the XML uneditable? I know vaguely about CRC32's which can check file integrity which may be useful and I also think there might be better ways to go about that (i.e. not with a CRC32).
The most extreme action I can think of would be to create my own arbitrary encoding for the conversation data, but the usefulness of XML files deters me from that slightly, and with the tutorials I'm following teaching me a lot of things I don't know, I would prefer not to stray too far from them!
Just looking for a direction really, thanks!
XML is fundamentally an open format, so there is no way to make an XML file uneditable.
But you can keep a copy of the XML document (or some fingerprint of it) on your server (or on the endpoints of the NPC conversation) and then check whether the XML document has been edited.
If the document was edited, you can replace it with the backup version, or tell the endpoints that the XML document is corrupted...
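The fingerprint idea can be sketched as follows. This is only an illustration in Python, not code from the tutorial series; the function names fingerprint and is_unmodified are made up, and SHA-256 is my choice of hash (the question mentions CRC32, which detects accidental corruption but is trivial to forge).

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected: str) -> bool:
    """True if the contents still match the fingerprint recorded earlier."""
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(fingerprint(data), expected)


# Hypothetical conversation file contents, fingerprinted at release time.
original = b"<Conversation><Line>Hello, traveller.</Line></Conversation>"
known_good = fingerprint(original)
```

Note that this only detects edits. If the expected fingerprint ships next to the XML on the player's machine, a determined user can update both, which is why the copy on a server is the stronger variant.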
Historically, many games wrap multiple resources into a single binary file.
You might put it in a ZIP file (and maybe change the file extension). That would allow you to avoid having an XML file with an obvious name as a temptation for your users :).
Ultimately, you're asking something similar to the DRM question. I don't know whether your platform has an answer to that. (E.g., "using RSA encryption" is not secure as such; your program still has to decrypt the data at some point using the appropriate key, etc).

What library to use for CSV import in c#

I've looked at FileHelpers v2.0, but there is a serious problem with that: I cannot define a class that maps to the record in the source/destination file.
The reason is that I don't know what file I'm going to get. A big part of my program is mapping the file's fields to the database's fields... I don't know how many fields there will be, nor which will need to be imported.
I have no intention of rolling my own lib, especially since I have no control over the files that are going to be fed to my program.
Any solutions to this?
Dennis
Check out the Fast CSV reader on CodeProject. It helped me with my project a while ago. It's really easy to use, and is quite good.
You can use ADO.NET to directly read the .CSV file into a DataTable. If you don't know how many fields will exist in advance, this can be a useful means of working with the data. This also has the advantage of not requiring any external libraries.
For details, please see Deborah Kurata's article on the subject.
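The underlying idea (discover the columns from the header at run time, then apply a mapping the user supplies) is independent of ADO.NET. As a rough sketch, here it is in Python rather than C#; import_rows and field_map are hypothetical names, and the sketch assumes the first row of the file is a header.

```python
import csv
import io


def import_rows(csv_text: str, field_map: dict) -> list:
    """Read a CSV whose columns are only known at run time.

    field_map maps source column names to database column names.
    Source columns that are not in the map are ignored.
    """
    # DictReader takes the column names from the first row of the file.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [
        {db_col: row[src_col]
         for src_col, db_col in field_map.items()
         if src_col in row}
        for row in reader
    ]
```

The mapping would typically come from a screen where the user pairs each detected column with a database field, so nothing about the file layout has to be compiled into the program.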
StreamReader has been fast enough for me for pretty much every text file, though you are pretty screwed if you can't even guarantee value ordering.

Fastest PDF->text library for .NET project

I'm trying to create an application which will be basically a catalogue of my PDF collection. We are talking about 15-20GBs containing tens of thousands of PDFs. I am also planning to include a full-text search mechanism. I will be using Lucene.NET for search (actually, NHibernate.Search), and a library for PDF->text conversion. Which would be the best choice? I was considering these:
PDFBox
pdftotext (from xpdf) via c# wrapper
iTextSharp
Edit: Other good option seems to be using iFilters. How well (speed/quality) would they perform (Foxit/Adobe) in comparison to these libraries?
Commercial libraries are probably out of the question, as it is my private project and I don't really have a budget for commercial solutions - although PDFTextStream looks really nice.
From what I've read, pdftotext is a lot faster than PDFBox. How well does iTextSharp perform in comparison to pdftotext? Or maybe someone can recommend other good solutions?
If it is for a private project, is this going to be an ongoing conversion process? E.g. after you've converted the 15-20GB, are you still going to be converting?
The reason I ask is because I'm trying to work out whether speed is your primary issue. If it were me, for example, converting a library of books, my primary concern would be the quality of the conversion, not the speed. I could always leave the conversion over-night/-weekend if necessary!
The desktop version of Foxit's PDF IFilter is free
http://www.foxitsoftware.com/pdf/ifilter/
It will automatically do the indexing and searching, but perhaps their index is available for you to use as well. If you are planning to use it in an application you sell or distribute, then I guess it won't be a good choice, but if it's just for yourself, then it might work.
The Foxit code is at the core of my company's PDF Reader/Text Extraction library, which wouldn't be appropriate for your project, but I can vouch for the speed and quality of the results of the underlying Foxit engine.
I guess using any library is fine, but do you want to search all these 20Gb files at time of search?
For full-text search, the best approach is to create a database (something like SQLite or any local database on the client machine), then read each PDF, convert it to plain text, and store that text in the database when the PDF is first added.
Your database can simply be as follows:
Table: PDFFiles
PDFFileID
PDFFilePath
PDFTitle
PDFAuthor
PDFKeywords
PDFFullText....
You can then search this table when you need to. This way your search will be extremely fast, independent of the type of PDF, and the conversion from PDF to database is needed only when a PDF is added to your collection or modified.
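A minimal sketch of that table, using SQLite through Python's built-in sqlite3 module. The column names follow the list above; the function names and the use of a plain LIKE query are my assumptions. LIKE scans every row, so for tens of thousands of PDFs a real full-text index (SQLite's FTS5 extension, or Lucene.NET as in the question) would be the thing to reach for.

```python
import sqlite3


def open_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalogue database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS PDFFiles (
               PDFFileID   INTEGER PRIMARY KEY,
               PDFFilePath TEXT UNIQUE NOT NULL,
               PDFTitle    TEXT,
               PDFAuthor   TEXT,
               PDFKeywords TEXT,
               PDFFullText TEXT
           )"""
    )
    return conn


def add_pdf(conn, path, title, full_text):
    """Store the extracted text once, when the PDF is added or modified."""
    # OR REPLACE overwrites the row if this path is already in the table.
    conn.execute(
        "INSERT OR REPLACE INTO PDFFiles (PDFFilePath, PDFTitle, PDFFullText) "
        "VALUES (?, ?, ?)",
        (path, title, full_text),
    )


def search(conn, term):
    """Return the paths of all PDFs whose text contains term."""
    rows = conn.execute(
        "SELECT PDFFilePath FROM PDFFiles "
        "WHERE PDFFullText LIKE ? ORDER BY PDFFilePath",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

The text extraction itself (pdftotext, iTextSharp, an IFilter) happens before add_pdf is called and is not shown here.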

How do I store a rating in a song?

I want to be able to store information about a song that has been opened using my application. I would like for the user to be able to give the song a rating and this rating be loaded every time the users opens that file using my application.
I also need to know whether I should store the ratings in a database or an xml file.
C# ID3 Library is a .Net class library for editing id3 tags (v1-2.4). I would store the ratings directly into the comments section of the mp3 since id3v1 does not have many of the storage features that id3v2 does. If you want to store additional information for each mp3, what about placing a unique identifier on the mp3 and then having that do a database lookup?
I would be cautious about adding custom tags to mp3s as it is an easy way to ruin a large library. Also, I have gone down this road before and while I enjoyed the programming knowledge that came out of it, trying something like the iTunes SDK or Last FM might be a better route.
I would use a single-file, zero-config database. SQL Server Compact in your case.
I don't think XML is a good idea. XML shines in data interchange and in storing very small amounts of information. In this case a user may rate thousands of tracks (I personally have, on online radio services that allow ratings), and you may have lots of other information to store about each track.
Export and import using XML export procedures if you have to. Don't use it as your main datastore.
I would store it in a file as it is easier to keep with the mp3 file itself. If all you're doing is storing ratings, would you consider setting the ID3 rating field instead?
For this type of very simple storage I don't think it really matters all that much. The pros of XML are that it's very easy to deploy and it's editable outside of your app. The con is that it's editable outside your application (could be good, could be bad, depends on your situation).
Maybe another option (just because you can ;-) is an OODBMS. Check out DB4Objects; it's seriously addictive and very, very cool.
As mentioned earlier it is better to store such information in media file itself. And my suggestion is to use TagLib# lib for this (best media metadata lib I can find). Very powerful and easy to use.
I would store the ratings in an XML file. That way it's easy to edit from the outside, easy to read in .NET, and you don't have to worry about shipping a database with your application for something this simple.
Something like this might work for you:
<Songs>
  <Song Title="{SongTitle}">
    <Path>{Song path}</Path>
    <Rating>3</Rating>
  </Song>
</Songs>
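Reading and updating a file of that shape takes only a few lines. The sketch below is in Python with its standard ElementTree module, purely as an illustration of the lookup-by-path idea; get_rating and set_rating are hypothetical names, and it assumes the song's path is the unique key.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_rating(xml_text: str, song_path: str) -> Optional[int]:
    """Return the stored rating for song_path, or None if there is none."""
    root = ET.fromstring(xml_text)
    for song in root.iter("Song"):
        if song.findtext("Path") == song_path:
            rating = song.findtext("Rating")
            return int(rating) if rating is not None else None
    return None


def set_rating(xml_text: str, song_path: str, title: str, rating: int) -> str:
    """Return the XML with the rating for song_path added or updated."""
    root = ET.fromstring(xml_text)
    for song in root.iter("Song"):
        if song.findtext("Path") == song_path:
            break
    else:
        # No entry for this file yet, so append a new <Song> element.
        song = ET.SubElement(root, "Song", Title=title)
        ET.SubElement(song, "Path").text = song_path
    node = song.find("Rating")
    if node is None:
        node = ET.SubElement(song, "Rating")
    node.text = str(rating)
    return ET.tostring(root, encoding="unicode")
```

Every update rewrites the whole document, which is fine for a few hundred songs and is exactly the cost that makes a database attractive once the library grows.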
If the song format supports suitable meta data (eg. MP3), then follow Kevin's advice of using the meta data. This is by far the best way of doing it, and it is what the meta data is intended for.
If not, then it really depends on your application. If you want to share the rating information - especially over a web service, then I would go for XML: it would be trivial to supply your XML listings as one big feed, for example.
XML (like most other text formats) also has the advantage that it can be easily edited by a human in a text editor.
The database would have its advantages if you had a more closed system, you wanted speed and fast indexing, and/or have other tables you might want to store as well (eg. data about albums and bands).
