I have a translation file with about 13,000 lines. Currently, at app startup, I read it from a manifest resource
var resourceStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("filename.csv");
and parse it with CsvParser.
It's a slow operation (it takes ~2 seconds). I am looking for a way to pre-parse the file at build time, so that I can access it like this:
var lines = SomeCode.ParsedLines;
Any recommendations on how I can do that? I could just write a gigantic .cs file like
"ParsedLines= new string[,]{{"title1","title2"},{"word1","word2"}}"
but the problem is that the .csv file is frequently modified. My best guess is to create a code generator that recreates this .cs file on each build, but I am wondering if there are any better approaches.
This is one of the textbook use cases for source generators. Using a source generator, you can parse your CSV file at build time and generate source for a class that will be compiled in the subsequent steps.
Another useful article is Introducing C# Source Generators. You may also find the Source Generators Cookbook and my sandbox source generator project useful.
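As a rough sketch (all names here are illustrative, and it assumes the .csv is registered as an AdditionalFiles item in the project), a generator could look like this:

using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Sketch only: reads the first AdditionalFiles entry ending in .csv and
// emits a Translations class with the pre-parsed rows baked in
// (a jagged array is used here because it is simpler to generate).
[Generator]
public class TranslationsGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) { }

    public void Execute(GeneratorExecutionContext context)
    {
        var csv = context.AdditionalFiles.FirstOrDefault(f => f.Path.EndsWith(".csv"));
        var text = csv?.GetText(context.CancellationToken);
        if (text == null) return;

        var sb = new StringBuilder();
        sb.AppendLine("public static class Translations");
        sb.AppendLine("{");
        sb.AppendLine("    public static readonly string[][] ParsedLines =");
        sb.AppendLine("    {");
        foreach (var line in text.ToString().Split('\n'))
        {
            // naive split; fine only because the file's format is controlled
            var cells = line.TrimEnd('\r').Split(',')
                .Select(c => "@\"" + c.Replace("\"", "\"\"") + "\"");
            sb.AppendLine("        new[] { " + string.Join(", ", cells) + " },");
        }
        sb.AppendLine("    };");
        sb.AppendLine("}");

        context.AddSource("Translations.g.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
    }
}

In the consuming project you would register the file with <AdditionalFiles Include="filename.csv" /> in the .csproj; after that, var lines = Translations.ParsedLines; costs nothing at startup.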
You can also try processing the CSV file manually without CsvHelper (since you control the file and are sure about its formatting, escaping, etc.), using the standard File.ReadLines or System.IO.Pipelines to improve performance.
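For instance, if the fields never contain commas, quotes, or embedded newlines, a naive split is enough. A sketch, reading the same embedded resource as in the question:

using System.Collections.Generic;
using System.IO;
using System.Reflection;

var lines = new List<string[]>(13000);
using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("filename.csv"))
using (var reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        lines.Add(line.Split(','));   // safe only for a controlled, unquoted format
}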
What is the "recommended" approach for processing very large XML files in .NET 3.5?
For writing, I want to generate an element at a time then append to a file.
For reading, I would likewise want to read an element at a time (in the same order as written).
I have a few ideas how to do it using strings and File.Append, but does .NET 3.5 provide XML APIs for dealing with arbitrarily large XML files?
Without going into specifics, this isn't easy to answer. .NET offers different methods to process XML files:
XmlDocument creates a DOM, supports XPath queries but loads the entire XML file into memory.
XElement/XDocument has support for LINQ and also reads the entire XML file into memory.
XmlReader is a forward-only reader. It does not read the entire file into memory.
XmlWriter is just like the XmlReader, except for writing.
Based on what you say, an XmlReader/XmlWriter combination seems like the best approach.
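For example (a sketch with made-up element names), streaming one element at a time in both directions looks like this:

using System.Xml;

// Writing: one element at a time, nothing buffered in memory.
using (var writer = XmlWriter.Create("items.xml"))
{
    writer.WriteStartElement("items");
    for (int i = 0; i < 1000000; i++)
    {
        writer.WriteStartElement("item");
        writer.WriteAttributeString("id", i.ToString());
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
}

// Reading: forward-only, one node at a time, in document order.
using (var reader = XmlReader.Create("items.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
        {
            string id = reader.GetAttribute("id");
            // process the element here
        }
    }
}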
As Dirk said, using an XmlWriter/XmlReader combo sounds like the best approach. It can be very lengthy, and if your XML file is fairly complex it gets very unwieldy. I had to do something similar recently under some strict memory constraints; my SO question might come in handy.
But personally, I found this method here on MSDN blogs to be very easy to implement and it neatly handles appending to the end of the XML file without fragments.
Try to make an *.xsd file out of your *.xml. You can then generate a *.cs file from the *.xsd file. After that, load your *.xml file into your object. It should take less memory than loading the whole file.
There is a plugin for VS2010 called XSD2Code that gives you the option to generate a *.cs file from an *.xsd. In that plugin you have an option to decorate the properties for serialization. For an *.xsd file named Settings you would get Settings.cs. You would then do something like this:
using (var str = new StreamReader("SomeFolder\\YourFile.xml"))
{
    // deserialize into the XSD2Code-generated Settings class
    var xmlSer = new XmlSerializer(typeof(Settings));
    Settings m_settings = (Settings)xmlSer.Deserialize(str);
}
You can then query your list of objects with LINQ.
I have an XML file that I want to base some unit tests off of. Currently I load the XML file from disk in the class initialize method. I would rather have this XML generated in the test instead of read from disk. Are there any utilities that will automatically generate the LINQ to XML code that produces a given XML file?
Or are there better ways to do this? Is loading from disk OK for unit tests?
I would embed the XML file directly into the assembly - no need for a string resource or anything like that, just include it as an embedded resource (add the file, go to its properties in Visual Studio, and select "Embedded Resource").
Then you can read it using Assembly.GetManifestResourceStream, load the XML from that as you would any other stream, and you're away.
I've used this technique several times - it makes it a lot easier to see the data you're interested in.
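In code it looks something like this (the resource name is made up; by default it is the project's default namespace plus the file name):

using System.Reflection;
using System.Xml.Linq;

using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("MyTests.TestData.xml"))
{
    XDocument doc = XDocument.Load(stream);
    // run assertions against doc here
}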
It's probably better to use a resource file, for example a .resx file where you put the XML as a string resource. That's fast enough for a unit test and you don't have to do any magic. Reading from disk is not OK for various reasons (speed, the need for configuration, etc.).
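For example, assuming you added the XML under the name TestXml to the project's default Resources.resx:

using System.Xml.Linq;

// Properties.Resources is the class the .resx designer generates;
// TestXml is an assumed resource name.
XDocument doc = XDocument.Parse(Properties.Resources.TestXml);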
My application has historically stored per-user settings in an ini file kept on the same file server as the data it consumes, so that the settings roam if the user logs on from multiple computers. To do this we had a file that looked like:
[domain\username1]
value1=foo
value2=bar
[domain\username2]
value1=foo
value2=baz
For this release we're trying to migrate away from ini files, due to limitations in the Win32 ini read/write functions, without having to write a custom ini file parser.
I've looked at app.config and user settings files and neither appear to be suitable. The former needs to be in the same folder as the executable, and the latter doesn't provide any means to create new values at runtime.
Is there a built in option I'm missing, or is my best path to write a preferences class of my own and use the framework's XML serialization to write it out?
I have found that the fastest way here is to just create an XML file that does what you want, then use XSD.exe to create a class and serialize the data. It is fast, takes only a few lines of code, and works quite well.
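A sketch of that approach, using a hypothetical UserSettings class standing in for the one XSD.exe would generate:

using System.IO;
using System.Xml.Serialization;

public class UserSettings      // stand-in for the XSD.exe-generated class
{
    public string Value1;
    public string Value2;
}

public static class SettingsStore
{
    public static void Save(UserSettings settings, string path)
    {
        var serializer = new XmlSerializer(typeof(UserSettings));
        using (var writer = new StreamWriter(path))
            serializer.Serialize(writer, settings);
    }

    public static UserSettings Load(string path)
    {
        var serializer = new XmlSerializer(typeof(UserSettings));
        using (var reader = new StreamReader(path))
            return (UserSettings)serializer.Deserialize(reader);
    }
}

You could keep one such XML file per user on the file server (for example, deriving the file name from domain\username) to preserve the roaming behavior.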
Have you checked out or heard of Nini, a third-party ini handler? I found it quite easy to use and simple for reading from and writing to ini files.
For your situation it would mean very few changes, and it is easier to use.
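From memory, usage looks roughly like the sketch below; treat the exact names as approximate and verify them against the Nini documentation.

using Nini.Config;

// Rough sketch from memory of Nini's API.
IConfigSource source = new IniConfigSource(@"\\server\share\settings.ini");
IConfig user = source.Configs[@"domain\username1"];
string value1 = user.Get("value1");   // read
user.Set("value2", "bar");            // write
source.Save();                        // persist changes back to the file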
The conversion from ini to another format needs to be weighed up: the code impact, the ease of programming (nitpicking aside, changing the code to use XML may be easy, but it is limiting in that you cannot easily write back to it). Whether the benefit of ripping out the ini code and replacing it with XML is worth it is a question you have to decide.
There may well be a knock-on effect, such as having to change and adapt the code... but for the foreseeable future, sure, ini is a bit outdated and old, but it is still in use. I cannot see Microsoft dropping ini API support, as it is very much alive and used behind the scenes for driver installation; think of the inf files used to specify where drivers go and how they are installed. It is here to stay, as driver manufacturers have adopted it and it is the de facto standard way of distributing drivers...
Hope this helps,
Best regards,
Tom.
I am looking for an LDIF parser for C#. I am trying to parse an LDIF file so that I can check that objects don't exist before adding them. Adding them when they already exist (using ntdsSchemaAdd) causes an entry in the error logs.
A quick web search revealed http://wiki.github.com/skradel/Zetetic.Ldap/. They have provided a .NET API.
From the page:
Zetetic.Ldap is a .NET library for .NET 2 and above, which makes it easier to work with directory servers (like Active Directory, ADAM, Red Hat Directory Server, and others). Some of the key features of Zetetic.Ldap are:
1. LDIF file parsing and generation – Read and write the file format used for moving data around between directory systems.
2. LDAP Entry-oriented API with change tracking – Create and modify directory objects in a more natural way.
3. LDAP Schema interrogation – Quick programmatic access to the kinds of objects and fields your directory server understands. Learn if an attribute is a string, a number, a date, etc., without lots of manual research and re-parsing.
4. LDIF Pivoter – Turn an LDIF file into a (comma or tab-delimited) flat file for analysis or loading into systems that don't speak LDIF.
We built the Zetetic.Ldap library to make directory projects and programming faster and easier, and release it here in the hopes that others will find it useful too. As far as we know, this is the only .NET library that really understands the LDIF specification.
Download link: http://github.com/downloads/skradel/Zetetic.Ldap/Zetetic.Ldap_20090831.zip
I would parse it myself.
If you look at the LDIF RFC for the EBNF, you'll see that it's not a very complex grammar.
I've reliably parsed a large amount of LDIF using regexes before, though your mileage may vary.
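As a starting point, here is a minimal sketch: it assumes plain "attr: value" records separated by blank lines, handles folded continuation lines, and ignores base64 "::" values and change records.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LdifSketch
{
    public static IEnumerable<Dictionary<string, List<string>>> ReadEntries(string path)
    {
        var entry = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        string logical = null;
        // the appended "" guarantees the last entry is flushed
        foreach (var raw in File.ReadAllLines(path).Concat(new[] { "" }))
        {
            if (raw.StartsWith(" ")) { logical += raw.Substring(1); continue; }  // folded line
            if (logical != null && !logical.StartsWith("#"))
            {
                int idx = logical.IndexOf(':');
                if (idx > 0)
                {
                    string attr = logical.Substring(0, idx).Trim();
                    string value = logical.Substring(idx + 1).Trim();
                    List<string> values;
                    if (!entry.TryGetValue(attr, out values))
                        entry[attr] = values = new List<string>();
                    values.Add(value);
                }
            }
            if (raw.Length == 0)   // blank line terminates the entry
            {
                if (entry.Count > 0) yield return entry;
                entry = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
                logical = null;
            }
            else
            {
                logical = raw;
            }
        }
    }
}

You can then look up entry["dn"] for each parsed entry and check the directory for that DN before calling ntdsSchemaAdd.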
I am looking at creating a small class generator for a project. I have been reading about CodeDOM, so the semantics of creating the classes do not appear to be an issue, but I am unsure of how best to integrate the generation into the development and deployment process.
How should I trigger the creation of the classes? I have read it should be part of the build process; how should I do this?
Where should the classes be created? I read that the files should not be edited by hand, and never checked into source control. Should I even worry about this and just generate the classes into the same directory as the generator engine?
Take a look at T4 templates (they're built into VS2008). They allow you to create "template" classes that generate code for you. Oleg Sych is an invaluable resource for this.
Link for Oleg's tutorial on code generation.
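For flavor, a trivial .tt file looks like this (a made-up example; saving it in Visual Studio regenerates the .cs output next to it):

<#@ template language="C#" #>
<#@ output extension=".cs" #>
// Generated file - do not edit by hand.
namespace Generated
{
    public static class Constants
    {
<# foreach (var name in new[] { "Alpha", "Beta" }) { #>
        public const string <#= name #> = "<#= name.ToLower() #>";
<# } #>
    }
}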
The answers to your question depend partly on the purpose of your generated classes.
If the classes are generated as a part of the development, they should be generated as text files and checked into your SCM like any other class.
If your classes are generated dynamically at runtime as a part of the operation of your system, I wouldn't use the CodeDOM at all. I'd use Reflection.
I know of the existence of T4 templates (and know many people use them), but I have not used them myself. Aside from those, you have two main options:
Use a SingleFileGenerator to transform the source right inside the project. Whenever you save the document you are editing, it will automatically regenerate the code file. If you use source control, the generated file will be checked in as part of the project. There are a few limitations with this:
You can only generate one output for each input.
Since you can't control the order in which files are generated, and the files are not generated at build time, your output can only effectively be derived from a single input file.
The single file generator must be installed on the developer's machine if they plan to edit the input file. Since the generated code is in source control, if they don't edit the input then they won't need to regenerate the output.
Since the output is generated only when the input is saved, the output shouldn't depend on any state other than the exact contents of the input file (even the system clock).
Generate code as part of the build. For this, you write an MSBuild targets file. Here you have full control of the input(s) and output(s), so dependencies can be handled. System state can be treated as an input dependency when necessary, but remember that every build that requires code generation takes longer than a build that uses a previously generated result. The results (generated source files) are generally placed in the obj directory and added to the list of inputs going to csc (the C# compiler); a sketch of such a targets file follows this list. Limitations of this method:
It's more difficult to write a targets file than a SingleFileGenerator.
The build depends on generating the output, regardless of whether the user will be editing the input.
Since the generated code is not part of the project, it's a little more difficult to view the generated code for things like setting breakpoints.
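As promised above, an illustrative targets fragment (MyGenerator.exe and the file names are placeholders for your own generator and inputs):

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <!-- run the generation target before Compile -->
    <CompileDependsOn>GenerateClasses;$(CompileDependsOn)</CompileDependsOn>
  </PropertyGroup>
  <Target Name="GenerateClasses"
          Inputs="Model.xml"
          Outputs="$(IntermediateOutputPath)Model.generated.cs">
    <Exec Command="MyGenerator.exe Model.xml $(IntermediateOutputPath)Model.generated.cs" />
    <ItemGroup>
      <Compile Include="$(IntermediateOutputPath)Model.generated.cs" />
    </ItemGroup>
  </Target>
</Project>

The Inputs/Outputs pair gives you incremental builds: the generator only runs when Model.xml is newer than the generated file.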