When reading data that involves many names and figures, reading it line by line takes some serious string concatenation and splitting work.
Is there any method that would let me read a specific data type directly, like the good old fscanf in C?
Thanks
Sara
I don't know if this is exactly what you are looking for, but the FileHelpers library has many utilities to help with reading fixed length and delimited text files.
From the site:
You can strong type your flat file (fixed or delimited) simply describing a class that maps to each record and later read/write your file as an strong typed .NET array
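FileHelpers is a third-party library, so the sketch below does not use it. It is a hand-rolled illustration of the same idea (one class per record, each field converted to its real type), assuming a hypothetical comma-delimited record of a name, an integer and a decimal. The `PersonRecord` and `DelimitedReader` names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical record type: one instance per line, e.g. "Sara,34,91.5".
public class PersonRecord
{
    public string Name = "";
    public int Age;
    public decimal Score;
}

public static class DelimitedReader
{
    // Converts one delimited line into a strongly typed record,
    // roughly the job fscanf does for a fixed format in C.
    public static PersonRecord ParseLine(string line, char delimiter = ',')
    {
        string[] parts = line.Split(delimiter);
        if (parts.Length != 3)
            throw new FormatException($"Expected 3 fields, got {parts.Length}: {line}");
        return new PersonRecord
        {
            Name = parts[0].Trim(),
            // InvariantCulture so "91.5" parses the same on every machine.
            Age = int.Parse(parts[1], NumberStyles.Integer, CultureInfo.InvariantCulture),
            Score = decimal.Parse(parts[2], NumberStyles.Number, CultureInfo.InvariantCulture)
        };
    }

    // Reads every non-empty line of a text source into a list of records.
    public static List<PersonRecord> ReadAll(TextReader reader)
    {
        var records = new List<PersonRecord>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0) records.Add(ParseLine(line));
        }
        return records;
    }
}
```

For a file on disk, `DelimitedReader.ReadAll(File.OpenText(path))` would return one typed object per line.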
If what you are looking for is getting strongly typed objects from files, you should look at serialization and deserialization in the .NET Framework. These allow you to save object state to a file and read it back at a later time.
The System.IO namespace has everything you need.
You could use BinaryReader to read the data from the file, assuming the data is written out in binary form. For example, an Int32 would be written out as 4 bytes, i.e. the binary representation and not the text representation of the integer.
The usefulness of BinaryReader depends on how much control you have over the code generating the file, i.e. whether you can write the data out using a BinaryWriter, and of course on how human-readable the file needs to be.
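A minimal sketch of that round trip, assuming you control both the writer and the reader: `BinaryWriter` stores an `Int32` as 4 little-endian bytes and a string with a length prefix, and `BinaryReader` must read the values back in exactly the same order. The `ScoreFile` class and its count-then-pairs layout are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ScoreFile
{
    // Hypothetical layout: an Int32 count, then (string name, Int32 score) pairs.
    public static void SaveScores(Stream output, IList<(string Name, int Score)> rows)
    {
        using var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
        writer.Write(rows.Count);            // 4 bytes, little-endian
        foreach (var (name, score) in rows)
        {
            writer.Write(name);              // length-prefixed UTF-8 string
            writer.Write(score);             // 4 bytes, not the digits as text
        }
    }

    public static List<(string Name, int Score)> LoadScores(Stream input)
    {
        using var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true);
        int count = reader.ReadInt32();      // read back in the same order it was written
        var rows = new List<(string Name, int Score)>(count);
        for (int i = 0; i < count; i++)
            rows.Add((reader.ReadString(), reader.ReadInt32()));
        return rows;
    }
}
```

The file is compact, but opening it in a text editor shows mostly unreadable bytes, which is the human-readability trade-off mentioned above.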
OK. I know how to use serialization and such, but since that only applies to objects that have been marked with the Serialization attribute, how can I, for example, load data and use it in an application without using serialization? Say, a data file.
Or create a data container with serialization that holds files that are not serialized.
The methods I've used are binary serialization and XML serialization. Are there any other ways to load unknown data and somehow use it in C#?
JSON serialization using JSON.NET
This eats everything! Including anonymous types.
Edit
I know you said "you don't want serialization", but based on your statement "[...]Objects that's been marked with Serialization attribute", I believe you didn't try JSON serialization using JSON.NET!
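JSON.NET is a third-party package, so as a stand-in this sketch uses the `System.Text.Json` serializer that ships with .NET Core 3.0 and later. The point is the same: a plain class with public properties and no `[Serializable]` attribute round-trips without extra markup. The `Customer` type is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical data class: no [Serializable] attribute, just public properties.
public class Customer
{
    public string Name { get; set; } = "";
    public int Orders { get; set; }
    public List<string> Tags { get; set; } = new List<string>();
}

public static class JsonRoundTrip
{
    // Object -> JSON text (property names are kept as declared by default).
    public static string Save(Customer c) => JsonSerializer.Serialize(c);

    // JSON text -> object; Deserialize returns null for the JSON literal "null".
    public static Customer Load(string json) =>
        JsonSerializer.Deserialize<Customer>(json)
        ?? throw new JsonException("JSON text was the literal null");
}
```

The JSON text can be written with `File.WriteAllText` and read back with `File.ReadAllText`, so the file stays human-readable.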
Maybe a definition of terms is in order; serialization is "the process of converting a data structure or object state into a format that can be stored and "resurrected" later in the same or another computer environment". Pretty much any method of converting "volatile" memory into persistent data and back is "serialization", so even if you roll your own scheme to do it, you're "serializing".
That said, it sounds like you simply don't want to use .NET binary serialization. That's actually the right idea; binary serialization is simple, but very code- and environment-dependent. Moving a serializable class to a different namespace, or serializing a file using the Microsoft CLR and then trying to deserialize it in Mono, can break binary serialization.
First and foremost, you MUST be able to determine what type of object you should try to create based on the file. You simply cannot open some "random" file and expect to be able to get anything meaningful out of it without knowing how the data is structured within the file. The easiest way is for the file to tell you, by specifying the type name of the object it was created from (which you will hopefully have available in your codebase). Most built-in serializers do it this way. Other ways the file can inform consumers of its format include file, row and/or field header codes (very common in older standards as they economize on file size) and extension/MIME type.
With that sorted out, deserialization can take place. If the file was serialized using a built-in serializer, simply use that, but if it's an older format (CSV, fixed-length) then you will have to parse the file, line by line, into objects representing lines, collected within a main object representing the file.
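For the fixed-length case, a minimal sketch might look like this. The column layout (a 10-character name, then a 5-character quantity) and the `LineRecord`/`FileRecord` types are hypothetical; in a real format the widths come from the specification or from a header the file itself provides.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// One object per line of the file.
public class LineRecord
{
    public string Name = "";
    public int Quantity;
}

// The object representing the whole file: a collection of line objects.
public class FileRecord
{
    public List<LineRecord> Lines { get; } = new List<LineRecord>();

    // Hypothetical layout: columns 0-9 = name (space padded), columns 10-14 = quantity.
    public static FileRecord Parse(TextReader reader)
    {
        var file = new FileRecord();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Length < 15)
                throw new FormatException($"Line {lineNumber} is shorter than the 15-character record");
            file.Lines.Add(new LineRecord
            {
                Name = line.Substring(0, 10).TrimEnd(),
                // NumberStyles.Integer tolerates the padding spaces around the digits.
                Quantity = int.Parse(line.Substring(10, 5), NumberStyles.Integer, CultureInfo.InvariantCulture)
            });
        }
        return file;
    }
}
```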
Have a look at the ETL (Extract-Transform-Load) process pattern. This is a modular, scalable architecture pattern for taking files and turning them into data the program can work with:
Extract - This part of the system is pointed at the filesystem, or other incoming "pipe" for raw data, and its job is to open the file, extract the data into a very basic object format that can be further manipulated, and put those objects into an in-memory "queue" for the Transform step. The goal is to get data from the pipe as fast and efficiently as possible, but you are required at this point to have some knowledge of the data you are working with so that you can effectively encapsulate it for further processing; actually turning the data into the format you really want happens later.
Transform - This part of the system takes the extracted data, and performs the logic that will put that data into a hydrated object from your codebase. This is where, given information from the Extract step about the type of file the data was extracted from, you instantiate a domain object that represents the data model, slice the raw data up into the chunks that will be stored as data members, perform any type conversions (data you get from a file is usually either in string format or in raw bits and must be marshalled or otherwise converted into data types that better represent the concept of the data), and validate that the internal structure of the new object is consistent and meets known business rules. Hydrated, valid objects are placed in an output queue to be processed by the Load step.
Load - This step takes the hydrated, valid business objects from the Transform step and persists them into the data store that is used by your system (such as a SQL database or the program's native flat file format).
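A deliberately small, single-threaded sketch of the three stages with in-memory queues between them. The comma-separated input, the `Reading` domain type and the temperature-range rule are hypothetical, and a real pipeline would typically run the stages concurrently (for example with `BlockingCollection<T>`) rather than one after another.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical domain object produced by the Transform step.
public record Reading(string Station, double Celsius);

public static class EtlPipeline
{
    // Extract: pull raw rows out of the source quickly, with minimal interpretation.
    public static Queue<string[]> Extract(TextReader source)
    {
        var raw = new Queue<string[]>();
        string line;
        while ((line = source.ReadLine()) != null)
            raw.Enqueue(line.Split(','));
        return raw;
    }

    // Transform: convert types, build domain objects, validate; invalid rows go to 'rejects'.
    public static Queue<Reading> Transform(Queue<string[]> raw, List<string> rejects)
    {
        var valid = new Queue<Reading>();
        while (raw.Count > 0)
        {
            string[] fields = raw.Dequeue();
            if (fields.Length == 2
                && double.TryParse(fields[1], NumberStyles.Float, CultureInfo.InvariantCulture, out double celsius)
                && celsius >= -90 && celsius <= 60)   // hypothetical business rule
                valid.Enqueue(new Reading(fields[0].Trim(), celsius));
            else
                rejects.Add(string.Join(",", fields));
        }
        return valid;
    }

    // Load: persist into the target store (a collection stands in for a database here).
    public static void Load(Queue<Reading> valid, ICollection<Reading> store)
    {
        while (valid.Count > 0)
            store.Add(valid.Dequeue());
    }
}
```

Keeping each stage behind its own queue is what lets the stages be swapped out or parallelized independently later.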
Well, the old fashioned way was to use stream access operations and read out the data you wanted. This way you could read/write to pretty much any file.
Serialization simply automates this process based on some contract.
Based on your comment, I'm guessing that your requirement is to read any kind of file without having a contract in the first place.
Let's say you have a raw file with the first byte specifying the length of a string and the following bytes representing the string:
For example, 5 | H | e | l | l | o
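using var reader = new BinaryReader(File.OpenRead(filename));  // System.IO; closes the file when done
byte length = reader.ReadByte();               // first byte = length of the string
byte[] b = reader.ReadBytes(length);           // next 'length' bytes (fewer only if the file is truncated)
var text = Encoding.ASCII.GetString(b);        // System.Text; "string" is a keyword, not a valid variable name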
Binary I/O is as raw as it gets.
Check MSDN for more.
I'm trying to write a simple reader for AutoCAD's DWG files in .NET. I don't actually need to access all data in the file so the complexity that would otherwise be involved in writing a reader/writer for the whole file format is not an issue.
I've managed to read in the basics, such as the version, all the header data, the section locator records, but am having problems with reading the actual sections.
The problem seems to stem from the fact that the format uses a custom method of storing some data types. I'm going by the specs here:
http://www.opendesign.com/files/guestdownloads/OpenDesign_Specification_for_.dwg_files.pdf
Specifically, the types that depend on reading individual bits are the ones I'm struggling with. A large part of the problem seems to be that C#'s BinaryReader only lets you read whole bytes at a time, when I believe I need the ability to read individual bits rather than only 8 bits (or a multiple of 8) at a time.
It could be that I'm misunderstanding the spec and how to interpret it, but if anyone could clarify how I might go about reading in individual bits from a stream, or even how to read in some of the variables types in the above spec that require more complex manipulation of bits than simply reading in full bytes then that'd be excellent.
I do realise there are commercial libraries out there for this, but the price is simply too high on all of them to be justifiable for the task at hand.
Any help much appreciated.
You can always use the BitArray class for bitwise manipulation: read bytes from the file, load them into a BitArray, and then access individual bits.
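A small hand-rolled reader over a byte array does the same job as BitArray while keeping the bit order explicit. Note that `BitArray(byte[])` puts the least significant bit of the first byte at index 0; this sketch instead assumes bits are consumed most-significant-bit first within each byte, which should be checked against the DWG specification before relying on it. The `BitReader` class is hypothetical, not part of .NET.

```csharp
using System;

// Hypothetical helper: reads individual bits, MSB first, from a byte array.
public sealed class BitReader
{
    private readonly byte[] _data;
    private long _bitPosition;   // absolute position in bits from the start of _data

    public BitReader(byte[] data) => _data = data ?? throw new ArgumentNullException(nameof(data));

    public long BitPosition => _bitPosition;

    // Returns the next bit (0 or 1).
    public int ReadBit()
    {
        long byteIndex = _bitPosition >> 3;
        if (byteIndex >= _data.Length) throw new InvalidOperationException("Read past end of data");
        int shift = 7 - (int)(_bitPosition & 7);   // MSB first within each byte (assumption)
        _bitPosition++;
        return (_data[byteIndex] >> shift) & 1;
    }

    // Reads 'count' bits (up to 32); the first bit read becomes the most significant.
    public uint ReadBits(int count)
    {
        if (count < 0 || count > 32) throw new ArgumentOutOfRangeException(nameof(count));
        uint value = 0;
        for (int i = 0; i < count; i++)
            value = (value << 1) | (uint)ReadBit();
        return value;
    }

    // A whole byte that may start at any bit offset, not necessarily byte-aligned.
    public byte ReadByte() => (byte)ReadBits(8);
}
```

Multi-bit codes from the spec can then be built on `ReadBits`, with the code value deciding how many further bits or bytes to read.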
For the price of any of those libraries you definitely cannot develop something stable yourself. How much time did you spend so far?
I am writing a project in C#.
I want to save a class to a binary file and then read that file in C.
How can I do it without serializing and deserializing?
Please help.
You are talking about cross-platform serialization.
A few options:
serialize it as text (xml, json); text is still binary, after all - and simple
serialize it manually
use a third party cross-platform serializer
But whatever you do, don't use BinaryFormatter. The reason I stress this is that it is probably the first thing you'll see if you search for C# binary serialization, but is entirely inappropriate for your purposes. The format is proprietary, and includes type information that only makes sense from .NET (not really from unmanaged C).
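A minimal sketch of the "serialize it manually" option: the C# side writes each field with `BinaryWriter` in a fixed, documented order, so the C side can read the same bytes back field by field. `BinaryWriter` always writes little-endian integers and IEEE 754 doubles; the `Point3` class and its layout are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical class to persist.
public class Point3
{
    public int Id;
    public double X, Y, Z;
    public string Label = "";
}

public static class Point3Writer
{
    // File layout (all little-endian), documented so a C reader can mirror it:
    //   int32  id
    //   double x, y, z       (IEEE 754, 8 bytes each)
    //   int32  labelLength   (number of UTF-8 bytes that follow)
    //   bytes  label         (UTF-8, no terminating NUL)
    public static void Write(BinaryWriter w, Point3 p)
    {
        w.Write(p.Id);
        w.Write(p.X);
        w.Write(p.Y);
        w.Write(p.Z);
        byte[] label = Encoding.UTF8.GetBytes(p.Label);
        w.Write(label.Length);   // plain int32 prefix instead of BinaryWriter's 7-bit encoded one
        w.Write(label);
    }
}
```

On the C side, reading each field with its own `fread` in the same order avoids depending on the compiler's struct padding; a big-endian C target would also need to byte-swap.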
I'm quite attached to "protocol buffers" as a serialization API, and there are both C# and C versions here.
Saving the state of an object to a file means serializing it.
Reading the state of an object from a file means deserializing it.
You have to use serialization/deserialization to do what you want.
Since you need to do this across different languages, using the built in serializers would probably not be very helpful.
You can use one of the XML serializers for the C# part, but then you would have to parse the XML in C.
Another option is to write your own custom serializer. This way you have full control over the file format.
Do you want to save a class? This is not possible, since in .NET classes are compiled into assemblies (exe, dll).
I think what you want is to save the state of an object (or, better suited, a struct) to a file.
You can write all fields of the class to a file using the BinaryWriter class. Also you can have a look at this.
I presume you mean you want to have a C# application write a file. Then have a separate C/C++ application read that file? On that assumption, in C# you'll need to look into the System.IO namespace, and specifically the FileStream class.
On a side note, I'd really recommend writing a C# Class Library project that handles this read/write via .NET serialization classes, calling it directly from your C# code, and using COM ([assembly: ComVisible(true)]) to access your .NET code from your C/C++ code.
I want to read a binary file that was created outside of my program. One obvious way to read a binary file in C# is to define a class representing the file, then use a BinaryReader, read from the file via the Read* methods, and assign the return values to the class properties.
What I don't like with the approach is that I manually have to write code that reads the file, although the defined structure represents how the file is stored. I also have to keep the order correct when I read.
After looking around a bit I came across the BinaryFormatter, which can automatically serialize and deserialize objects in binary format. One great advantage would be that I could read and also write the file without writing additional code. However, I wonder whether this approach is suitable for files created by other programs, and not just for serialized .NET objects. Take, for example, a graphics file format like BMP. Would it be a good idea to read the file with a BinaryFormatter, or is it better to read and write it manually via BinaryReader and BinaryWriter? Or are there other approaches that suit better? I'm not looking for concrete examples, just for advice on the best way to implement this.
You'd have to be very VERY lucky to find an external file format that happened to map perfectly to the format the BinaryFormatter puts out. The BinaryFormatter obviously adds information on the types/things you're serializing, as well as the data itself, whereas a "normal" binary file format will generally be "these bytes are this, then these bytes are this".
When I've done this in the past (reading SWF headers springs to mind recently) I've always just used a file stream and processed/mapped it manually.
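To illustrate manual mapping on the BMP example from the question, here is a minimal sketch that reads the 14-byte BMP file header plus the first fields of the info header with `BinaryReader`. It assumes a standard Windows BMP, whose multi-byte fields are little-endian (matching `BinaryReader`); the `BmpHeader` type is hypothetical and only a few fields are read.

```csharp
using System.IO;
using System.Text;

// Hypothetical container for the fields read below.
public class BmpHeader
{
    public uint FileSize;
    public uint PixelDataOffset;
    public uint InfoHeaderSize;
    public int Width;
    public int Height;          // positive = bottom-up rows, negative = top-down
    public ushort BitsPerPixel;
}

public static class BmpReader
{
    public static BmpHeader ReadHeader(Stream input)
    {
        using var r = new BinaryReader(input, Encoding.ASCII, leaveOpen: true);
        // File header (14 bytes): "BM", file size, two reserved words, pixel data offset.
        if (r.ReadByte() != (byte)'B' || r.ReadByte() != (byte)'M')
            throw new InvalidDataException("Not a BMP file (missing 'BM' signature)");
        var h = new BmpHeader { FileSize = r.ReadUInt32() };
        r.ReadUInt16();                        // reserved
        r.ReadUInt16();                        // reserved
        h.PixelDataOffset = r.ReadUInt32();
        // Start of the info header; only its first fields are read here.
        h.InfoHeaderSize = r.ReadUInt32();
        h.Width = r.ReadInt32();
        h.Height = r.ReadInt32();
        r.ReadUInt16();                        // color planes
        h.BitsPerPixel = r.ReadUInt16();
        return h;
    }
}
```

Nothing here depends on .NET type metadata: the code follows the byte layout the format defines, which is why this approach works for files written by other programs where BinaryFormatter would not.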
I use the excellent FileHelpers library when I work with text data. It allows me to very easily dump text fields from a file or in-memory string into a class that represents the data.
In working with a big-endian microcontroller-based system I need to read a serial data stream. To save space on the very limited microcontroller platform I need to write raw binary data containing fields of various multi-byte types (essentially just dumping a struct variable out the serial port).
I like the architecture of FileHelpers. I create a class that represents the data and tag it with attributes that tell the engine how to put data into the class. I can feed the engine a string representing a single record and get a deserialized representation of the data. However, this is different from object serialization in that the raw data is not delimited in any way; it's a simple binary fixed-record format.
FileHelpers is probably not suitable for reading such binary data, as it cannot handle the nulls that show up*, and I suspect there might be Unicode issues (the engine takes input as a string, so I have to read bytes from the serial port and translate them into a Unicode string before they go to my data converter classes). As an experiment I have set it up to read the binary stream, and as long as I'm careful not to send nulls it works quite well so far. It is easy to set up new converters that read the raw data and account for endianness issues and such. It currently fails on nulls and cannot process multiple records (it expects a CRLF between records).
What I want to know is if anyone knows of an open-source library that works similarly to FileHelpers but that is designed to handle binary data.
I'm considering deriving something from FileHelpers to handle this task, but it seems like there ought to be something already available to do this.
*It turns out that it does not complain about nulls in the input stream. I had an unrelated bug in my test program that came up where I expected a problem with the nulls. Should have investigated a little deeper first!
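For comparison, the FileHelpers-style idea (describe the record with a class and attributes, let an engine fill it) can be sketched for binary data in a few dozen lines. This is not an existing library: the `BinaryField` attribute, `BigEndianRecordReader` and the `SensorRecord` layout are hypothetical, only a few integer types are handled, and byte order is converted with `BinaryPrimitives` (available since .NET Core 2.1). Because records are fixed-length, no delimiter is needed between them and zero bytes are just data.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: marks a field and gives its byte offset within the record.
[AttributeUsage(AttributeTargets.Field)]
public sealed class BinaryFieldAttribute : Attribute
{
    public int Offset { get; }
    public BinaryFieldAttribute(int offset) => Offset = offset;
}

public static class BigEndianRecordReader
{
    // Fills a new T from one fixed-length, big-endian record.
    public static T Read<T>(ReadOnlySpan<byte> record) where T : new()
    {
        object result = new T();   // boxed so SetValue also works for structs
        foreach (FieldInfo field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            var attr = field.GetCustomAttribute<BinaryFieldAttribute>();
            if (attr == null) continue;
            ReadOnlySpan<byte> slice = record.Slice(attr.Offset);
            object value;
            if (field.FieldType == typeof(byte)) value = slice[0];
            else if (field.FieldType == typeof(short)) value = BinaryPrimitives.ReadInt16BigEndian(slice);
            else if (field.FieldType == typeof(ushort)) value = BinaryPrimitives.ReadUInt16BigEndian(slice);
            else if (field.FieldType == typeof(int)) value = BinaryPrimitives.ReadInt32BigEndian(slice);
            else if (field.FieldType == typeof(uint)) value = BinaryPrimitives.ReadUInt32BigEndian(slice);
            else throw new NotSupportedException(field.FieldType.Name);
            field.SetValue(result, value);
        }
        return (T)result;
    }

    // Splits a buffer into consecutive fixed-size records (a trailing partial record is ignored).
    public static List<T> ReadAll<T>(byte[] data, int recordSize) where T : new()
    {
        var list = new List<T>();
        for (int pos = 0; pos + recordSize <= data.Length; pos += recordSize)
            list.Add(Read<T>(data.AsSpan(pos, recordSize)));
        return list;
    }
}

// Hypothetical 8-byte record: sensor id, reading, signed temperature.
public class SensorRecord
{
    [BinaryField(0)] public ushort SensorId;
    [BinaryField(2)] public int Reading;
    [BinaryField(6)] public short Temperature;
}
```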
I haven't used filehelpers, so I can't do a direct comparison; however, if you have an object-model that represents your objects, you could try protobuf-net; it is a binary serialization engine for .NET using Google's compact "protocol buffers" wire format. Much more efficient than things like xml, but without the need to write all your own serialization code.
Note that "protocol buffers" does include some very terse markers between fields (typically one byte); this adds a little padding, but greatly improves version tolerance. For "packed" data (i.e. blocks of ints, say, from an array) this can be omitted if desired.
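To make the "one byte" marker concrete: in the protocol buffers wire format each field is preceded by a key, `(field_number << 3) | wire_type`, and integers are written as base-128 varints. This hand-written sketch (not protobuf-net's API) encodes a single varint field; for field 1 holding 150 it produces the bytes `08 96 01`.

```csharp
using System.Collections.Generic;

public static class ProtoSketch
{
    private const ulong WireTypeVarint = 0;

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    public static void WriteVarint(List<byte> output, ulong value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)(value | 0x80));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    // Field key (the marker) followed by the varint-encoded value.
    public static byte[] EncodeVarintField(int fieldNumber, ulong value)
    {
        var output = new List<byte>();
        WriteVarint(output, ((ulong)fieldNumber << 3) | WireTypeVarint);
        WriteVarint(output, value);
        return output.ToArray();
    }
}
```

For field numbers up to 15 the key fits in a single byte, which is where the "typically one byte" overhead comes from.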
So: if you just want a compact output, it might be good. If you need a specific output, probably less so.
Disclosure: I'm the author, so I'm biased; but it is free.
When I am fiddling with GPS data in the SIRFstarIII binary mode, I use the Python interactive prompt with the serial module to fetch the stream from the USB/serial port and the struct module to convert the bytes as needed (per some format defined by SIRF). Using the interactive prompt is very flexible because I can read the string to a variable, process it, view the results and try again if needed. After the prototyping stage is finished, I have the data format strings that I need to put into the final program.
Your question doesn't mention why you have a C# tag. I understand FileHelpers is a C# library, but that doesn't tell me what environment you are working in. There is an implementation of Python for .NET called IronPython.
I realize this answer might mean you have to learn a new language, but having an interactive prompt is a very powerful tool for any programmer.