I've been looking on various sites for some time now, but I can't seem to find anything usable about file formats.
There is a certain file format on my computer that I want to re-create so I can make add-ons for a program. Unfortunately, I would be the first to do so for that format, which makes it all the harder. There are programs that add information to the file, but unfortunately they are not open source. Still, that does mean it's possible to figure out the file format somehow.
The closest I came to finding usable information about re-creating a file format was: "open it in Notepad or a hex editor, and see if you can find anything usable".
This format contains data, so it's nothing like music files or images, in case you're wondering.
I'm just wondering if there is any guide on how to create a file format, or on figuring out how an existing file format works. I believe this sort of format is called a tabulated data format?
It really does depend on the file format.
Ideally, you find some documentation on how the file works and use that. This is easy if the file uses a public format: for HTML or PNG files you can easily find that information. Proprietary formats often have published specs too, or at least a publicly available API for manipulating them, depending on how actively the company encourages this sort of extension.
The next best thing is to use working code that handles the file (whether published source or itself reverse engineered) as a reference implementation.
Otherwise, reverse engineering is the best you can do. Opening the file in Notepad and in a hex editor is indeed the way to go: even with a binary format, viewing it as text can tell you something, and even with a text-based format, a hex editor can show whether non-printable characters are used. It's detective work, sometimes easy but often very hard, especially since you may miss how the format handles edge cases that don't occur in your samples.
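A small Python sketch of what a hex editor shows, handy for comparing many sample files quickly: the offset, the hex bytes, and the same bytes as text with non-printable characters replaced by '.', so both embedded strings and control characters stand out. The function name and the sample file name in the usage line are placeholders.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex column, printable-text column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else (NULs, control bytes) as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on short final rows
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Typical use: with open("sample.dat", "rb") as f: print(hexdump(f.read(256))). Dumping the first few hundred bytes of several files side by side often reveals a fixed signature, a version field, or record lengths.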
The difficulty with obscure formats distributed with games is that they are often compiled from either a declarative definition language, a scripting language or directly from a set of resources like textures and meshes.
In some games, one compiled file will contain bits and pieces of all of the above, with no available documentation on the tools and formats used to piece it together. Some people call that "fun".
If you can't get anything from the hex, can't find any documentation and can't find a tool to read the file, you're probably best off asking the community to see if anyone is familiar with the technology.
I have been searching for the last two days but haven't found anything.
My requirement is to create a document viewer in my web application (C#/.NET), and I don't want to use any third-party tool for this. Can I convert the files into images, PDF, or any other common format that can easily be rendered on a web page? I also cannot use Interop objects.
Any help will be highly appreciated.
You mention in one of your comments that you'd like to write all the code yourself but don't know where to start. Here's how I would go about it...
First, you'll need to familiarize yourself with the Microsoft Office format specification. You can find that here (there's a link to the technical specification). Office Open XML documents (.docx, .xlsx, etc.) are actually .zip packages containing XML files, along with any binary data representing attachments. Just rename a .docx file to .zip and you'll be able to open it and see the XML and any other supporting files inside (the same is true for .xlsx, etc.).
Then you'll need to become intimately familiar with either PDF or HTML, as your job will be to convert the various Office document structures into PDF or HTML structures, while respecting page layout, margins, order, etc.
As others have said, this is a large task, which is why third-party tools exist today. Also, each third-party toolset has its limitations, as this is really hard to "get right" in all situations, and there will be edge cases that work for one document and not another (perhaps the .docx wasn't saved with Microsoft Word but with OpenOffice, which interpreted the standard slightly differently).
If you cannot use COM/Interop technologies in your solution, take a look at the specialized third-party options. I see that you prefer not to use them; however, there is no built-in solution in the .NET Framework. Check out my answer in a similar thread, which describes how to accomplish exactly this task using third-party libraries (for example, DevExpress, since I have experience with it). In addition, take a look at the Documents demo, which shows how to create images/thumbnails from different types of MS Office documents.
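To make the first step concrete, here is a minimal Python sketch (the same logic maps to System.IO.Compression and System.Xml.Linq in C#) that opens a .docx package and turns its paragraphs into bare HTML. It assumes the main document part sits at the conventional word/document.xml path and keeps only paragraph text, dropping formatting, tables, images and layout, which is exactly the hard part described here. The function name is made up for illustration.

```python
import html
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text run) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_basic_html(source) -> str:
    """source: a path or binary file object for a .docx package."""
    with zipfile.ZipFile(source) as package:
        # Conventional location of the main part (an assumption of this sketch)
        root = ET.fromstring(package.read("word/document.xml"))
    parts = []
    for paragraph in root.iter(f"{W_NS}p"):
        # Concatenate every text node in the paragraph; run formatting is ignored
        text = "".join(t.text or "" for t in paragraph.iter(f"{W_NS}t"))
        parts.append(f"<p>{html.escape(text)}</p>")
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"
```

Even this toy version shows why the answer calls the job large: every feature beyond plain paragraphs (styles, numbering, tables, headers, page geometry) needs its own mapping.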
I believe what you need is an intermediate representation of the documents which can be converted into an image for the viewer to display.
Let me try to explain with the diagram below:
You can use tools like smallpdf or OfficeToPDF to do that. Just integrate them into your application.
Smallpdf (https://smallpdf.com/library-detail)
OfficeToPDF (https://officetopdf.codeplex.com/)
I have just started on some work that involves reading an MPEG-TS file. This is my first project with video streaming, and my first task is to read the program names from the file.
I am currently looking at FFmpeg and ffprobe. I have experience in C# and wanted to know which tool/language I should use to do this.
Or do I need another tool or language?
I have launched TSReader and I can see the PAT section which contains the information.
I've had good luck with the NetBeans Java IDE and the ProjectX source code. Since ProjectX is designed to transform between different formats, it tends to show a lot of descriptive information about the file in its UI, and its variable names are relatively easy to follow in the code as well.
Contrast this with other programs, which may be more mysterious in how they decode the format, because they never display the raw header info and don't name those variables so clearly in code.
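Since the question mentions the PAT, here is a minimal Python sketch (not ProjectX's code) of reading it straight from a .ts file, written against the MPEG-2 transport stream layout: 188-byte packets starting with sync byte 0x47, with the PAT on PID 0. It assumes the file begins on a packet boundary and that the PAT fits in one packet, which is typical but not guaranteed. The PAT only maps program numbers to PMT PIDs; in DVB streams the human-readable service names are carried in the SDT on PID 0x0011, which can be reached with the same packet handling. Function names are illustrative.

```python
from typing import Dict, Optional

TS_PACKET_SIZE = 188  # plain .ts; 192-byte .m2ts packets are not handled here
SYNC_BYTE = 0x47
PAT_PID = 0x0000

def parse_pat_packet(packet: bytes) -> Optional[Dict[int, int]]:
    """Return {program_number: PMT PID} if this packet starts a PAT section, else None."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not an aligned 188-byte TS packet")
    payload_unit_start = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    if pid != PAT_PID or not payload_unit_start:
        return None
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control in (0, 2):  # reserved, or adaptation field only
        return None
    offset = 4
    if adaptation_field_control == 3:  # adaptation field precedes the payload
        offset += 1 + packet[4]
    pointer_field = packet[offset]
    section = packet[offset + 1 + pointer_field:]
    if not section or section[0] != 0x00:  # table_id 0x00 = program association section
        return None
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4  # the program loop stops before the 4-byte CRC_32
    programs = {}
    for i in range(8, end, 4):  # an 8-byte section header precedes the loop
        program_number = (section[i] << 8) | section[i + 1]
        pmt_pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number != 0:  # program 0 points at the network PID, not a program
            programs[program_number] = pmt_pid
    return programs

def read_pat(path) -> Dict[int, int]:
    """Scan packets until the first PAT is found."""
    with open(path, "rb") as f:
        while True:
            packet = f.read(TS_PACKET_SIZE)
            if len(packet) < TS_PACKET_SIZE:
                return {}
            result = parse_pat_packet(packet)
            if result is not None:
                return result
```

With the PMT PIDs in hand, the next step would be parsing each PMT (and, for names, the SDT) in the same way.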
I'm in a situation where I'd like to, using C#, look at .iso files that are in a directory and determine if they are indeed video discs (DVD/BD or similar).
I don't need to actually distinguish the type, just a blanket "yes this is a video disc". Is there a way to do this?
An ISO file is actually a CD image stored as a file. The easiest way to determine what is on it is to mount it with a virtual CD program. Alternatively, you can read the file contents directly.
Here is the specification for ISO files:
http://users.telenet.be/it3.consultants.bvba/handouts/ISO9960.html
Once you can determine what files are on the disc, you can check whether it contains video by finding out what those files contain.
That is a much more daunting task than just determining the file structure.
This specification only covers ISO files. Other CD formats will need to be read using their own specifications.
You can determine if the file is of type ISO using the header data
Here is a Stack Overflow question explaining it in a little more detail:
Using .NET, how can you find the mime type of a file based on the file signature not the extension
EDIT
Looking into the MIME type approach a little more, it turns out that Microsoft needs to have a registered MIME type for that header data. It may not know that the file is an ISO and may report application/octet-stream. If that is the case, you can instead apply your own judgement to the same first 256 bytes: determine some markers that tell you it is an ISO file you can handle. Usually you can tell a file's type and version from the first 20 bytes or so, although ISO 9660 images are an exception: their "CD001" identifier starts at byte offset 32769.
I did some searching for a library you could use to read/write ISO files. You obviously only need the reading part, and this project is something you could probably use: http://discutils.codeplex.com/
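A minimal Python sketch of the header check, assuming 2048-byte sectors: the first 16 sectors are a system area, and volume descriptors start at sector 16, each beginning with a type byte and a five-character identifier ("CD001" for ISO 9660; "BEA01", "NSR02"/"NSR03" and "TEA01" for the UDF recognition sequence used by DVD-Video and Blu-ray images). This only tells you the file is a disc image, not that it holds video. The helper names are hypothetical.

```python
from typing import List

SECTOR_SIZE = 2048
FIRST_DESCRIPTOR_SECTOR = 16  # sectors 0-15 are the (usually empty) system area

def volume_descriptor_ids(path, max_descriptors: int = 16) -> List[str]:
    """Read the 5-character identifier of each descriptor sector from sector 16 on."""
    ids = []
    with open(path, "rb") as f:
        for i in range(max_descriptors):
            f.seek((FIRST_DESCRIPTOR_SECTOR + i) * SECTOR_SIZE)
            record = f.read(6)  # 1 type byte + 5-byte standard identifier
            if len(record) < 6:
                break  # ran past the end of the file
            ids.append(record[1:6].decode("ascii", errors="replace"))
    return ids

def looks_like_disc_image(path) -> bool:
    # CD001 = ISO 9660 volume descriptor; NSR02/NSR03 = UDF 1.x/2.x file system present
    return any(i in ("CD001", "NSR02", "NSR03") for i in volume_descriptor_ids(path))
```

The same reads translate directly to FileStream.Seek and Read in C#.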
As another answer mentioned, an ISO file contains a file system. The easiest way to read it is to mount it as a virtual drive, using any one of a number of utilities. Once it's mounted as a drive, you can determine whether it likely contains a movie by inspecting the file system (e.g., using Directory.GetFiles and similar methods in C#).
If you want to read the file's contents directly (without mounting it), I'm not sure what to tell you. I've heard that 7-zip has an API that will let you read the files. You might also check out DiscUtils, which claims to be able to read ISO files.
Once you can read the contents of the file system, see the "Filesystem" section of http://en.wikipedia.org/wiki/DVD-Video. That will tell you what files and directories you should expect to see in the ISO of a DVD movie.
Note that the files' existence is an indication that the image probably contains a DVD movie. There's no way to tell for sure without examining the files' contents individually. Tracking down the specifications for the individual file types might be a more difficult task.
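Assuming the image has been mounted (or extracted) to a directory, a Python sketch of that check could look for the standard marker files: VIDEO_TS/VIDEO_TS.IFO for DVD-Video and BDMV/index.bdmv for Blu-ray. In line with the caveat above, a match means "probably a video disc", not a guarantee. The lookup is case-insensitive because mount tools differ in how they report names; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# (directory, file) pairs whose presence suggests a video disc
VIDEO_DISC_MARKERS = [
    ("VIDEO_TS", "VIDEO_TS.IFO"),  # DVD-Video
    ("BDMV", "INDEX.BDMV"),        # Blu-ray
]

def _find_child(directory: Path, name: str) -> Optional[Path]:
    """Case-insensitive lookup of a direct child by its upper-case name."""
    if not directory.is_dir():
        return None
    for child in directory.iterdir():
        if child.name.upper() == name:
            return child
    return None

def looks_like_video_disc(root) -> bool:
    root = Path(root)
    for dir_name, file_name in VIDEO_DISC_MARKERS:
        folder = _find_child(root, dir_name)
        if folder is not None and _find_child(folder, file_name) is not None:
            return True
    return False
```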
Try using IMAPIv2 to interrogate the ISO.
This link doesn't do that, but it should get you started in the right direction:
How to create optical ISO using IMAPIv2
I need to read and modify the metadata of files uploaded to our server. Is there a generic API/library for reading metadata as key-value pairs?
The files may be proprietary formats such as .doc/.docx and .xls/.xlsx, or free ones like .rtf, .txt and .jpg.
Thanks for all the help
There's no library for reading the metadata of "all known filetypes" in any language, because it's pretty much impossible.
You may be able to find libraries capable of reading a particular format or family of closely-related formats, which is the most common solution and works in most situations.
For the formats you've listed, libraries do exist. JPG has some support built into .NET, I think, through the System.Drawing libraries. TXT is plain text, which is supported in most languages. RTF has some support, mainly through the RichTextBox control, I think. For the Office formats, I would look into Office's SDK or perhaps the Office development tools for Visual Studio; those might have more information.
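For .docx/.xlsx (not the older binary .doc/.xls), the standard document properties are plain XML inside the zip package, so reading them as key-value pairs needs no special SDK. A Python sketch, assuming the core properties part is at the conventional docProps/core.xml path (strictly, the package relationships say where it lives); the function name is hypothetical, and writing properties back is left out.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict

CORE_PROPS_PATH = "docProps/core.xml"  # conventional location; an assumption of this sketch

def read_core_properties(source) -> Dict[str, str]:
    """source: a path or binary file object for a .docx/.xlsx/.pptx package."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read(CORE_PROPS_PATH))
    props = {}
    for element in root:
        # Tags look like '{namespace}localname'; keep only the local name as the key
        key = element.tag.rsplit("}", 1)[-1]
        props[key] = (element.text or "").strip()
    return props
```

Typical keys include title, creator, lastModifiedBy, created and modified.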
There is a program, TrID, that can identify file formats based on their data, which may be of some interest. It doesn't do proper metadata reading, but it is the closest thing to a universal file reader that exists (that I'm aware of).
I have a table with a binary column which stores files of a number of different possible filetypes (PDF, BMP, JPEG, WAV, MP3, DOC, MPEG, AVI etc.), but no columns that store either the name or the type of the original file. Is there any easy way for me to process these rows and determine the type of each file stored in the binary column? Preferably it would be a utility that only reads the file headers, so that I don't have to fully extract each file to determine its type.
Clarification: I know that the approach here involves reading just the beginning of each file. I'm looking for a good resource (aka links) that can do this for me without too much fuss. Thanks.
Also, just C#/.NET on Windows, please. I'm not using Linux and can't use Cygwin (doesn't work on Windows CE, among other reasons).
You can use these tools to find the file format:
File Analyser
http://www.softpedia.com/get/Programming/Other-Programming-Files/File-Analyzer.shtml
What Format
http://www.jozy.nl/whatfmt.html
PE file format analyser
http://peid.has.it/
This website may be helpful for you.
http://mark0.net/onlinetrid.aspx
Note:
I have included the download links to make sure you get the right tool name and information.
Please verify the source before you download them.
I have used one of these tools in the past (I think it was File Analyser); it tells you the closest match.
Happy tooling.
This is not a complete answer, but a place to start would be a "magic numbers" library. This examines the first few bytes of a file to determine a "magic number", which is compared against a known list of them. This is (at least part of) how the file command on Linux systems works.
Someone else asked a similar question and posted the code used to do exactly this. You should be able to take what is posted there and modify it slightly so that it pulls from your database.
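A minimal Python sketch of such a lookup table, covering the types listed in the question (the logic ports directly to a C# array of byte signatures). The signatures are commonly documented ones, but the table is illustrative rather than exhaustive, and it is ordered so that stricter signatures are tried before weak two-byte ones like "BM". Note that DOC, XLS and PPT share the same OLE2 compound-file signature, and that MP3 data without an ID3 tag is only recognizable by its frame-sync bits. Only about the first 16 bytes of each value are needed. Names are hypothetical.

```python
# (label, [(offset, magic bytes), ...]); every part must match
SIGNATURES = [
    ("PDF", [(0, b"%PDF-")]),
    ("JPEG", [(0, b"\xff\xd8\xff")]),
    ("PNG", [(0, b"\x89PNG\r\n\x1a\n")]),
    ("WAV", [(0, b"RIFF"), (8, b"WAVE")]),
    ("AVI", [(0, b"RIFF"), (8, b"AVI ")]),
    ("OLE2 compound file (DOC/XLS/PPT)", [(0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1")]),
    ("ZIP container (DOCX/XLSX/...)", [(0, b"PK\x03\x04")]),
    ("MPEG program stream", [(0, b"\x00\x00\x01\xba")]),
    ("MPEG video", [(0, b"\x00\x00\x01\xb3")]),
    ("MP3 (ID3v2 tag)", [(0, b"ID3")]),
    ("BMP", [(0, b"BM")]),  # weak 2-byte signature, so it is checked late
]

def sniff(header: bytes) -> str:
    """Guess a file type from its first bytes."""
    for label, parts in SIGNATURES:
        if all(header[off:off + len(magic)] == magic for off, magic in parts):
            return label
    # MPEG audio frame sync: the first 11 bits are all set (bare MP3 frames)
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio (likely MP3)"
    return "unknown"
```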
https://stackoverflow.com/questions/58510
In addition, it looks like someone has written a library based on magic numbers to do this. However, the site seems to require registration and some form of alternate access to download the library. The documentation is available for free without registration and may be helpful.
http://software.topcoder.com/catalog/c_component.jsp?comp=13249160&ver=2
The easiest way I know is to use the file command, which is also available on Windows via Cygwin.
A lot of file types have well-defined headers at the start of the file. You could check the first few bytes to see how the file begins.
The easiest way to do this would be through access to a *nix (or Cygwin) system that has the 'file' command:
$ file visitors.*
visitors.html: HTML document text
visitors.png: PNG image data, 5360 x 2819, 8-bit colormap, non-interlaced
You could write a C# application that pipes the first X bytes of each binary column to the file command (using - as the file name).
You need to use some P/Invoke interop code to call the SHGetFileInfo function from the Win32 API. This article may also help.
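A Python sketch of that piping approach (in C#, the same idea maps to System.Diagnostics.Process with redirected standard input). It assumes a file binary is on the PATH, which, as the question points out, rules it out on Windows CE; -b drops the file-name prefix from the output. The function name is hypothetical.

```python
import subprocess

def describe_with_file(header: bytes) -> str:
    """Pipe the leading bytes of a blob to `file` and return its description."""
    result = subprocess.run(
        ["file", "-b", "-"],  # "-" makes file read the data from stdin
        input=header,
        capture_output=True,
        check=True,  # raise if file exits with an error
    )
    return result.stdout.decode(errors="replace").strip()
```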