C# Merge VOB files, is it possible? - c#

All,
I'm making a training kit whose content is given to us as 2 VOB files, which I need the software to automatically merge into 1. We'll be getting up to 10-15 VOB files from this vendor, and our requirement is to move to a single file.
Is merging these files as easy as opening byte streams and combining them?
Thanks!

If the specifications of the files match, it should be possible to use the header from the first file and copy the remaining files, minus their headers, into one file. But the specifications need to match exactly on everything, from encoding type and parameters to the number of audio channels.
If so, then all you need to do is read all the files and skip the first xxx bytes of every file except the first one.
It won't work if the VOB-files are encrypted (DVD encryption).
Note: This is a job specialized tools do well. They are optimized and (more or less) bug free. So if you can, use them (i.e. from the command line).

No, it is not a simple merge. Otherwise the old DOS command >type 1.VOB, 2.VOB > Final.VOB would have done the job.
Unless this is a learning exercise, just use any VOB merging tool to merge these two.

A lot of this is probably going to depend on whether the VOB files have the same resolution and bit rate, as well as on a lot of other encoding parameters being the same. If they use the exact same encoding parameters, simply concatenating the files will probably work. In my experience, files taken from a DVD work fine when this is done. However, my first guess is that this wouldn't work if there were any format differences between the files.
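For the case where plain concatenation is good enough, here is a minimal sketch in C#. It assumes the files are unencrypted and share the same encoding parameters, and it does no header handling at all. The class name FileConcatenator and the file names in the usage line are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileConcatenator
{
    // Appends each input file, in the order given, to a newly created output file.
    public static void Concatenate(IEnumerable<string> inputPaths, string outputPath)
    {
        using (FileStream output = File.Create(outputPath))
        {
            foreach (string path in inputPaths)
            {
                using (FileStream input = File.OpenRead(path))
                {
                    // CopyTo streams the data in buffered chunks, so large files are fine.
                    input.CopyTo(output);
                }
            }
        }
    }
}
```

Usage would look something like FileConcatenator.Concatenate(new[] { "part1.VOB", "part2.VOB" }, "merged.VOB"). Whether the result plays correctly still depends on the points above.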

Related

How to determine file type?

I need to know if my file is audio file: mp3, wav, etc...
How to do this?
Well, the most robust way would be to write a parser for the file types you want to detect and then just try – if there are no errors, it's obviously of the type you tried. This is an expensive approach, however, but it would ensure that you can successfully load the file as well since it will also check the rest of the file for semantic soundness.
A much less expensive variant would be to look for “magic” bytes: signatures at the start, or at known offsets, of the file. For example, if a file starts with an ID3 tag you can be reasonably sure it's an MP3 file. If a file starts with RIFF, followed by four size bytes and then WAVEfmt, it's a WAV file. However, such detection cannot guarantee that the file really is of that type; it could be just the signature followed by garbage.
While you can use the extension to make a reasonable guess as to what the file is it's not guaranteed to work 100% of the time. If you are targeting Windows then it will work 99.9% of the time as that's how Windows keeps track of what file is what type.
If you are getting your files from non-Windows sources the only sure way is to open the file and look for a specific string or set of bytes which will unambiguously identify it. For example, you could look for the ID3 tags in an mp3 file:
The ID3v1 tag occupies 128 bytes, beginning with the string TAG.
or
ID3v2 tags are of variable size, and usually occur at the start of the file, to aid streaming media.
How far you go depends on how robust you want your solution to be, and does rely on there being a header or pattern that's always present.
Doing it this way can help guard against malicious content where someone posts a piece of malware as a mp3 file (say) and hopes that it will just be run by a program prone to some exploit (a buffer overrun perhaps).
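As a rough sketch of the signature checks described above, the following C# looks for an ID3v2 tag at the start of the file, a RIFF/WAVE header, and an ID3v1 tag in the last 128 bytes. The class and method names are invented for the example, and an MP3 file that carries no ID3 tag at all will not be recognised, so treat a null result as "unknown" rather than "not audio".

```csharp
using System;
using System.IO;
using System.Text;

public static class AudioSniffer
{
    // Returns "mp3", "wav", or null when none of the known signatures is found.
    public static string Detect(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] head = new byte[12];
            int n = ReadFully(fs, head);

            // ID3v2 tag at the very start of the file.
            if (n >= 3 && Encoding.ASCII.GetString(head, 0, 3) == "ID3")
                return "mp3";

            // "RIFF", a 4-byte chunk size, then "WAVE".
            if (n >= 12 && Encoding.ASCII.GetString(head, 0, 4) == "RIFF"
                        && Encoding.ASCII.GetString(head, 8, 4) == "WAVE")
                return "wav";

            // ID3v1 tag: the last 128 bytes of the file, beginning with "TAG".
            if (fs.Length >= 128)
            {
                fs.Seek(-128, SeekOrigin.End);
                byte[] tail = new byte[3];
                if (ReadFully(fs, tail) == 3 && Encoding.ASCII.GetString(tail) == "TAG")
                    return "mp3";
            }
            return null;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
    private static int ReadFully(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = s.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```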
You can use the file extension to figure it out:
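using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Verbatim string literal (@"..."): backslashes need no escaping.
        string filepath = @"C:\Users\Sam\Documents\Test.txt";
        string extension = Path.GetExtension(filepath);

        // Compare case-insensitively so ".MP3" matches as well.
        if (string.Equals(extension, ".mp3", StringComparison.OrdinalIgnoreCase))
        {
            Console.WriteLine(extension);
        }
    }
}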
The file extension is the first port of call for the OS when it works out what file type it's dealing with. If you really want to know the file type with certainty, the only way is to read the file's contents. This comes with a catch: image files are easy because their headers are in a fairly easy-to-read format, but it gets more complex with a completely variable file type.
You could check out this older post for a bit of help. Here is a post about finding just media file types.
Ultimately it depends on why you're trying to do this.
Path.GetExtension(PathToFile)
See this post. You end up passing the first (up to) 256 bytes of data from the file to FindMimeFromData (part of the Urlmon.dll).

C#: Archiving a File into Parts of 100MB

In my application, the user selects a big file (>100 mb) on their drive. I wish for the program to then take the file that was selected and chop it up into archived parts that are 100 mb or less. How can this be done? What libraries and file format should I use? Could you give me some sample code? After the first 100mb archived part is created, I am going to upload it to a server, then I will upload the next 100mb part, and so on until the upload is finished. After that, from another computer, I will download all these archived parts, and then I wish to connect them into the original file. Is this possible with the 7zip libraries, for example? Thanks!
UPDATE: From the first answer, I think I'm going to use SevenZipSharp, and I believe I understand now how to split a file into 100mb archived parts, but I still have two questions:
Is it possible to create the first 100mb archived part and upload it before creating the next 100mb part?
How do you extract a file with SevenZipSharp from multiple split archives?
UPDATE #2: I was just playing around with the 7-zip GUI and creating multi-volume/split archives, and I found that selecting the first one and extracting from it will extract the whole file from all of the split archives. This leads me to believe that paths to the subsequent parts are included in the first one (or is it consecutive?). However, I'm not sure if this would work directly from the console, but I will try that now, and see if it solves question #2 from the first update.
Take a look at SevenZipSharp. You can use it to create your split 7z files, do whatever you want to upload them, then extract them on the server side.
To split the archive look at the SevenZipCompressor.CustomParameters member, passing in "v100m". (you can find more parameters in the 7-zip.chm file from 7zip)
You can split the data into 100MB "packets" first, and then pass each packet into the compressor in turn, pretending that they are just separate files.
However, this sort of compression is usually stream-based. As long as the library you are using will do its I/O via a Stream-derived class, it would be pretty simple to implement your own Stream that "packetises" the data any way you like on the fly - as data is passed into your Write() method you write it to a file. When you exceed 100MB in that file, you simply close that file and open a new one, and continue writing.
Either of these approaches would allow you to easily upload one "packet" while continuing to compress the next.
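Here is a rough sketch of the second approach: a write-only Stream that rolls over to a new file whenever the current one reaches the size limit. It assumes the compression library only writes sequentially and never seeks or reads back its output, which is not true of every archive format. The class name SplitFileStream and the .001, .002, ... naming are choices made for this example, not part of any library.

```csharp
using System;
using System.IO;

// Write-only stream that spreads whatever is written to it over numbered part files.
public sealed class SplitFileStream : Stream
{
    private readonly string _basePath;
    private readonly long _maxPartSize;
    private FileStream _current;
    private long _writtenInPart;
    private long _totalWritten;
    private int _partIndex;

    public SplitFileStream(string basePath, long maxPartSize)
    {
        if (maxPartSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxPartSize));
        _basePath = basePath;
        _maxPartSize = maxPartSize;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _totalWritten;
    public override long Position
    {
        get => _totalWritten;
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Start a new part lazily, and whenever the current one is full.
            if (_current == null || _writtenInPart == _maxPartSize)
                OpenNextPart();

            int chunk = (int)Math.Min(count, _maxPartSize - _writtenInPart);
            _current.Write(buffer, offset, chunk);
            offset += chunk;
            count -= chunk;
            _writtenInPart += chunk;
            _totalWritten += chunk;
        }
    }

    private void OpenNextPart()
    {
        // Closing the finished part here is the point at which it could be uploaded.
        _current?.Dispose();
        _partIndex++;
        _current = File.Create(_basePath + "." + _partIndex.ToString("D3"));
        _writtenInPart = 0;
    }

    public override void Flush() => _current?.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _current?.Dispose();
        base.Dispose(disposing);
    }
}
```

For 100 MB parts you would construct it with something like new SplitFileStream("backup.bin", 100L * 1024 * 1024) and hand it to the compressor as its output stream.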
edit
Just to be clear - Decompression is just the reverse sequence of the above, so once you've got the compression code working, decompression will be easy.

Drive searching

I am developing an application and I would like to be able to search the whole drive for a regular expression. I would prefer to do this in C#, but I can call other languages. Is there any easy way to just seek through all the binary data on a drive from beginning to end?
Here's an implementation of grep in C#
http://dotnet.jku.at/applications/Grep/Src.aspx
You can modify to follow subdirectories -- it works off of an array of filenames.
AFAIK there is no simple way to do this on raw binary data (You would need direct disk control).
If searching on a per-file basis is enough, then enumerating all files, opening them for shared binary reading (catching the exceptions for the ones that are system protected) and then looking for the data should be straightforward. However, this will be quite slow, as enumerating and opening all files takes some time.
I don't think C# can read all files / data for the drive the OS is on, since the OS locks some files.
You could use the System.IO namespace to enumerate all files and then scan them individually, byte by byte. This would obviously take a long time.
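A minimal sketch of that file-by-file approach is below. It walks the directory tree by hand so that one inaccessible folder doesn't abort the whole search, and decodes each file as ISO-8859-1 so every byte maps to exactly one character for the regex. It reads each file fully into memory and does not guard against directory link cycles, so take it as a starting point only. The class name FileGrep is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class FileGrep
{
    // Yields the path of every readable file under root whose raw bytes match the pattern.
    public static IEnumerable<string> Search(string root, Regex pattern)
    {
        // ISO-8859-1 maps each byte to one char, so binary data survives decoding.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files, subdirs;
            try
            {
                files = Directory.GetFiles(dir);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // protected folder
            catch (IOException) { continue; }

            foreach (string sub in subdirs) pending.Push(sub);

            foreach (string file in files)
            {
                string text;
                try
                {
                    // FileShare.ReadWrite lets us read files other processes have open.
                    using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                    using (var reader = new StreamReader(fs, latin1, false))
                    {
                        text = reader.ReadToEnd();
                    }
                }
                catch (UnauthorizedAccessException) { continue; } // locked or protected file
                catch (IOException) { continue; }

                if (pattern.IsMatch(text)) yield return file;
            }
        }
    }
}
```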
Do you really want to do this? How are you going to search:
.doc
.xls
.pdf
.html
etc.? Each file type will represent the string you're searching for in different ways.
This article shows how to read data directly from the disk. Everything they do from C++ could be done from C# using PInvoke.

How do I split a big file into smaller ones (more FTP friendly), and merge them back later?

My server doesn't allow upload/download of big files. On the other hand, I built a bootstrapper that needs to upload/download big files.
How can I split a big file into smaller subfiles and do the merging later on?
An existing C# library would be great, but I'm happy to hear suggestions about how to program this myself, or even to use a utility.
** Windows platform **
On Unix, you can use the split command to break apart the file, and then use cat to concatenate them together.
split -b 1024M bigfile.tar.gz bigfile
This will create oodles of files like bigfileaa bigfileab, etc.
So then ftp all the little beasties to the destination and do the cat:
cat bigfile* > bigfile.tar.gz
On Windows, you might have an option in your Zip application to break apart an archive and remerge it on the other end. Actually, a googling of the search terms: zip split turns up several such options.
On Windows you can easily split it with WinRAR.
Or you can do it by hand:
1) 1
2) 2
Every zip program I've ever used has this ability.
7-Zip is my current favorite on Windows. It has a nice command line version, too.
You can make a split and join program with a handful of lines each. Just read some fixed amount (512KB, 4MB, whatever) from a file and write it out to a new file. Repeat this (and change the filename you write to) until you reach the end of the file.
Another program needs to read from these files and write their contents (one after another) to a target file.
Pretty easy, really, and if you want to get some programming experience it would be a good exercise.
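The split and join programs described above could be sketched roughly like this in C#. The class name FileSplitter, the .000, .001, ... suffixes, and the buffer size are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSplitter
{
    // Splits sourcePath into sourcePath.000, sourcePath.001, ... of at most partSize bytes each.
    // Returns the part paths in order.
    public static List<string> Split(string sourcePath, long partSize)
    {
        if (partSize <= 0) throw new ArgumentOutOfRangeException(nameof(partSize));
        var parts = new List<string>();
        byte[] buffer = new byte[81920];

        using (FileStream source = File.OpenRead(sourcePath))
        {
            while (source.Position < source.Length)
            {
                string partPath = sourcePath + "." + parts.Count.ToString("D3");
                using (FileStream part = File.Create(partPath))
                {
                    long remaining = partSize;
                    while (remaining > 0)
                    {
                        // Never read past the end of the current part.
                        int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                        if (read == 0) break; // end of source
                        part.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                parts.Add(partPath);
            }
        }
        return parts;
    }

    // Writes the parts back to back, in the order given, into targetPath.
    public static void Join(IEnumerable<string> partPaths, string targetPath)
    {
        using (FileStream target = File.Create(targetPath))
        {
            foreach (string partPath in partPaths)
            {
                using (FileStream part = File.OpenRead(partPath))
                {
                    part.CopyTo(target);
                }
            }
        }
    }
}
```

The order of the parts matters when joining, so it is safer to pass the list returned by Split (or sort by the numeric suffix) than to rely on directory listing order.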
Or you can write a small application to meet your needs.
It just reads bytes and writes them out again, so it can easily split the big file into small ones.

Is there an easy way to determine the type of a file without knowing the file's extension?

I have a table with a binary column which stores files of a number of different possible filetypes (PDF, BMP, JPEG, WAV, MP3, DOC, MPEG, AVI etc.), but no columns that store either the name or the type of the original file. Is there any easy way for me to process these rows and determine the type of each file stored in the binary column? Preferably it would be a utility that only reads the file headers, so that I don't have to fully extract each file to determine its type.
Clarification: I know that the approach here involves reading just the beginning of each file. I'm looking for a good resource (aka links) that can do this for me without too much fuss. Thanks.
Also, just C#/.NET on Windows, please. I'm not using Linux and can't use Cygwin (doesn't work on Windows CE, among other reasons).
You can use these tools to find the file format.
File Analyser
http://www.softpedia.com/get/Programming/Other-Programming-Files/File-Analyzer.shtml
What Format
http://www.jozy.nl/whatfmt.html
PE file format analyser
http://peid.has.it/
This website may be helpful for you.
http://mark0.net/onlinetrid.aspx
Note:
I have included the download links to make sure that you get the right tool name and information.
Please verify the source before you download them.
I have used a tool in the past, I think it was File Analyser, which will tell you the closest match.
Happy tooling.
This is not a complete answer, but a place to start would be a "magic numbers" library. This examines the first few bytes of a file to determine a "magic number", which is compared against a known list of them. This is (at least part) of how the file command on Linux systems works.
Someone else asked a similar question and posted the code used to do exactly this. You should be able to take what is posted here, and slightly modify it so that it pulls from your database.
https://stackoverflow.com/questions/58510
In addition to that, it looks like someone has written a library based on magic numbers to do this. However, the site appears to require registration, and some form of alternate access, in order to download the library. The documentation is available for free without registration, which may be helpful.
http://software.topcoder.com/catalog/c_component.jsp?comp=13249160&ver=2
The easiest way I know is to use the file command, which is also available on Windows via Cygwin.
A lot of filetypes have well defined headers that begin the file. You could check the first few bytes to check to see how the file begins.
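As an illustration, here is a small C# sketch that checks the first bytes of a blob (for example, the start of a binary column value) against a handful of well-known signatures. The list is far from complete, the class and method names are invented for the example, and some formats can't be told apart this way: a .doc file is only recognisable as an OLE compound file, and an MP3 without an ID3v2 tag is not detected at all.

```csharp
using System;
using System.Text;

public static class MagicNumbers
{
    // Returns a short type name, or null if no known signature matches.
    // Only the first 12 bytes of the data are ever examined.
    public static string Identify(byte[] data)
    {
        if (At(data, 0, Ascii("%PDF"))) return "PDF";
        if (At(data, 0, new byte[] { 0xFF, 0xD8, 0xFF })) return "JPEG";
        if (At(data, 0, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A })) return "PNG";

        // RIFF containers: "RIFF", a 4-byte size, then a format tag at offset 8.
        if (At(data, 0, Ascii("RIFF")) && At(data, 8, Ascii("WAVE"))) return "WAV";
        if (At(data, 0, Ascii("RIFF")) && At(data, 8, Ascii("AVI "))) return "AVI";

        if (At(data, 0, Ascii("ID3"))) return "MP3";

        // OLE compound file, the container used by legacy Office formats such as DOC.
        if (At(data, 0, new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 })) return "OLE";

        // MPEG program stream pack header.
        if (At(data, 0, new byte[] { 0x00, 0x00, 0x01, 0xBA })) return "MPEG";

        // Checked last because a two-byte signature is the weakest evidence.
        if (At(data, 0, Ascii("BM"))) return "BMP";

        return null;
    }

    private static byte[] Ascii(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // True if data contains signature starting at the given offset.
    private static bool At(byte[] data, int offset, byte[] signature)
    {
        if (data == null || data.Length < offset + signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[offset + i] != signature[i]) return false;
        }
        return true;
    }
}
```

Since only the first dozen bytes are examined, you can read just a short prefix of each column value instead of extracting the whole file.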
Easiest way to do this would be through access to a *nix (or cygwin) system that has the 'file' command:
$ file visitors.*
visitors.html: HTML document text
visitors.png: PNG image data, 5360 x 2819, 8-bit colormap, non-interlaced
You could write a C# application that piped the first X bytes of each binary column to the file command (using - as the file name)
You need to use some p/invoke interop code to call the SHGetFileInfo method from the Win32 API. This article may also help.
