Find appended text from txt file - c#

I want to write code for the following case: a text file sits at a specified path, and one of the users edits it, enters new text, and saves it. I then want to get the text that was appended most recently.
I have the file size from both before and after the append.
For example, my text file is 1204 KB and I need to read only the final 200 KB of text. Is that possible?

This can only be done if you're monitoring the file size in real-time, since files do not maintain their own histories.
If watching the files as they are modified is a possibility, you could use a FileSystemWatcher and calculate the increase in file size upon each modification. You could then read only the bytes appended since the file last changed, by seeking to the previous length and reading to the end.
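A minimal sketch of that last step, assuming the previous length is already known, the file only grew at the end, and the text is UTF-8. The class and method names are made up for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class AppendedTextReader
{
    // Returns the text stored after byte offset previousLength.
    // Assumptions: the file was only appended to, and it is UTF-8 encoded.
    public static string ReadAppended(string path, long previousLength)
    {
        // FileShare.ReadWrite lets us read while an editor still has the file open.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (stream.Length <= previousLength)
                return string.Empty; // nothing appended (or the file shrank)

            stream.Seek(previousLength, SeekOrigin.Begin);
            var buffer = new byte[stream.Length - previousLength];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break; // end of file reached early
                read += n;
            }
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```

For the numbers in the question, calling ReadAppended(path, 1004 * 1024) on the 1204 KB file would return the final 200 KB. If the user edited the middle of the file rather than appending, the result is meaningless.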

Do you know how big the file was before the user appended the text? If not, there's no way of telling... files don't maintain a revision history (in most file systems, anyway).

You can keep track of the file pointer. E.g. if you are using C, you can go to the end of the file with fseek(fp, 0, SEEK_END) and then call ftell(fp), which gives you the current position of the file pointer. After the user edits and saves the file, rerun the code and compare the new position with the original one. If the new position is greater than the original, seek to the original position and read the difference in bytes from there.

As @Jon Skeet alludes to in his answer, the only way to tell specifically what text was "appended" is by knowing how large the file was before it was changed. The rest of the characters are thus what was "appended".
Note that I quote appended above since I get two conflicting meanings from your question; edited and appended.
If the user only appends text, which is taken to mean "add more text only at the end", then the previous-size approach should in theory work.
However, if the user freely edits the text, by adding text in random spots, and perhaps even removing or changing existing text, then you need a whole 'nother approach to this.
If it's the latter, I might have something you could use, a binary patching implementation that can also be used to figure out from an older copy of the same file what was changed in a newer copy. It isn't easy to use, and might not give you exactly what you want, but as I said, it's hard to tell exactly what your question is.

If your program is running the entire time, you could grab a copy of the file in memory. Then in a separate thread periodically read the new file and compare the two.

If you want your program to be notified when the file is changed, use FileSystemWatcher. However, it will only notify you when the file is changed while your program is running, and it will not provide you with the appended text. You will only get information about which file was changed.
FileSystemWatcher watcher = new FileSystemWatcher(Environment.CurrentDirectory, "test.txt");
while (true)
{
    var changedResult = watcher.WaitForChanged(WatcherChangeTypes.Changed);
    Console.WriteLine(changedResult.Name);
}
Or:
FileSystemWatcher watcher = new FileSystemWatcher(Environment.CurrentDirectory, "test.txt");
watcher.Changed += watcher_Changed;
watcher.EnableRaisingEvents = true; // no events are raised until this is set
static void watcher_Changed(object sender, FileSystemEventArgs e)
{
    Console.WriteLine(e.FullPath);
    Console.WriteLine(e.ChangeType);
}

Best solution imo is to write a small app which has to be used to change the file in question. This application can then insert additional info into the file which allows you to keep the entire revision history.

Related

How do i start reading a text file from a specific point?

So my question is basically: how do I start reading a file from a specific line, for example from line 14 to line 18?
I'm working on a simple ContactList app and the only thing missing is deleting the information for a specific name. The user can create a new contact, which has a name, a number and an address as information. I want the user to also be able to delete that person's data by typing in their name. The program should then find the name and the 4 lines under it and remove them from the text file. How could I achieve this?
You can jump to any offset within a file. However, there isn't any way to know where a particular line begins unless you know the length of every line.
If you are writing a contact app, you should not use a regular text file unless:
You pad line lengths so that you can easily calculate the position of each line.
You are loading the entire file into memory.
You can't. You need to read the first n lines in order to find out which line has which number, unless your records have a fixed length per line (which is not a good idea: there's always someone with a longer name than you could think of).
Likewise, you can't delete a line from the text file. The space on disk does not move by itself. You need an algorithm that implements safe saving and rearranges the data:
foreach line in input_file:
    if line is needed:
        write line to temporary_output_file
    else:
        ignore (don't write = delete)
delete input_file
move temporary_output_file to input_file
Disadvantage: you need about double the disk space while input_file and temporary_output_file both exist.
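The pseudocode above could look roughly like this in C#. This is only a sketch: it assumes each contact is stored as a name line followed by a fixed number of detail lines (4, as in the question), and the class name and the ".tmp" suffix are invented for the example.

```csharp
using System;
using System.IO;

public static class ContactFile
{
    // Copies every line to a temporary file, skipping the line that equals
    // name and the linesAfterName lines that follow it, then swaps the files.
    public static void DeleteContact(string path, string name, int linesAfterName = 4)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            int toSkip = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (toSkip > 0) { toSkip--; continue; }                  // detail line of the deleted contact
                if (line == name) { toSkip = linesAfterName; continue; } // the name line itself
                writer.WriteLine(line);                                  // line is needed, keep it
            }
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Note that a detail line which happens to equal the searched name would also trigger a deletion, so a real format would need some way to tell name lines apart from detail lines.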
With safe saving, the NTFS file system driver will give the moved file the same time stamp that it had before deleting the file. Read the Windows Internals 7 book (should be part 2, chapter 11) to understand it in detail.
Depending on how large the contact list is (probably it's less than 10M entries), there's no problem of loading the whole database into memory, deleting the record and then writing everything back.

OpenFileDialog & SaveFileDialog Pop-up search with filter in C#

I have an openFileDialog and a saveFileDialog with a filter (only the .dvbcfg extension):
SaveFileDialog saveFileDialog = new SaveFileDialog();
saveFileDialog.Filter = "DVB Configuration File (*.dvbcfg)|*.dvbcfg";
saveFileDialog.DefaultExt = "dvbcfg";
saveFileDialog.AddExtension = true;
It works properly, but when I type a filename manually, it shows files with any extension without filtering, and opens/saves them (first screenshot: open file, second: save file):
(screenshots omitted)
How to show only files that matches saveFileDialog.Filter?
P.S. I have overwrite function in saveFileDialog.
UPD: I have another option, which is to throw an exception when the user selects the wrong file type, but I have no idea how to get only the file extension from the saveFileDialog.FileName string.
At a certain point, you have to "trust" your users. You can steer them towards good ways of working with your program, but at a certain point, you have to recognise that you've put enough simple barriers in their way to prevent accidental misuse [1] but you're unlikely to be able to create enough barriers (in these dialogs) to prevent malicious misuse.
The problem is that using the wrong file may damage expensive equipment (a DVB-3030 Digital Modulator in this case), even though I use try/catch when reading variables from files (they need to be integers; the try block calls Convert.ToInt32) and check variable ranges with if/else (for example, the frequency range should be 10 MHz - 90 MHz with a 100 Hz step). Since the program will be used by students, they may deliberately try to break it.
And nothing in your current question (or sought answer) would prevent someone from renaming any arbitrary file to have a .dvbcfg extension.
At this point, you "trust" that the user has given you the filename they wish to use. What you need to do next is to validate the contents of the file. If it has a .dvbcfg extension but isn't actually a valid DVB config file, you need to reject it. If it doesn't have a .dvbcfg extension (hey, maybe they're working with an old file system that only allows 8.3 file names :-)) but turns out to have valid content, why be churlish and reject that file?
I would recommend more than just wrapping Convert.ToInt32 calls in try/catch. Go through the file. Ensure it contains exactly what it should and nothing else. Read each parameter value and probably use TryParse on those, because your code now "expects" to receive invalid inputs. Then validate ranges, etc.
[1] Which I'd say you've already got.
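As a rough illustration of both points (getting the extension from a file name, and validating a value with TryParse plus a range check), here is a sketch. The real .dvbcfg layout is not shown in the question, so the helper only validates a single frequency string; the class name, method names and constants are assumptions based on the 10 MHz - 90 MHz, 100 Hz step range mentioned above.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DvbConfigValidation
{
    // Range taken from the question: 10 MHz to 90 MHz in 100 Hz steps.
    const int MinHz = 10_000_000;
    const int MaxHz = 90_000_000;
    const int StepHz = 100;

    // Path.GetExtension returns the extension including the leading dot.
    public static bool HasExpectedExtension(string fileName) =>
        string.Equals(Path.GetExtension(fileName), ".dvbcfg", StringComparison.OrdinalIgnoreCase);

    // Returns true only for a plain-digit value inside the allowed range and on
    // the step grid. When it returns false, frequencyHz must not be used.
    public static bool TryParseFrequency(string text, out int frequencyHz)
    {
        return int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out frequencyHz)
            && frequencyHz >= MinHz
            && frequencyHz <= MaxHz
            && frequencyHz % StepHz == 0;
    }
}
```

The extension check is a convenience for honest users only; the content check is what actually protects the equipment, since any file can be renamed to .dvbcfg.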

C# - Compare Two Text Files

Background
I'm developing a simple windows service which monitors certain directories for file creation events and logs these - long story short, to ascertain if a file was copied from directory A to directory B. If a file is not in directory B after X time, an alert will be raised.
The issue with this is I only have the file to go on for information when working out if it has made its way to directory B - I'd assume two files with the same name are the same, but as there are over 60 directory A's and a single directory B - AND the files in any directory A may accidentally be the same as another (by date or sequence) this is not a safe assumption...
Example
Let's say, for example, I store a log that file "E17999_XXX_2111.txt" was created in directory C:\Test. I would store the filename, file path, file creation date, file length and the BOM for this file.
30 seconds later, I detect that the file "E17999_XXX_2111.txt" was created in directory C:\FinalDestination... now I have the task of determining whether:
a) the file is the same one created in C:\Test, therefore I can update the first log as complete and stop worrying about it.
b) the file is not the same and I somehow missed the previous steps - therefore I can ignore this file because it has found its way to the destination dir.
Research
So, in order to determine if the file created in the destination is exactly the same as the one created in the first instance, I've done a bit of research and found the following options:
a) filename compare
b) length compare
c) a creation-date compare
d) byte-for-byte compare
e) hash compare
Problems
a) As I said above, going by Filename alone is too presumptuous.
b) Again, just because the length of the contents of a file is the same, it doesn't necessarily mean the files are actually the same.
c) The problem with this is that a copied file is technically a new file, therefore the creation date changes. I would want to set the first log as complete regardless of the time elapsed between the file appearing in directory A and directory B.
d) Aside from the fact that this method is extremely slow, it appears there's an issue if the second file has somehow changed encoding - for example between ANSI and ASCII, which would cause a byte mismatch for things like quote characters
I would like not to assume that just because an ASCII ' has changed to an ANSI ', the file is now different, as it is near enough the same.
e) This seems to have the same downfalls as a byte-for-byte compare
EDIT
It appears the actual issue I'm experiencing comes down to the reason for the difference in encoding between directories - I'm not currently able to access the code which deals with this part, so I can't tell why this happens, but I am looking to implement a solution which can compare files regardless of encoding to determine "real" differences (i.e. not those whereby a byte has changed due to encoding)
SOLUTION
I've managed to resolve this now by using the SequenceEqual comparison below after encoding my files to remove any bad data if the initial comparison suggested by @Magnus failed to find a match due to this. Code below:
// FilePath1 and FilePath2 are the two files being compared.
byte[] bytes1 = Encoding.Convert(Encoding.GetEncoding(1252), Encoding.ASCII, Encoding.GetEncoding(1252).GetBytes(File.ReadAllText(FilePath1)));
byte[] bytes2 = Encoding.Convert(Encoding.GetEncoding(1252), Encoding.ASCII, Encoding.GetEncoding(1252).GetBytes(File.ReadAllText(FilePath2)));
if (Encoding.ASCII.GetChars(bytes1).SequenceEqual(Encoding.ASCII.GetChars(bytes2)))
{
//matched!
}
Thanks for the help!
You would then have to compare the string content of the files. The StreamReader (which ReadLines uses) should detect the encoding.
var areEquals = System.IO.File.ReadLines("c:\\file1.txt").SequenceEqual(
System.IO.File.ReadLines("c:\\file2.txt"));
Note that ReadLines will not read the complete file into memory.
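To make the behaviour concrete, here is a small sketch wrapping the same comparison in a helper (the class and method names are invented). One caveat worth stating: as far as I know, the detection done by StreamReader relies on a byte order mark, so it distinguishes UTF-8, UTF-16 and UTF-32 files that carry a BOM, but a BOM-less ANSI (code page 1252) file is simply read as UTF-8. For that case an explicit encoding would have to be passed to File.ReadLines.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class TextFileComparer
{
    // Compares two text files line by line after decoding, so two files with
    // the same text but different BOM-marked encodings count as equal.
    // Lines are streamed lazily; neither file is loaded into memory as a whole.
    public static bool SameLines(string pathA, string pathB) =>
        File.ReadLines(pathA).SequenceEqual(File.ReadLines(pathB));
}
```

For example, a file written with new UTF8Encoding(true) and one written with Encoding.Unicode differ byte for byte, yet SameLines reports them as equal when they hold the same text.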

Finding "empty" portions in a file

EDIT 1:
I am building a torrent application that downloads from different clients simultaneously. Each download represents a portion of my file, and different clients have different portions.
After a download is complete, I need to know which portion I need to fetch next, by finding the "empty" portions in my file.
One way to create a file with a fixed size:
File.WriteAllBytes(@"C:\upload\BigFile.rar", new byte[bigSize]); // bigSize = final file size in bytes
My array of portion states, which represents my file as portions:
BitArray TorrentPartsState = new BitArray(10);
For example:
File size is 100.
TorrentPartsState[0] = true; // means that positions 0 to 9 of my file are already filled; I **don't** need to download them.
TorrentPartsState[1] = false; // means that positions 10 to 19 of my file are still empty; I **need** to download them.
I am searching for an efficient way to save what the BitArray contains, so that it survives the computer or application shutting down. One way I thought of is an XML file that is updated each time a portion is complete.
I don't think that is a smart and efficient solution. Any ideas for another one?
It sounds like you know the following when you start a transfer:
The size of the final file.
The (maximum) number of streams you intend to use for the file.
Create the output file and allocate the required space.
Create a second "control" file with a related filename, e.g. add your own extension. In that file maintain an array of stream status structures corresponding to the network streams. Each status consists of the starting offset and number of bytes transferred. Periodically flush the stream buffers and then update the control file to reflect the progress made and committed.
Variations on the theme:
The control file can define segments to be transferred, e.g. 16MB chunks, and treated as a work queue by threads that look for an incomplete segment and a suitable server from which to retrieve it.
The control file could be a separate fork within the result file. (Who am I kidding?)
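For the simpler case in the question, where the state is one bit per portion, the control file could just hold the packed BitArray. The following is a sketch under that assumption, not the status-structure layout described above; the class name and the file layout (a 4-byte count followed by the packed bits) are made up for the example.

```csharp
using System;
using System.Collections;
using System.IO;

public static class PartsStateStore
{
    // Writes the number of portions followed by the bits packed 8 per byte.
    public static void Save(string controlPath, BitArray parts)
    {
        var packed = new byte[(parts.Length + 7) / 8];
        parts.CopyTo(packed, 0);
        using (var writer = new BinaryWriter(File.Create(controlPath)))
        {
            writer.Write(parts.Length);
            writer.Write(packed);
        }
    }

    // Reads the state back; Length trims the padding bits of the last byte.
    public static BitArray Load(string controlPath)
    {
        using (var reader = new BinaryReader(File.OpenRead(controlPath)))
        {
            int count = reader.ReadInt32();
            byte[] packed = reader.ReadBytes((count + 7) / 8);
            return new BitArray(packed) { Length = count };
        }
    }
}
```

Calling Save after each completed portion keeps the file small (a few bytes per thousand portions). To survive a crash in the middle of a write, the same write-to-temp-then-rename idea used for other files would apply here too.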
You could use a BitArray (in System.Collections).
Then, when you visit an offset in the file, you can set the BitArray at that offset to true.
So for your 10,000 byte file:
BitArray ba = new BitArray(10000);
// Visited offset, mark in the BitArray
ba[4] = true;
Implement a file system (like on a disk) inside your file. Just use something simple; there should be something available in the FOSS arena.

SharpZipLib - progress through fileS during extract

This has to be really easy, and it certainly seems to be a very frequently asked question, but I can't for the life of me find a 'straightforward' answer.
I want to create a ProgressBar that shows a Zip file being extracted by SharpZipLib.
The FastZip and FastZipEvents classes give progress on individual files but not on position within the overall Zip. That is, if the Zip contains 200 files, which file is currently being extracted? I don't care about the progress through individual files (e.g. 20KB through 43KB in Foo.txt).
I think I could fudge a way of doing this by first creating a ZipFile and to access the Count property. And then... using ZipInputStream or FastZip to extract and keep progress count myself but I think that means the Zip is effectively unzipped twice (once entirely into memory) and I don't like that.
Any clean way of doing this?
Regarding your last sentence: "I think that means the Zip is effectively unzipped twice".
Reading the content table of a zip file doesn't cost a lot at all, and doesn't access the contained files. You probably noticed that when you looked at a zip file with a "password" and only needed to enter the password when you tried to extract a file: you can look at the entries/content table just fine.
So I see nothing wrong with the approach of first checking the index/content table, storing the entry count (maybe even with compressed/uncompressed size?) and using the stream based api later.
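A sketch of that count-first approach follows. To keep it runnable without extra packages it uses the built-in System.IO.Compression classes rather than SharpZipLib, so it is a swapped-in illustration of the idea, not SharpZipLib code; the class name and the callback shape are invented.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipProgressExtractor
{
    // Extracts every entry and calls onProgress(done, total) after each one.
    // total comes from the archive's entry table, which is read without
    // decompressing any file data.
    public static void Extract(string zipPath, string targetDir, Action<int, int> onProgress)
    {
        string root = Path.GetFullPath(targetDir).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            int total = archive.Entries.Count;
            int done = 0;
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string destination = Path.GetFullPath(Path.Combine(root, entry.FullName));
                // Refuse entries such as "../x" that would land outside targetDir.
                if (!destination.StartsWith(root, StringComparison.Ordinal))
                    throw new IOException("Entry escapes the target directory: " + entry.FullName);

                if (entry.FullName.EndsWith("/"))
                {
                    Directory.CreateDirectory(destination); // directory entry
                }
                else
                {
                    Directory.CreateDirectory(Path.GetDirectoryName(destination));
                    entry.ExtractToFile(destination, true); // overwrite existing files
                }
                onProgress(++done, total);
            }
        }
    }
}
```

A ProgressBar could then be driven by setting Maximum to total and Value to done inside the callback (marshalled to the UI thread if extraction runs in the background).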
FYI: DotNetZip has ExtractProgress event for this sort of thing. Code:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
    zip.ExtractProgress += MyExtractProgress; // ExtractProgress is an event, so subscribe with +=
    zip.ExtractAll(TargetDirectory);
}
The ExtractProgress handler looks like this:
private void MyExtractProgress(object sender, ExtractProgressEventArgs e)
{
    switch (e.EventType)
    {
        case ZipProgressEventType.Extracting_BeforeExtractEntry:
            ....
        case ZipProgressEventType.Extracting_EntryBytesWritten:
            ...
        case ZipProgressEventType.Extracting_AfterExtractEntry:
            ....
    }
}
You could use it to drive the familiar 2-progressbar UI, with one bar showing progress for the archive, and another bar showing progress for the individual file within the archive.
