How do I start reading a text file from a specific point? - C#

So my question is basically: how do I start reading a file from a specific line, for example from line 14 until line 18?
I'm working on a simple ContactList app and the only thing missing is deleting the information for a specific name. The user can create a new contact which has a name, a number and an address as information. I want the user to also be able to delete the data of that person by typing in their name. Then, the program should read the name and all of the 4 lines under it and remove them from the text file. How could I achieve this?

You can jump to any offset within a file. However, there isn't any way to know where a particular line begins unless you know the length of every line.
If you are writing a contact app, you should not use a regular text file unless:
You pad line lengths so that you can easily calculate the position of each line.
You are loading the entire file into memory.
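If the file is small enough that loading it entirely is fine (a personal contact list almost certainly is), a minimal sketch of reading, say, lines 14 through 18 with lazy enumeration looks like this:

// File.ReadLines enumerates lazily - it still reads (and discards) lines
// 1-13 under the hood, but it never loads the whole file at once.
using System;
using System.IO;
using System.Linq;

foreach (string line in File.ReadLines("contacts.txt").Skip(13).Take(5))
    Console.WriteLine(line); // lines 14 through 18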

You can't. You need to read the first n lines in order to find out which line has which number - unless your records have a fixed length per line (which is not a good idea: there's always someone with a longer name than you could think of).
Likewise, you can't delete a line from the text file. The space on disk does not move by itself. You need an algorithm that implements safe saving and rearranges the data:
foreach line in input_file:
    if line is needed:
        write line to temporary_output_file
    else:
        ignore (don't write = delete)
delete input_file
move temporary_output_file to input_file
Disadvantage: you need about double the disk space while input_file and temporary_output_file both exist.
With safe saving, the NTFS file system driver will give the moved file the same time stamp that it had before deleting the file. Read the Windows Internals 7 book (should be part 2, chapter 11) to understand it in detail.
Depending on how large the contact list is (it's probably fewer than 10M entries), there's no problem with loading the whole database into memory, deleting the record and then writing everything back.
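Here's a minimal sketch of the temporary-file rewrite from the pseudocode above. It assumes each record is the contact's name followed by the detail lines under it (the question mentions 4); adjust RecordLines to the real layout:

// Streams the contact file to a temp file, skipping the record to delete,
// then swaps the temp file into place.
using System.IO;

class ContactFile
{
    const int RecordLines = 5; // name + 4 detail lines (assumption)

    public static void DeleteContact(string path, string name)
    {
        string tempPath = path + ".tmp";
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line == name)
                {
                    // Skip the detail lines belonging to this contact.
                    for (int i = 1; i < RecordLines; i++)
                        reader.ReadLine();
                }
                else
                {
                    writer.WriteLine(line);
                }
            }
        }
        // Replace the original file with the rewritten copy.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}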

Related

Replace the text in a file, but faster

So I made a math problem program that basically reads one number from a text file (the only number in that text file) and replaces it with number+1 if the number is not a solution.
Now the issue is, if I only add text on the next row using
sw.WriteLine(text);
the calculations are really fast, doing 100k+ numbers in a few seconds, but it just keeps adding numbers to the text file without deleting the previous ones.
Alternatively I used
string[] lines = File.ReadAllLines("numbers.txt");
foreach (string line in lines)
{
    lines[0] = Convert.ToString(biginta);
}
File.WriteAllLines("numbers.txt", lines);
but that made my program run considerably slower.
Is there a way I can replace text in a .txt file using the already open FileStream?
I'm new to C#, so my whole program is basically a Frankenstein of code.
I'm using a file to store the next number needed to run because I turn off my PC overnight.
Honestly the quickest solution to this is the following: read the file once, do several (like 100) calculations without saving, and then store the current number back into the file.
Tune the interval so that you store the current state once every 5 seconds or so.
That gives you still a good starting point (at most 5 seconds lost work) but also reduces disk IO to the point where it won't slow down the calculation any more.
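A sketch of that idea using an iteration counter (a stopwatch works just as well for a time-based interval); BigInteger, the file name, and IsSolution here are stand-ins for the asker's actual code:

// Keep the counter in memory; only touch the disk every saveInterval steps.
using System;
using System.IO;
using System.Numerics;

class Program
{
    static void Main()
    {
        const string path = "numbers.txt";
        const int saveInterval = 100_000; // tune: bigger = faster, more rework after a crash
        BigInteger n = BigInteger.Parse(File.ReadAllText(path).Trim());
        int sinceSave = 0;
        while (!IsSolution(n))
        {
            n += 1;
            if (++sinceSave >= saveInterval)
            {
                File.WriteAllText(path, n.ToString()); // overwrite, don't append
                sinceSave = 0;
            }
        }
        File.WriteAllText(path, n.ToString());
        Console.WriteLine("Solution: " + n);
    }

    // Placeholder so the sketch terminates; substitute the real check.
    static bool IsSolution(BigInteger n) => n % 1000003 == 0;
}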

RIFF ICMT tag size doesn't seem to match data

I am trying to read the data stored in an ICMT tag on a WAV file generated by a noise monitoring device.
The RIFF parsing code all seems to work fine, except for the fact that the ICMT tag seems to have data after the declared size. As luck would have it, it's the timestamp, which is the one absolutely critical piece of info for my application.
SYN is hex 16, which gives a size of 22, which is up to and including the NUL before the timestamp. The monitor documentation is no help; it says that the tag includes the time, but their example also has the same issue.
It is the last tag in the enclosing list, and the size of the list does include it - does that mean it doesn't need a chunk ID? I'm struggling to find decent RIFF docs, but I can't find anything that suggests that's the case; also I can't see how it'd be possible to determine that it was the last chunk and so know to read it with no chunk ID.
Alternatively, the ICMT comment chunk is the last thing in the file - is that a special case? Can I just get the time by reading everything from the end of the declared length ICMT to the end of the file and assume that will always work?
The current parser behaviour is that it's being read after the channel / dB information as a chunk ID + size, and then complaining that there was not enough data left in the file to fulfil the request.
No, it would still need its own ID. No, being the last thing in the file is no special case either. What you're showing here is malformed.
Your current parser errors correctly, as the next thing to be expected is again a 4-byte ID followed by 4 bytes for the length. The potential ID _10: is unknown and would be skipped, but interpreting 51:4 as a DWORD for the length of course asks for trouble.
The device is the culprit. Do you have other INFO fields which use NULL bytes? If not, then I assume the device is naive enough to consider a NULL the end of a string, despite itself producing strings with multiple NULLs.
Having encountered countless files that don't stick to standards, I can only say your parser is too naive as well: it knows how long the encapsulating list is and thus could easily detect field lengths that no longer fit. It could ignore garbage like that, or, in your case, offer the very specific option "add to last field".
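A sketch of such a defensive sub-chunk loop (the names are illustrative, not from a real RIFF library); the key line clamps the declared size to the bytes actually left in the enclosing LIST:

using System;
using System.IO;
using System.Text;

static void ReadInfoChunks(BinaryReader r, long listEnd)
{
    // Stop as soon as there isn't room for another ID + size header.
    while (r.BaseStream.Position + 8 <= listEnd)
    {
        string id = Encoding.ASCII.GetString(r.ReadBytes(4));
        uint declaredSize = r.ReadUInt32();
        long remaining = listEnd - r.BaseStream.Position;
        // Never trust a declared size that runs past the enclosing LIST.
        long size = Math.Min(declaredSize, remaining);
        byte[] data = r.ReadBytes((int)size);
        Console.WriteLine($"{id}: read {size} of declared {declaredSize} bytes");
        // RIFF chunks are word-aligned: skip the pad byte after odd sizes.
        if ((size & 1) == 1 && r.BaseStream.Position < listEnd)
            r.ReadByte();
    }
}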

Set limit of text file

I am writing lines to a text file. Is there a way to limit the maximum number of lines in the text file, so that I am not allowed to write past that limit?
Or,
if I continue to write after the max line limit, the oldest lines are deleted to accommodate the newly added ones.
There is ... but you shouldn't be hitting it ... And if you ARE ... well, maybe a text file isn't what you're looking for.
Size-wise, a file has different limitations depending on your file system ... NTFS (almost 16 TB), FAT (FAT32 is almost 4 GB), Unix file systems have their own limitations, and so on ...
here you have answers about the size: one answer, and another
Like they suggest, your limit will be the size of the file.
As for your comment:
You can set the limit to whatever you wish.
What you do then is up to you ... if you decide to overwrite the file, it'll delete everything and start afresh; if you decide to append, it'll append to the end.
I would suggest creating a queue of 100 strings, and as you push new ones, dropping the oldest one from the queue. Then you can just have that class save the log whenever, wherever and however you want.
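A rough sketch of that rolling buffer (the class and method names are made up for illustration):

// Keeps only the newest `limit` lines; the whole (small) file is rewritten
// whenever you save.
using System.Collections.Generic;
using System.IO;

class RollingLog
{
    private readonly Queue<string> _lines = new Queue<string>();
    private readonly int _limit;

    public RollingLog(int limit) { _limit = limit; }

    public void Add(string line)
    {
        _lines.Enqueue(line);
        while (_lines.Count > _limit)
            _lines.Dequeue(); // drop the oldest entry
    }

    public void Save(string path) => File.WriteAllLines(path, _lines);
}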
Create your own method like this:
// requires: using System; using System.IO; using System.Linq;
public void WriteLines(string filePath, string[] lines, int limit)
{
    // Grab whatever is already in the file so old lines can be kept up to the limit.
    string[] buffer = File.Exists(filePath)
        ? File.ReadAllLines(filePath)
        : new string[0];
    // Write the new lines first, replacing the file's contents.
    File.WriteAllLines(filePath, lines);
    // Then append as many of the previous lines as still fit under the limit.
    int range = Math.Max(0, limit - lines.Length);
    File.AppendAllLines(filePath, buffer.Take(range));
}
The answer to the first question is very simple:
Know your storage limit.
Know your current file size.
If the new line's length plus current file size is more than the storage limit, don't append it.
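As a sketch, with sizes in bytes (the names are illustrative):

using System;
using System.IO;
using System.Text;

static bool TryAppendLine(string path, string line, long storageLimit)
{
    long current = File.Exists(path) ? new FileInfo(path).Length : 0;
    long added = Encoding.UTF8.GetByteCount(line + Environment.NewLine);
    if (current + added > storageLimit)
        return false; // refuse the write: the limit would be exceeded
    File.AppendAllText(path, line + Environment.NewLine);
    return true;
}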
Now, the second is kind of tricky. As pointed out by several participants in this thread, line-by-line threshold manipulation can be very costly.
Let's do some napkin simulations, and assume you're inserting 1024 bytes (1 KB) at each append, with a storage limit of 1 GB. Once you insert the last line (no. 1,048,576), you decide you need to remove the first one. There are a few ways to accomplish this, but the majority of them involve loading the whole collection minus the initial line elsewhere (memory, disk, you name it) and appending the new one. Not exactly the most practical approach - you'd be manipulating a stack a million times larger than the content you want to add, just for the sake of adding it.
Solution 1
Cursor buffer
On our example you have 1048576 possible entries (1KB records on 1GB file).
Start filling it up; save the current position (cursor) elsewhere.
Once you reach the limit, your cursor resets; you overwrite position 0, then 1, and so forth.
Advantages: Very low disk cost.
Disadvantages: You'll need to keep track of your current cursor somewhere.
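In code, the cursor buffer might look like this (the record size and how the cursor is persisted are assumptions for the sake of the sketch):

// Writes one fixed-size record into the slot at `cursor` and returns the
// next cursor value; the caller persists that value somewhere (e.g. a side file).
using System;
using System.IO;
using System.Text;

class RingFile
{
    const int RecordSize = 1024;        // fixed record length (assumption)
    const int MaxRecords = 1024 * 1024; // 1 GB / 1 KB

    public static int Write(string path, int cursor, string text)
    {
        byte[] record = new byte[RecordSize]; // zero-padded slot
        byte[] data = Encoding.UTF8.GetBytes(text);
        Array.Copy(data, record, Math.Min(data.Length, RecordSize));
        using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Seek((long)cursor * RecordSize, SeekOrigin.Begin); // jump to slot
            fs.Write(record, 0, RecordSize);
        }
        return (cursor + 1) % MaxRecords; // wrap: overwrite the oldest next time
    }
}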
Solution 2
Text blocks
Assume 1GB storage, and max 1MB files for this example.
Start filling up File #0.
Once it reaches 1MB, close it. Open File #1. Rinse and repeat.
Once you fill up file #1023 (thus reaching the 1GB max), delete the oldest file (#0). Create file #1024. Continue your logging.
Advantages: Low manipulation cost - you only run one delete operation.
Disadvantages: You don't delete only one entry - you delete a whole block.
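A sketch of the block rotation (the naming scheme is made up; any scheme that sorts chronologically works, and the directory is assumed to exist):

using System;
using System.IO;
using System.Linq;

class BlockLog
{
    const long BlockSize = 1024 * 1024; // 1 MB per block file
    const int MaxBlocks = 1024;         // 1 GB total

    public static void Append(string dir, string line)
    {
        // Block files are named log_00000000.txt, log_00000001.txt, ...
        var files = Directory.GetFiles(dir, "log_*.txt").OrderBy(f => f).ToList();
        string current = files.LastOrDefault();
        if (current == null || new FileInfo(current).Length >= BlockSize)
        {
            if (files.Count >= MaxBlocks)
                File.Delete(files[0]); // delete the oldest block
            int next = current == null
                ? 0
                : int.Parse(Path.GetFileNameWithoutExtension(current).Substring(4)) + 1;
            current = Path.Combine(dir, $"log_{next:D8}.txt");
        }
        File.AppendAllText(current, line + Environment.NewLine);
    }
}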

C# code to perform Binary search in a very big text file

Is there a library that I can use to perform a binary search in a very big text file (it can be 10 GB)?
The file is a sort of a log file - every row starts with a date and time. Therefore rows are ordered.
I started to write the pseudo-code on how to do it, but I gave up since it may seem condescending. You probably know how to write a binary search; it's really not complicated.
You won't find it in a library, for two reasons:
It's not really "binary search" - the line sizes are different, so you need to adapt the algorithm (e.g. look for the middle of the file, then look for the next "newline" and consider that to be the "middle").
Your datetime log format is most likely non-standard (ok, it may look "standard", but think a bit.... you probably use '[]' or something to separate the date from the log message, something like [10/02/2001 10:35:02] My message ).
In summary - I think your need is too specific, and too simple to implement in custom code, for anyone to bother writing a library :)
As the line lengths are not guaranteed to be the same length, you're going to need some form of recognisable line delimiter e.g. carriage return or line feed.
The binary search pattern can then be pretty much your traditional algorithm. Seek to the 'middle' of the file (by length), seek backwards (byte by byte) to the start of the line you happen to land in, as identified by the line delimiter sequence, read that record and make your comparison. Depending on the comparison, seek halfway up or down (in bytes) and repeat.
When you identify the start index of a record, check whether it was the same as the last seek. You may find that, as you dial in on your target record, moving halfway won't get you to a different record. e.g. you have adjacent records of 100 bytes and 50 bytes respectively, so jumping in at 75 bytes always takes you back to the start of the first record. If that happens, read on to the next record before making your comparison.
You should find that you will reach your target pretty quickly.
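A sketch of that algorithm, assuming '\n' line endings and a single-byte encoding; compareToTarget is a placeholder that parses a line's leading timestamp and returns <0, 0, or >0 against the one you're looking for:

using System;
using System.IO;
using System.Text;

static long FindLineOffset(FileStream fs, Func<string, int> compareToTarget)
{
    long lo = 0, hi = fs.Length;
    while (lo < hi)
    {
        long mid = lo + (hi - lo) / 2;
        long start = ScanBackToLineStart(fs, mid);
        string line = ReadLineAt(fs, start);
        if (compareToTarget(line) < 0)
            lo = start + line.Length + 1; // skip past this whole line
        else
            hi = start;                   // target is at or before this line
    }
    return lo; // offset of the first line >= target
}

static long ScanBackToLineStart(FileStream fs, long pos)
{
    // Walk backwards byte by byte until just after the previous '\n'.
    while (pos > 0)
    {
        fs.Seek(pos - 1, SeekOrigin.Begin);
        if (fs.ReadByte() == '\n') break;
        pos--;
    }
    return pos;
}

static string ReadLineAt(FileStream fs, long start)
{
    fs.Seek(start, SeekOrigin.Begin);
    using (var r = new StreamReader(fs, Encoding.ASCII, false, 4096, leaveOpen: true))
        return r.ReadLine();
}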
You would need to be able to stream the file, but you would also need random access. I'm not sure how you accomplish this short of a guarantee that each line of the file contains the same number of bytes. If you had that, you could get a Stream of the object and use the Seek method to move around in the file, and from there you could conduct your binary search by reading in the number of bytes that constitute a line. But again, this is only valid if the lines are the same number of bytes. Otherwise, you would jump in and out of the middle of lines.
Something like
// Valid only when every line is exactly lineLength bytes; the cast to long
// prevents int overflow when seeking within multi-gigabyte files.
byte[] buffer = new byte[lineLength];
stream.Seek((long)lineLength * searchPosition, SeekOrigin.Begin);
stream.Read(buffer, 0, lineLength);
string line = Encoding.Default.GetString(buffer);
This shouldn't be too bad under the constraint that you hold an Int64 in memory for every line-feed in the file. That really depends upon how long the line of text is on average; given 1000 bytes per line and 8 bytes per offset, you'd be looking at around (10,000,000,000 / 1000 * 8) = 80 MB. Very big, but possible.
So try this:
Scan the file and store the ordinal offset of each line-feed in a List
Binary search the List with a custom comparer that scans to the file offset and reads the data.
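A sketch of both steps. The comparer ignores the dummy search item and instead compares the timestamp prefix of the line at each offset to a target string; it assumes the timestamp sits at the start of each line in a lexically sortable format, and the file name is made up:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class OffsetComparer : IComparer<long>
{
    private readonly FileStream _fs;
    private readonly string _target; // e.g. "2001-02-10 10:35"

    public OffsetComparer(FileStream fs, string target)
    {
        _fs = fs;
        _target = target;
    }

    public int Compare(long offset, long ignored)
    {
        // Read just enough of the line at this offset to compare timestamps.
        _fs.Seek(offset, SeekOrigin.Begin);
        var buf = new byte[_target.Length];
        int read = _fs.Read(buf, 0, buf.Length);
        return string.CompareOrdinal(Encoding.ASCII.GetString(buf, 0, read), _target);
    }
}

class Demo
{
    static void Main()
    {
        // Step 1: one pass to record the offset at which each line starts.
        var offsets = new List<long> { 0 };
        using (var fs = File.OpenRead("big.log"))
        {
            int b;
            while ((b = fs.ReadByte()) != -1)
                if (b == '\n') offsets.Add(fs.Position);

            // Step 2: binary search; the 0 item is a dummy - the comparer
            // does the real work against the file contents.
            int i = offsets.BinarySearch(0, new OffsetComparer(fs, "2001-02-10 10:35"));
            if (i < 0) i = ~i; // ~i = index of the first line >= target
            Console.WriteLine("match near offset " + offsets[Math.Min(i, offsets.Count - 1)]);
        }
    }
}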
If your file is static (or changes rarely) and you have to run "enough" queries against it, I believe the best approach will be creating "index" file:
Scan the initial file and take the datetime part of each line plus its position in the original file (this is why it has to be pretty static), and encode them somehow - for example: Unix time (full 10 digits) + nanoseconds (zero-filled 4 digits) + line position (zero-filled 10 digits). This way you will have a file with consistent "lines".
Perform a binary search on that file (you may need to be a bit creative in order to achieve range search) and get the relevant location(s) in the original file.
read directly from the original file starting from the given location / read the given range
You've got range search with O(log(n)) run-time :) (and you've created primitive DB functionality)
Needless to say, if the data file is updated "too" frequently, or you don't run "enough" queries against the index file, you may end up spending more time on creating the index file than you save on queries.
Btw, working with this index file doesn't require the data file to be sorted. As log files tend to be append-only, and sorted, you may speed up the whole thing by simply creating an index file that only holds the locations of the EOL marks (zero-filled 10 digits) in the data file - this way you can perform the binary search directly on the data file (using the index file to determine the seek positions in the original file), and if lines are appended to the log file you can simply add (append) their EOL positions to the index file.
The List<T> class has a BinarySearch method.
http://msdn.microsoft.com/en-us/library/w4e7fxsh%28VS.80%29.aspx

Find appended text from txt file

I want to write code in such a way that, if there is a text file placed at a specified path and one of the users edits the file, enters new text and saves it, I can then get the text that was appended last.
Here I have the file size from both before and after the text was appended.
My text file size is 1204 KB; from that I need to take just the last 200 KB of text. Is that possible?
This can only be done if you're monitoring the file size in real-time, since files do not maintain their own histories.
If watching the files as they are modified is a possibility, you could perhaps use a FileSystemWatcher and calculate the increase in file size upon any modification. You could then read the bytes appended since the file last changed, which would be very straightforward.
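Reading the appended bytes is then a seek plus a read to the end - a sketch, assuming you stored the previous length somewhere:

using System.IO;
using System.Text;

static string ReadAppended(string path, long previousLength)
{
    using (var fs = File.OpenRead(path))
    {
        fs.Seek(previousLength, SeekOrigin.Begin); // skip what was already there
        using (var reader = new StreamReader(fs, Encoding.UTF8))
            return reader.ReadToEnd();
    }
}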
Do you know how big the file was before the user appended the text? If not, there's no way of telling... files don't maintain a revision history (in most file systems, anyway).
You can keep track of the file pointer. E.g., if you are using the C language, you can go to the end of the file using fseek(fp, 0, SEEK_END) and then use ftell(fp), which gives you the current position of the file pointer. After the user edits and saves the file, when you rerun the code you can compare the new position with the original position. If the new position is greater than the original position, seek back to the original position and read the extra bytes from there.
As #Jon Skeet alludes to in his answer, the only way to tell specifically what text was "appended" is by knowing how large the file was before it was changed. The rest of the characters are thus what was "appended".
Note that I quote "appended" above since I get two conflicting meanings from your question: edited and appended.
If the user only appends text, which is taken to mean "add more text only at the end", then the previous-size approach should in theory work.
However, if the user freely edits the text, by adding text in random spots, and perhaps even removing or changing existing text, then you need a whole 'nother approach to this.
If it's the latter, I might have something you could use, a binary patching implementation that can also be used to figure out from an older copy of the same file what was changed in a newer copy. It isn't easy to use, and might not give you exactly what you want, but as I said, it's hard to tell exactly what your question is.
If your program is running the entire time, you could grab a copy of the file in memory. Then in a separate thread periodically read the new file and compare the two.
If you want your program to be notified when the file is changed, use FileSystemWatcher. However, it will only notify you when the file is changed while your program is running, and it will not provide you with the appended text - you will only get information about which file was changed.
FileSystemWatcher watcher = new FileSystemWatcher(Environment.CurrentDirectory, "test.txt");
while (true)
{
    // Blocks until the next change to test.txt, then reports its name.
    var changedResult = watcher.WaitForChanged(WatcherChangeTypes.Changed);
    Console.WriteLine(changedResult.Name);
}
Or:
FileSystemWatcher watcher = new FileSystemWatcher(Environment.CurrentDirectory, "test.txt");
watcher.Changed += watcher_Changed;
watcher.EnableRaisingEvents = true; // without this, the Changed event never fires

static void watcher_Changed(object sender, FileSystemEventArgs e)
{
    Console.WriteLine(e.FullPath);
    Console.WriteLine(e.ChangeType);
}
IMO the best solution is to write a small app which has to be used to change the file in question. That application can then insert additional info into the file, allowing you to keep the entire revision history.
