Does NTFS support checksums per file [duplicate] - c#

This question already has answers here:
There is in Windows file systems a pre computed hash for each file?
(3 answers)
Closed 7 years ago.
Since I prefer writing my own tools over using software already on the market when teaching myself new techniques, I'm developing a tool that looks for duplicate files based on their hashes.
Reading the file entries from a path is not the problem, but hashing the files takes a considerable amount of time.
Does NTFS natively support a per-file checksum that I could use?
Given my lack of knowledge of NTFS internals, I don't know which search terms to use; ntfs+checksum+file is largely useless.

No, there are no per-file hashes in NTFS. File writes would become very slow if every change to, say, a 10 MB file required recalculating a hash.
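Since NTFS keeps no per-file hash, the tool has to compute one itself. Below is a minimal sketch that streams a file through SHA-256 so large files are not read into memory all at once; the FileHasher class and its method name are illustrative choices, not anything built into .NET.

// Illustrative sketch: NTFS stores no per-file hash, so compute one yourself.
using System;
using System.IO;
using System.Security.Cryptography;

static class FileHasher
{
    // Streams the file through SHA-256 so even large files are not loaded into memory at once.
    public static string ComputeHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

To keep the cost down, you would normally hash a file only once another file of the same size turns up, as discussed in the last question below.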


C# write data into CSV file [duplicate]

This question already has answers here:
Should log file streams be opened/closed on each write or kept open during a desktop application's lifetime?
(13 answers)
Closed 6 years ago.
I need to write data into a CSV file every 600 ms from a C# application. The question: is it better to open and close the file on each write, or to keep it open until all the writes are done? Note: I will change the file name each day and every 60,000 records.
Thanks a lot for your opinions.
CSV files are really easy to write to. If you don't know how to write to a file, dotnetperls is your friend. You can simply use a StreamWriter and call Write(): write a value, then a comma, and so on. That's it! If the file is going to be edited by the user while the application is running, then don't keep it open. Otherwise, keeping it open ensures nothing unexpected happens.
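A minimal sketch of the keep-it-open approach, assuming one record is a timestamp plus a single value (the CsvLogger class, its field layout, and the flush-per-write policy are illustrative choices, not part of the question):

// Illustrative sketch: one writer kept open for the whole day's file, flushed
// after each record so little data is lost if the application stops unexpectedly.
using System;
using System.IO;

class CsvLogger : IDisposable
{
    private readonly StreamWriter _writer;

    public CsvLogger(string path)
    {
        // Append so restarting the application continues the same daily file.
        _writer = new StreamWriter(path, append: true) { AutoFlush = true };
    }

    public void WriteRecord(DateTime timestamp, double value)
    {
        // One CSV line per record; quote or escape fields that may contain commas.
        _writer.WriteLine($"{timestamp:O},{value}");
    }

    public void Dispose() => _writer.Dispose();
}

When the day or the 60,000-record limit rolls over, dispose the logger and create a new one with the next file name.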

Detect when user writes/deletes file in specific folder [duplicate]

This question already has answers here:
Notification when a file changes?
(3 answers)
Closed 8 years ago.
I have inherited an application that, among many other things, has to watch whether the user writes or deletes a text file in a specific folder.
Currently, the application uses a timer and polls every 5 seconds. I find this inefficient and wish to improve this part of the code.
My question is about the existence of a .NET feature that monitors changes in a directory. Is there something I can use to detect when a file is written or deleted in a specified folder?
Thank you.
Yes, you have the FileSystemWatcher class. It does exactly what you're looking for.
Yes, there is. I would suggest you take a look at the FileSystemWatcher class:
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher%28v=vs.110%29.aspx
It's quite easy to set up, and it monitors for Win32 events, so it is relatively inexpensive to use.
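A minimal sketch of wiring up the watcher; the folder path and the *.txt filter are placeholders:

// Illustrative sketch: watch a folder for text files being created, changed, or deleted.
using System;
using System.IO;

class Program
{
    static void Main()
    {
        using (var watcher = new FileSystemWatcher(@"C:\watched\folder", "*.txt"))
        {
            watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

            watcher.Created += (s, e) => Console.WriteLine($"Created: {e.FullPath}");
            watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
            watcher.Deleted += (s, e) => Console.WriteLine($"Deleted: {e.FullPath}");

            watcher.EnableRaisingEvents = true;   // start raising events
            Console.WriteLine("Watching... press Enter to stop.");
            Console.ReadLine();
        }
    }
}

Note that some programs save a file in several steps, so one logical change can raise multiple events; real code usually de-duplicates or debounces them.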

Time based serial key generation for PC Software [duplicate]

This question already has answers here:
How can I create a product key for my C# application?
(14 answers)
Closed 9 years ago.
How can I generate a serial key for a C# desktop application (Windows application)?
E.g. the software expires after a month (trial version).
If the user changes the machine time, how could it be possible to validate the software for the specified period?
There are many ways you can generate serial keys for your application in C#. You will most likely have to make some sort of trade-off between simplicity (i.e. the length of the key, readability, etc.) and the security of a particular system.
I would recommend Software Protector (http://softwareprotector.clizware.net/) and SKGL (https://skgl.codeplex.com/). Software Protector gives you a user interface where you can generate your keys, and the SKGL API allows you to validate them inside your own application. If you like, you can also include the source code of the SKGL API (currently available in C# and VB.NET). You can set a time limit from 0 to 999, 8 custom features, and machine locking.
Regarding the time-changing issue, the only way I see is to look up the local time (for that time zone) online using time.windows.com and check whether it is equal to the current time on the PC. Please check this article: https://skgl.codeplex.com/discussions/472444
Please note that I am developing both the SKGL API and Software Protector, which means that my answer might be slightly biased!
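For the clock-tampering part specifically, one common mitigation, independent of SKGL, is to persist the last timestamp the application saw and refuse to run if the clock ever appears to move backwards. A rough sketch, with the state-file location and the plain-text storage as hypothetical placeholders:

// Illustrative sketch: detect a clock that has been set back during a trial period.
// The plain-text state file is easy to defeat; it only raises the bar against casual tampering.
using System;
using System.Globalization;
using System.IO;

static class TrialGuard
{
    // Hypothetical storage location for the last timestamp the application saw.
    private static readonly string StatePath = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
        "MyApp", "lastrun.txt");

    public static bool IsTrialValid(DateTime expiryUtc)
    {
        DateTime nowUtc = DateTime.UtcNow;

        if (File.Exists(StatePath))
        {
            DateTime lastSeen = DateTime.Parse(
                File.ReadAllText(StatePath),
                CultureInfo.InvariantCulture,
                DateTimeStyles.RoundtripKind);

            if (nowUtc < lastSeen)
                return false;   // the clock appears to have been set back
        }

        Directory.CreateDirectory(Path.GetDirectoryName(StatePath));
        File.WriteAllText(StatePath, nowUtc.ToString("O"));   // remember the newest time seen

        return nowUtc <= expiryUtc;
    }
}

Combining this with the online time check suggested above makes it harder to defeat.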

How much is the filename size limit [duplicate]

This question already has answers here:
Maximum filename length in NTFS (Windows XP and Windows Vista)?
(15 answers)
Closed 9 years ago.
I want to rename image files uploaded to my website and give them a bit of a description. I'm using ASP.NET, C#, and of course Windows hosting. The file names will contain Unicode characters. What is the filename length limit under these conditions?
Individual components of a filename (i.e. each subdirectory along the path, and the final filename) are limited to 255 characters, and the total path length is limited to approximately 32,000 characters.
Source: MSDN documentation.
However, the bigger issue will be the browser, since the full URL is limited to a certain number of characters and the limit differs per browser. Some posts here suggest you should limit URLs to around 2000 characters.
To read more about browser limits, I suggest you do some further reading, but please note, for future-proofing, that the comments I've made here and the posts I've cited will become outdated. You need to do your own research at the time of reading this!
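Given the 255-character component limit, it is usually enough to trim the generated description before appending the extension. A rough sketch (the helper name and the simple cut-off strategy are my own; C# string length counts UTF-16 units, which matches how NTFS measures the limit):

// Illustrative sketch: keep a generated file name within NTFS's 255-character component limit.
using System.IO;

static class FileNameHelper
{
    public static string Truncate(string baseName, string extension, int maxLength = 255)
    {
        // Strip characters Windows does not allow in file names.
        foreach (char c in Path.GetInvalidFileNameChars())
            baseName = baseName.Replace(c.ToString(), "");

        int room = maxLength - extension.Length;
        if (baseName.Length > room)
            baseName = baseName.Substring(0, room);

        return baseName + extension;
    }
}

For example, Truncate(description, ".jpg") keeps the description plus extension safely under the component limit.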

C#, Fastest (Best?) Method of Identifying Duplicate Files in an Array of Directories [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to recurse through several directories and find duplicate files among the n directories.
My knee-jerk idea is to have a global hashtable or some other data structure to hold each file I find, then check each subsequent file to determine whether it's in the "master" list of files. I don't think this would be very efficient, and "there's got to be a better way!" keeps ringing in my brain.
Any advice on a better way to handle this situation would be appreciated.
You could avoid hashing by first comparing file sizes. If you never find files with the same size, you don't have to hash them. You only hash a file once you find another file with the same size; then you hash them both.
That should be significantly faster than blindly hashing every single file, although it'd be more complicated to implement the two-tiered check.
I'd suggest keeping multiple in-memory indexes of files.
Create one that indexes all files by file length:
Dictionary<long, List<FileInfo>> IndexBySize;
When you're processing a new file Fu, it's a quick lookup to find all other files that are the same size.
Create another that indexes all files by modification timestamp:
Dictionary<DateTime, List<FileInfo>> IndexByModification;
Given file Fu, you can find all files modified at the same time.
Repeat for each significant file characteristic. You can then use the Intersect() extension method to compare multiple criteria efficiently.
For example:
var matchingFiles = IndexBySize[fu.Length].Intersect(IndexByModification[fu.LastWriteTime]);
This would allow you to avoid the byte-by-byte scan until you need to. Then, for files that have been hashed, create another index:
Dictionary<string, List<FileInfo>> IndexByHash;
You might want to calculate multiple hashes at the same time to reduce collisions.
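A compact sketch combining the ideas above: group by file length first, then hash only the groups that contain more than one file. LINQ grouping is used here instead of explicit dictionaries, and the DuplicateFinder and ComputeHash names are illustrative.

// Illustrative sketch: group candidate files by length first, then hash only
// the groups that contain more than one file.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class DuplicateFinder
{
    public static IEnumerable<IGrouping<string, FileInfo>> FindDuplicates(IEnumerable<string> directories)
    {
        var files = directories.SelectMany(
            d => new DirectoryInfo(d).EnumerateFiles("*", SearchOption.AllDirectories));

        return files
            .GroupBy(f => f.Length)        // cheap first pass: only same-size files can match
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .GroupBy(ComputeHash)          // expensive pass, run only where needed
            .Where(g => g.Count() > 1);    // each remaining group is a set of duplicates
    }

    private static string ComputeHash(FileInfo file)
    {
        using (var sha = SHA256.Create())
        using (var stream = file.OpenRead())
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
    }
}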
Your approach sounds sane to me. Unless you have a very good reason to assume it will not meet your performance requirements, I'd simply implement it this way and optimize later if necessary. Remember that "premature optimization is the root of all evil".
The best practice, as John Kugelman said, is to first compare the sizes of two files; if they have different sizes, it's obvious that they are not duplicates.
If you find two files with the same size, then for better performance you can compare the first 500 KB of the two files; if the first 500 KB are the same, you can compare the rest of the bytes. This way you don't have to read all the bytes of a (for example) 500 MB file to compute its hash, so you save time and boost performance.
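A sketch of that early-exit comparison; the helper name and buffer size are arbitrary choices, and the size check comes first as suggested above:

// Illustrative sketch: compare two files byte by byte, stopping at the first
// difference, so most non-duplicates are rejected after reading only a prefix.
using System.IO;

static class FileComparer
{
    public static bool AreEqual(string pathA, string pathB)
    {
        var infoA = new FileInfo(pathA);
        var infoB = new FileInfo(pathB);
        if (infoA.Length != infoB.Length)
            return false;                      // different sizes cannot be duplicates

        const int bufferSize = 512 * 1024;     // roughly the 500 KB prefix mentioned above
        using (var a = new BufferedStream(infoA.OpenRead(), bufferSize))
        using (var b = new BufferedStream(infoB.OpenRead(), bufferSize))
        {
            int byteA;
            do
            {
                byteA = a.ReadByte();
                if (byteA != b.ReadByte())
                    return false;              // first mismatch ends the comparison
            } while (byteA != -1);
        }
        return true;
    }
}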
For a byte comparison where you're expecting many duplicates, you're likely best off with the method you're already looking at.
If you're really concerned about efficiency and know that duplicates will always have the same filename, then you could start by comparing filenames alone and only hash bytes when you find a duplicate name. That way you'd save the time of hashing files that have no duplicate in the tree.
