File duplicate search on save - c#

I have a C# ASP.NET application with a folder named xyz. Files such as jpg, doc, etc. are stored in this folder, each under its own name.
Sometimes I add a file that already exists in this folder but was saved under a different name.
How can I find such files, which have different names but are actually the same?

Your question is very difficult to understand, but I think you're asking how to identify duplicate files: different files that have the same contents.
One way to do that is to hash the contents of each file (using a hash function such as SHA-1) and store the results in a Dictionary, using the hash as the key, and a List of filenames as the value. If two (or more) files have the same contents, they'll have the same hash value, so they'll all be filed under the same key in the dictionary. After you've hashed all the files and put the results into the dictionary, you can go through its values and check whether any of the lists contain more than one item.
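The dictionary approach described above could be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the `DuplicateFinder` class and `FindDuplicates` method are hypothetical names, and it assumes every file in the folder can be opened for reading.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Hypothetical helper: returns groups of file paths whose contents
    // produce the same SHA-1 hash.
    public static List<List<string>> FindDuplicates(string folderPath)
    {
        var filesByHash = new Dictionary<string, List<string>>();
        using (SHA1 sha1 = SHA1.Create())
        {
            foreach (string path in Directory.EnumerateFiles(folderPath))
            {
                string hash;
                using (FileStream stream = File.OpenRead(path))
                {
                    hash = BitConverter.ToString(sha1.ComputeHash(stream));
                }

                // File the path under its hash, creating the list on first use.
                List<string> names;
                if (!filesByHash.TryGetValue(hash, out names))
                {
                    names = new List<string>();
                    filesByHash[hash] = names;
                }
                names.Add(path);
            }
        }
        // Only lists with more than one entry represent duplicates.
        return filesByHash.Values.Where(list => list.Count > 1).ToList();
    }
}
```

Calling `DuplicateFinder.FindDuplicates(Server.MapPath("~/xyz"))` would return one list per set of identical files; an empty result means no duplicates were found.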

void SaveFile(string fileName)
{
    string folderPath = Server.MapPath("~/xyz");
    DirectoryInfo dirInfo = new DirectoryInfo(folderPath);
    FileInfo fileInfo = new FileInfo(fileName);
    // comparison algorithm based on size and creation date
    bool existsByMetadata = (from fi in dirInfo.EnumerateFiles()
                             where fi.Length == fileInfo.Length &&
                                   fi.CreationTimeUtc == fileInfo.CreationTimeUtc
                             select fi).Any();
    // comparison algorithm based on hash
    string fileHash = ComputeHash(fileInfo.FullName);
    bool existsByHash = (from fi in dirInfo.EnumerateFiles()
                         where String.Equals(
                             ComputeHash(fi.FullName),
                             fileHash,
                             StringComparison.Ordinal)
                         select fi).Any();
}
Here is a sample showing how to compute the MD5 hash of a file:
string ComputeHash(string fileName)
{
    byte[] bytes;
    using (Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (MD5 md5 = MD5.Create())
    {
        bytes = md5.ComputeHash(stream);
    }
    // two hex characters per hash byte
    StringBuilder sb = new StringBuilder(bytes.Length * 2);
    for (int i = 0; i < bytes.Length; i++)
    {
        sb.Append(bytes[i].ToString("x2"));
    }
    return sb.ToString();
}

If I understand the question correctly, you want to know whether two files are the same file even though they have different filenames.
I suppose you can read each file byte by byte and compare:
public static bool AreEqual(string f1, string f2)
{
    var fi1 = new FileInfo(f1);
    var fi2 = new FileInfo(f2);
    // first check that they are the same size, obviously a pre-req for them being equal
    if (fi1.Length != fi2.Length)
    {
        return false;
    }
    using (var sr1 = new FileStream(f1, FileMode.Open, FileAccess.Read))
    using (var sr2 = new FileStream(f2, FileMode.Open, FileAccess.Read))
    {
        for (long i = 0; i < fi1.Length; i++)
        {
            // ReadByte returns the next byte as an int, or -1 at end of stream
            if (sr1.ReadByte() != sr2.ReadByte())
            {
                return false;
            }
        }
    }
    return true;
}

Related

Excel compare by byte array

I would like to compare Excel sheets by converting them into byte arrays and comparing those.
Currently my code looks like this:
public static Document FileToByteArray(string fileName)
{
System.IO.FileStream fs = new System.IO.FileStream(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read);
System.IO.BinaryReader binaryReader = new System.IO.BinaryReader(fs);
long byteLength = new System.IO.FileInfo(fileName).Length;
byte[] fileContent = binaryReader.ReadBytes((int)byteLength);
fs.Close();
fs.Dispose();
binaryReader.Close();
Document Document = new Document
{
DocContent = fileContent
};
return Document;
}
public class Document
{
public byte[] DocContent { get; set; }
}
And finally main code:
private static void CompareImportedExportedExcels(string ingredientName, string ingredientsExportFile, AuthorizedLayoutPage authorizedBackofficePage, IngredientsPage ingredientsPage)
{
authorizedBackofficePage.LeftMenuComponent.ChooseLeftSectionOption<IngredientsPage>();
ingredientsPage.FiltersComponent.UseStringFilter(FiltersOptions.IngredientName, ingredientName);
ingredientsPage.ExportIngredientsElement.Click();
var downloadResult = DownloadHelper.WaitUntilDownloadedCompare(ingredientsExportFile);
string ingredientExportExcelFile = DownloadHelper.SeleniumDownloadPath + ingredientsExportFile;
var exelToByteArray1 = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"..\..\..") + @"\TestData\" + "ImportFiles" + @"\IngredientsImport.xlsx");
var excelArray1 = ExcelsExtensions.FileToByteArray(exelToByteArray1);
var excelArray2 = ExcelsExtensions.FileToByteArray(ingredientExportExcelFile);
if (excelArray1.DocContent.Length == excelArray2.DocContent.Length)
{
Console.WriteLine("Excels are equal");
DownloadHelper.CheckFileDownloaded(ingredientsExportFile);
}
else
{
Console.WriteLine("Excels are not equal");
DownloadHelper.CheckFileDownloaded(ingredientsExportFile);
Assert.Fail("Seems that imported and exported excels were not the same! Check it!");
}
}
What's the current status:
The above code works correctly as far as getting .Length and comparing it between two Excel files goes. The problem appears with a different comparison, where the exported Excel file is first placed inside a .ZIP file. I need to unpack it and then compare. Although the Excel sheets are the same, the .Length value is different and the comparison fails.
var downloadResult = DownloadHelper.WaitUntilDownloadedCompare(productsExportFile);
string stockProductZIPFile = DownloadHelper.SeleniumDownloadPath + productsExportFile;
string stockProductUnzippedFilePath = DownloadHelper.SeleniumDownloadPath + productsExportFile;
var pathToUnzip = DownloadHelper.SeleniumDownloadPath + productsExportFolderFile;
ZipFile zip = ZipFile.Read(stockProductZIPFile);
zip.ExtractAll(pathToUnzip);
string stockProductExportedExcel = DownloadHelper.SeleniumDownloadPath + "\\ProductsExport" + @"\Stock Products.xlsx";
var exelToByteArray1 = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"..\..\..") + @"\TestData\" + "ImportFiles" + @"\StockProduct.xlsx");
var excelArray1 = ExcelsExtensions.FileToByteArray(exelToByteArray1);
var excelArray2 = ExcelsExtensions.FileToByteArray(stockProductExportedExcel);
if (excelArray1.DocContent.Length == excelArray2.DocContent.Length)
{
Console.WriteLine("Excels are equal");
DownloadHelper.CheckFileDownloaded(stockProductUnzippedFilePath);
DownloadHelper.CheckFileDownloaded(pathToUnzip);
}
else
{
Console.WriteLine("Excels are not equal");
DownloadHelper.CheckFileDownloaded(stockProductUnzippedFilePath);
DownloadHelper.CheckFileDownloaded(pathToUnzip);
Assert.Fail("Seems that imported and exported excels were not the same! Check it!");
}
Ideas to solve
First of all, I'm not sure whether comparing the two files by .Length is a good idea. It works in one case but not in the other. I'm not sure whether this is connected with packing the sheet into .zip format and then unpacking it. In the second (broken) scenario the file sizes actually differ: the oracle file is 4 KB and the exported one is 10 KB (even though the data inside is the same).

FileInfo remove file from list

I have a method in C# which gets files in a directory this way:
FileInfo[] fileInfo = new DirectoryInfo(mypath).GetFiles();
Some of the files in the directory are not the ones we need to process (the only way to know is by its content, not the file extension) so we would like to remove them from the FileInfo list (not from disk).
I was searching for a simple way to exclude a file in the FileInfo array but there seems not to be a way.
Here's the whole code which checks the files we only need in the directory the user selects:
int number_of_files = fileInfo.Length;
for (int i = 0; i < number_of_files ; ++i)
{
string file= fileInfo[i].FullName;
BinaryReader br = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read), Encoding.ASCII);
byte[] preamble = new byte[132];
br.Read(preamble, 0, 132);
if (preamble[128] != 'D' || preamble[129] != 'I' || preamble[130] != 'C' || preamble[131] != 'M')
{
if (preamble[0] + preamble[1] != 0008)
{
return; //Rather than return, remove the file in question from the list....
}
}
br.Dispose();
}
Any ideas how I can do this?
Instead of removing the file from the FileInfo[] array, consider just creating a separate list that collects all files that you do want to keep:
FileInfo[] files = new DirectoryInfo(mypath).GetFiles();
List<FileInfo> filteredFiles = new List<FileInfo>();
foreach (FileInfo file in files)
{
    using (var stream = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(stream, Encoding.ASCII))
    {
        byte[] preamble = new byte[132];
        br.Read(preamble, 0, 132);
        if (preamble[128] != 'D' || preamble[129] != 'I' || preamble[130] != 'C' || preamble[131] != 'M')
        {
            if (preamble[0] + preamble[1] != 0008)
            {
                // skip this file
                continue;
            }
        }
        // keep the file
        filteredFiles.Add(file);
        // do something else with the file
    }
}
You should think about whether reading the files just to filter them is really worth the effort though. If you later end up processing the filtered files too, you should really consider doing that at the same time, so you don't have to open each file twice (once to figure out that you want to keep it, and once to actually process it). That way, you could also get rid of the filteredFiles list, since you can just skip the files you are not interested in and process the other ones.
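The single-pass idea could be sketched roughly like this. It is only an illustration under some assumptions: `SinglePassProcessor`, `HasDicomHeader` and the `process` callback are hypothetical names, and the header check covers only the "DICM" marker at offset 128 from the question, not the fallback test on the first two bytes.

```csharp
using System;
using System.IO;

public static class SinglePassProcessor
{
    // Hypothetical helper: true when bytes 128..131 of the stream spell "DICM".
    public static bool HasDicomHeader(Stream stream)
    {
        byte[] preamble = new byte[132];
        int total = 0;
        while (total < preamble.Length)
        {
            int read = stream.Read(preamble, total, preamble.Length - total);
            if (read == 0)
                return false; // file is shorter than the header
            total += read;
        }
        return preamble[128] == 'D' && preamble[129] == 'I'
            && preamble[130] == 'C' && preamble[131] == 'M';
    }

    // Opens each file once: checks the header, then rewinds and hands the
    // same open stream to the caller-supplied processing step.
    public static int ProcessMatchingFiles(string directory, Action<FileInfo, Stream> process)
    {
        int processed = 0;
        foreach (FileInfo file in new DirectoryInfo(directory).GetFiles())
        {
            using (FileStream stream = file.OpenRead())
            {
                if (!HasDicomHeader(stream))
                    continue; // skip files we are not interested in
                stream.Seek(0, SeekOrigin.Begin);
                process(file, stream);
                processed++;
            }
        }
        return processed;
    }
}
```

With this shape no intermediate list is needed; the return value is just a count of the files that were handed to `process`.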

What should the statements be if a character has been entered in a textbox in C#?

Question 1:
As the code shows, if only spaces are entered in the textbox then the button remains disabled. But if a character or string of characters has been entered, what should be written to enable the button? I think there should be an if statement, but I don't know what it should be.
MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();
UTF8Encoding utf81 = new UTF8Encoding();
textBox1.Text = BitConverter.ToString(md5.ComputeHash(utf81.GetBytes(textBox30.Text)));
SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider();
UTF8Encoding utf82 = new UTF8Encoding();
textBox2.Text = BitConverter.ToString(sha1.ComputeHash(utf82.GetBytes(textBox30.Text)));
if (string.IsNullOrWhiteSpace(textBox30.Text))
{
btnHash3.Enabled = false;
}
else
{
btnHash3.Enabled = true;
}
Question 2
Also on a slightly different note, how do I enable a button once two files are read in from two filestreams and displayed inside two labels?
{
System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create();
System.Security.Cryptography.SHA1 sha1 = System.Security.Cryptography.SHA1.Create();
FileStream file1 = new FileStream(lblBrowse1.Text, FileMode.Open, FileAccess.Read);
FileStream file2 = new FileStream(lblBrowse2.Text, FileMode.Open, FileAccess.Read);
byte[] hash1 = md5.ComputeHash(file1);
byte[] hash2 = md5.ComputeHash(file2);
file1.Seek(0, SeekOrigin.Begin);
file2.Seek(0, SeekOrigin.Begin);
byte[] hash3 = sha1.ComputeHash(file1);
byte[] hash4 = sha1.ComputeHash(file2);
file1.Seek(0, SeekOrigin.Begin);
file2.Seek(0, SeekOrigin.Begin);
file1.Close();
file2.Close();
textBox1.Text = BitConverter.ToString(hash1).Replace("-", "");
textBox2.Text = BitConverter.ToString(hash2).Replace("-", "");
textBox6.Text = BitConverter.ToString(hash3).Replace("-", "");
textBox7.Text = BitConverter.ToString(hash4).Replace("-", "");
if (textBox1.Text == textBox2.Text
&& textBox6.Text == textBox7.Text)
{
MessageBox.Show("These two files are identical.");
}
else
{
MessageBox.Show("These two files are different.");
}
}
Any help would be much appreciated.
To answer your first question, use .Contains to check for spaces:
btnHash3.Enabled = !myString.Contains(" ");
Since you are just setting a boolean, I collapsed the if into one line. To answer your second, it slightly depends on if you are in a multi-threaded environment. Of course, you could always write the following:
ReadMyFile(file1);
ReadMyFile(file2);
myButton.Enabled = true;
Which works since ReadMyFile should block while reading, so the enabled line won't be hit until all the reads are complete. If you are threaded, then do this:
int completedCount = 0;
void ThreadedRead()
{
//Read file synchronously
completedCount++;
CheckReadCompletion();
}
void CheckReadCompletion()
{
if (completedCount == 2)
myButton.Enabled = true;
}
You would start "ThreadedRead" for each file you need to read. Please let me know if I can clarify anything!
You wouldn't need to do this in the above scenario (because you are just setting the enabled flag) but with complex enough behavior, make sure to put a lock around completedCount and the call to CheckReadCompletion. You could modify it to this:
int completedCount = 0;
object completionLock = new object();
void ThreadedRead()
{
//Read file synchronously
lock (completionLock)
{
completedCount++;
CheckReadCompletion();
}
}
void CheckReadCompletion()
{
if (completedCount == 2)
myButton.Enabled = true;
}
Actually, you don't need an if statement, you can just use the result of a condition as the value for the Enabled property. Trim the string and check if the length is greater than zero to find out if it has any non-space characters:
btnHash3.Enabled = textBox30.Text.Trim().Length > 0;
To wait for two results before enabling a button, first create a counter, for example:
int fileCounter = 0;
After the code that adds the file content to a label, increase the counter and set the button status:
fileCounter++;
someButton.Enabled = fileCounter == 2;

write and read from byte stream

I have a page where the User can either upload their own csv or enter values into a listbox which then creates a csv (in the background). Regardless of which way the csv gets created I need to upload that csv to our server via a byte stream.
My problem is that when I'm creating the csv I shouldn't have to create a temporary file; I should be able to write to the stream and then read it back for uploading. How can I remove the need for the temporary file?
Current code, which works (but uses a temp file):
try {
string filename = DateTime.Now.ToString("MMddyyHmssf");
filename = filename + ".csv";
string directory = ConfigurationManager.AppSettings["TempDirectory"].ToString();
path = Path.Combine(directory, filename);
using (StreamWriter sw = File.CreateText(path)) {
foreach (ListItem item in this.lstAddEmailAddress.Items) {
sw.WriteLine(" , ," + item.ToString());
}
}
} catch (Exception ex) {
string error = "Cannot create temp csv file used for importing users by email address. Filepath: " + path + ". FileException: " + ex.ToString();
this.writeToLogs(error, 1338);
}
}
// put here for testing the byte array being sent vs ready byte[] byteArray = System.IO.File.ReadAllBytes(path);
myCsvFileStream = File.OpenRead(path);
nFileLen = (int)myCsvFileStream.Length;
I have tried
Stream myCsvFileStream;
using (StreamWriter sw = new StreamWriter(myCsvFileStream)) {
foreach (ListItem item in this.lstAddEmailAddress.Items) {
sw.WriteLine(" , ," + item.ToString());
}
}
However, since myCsvFileStream is never initialized (Stream is an abstract class, so it cannot be instantiated directly), it is always null.
Here is what I do with the data (byte stream) after creating the csv.
byte[] file = new byte[nFileLen];
myCsvFileStream.Read(file, 0, nFileLen);
bool response = this.repositoryService.SaveUsers(this.SelectedAccount.Id, file, this.authenticatedUser.SessionToken.SessionId);
myCsvFileStream.Close();
In the end I used a StringBuilder to create my csv file contents. Then I got a byte array of its contents and used that to populate my shared stream. (I say shared because when the user uploads their own CSV file it is an HttpPostedFile, but when it is sent to our server via the REST call (repositoryService.SaveUsers) it uses the same byte stream as this method.)
StringBuilder csvFileString = new StringBuilder();
sharedStreamForBatchImport = new MemoryStream();
foreach (ListItem item in this.lstAddEmailAddress.Items) {
    csvFileString.Append(",," + item.ToString() + "\r\n");
}
//get byte array of the string
byteArrayToBeSent = Encoding.ASCII.GetBytes(csvFileString.ToString());
//set length for the write
byteArraySize = byteArrayToBeSent.Length;
//write bytes into the sharedStreamForBatchImport, then rewind it so it can be read from the start
sharedStreamForBatchImport.Write(byteArrayToBeSent, 0, byteArraySize);
sharedStreamForBatchImport.Position = 0;
You want to create a new MemoryStream()
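As a rough illustration of the MemoryStream suggestion, the CSV could be written straight into memory and handed over as a byte array. This is a sketch under assumptions: `CsvInMemory` and `BuildCsvBytes` are hypothetical names, the list items are passed in as plain strings rather than ListItem objects, and UTF-8 without a byte-order mark is assumed for the encoding.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvInMemory
{
    // Hypothetical helper: builds the CSV entirely in memory and returns
    // its bytes, so no temporary file is created.
    public static byte[] BuildCsvBytes(IEnumerable<string> emailAddresses)
    {
        using (var memory = new MemoryStream())
        {
            // UTF8Encoding(false) avoids writing a byte-order mark.
            using (var writer = new StreamWriter(memory, new UTF8Encoding(false)))
            {
                foreach (string address in emailAddresses)
                {
                    // Same line layout as the question's WriteLine call.
                    writer.WriteLine(" , ," + address);
                }
            } // disposing the writer flushes buffered text into the stream

            // ToArray still works after the writer has closed the stream.
            return memory.ToArray();
        }
    }
}
```

The returned array could then be passed wherever the question currently passes `file`, without reading anything back from disk.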
Here is a function I use to write CSV files
public static bool WriteCsvFile(string path, StringBuilder stringToWrite)
{
try
{
using (StreamWriter sw = new StreamWriter(path, false)) //false in order to overwrite the file if it already exists
{
sw.Write(stringToWrite);
return true;
}
}
catch (Exception)
{
return false;
}
}
stringToWrite is just a StringBuilder that has been built this way:
public static bool WriteCsvFile(string path, DataTable myData)
{
if (myData == null)
return false;
//Information about the table we read
int nbRows = myData.Rows.Count;
int nbCol = myData.Columns.Count;
StringBuilder stringToWrite = new StringBuilder();
//We get the headers of the table
stringToWrite.Append(myData.Columns[0].ToString());
for (int i = 1; i < nbCol; ++i)
{
stringToWrite.Append(",");
stringToWrite.Append(myData.Columns[i].ToString());
}
stringToWrite.AppendLine();
//We read the rest of the table
for (int i = 0; i < nbRows; ++i)
{
stringToWrite.Append(myData.Rows[i][0].ToString());
for (int j = 1; j < nbCol; ++j)
{
stringToWrite.Append(",");
stringToWrite.Append(myData.Rows[i][j].ToString());
}
stringToWrite.AppendLine();
}
return WriteCsvFile(path, stringToWrite);
}

C# Saving an MP4 Resource to a file

I've tried a few different ways but it won't open when it's saved. How can I accomplish this?
Basically I want to be able to save an MP4 file that's currently a resource file to a temp location that I can access as a path.
Here's something I've tried:
public static void WriteResourceToFile(string resourceName, string fileName)
{
using (Stream s = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
{
if (s != null)
{
byte[] buffer = new byte[s.Length];
char[] sb = new char[s.Length];
s.Read(buffer, 0, (int)(s.Length));
/* convert the byte into ASCII text */
for (int i = 0; i <= buffer.Length - 1; i++)
{
sb[i] = (char)buffer[i];
}
using (StreamWriter sw = new StreamWriter(fileName))
{
sw.Write(sb);
sw.Flush();
}
}
}}
You're overcomplicating it.
Try something like this (note, not compiled or tested, and Stream.CopyTo() only exists in .NET 4.0 and later).
using (Stream s = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
using (FileStream fs = File.Open(@"c:\myfile.mp4", FileMode.Create))
{
s.CopyTo(fs);
}
Job done.
If you don't have .NET 4.0 available, you'll need to implement one yourself, like one of these: How do I copy the contents of one stream to another?
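For frameworks older than .NET 4.0, a hand-rolled replacement for Stream.CopyTo() might look like the sketch below. `StreamCopy.CopyStream` is a hypothetical helper name, and the 80 KB buffer size is an arbitrary choice rather than a requirement.

```csharp
using System.IO;

public static class StreamCopy
{
    // Hypothetical helper: copies everything from input to output in
    // fixed-size chunks, without interpreting the bytes in any way.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[81920];
        int read;
        // Read returns 0 once the end of the input stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }
}
```

In the snippet above, `s.CopyTo(fs);` would then become `StreamCopy.CopyStream(s, fs);`.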
To get a list of all of the resource names in the current assembly, do something like this:
Assembly a = Assembly.GetExecutingAssembly();
foreach (string s in a.GetManifestResourceNames())
{
Console.WriteLine(s);
}
Console.ReadKey();
Take what turns up on the console and pass it into GetManifestResourceStream() in the first snippet I posted.
http://msdn.microsoft.com/en-us/library/system.reflection.assembly.getmanifestresourcenames.aspx
Why are you writing an MP4 as a string? You should write out the bytes without modification. Your conversion to chars is modifying the data. Use a FileStream and call its Write method.
You could try something like this:
(I pasted the wrong code in at first, sorry, I was in a hurry.)
[HttpPost]
public ActionResult Create(VideoSermons video, HttpPostedFileBase videoFile)
{
var videoDb = new VideoSermonDb();
try
{
video.Path = Path.GetFileName(videoFile.FileName);
video.UserId = HttpContext.User.Identity.Name;
videoDb.Create(video);
if (videoFile != null && videoFile.ContentLength > 0)
{
var videoName = Path.GetFileName(videoFile.FileName);
var videoPath = Path.Combine(Server.MapPath("~/Videos/"),
System.IO.Path.GetFileName(videoFile.FileName));
videoFile.SaveAs(videoPath);
}
return RedirectToAction("Index");
}
catch
{
return View();
}
}
this actually loads video files to a directory, but it should work for your format as well.
-Thanks,
