I need to split multipage TIFF files. The input folder contains 100 TIFF files, each with two pages, and processing all 100 takes 1.40 minutes. Is there any way to improve performance and speed up the process?
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static void Main(string[] args)
{
    string input = @"D:\testSplit\input\";
    string output = @"D:\testSplit\output\out";
    DirectoryInfo dir1 = new DirectoryInfo(input);
    FileInfo[] DispatchFiles = dir1.GetFiles();
    if (DispatchFiles.Length > 0)
    {
        foreach (FileInfo aFile in DispatchFiles)
        {
            string files = input + aFile.Name;
            if (File.Exists(files))
            {
                Split(files, output);
            }
        }
    }
}
public static List<string> Split(string InputFilePath, string OutputPath)
{
    List<string> splitFileNames = new List<string>();
    try
    {
        // Get the frame dimension list from the image of the file and
        Image tiffImage = Image.FromFile(InputFilePath);
        // get the globally unique identifier (GUID)
        Guid objGuid = tiffImage.FrameDimensionsList[0];
        // create the frame dimension
        FrameDimension dimension = new FrameDimension(objGuid);
        // Get the total number of frames in the .tiff file
        int noOfPages = tiffImage.GetFrameCount(dimension);
        if (noOfPages == 1)
        {
            splitFileNames.Add(InputFilePath);
            tiffImage.Dispose();
            return splitFileNames;
        }
        string fileName = Path.GetFileNameWithoutExtension(InputFilePath);
        string fileExtension = Path.GetExtension(InputFilePath);
        ImageCodecInfo encodeInfo = null;
        ImageCodecInfo[] imageEncoders = ImageCodecInfo.GetImageEncoders();
        for (int j = 0; j < imageEncoders.Length; j++)
        {
            if (imageEncoders[j].MimeType == "image/tiff")
            {
                encodeInfo = imageEncoders[j];
                break;
            }
        }
        // Save the tiff file in the output directory.
        if (!Directory.Exists(OutputPath))
            Directory.CreateDirectory(OutputPath);
        foreach (Guid guid in tiffImage.FrameDimensionsList)
        {
            for (int index = 0; index < noOfPages; index++)
            {
                FrameDimension currentFrame = new FrameDimension(guid);
                tiffImage.SelectActiveFrame(currentFrame, index);
                string outPath = string.Concat(OutputPath, fileName, "-P", index + 1, fileExtension);
                tiffImage.Save(outPath, encodeInfo, null);
                splitFileNames.Add(outPath);
            }
        }
        tiffImage.Dispose();
        return splitFileNames;
    }
    catch (Exception ex)
    {
        return splitFileNames;
    }
}
A parallel foreach loop may get you where you need to be.
FileInfo[] DispatchFiles = dir1.GetFiles();
Parallel.ForEach(DispatchFiles, aFile =>
{
    string files = input + aFile.Name;
    if (File.Exists(files))
    {
        Split(files, output);
    }
});
With parallel processing you may run into shared-resource issues, but if each file's processing is fully independent of the others, it should improve your performance.
If you need to limit the number of threads that the loop creates, check out the MaxDegreeOfParallelism property.
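For example, a minimal sketch capping the loop at four concurrent workers via ParallelOptions (the limit of 4 is an arbitrary assumption; tune it for your machine):
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // cap concurrency
Parallel.ForEach(DispatchFiles, options, aFile =>
{
    string files = input + aFile.Name;
    if (File.Exists(files))
    {
        Split(files, output);
    }
});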
How do I get the total number of lines in a file while I am still inside a StreamWriter scope?
Based on the total line count, I need to write some more lines at the end of the file.
I have tried the code below, but it throws an error message:
The process cannot access the file 'C:\a.txt' because it is being used by another process.
var lineCount = File.ReadLines(outputFilePath).Count();
This is my code:
private void CreateAndPushFile(string fileName)
{
    string outputFilePath = string.Format(@"{0}\{1}", "C:\\a.txt", fileName);
    using (StreamWriter output = new StreamWriter(outputFilePath))
    {
        // Creates the file header
        string fileHeader = "kjhakljdhkjhkj";
        output.Write(fileHeader);
        string batchControl = "1515151"; // This value comes from database
        output.Write(batchControl);
        // Here there is some other logic which writes many lines to the file using a foreach loop
        string fileControl = "3123123"; // This value comes from database
        output.WriteLine(fileControl);
        // After this I need to write a few more lines, padding the file until the total line count is a multiple of 10
        var lineCount = File.ReadLines(outputFilePath).Count(); // I am getting the error here
        int remainder;
        Math.DivRem(lineCount, 10, out remainder);
        for (int i = 1; i <= 10 - remainder; i++)
        {
            output.WriteLine("9999999999999");
        }
    }
}
private static void CreateAndPushFile(string outputFilePath) {
    using (var output = new StreamWriter(outputFilePath)) {
        // Creates the file header
        var fileHeader = "kjhakljdhkjhkj";
        output.Write(fileHeader);
        var batchControl = "1515151"; // This value comes from database
        output.Write(batchControl);
        // Here there is some other logic which writes many lines to the file using a foreach loop
        var fileControl = "3123123"; // This value comes from database
        output.WriteLine(fileControl);
        // The writer must be disposed before the file can be read again
    }
    var lineCount = TotalLines(outputFilePath); // safe now: the writer above has been closed
    var remainder = lineCount % 10;
    using (var output2 = new StreamWriter(outputFilePath, true)) { // second parameter is for append
        for (var i = 0; i < 10 - remainder; i++) {
            output2.WriteLine("9999999999999");
        }
    }
}
private static int TotalLines(string filePath) {
    using (var reader = new StreamReader(filePath)) {
        char[] buffer = new char[1024];
        var lineCount = 0;
        while (!reader.EndOfStream) {
            var charsRead = reader.Read(buffer, 0, 1024);
            // Count newline characters in the chunk just read
            lineCount += buffer.Take(charsRead).Count(character => character == '\n');
        }
        return lineCount;
    }
}
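As an alternative sketch, assuming you control all the writes, you could keep a running count while writing and avoid re-reading the file entirely (this CreateAndPushFileCounted helper is hypothetical, not from the original code):
private static void CreateAndPushFileCounted(string outputFilePath) {
    var linesWritten = 0;
    using (var output = new StreamWriter(outputFilePath)) {
        output.WriteLine("kjhakljdhkjhkj"); // header
        output.WriteLine("1515151");        // batch control, from database
        output.WriteLine("3123123");        // file control, from database
        linesWritten = 3;
        // ... any other writing logic would also increment linesWritten ...

        // Pad to the next multiple of 10 without a second pass over the file
        var padding = (10 - linesWritten % 10) % 10;
        for (var i = 0; i < padding; i++) {
            output.WriteLine("9999999999999");
        }
    }
}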
using (StreamWriter writer = File.CreateText(FinishedFile))
{
    int lineNum = 0;
    while (lineNum < FilesLineCount.Min())
    {
        for (int i = 0; i <= FilesToMerge.Count() - 1; i++)
        {
            if (i != FilesToMerge.Count() - 1)
            {
                var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
                string CurrentLine = string.Join("", CurrentFile);
                writer.Write(CurrentLine + ",");
            }
            else
            {
                var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
                string CurrentLine = string.Join("", CurrentFile);
                writer.Write(CurrentLine + "\n");
            }
        }
        lineNum++;
    }
}
The current way I am doing this is just too slow. I am merging files that are each 50k+ lines long, with varying amounts of data.
For example:
File 1
1
2
3
4
File 2
4
3
2
1
I need these to merge into a third file:
File 3
1,4
2,3
3,2
4,1
P.S. The user can pick as many files as they want, from any locations.
Thanks for the help.
Your approach is slow because of the Skip and Take calls inside the loops.
You could use a dictionary to collect the lines at each line index:
string[] allFileLocationsToMerge = { "filepath1", "filepath2", "..." };
var mergedLists = new Dictionary<int, List<string>>();
foreach (string file in allFileLocationsToMerge)
{
    string[] allLines = File.ReadAllLines(file);
    for (int lineIndex = 0; lineIndex < allLines.Length; lineIndex++)
    {
        bool indexKnown = mergedLists.TryGetValue(lineIndex, out List<string> allLinesAtIndex);
        if (!indexKnown)
            allLinesAtIndex = new List<string>();
        allLinesAtIndex.Add(allLines[lineIndex]);
        mergedLists[lineIndex] = allLinesAtIndex;
    }
}
// Order by key explicitly: Dictionary enumeration order is not guaranteed
IEnumerable<string> mergedLines = mergedLists
    .OrderBy(pair => pair.Key)
    .Select(pair => string.Join(",", pair.Value));
File.WriteAllLines("targetPath", mergedLines);
Here's another approach - this implementation keeps only one line from each file in memory at a time, thus reducing memory pressure significantly (if that is an issue).
public static void MergeFiles(string output, params string[] inputs)
{
    var files = inputs.Select(File.ReadLines).Select(iter => iter.GetEnumerator()).ToArray();
    StringBuilder line = new StringBuilder();
    bool any;
    using (var outFile = File.CreateText(output))
    {
        do
        {
            line.Clear();
            any = false;
            foreach (var iter in files)
            {
                if (!iter.MoveNext())
                    continue;
                if (line.Length != 0)
                    line.Append(","); // separator matching the "1,4" format in the question
                line.Append(iter.Current);
                any = true;
            }
            if (any)
                outFile.WriteLine(line.ToString());
        }
        while (any);
    }
    foreach (var iter in files)
    {
        iter.Dispose();
    }
}
This also handles files of different lengths.
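A quick usage sketch (the file names are placeholders):
MergeFiles("File3.txt", "File1.txt", "File2.txt");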
I want to read all pages of my PDF and save them as images. So far, what I have only renders the single page I specify (0 = first page, etc.). Is there a way to define a range?
static void Main(string[] args)
{
    try
    {
        string path = @"C:\Users\test\Desktop\pdfToWord\";
        foreach (string file in Directory.EnumerateFiles(path, "*.pdf"))
        {
            using (var document = PdfiumViewer.PdfDocument.Load(file))
            {
                int i = 1;
                var image = document.Render(0, 300, 300, true);
                image.Save(@"C:\Users\test\Desktop\pdfToWord\output.png", ImageFormat.Png);
            }
        }
    }
    catch (Exception ex)
    {
        // handle exception here;
    }
}
If your document object gives you the page count,
you could replace
int i = 1;
var image = document.Render(0, 300, 300, true);
image.Save(@"C:\Users\test\Desktop\pdfToWord\output.png", ImageFormat.Png);
by
for (int index = 0; index < document.PageCount; index++)
{
    // dispose each rendered page so memory use stays flat across large documents
    using (var image = document.Render(index, 300, 300, true))
    {
        image.Save(@"C:\Users\test\Desktop\pdfToWord\output" + index.ToString("000") + ".png", ImageFormat.Png);
    }
}
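If you want only a range of pages rather than all of them, the same loop works with assumed bounds (startPage and endPage below are placeholder variables, zero-based and inclusive, not part of PdfiumViewer):
int startPage = 2, endPage = 5; // hypothetical range
for (int index = startPage; index <= endPage && index < document.PageCount; index++)
{
    using (var image = document.Render(index, 300, 300, true))
    {
        image.Save(@"C:\Users\test\Desktop\pdfToWord\output" + index.ToString("000") + ".png", ImageFormat.Png);
    }
}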
I have a C# WinForms application that is outputting to Excel files.
Let's say the file name format is: Output1.xlsx
I would like the output saved to another, sequentially numbered file on each button click/execution.
So next it would be Output2.xlsx, Output3.xlsx... etc.
How do I check for that? I know how to check whether a file exists, but how do I check for the numbering?
FileInfo newExcelFile = new FileInfo(@"Output1.xlsx");
if (newExcelFile.Exists)
{
...
}
You could use this loop and File.Exists with Path.Combine:
string directory = @"C:\SomeDirectory";
string fileName = "Output{0}.xlsx";
int num = 1;
while (File.Exists(Path.Combine(directory, string.Format(fileName, num))))
num++;
var newExcelFile = new FileInfo(Path.Combine(directory, string.Format(fileName, num)));
In general the static File methods are more efficient than always creating a FileInfo instance.
We use a method similar to this one:
/// <param name="strNewPath">ex: c:\</param>
/// <param name="strFileName">ex: Output.xlsx</param>
/// <returns>Next available filename, ex: Output3.xlsx</returns>
public static string GetValidFileName(string strNewPath, string strFileName)
{
    var strFileNameNoExt = Path.GetFileNameWithoutExtension(strFileName);
    var strExtension = Path.GetExtension(strFileName);
    var intCount = 1;
    while (File.Exists(Path.Combine(strNewPath, strFileNameNoExt + intCount + strExtension)))
        intCount++;
    return Path.Combine(strNewPath, strFileNameNoExt + intCount + strExtension);
}
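Usage might look like this (the path and base name are placeholders):
string nextFile = GetValidFileName(@"C:\SomeDirectory", "Output.xlsx");
// returns e.g. C:\SomeDirectory\Output3.xlsx if Output1 and Output2 already exist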
Just wrap it in a while loop:
int num = 1;
FileInfo newExcelFile = new FileInfo("Output" + num + ".xlsx");
while (newExcelFile.Exists)
{
    num++;
    newExcelFile = new FileInfo("Output" + num + ".xlsx");
}
I would find the newest file in the folder and use its number as a basis to start from. If no other programs write to that folder, this should be sufficient.
DirectoryInfo di = new DirectoryInfo("Some folder");
// Note: First() throws if the folder is empty
FileInfo fi = di.GetFiles().OrderByDescending(s => s.CreationTime).First();
string fileName = fi.Name;
//....
You can do a simple loop:
FileInfo newExcelFile = null;
for (int i = 0; i < int.MaxValue; i++)
{
    newExcelFile = new FileInfo(string.Format(@"Output{0}.xlsx", i));
    if (!newExcelFile.Exists)
    {
        break;
    }
    newExcelFile = null;
}
if (newExcelFile == null)
{
    // do you want to try 2147483647
    // or show an error message
    // or throw an exception?
}
else
{
    // save your file
}
It may not be the most efficient approach, but I can suggest the following (see the sketch after this list):
Split the file name on ".".
Remove the substring "Output" from it.
Sort the results to get the maximum number.
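A minimal sketch of that idea, assuming the files live in a known directory (the path here is a placeholder):
// Find the highest existing number among Output*.xlsx files
int maxNum = Directory.GetFiles(@"C:\SomeDirectory", "Output*.xlsx")
    .Select(f => Path.GetFileNameWithoutExtension(f)) // e.g. "Output12"
    .Select(name => name.Substring("Output".Length))  // e.g. "12"
    .Select(s => int.TryParse(s, out int n) ? n : 0)  // ignore non-numeric leftovers
    .DefaultIfEmpty(0)
    .Max();
string nextFile = "Output" + (maxNum + 1) + ".xlsx";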
It depends on the logic. What should happen if you had Output1.xlsx, Output2.xlsx, Output3.xlsx and removed Output2.xlsx: should the new file be Output2.xlsx or Output4.xlsx?
If you always want the highest number for new files, you can use code like this:
int lastNum = 0;
string[] files = Directory.GetFiles("c:\\myDir", "Output*.xlsx");
if (files.Length > 0)
{
    // Parse the numeric suffix of each name and take the largest;
    // sorting the raw strings would put "Output10" before "Output2"
    lastNum = files
        .Select(f => Regex.Match(Path.GetFileName(f), @"Output(\d+)\.xlsx").Groups[1].Value)
        .Where(s => s.Length > 0)
        .Max(s => Convert.ToInt32(s));
    lastNum++;
}
FileInfo newExcelFile = new FileInfo("Output" + lastNum + ".xlsx");
Of course you can loop, but it's not a good idea if you have thousands of files. For a small number of files it is fine:
int i = 0;
for (; i < Int32.MaxValue; i++)
{
    if (!File.Exists("Output" + i + ".xlsx"))
        break;
}
If I have 4 files and I want to move half of them to disc 1 and half of them to disc 2,
and I'm using:
Directory.Move(source, destination)
I'm guessing I can change the source with a foreach loop plus an array or list,
but how could I change the destination after half the source files are transferred, and then transfer the other half to the new destination?
string[] files = ... // create a list of files to be moved
for (int i = 0; i < files.Length; i++)
{
    var sourceFile = files[i];
    var destFile = string.Empty;
    if (i < files.Length / 2)
    {
        destFile = Path.Combine(@"c:\path1", Path.GetFileName(sourceFile));
    }
    else
    {
        destFile = Path.Combine(@"d:\path2", Path.GetFileName(sourceFile));
    }
    File.Move(sourceFile, destFile);
}
UPDATE:
Here's a lazy approach which doesn't require you to load all the file names into memory at once; it can be used, for example, in conjunction with the Directory.EnumerateFiles method:
IEnumerable<string> files = Directory.EnumerateFiles(@"x:\sourcefilespath");
int i = 0;
foreach (var file in files)
{
    var destFile = Path.Combine(@"c:\path1", Path.GetFileName(file));
    if ((i++) % 2 != 0)
    {
        // alternate the destination
        destFile = Path.Combine(@"d:\path2", Path.GetFileName(file));
    }
    File.Move(file, destFile);
}
The simple answer is to move the individual files instead.
Use Directory.GetFiles(source) to get the list of files in the folder, take its length, and then loop through each file and move it.
public void MoveFilesToSplitDestination(string source, string destination1, string destination2)
{
    var fileList = Directory.GetFiles(source);
    int fileCount = fileList.Length;
    for (int i = 0; i < fileCount; i++)
    {
        string moveToDestinationPath = (i < fileCount / 2) ? destination1 : destination2;
        // GetFiles returns full path strings, so move with File.Move
        File.Move(fileList[i], Path.Combine(moveToDestinationPath, Path.GetFileName(fileList[i])));
    }
}
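Called, for example, like this (the paths are placeholders):
MoveFilesToSplitDestination(@"x:\sourcefilespath", @"c:\path1", @"d:\path2");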
int rubikon = files.Count() / 2;
// firstDestination and secondDestination below are placeholder directory paths
foreach (var file in files.Take(rubikon))
    File.Move(file, Path.Combine(firstDestination, Path.GetFileName(file)));
foreach (var file in files.Skip(rubikon))
    File.Move(file, Path.Combine(secondDestination, Path.GetFileName(file)));