Extracting pdf images in a correct order iTextSharp

Extracting pdf images in a correct order iTextSharp - c#

I'm trying to extract images from a PDF File, but I really need to have it at the correct order to get the correct image.
static void Main(string[] args)
{
string filename = "D:\\910723575_marca_coletiva.pdf";
PdfReader pdfReader = new PdfReader(filename);
var imagemList = ExtraiImagens(pdfReader);
// converter byte[] para um bmp
List<Bitmap> bmpSrcList = new List<Bitmap>();
IList<byte[]> imagensProcessadas = new List<byte[]>();
foreach (var imagem in imagemList)
{
System.Drawing.ImageConverter converter = new System.Drawing.ImageConverter();
try
{
Image img = (Image)converter.ConvertFrom(imagem);
ConsoleWriteImage(img);
imagensProcessadas.Add(imagem);
}
catch (Exception)
{
continue;
}
}
System.Console.ReadLine();
}
public static void ConsoleWriteImage(Image img)
{
int sMax = 39;
decimal percent = Math.Min(decimal.Divide(sMax, img.Width), decimal.Divide(sMax, img.Height));
Size resSize = new Size((int)(img.Width * percent), (int)(img.Height * percent));
Func<System.Drawing.Color, int> ToConsoleColor = c =>
{
int index = (c.R > 128 | c.G > 128 | c.B > 128) ? 8 : 0;
index |= (c.R > 64) ? 4 : 0;
index |= (c.G > 64) ? 2 : 0;
index |= (c.B > 64) ? 1 : 0;
return index;
};
Bitmap bmpMin = new Bitmap(img, resSize.Width, resSize.Height);
Bitmap bmpMax = new Bitmap(img, resSize.Width * 2, resSize.Height * 2);
for (int i = 0; i < resSize.Height; i++)
{
for (int j = 0; j < resSize.Width; j++)
{
Console.ForegroundColor = (ConsoleColor)ToConsoleColor(bmpMin.GetPixel(j, i));
Console.Write("██");
}
Console.BackgroundColor = ConsoleColor.Black;
Console.Write(" ");
for (int j = 0; j < resSize.Width; j++)
{
Console.ForegroundColor = (ConsoleColor)ToConsoleColor(bmpMax.GetPixel(j * 2, i * 2));
Console.BackgroundColor = (ConsoleColor)ToConsoleColor(bmpMax.GetPixel(j * 2, i * 2 + 1));
Console.Write("▀");
Console.ForegroundColor = (ConsoleColor)ToConsoleColor(bmpMax.GetPixel(j * 2 + 1, i * 2));
Console.BackgroundColor = (ConsoleColor)ToConsoleColor(bmpMax.GetPixel(j * 2 + 1, i * 2 + 1));
Console.Write("▀");
}
System.Console.WriteLine();
}
}
public static IList<byte[]> ExtraiImagens(PdfReader pdfReader)
{
var data = new byte[] { };
IList<byte[]> imagensList = new List<byte[]>();
for (int numPag = 1; numPag <= 3; numPag++)
//for (int numPag = 1; numPag <= pdfReader.NumberOfPages; numPag++)
{
var pg = pdfReader.GetPageN(numPag);
var res = PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES)) as PdfDictionary;
var xobj = PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)) as PdfDictionary;
if (xobj == null) continue;
var keys = xobj.Keys;
if (keys == null) continue;
PdfObject obj = null;
PdfDictionary tg = null;
for (int key = 0; key < keys.Count; key++)
{
obj = xobj.Get(keys.ElementAt(key));
if (!obj.IsIndirect()) continue;
tg = PdfReader.GetPdfObject(obj) as PdfDictionary;
obj = xobj.Get(keys.ElementAt(key));
if (!obj.IsIndirect()) continue;
tg = PdfReader.GetPdfObject(obj) as PdfDictionary;
var type = PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)) as PdfName;
if (!PdfName.IMAGE.Equals(type)) continue;
int XrefIndex = (obj as PRIndirectReference).Number;
var pdfStream = pdfReader.GetPdfObject(XrefIndex) as PRStream;
data = PdfReader.GetStreamBytesRaw(pdfStream);
imagensList.Add(PdfReader.GetStreamBytesRaw(pdfStream));
}
}
return imagensList;
}
}
The method ConsoleWriteImage is only to print the image at the console and I used it to study the behavior of the order that iTextSharp was retrieving it for me , based on my code.
Any help ?

Unfortunately the OP has not explained what the correct order is - this is not self-explanatory because there might be certain aspects of a PDF which are not obvious for a program, merely for a human reader viewing the rendered PDF.
At least, though, it is likely that the OP wants to get his images on a page-by-page basis. This obviously is not what his current code provides: His code scans the whole base of objects inside the PDF for image objects, so he will get image objects, but the order may be completely random; in particular he may even get images contained in the PDF but not used on any of its pages...
To retrieve images on a page-by-page order (and only images actually used), one should use the parser framework, e.g.
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener();
for (int i = 1; i <= reader.NumberOfPages; i++) {
parser.ProcessContent(i, listener);
}
// Process images in the List listener.MyImages
// with names in listener.ImageNames
(Excerpt from the ExtractImages.cs iTextSharp example)
where MyImageRenderListener is defined to collect images:
public class MyImageRenderListener : IRenderListener {
/** the byte array of the extracted images */
private List<byte[]> _myImages;
public List<byte[]> MyImages {
get { return _myImages; }
}
/** the file names of the extracted images */
private List<string> _imageNames;
public List<string> ImageNames {
get { return _imageNames; }
}
public MyImageRenderListener() {
_myImages = new List<byte[]>();
_imageNames = new List<string>();
}
[...]
public void RenderImage(ImageRenderInfo renderInfo) {
try {
PdfImageObject image = renderInfo.GetImage();
if (image == null || image.GetImageBytesType() == PdfImageObject.ImageBytesType.JBIG2)
return;
_imageNames.Add(string.Format("Image{0}.{1}", renderInfo.GetRef().Number, image.GetFileType() ) );
_myImages.Add(image.GetImageAsBytes());
}
catch
{
}
}
[...]
}
(Excerpt from MyImageRenderListener.cs iTextSharp example)
The ImageRenderInfo renderInfo furthermore also contains information on location and orientation of the image on the page in question which might help to deduce the correct order the OP is after.

Related

ITextSharp Quadpoints to Rectangle

How do I calculate or convert my quadpoints to a rectangle?
var quad = dic.Get(PdfName.QUADPOINTS) as PdfArray;
var rect = new iTextSharp.text.Rectangle(<missing conversion>);
var filter = new RegionTextRenderFilter(rect);
Well while waiting for an answer I found it myself. The Quadpoints array is always divisible by 8. A pair of x,y coordinates for each corner of the rectangle (2 * 4 = 8). With this information in mind I created the following extension method:
public static class Extensions
{
public static iTextSharp.text.Rectangle[] ToRectangle(this PdfArray quadricles)
{
var result = new List<iTextSharp.text.Rectangle>();
for (var m = 0; m < quadricles.Size; m += 8)
{
var dimX = new List<float>();
var dimY = new List<float>();
for (var n = 0; n < 8; n += 2)
{
var x = quadricles[m + n] as PdfNumber;
dimX.Add(x.FloatValue);
var y = quadricles[m + n + 1] as PdfNumber;
dimY.Add(y.FloatValue);
}
result.Add(new iTextSharp.text.Rectangle(dimX.Min(), dimY.Min(), dimX.Max(), dimY.Max(), 1));
}
return result.ToArray();
}
}

Marshal.Copy() causes System.StackOverflowException

I encountered a System.StackOverflowException problem when I was trying to Marshal.Copy()
Here is the screen shot of the code where exception happens:
Here is the function:
private static void ImageUpdate(IntPtr returnArray, ref int channels)
{
if (_prevTickCount == 0)
{
_sumTickCount = 0;
}
else
{
_sumTickCount = _sumTickCount * .75 + (Environment.TickCount - _prevTickCount) * .25;
}
//only copy to the buffer when pixel data is not being read
if (_pixelDataReady == false)
{
int width = 0;
int height = 0;
if (false == GetImageDimensions(ref width, ref height))
{
return;
}
_dataLength = width * height;
_colorChannels = channels;
if ((_pixelData == null) || (_pixelData.Length != (_dataLength * _colorChannels)))
{
_pixelData = new short[_dataLength * _colorChannels];
//_pixelDataHistogram = new int[_colorChannels][];
_pixelDataHistogram = new int[MAX_CHANNELS][];
if (_colorChannels == 1)
{
_pixelDataByte = new byte[_dataLength];
}
else
{
_pixelDataByte = new byte[_dataLength * 3];
}
//for (int i = 0; i < _colorChannels; i++)
for (int i = 0; i < MAX_CHANNELS; i++)
{
_pixelDataHistogram[i] = new int[PIXEL_DATA_HISTOGRAM_SIZE];
}
}
//2^n == FULL_RANGE_NORMALIZATION_FACTOR
const int SHIFT_VALUE = 6;
switch (_colorChannels)
{
case 1:
{
Marshal.Copy(returnArray, _pixelData, 0, _dataLength * _colorChannels);
//factor is derived by taking CAMERA_MAX_INTENSITY_VALUE/256
//const double FULL_RANGE_NORMALIZATION_FACTOR = 64.0;
//clear the histogram
for (int i = 0; i < PIXEL_DATA_HISTOGRAM_SIZE; i++)
{
_pixelDataHistogram[0][i] = 0;
}
for (int i = 0; i < _dataLength * _colorChannels; i++)
{
double valHist;
if (_pixelData[i] < 0)
{
valHist = (_pixelData[i] + 32768) >> SHIFT_VALUE;
}
else
{
valHist = (_pixelData[i]) >> SHIFT_VALUE;
}
_pixelDataHistogram[0][(byte)valHist]++;
}
}
break;
default:
{
Marshal.Copy(returnArray, _pixelData, 0, _dataLength * _colorChannels);
}
break;
}
_dataWidth = width;
_dataHeight = height;
_pixelDataReady = true;
ThorLog.Instance.TraceEvent(TraceEventType.Verbose, 1, "ImageUpdate pixeldata updated");
}
else
{
ThorLog.Instance.TraceEvent(TraceEventType.Verbose, 1, "ImageUpdate pixeldata not ready");
}
_prevTickCount = Environment.TickCount;
}
The whole idea is to copy image buffer from native code. This exception occurs only when image size is large 4K X 4K, but I dont have a problem processing a size below that.
I have no idea how I should correct this. Anyone care to educate? Thanks!

I traced it down, eventually, if was the returnArray which is not newed large enough to cause this problem.

D3 plotter and file watcher, interference?

I am using file watcher monitoring the file output, and then dynamically display the results using d3 (dynamic data display). Here is more details:
Data are series of tiff file generated one by one quickly (20-25 miliseconds per file generation);
Once there is a file coming in, event is fired, and the file is processed (some stats calculation);
The results will be sent to a d3 plotter, which will dynamically display the results on a charter.
When I close the d3 charter window, the file watcher catcher pretty much all the events, but here is my problem:
However, when I leave the d3 charter window open, even when there is not any stats calculation, the file watcher drop many events (some big gap occurs).
Here is what we have tried:
Increase the watcher buffer, but, it still drop events as long as the d3 charter window is open.
Using C++ dll to calculate stats, but it looks even more slow.
So I am wondering:
Do d3 plotter and file watcher conflict/interfere with each other?
Can I possibly use them together to to real time plot?
Is there anyway to get around my problem?
Any pointers are appreciated.
Here is the code for d3 plotter, including constructor, plotter initiator, and plotter update:
#region Constructor
public SignalStatsDisplay()
{
InitializeComponent();
// timeDomainPlotter.Legend.Remove();
_initialChildrenCount = timeDomainPlotter.Children.Count;
int count = timeDomainPlotter.Children.Count;
//do not remove the initial children
if (count > _initialChildrenCount)
{
for (int i = count - 1; i >= _initialChildrenCount; i--)
{
timeDomainPlotter.Children.RemoveAt(i);
}
}
_nMaxStatsOneChannel = Enum.GetNames(typeof(Window1.ROISignalList)).Length;
_curveBrush = new Brush[_nMaxStatsOneChannel];
_statsEnable = new int[_nMaxStatsOneChannel];
for (int i = 0; i < _nMaxStatsOneChannel; i++)
{
_curveBrush[i] = new SolidColorBrush((Color)ColorConverter.ConvertFromString(_colorList[i]));
_statsEnable[i] = 0;
}
_nActiveStatsOneChannel = 0;
}
#endregion Constructor
public void InitiateSignalAnalysisPlot()
{
_statsName = Enum.GetNames(typeof(Window1.ROISignalList));
int count = 0;
_statsEnableIndex = new int[_nActiveStatsOneChannel];
for (int i = 0; i < _nMaxStatsOneChannel; i++) // assign color
{
if (_statsEnable[i] == 1)
{
_statsEnableIndex[count] = i;
count++;
}
}
if (_nActiveChannel > 0) // timeDomainPlotter init
{
_dataX = new List<double[]>();
_dataY = new List<double[]>();
double[] dataXOneCh = new double[_signalLength];
double[] dataYOneCh = new double[_signalLength];
dataXOneCh[0] = 0;
dataYOneCh[0] = 0;
for (int i = 0; i < _nActiveChannel; i++)
{
for (int j = 0; j < _nActiveStatsOneChannel; j++)
{
_dataX.Add(dataXOneCh); // data x-y mapping init
_dataY.Add(dataYOneCh);
EnumerableDataSource<double> xOneCh = new EnumerableDataSource<double>(dataXOneCh);
EnumerableDataSource<double> yOneCh = new EnumerableDataSource<double>(dataYOneCh);
xOneCh.SetXMapping(xVal => xVal);
yOneCh.SetXMapping(yVal => yVal);
CompositeDataSource dsOneCh = new CompositeDataSource(xOneCh, yOneCh);
LineAndMarker<MarkerPointsGraph> lam = timeDomainPlotter.AddLineGraph(dsOneCh,
new Pen(_curveBrush[_statsEnableIndex[j]], 2),
new CirclePointMarker { Size = 5, Fill = _curveBrush[_statsEnableIndex[j]] },
new PenDescription(_statsName[_statsEnableIndex[j]]));
}
}
Action FitToView = delegate()
{
timeDomainPlotter.FitToView();
};
this.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal, FitToView);
}
else
{
return;
}
}
public void RedrawSignalAnalysisPlot()
{
int startIndex = _initialChildrenCount;
if ((_nActiveStatsOneChannel > 0) && (_dataX != null) && (_dataY != null))
{
CompositeDataSource[] dsCh = new CompositeDataSource[_nActiveStatsOneChannel];
int m, n;
int index;
for (int i = 0; i < _nActiveChannel; i++)
{
for (int j = 0; j < _nActiveStatsOneChannel; j++)
{
index = i * _nActiveStatsOneChannel + j;
if (_dataX[index].Length == _dataY[index].Length)
{
EnumerableDataSource<double> xOneCh = new EnumerableDataSource<double>(_dataX[index]);
xOneCh.SetXMapping(xVal => xVal);
EnumerableDataSource<double> yOneCh = new EnumerableDataSource<double>(_dataY[index]);
yOneCh.SetYMapping(yVal => yVal);
CompositeDataSource ds = new CompositeDataSource(xOneCh, yOneCh);
Action UpdateData = delegate()
{
m = i * 2;
n = j * 2;
// ((LineGraph)timeDomainPlotter.Children.ElementAt(startIndex + n + m * _nActiveStatsOneChannel)).DataSource = ds;
// ((LineGraph)timeDomainPlotter.Children.ElementAt(startIndex + n + m * _nActiveStatsOneChannel)).LinePen
// = new Pen(new SolidColorBrush(_curveBrush[j]), 1);
((MarkerPointsGraph)timeDomainPlotter.Children.ElementAt(startIndex + n + 1 + m * _nActiveStatsOneChannel)).DataSource = ds;
// ((MarkerPointsGraph)timeDomainPlotter.Children.ElementAt(startIndex + n + 1 + m * _nActiveStatsOneChannel)).Marker
// = new CirclePointMarker { Size = 5, Fill = Brushes.Green };
};
this.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal, UpdateData);
}
}
}
/* Action PlotFitToView = delegate()
{
timeDomainPlotter.FitToView();
};
this.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal, PlotFitToView);*/
}
}
Here is how files are monitored:
Where events (file written) are filed
void tiffWatcher_EventChanged(object sender, WatcherExEventArgs e)
{
string fileName = ((FileSystemEventArgs)(e.Arguments)).FullPath;
string fileExt = StringExtension.GetLast(fileName, 4);
if (!IsFileLocked(fileName))
{
Action EventFinished = delegate()
{
CreateListViewItem(fileName, "Finished", DateTime.Now.ToString("HH:mm:ss.fff"));
};
listView1.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal, EventFinished);
_tiffName.Add( fileName);
_tiffFilledCount++;
}
}
tiff data processing, and data communication to d3 plotter:
void tiffTimer_Tick(object sender, EventArgs e)
{
//throw new NotImplementedException();
byte[] image = new byte[_signalManager.ImgWidth * _signalManager.ImgHeight];
if (_tiffFilledCount - _tiffProcessedCount >= 1)
{
string fileName = _tiffName[_tiffProcessedCount++];
char filePre = fileName[49];
int indexBeigin = fileName.LastIndexOf("_");
int indexEnd = fileName.LastIndexOf(".");
_signalIndex = Convert.ToInt32(fileName.Substring(indexBeigin + 1, indexEnd - indexBeigin - 1)) - 1; // 0 based//
_deltaT = ExtractTiffDeltaT(fileName, "DeltaT=", 1);
_channelIndex = (int)Enum.Parse(typeof(ChannelList), Convert.ToString(filePre));
TIFFIImageIO.LoadTIFF(fileName, ref image);
_signalManager.Image = image;
for (int i = 0; i < _nActiveStatsOneChannel; i++)
{
_signalManager.GetSignal( _signalDisplay.StatsEnableIndex[i], ref _signal);
UpdateSignal(_channelIndex, i, _tiffProcessedCount-1, _deltaT, _signal);
}
// if( _tiffProcessedCount % 5 == 0)
_signalDisplay.SetData(_XList, _YList, true);
}
}
I use a Timer.Tick to process the file every 50-100 miliseconds, still testing.

The way I handled it was to setup a Timer, cyclically handling calculation, display. So far, it works fine. Of course there will be many events are missing, but we handle large amount of points (thousands of them), so it is OK to hundreds to do the plot. Here is the code:
private Timer _tiffTimer;
void Window1_Loaded(object sender, RoutedEventArgs e)
{
//throw new NotImplementedException();
_tiffTimer = new Timer();
_tiffTimer.Interval = 50; // change interval to change performance
_tiffTimer.Tick += new EventHandler(tiffTimer_Tick);
_tiffTimer.Start();
}
void tiffTimer_Tick(object sender, EventArgs e)
{
//do your stuff here
}

Saving each WAV channel as a mono-channel WAV file using Naudio

I'm trying to convert a WAV file(PCM,48kHz, 4-Channel, 16 bit) into mono-channel WAV files.
I tried splittiing the WAV file into 4 byte-arrays like this answer and created a WaveMemoryStream like shown below but does not work.
byte[] chan1ByteArray = new byte[channel1Buffer.Length];
Buffer.BlockCopy(channel1Buffer, 0, chan1ByteArray, 0, chan1ByteArray.Length);
WaveMemoryStream chan1 = new WaveMemoryStream(chan1ByteArray, sampleRate, (ushort)bitsPerSample, 1);
Am I missing something in creating the WAVE headers ? Or is there more to splitting a
WAV into mono channel WAV files ?

The basic idea is that the source wave file contains the samples interleaved. One for the first channel, one for the second, and so on. Here's some untested example code to give you an idea of how to do this.
var reader = new WaveFileReader("fourchannel.wav");
var buffer = new byte[2 * reader.WaveFormat.SampleRate * reader.WaveFormat.Channels];
var writers = new WaveFileWriter[reader.WaveFormat.Channels];
for (int n = 0; n < writers.Length; n++)
{
var format = new WaveFormat(reader.WaveFormat.SampleRate,16,1);
writers[n] = new WaveFileWriter(String.Format("channel{0}.wav",n+1), format);
}
int bytesRead;
while((bytesRead = reader.Read(buffer,0, buffer.Length)) > 0)
{
int offset= 0;
while (offset < bytesRead)
{
for (int n = 0; n < writers.Length; n++)
{
// write one sample
writers[n].Write(buffer,offset,2);
offset += 2;
}
}
}
for (int n = 0; n < writers.Length; n++)
{
writers[n].Dispose();
}
reader.Dispose();

Based on Mark Heath's answer, I struggled with a 32 bit floating WAV containing 32 channels and managed to get it working by simplifying his proposal. I would guess this peace code also works for a four-channel audio WAV file.
var reader = new WaveFileReader("thirtytwochannels.wav");
var writers = new WaveFileWriter[reader.WaveFormat.Channels];
for (int n = 0; n < writers.Length; n++)
{
var format = new WaveFormat(reader.WaveFormat.SampleRate, 16, 1);
writers[n] = new WaveFileWriter(string.Format($"channel{n + 1}.wav"), format);
}
float[] buffer;
while ((buffer = reader.ReadNextSampleFrame())?.Length > 0)
{
for(int i = 0; i < buffer.Length; i++)
{
// write one sample for each channel (i is the channelNumber)
writers[i].WriteSample(buffer[i]);
}
}
for (int n = 0; n < writers.Length; n++)
{
writers[n].Dispose();
}
reader.Dispose();

Here is a method I used, you can set the output mono format, e.g BitsPerSample, SampleRate
using NAudio.Wave;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace DataScraper.TranscriptionCenter
{
public class MP3ToWave
{
/// <summary>
/// Converts an MP3 file to distinct wav files, using NAudio
/// They are saved in the same directory as the MP3 file
/// </summary>
/// <param name="MP3FileIn">The MP3 file</param>
/// <returns>Returns the WAV files</returns>
public static string[] MP3FilesToTranscriptionWaveFiles(string MP3FileIn)
{
FileInfo MP3FileInfo = new FileInfo(MP3FileIn);
if (MP3FileInfo.Exists == false)
throw new Exception("File does not exist? " + MP3FileIn);
Mp3FileReader readerMP3 = null;
WaveStream PCMStream = null;
WaveFileReader readerWAV = null;
List<string> ListFilesOut = null;
WaveFileWriter[] FileWriters = null;
MemoryStream TempStream = null;
WaveFormatConversionStream WaveFormatConversionStream_ = null;
WaveFormat SaveWaveFormatMono = new WaveFormat((16 * 1000),
16,
1);
try
{
readerMP3 = new Mp3FileReader(MP3FileInfo.FullName);
PCMStream = WaveFormatConversionStream.CreatePcmStream(readerMP3);
WaveFormatConversionStream_ = new WaveFormatConversionStream(new WaveFormat(SaveWaveFormatMono.SampleRate,
SaveWaveFormatMono.BitsPerSample,
PCMStream.WaveFormat.Channels),
PCMStream);
//Each filepath, each channel
ListFilesOut = new List<string> (WaveFormatConversionStream_.WaveFormat.Channels);
//Each is a wav file out
for (int index = 0; index < WaveFormatConversionStream_.WaveFormat.Channels; index++)
{
ListFilesOut.Add(MP3FileInfo.Directory.FullName + "\\" + Path.GetFileNameWithoutExtension(MP3FileInfo.Name) + "_" + index.ToString() + ".wav");
}
//Initialize the writers
FileWriters = new WaveFileWriter[WaveFormatConversionStream_.WaveFormat.Channels];
for (int index = 0; index < WaveFormatConversionStream_.WaveFormat.Channels; index++)
{
FileWriters[index] = new WaveFileWriter(ListFilesOut[index], SaveWaveFormatMono);
}
TempStream = new MemoryStream(int.Parse("" + WaveFormatConversionStream_.Length));
WaveFileWriter NewWriter = new WaveFileWriter(TempStream, WaveFormatConversionStream_.WaveFormat);
byte[] BUFFER = new byte[1024];
int ReadLength = WaveFormatConversionStream_.Read(BUFFER, 0, BUFFER.Length);
while (ReadLength != -1 && ReadLength > 0)
{
NewWriter.Write(BUFFER, 0, ReadLength);
ReadLength = WaveFormatConversionStream_.Read(BUFFER, 0, BUFFER.Length);
}
NewWriter.Flush();
TempStream.Position = 0;
readerWAV = new WaveFileReader(TempStream);
float[] buffer = readerWAV.ReadNextSampleFrame();
while(buffer != null && buffer.Length > 0)
{
for(int i = 0; i < buffer.Length; i++)
{
FileWriters[i].WriteSample(buffer[i]);
}
buffer = readerWAV.ReadNextSampleFrame();
}
}
catch (Exception em1)
{
throw em1;
}
finally
{
try
{
//Flush each writer and close
for (int writercount = 0; writercount < FileWriters.Length; writercount++)
{
FileWriters[writercount].Flush();
FileWriters[writercount].Close();
FileWriters[writercount].Dispose();
}
}
catch
{
}
try { readerWAV.Dispose(); readerWAV = null; }
catch { }
try { WaveFormatConversionStream_.Dispose(); WaveFormatConversionStream_ = null; }
catch { }
try { PCMStream.Dispose(); PCMStream = null; }
catch { }
try { readerMP3.Dispose(); readerMP3 = null; }
catch { }
try
{
TempStream.Close(); TempStream.Dispose();
}
catch
{
}
}
return ListFilesOut.ToArray();
}
}
}

Extract Field Names and max lengths from a text file using C#

I have a file that is a SQL Server result set saved as a text file.
Here is a sample of what the file looks like:
RWS_DMP_ID RV1_DMP_NUM CUS_NAME
3192 3957 THE ACME COMPANY
3192 3957 THE ACME COMPANY
3192 3957 THE ACME COMPANY
I want to create a C# program that reads this file and creates the following table of data:
Field MaxSize
----- -------
RWS_DMP_ID 17
RV1_DMP_NUM 17
CUS_NAME 42
This is a list of the field names and their max length. The max length is the beginning of the field to the space right before the beginning of the next field.
By the way I don't care about code performance. This is seldom used file processing utility.
I solved this with the following code:
objFile = new StreamReader(strPath + strFileName);
strLine = objFile.ReadLine();
intLineCnt = 0;
while (strLine != null)
{
intLineCnt++;
if (intLineCnt <= 3)
{
if (intLineCnt == 1)
{
strWords = SplitWords(strLine);
intNumberOfFields = strWords.Length;
foreach (char c in strLine)
{
if (bolNewField == true)
{
bolFieldEnd = false;
bolNewField = false;
}
if (bolFieldEnd == false)
{
if (c == ' ')
{
bolFieldEnd = true;
}
}
else
{
if (c != ' ')
{
if (intFieldCnt < strWords.Length)
{
strProcessedData[intFieldCnt, 0] = strWords[intFieldCnt];
strProcessedData[intFieldCnt, 1] = (intCharCnt - 1).ToString();
}
intFieldCnt++;
intCharCnt = 1;
bolNewField = true;
}
}
if (bolNewField == false)
{
intCharCnt++;
}
}
strProcessedData[intFieldCnt, 0] = strWords[intFieldCnt];
strProcessedData[intFieldCnt, 1] = intCharCnt.ToString();
}
else if (intLineCnt == 3)
{
intLine2Cnt= 0;
intTotalLength = 0;
while(intLine2Cnt < intNumberOfFields)
{
intSize = Convert.ToInt32(strProcessedData[intLine2Cnt, 1]);
if (intSize + intTotalLength > strLine.Length)
{
intSize = strLine.Length - intTotalLength;
}
strField = strLine.Substring(intTotalLength, intSize);
strField = strField.Trim();
strProcessedData[intLine2Cnt, intLineCnt - 1] = strField;
intTotalLength = intTotalLength + intSize + 1;
intLine2Cnt++;
}
}
}
strLine = objFile.ReadLine();
}`enter code here`
I'm aware that this code is a complete hack job. I'm looking for a better way to solve this problem.
Is there a better way to solve this problem?
THanks

I'm not sure how memory efficient this is, but I think it's a bit cleaner (assuming your fields are tab-delimited):
var COL_DELIMITER = new[] { '\t' };
string[] lines = File.ReadAllLines(strPath + strFileName);
// read the field names from the first line
var fields = lines[0].Split(COL_DELIMITER, StringSplitOptions.RemoveEmptyEntries).ToList();
// get a 2-D array of the columns (excluding the header row)
string[][] columnsArray = lines.Skip(1).Select(l => l.Split(COL_DELIMITER)).ToArray();
// dictionary of columns with max length
var max = new Dictionary<string, int>();
// for each field, select all columns, and take the max string length
foreach (var field in fields)
{
max.Add(field, columnsArray.Select(row => row[fields.IndexOf(field)]).Max(col => col.Trim().Length));
}
// output per requirment
Console.WriteLine(string.Join(Environment.NewLine,
max.Keys.Select(field => field + " " + max[field])
));

void MaximumWidth(StreamReader reader)
{
string[] columns = null;
int[] maxWidth = null;
string line;
while ((line = reader.ReadLine()) != null)
{
string[] cols = line.Split('\t');
if (columns == null)
{
columns = cols;
maxWidth = new int[cols.Length];
}
else
{
for (int i = 0; i < columns.Length; i++)
{
int width = cols[i].Length;
if (maxWidth[i] < width)
{
maxWidth[i] = width;
}
}
}
}
// ...
}

Here is what I came up with. The big takeaway is to use the IndexOf string function.
class Program
{
static void Main(string[] args)
{
String strFilePath;
String strLine;
Int32 intMaxLineSize;
strFilePath = [File path and name];
StreamReader objFile= null;
objFile = new StreamReader(strFilePath);
intMaxLineSize = File.ReadAllLines(strFilePath).Max(line => line.Length);
//Get the first line
strLine = objFile.ReadLine();
GetFieldNameAndFieldLengh(strLine, intMaxLineSize);
Console.WriteLine("Press <enter> to continue.");
Console.ReadLine();
}
public static void GetFieldNameAndFieldLengh(String strLine, Int32 intMaxSize)
{
Int32 x;
string[] fields = null;
string[,] strFieldSizes = null;
Int32 intFieldSize;
fields = SplitWords(strLine);
strFieldSizes = new String[fields.Length, 2];
x = 0;
foreach (string strField in fields)
{
if (x < fields.Length - 1)
{
intFieldSize = strLine.IndexOf(fields[x + 1]) - strLine.IndexOf(fields[x]);
}
else
{
intFieldSize = intMaxSize - strLine.IndexOf(fields[x]);
}
strFieldSizes[x, 0] = fields[x];
strFieldSizes[x, 1] = intFieldSize.ToString();
x++;
}
Console.ReadLine();
}
static string[] SplitWords(string s)
{
return Regex.Split(s, #"\W+");
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Extracting pdf images in a correct order iTextSharp - c#

Related

ITextSharp Quadpoints to Rectangle

Marshal.Copy() causes System.StackOverflowException

D3 plotter and file watcher, interference?

Saving each WAV channel as a mono-channel WAV file using Naudio

Extract Field Names and max lengths from a text file using C#

Categories

Resources