split multiple whitespaces from text file to array - c#

I have a text file and need all 7 elements of each line, including the empty ones, parsed into an array for further processing. However, there is no unique delimiter to use except whitespace, and some of the values themselves contain whitespace. See the "Data Sample" for an example; some of the fields have null entries. How can I make this happen?
Snippet of Data
Actual Sample Data
My end result would be something like the below:
Array[0]:123456789
Array[1]:HLTX
Array[2]:5
Array[3]:BT5Q02
Array[4]:4SV
Array[5]:D8041
Array[6]:LIANG LIN
My current code for this is below. It omits the empty values, so it will likely miss some of the required data.
string[] splitlinecontent = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var OrderNum = splitlinecontent[0];
var OrderType = splitlinecontent[1];
int OrderQTY = int.Parse(splitlinecontent[2]);
var OrderSINumInRpt = splitlinecontent[3];
var OrderHoldMod = splitlinecontent[5];
var SalesPerson = splitlinecontent[6];

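I think best practice for these files is to use TextFieldParser from Microsoft.VisualBasic.FileIO:
using (var parser = new TextFieldParser(fileName))
{
    // Treat each line as fixed-width columns rather than delimited fields
    parser.TextFieldType = FieldType.FixedWidth;
    // Width of each column in characters; -1 means the last field takes the rest of the line
    parser.SetFieldWidths(3, 7, 10, 13, 8, 6, 1, 7, -1);
    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        // process the fields of this line here
    }
}
But I guess it isn't that hard to code this yourself.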

Based on the screenshot of your sample data, your columns have a fixed size of ten characters. You can now simply read the sample data line by line and split the lines by this fixed size.
public static List<List<string>> GetRecords(string path, bool hasColHeader, int colLength, int colCount)
{
    // Result will be stored in lists, one inner list per data line
    List<List<string>> result = new List<List<string>>();
    // Get the sample file
    string[] records = File.ReadAllLines(path, Encoding.UTF8);
    // Go through each line of the sample file
    for (int n = 0; n < records.Length; n++)
    {
        // Here you can do something with the headers. For simplicity, the header line is skipped.
        if (n == 0 && hasColHeader)
        {
            continue;
        }
        // Pad short lines with spaces so that every column can be cut out safely
        string record = records[n].PadRight(colLength * colCount);
        // Create a new list for this line
        List<string> fields = new List<string>();
        // Go through each column (colCount specifies the number of columns)
        for (int i = 0; i < colCount; i++)
        {
            // Cut out the column and trim the padding; empty columns become empty strings
            fields.Add(record.Substring(i * colLength, colLength).Trim());
        }
        // Add the fields of this line to the result
        result.Add(fields);
    }
    return result;
}
You can use this code like this:
static void Main(string[] args)
{
List<List<String>> list = GetRecords(@"C:\temp\DataSample.txt",true, 10, 7);
}
The data in list looks like this:
List[0]:List[0]:123456789
List[0]:List[1]:HLTX
List[0]:List[2]:5
List[0]:List[3]:BT5Q02
List[0]:List[4]:4SV
List[0]:List[5]:D8041
List[0]:List[6]:LIANG LIN
List[1]:List[0]:3835443
List[1]:List[1]:HLTX
List[1]:List[2]:1
...
Here you can optimize two things yourself:
Calculate the size of the columns from the characters between headers. A column always spans from the start of its header to the start of the next header; the character count between these two points is the size of the column.
Find a better way to get the last column! :D I don't think what I've done is good. There are better ways to do this.
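The first suggestion can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class and method names are made up, and it assumes the header names themselves contain no spaces and that each value starts in the same character position as its header. The last column simply runs to the end of the line, which also avoids the padding step.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderColumns
{
    // Returns the start index of every header word in the header line.
    // Assumption: header names contain no whitespace.
    public static List<int> GetColumnStarts(string headerLine)
    {
        var starts = new List<int>();
        for (int i = 0; i < headerLine.Length; i++)
        {
            bool isWordStart = !char.IsWhiteSpace(headerLine[i])
                && (i == 0 || char.IsWhiteSpace(headerLine[i - 1]));
            if (isWordStart)
            {
                starts.Add(i);
            }
        }
        return starts;
    }

    // Cuts one data line at the column starts.
    // The last column runs to the end of the line; missing columns become empty strings.
    public static string[] SplitByColumns(string line, List<int> starts)
    {
        var fields = new string[starts.Count];
        for (int c = 0; c < starts.Count; c++)
        {
            int start = starts[c];
            if (start >= line.Length)
            {
                fields[c] = string.Empty; // the line ends before this column begins
                continue;
            }
            int end = (c + 1 < starts.Count)
                ? Math.Min(starts[c + 1], line.Length)
                : line.Length;
            fields[c] = line.Substring(start, end - start).Trim();
        }
        return fields;
    }
}
```

Call GetColumnStarts once on the header line and then SplitByColumns on every data line; values that contain spaces (such as a name) stay intact because the cut positions come from the header, not from the data.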

Related

problem reading .txt file in C# starting from (after immediate empty whole line) to (the next empty whole line)

I am trying to randomly read a huge .txt file. It has tons of paragraphs, each separated by a completely empty line before and after it. Each time I read randomly, I would like it to pull up a full, intact paragraph without any characters or words missing, for the sake of context. I appreciate the help in advance.
I added a for loop just to test it out and see whether I can at some point include a way to recognize consecutive empty lines. Obviously, that would only work after the starting point has already been selected.
public static string GetRandomLine(string filename)
{
var lines = File.ReadAllLines(filename);
var lineNumber = _rand.Next(0, lines.Length);
string reply = lines[lineNumber];
return reply ;
}
Try the following:
public static string[] GetRandomParagraph(string filePath)
{
if (File.Exists(filePath))
{
string text = File.ReadAllText(filePath);
string[] paragraphs = text.Split(new string[] { "\n\n" }, StringSplitOptions.None);
return paragraphs[new Random().Next(0, paragraphs.Length)].Split('\n');
}
else
throw new FileNotFoundException("The file was not found", filePath);
}
I really hope that is what you are looking for.
// This builds a list of Paragraph first
public static List<string> GetParagraphs(string filename)
{
var paragraphs = new List<string>();
var lines = File.ReadAllLines(filename);
bool newParagraph = true;
string CurrentParagraph = string.Empty;
// Build the list of paragraphs by adding to the currentParagraph until empty lines and then starting a new one
foreach(var line in lines)
{
if(newParagraph)
{
CurrentParagraph = line;
newParagraph = false;
}
else
{
if(string.IsNullOrWhiteSpace(line))// we're starting a new paragraph, add it to the list of paragraphs and reset current paragraph for next one
{
paragraphs.Add(CurrentParagraph);
CurrentParagraph = string.Empty;
newParagraph = true;
}
else // we're still in the same paragraph, add the line to current paragraph
{
CurrentParagraph += (Environment.NewLine + line);
}
}
}
// Careful: if your file doesn't end with a blank line, the last paragraph has not been added yet, so add it here.
if (!newParagraph)
{
paragraphs.Add(CurrentParagraph);
}
return paragraphs;
}
public static Random rnd = new Random();
// And this returns a random one
public static string GetRandomParagraph(string fileName)
{
var allParagraphs = GetParagraphs(fileName);
return allParagraphs[rnd.Next(0, allParagraphs.Count)]; // pick one of the paragraphs at random; the upper bound of Next is exclusive, so Count itself is never returned
}
Note that if you're always reading from the same file this could be much faster by only calling GetParagraphs once and keeping the list of paragraphs in memory.
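A minimal sketch of that caching idea, under a few assumptions: the class name ParagraphCache and its members are hypothetical, paragraphs are separated by one or more blank lines, and the cache is not thread-safe and never notices if the file changes on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ParagraphCache
{
    // Paragraph lists keyed by file name, filled on first use
    private static readonly Dictionary<string, List<string>> _cache =
        new Dictionary<string, List<string>>();
    private static readonly Random _rnd = new Random();

    // Groups consecutive non-blank lines into paragraphs
    public static List<string> SplitParagraphs(IEnumerable<string> lines)
    {
        var paragraphs = new List<string>();
        var current = new List<string>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                // A blank line closes the current paragraph, if there is one
                if (current.Count > 0)
                {
                    paragraphs.Add(string.Join(Environment.NewLine, current));
                    current.Clear();
                }
            }
            else
            {
                current.Add(line);
            }
        }
        // The file may end without a trailing blank line
        if (current.Count > 0)
        {
            paragraphs.Add(string.Join(Environment.NewLine, current));
        }
        return paragraphs;
    }

    // Reads and splits the file only on the first call for a given file name
    public static string GetRandomParagraph(string fileName)
    {
        if (!_cache.TryGetValue(fileName, out var paragraphs))
        {
            paragraphs = SplitParagraphs(File.ReadAllLines(fileName));
            _cache[fileName] = paragraphs;
        }
        if (paragraphs.Count == 0)
        {
            return string.Empty;
        }
        return paragraphs[_rnd.Next(paragraphs.Count)];
    }
}
```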
Try this:
public static string GetRandomLine(string filename)
{
var lines = File.ReadAllLines(filename);
var lineNumber = _rand.Next(0, lines.Length - 1);
var blankBefore = lineNumber;
var blankAfter = lineNumber + 1;
string reply = "";
while (lines[blankBefore].Length > 0)
{
blankBefore--;
}
while (lines[blankAfter].Length != 0)
{
blankAfter++;
}
for (int i = blankBefore + 1; i < blankAfter; i++)
{
reply += lines[i];
}
return reply;
}
Based on your description, I'm assuming the file begins and ends with a blank line. By setting the exclusive upper bound of the random line to be one less than the length of lines, you avoid the chance of the random line being the last line of the file. If the random line is a blank line, blankBefore will be the index of that line, otherwise, it will be back tracked until it reaches the previous blank. blankAfter starts as the index of the next line after the random line and if that line is not blank, blankAfter is increased until it is the index of the next blank line.
Once you have the index of the blank lines before and after the target paragraph, simply append the lines between them to reply.
If the first and last lines of the file are not blank, you would need to verify that blankBefore and blankAfter remain within the bounds of the array.
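One way those bounds checks might look is sketched below. The method name GetParagraphAt is made up for illustration, it takes the already-read lines and the chosen index as parameters, and it joins the lines with a newline rather than concatenating them directly, which is a small deviation from the code above.

```csharp
using System;

public static class ParagraphPicker
{
    // Returns the paragraph around lines[index]. If lines[index] is blank,
    // the paragraph that follows it is returned, as in the answer above.
    public static string GetParagraphAt(string[] lines, int index)
    {
        if (lines.Length == 0 || index < 0 || index >= lines.Length)
        {
            return string.Empty;
        }

        // Walk back to the previous blank line, stopping at -1 if the file does not start with one
        int blankBefore = index;
        while (blankBefore >= 0 && !string.IsNullOrWhiteSpace(lines[blankBefore]))
        {
            blankBefore--;
        }

        // Walk forward to the next blank line, stopping at lines.Length if the file does not end with one
        int blankAfter = index + 1;
        while (blankAfter < lines.Length && !string.IsNullOrWhiteSpace(lines[blankAfter]))
        {
            blankAfter++;
        }

        int count = blankAfter - blankBefore - 1;
        if (count <= 0)
        {
            return string.Empty;
        }
        // Join the lines strictly between the two blank positions
        return string.Join(Environment.NewLine, lines, blankBefore + 1, count);
    }
}
```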
I made some modifications to the code provided above by @TheCoderCrab. I changed the method's return type to string. I simply added a for loop that appends all the lines of the paragraph array to a new string, which is then returned to Main. Thank you.
public static string GetRandomParagraph(string filePath)
{
if (File.Exists(filePath))
{
string text = File.ReadAllText(filePath);
string[] paragraphs = text.Split(new string[] { "\n\n" }, StringSplitOptions.None);
string [] paragraph = paragraphs[new Random().Next(0, paragraphs.Length)].Split('\n');
//Added a for loop to build the string out of all the lines in the 'paragraph' array.
string pReturn = "";
for (int a = 0; a < paragraph.Length; a++)
{
//Loop through and append each line of the array to the return string 'pReturn'
pReturn = pReturn + paragraph[a];
}
return pReturn;
}
else
throw new FileNotFoundException("The file was not found", filePath);
}
To get intact paragraphs
public static string GetRandomParagraph(string fileName)
{
/*
Rather than reading all the lines, read all the text
this gives you the ability to split by paragraph
*/
var allText = File.ReadAllText(fileName);
// Use as separator for paragraphs
var paragraphSeparator = $"{Environment.NewLine}{Environment.NewLine}";
// Treat large white spaces after a new line as separate paragraphs
allText = Regex.Replace(allText, @"(\n\s{3,})", paragraphSeparator);
// Split the text into paragraphs
var paragraphs = allText.Split(paragraphSeparator);
// Get a random index between 0 and the amount of paragraphs
var randomParagraph = new Random().Next(0, paragraphs.Length);
return paragraphs[randomParagraph];
}

C# Populate An Array with Values in a Loop

I have a C# console application where an external text file is read. Each line of the file has values separated by spaces, such as:
1 -88 30.1
2 -89 30.1
So line one should be split into '1', '-88', and '30.1'.
What I need to do is populate an array (or any other better object) so that it duplicates each line; the array should have 3 elements per row. I must be having a brain-lock to not figure it out today. Here's my code:
string line;
int[] intArray;
intArray = new int[3];
int i = 0;
//Read Input file
using (StreamReader file = new StreamReader("Score_4.dat"))
{
while ((line = file.ReadLine()) != null && line.Length > 10)
{
line.Trim();
string[] parts;
parts = line.Split(' ');
intArray[0][i] = parts[0];//error: cannot apply indexing
i++;
}
}
Down the road in my code, I intend to make some API calls to a server by constructing a Json object while looping through the array (or alternate object).
Any idea?
Thanks
If you only need the data to be transferred to JSON then you don't need to process the values of the data, just reformat it to JSON arrays.
As you don't know the number of lines in the input file, it is easier to use a List<>, whose capacity expands automatically, to hold the data rather than an array, whose size you would need to know in advance.
I took your sample data and repeated it a few times into a text file and used this program:
static void Main(string[] args)
{
string src = @"C:\temp\Score_4.dat";
List<string> dataFromFile = new List<string>();
using (var sr = new StreamReader(src))
{
while (!sr.EndOfStream)
{
string thisLine = sr.ReadLine();
string[] parts = thisLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
if (parts.Length == 3)
{
string jsonArray = "[" + string.Join(",", parts) + "]";
dataFromFile.Add(jsonArray);
}
else
{
/* the line did not have three entries */
/* Maybe keep a count of the lines processed to give an error message to the user */
}
}
}
/* Do something with the data... */
int totalEntries = dataFromFile.Count();
int maxBatchSize = 50;
int nBatches = (int)Math.Ceiling((double)totalEntries / maxBatchSize);
for(int i=0;i<nBatches;i+=1)
{
string thisBatchJsonArray = "{\"myData\":[" + string.Join(",", dataFromFile.Skip(i * maxBatchSize).Take(maxBatchSize)) + "]}";
Console.WriteLine(thisBatchJsonArray);
}
Console.ReadLine();
}
to get this output:
{"myData":[[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1]]}
{"myData":[[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1],[1,-88,30.1],[2,-89,30.1]]}
It should be easy to adjust the format as required.
I would create a custom item class and then populate a list with self-contained items, for easy access and sorting. Something like:
public class MyItem
{
public int first { get; set; }
public int second { get; set; }
public float third { get; set; }
public MyItem(int one, int two, float three)
{
this.first = one;
this.second = two;
this.third = three;
}
}
then you could do:
List<MyItem> mylist = new List<MyItem>();
and then in your loop:
using (StreamReader file = new StreamReader("Score_4.dat"))
{
while ((line = file.ReadLine()) != null && line.Length > 10)
{
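line = line.Trim(); // Trim returns a new string, so the result has to be assigned
string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
MyItem item = new MyItem(int.Parse(parts[0]), int.Parse(parts[1]), float.Parse(parts[2]));
mylist.Add(item);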
}
}
Since there are numbers like 30.1, int is not suitable here; also, it must not be a double[] but a double[][]:
string[] lines = File.ReadAllLines("file.txt");
double[][] array = lines.Select(x => x.Split(' ').Select(a => double.Parse(a)).ToArray()).ToArray();
The issue is that the int array is single-dimensional.
My suggestion is to create a class with 3 properties and populate a list of that class. It's better to give the class the same property names that you need in the JSON, so that you can easily serialize it to JSON using a NuGet package such as Newtonsoft and make the API calls.
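As a rough sketch of that suggestion: the class name ScoreRow and the property names Id, Level and Value below are placeholders, since the question does not say what the three columns mean. The values are parsed with the invariant culture so that 30.1 is read the same way regardless of the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Property names are hypothetical; rename them to match the JSON you need
public class ScoreRow
{
    public int Id { get; set; }
    public int Level { get; set; }
    public double Value { get; set; }
}

public static class ScoreParser
{
    // Turns lines such as "1 -88 30.1" into ScoreRow objects
    public static List<ScoreRow> Parse(IEnumerable<string> lines)
    {
        var rows = new List<ScoreRow>();
        foreach (var line in lines)
        {
            // Split on spaces and tabs, ignoring repeated separators
            var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 3)
            {
                continue; // skip lines that do not have exactly three values
            }
            rows.Add(new ScoreRow
            {
                Id = int.Parse(parts[0], CultureInfo.InvariantCulture),
                Level = int.Parse(parts[1], CultureInfo.InvariantCulture),
                Value = double.Parse(parts[2], CultureInfo.InvariantCulture)
            });
        }
        return rows;
    }
}
```

The resulting list can then be handed to a JSON serializer, for example Newtonsoft's JsonConvert.SerializeObject(rows).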
Your int array is a single-dimensional array, yet you're trying to index it like a multidimensional array. It should be something like this:
intArray[i] = int.Parse(parts[0]);
(However, you'll need to handle the conversion for parts that are fractional, since int.Parse fails on them)
Alternatively, if you want to use a multidimensional array, you have to declare one.
int[,] intArray = new int[*whatever your expected number of records are*, 3];
Arrays have a static size. Since you're reading from a file and may not know how many records there are until your file finishes reading, I recommend using something like a List of Tuples or a Dictionary depending on your needs.
A dictionary will allow you to have quick lookup of your records without iterating over them by using a key value pair, so if you wanted your records to match up with their line numbers, you could do something like this:
Dictionary<int, int[]> test = new Dictionary<int, int[]>();
int lineCount = 1;
while ((line = file.ReadLine()) != null && line.Length > 10)
{
int[] intArray = new int[3];
line = line.Trim();
string[] parts = line.Split(' ');
for (int i = 0; i < 3; i++)
{
intArray[i] = int.Parse(parts[i]);
}
test[lineCount] = intArray;
lineCount++;
}
This will let you access your values by line count like so:
test[3] = *third line of file*

removing additional space above last line text in textfile c#

I use the following code to convert the entries in a DataGridView into a text file (creating the text file itself is successful). After converting the DataGridView entries into a string, I append a string from outside, which should appear on the last line of the text file.
private void button1_Click_1(object sender, EventArgs e) // converting data grid value to single string
{
StringBuilder file = new StringBuilder();
for (int i = 0; i < dataGridView2.Rows.Count; i++)
{
for (int j = 0; j < dataGridView2.Rows[i].Cells.Count; j++)
{
var val = dataGridView2.Rows[i].Cells[j].Value;
if (val == null)
continue;//IF NULL GO TO NEXT CELL, MAYBE YOU WANT TO PUT EMPTY SPACE
var s = val.ToString();
file.Append(s.Replace(Environment.NewLine, " "));
}
file.AppendLine(); // NEXT ROW WILL COME INTO NEXT LINE
}
file.Append("Hello");
using (StreamWriter sw = new StreamWriter(@"C:\Users\sachinthad\Desktop\VS\Tfiles\file.txt"))
{
sw.Write(file.ToString());
}
}
But when I check the text file, the outside string ("Hello" in this scenario) appears on the last line, but there is an additional blank line above it! How can I remove this additional space?
You can use string.Join, which concatenates a collection of strings with a separator in between. Together with refactoring it to use LINQ's .Select (the row and cell collections are non-generic, so they need a Cast first):
var lines = dataGridView2.Rows.Cast<DataGridViewRow>().Select(row => string.Join(" ",
    row.Cells.Cast<DataGridViewCell>().Select(cell => cell.Value).Where(val => val != null)));
Then of course you can also use it on the entire collection of lines to concatenate them with a new line:
// Will eliminate problem of extra \n at the end
var result = string.Join(Environment.NewLine, lines);
If you prefer having the loops instead of LINQ, then in the inner loop you can add the values to a List<string> that is initialized in the outer loop. After the inner loop ends, use string.Join on the values of that list. Pseudocode:
for each row:
List<string> items = new List<string>();
for each column in row:
items.Add(value of column);
file.Append(string.Join(" ", items));
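The pseudocode could be fleshed out along these lines. To keep the sketch runnable without a form, the grid is replaced by a plain sequence of object arrays (one array per row); that stand-in, the class name and the method name BuildText are all assumptions. Rows that contain no values at all are skipped, on the guess that such an empty row is one possible source of the blank line.

```csharp
using System;
using System.Collections.Generic;

public static class GridText
{
    // rows stands in for the grid: one object[] of cell values per row
    public static string BuildText(IEnumerable<object[]> rows, string lastLine)
    {
        var lines = new List<string>();
        foreach (var row in rows)
        {
            var items = new List<string>();
            foreach (var value in row)
            {
                if (value == null)
                {
                    continue; // skip empty cells
                }
                items.Add(value.ToString().Replace(Environment.NewLine, " "));
            }
            // Only rows with at least one value produce a line
            if (items.Count > 0)
            {
                lines.Add(string.Join(" ", items));
            }
        }
        lines.Add(lastLine);
        // Join puts a newline only between lines, never after the last one
        return string.Join(Environment.NewLine, lines);
    }
}
```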

Change a text in files in a folder next to string "Text_ID"

  Example of a text file below 
text_file a
Text_ID "441124_aad0656_1234"
Text_FILE_NAME
I would like to keep only last index of string "1234"
StreamReader streamReader = new StreamReader(text);
string text2;
while ((text2 = streamReader.ReadLine()) != null)
{
num++;
string[] array3 = text2.Split(new char[0]);
if (array3[0] == "Text_ID")
{
string[] array4 = array3[1].Split(new char[] {'_'});
string value = "Text_ID" + " " + '"' + array4[1];
streamWriter.WriteLine(value);
}
else
{
streamWriter.WriteLine(text2);
}
}
Try the code below; I hope it works for you.
var startsWith = "Text_ID";
var allLines = File.ReadAllLines("a.txt").ToList();
allLines = allLines.Select(ln =>
{
if(ln.StartsWith(startsWith))
{
var finalValue = ln.Split(' ')[1].Trim('"').Split('_').Last();
//get update line
return string.Format("{0} \"{1}\"", startsWith, finalValue);
}
return ln;
}).ToList();
//Write back to file.
File.WriteAllLines("a.txt", allLines.ToArray());
Content before code execution.
Record 1
Text_ID "441124_aad0656_1234"
other content.
Record 2
Text_ID "Deepak_Sharma"
other content for line 2
Content in file after execution.
Record 1
Text_ID "1234"
other content.
Record 2
Text_ID "Sharma"
other content for line 2
You could use File.ReadAllLines to read the file into an array, then search through the array for the line you want to change, replace that line with the new string, and then use File.WriteAllLines to write the array back to the file:
var filePath = @"f:\public\temp\temp.txt";
// The string to search for
var searchTxt = "Text_ID";
// Read all the lines of the file into an array
var fileLines = File.ReadAllLines(filePath);
// Loop through each line in the array
for(int i = 0; i < fileLines.Length; i++)
{
// Check if the line begins with our search term
if (fileLines[i].Trim().StartsWith(searchTxt, StringComparison.OrdinalIgnoreCase))
{
// Get the end of the line, after the last underscore
var lastPartOfLine = fileLines[i].Substring(fileLines[i].LastIndexOf("_") + 1);
// Combine our search string, a quote, and the end of the line
fileLines[i] = $"{searchTxt} \"{lastPartOfLine}";
// We found what we were looking for, so we can exit the for loop now
// Remove this line if you expect to find more than one match
break;
}
}
// Write the lines back to the file
File.WriteAllLines(filePath, fileLines);
If you only want to save the last four lines of the file, you can call Skip and pass in the Length of the array minus the number of lines you want to keep. This skips all the entries up to the number that you want to save:
// Read all the lines of the file into an array
var fileLines = File.ReadAllLines(filePath);
// Take only the last 4 lines
fileLines = fileLines.Skip(fileLines.Length - 4).ToArray();

C# - Stream Reader or whats the best way to do it?

I have a text file with something like this in it.
Tom 1 2
Jerry 3 4
Using C#, I have to populate this into two arrays:
1st array = {Tom,Jerry} - 1 dim array
2nd array ={(1,2),(3,4)} - 2 dim array
Please help me with this. Any help would be appreciated.
Console.WriteLine("Enter the file name with extension:");
string filename = Console.ReadLine();
string s = System.IO.File.ReadAllText("C:/Desktop/" + filename);
Console.WriteLine("\n Text Details in the file: \n \n"+s);
I guess this is a follow up to the last question :)
More hw hints:
As I said in my last answer, splitting on tab (assuming each item is delimited by tab, which looks to be the case) will give you a 1D array of every item in a line (if you use ReadLine).
Item 1 in the ReadLine() array will be the name. Put that into your 1D names array.
Items 2 to N of ReadLine() array will be the test scores. Put that into your 2D scores array.
The first dimension of the scores array will be the student index. The second dimension will be the score array.
That may sound confusing, but if you think about it, a 2D array is an array of arrays.
So even though your data file doesn't show the student index, it's implied:
0 Joe 100 80 77
1 Bob 65 93 100
Names array will look like:
[0] Joe
[1] Bob
and scores array will look like:
[0][0] 100
[0][1] 80
[0][2] 77
[1][0] 65
[1][1] 93
[1][2] 100
Notice that the index (first dimension) in the scores array coincides with the index of the names array.
There are a few ways; it depends how "elegant" you want to be, and/or whether the name (Tom, Jerry) is always going to be one word.
Parse every line with String methods
Parse every line with RegEx
Use Linq to Text
Simplest way would be something like this (quick and dirty, very fragile solution):
var path = "fileName.txt";
var names = new List<string>();
var values = new List<KeyValuePair<int, int>>();
using (var reader = File.OpenText(path))
{
string s = "";
while ((s = reader.ReadLine()) != null)
{
String[] arr = s.Split(' ');
names.Add(arr[0]);
values.Add(new KeyValuePair<int, int>(int.Parse(arr[1]), int.Parse(arr[2])));
}
}
If you need to, you can convert the lists to arrays.
A more complete version ;)
string filename = "";
do
{
Console.WriteLine("Enter the file name with extension:");
filename = Environment.GetEnvironmentVariable("HOMEDRIVE") + Environment.GetEnvironmentVariable("HOMEPATH") + "\\Desktop\\" + Console.ReadLine();
if (!System.IO.File.Exists(filename))
Console.WriteLine("File doesn't exist!");
else
break;
} while (true);
System.IO.StreamReader readfile = new System.IO.StreamReader(filename);
List<string> Names = new List<string>();
List<int[]> Numbers = new List<int[]>();
string val = "";
while ((val = readfile.ReadLine()) != null)
{
if (val == string.Empty)
continue;
List<string> parts = val.Split(' ').ToList<string>();
Names.Add(parts[0]);
parts.RemoveAt(0);
Numbers.Add(parts.ConvertAll<int>(delegate(string i) { return int.Parse(i); }).ToArray());
}
readfile.Close();
//Print out info
foreach (string name in Names)
{
Console.Write(name + ", ");
}
Console.WriteLine();
foreach (int[] Numberset in Numbers)
{
Console.Write("{");
foreach (int number in Numberset)
Console.Write(number + ", ");
Console.Write("} ");
}
Console.ReadLine();
I like a functional approach.
// var fileContent = System.IO.File.ReadAllText("somefilethathasthestuff");
var fileContent = @"Tom 1 2
Jerry 3 4";
var readData = fileContent.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Aggregate(new { names = new List<string>(), data = new List<int[]>() },
(result, line) => {
var fields = line.Split(new []{' '}, 2);
result.names.Add(fields[0]);
result.data.Add(fields[1].Split(new[] { ' ' }).Select(n => int.Parse(n)).ToArray());
return result;
}
);
string[] firstarray = readData.names.ToArray();
int[][] secondarray = readData.data.ToArray();
This uses a jagged array for the numbers, but you can copy it to a 2D array if that is what you really need. Better yet, don't copy to arrays at all. Use List<string> for names and List<int[]> for the numbers.
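If a true 2D array is required, copying the jagged array might look like the sketch below. The helper name ToRectangular is made up; rows shorter than the widest row are left padded with zeros, which is an assumption about what is wanted for uneven data.

```csharp
using System;

public static class ArrayHelpers
{
    // Copies a jagged array (int[][]) into a rectangular array (int[,])
    public static int[,] ToRectangular(int[][] jagged)
    {
        int rows = jagged.Length;
        int cols = 0;
        // The widest row decides the number of columns
        foreach (var row in jagged)
        {
            cols = Math.Max(cols, row.Length);
        }

        var result = new int[rows, cols];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < jagged[i].Length; j++)
            {
                result[i, j] = jagged[i][j];
            }
        }
        return result;
    }
}
```

With the data from the question, ToRectangular(secondarray) would give a 2 by 2 array whose first row holds Tom's numbers and whose second row holds Jerry's.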
