File ReadLine Count - c#

I am trying to read from a file. The program goes through each line of the text file, compares the first 8 characters of each line, and joins the lines whose first 8 characters are the same into one line. See the code:
while ((line1 = fileread1.ReadLine()) != null)
{
line2 = fileread2.ReadLine();
while (line2 != null)
{
if (line1.Length >= 8 && line2.Length >= 8 &&
line1.Substring(0, 8) == line2.Substring(0, 8))
{
//line2 = line2.Remove(0, 60);
line1 = line1 +" "+ line2;
}
line2 = fileread3.ReadLine();
counter2++;
}
filewrite.WriteLine(line1);
counter1++;
}
Question 1:
How can I get the count of fileread2 and assign it to fileread3? Every time the inner loop executes, I need to reset the count of fileread3 to be the same as that of fileread2.
Question 2:
How do I write the combined lines as a single line where the first 8 characters match?

From reading the comments, what I understand is that you actually want to remove duplicates or, more specifically, the lines that start with the same 8 characters.
If that's the case, why not use a collection of keys to remember whether a line with the same 8 initial characters has already been loaded?
E.g. you can have a collection (a List works; a HashSet gives faster lookups) to which you add the 8-character substring. Then, before writing a line, you first check whether its substring is already in the collection. If it is, you know it's a duplicate and you don't write it.
Another option, if you want to count how many duplicates of each key there are, is to use a Dictionary: whenever you meet a duplicate, you increase the counter stored under its key.
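The two ideas above can be sketched roughly as follows. This is only an illustration: the class and method names (PrefixDedup, FirstPerKey, CountPerKey) are made up for the example, and it assumes that a line shorter than 8 characters should be treated as its own key.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixDedup
{
    // Key used for comparison: the first keyLength characters,
    // or the whole line when it is shorter than that (an assumption).
    private static string KeyOf(string line, int keyLength) =>
        line.Length >= keyLength ? line.Substring(0, keyLength) : line;

    // Keeps only the first line seen for each key.
    public static List<string> FirstPerKey(IEnumerable<string> lines, int keyLength = 8)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (var line in lines)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(KeyOf(line, keyLength)))
                result.Add(line);
        }
        return result;
    }

    // Counts how many lines share each key.
    public static Dictionary<string, int> CountPerKey(IEnumerable<string> lines, int keyLength = 8)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            var key = KeyOf(line, keyLength);
            counts.TryGetValue(key, out var current); // current stays 0 when the key is missing
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Both methods accept any IEnumerable<string>, so they can be fed from File.ReadLines without loading the whole file into memory first.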

Since you have large files, File.ReadAllLines is probably not an option for you. You can declare the following methods (we need to work with a Stream, since StreamReader doesn't have a Position property):
public static string ReadLine(Stream stream)
{
return ReadLine(stream, Encoding.UTF8);
}
public static string ReadLine(Stream stream, Encoding encoding)
{
List<byte> lineBytes = new List<byte>();
while (stream.Position < stream.Length)
{
byte b = (byte)stream.ReadByte();
if (b == 0x0a)
break;
if (b == 0x0d)
continue;
lineBytes.Add(b);
}
return encoding.GetString(lineBytes.ToArray());
}
Sample code:
string sourceFileName = "input.txt";
string targetFileName = "output.txt";
using (StreamWriter targetWriter = new StreamWriter(targetFileName))
using (FileStream sourceStream = File.OpenRead(sourceFileName))
{
HashSet<string> processedKeys = new HashSet<string>();
while (sourceStream.Position < sourceStream.Length)
{
string line = ReadLine(sourceStream);
if (line.Length < 8)
targetWriter.WriteLine(line);
else
{
string key = line.Substring(0, 8);
if (processedKeys.Contains(key))
continue;
targetWriter.Write(line);
long backupPosition = sourceStream.Position;
while (sourceStream.Position < sourceStream.Length)
{
string dupLine = ReadLine(sourceStream);
if (dupLine.Length < 8)
continue;
string dupKey = dupLine.Substring(0, 8);
if (dupKey == key)
targetWriter.Write(" " + dupLine);
}
sourceStream.Position = backupPosition;
targetWriter.WriteLine();
processedKeys.Add(key);
}
}
}

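If you want to group the data based on the first 8 characters, all you need is the following (note that Substring(0, 8) throws an ArgumentOutOfRangeException for lines shorter than 8 characters):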
var groups = File.ReadLines(your_file_name).GroupBy(line => line.Substring(0, 8));
To count the elements in a group, call Count() on it, e.g. for the first group:
groups.First().Count()
(you can also enumerate over groups via foreach (var group in groups)..)
The group result will contain the complete line from the file. If you don't want the "key" (i.e. the first 8 characters) in the result, you can use the overload of GroupBy
var groups = File.ReadLines(your_file_name).GroupBy(line => line.Substring(0, 8), line => line.Substring(8));
In order to get new output with the "key" and the lines concatenated, you would use something like this:
var groups = File.ReadLines(input_file).GroupBy(line => line.Substring(0, 8), line => line.Substring(8));
File.WriteAllLines(output_file, groups.Select(group => string.Format("{0}{1}", group.Key, string.Join(string.Empty, group))));
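If the goal is the joined form from the question (whole lines with the same prefix separated by a space), a variant of the same GroupBy idea might look like the sketch below. The names PrefixMerge and MergeByPrefix are hypothetical, and the handling of lines shorter than 8 characters (each such line is keyed by its full text) is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixMerge
{
    // Groups lines by their first keyLength characters and joins each group
    // into one line, separated by a single space. Groups come out in the
    // order in which their key first appears in the input.
    public static List<string> MergeByPrefix(IEnumerable<string> lines, int keyLength = 8)
    {
        return lines
            // Short lines are keyed by their full text (assumption).
            .GroupBy(line => line.Length >= keyLength ? line.Substring(0, keyLength) : line)
            .Select(group => string.Join(" ", group))
            .ToList();
    }
}
```

It could then be used as File.WriteAllLines(output_file, PrefixMerge.MergeByPrefix(File.ReadLines(input_file))). Like any GroupBy over the whole input, this keeps all lines in memory until the grouping is complete.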

Related

Can't properly rebuild a string with Replacement values from Dictionary

I am trying to build a file using a template. I am processing the file in a while loop, line by line. The first section of the file (the first 35 lines) is header information. The information is surrounded by # signs. Take this string for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
The expected output should be:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
This header section uses a different mapping than the rest of the file so I wanted to parse the file line by line from top to bottom and use a different logic for each section so that I don't waste time parsing the entire file at once multiple times for different sections.
The logic uses two dictionaries populated from an XML file. Because the file has multiple tables, I combined them into the two dictionaries below, so that I can map the values to their respective positions in the string in one go, without having to loop through multiple dictionaries:
var headerCdataIndexKeyVals = new Dictionary<string, int>(){
{"data.tool_context.TOOL_SOFTWARE_VERSION", 1},
{"data.context.TOOL_ENTITY",0}
};
var headerCdataArrayKeyVals = new Dictionary<string, List<string>>();
var tool_contextCdataList = new List<string>{"HM654", "sw0.2.002"};
var contextCdataList = new List<string>{"WSM102"};
headerCdataArrayKeyVals.Add("tool_context", tool_contextCdataList);
headerCdataArrayKeyVals.Add("context", contextCdataList);
I am using the following logic:
public static string FindSubsInDelimetersAndReturn(string str, char openDelimiter, char closeDelimiter, HeaderMapperData mapperData )
{
string newString = string.Empty;
// Stores the indices of
Stack <int> dels = new Stack <int>();
for (int i = 0; i < str.Length; i++)
{
var let = str[i];
// If opening delimeter
// is encountered
if (str[i] == openDelimiter && dels.Count == 0)
{
dels.Push(i);
}
// If closing delimeter
// is encountered
else if (str[i] == closeDelimiter && dels.Count > 0)
{
// Extract the position
// of opening delimeter
int pos = dels.Peek();
dels.Pop();
// Length of substring
int len = i - 1 - pos;
// Extract the substring
string headerSubstring = str.Substring(pos + 1, len);
bool hasKey = mapperData.HeaderCdataIndexKeyVals.TryGetValue(headerSubstring.ToUpper(), out int headerCdataIndex);
string[] headerSubstringSplit = headerSubstring.Split('.');
string headerCDataVal = string.Empty;
if (hasKey)
{
if (headerSubstring.Contains("CONTAINER.CONTEXT", StringComparison.OrdinalIgnoreCase))
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper() + '.' + headerSubstringSplit[2].ToUpper()][headerCdataIndex];
//mapperData.HeaderCdataArrayKeyVals[]
}
else
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper()][headerCdataIndex];
}
string strToReplace = openDelimiter + headerSubstring + closeDelimiter;
string sub = str.Remove(i + 1);
sub = sub.Replace(strToReplace, headerCDataVal);
newString += sub;
}
else if (headerSubstring == "WSM" && closeDelimiter == '#')
{
string sub = str.Remove(len + 1);
newString += sub.Replace(openDelimiter + headerSubstring + closeDelimiter, "");
}
else
{
newString += let;
}
}
}
return newString;
}
}
But my output turns out to be:
"\tFie\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw0.2.002\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"WSM102"
Can someone help me understand why this is happening and how I can correct it so that I get the output:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Am I even trying to solve this the right way, or is there a better, cleaner way to do it? By the way, if the key is not in the dictionary, I replace it with an empty string.
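One thing that stands out in the code above: str.Remove(i + 1) returns the whole prefix of the line up to the closing delimiter, and that prefix is appended to newString on every match, which would explain the repeated fragments in the output. A possibly simpler route is a single regular-expression replacement. The sketch below is an illustration under assumptions, not a drop-in fix: PlaceholderReplacer and Fill are made-up names, and the two dictionaries from the question are assumed to have been flattened beforehand into one lookup from placeholder name to value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderReplacer
{
    // Matches one placeholder: '#', one or more non-'#' characters, then '#'.
    private static readonly Regex Placeholder = new Regex("#([^#]+)#");

    // Replaces every #name# with values[name]; unknown names become an empty string.
    public static string Fill(string template, IReadOnlyDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            var key = match.Groups[1].Value;
            return values.TryGetValue(key, out var value) ? value : string.Empty;
        });
    }
}
```

With a lookup containing data.tool_context.TOOL_SOFTWARE_VERSION and data.context.TOOL_ENTITY, the sample line has #WSM# replaced by an empty string and the other two placeholders replaced by their values. The space before #WSM# is left in place, so it would need trimming separately if it must disappear.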

How to remove texts after a character?

I have an array that holds users and passwords in user:pass form, and I would like to remove the lines whose password is shorter than 8 characters or uses a repeated character, like 111111, 22222222222, ...
I have tried string.take, but it takes lines completely; I need conditional deletion.
public string[] lines;
//open file dialogue to load the user pass file
lines = File.ReadAllLines(openFileDialog1.FileName);
//delete button click event
//the place that I have problem
I have email:pass combinations like so:
email1:1234567895121
email2:12345
email4:11111
email5:454545454545
and I would like the output to be:
email1:1234567895121
email5:454545454545
Just loop over the characters of every password and see if the current one is equal to the previous one:
string[] lines = File.ReadAllLines(openFileDialog1.FileName);
var filteredLines = new List<string>(lines);
foreach(var line in lines)
{
var pair = line.Split(':');
var mail = pair[0];
var pass = pair[1]; // may throw exception on invalid format of your line
if(pass.Length < 8)
{
filteredLines.Remove(line); // password is too short
continue;
}
for(int i = 1; i < pass.Length; i++)
{
if(pass[i] == pass[i - 1])
{
filteredLines.Remove(line);
break; // will break inner loop and continue on next line
}
}
}
Alternatively, filter with LINQ, keeping only the lines whose password has at least 8 characters and is not a single repeated character:
string[] lines =
{
"email1:1234567895121",
"email2:12345",
"email3:22222222222",
"email4:11111",
"email5:454545454545"
};
lines = lines
.Where(s =>
{
string pass = s.Split(new[] { ':' }, 2)[1];
return pass.Length >= 8 && pass.Any(c => c != pass[0]);
})
.ToArray();
foreach (var s in lines)
Console.WriteLine(s);
Output:
email1:1234567895121
email5:454545454545

How to validate or compare string by omitting certain part of it

I have a string as below
"a1/type/xyz/parts"
The part where 'xyz' appears is dynamic and can be of any length. I want to check whether two strings are equal while ignoring the 'xyz' portion.
For example I have string as below
"a1/type/abcd/parts"
Then my comparison has to be successful
I tried a regular expression as below, but my knowledge of regular expressions is limited and it did not work. There is probably something wrong in the way I used it.
var regex = @"^[a-zA-Z]{2}/\[a-zA-Z]{16}/\[0-9a-zA-Z]/\[a-z]{5}/$";
var result = Regex.Match("mystring", regex).Success;
Another idea is to take the substrings before and after the unwanted portion and compare those.
The comparison should succeed while discarding that portion of the string, using efficient code.
Comparison successful cases
string1: "a1/type/21412ghh/parts"
string2: "a1/type/eeeee122ghh/parts"
Comparison failure cases:
string1: "a1/type/21412ghh/parts"
string2: "a2/type/eeeee122ghh/parts/mm"
In short, in "a1/type/abcd/parts" everything except the "abcd" part is always static.
Honestly, you could do this using regex, and pull apart the string. But you have a specified delimiter, just use String.Split:
bool AreEqualAccordingToMyRules(string input1, string input2)
{
var split1 = input1.Split('/');
var split2 = input2.Split('/');
return split1.Length == split2.Length // strings must have equal number of sections
&& split1[0] == split2[0] // section 1 must match
&& split1[1] == split2[1] // section 2 must match
&& split1[3] == split2[3]; // section 4 must match
}
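For completeness, the regex route mentioned above could be sketched like this. It is only one possible shape: PathComparer and BuildPattern are hypothetical names, and the sketch assumes the variable segment is non-empty and contains no '/'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathComparer
{
    // Builds a pattern from 'template' in which the segment at wildcardIndex
    // may be any non-empty run of characters other than '/'.
    // All other segments are escaped and must match literally.
    public static Regex BuildPattern(string template, int wildcardIndex)
    {
        var parts = template.Split('/');
        for (var i = 0; i < parts.Length; i++)
            parts[i] = i == wildcardIndex ? "[^/]+" : Regex.Escape(parts[i]);
        return new Regex("^" + string.Join("/", parts) + "$");
    }
}
```

Calling PathComparer.BuildPattern("a1/type/xyz/parts", 2).IsMatch(candidate) then tests a candidate string against the static parts only.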
You can try Split (to get the parts) and Linq (to exclude the third one):
using System.Linq;
...
string string1 = "a1/type/xyz/parts";
string string2 = "a1/type/abcd/parts";
bool result = string1
.Split('/') // string1 parts
.Where((v, i) => i != 2) // all except 3d one
.SequenceEqual(string2 // must be equal to
.Split('/') // string2 parts
.Where((v, i) => i != 2)); // except 3d one
Here's a small program using string functions to compare the parts before and after the middle part:
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine(CutOutMiddle("a1/type/21412ghh/parts"));
Console.WriteLine("True: " + CompareNoMiddle("a1/type/21412ghh/parts", "a1/type/21412ghasdasdh/parts"));
Console.WriteLine("False: " + CompareNoMiddle("a1/type/21412ghh/parts", "a2/type/21412ghh/parts/someval"));
Console.WriteLine("False: " + CompareNoMiddle("a1/type/21412ghh/parts", "a1/type/21412ghasdasdh/parts/someappendix"));
}
private static bool CompareNoMiddle(string s1, string s2)
{
var s1CutOut = CutOutMiddle(s1);
var s2CutOut = CutOutMiddle(s2);
return s1CutOut == s2CutOut;
}
private static string CutOutMiddle(string val)
{
var fistSlash = val.IndexOf('/', 0);
var secondSlash = val.IndexOf('/', fistSlash+1);
var thirdSlash = val.IndexOf('/', secondSlash+1);
var firstPart = val.Substring(0, secondSlash);
var secondPart = val.Substring(thirdSlash, val.Length - thirdSlash);
return firstPart + secondPart;
}
}
returns
a1/type/parts
True: True
False: False
False: False
This solution should cover your case. As others have said, if you have a delimiter, use it. In the function below you could change int skip to string ignore or something similar, and within the comparison loop use if(arrayStringOne[i] == ignore) continue;.
public bool Compare(string valueOne, string valueTwo, int skip) {
var delimiterOccuranceOne = valueOne.Count(f => f == '/');
var delimiterOccuranceTwo = valueTwo.Count(f => f == '/');
if(delimiterOccuranceOne == delimiterOccuranceTwo) {
var arrayStringOne = valueOne.Split('/');
var arrayStringTwo = valueTwo.Split('/');
for(int i=0; i < arrayStringOne.Length; ++i) {
if(i == skip) continue; // or instead of an index you could use a string
if(arrayStringOne[i] != arrayStringTwo[i]) {
return false;
}
}
return true;
}
return false;
}
Compare("a1/type/abcd/parts", "a1/type/xyz/parts", 2);

How to iterate through data and create a new text file every nth entries

I'm making a list of lines that need to be added to a .txt file (with tab delimitation). The text file needs to have a maximum of 500 entries plus a header.
Right now, I have this code, which is successfully iterating through my list and creating the text file with the header. If the file already exists, it appends the lines in my list without adding the header.
I can't quite figure out how to make a new file, add the header and add each line after my first file surpasses 500 entries.
Can you help me separate in 500 line files with headers? Thank you
This is the code I have so far:
var tab = new StringBuilder();
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
}
if (!File.Exists(textcsvpath))
{
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllLines(textcsvpath, textlinestoadd);
This seems like a good practice opportunity, so I will leave the code part as an exercise!
The basic idea is simple: whenever you have written 500 lines, reset and start writing to a new file.
Here is some high-level pseudocode:
Initialize StringBuilder sb
For each line do
Add line to sb
if line count == 500 then
save to file
reset sb
reset line count
update filename = next file
end if
End For
//writes the last chunk if # of lines is not multiple of 500
if line count is not 0 then
save to file
end if
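For readers who want to check their attempt, the pseudocode above might translate into something like the following sketch. ChunkedWriter, Batch and WriteBatches are hypothetical names, the file naming scheme is an assumption, and, unlike the code in the question, this version always starts fresh files rather than appending to an existing one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedWriter
{
    // Splits the lines into consecutive batches of at most batchSize items.
    public static List<List<string>> Batch(IEnumerable<string> lines, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batches = new List<List<string>>();
        var current = new List<string>();
        foreach (var line in lines)
        {
            current.Add(line);
            if (current.Count == batchSize)
            {
                batches.Add(current);             // "save" the full chunk
                current = new List<string>();     // reset for the next chunk
            }
        }
        if (current.Count > 0) batches.Add(current); // last, partial chunk
        return batches;
    }

    // Writes each batch to "<baseName>.<index><extension>" in 'directory',
    // with the header as the first line. Returns the paths written.
    public static List<string> WriteBatches(string directory, string baseName, string extension,
        string header, IEnumerable<string> lines, int batchSize)
    {
        var written = new List<string>();
        var index = 0;
        foreach (var batch in Batch(lines, batchSize))
        {
            var path = Path.Combine(directory, $"{baseName}.{index++}{extension}");
            using (var writer = new StreamWriter(path)) // overwrites an existing file
            {
                writer.WriteLine(header);
                foreach (var line in batch) writer.WriteLine(line);
            }
            written.Add(path);
        }
        return written;
    }
}
```

Here the extension is expected to include its leading dot (for example ".txt"), and the header is passed without a trailing newline because WriteLine adds one.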
I'd try something like this.
var tab = new StringBuilder();
int lineCount = 0;
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
if (File.Exists(textcsvpath)) {
// Count the existing data lines; no stream is left open, so the file can still be appended to below
string[] fileContent = File.ReadAllLines(textcsvpath);
lineCount = fileContent.Length - 1; // assume the first line is the header
}
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
lineCount++;
if (lineCount > 0 && lineCount % 500 == 0)
{
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
tab.Clear();
textcsvpath = "some-new-file-name";
}
}
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
You'll need to do something to determine the new file name as you add a new file.
I'd do something like this:
const int limit = 500;
int iteration = 0;
string textHeader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
while(iteration * limit < textLinesToAdd.Count())
{
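// Builds e.g. "foo.0.bar"; extension is assumed to have no leading dot
string fullPath = Path.Combine(filePath, $"{fileName}.{iteration}.{extension}");
IEnumerable<string> linesToAdd = textLinesToAdd.Skip(iteration++ * limit).Take(limit);
// WriteAllText creates (or overwrites) the file itself; a separate File.Create call would leave an open handle
File.WriteAllText(fullPath, textHeader);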
File.AppendAllLines(fullPath, linesToAdd);
}
Define the file name as foo and the extension as bar, and you'll get a sequence of files called foo.0.bar, foo.1.bar, foo.2.bar and so on.
I'm assuming we want to create a file with the specified name, and then have some integer placed between the name and extension that increments every time a new file is created.
One way to do this would be to have a method that takes in a filePath string, a list of lines to write, a header string, and the maximum number of lines allowed per file. Then it could parse the directory of the file path, looking for a pattern related to the file name.
It would determine what the latest file name should be based on the contents of the directory and the number of lines in the last file that matches our pattern, then would write to that file until it was full, and then continue creating new files until the lines were all written.
Here's a sample class that can do that, where I added some helper methods to get a file's number, increment that number in the name, get the latest file from a directory, and write lines to the file. It also implements IComparer<string> so that we can pass it to OrderByDescending to easily sort the files we're interested in.
public class FileWriterHelper : IComparer<string>
{
public int Compare(string x, string y)
{
// Compare null
if (x == null) return y == null ? 0 : 1;
if (y == null) return -1;
// Compare count of parts split on '.'
var xParts = x.Split('.');
var yParts = y.Split('.');
if (xParts.Length < 3) return yParts.Length < 3 ? 0 : -1;
if (yParts.Length < 3) return 1;
// Compare numeric portion
int xNum, yNum;
if (int.TryParse(xParts[1], out xNum) &&
int.TryParse(yParts[1], out yNum))
{
return xNum.CompareTo(yNum);
}
// Unknown values
return string.Compare(x, y, StringComparison.Ordinal);
}
private static int? GetFileNumber(string fileName)
{
if (string.IsNullOrWhiteSpace(fileName)) return null;
var fileParts = fileName.Split('.');
int fileNum;
if (fileParts.Length < 3 || !int.TryParse(fileParts[1], out fileNum)) return null;
return fileNum;
}
private static string IncrementNumber(string fileName)
{
var number = GetFileNumber(fileName).GetValueOrDefault() + 1;
var fileParts = fileName.Split('.');
return $"{fileParts[0]}.{number}.{fileParts[fileParts.Length - 1]}";
}
private static string GetLatestFile(string filePath, int maxLines)
{
var fileDir = Path.GetDirectoryName(filePath);
var fileName = Path.GetFileNameWithoutExtension(filePath);
var fileExt = Path.GetExtension(filePath);
var latest = Directory.GetFiles(fileDir, $"{fileName}*{fileExt}")
.OrderByDescending(f => f, new FileWriterHelper())
.FirstOrDefault() ?? filePath;
return File.Exists(latest) && File.ReadAllLines(latest).Length >= maxLines
? Path.Combine(fileDir, IncrementNumber(Path.GetFileName(latest)))
: latest;
}
public static void WriteLinesToFile(string filePath, string header,
List<string> lines, int maxFileLines)
{
while ((lines?.Count ?? 0) > 0 && maxFileLines > 0)
{
var latestFile = GetLatestFile(filePath, maxFileLines);
if (!File.Exists(latestFile)) File.CreateText(latestFile).Close();
var lineCount = File.ReadAllLines(latestFile).Length;
if (lineCount == 0 && header != null)
{
File.WriteAllText(latestFile, string.Concat(header, Environment.NewLine));
lineCount = 1;
}
var numLinesToWrite = maxFileLines - lineCount;
File.AppendAllLines(latestFile, lines.Take(numLinesToWrite));
lines = lines.Skip(numLinesToWrite).ToList();
}
}
}
That was a bit of work, but now to use it is really simple:
private static void Main()
{
// Generate 5000 lines to write
var fileLines = Enumerable.Range(0, 5000).Select(i => $"Line number {i}").ToList();
// File path with base file name
var filePath = @"f:\public\temp\temp.csv";
// The header counts toward the 500-line limit, so this should create 11 files
FileWriterHelper.WriteLinesToFile(filePath,
"HEADER: This should be the first line in each file.", fileLines, 500);
GetKeyFromUser("\nDone! Press any key to exit...");
}
If you run that once, it will create 11 files: the header counts toward the 500-line limit, so each file holds at most 499 data lines, and 5,000 lines need ten full files plus one partial file. If you run it again, it first tops up the last partial file and then creates more files, because it uses the same path and file name pattern and so recognizes the files already in that location.
I'm sure it could use some work, but hopefully it's a start!

Delimit string and put it in listbox

I have a string like:
one,one,one,one,two,two,two,two,three,three,three,three,four,four,four,four,...
and I'd like to delimit it after every fourth comma and store it into a list box, like this:
one,one,one,one,
two,two,two,two,
three,three,three,three,
four,four,four,four,
...
What would be an appropriate way to do this? Should I use a regex to somehow delimit this string?
Thanks
A LINQ-free alternative:
int s = 0, n = 0, len = inputString.Length;
for (var i = 0; i < len; i++) {
if (inputString[i] == ',' && ++n % 4 == 0 || i == len - 1) {
aListBox.Items.Add(inputString.Substring(s, i - s + 1));
s = i + 1;
}
}
This LINQ breaks your input into individual strings by delimiting on the comma, then uses an index in the Select method to group four items together at a time, then finally joins those four items into a single string again.
var input = "one,one,one,one,two,two,two,two,three,three,three,three"; // and so on
var result = input.Split(',')
.Select((s, i) => new {s, i})
.GroupBy(pair => pair.i / 4)
.Select(grp => string.Join(",", grp.Select(pair => pair.s)) + ",");
The result is a collection of strings, where the first one (based on your input) is "one,one,one,one,", then the second is "two,two,two,two," and so on...
From there, it's just a matter of setting it as the DataSource, ItemsSource or similar, depending on what technology you're using.
