Can't properly rebuild a string with Replacement values from Dictionary - c#

I am trying to build a file using a template. I am processing the file in a while loop, line by line. The first section of the file, the first 35 lines, is header information. The information to substitute is surrounded by # signs. Take this string for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
The expected output should be:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
This header section uses a different mapping than the rest of the file so I wanted to parse the file line by line from top to bottom and use a different logic for each section so that I don't waste time parsing the entire file at once multiple times for different sections.
The logic uses two dictionaries populated from an XML file. Because the file has multiple tables, I combined them in the two dictionaries like so:
var headerCdataIndexKeyVals = new Dictionary<string, int>()
{
    {"data.tool_context.TOOL_SOFTWARE_VERSION", 1},
    {"data.context.TOOL_ENTITY", 0}
};
var headerCdataArrayKeyVals = new Dictionary<string, List<string>>();
var tool_contextCdataList = new List<string> {"HM654", "sw0.2.002"};
var contextCdataList = new List<string> {"WSM102"};
headerCdataArrayKeyVals.Add("tool_context", tool_contextCdataList);
headerCdataArrayKeyVals.Add("context", contextCdataList);
This helps me map the values to their respective positions in the string in one go, without having to loop through multiple dictionaries.
I am using the following logic:
public static string FindSubsInDelimetersAndReturn(string str, char openDelimiter, char closeDelimiter, HeaderMapperData mapperData )
{
string newString = string.Empty;
// Stores the indices of
Stack <int> dels = new Stack <int>();
for (int i = 0; i < str.Length; i++)
{
var let = str[i];
// If opening delimeter
// is encountered
if (str[i] == openDelimiter && dels.Count == 0)
{
dels.Push(i);
}
// If closing delimeter
// is encountered
else if (str[i] == closeDelimiter && dels.Count > 0)
{
// Extract the position
// of opening delimeter
int pos = dels.Peek();
dels.Pop();
// Length of substring
int len = i - 1 - pos;
// Extract the substring
string headerSubstring = str.Substring(pos + 1, len);
bool hasKey = mapperData.HeaderCdataIndexKeyVals.TryGetValue(headerSubstring.ToUpper(), out int headerCdataIndex);
string[] headerSubstringSplit = headerSubstring.Split('.');
string headerCDataVal = string.Empty;
if (hasKey)
{
if (headerSubstring.Contains("CONTAINER.CONTEXT", StringComparison.OrdinalIgnoreCase))
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper() + '.' + headerSubstringSplit[2].ToUpper()][headerCdataIndex];
//mapperData.HeaderCdataArrayKeyVals[]
}
else
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper()][headerCdataIndex];
}
string strToReplace = openDelimiter + headerSubstring + closeDelimiter;
string sub = str.Remove(i + 1);
sub = sub.Replace(strToReplace, headerCDataVal);
newString += sub;
}
else if (headerSubstring == "WSM" && closeDelimiter == '#')
{
string sub = str.Remove(len + 1);
newString += sub.Replace(openDelimiter + headerSubstring + closeDelimiter, "");
}
else
{
newString += let;
}
}
}
return newString;
}
}
But my output turns out to be:
"\tFie\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw0.2.002\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"WSM102"
Can someone help me understand why this is happening and how I can correct it so that I get the output:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Am I even trying to solve this the right way, or is there a better, cleaner way to do it? By the way, if the key is not in the dictionary, I replace it with an empty string.
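One simpler alternative, sketched under assumptions rather than offered as the definitive fix: treat every #...# pair as a token and let Regex.Replace substitute each one in a single pass. The TemplateFiller class, the Fill method, and the flattened values dictionary (token name mapped directly to its replacement text) are hypothetical and stand in for the two-dictionary lookup above. Tokens that are not in the dictionary, such as WSM, become an empty string, as described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateFiller
{
    // Matches one "#token#" pair; the token itself may not contain '#'.
    private static readonly Regex Token = new Regex("#([^#]*)#");

    // Hypothetical helper: replaces every #token# with its mapped value,
    // or with an empty string when the token is not in the dictionary.
    public static string Fill(string line, IReadOnlyDictionary<string, string> values)
    {
        return Token.Replace(line, m =>
            values.TryGetValue(m.Groups[1].Value, out var value) ? value : string.Empty);
    }
}
```

Because each match is replaced exactly once and the rest of the line is copied through untouched, nothing gets appended twice. Note that removing #WSM# leaves the space that preceded it, so trim the result if that matters. For case-insensitive token names, the dictionary could be built with StringComparer.OrdinalIgnoreCase.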

How to iterate through data and create a new text file every nth entries

I'm making a list of lines that need to be added to a .txt file (with tab delimitation). The text file needs to have a maximum of 500 entries plus a header.
Right now, I have this code, which is successfully iterating through my list and creating the text file with the header. If the file already exists, it appends the lines in my list without adding the header.
I can't quite figure out how to make a new file, add the header and add each line after my first file surpasses 500 entries.
Can you help me separate in 500 line files with headers? Thank you
This is the code I have so far:
var tab = new StringBuilder();
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
}
if (!File.Exists(textcsvpath))
{
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllLines(textcsvpath, textlinestoadd);
This seems like a good practice opportunity, so I will leave the code part as an exercise!
The basic idea is simple: whenever you have written 500 lines, reset and write to a new file.
Here is some high-level pseudocode:
Initialize StringBuilder sb
For each line do
    Add line to sb
    increment line count
    if line count == 500 then
        save to file
        reset sb
        reset line count
        update filename = next file
    end if
End For
//writes the last chunk if # of lines is not a multiple of 500
if line count is not 0 then
    save to file
end if
I'd try something like this.
var tab = new StringBuilder();
int lineCount = 0;
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
if (File.Exists(textcsvpath)) {
string[] fileContent = File.ReadAllLines(textcsvpath);
lineCount = fileContent.Length - 1; // assume the first line is the header
}
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
lineCount++;
if (lineCount > 0 && lineCount % 500 == 0)
{
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
tab.Clear();
textcsvpath = "some-new-file-name";
}
}
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
You'll need to do something to determine the new file name as you add a new file.
I'd do something like this:
const int limit = 500;
int iteration = 0;
string textHeader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
while (iteration * limit < textLinesToAdd.Count())
{
// Builds names such as foo.0.bar, foo.1.bar, ...
string fullPath = Path.Combine(filePath, $"{fileName}.{iteration}.{extension}");
IEnumerable<string> linesToAdd = textLinesToAdd.Skip(iteration++ * limit).Take(limit);
// WriteAllText creates (or overwrites) the file, so no separate File.Create call is needed
File.WriteAllText(fullPath, textHeader);
File.AppendAllLines(fullPath, linesToAdd);
}
Define that filename as foo and the extension as bar, and you'll get a sequence of files called foo.0.bar, foo.1.bar, foo.2.bar and so on.
I'm assuming we want to create a file with the specified name, and then have some integer placed between the name and extension that increments every time a new file is created.
One way to do this would be to have a method that takes in a filePath string, a list of lines to write, a header string, and the maximum number of lines allowed per file. Then it could parse the directory of the file path, looking for a pattern related to the file name.
It would determine what the latest file name should be based on the contents of the directory and the number of lines in the last file that matches our pattern, then would write to that file until it was full, and then continue creating new files until the lines were all written.
Here's a sample class that can do that, where I added some helper methods to get a file's number, increment that number in the name, get the latest file from a directory, and write lines to the file. It also implements IComparer<string> so that we can pass it to OrderByDescending to easily sort the files we're interested in.
public class FileWriterHelper : IComparer<string>
{
public int Compare(string x, string y)
{
// Compare null
if (x == null) return y == null ? 0 : 1;
if (y == null) return -1;
// Compare count of parts split on '.'
var xParts = x.Split('.');
var yParts = y.Split('.');
if (xParts.Length < 3) return yParts.Length < 3 ? 0 : -1;
if (yParts.Length < 3) return 1;
// Compare numeric portion
int xNum, yNum;
if (int.TryParse(xParts[1], out xNum) &&
int.TryParse(yParts[1], out yNum))
{
return xNum.CompareTo(yNum);
}
// Unknown values
return string.Compare(x, y, StringComparison.Ordinal);
}
private static int? GetFileNumber(string fileName)
{
if (string.IsNullOrWhiteSpace(fileName)) return null;
var fileParts = fileName.Split('.');
int fileNum;
if (fileParts.Length < 3 || !int.TryParse(fileParts[1], out fileNum)) return null;
return fileNum;
}
private static string IncrementNumber(string fileName)
{
var number = GetFileNumber(fileName).GetValueOrDefault() + 1;
var fileParts = fileName.Split('.');
return $"{fileParts[0]}.{number}.{fileParts[fileParts.Length - 1]}";
}
private static string GetLatestFile(string filePath, int maxLines)
{
var fileDir = Path.GetDirectoryName(filePath);
var fileName = Path.GetFileNameWithoutExtension(filePath);
var fileExt = Path.GetExtension(filePath);
var latest = Directory.GetFiles(fileDir, $"{fileName}*{fileExt}")
.OrderByDescending(f => f, new FileWriterHelper())
.FirstOrDefault() ?? filePath;
return File.Exists(latest) && File.ReadAllLines(latest).Length >= maxLines
? Path.Combine(fileDir, IncrementNumber(Path.GetFileName(latest)))
: latest;
}
public static void WriteLinesToFile(string filePath, string header,
List<string> lines, int maxFileLines)
{
while ((lines?.Count ?? 0) > 0 && maxFileLines > 0)
{
var latestFile = GetLatestFile(filePath, maxFileLines);
if (!File.Exists(latestFile)) File.CreateText(latestFile).Close();
var lineCount = File.ReadAllLines(latestFile).Length;
if (lineCount == 0 && header != null)
{
File.WriteAllText(latestFile, string.Concat(header, Environment.NewLine));
lineCount = 1;
}
var numLinesToWrite = maxFileLines - lineCount;
File.AppendAllLines(latestFile, lines.Take(numLinesToWrite));
lines = lines.Skip(numLinesToWrite).ToList();
}
}
}
That was a bit of work, but now to use it is really simple:
private static void Main()
{
// Generate 5000 lines to write
var fileLines = Enumerable.Range(0, 5000).Select(i => $"Line number {i}").ToList();
// File path with base file name
var filePath = @"f:\public\temp\temp.csv";
// The header counts toward the 500-line limit, so this should create 11 files
FileWriterHelper.WriteLinesToFile(filePath,
"HEADER: This should be the first line in each file.", fileLines, 500);
GetKeyFromUser("\nDone! Press any key to exit...");
}
If you run that once, it will create 11 files: the header counts toward the 500-line limit, so each file holds 499 data lines, and the 5,000 generated lines spill over into an eleventh file. If you run it again, it tops up that last file and then creates more, because it uses the same path and file name pattern and recognizes the files that are already in that location.
I'm sure it could use some work, but hopefully it's a start!

C# string parse

I have string like this
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
Each pair of single quotes encloses a field, and the fields are separated by commas. I want to empty the 8th field in the string. I cannot simply do Replace("MUL,NBLD,NITA,NUND", "") because that field could contain anything. Also, please note that the 4th field is a number and therefore has no single quotes around the 5.
How can I achieve this?
static void Main()
{
var temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
var parts = Split(temp).ToArray();
parts[7] = null;
var ret = string.Join(",", parts);
// or replace the above 3 lines with this...
//var ret = string.Join(",", Split(temp).Select((v,i)=>i!=7 ? v : null));
//ret == "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40',,'','Address line 2'"
}
public static IEnumerable<string> Split(string input, char delimiter = ',', char quote = '\'')
{
string temp = "";
bool skipDelimiter = false;
foreach (var c in input)
{
if (c == quote)
skipDelimiter = !skipDelimiter;
else if (c == delimiter && !skipDelimiter)
{
//do split
yield return temp;
temp = "";
continue;
}
temp += c;
}
yield return temp;
}
I made a small implementation below. I explain the logic in the comments. Basically you want to write a simple parser to accomplish what you described.
edit0: just realized I did the opposite of what you asked for oops..fixed now
edit1: replacing the string with null as opposed to eliminating the entire field from the comma-delimited list.
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
//keep track of the single quotes
int singleQuoteCount= 0;
//keep track of commas
int comma_count = 0;
String field = "";
foreach (Char chr in temp)
{
//add to the field string if we are not between the 7th and 8th comma not counting commas between single quotes
if (comma_count != 7)
field += chr;
//plug in null string between two single quotes instead of whatever chars are in the eighth field.
else if (chr == '\'' && singleQuoteCount %2 ==1)
field += "\'',";
if (chr == '\'') singleQuoteCount++;
//only want to add to comma_count if we are outside of single quotes.
if (singleQuoteCount % 2 == 0 && chr == ',') comma_count++;
}
}
If you used '-' (or another character) instead of ',' inside the fields (e.g. 'MUL-NBLD-NITA-NUND'), you could use this code, which removes the chosen field entirely:
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL-NBLD-NITA-NUND','','Address line 2'";
temp = replaceField(temp, 8);
}
static string replaceField(string list, int field)
{
string[] fields = list.Split(',');
string chosenField = fields[field - 1 /*<--Arrays start at 0!*/];
if(!(field == fields.Length))
list = list.Replace(chosenField + ",", "");
else
list = list.Replace("," + chosenField, "");
return list;
}
//Return-Value: "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','Address line 2'"

How to remove some special words from string content?

I have some strings containing code for emoji icons, like :grinning:, :kissing_heart:, or :bouquet:. I'd like to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
However, there are 856 different emoji icons I have to remove (which, using this method, would take 856 calls to Replace()). Is there any other way to accomplish this?
You can use Regex to match the word between the colons, as in :anything:. Using the Replace overload that takes a function, you can add further validation.
string pattern = @":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you don't want to use the Replace overload that takes a lambda expression, you can use \w, as @yorye-nathan mentioned, to match only words.
string pattern = @":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
I would solve it this way:
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
UPDATE - referring to Detail's comment
Another approach: replace only the emojis that actually occur in the text
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, @":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
But I'm not sure it makes that big a difference for such short chat strings, and it's more important to have code that's easy to read and maintain. OP's question was about reducing the redundancy of chained Text.Replace().Replace() calls, not about the most efficient solution.
I would use a combination of some of the techniques already suggested. Firstly, I'd store the 800+ emoji strings in a database and then load them up at runtime. Use a HashSet to store these in memory, so that we have a O(1) lookup time (very fast). Use Regex to pull out all potential pattern matches from the input and then compare each to our hashed emoji, removing the valid ones and leaving any non-emoji patterns the user has entered themselves...
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = @":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
Which results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
You do not have to replace all 856 emojis. You only have to replace those that appear in the string. So have a look at:
Finding a substring using C# with a twist
Basically, you extract all tokens, i.e. the strings between : and :, and then replace those with string.Empty.
If you are concerned that the search will return strings that are not emojis, such as :some other text:, then you could use a hash table lookup to make sure that replacing the found token is appropriate.
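The token-plus-lookup idea can be sketched as follows. This is a minimal illustration, not the linked answer's code: the EmojiStripper class and its Strip method are hypothetical names, and the set is assumed to hold emoji names without the surrounding colons.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmojiStripper
{
    // Candidate tokens: word characters between two colons, e.g. ":grinning:".
    private static readonly Regex Token = new Regex(@":(\w+):");

    // Removes a token only when its name is in the known set;
    // any other text between colons is returned unchanged.
    public static string Strip(string input, ISet<string> knownEmoji)
    {
        return Token.Replace(input, m =>
            knownEmoji.Contains(m.Groups[1].Value) ? string.Empty : m.Value);
    }
}
```

With a HashSet<string> holding "grinning", "kissing_heart" and "bouquet", the sample sentence from the question loses its three emoji codes while something like :imadethis: is kept. One limitation of this sketch: regex matches do not overlap, so in text like :123:grinning: the unknown token :123: consumes the colon that :grinning: would need, and the emoji is left in place.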
Finally got around to writing something up. I'm combining a couple of previously mentioned ideas with the fact that we should only loop over the string once. Based on those requirements, this sounds like the perfect job for LINQ.
You should probably cache the HashSet. Other than that, this has O(n) performance and only goes over the list once. It would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straightforward.
1. First load all emoji into a HashSet so we can quickly look them up.
2. Split the string at the : with input.Split(':').
3. Decide if we keep the current element.
   - If the last element was a match, keep the current element.
   - If the last element was no match, check if the current element matches.
     - If it does, ignore it. (This effectively removes the substring from the output.)
     - If it doesn't, append : back and keep it.
4. Rebuild our string with a StringBuilder.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
Edit: I did something cool using the Rx library, but then realized Aggregate is the IEnumerable counterpart of Scan in Rx, thus simplifying the code even more.
If efficiency is a concern and to avoid processing "false positives", consider rewriting the string using a StringBuilder while skipping the special emoji tokens:
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//The rest of the text is appended after the loop
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
//Add whatever text remains: the tail after the last colon or emoji,
//or the whole input when it contains no colon at all
if (startIndex < length)
sb.Append(input.Substring(startIndex));
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Outputs:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Use the code below; I think it will solve your problem.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
I'd use extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This method should be better in terms of performance compared to the one with Regex.
I would split the text with the ':' and then build the string excluding the found emoji names.
const char marker = ':';
var textSections = text.Split(marker);
var emojiRemovedText = string.Empty;
var notMatchedCount = 0;
textSections.ToList().ForEach(section =>
{
if (emojiNames.Contains(section))
{
notMatchedCount = 0;
}
else
{
if (notMatchedCount++ > 0)
{
emojiRemovedText += marker.ToString();
}
emojiRemovedText += section;
}
});

File ReadLine Count

I am trying to read from a file. The program goes through each line of the text file, compares the first 8 characters of each line, and joins the lines whose first 8 characters match into one. See the code:
while ((line1 = fileread1.ReadLine()) != null)
{
line2 = fileread2.ReadLine();
while (line2 != null)
{
if (line1.Length >= 8 && line2.Length >= 8 &&
line1.Substring(0, 8) == line2.Substring(0, 8))
{
//line2 = line2.Remove(0, 60);
line1 = line1 +" "+ line2;
}
line2 = fileread3.ReadLine();
counter2++;
}
filewrite.WriteLine(line1);
counter1++;
}
Question 1:
How can I get the count of fileread2 and assign it to fileread3? Every time the inner loop executes, I need to reset the count of fileread3 to be the same as fileread2.
Question 2:
How do I write the combined lines as a single line where the first 8 characters match?
From reading the comments what I understand is that you actually want to remove duplicates. Or, to be more specific, the lines that start with the same 8 characters.
If that's the case, why not use a collection of keys to remember whether a string with the same 8 initial characters has already been loaded?
E.g. you can have a collection (like List) where you add the 8 character substring. Then, before writing the line you first check if the substring is already in the collection. If it is then you know it's a duplicate and you don't write it.
Another option, if you want to calculate how many duplicates of a specific type there are, would be to use a Dictionary. When you have a duplicate of specific type you can then increase the counter in the dictionary.
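Both suggestions can be combined in one pass. The following is only a sketch: PrefixDeduplicator and KeepFirstPerKey are hypothetical names, and it uses a HashSet instead of the List mentioned above because membership checks on a HashSet do not slow down as the collection grows. Lines shorter than the key length are assumed to be passed through unchanged.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixDeduplicator
{
    // Hypothetical helper: keeps the first line seen for each key (the first
    // keyLength characters) and counts how many later lines shared that key.
    public static List<string> KeepFirstPerKey(
        IEnumerable<string> lines, Dictionary<string, int> duplicateCounts, int keyLength = 8)
    {
        var seen = new HashSet<string>();
        var kept = new List<string>();
        foreach (var line in lines)
        {
            // Lines shorter than the key cannot be compared; pass them through.
            if (line.Length < keyLength)
            {
                kept.Add(line);
                continue;
            }
            var key = line.Substring(0, keyLength);
            if (seen.Add(key))
            {
                kept.Add(line); // first occurrence of this key
            }
            else
            {
                // count stays 0 when the key has no entry yet
                duplicateCounts.TryGetValue(key, out var count);
                duplicateCounts[key] = count + 1;
            }
        }
        return kept;
    }
}
```

For a large file, the lines could be streamed in with File.ReadLines so that only the keys, not the whole file, are held in memory.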
Since you have large files, probably File.ReadAllLines is not an option for you. You can declare the following methods (we need to work with Stream, since StreamReader doesn't have Position property):
public static string ReadLine(Stream stream)
{
return ReadLine(stream, Encoding.UTF8);
}
public static string ReadLine(Stream stream, Encoding encoding)
{
List<byte> lineBytes = new List<byte>();
while (stream.Position < stream.Length)
{
byte b = (byte)stream.ReadByte();
if (b == 0x0a)
break;
if (b == 0x0d)
continue;
lineBytes.Add(b);
}
return encoding.GetString(lineBytes.ToArray());
}
Sample code:
string sourceFileName = "input.txt";
string targetFileName = "output.txt";
using (StreamWriter targetWriter = new StreamWriter(targetFileName))
using (FileStream sourceStream = File.OpenRead(sourceFileName))
{
HashSet<string> processedKeys = new HashSet<string>();
while (sourceStream.Position < sourceStream.Length)
{
string line = ReadLine(sourceStream);
if (line.Length < 8)
targetWriter.WriteLine(line);
else
{
string key = line.Substring(0, 8);
if (processedKeys.Contains(key))
continue;
targetWriter.Write(line);
long backupPosition = sourceStream.Position;
while (sourceStream.Position < sourceStream.Length)
{
string dupLine = ReadLine(sourceStream);
if (dupLine.Length < 8)
continue;
string dupKey = dupLine.Substring(0, 8);
if (dupKey == key)
targetWriter.Write(" " + dupLine);
}
sourceStream.Position = backupPosition;
targetWriter.WriteLine();
processedKeys.Add(key);
}
}
}
If you want to group the data based on the first 8 characters (assuming every line has at least 8 characters, since Substring(0, 8) throws otherwise), all you need is:
var groups = File.ReadLines(your_file_name).GroupBy(line => line.Substring(0, 8));
To count the elements you call Count() on the elements in the groups
groups.First().Count()
(you can also enumerate over groups via foreach (var group in groups)..)
The group result will contain the complete line from the file. If you don't want the "key" (i.e. the first 8 characters) in the result, you can use the overload of GroupBy
var groups = File.ReadLines(your_file_name).GroupBy(line => line.Substring(0, 8), line => line.Substring(8));
In order to get new output with the "key" and the lines concatenated, you would use something like this:
var groups = File.ReadLines(input_file).GroupBy(line => line.Substring(0, 8), line => line.Substring(8));
File.WriteAllLines(output_file, groups.Select(group => string.Format("{0}{1}", group.Key, string.Join(string.Empty, group))));

comparing string and variable but failing based on contains

What I have going on is I have two files. Both files are delimited by '|'. If file 1 matches a line in file 2 I need to combine the lines. Here is the code:
string[] mathlines = File.ReadAllLines(@"C:\math.txt");
var addlines = File.ReadAllLines(@"K:\add.txt");
foreach (string ml in mathlines)
{
    string[] parse = ml.Split('|');
    if (addlines.Contains(parse[0]))
    {
        File.AppendAllText(@"C:\final.txt", parse[0]+"|"+parse[1]+"\n");
    }
    else
    {
        File.AppendAllText(@"C:\final.txt", ml + "\n");
    }
}
I realize that the math part isn't setup yet, but I need to get the match part working.
Here is an example:
mathlines:
dart|504.91
GI|1782.06
Gcel|194.52
clay|437.35
grado|217.77
greGCR|14.82
rp|372.54
rp2|11.92
gsg|349.92
GSxil|4520.55
addlines:
Gimet|13768994304
GSxil|394735896576
Ho|4994967296
gen|485331304448
GSctal|23482733690
Obr|88899345920
As you can see, mathlines contains GSxil and so does addlines, but my if (addlines.Contains) never finds the variable in addlines. Any help is always loved! Thanks.
Sorry, I forgot to mention that I need the comparison to match exactly. I also need to split out the value on the line that matches. So in this example I would need to split out 394735896576 and then append it.
addLines.Contains(parse[0]) is going to match on the entire string; you need to match on part of it. There are more efficient solutions, but an O(n^2) option is to use LINQ Any(). Including the '|' delimiter in the prefix keeps the match exact, so that rp does not also match rp2:
if (addLines.Any(l => l.StartsWith(parse[0] + "|")))
{
...
You could load all lines from addlines.txt into a dictionary and then use that to find a match for each line in mathlines.txt. This method would be much faster than what you have currently.
string[] mathlines = File.ReadAllLines(@"C:\math.txt");
string[] addlines = File.ReadAllLines(@"K:\addlines.txt");
string[] finallines = new string[mathlines.Length];
var addlinesLookup = new Dictionary<string, string>();
for (int i = 0; i < addlines.Length; i++)
{
    string[] parts = addlines[i].Split('|');
    if (parts.Length == 2) // Will there ever be more than 2 parts?
    {
        addlinesLookup.Add(parts[0], parts[1]);
    }
}
for (int i = 0; i < mathlines.Length; i++)
{
    string[] parts = mathlines[i].Split('|');
    if (parts.Length >= 1)
    {
        if (addlinesLookup.ContainsKey(parts[0]))
        {
            // Key found: append the matching value from addlines
            finallines[i] = mathlines[i] + "|" + addlinesLookup[parts[0]];
        }
        else
        {
            finallines[i] = mathlines[i];
        }
    }
}
// AppendAllLines writes a line terminator after each entry
File.AppendAllLines(@"C:\final.txt", finallines, Encoding.ASCII);
