Find string in txt file using a list c# [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am trying to find out whether a .txt file contains any of the words stored in a list named Abreviated. The list is filled by reading values from a CSV file, as shown below:
StreamReader sr = new StreamReader(@"C:\textwords.csv");
string TxtWrd = sr.ReadLine();
while ((TxtWrd = sr.ReadLine()) != null)
{
Words = TxtWrd.Split(Seperators, StringSplitOptions.None);
Abreviated.Add(Words[0]);
Expanded.Add(Words[1]);
}
I would like to use this list to check whether a .txt file contains any of the words in the list. The .txt file is read using a StreamReader and stored as a string, FileContent (FC in the code below). The code I have to try to find the matches is below:
if (FC.Contains(Abreviated.ToString()))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
This always takes the else branch, even though one of the words is in the text file.
Any advice on how to get this working?
Thanks in advance!

You can use a key-value data structure to store each abbreviated word together with its full form. In C#, Dictionary<TKey, TValue> is the generic implementation for storing key-value pairs.
I've refactored your code to make it easier to reuse.
internal class FileParser
{
internal Dictionary<string, string> WordDictionary = new Dictionary<string, string>();
private string _filePath;
private char Seperators => ',';
internal FileParser(string filePath)
{
_filePath = filePath;
}
internal void Parse()
{
using (StreamReader sr = new StreamReader(_filePath))
{
string TxtWrd = sr.ReadLine(); // the first line is read and discarded, as in the question's code
while ((TxtWrd = sr.ReadLine()) != null)
{
// Split(char[], StringSplitOptions) compiles on both .NET Framework and .NET Core
var words = TxtWrd.Split(new[] { Seperators }, StringSplitOptions.None);
//WordDictionary.TryAdd(words[0], words[1]); // available in .NET corefx https://github.com/dotnet/corefx/issues/1942
if (!WordDictionary.ContainsKey(words[0]))
WordDictionary.Add(words[0], words[1]);
}
}
}
internal bool IsWordAvailable(string word)
{
return WordDictionary.ContainsKey(word);
}
}
Now you can reuse the above class within your assembly, like this:
public class Program
{
public static void Main(string[] args)
{
var fileParser = new FileParser(@"C:\textwords.csv");
fileParser.Parse(); // load the CSV into the dictionary before querying it
if(fileParser.IsWordAvailable("abc"))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
}
}

You are comparing your entire file's content to the string representation of a collection of words. You need to compare each individual word found in the file content to your abbreviated list. One way to do the comparison is to split the file content into individual words and then look each one up in your abbreviated list.
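// Separators holds the word delimiters (spaces, punctuation, line breaks)
string[] fileWords = FC.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
bool hasMatch = false;
foreach (string fileWord in fileWords)
{
// Abreviated is the list from the question
if (Abreviated.Contains(fileWord))
{
hasMatch = true;
break;
}
}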
if (hasMatch)
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
I would recommend switching your abbreviated collection to a HashSet or a Dictionary that also includes your matching expanded text for the abbreviation. Also, there are probably alternate ways to do the search you are looking for with regex.
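The HashSet suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the class and method names (AbbreviationSearch, FindMatches), the delimiter list, and the case-insensitive comparison are illustrative choices, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AbbreviationSearch
{
    // Returns every abbreviation that occurs as a whole word in fileContent.
    // Hypothetical helper: the name and signature are assumptions for this sketch.
    public static List<string> FindMatches(string fileContent, IEnumerable<string> abbreviations)
    {
        // A HashSet gives O(1) average lookups; the comparer makes them case-insensitive.
        var lookup = new HashSet<string>(abbreviations, StringComparer.OrdinalIgnoreCase);

        // Assumed word delimiters; extend as needed for the real text.
        char[] delimiters = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        return fileContent
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => lookup.Contains(word))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    public static void Main()
    {
        var abbreviations = new[] { "wuu2", "brb" };
        List<string> matches = FindMatches("Hey, wuu2 later? BRB.", abbreviations);

        Console.WriteLine(matches.Count > 0
            ? "Match found: " + string.Join(", ", matches)
            : "No match");
    }
}
```

In a WinForms application the Console.WriteLine call would be replaced with MessageBox.Show, as in the question.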

I'm unsure what some of your variables are, so this may differ slightly from what you have, but it gives the same functionality.
static void Main(string[] args)
{
List<string> abbreviated = new List<string>();
List<string> expanded = new List<string>();
StreamReader sr = new StreamReader("textwords.csv");
string TxtWrd = "";
while ((TxtWrd = sr.ReadLine()) != null)
{
Debug.WriteLine("line: " + TxtWrd);
string[] Words = TxtWrd.Split(new char[] { ',' } , StringSplitOptions.None);
abbreviated.Add(Words[0]);
expanded.Add(Words[1]);
}
if (abbreviated.Contains("wuu2"))
{
//show message box
} else
{
//don't
}
}
As mentioned in one of the comments, a Dictionary might be better suited for this.
This assumes that the data in your file is in the following format, with a new set on each line.
wuu2,what are you up to

If all you want to do is check if a text file contains words in your list, you can read the entire contents of the file into a string (instead of line by line), split the string on your separators, and then check if the intersection of the words in the text file and your list of words has any items:
// Get the list words (the first column of each CSV line) into a sequence; requires System.Linq
var wordsFile = @"c:\public\temp\textWords.csv"; // (@"C:\textwords.csv");
var listWords = File.ReadLines(wordsFile).Select(line => line.Split(',')[0]);
// Get the words of the file into a list (add more delimiters as necessary)
var txtFile = @"c:\public\temp\temp.txt";
var allWords = File.ReadAllText(txtFile).Split(new[] {' ', '.', ',', ';', ':', '\r', '\n'});
// Get the intersection of the file words and the list words (Intersect already returns distinct items)
var commonWords = allWords.Intersect(listWords).ToList();
if (commonWords.Any())
{
Console.WriteLine("The text file contains the following matching words:");
Console.WriteLine(string.Join(", ", commonWords));
}
else
{
Console.WriteLine("The file did not contain any matching words.");
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();

Related

Compare two strings to find any duplicates

I really cannot find the answer to this question and it is driving me mental. I have two strings, one is a text file that is read into a string called logfile. The other is just a user input string, called text1. Eventually it's just going to be a guessing game with hints, but I can't figure out how to compare these two for equality.
string LOG_PATH = "E:\\Users\\start.txt";
string logfile = File.ReadAllText(LOG_PATH);
string text1 = "";
text1 = Console.ReadLine();
if (logfile.Contains(text1))
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}
This code works fine when there is only one word in the text file and it matches. If the text file only contains the word "Mostly" and the user enters mostly and a bunch of other words, the console prints found. But if the text file has mostly and a bunch of other random words, say "Mostly cloudy today", the console prints not found. Is it possible to check two strings for ANY shared words at all?

You can do this in a couple of ways.
Using Intersect():
var wordsFromFile = File.ReadAllText(LOG_PATH).Split(' ').ToList();
var inputWords = Console.ReadLine().Split(' ').ToList();
// Intersect yields the words common to both lists; Except would yield the words that are NOT shared
Console.WriteLine(wordsFromFile.Intersect(inputWords).Any() ? "Found" : "Not Found");
Similarly, using a foreach loop:
var wordsFromFile = File.ReadAllText(LOG_PATH).Split(' ').ToList();
var inputWords = Console.ReadLine().Split(' '); // split into words; iterating the raw string would yield chars
string result = "Not Found";
foreach(var word in inputWords)
{
if(wordsFromFile.Contains(word))
{
result = "Found";
break;
}
}
Console.WriteLine(result);

Very similar to what Prasad did, except we ignore blank lines and use a case-insensitive comparison:
string LOG_PATH = @"E:\Users\start.txt";
List<String> logfileWords = new List<String>();
foreach (String line in File.ReadLines(LOG_PATH))
{
logfileWords.AddRange(line.Trim().Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries));
}
Console.Write("Words to search for (separated by spaces): ");
String[] inputs = Console.ReadLine().Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine("Inputs:");
foreach(String input in inputs)
{
Console.WriteLine(input + " ==> " + (logfileWords.Any(w => w.Equals(input, StringComparison.InvariantCultureIgnoreCase)) ? "found" : "not found"));
}

Dealing with fields containing unescaped double quotes with TextFieldParser

I am trying to import a CSV file using TextFieldParser. A particular CSV file is causing me problems due to its nonstandard formatting. The CSV in question has its fields enclosed in double quotes. The problem appears when there is an additional set of unescaped double quotes within a particular field.
Here is an oversimplified test case that highlights the problem. The actual CSV files I am dealing with are not all formatted the same and have dozens of fields, any of which may contain these possibly tricky formatting issues.
TextReader reader = new StringReader("\"Row\",\"Test String\"\n" +
"\"1\",\"This is a test string. It is parsed correctly.\"\n" +
"\"2\",\"This is a test string with a comma, which is parsed correctly\"\n" +
"\"3\",\"This is a test string with double \"\"double quotes\"\". It is parsed correctly\"\n" +
"\"4\",\"This is a test string with 'single quotes'. It is parsed correctly\"\n" +
"5,This is a test string with fields that aren't enclosed in double quotes. It is parsed correctly.\n" +
"\"6\",\"This is a test string with single \"double quotes\". It can't be parsed.\"");
using (TextFieldParser parser = new TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
while (!parser.EndOfData)
{
string[] fields= parser.ReadFields();
Console.WriteLine("This line was parsed as:\n{0},{1}",
fields[0], fields[1]);
}
}
Is there any way to properly parse a CSV with this type of formatting using TextFieldParser?

I agree with Hans Passant's advice that it is not your responsibility to parse malformed data. However, in accord with the Robustness Principle, someone faced with this situation may attempt to handle specific types of malformed data. The code below works on the data set specified in the question. It catches the parser error on the malformed line, checks the first character to determine whether the line is wrapped in double quotes, and then splits the line and strips the wrapping double quotes manually.
using (TextFieldParser parser = new TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
while (!parser.EndOfData)
{
string[] fields = null;
try
{
fields = parser.ReadFields();
}
catch (MalformedLineException ex)
{
if (parser.ErrorLine.StartsWith("\""))
{
var line = parser.ErrorLine.Substring(1, parser.ErrorLine.Length - 2);
fields = line.Split(new string[] { "\",\"" }, StringSplitOptions.None);
}
else
{
throw;
}
}
Console.WriteLine("This line was parsed as:\n{0},{1}", fields[0], fields[1]);
}
}
I'm sure it is possible to concoct a pathological example where this fails (e.g. commas adjacent to double-quotes within a field value) but any such examples would probably be unparseable in the strictest sense, whereas the problem line given in the question is decipherable despite being malformed.

Jordan's solution is quite good, but it makes an incorrect assumption that the error line will always begin with a double-quote. My error line was this:
170,"CMS ALT",853,,,NON_MOVEX,COM,NULL,"2014-04-25","" 204 Route de Trays"
Notice the last field had extra/unescaped double quotes, but the first field was fine. So Jordan's solution didn't work. Here is my modified solution based on Jordan's:
using(TextFieldParser parser = new TextFieldParser(new StringReader(csv))) {
parser.Delimiters = new [] {","};
while (!parser.EndOfData) {
string[] fields = null;
try {
fields = parser.ReadFields();
} catch (MalformedLineException ex) {
// SafeTrim is the answerer's own helper; its definition is not shown in the answer.
string errorLine = SafeTrim(parser.ErrorLine);
fields = errorLine.Split(',');
}
}
}
You may want to handle the catch block differently, but the general concept works great for me.

It may be easier to just do this manually, and it would certainly give you more control:
Edit: for your clarified example, I still suggest handling the parsing manually:
using System.IO;
string[] csvFile = File.ReadAllLines(pathToCsv);
foreach (string line in csvFile)
{
// get the first comma in the line
// everything before this index is the row number
// everything after is the row value
int firstCommaIndex = line.IndexOf(',');
//Note: Substring used here is (startIndex, length), so the comma itself is excluded
string row = line.Substring(0, firstCommaIndex);
string rowValue = line.Substring(firstCommaIndex+1).Trim();
Console.WriteLine("This line was parsed as:\n{0},{1}",
row, rowValue);
}
For a generic CSV that does not allow commas in the fields:
using System.IO;
string[] csvFile = File.ReadAllLines(pathToCsv);
foreach (string line in csvFile)
{
string[] fields = line.Split(',');
Console.WriteLine("This line was parsed as:\n{0},{1}",
fields[0], fields[1]);
}

Working solution:
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = false;
string[] colFields = csvReader.ReadFields();
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
else
{
if (fieldData[i][0] == '"' && fieldData[i][fieldData[i].Length - 1] == '"')
{
fieldData[i] = fieldData[i].Substring(1, fieldData[i].Length - 2);
}
}
}
csvData.Rows.Add(fieldData);
}
}

If HasFieldsEnclosedInQuotes is not set to true, the parser returns too many fields whenever the data itself contains a comma (,).
For example:
"Col1","Col2","Col3"
"Test1", 100, "Test1,Test2"
"Test2", 200, "Test22"
This file should have 3 columns, but the row whose last field contains a comma is parsed as 4 fields, which is wrong.

Set HasFieldsEnclosedInQuotes = true on the TextFieldParser object before you start reading the file.
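To make the difference concrete, here is a small sketch that parses one line with the flag on and off. The class and method names (QuotedCsvDemo, CountFields) are made up for illustration, and the sample line is a simplified version of the data above without spaces after the commas. As far as I know the documented default for HasFieldsEnclosedInQuotes is true, so the problem mainly appears when it has been switched off explicitly.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvDemo
{
    // Parses a single CSV line and reports how many fields the parser produced.
    // Hypothetical helper for this sketch only.
    public static int CountFields(string csvLine, bool fieldsEnclosedInQuotes)
    {
        using (var parser = new TextFieldParser(new StringReader(csvLine)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = fieldsEnclosedInQuotes;
            return parser.ReadFields().Length;
        }
    }

    public static void Main()
    {
        // Three logical columns; the last one contains a comma inside quotes.
        string line = "\"Test1\",100,\"Test1,Test2\"";

        Console.WriteLine(CountFields(line, true));  // quotes respected: 3 fields
        Console.WriteLine(CountFields(line, false)); // split on every comma: 4 fields
    }
}
```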

Search and replace values in text file with C#

I have a text file with a certain format. First comes an identifier followed by three spaces and a colon. Then comes the value for this identifier.
ID1 :Value1
ID2 :Value2
ID3 :Value3
What I need to do is search for e.g. ID2 : and replace Value2 with a new value, NewValue2. What would be a way to do this? The files I need to parse won't get very large; the largest will be around 150 lines.

If the file isn't that big you can do a File.ReadAllLines to get a collection of all the lines and then replace the line you're looking for like this
using System.IO;
using System.Linq;
using System.Collections.Generic;
List<string> lines = new List<string>(File.ReadAllLines("file"));
int lineIndex = lines.FindIndex(line => line.StartsWith("ID2 :"));
if (lineIndex != -1)
{
lines[lineIndex] = "ID2 :NewValue2";
File.WriteAllLines("file", lines);
}

Here's a simple solution which also creates a backup of the source file automatically.
The replacements are stored in a Dictionary object. They are keyed on the line's ID, e.g. 'ID2' and the value is the string replacement required. Just use Add() to add more as required.
StreamWriter writer = null;
Dictionary<string, string> replacements = new Dictionary<string, string>();
replacements.Add("ID2", "NewValue2");
// ... further replacement entries ...
using (writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadLines("input.txt"))
{
bool replacementMade = false;
foreach (var replacement in replacements)
{
if (line.StartsWith(replacement.Key))
{
writer.WriteLine(string.Format("{0} :{1}",
replacement.Key, replacement.Value));
replacementMade = true;
break;
}
}
if (!replacementMade)
{
writer.WriteLine(line);
}
}
}
File.Replace("output.txt", "input.txt", "input.bak");
You'll just have to replace input.txt, output.txt and input.bak with the paths to your source, destination and backup files.

Ordinarily, for any text searching and replacement, I'd suggest some sort of regular expression work, but if this is all you're doing, that's really overkill.
I would just open the original file and a temporary file; read the original a line at a time, and just check each line for "ID2 :"; if you find it, write your replacement string to the temporary file, otherwise, just write what you read. When you've run out of source, close both, delete the original, and rename the temporary file to that of the original.
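The temporary-file approach described above might be sketched as follows. The names LineReplacer and ReplaceValue, and the ".tmp" suffix for the temporary file, are assumptions made for this example; the prefix uses the same "ID2 :" form as the rest of this page.

```csharp
using System;
using System.IO;

public static class LineReplacer
{
    // Copies the file line by line into a temporary file, swapping in a new
    // value for the line that starts with the given prefix, then replaces
    // the original file with the temporary one.
    public static void ReplaceValue(string path, string prefix, string newValue)
    {
        string tempPath = path + ".tmp"; // assumed naming scheme for the temp file

        using (var writer = new StreamWriter(tempPath))
        {
            foreach (string line in File.ReadLines(path))
            {
                bool isTarget = line.StartsWith(prefix, StringComparison.Ordinal);
                writer.WriteLine(isTarget ? prefix + newValue : line);
            }
        }

        // The reader and writer are both closed here, so the files can be swapped.
        File.Delete(path);
        File.Move(tempPath, path);
    }

    public static void Main()
    {
        // Demo on a throwaway file in the temp directory.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "ID1 :Value1", "ID2 :Value2", "ID3 :Value3" });

        ReplaceValue(path, "ID2 :", "NewValue2");

        Console.WriteLine(File.ReadAllText(path));
        File.Delete(path);
    }
}
```

Streaming through a temporary file keeps memory use flat, although for files of around 150 lines reading everything with File.ReadAllLines is just as reasonable.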

Something like this should work. It's very simple, not the most efficient thing, but for small files, it would be just fine:
private void setValue(string filePath, string key, string value)
{
string[] lines= File.ReadAllLines(filePath);
for(int x = 0; x < lines.Length; x++)
{
string[] fields = lines[x].Split(':');
if (fields[0].TrimEnd() == key)
{
lines[x] = fields[0] + ':' + value;
File.WriteAllLines(filePath, lines);
break;
}
}
}

You can use regex and do it in 3 lines of code
string text = File.ReadAllText("sourcefile.txt");
text = Regex.Replace(text, @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)", "NewValue2",
RegexOptions.Multiline);
File.WriteAllText("outputfile.txt", text);
In the regex, (?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$) means: find a line that starts with id2 (case-insensitively), followed by any amount of whitespace before and after the colon, and replace the word that follows (alphanumeric characters only, no punctuation) up to the end of the line. If you want to include punctuation, replace \w*? with .*?

You can use regexes to achieve this.
Regex re = new Regex(@"^ID\d+ :Value(\d+)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] lines = File.ReadAllLines("mytextfile"); // ReadAllLines returns string[], not List<string>
foreach (string line in lines) {
string replaced = re.Replace(line, processMatch);
//Now do whatever you need to do with the value
}
string processMatch(Match m)
{
var number = m.Groups[1];
return String.Format("ID{0} :NewValue{0}", number);
}

How to read different types of data from text file?

I need to read text data from file, where there are different types of data in every line.
So I created one big class named subject. My data looks something like this:
Subject name M1 M2 M3 M4
Subject1 5 7 8 3
Old Subject 1 2 5 9
The main question is, if it is possible to read all the data in line 1 for instance and assign it to proper fields, like SubjName = Subject1, M1 = 5, M2 = 7, M3 = 8 and so on, WITHOUT using substrings? (something like stream >> Subject.SubjName; stream >> Subject.M1 = 5 and so on in C++).
Here's the code that I have.
internal void Read()
{
TextReader tr = new StreamReader("Data.txt");
string line;
tr.ReadLine(); // skip the header line
while ((line = tr.ReadLine()) != null) // read until the end of the file
{
// each data line needs to be assigned to the subject's fields here
}
}
Thanks in advance
EDIT: To clarify, I'd prefer that fields are delimited.

Something like the solution in this question might help, but you would obviously use a tab delimiter (\t) instead:
CSV to object model mapping
from line in File.ReadAllLines(fileName).Skip(1)
let columns = line.Split(',')
select new
{
Plant = columns[0],
Material = int.Parse(columns[1]),
Density = float.Parse(columns[2]),
StorageLocation = int.Parse(columns[3])
}

It is not clear from your question how the records are stored in the file - whether fields are delimited or fixed length.
Regardless - you can use the TextFieldParser class, which:
Provides methods and properties for parsing structured text files.
It lives in the Microsoft.VisualBasic.FileIO namespace in the Microsoft.VisualBasic.dll assembly.
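As a rough illustration of that suggestion, the sketch below reads tab-delimited records into objects with TextFieldParser. It assumes the fields are separated by tabs (the question only says they are delimited), and the Subject class layout and the SubjectReader name are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Assumed shape of the asker's class, based on the field names in the question.
public class Subject
{
    public string SubjName;
    public int M1, M2, M3, M4;
}

public static class SubjectReader
{
    // Reads tab-delimited subject records, skipping the header row.
    public static List<Subject> Parse(TextReader reader)
    {
        var subjects = new List<Subject>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t"); // assumption: tab is the delimiter

            if (!parser.EndOfData)
                parser.ReadFields(); // discard the header line

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                subjects.Add(new Subject
                {
                    SubjName = fields[0],
                    M1 = int.Parse(fields[1]),
                    M2 = int.Parse(fields[2]),
                    M3 = int.Parse(fields[3]),
                    M4 = int.Parse(fields[4])
                });
            }
        }
        return subjects;
    }

    public static void Main()
    {
        string data = "Subject name\tM1\tM2\tM3\tM4\n" +
                      "Subject1\t5\t7\t8\t3\n" +
                      "Old Subject\t1\t2\t5\t9";

        foreach (Subject s in Parse(new StringReader(data)))
            Console.WriteLine("{0}: {1} {2} {3} {4}", s.SubjName, s.M1, s.M2, s.M3, s.M4);
    }
}
```

With a real file, new StreamReader("Data.txt") can be passed in place of the StringReader. A tab delimiter also lets names such as "Old Subject" keep their internal space, which splitting on spaces would not.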

Split and Dictionary are your two tools of choice here. You read in your line, split it on spaces, then save each piece as a name/object pair in a dictionary.
Put the code below into a *.cs file, then build and run it as a demo:
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Collections;
namespace stringsAsObjects
{
class stringObject
{
public static int Main(string[] args)
{
int counter = 0;
string line;
// Read the file and display it line by line.
System.IO.StreamReader file =
new System.IO.StreamReader("Data.txt");
string nameLine = file.ReadLine();
string valueLine = file.ReadLine();
file.Close();
string[] varNames = nameLine.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
string[] varValues = valueLine.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, object> map = new Dictionary<string, object>();
for(int i = 0; i<varNames.Length; i++)
{
try
{
map[varNames[i]] = varValues[i];
}
catch (Exception ex)
{
map[varNames[i]] = null;
}
}
foreach (object de in map)
{
System.Console.WriteLine(de);
}
Console.ReadKey();
return 0;
}
}
}

Removing duplicate substrings within a string in C#

How can I remove duplicate substrings within a string? For instance, if I have a string like smith:rodgers:someone:smith:white, how can I get a new string that has the extra smith removed, like smith:rodgers:someone:white? Also, I'd like to keep the colons even though they are duplicated.
Many thanks

string input = "smith:rodgers:someone:smith:white";
string output = string.Join(":", input.Split(':').Distinct().ToArray());
Of course this code assumes that you're only looking for duplicate "field" values. That won't remove "smithsmith" in the following string:
"smith:rodgers:someone:smithsmith:white"
It would be possible to write an algorithm to do that, but quite difficult to make it efficient...

Something like this:
string withoutDuplicates = String.Join(":", myString.Split(':').Distinct().ToArray());

Assuming the format of that string:
var theString = "smith:rodgers:someone:smith:white";
var subStrings = theString.Split(new char[] { ':' });
var uniqueEntries = new List<string>();
foreach(var item in subStrings)
{
if (!uniqueEntries.Contains(item))
{
uniqueEntries.Add(item);
}
}
var uniquifiedStringBuilder = new StringBuilder();
foreach(var item in uniqueEntries)
{
uniquifiedStringBuilder.AppendFormat("{0}:", item);
}
var uniqueString = uniquifiedStringBuilder.ToString().Substring(0, uniquifiedStringBuilder.Length - 1);
This is rather long-winded, but it shows the process of getting from one to the other.

Not sure why you want to keep the duplicate colons. If you are expecting the output to be "smith:rodgers:someone::white", try this code:
public static string RemoveDuplicates(string input)
{
string output = string.Empty;
System.Collections.Specialized.StringCollection unique = new System.Collections.Specialized.StringCollection();
string[] parts = input.Split(':');
foreach (string part in parts)
{
output += ":";
if (!unique.Contains(part))
{
unique.Add(part);
output += part;
}
}
output = output.Substring(1);
return output;
}
Of course, I've not checked for null input, but I'm sure you'll do it ;)
