I'm trying to cycle through a .txt to build a test function for another application I'm building.
I've got a list of UK based lat/long values that are formatted like this:
Latitude: 57°39′55″N 57.665198
Longitude: 6°57′27″W -6.95739395
Distance: 184.8338 mi Bearing: 329.815°
with the intended result of this small application being just the lat/long values:
57.665198
-6.95739395
So far I've got a StreamReader working with a myString.StartsWith("Latitude") {} but I'm stuck.
How do I detect a separator of two spaces ("  ") inside a string and delete everything before it? My code so far is this:
static void Main(string[] args)
{
string text = "";
using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
{
text = streamReader.ReadToEnd();
if (text.Trim().StartsWith("Latitude: "))
{
text.Split()
} else if (text.StartsWith("Distance: "))
{
} else if (text.StartsWith(""))
{
}
streamReader.ReadLine();
}
Console.ReadKey();
}
Thanks in advance
You can try using regular expressions
var result = File
.ReadLines(@"C:\MyFile.txt")
.SelectMany(line => Regex
.Matches(line, @"(?<=\s)-?[0-9]+(\.[0-9]+)*$")
.OfType<Match>()
.Select(match => match.Value));
Test
// 57.665198
// -6.95739395
Console.Write(String.Join(Environment.NewLine, result));
Use string.IndexOf("  ") to find the position of the two spaces in the string, then use string.Substring(position) to get the text after that point.
In your code:
if (text.Trim().StartsWith("Latitude: "))
{
var positionOfTwoSpaces = text.IndexOf("  ");
var latString = text.Substring(positionOfTwoSpaces);
var latValue = float.Parse(latString);
}
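The IndexOf/Substring idea above can be sketched as a small self-contained helper. This is only an illustration, assuming the decimal value always follows the last double space on the line. The names LatLongParser and ExtractDecimal are hypothetical, and double is used instead of float so that no digits are lost.

```csharp
using System;
using System.Globalization;

public static class LatLongParser
{
    // Hypothetical helper: returns the decimal value that follows the last
    // double space on a line, or null when the line has no double space or
    // the trailing text is not a number.
    public static double? ExtractDecimal(string line)
    {
        int position = line.LastIndexOf("  ", StringComparison.Ordinal);
        if (position < 0)
        {
            return null;
        }

        // Skip the two spaces and parse with the invariant culture so that
        // '.' is always treated as the decimal separator.
        string tail = line.Substring(position + 2).Trim();
        double value;
        bool ok = double.TryParse(tail, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
        return ok ? value : (double?)null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractDecimal("Latitude: 57°39′55″N  57.665198"));
        Console.WriteLine(ExtractDecimal("Longitude: 6°57′27″W  -6.95739395"));
    }
}
```

Returning null instead of throwing means lines such as the "Distance:" one can simply be skipped by the caller.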
You can try the regular expression solution. (You might need to adjust the number of spaces in the regex definitions.)
static void Main(string[] args)
{
string text = "";
Regex lat = new Regex("Latitude: .+? (.+)");
Regex lon = new Regex("Longitude: .+? (.+)");
using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
{
string line;
while ((line = streamReader.ReadLine()) != null)
{
if (lat.IsMatch(line))
Console.WriteLine(lat.Match(line).Groups[1].Value.Trim()); // latitude
else if (lon.IsMatch(line))
Console.WriteLine(lon.Match(line).Groups[1].Value.Trim()); // longitude
}
}
Console.ReadKey();
}
A simple solution would be
string[] fileLines = System.IO.File.ReadAllLines("input file path");
List<string> resultLines = new List<string>();
foreach (string line in fileLines) {
string[] parts = line.Split(new[] { "  " }, StringSplitOptions.None); // split on a double space
if (parts.Count() > 1) {
string lastPart = parts.LastOrDefault();
if (!string.IsNullOrEmpty(lastPart)) {
resultLines.Add(lastPart);
}
}
}
System.IO.File.WriteAllLines("output file path", resultLines.ToArray());
As I already suggested in my comment. You can look for the last occurrence of the space and substring from there.
using System;
using System.IO;
using System.Text;
public class Test
{
public static void Main()
{
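String line = String.Empty;
// The reader was not declared in the original snippet; the path is the one from the question.
using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
{
while(!String.IsNullOrEmpty((line = streamReader.ReadLine())))
{
if(line.StartsWith("Latitude:"))
{
line = line.Substring(line.LastIndexOf(' ') + 1);
Console.WriteLine(line);
}
}
}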
Console.ReadKey();
}
}
Working example.
I didn't provide all the code because the longitude case is just a copy and paste of this. I think you can do that on your own. :)
string lot = "RU644276G01";
var year = "201" + lot.Substring(2, 1);
var folder = @"\\sinsdn38.ap.infineon.com\ArchView\03_Reports\" + year +
@"\" + lot.Substring(3, 2) + @"\" + lot.Substring(0,8) + @"\";
DirectoryInfo di = new DirectoryInfo(folder);
foreach (var fi in di.GetFiles("*.TLT"))
{
var file = fi.FullName;
string line;
using (StreamReader sr = new StreamReader(file))
{
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("TEST-END"))
{
timeStampTextBox.Text = line;
}
}
}
This is my code currently.
I want to read a specific line (for example line 8) that starts with "TEST-END". However, line 8 contains all of this:
"TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193"
but I only want to read "2017-01-08 15:51".
How do I change my code to get that? Currently I'm getting the whole line instead of the specific timestamp that I want.
Edit
How do I change the code so that string lot = " " can hold any value? It does not need to be RU644276G01; it can be a different number typed in by the user. I have created a textbox for users to enter the number.
You need to extract the text. The line follows a quite regular pattern, so regular expressions should be able to help:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
Regex re = new Regex(@"^(?:TEST-END : )(.*?\d{4}-\d{2}-\d{2} \d{2}:\d{2})");
var match = re.Match(line);
Console.WriteLine(match.Groups[1]);
Console.ReadLine(); // leave console open
}
}
Output:
2017-01-08 15:51 // this is group 1, group 0 is the full capture including TEST-END :
Use this to check it in regexr: https://regexr.com/3l1sf. If you hover over the text, it will display your capturing groups.
The regex means:
^ start of the string
(?:TEST-END : ) non capturing group, text must be present
( a group
.*? as few (0-n) anythings as possible
\d{4}-\d{2}-\d{2} \d{2}:\d{2} 4 digits-2 digits-2digits 2digits:2digits
) end of group
More about regular expressions:
RegEx-Class
a regex Tester (one of many, the one I use): https://regexr.com/
Here is my answer using Regular Expressions.
if (line.StartsWith("TEST-END"))
{
Regex re = new Regex(@"\d{4}-\d{2}-\d{2} \d{2}:\d{2}");
var match = re.Match(line);
if(match.Success)
{
timeStampTextBox.Text = match.Value;
}
}
Output: 2017-01-08 15:51
You can split the line on ":", like this:
var value = line.Split(':');
and get your date like this:
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
The statement above means
date = " 2017-01-08 15" + ":" + "51  ", trimmed to "2017-01-08 15:51"
if (line.StartsWith("TEST-END"))
{
var value = line.Split(':');
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
timeStampTextBox.Text = date;
}
This is not the best answer; it depends on the line having exactly the format you gave.
I finally got all three parameters out of the last line
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, string> dict = new Dictionary<string, string>();
string pattern = @"(?'name'[^\s]+)\s:\s(?'value'[\w\s\-]*|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})";
string line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
MatchCollection matches = Regex.Matches(line, pattern, RegexOptions.RightToLeft);
foreach (Match match in matches)
{
Console.WriteLine("name : '{0}', value : '{1}'", match.Groups["name"].Value, match.Groups["value"].Value);
dict.Add(match.Groups["name"].Value, match.Groups["value"].Value);
}
DateTime date = DateTime.Parse(dict["TEST-END"]);
Console.ReadLine();
}
}
}
I have a string as in the following format:
"one,",2,3,"four " ","five"
I need output in the following format:
one,
2
3
four "
five
Can anyone help me to create Regex for the above?
You can do this without Regex. It's not clear to me, what you're trying to do though. I've adjusted the code for the updated question:
var text = "\"one\",2,3,\"four \"\",\"five\"";
var collection = text
.Split(',')
.Select(s =>
{
if (s.StartsWith("\"") && s.EndsWith("\""))
{
s = s.Substring(1, s.Length - 2);
}
return s;
})
.ToList();
foreach (var item in collection)
{
Console.WriteLine(item);
}
I've added another sample for you, which uses a CSV reader. I've installed the "CsvHelper" package from NuGet:
const string text = "\"one,\",2,3,\"four \"\"\",\"five\"";
using (var textReader = new StringReader(text))
using (var reader = new CsvReader(textReader))
{
reader.Configuration.Delimiter = ',';
reader.Configuration.AllowComments = false;
reader.Configuration.HasHeaderRecord = false;
if (reader.Read())
{
foreach (var item in reader.CurrentRecord)
{
Console.WriteLine(item);
}
}
}
string newString = Regex.Replace(oldString, @"[^"",]", " ");
I hope the regular expression is right, but I just want you to see the idea.
EDIT:
string newString = Regex.Replace(oldString, @"[^"",]", "\n");
So here is my problem: I'm trying to get the content of a text file as a string and then parse it. What I want is an array containing each word and only words (no blanks, no \n, ...). I use a function LireFichier that returns a string containing the text from the file (this works, because the text is displayed correctly), but when I try to parse it, the parsing fails and seems to concatenate parts of my string at random, and I don't understand why.
Here is the content of the text file I'm using :
truc,
ohoh,
toto, tata, titi, tutu,
tete,
and here's my final string :
;tete;;titi;;tata;;titi;;tutu;
which should be:
truc;ohoh;toto;tata;titi;tutu;tete;
Here is the code I wrote (all using are ok):
namespace ConsoleApplication1{
class Program
{
static void Main(string[] args)
{
string chemin = "MYPATH";
string res = LireFichier(chemin);
Console.WriteLine("End of reading...");
Console.WriteLine("{0}",res);// The result at this point is good
Console.WriteLine("...starting parsing");
res = parseString(res);
Console.WriteLine("Chaine finale : {0}", res);//The result here is awful
Console.ReadLine();//pause
}
public static string LireFichier(string FilePath) //Read the file, send back a string with the text
{
StreamReader streamReader = new StreamReader(FilePath);
string text = streamReader.ReadToEnd();
streamReader.Close();
return text;
}
public static string parseString(string phrase)//is supposed to parse the string
{
string fin="\n";
char[] delimiterChars = { ' ','\n',',','\0'};
string[] words = phrase.Split(delimiterChars);
TabToString(words);//I check the content of my tab
for(int i=0;i<words.Length;i++)
{
if (words[i] != null)
{
fin += words[i] +";";
Console.WriteLine(fin);//help for debug
}
}
return fin;
}
public static void TabToString(string[] montab)//display the content of my tab
{
foreach(string s in montab)
{
Console.WriteLine(s);
}
}
}//Fin de la class Program
}
I think your main issue is the empty entries that Split produces. This should fix it:
string[] words = phrase.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You could try using the string splitting option to remove empty entries for you:
string[] words = phrase.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
See the documentation here.
Try this:
class Program
{
static void Main(string[] args)
{
var inString = LireFichier(@"C:\temp\file.txt");
Console.WriteLine(ParseString(inString));
Console.ReadKey();
}
public static string LireFichier(string FilePath) //Read the file, send back a string with the text
{
using (StreamReader streamReader = new StreamReader(FilePath))
{
string text = streamReader.ReadToEnd();
streamReader.Close();
return text;
}
}
public static string ParseString(string input)
{
input = input.Replace(Environment.NewLine,string.Empty);
input = input.Replace(" ", string.Empty);
string[] chunks = input.Split(',');
StringBuilder sb = new StringBuilder();
foreach (string s in chunks)
{
sb.Append(s);
sb.Append(";");
}
return sb.ToString(0, sb.ToString().Length - 1);
}
}
Or this:
public static string ParseFile(string FilePath)
{
using (var streamReader = new StreamReader(FilePath))
{
return streamReader.ReadToEnd().Replace(Environment.NewLine, string.Empty).Replace(" ", string.Empty).Replace(',', ';');
}
}
Your main problem is that you are splitting on \n, but the linebreaks read from your file are \r\n.
Your output string does contain all of your items, but the \r characters left in it cause later "lines" to overwrite earlier "lines" on the console.
(\r is a "return to start of line" instruction; without the \n "move to the next line" instruction your words from line 1 are being overwritten by those in line 2, then line 3 and line 4.)
In addition to splitting on \r as well as \n, you need to check that a string is not null or empty before adding it to your output (or, preferably, use StringSplitOptions.RemoveEmptyEntries as others have mentioned).
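As a minimal sketch of that fix, the snippet below splits on both \r and \n (plus spaces and commas) and drops empty entries. The class and method names (WordJoiner, JoinWords) are made up for illustration, and the sample string assumes the file uses Windows line endings as described above.

```csharp
using System;

public static class WordJoiner
{
    // Hypothetical helper: splits on spaces, commas, '\r' and '\n', drops
    // empty entries, and joins the words with a ';' after each one.
    public static string JoinWords(string text)
    {
        char[] delimiters = { ' ', ',', '\r', '\n' };
        string[] words = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        return words.Length == 0 ? string.Empty : string.Join(";", words) + ";";
    }

    public static void Main()
    {
        // Same content as the sample file, with \r\n line breaks.
        string sample = "truc,\r\nohoh,\r\ntoto, tata, titi, tutu,\r\ntete,\r\n";
        Console.WriteLine(JoinWords(sample));
        // prints: truc;ohoh;toto;tata;titi;tutu;tete;
    }
}
```

Because '\r' is now a delimiter, no carriage returns survive into the output, so nothing is overwritten on the console.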
string ParseString(string filename) {
return string.Join(";", System.IO.File.ReadAllLines(filename).Where(x => x.Length > 0).Select(x => string.Join(";", x.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Select(y => y.Trim()))).Select(z => z.Trim())) + ";";
}
I am working on an ASP.NET 4.0 web application. Its main goal is to go to the URL in the MyURL variable, read it from top to bottom, search for all lines that start with "description", and keep only those while removing all HTML tags. What I want to do next is remove the "description" text from the results afterwards so that only my device names are left. How would I do this?
protected void parseButton_Click(object sender, EventArgs e)
{
MyURL = deviceCombo.Text;
WebRequest objRequest = HttpWebRequest.Create(MyURL);
objRequest.Credentials = CredentialCache.DefaultCredentials;
using (StreamReader objReader = new StreamReader(objRequest.GetResponse().GetResponseStream()))
{
originalText.Text = objReader.ReadToEnd();
}
//Read all lines of file
String[] crString = { "<BR> " };
String[] aLines = originalText.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
String noHtml = String.Empty;
for (int x = 0; x < aLines.Length; x++)
{
if (aLines[x].Contains(filterCombo.SelectedValue))
{
noHtml += (RemoveHTML(aLines[x]) + "\r\n");
}
}
//Print results to textbox
resultsBox.Text = String.Join(Environment.NewLine, noHtml);
}
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
Ok so I figured out how to remove the words through one of my existing functions:
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n").Replace("description", "").Replace("INFRA:CORE:", "")
.Replace("RESERVED", "")
.Replace(":", "")
.Replace(";", "")
.Replace("-0/3/0", "");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
public static void Main(String[] args)
{
string str = "He is driving a red car.";
Console.WriteLine(str.Replace("red", "").Replace("  ", " "));
}
Output:
He is driving a car.
Note: In the second Replace, the first argument is a double space.
Link : https://i.stack.imgur.com/rbluf.png
Try this. It will remove all occurrences of the word that you want to remove.
Try something like this, using LINQ:
List<string> lines = new List<string>{
"Hello world",
"Description: foo",
"Garbage:baz",
"description purple"};
//now add all your lines from your html doc.
if (aLines[x].Contains(filterCombo.SelectedValue))
{
lines.Add(RemoveHTML(aLines[x]) + "\r\n");
}
var myDescriptions = lines.Where(x=>x.ToLower().StartsWith("description"))
.Select(x=> x.ToLower().Replace("description",string.Empty)
.Trim());
// you now have "foo" and "purple", and anything else.
You may have to adjust for colons, etc.
void Main()
{
string test = "<html>wowzers description: none <div>description:a1fj391</div></html>";
IEnumerable<string> results = getDescriptions(test);
foreach (string result in results)
{
Console.WriteLine(result);
}
//result: none
// a1fj391
}
static Regex MyRegex = new Regex(
"description:\\s*(?<value>[\\d\\w]+)",
RegexOptions.Compiled);
IEnumerable<string> getDescriptions(string html)
{
foreach(Match match in MyRegex.Matches(html))
{
yield return match.Groups["value"].Value;
}
}
Adapted From Code Project
string value = "ABC - UPDATED";
int index = value.IndexOf(" - UPDATED");
if (index != -1)
{
value = value.Remove(index);
}
It will print ABC without - UPDATED
I am looking for a way to check if the "foo" word is present in a text file using C#.
I may use a regular expression, but I'm not sure that is going to work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments ?
What's wrong with a simple search?
If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method), and use string Contains()
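A rough sketch of that simple approach, assuming the file fits in memory and that a word broken across two lines is simply continued on the next line. The helper name ContainsIgnoringLineBreaks is hypothetical, and the file path is assumed to be passed as the first command-line argument.

```csharp
using System;
using System.IO;

public static class SimpleSearch
{
    // Hypothetical helper: removes line breaks so a word broken across two
    // lines is rejoined, then does a plain ordinal substring search.
    public static bool ContainsIgnoringLineBreaks(string content, string word)
    {
        string joined = content.Replace("\r", string.Empty).Replace("\n", string.Empty);
        return joined.IndexOf(word, StringComparison.Ordinal) >= 0;
    }

    public static void Main(string[] args)
    {
        // Assumption: the file path is passed as the first argument.
        if (args.Length > 0 && File.Exists(args[0]))
        {
            string content = File.ReadAllText(args[0]);
            Console.WriteLine(ContainsIgnoringLineBreaks(content, "foo"));
        }
    }
}
```

Note that removing line breaks also glues the last word of one line to the first word of the next, so this can report a match that is not a real word in the file.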
Here you go. We look at each line as we read the file, keep track of the previous line's last word combined with the current line's first word, and check whether that combination or the line itself matches your pattern.
string pattern = "foo";
string input;
string lastword = string.Empty;
bool result = false;
using (StreamReader SR = new StreamReader("File name and path"))
{
while ((input = SR.ReadLine()) != null)
{
string trimmed = input.Trim();
// First word of this line (the whole line if it contains no space)
int firstSpace = trimmed.IndexOf(' ');
string firstword = firstSpace < 0 ? trimmed : trimmed.Substring(0, firstSpace);
// Join the previous line's last word to this line's first word to catch a word split across lines
string joined = lastword + firstword;
if (joined == pattern || Regex.IsMatch(input, pattern)) { result = true; }
// Last word of this line (the whole line if it contains no space)
int lastSpace = trimmed.LastIndexOf(' ');
lastword = lastSpace < 0 ? trimmed : trimmed.Substring(lastSpace + 1);
}
}
Here is a quick quick example using LINQ
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}
What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.
Regex r = new Regex(@"\bfoo\b");
Also ensure you are taking into consideration case insensitivity if you need to.
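A small sketch of both points, word boundaries and case insensitivity, is shown here. The class and method names (WordSearch, ContainsFoo) are invented for the example. The pattern is written as a verbatim string, because in a regular C# string literal "\b" is a backspace character rather than a regex word boundary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSearch
{
    // \b must reach the regex engine as backslash + b, hence the @"..." string.
    private static readonly Regex FooWord = new Regex(@"\bfoo\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: true when "foo" appears as a whole word, in any casing.
    public static bool ContainsFoo(string text)
    {
        return FooWord.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsFoo("a Foo here"));   // True
        Console.WriteLine(ContainsFoo("football"));     // False
        Console.WriteLine(ContainsFoo("fool"));         // False
    }
}
```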
You don't need regular expressions in a case this simple. Simply loop over the lines and check whether each one contains foo.
using (StreamReader sr = File.OpenText("filename"))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}
You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = substring[i].ToString();
}
string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}