How would I access a txt file and split the links - c#

Alright, I have a program that grabs links off of a website and puts them into a txt file, but the links aren't separated onto their own lines, and I need to do that without doing it manually. Here is the code used to grab the links off of the website, write them to a text file, then read the txt file back.
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
var client = new WebClient();
string text = client.DownloadString("https://currentlinks.com");
File.WriteAllText("C:/ProgramData/oof.txt", text);
string searchKeyword = "https://foobar.to/showthread.php";
string fileName = "C:/ProgramData/oof.txt";
string[] textLines = File.ReadAllLines(fileName);
List<string> results = new List<string>();
foreach (string line in textLines)
{
if (line.Contains(searchKeyword))
{
results.Add(line);
}
var sb = new StringBuilder();
foreach (var item in results)
{
sb.Append(item);
}
textBox1.Text = sb.ToString();
var parsed = textBox1;
TextWriter tw = new StreamWriter("C:/ProgramData/parsed.txt");
// write lines of text to the file
tw.WriteLine(parsed);
// close the stream
tw.Close();
}
}

You are getting all the links (URLs) in one single string. There is no straightforward way to extract the URLs individually without making some assumptions.
With the sample data you shared, I assume the URLs follow a simple format: each one starts with http, and no URL contains another http inside it.
With those assumptions, I suggest the following code.
// Sample data as shared by the OP
string data = "https://forum.to/showthread.php?tid=22305https://forum.to/showthread.php?tid=22405https://forum.to/showthread.php?tid=22318";
//Splitting the string by string `http`
var items = data.Split(new [] {"http"},StringSplitOptions.RemoveEmptyEntries).ToList();
//At this point all the strings in items collection will be without "http" at the start.
//So they will look like as following.
// s://forum.to/showthread.php?tid=22305
// s://forum.to/showthread.php?tid=22405
// s://forum.to/showthread.php?tid=22318
//So we need to add "http" at the start of each of the item as following.
items = items.Select(i => "http" + i).ToList();
// After this they will become like following.
// https://forum.to/showthread.php?tid=22305
// https://forum.to/showthread.php?tid=22405
// https://forum.to/showthread.php?tid=22318
//Now we need to create a single string with newline character between two items so
//that they represent a single line individually.
var text = String.Join("\r\n", items);
// Then write the text to the file.
File.WriteAllText("C:/ProgramData/oof.txt", text);
This should help you resolve your issue.

.Split way
Could you use yourString.Split("https://");?
Example:
//This simple example assumes that all links are https (not http)
string contents = "https://www.example.com/dogs/poodles/poodle1.htmlhttps://www.example.com/dogs/poodles/poodle2.html";
const string Prefix = "https://";
var linksWithoutPrefix = contents.Split(Prefix, StringSplitOptions.RemoveEmptyEntries);
//using System.Linq
var linksWithPrefix = linksWithoutPrefix.Select(l => Prefix + l);
foreach (var match in linksWithPrefix)
{
Console.WriteLine(match);
}
Regex way
Another option is to use a regular expression.
This attempt fails: I could not find or write the right regex before I had to go.
string contents = "http://www.example.com/dogs/poodles/poodle1.htmlhttp://www.example.com/dogs/poodles/poodle2.html";
//From https://regexr.com/
var rgx = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");
var matches = rgx.Matches(contents);
foreach(var match in matches )
{
Console.WriteLine(match);
}
//This finds 'http://www.example.com/dogs/poodles/poodle1.htmlhttp' (note the htmlhttp at the end)
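One pattern that does split the sample string is a zero-width lookahead on the scheme, so the split happens just before each http:// or https:// and nothing is consumed. This is only a sketch: the class and method names (LinkSplitter, SplitLinks) are made up for illustration, and it assumes, like the .Split approach above, that no URL contains another http:// or https:// inside it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkSplitter
{
    // Splits a run of concatenated URLs at every position where
    // "http://" or "https://" begins. The lookahead (?=...) is zero-width,
    // so the scheme stays attached to each URL.
    public static string[] SplitLinks(string contents)
    {
        return Regex.Split(contents, @"(?=https?://)")
                    .Where(part => part.Length > 0) // drop any empty piece before the first URL
                    .ToArray();
    }
}
```

Calling File.WriteAllLines("C:/ProgramData/parsed.txt", LinkSplitter.SplitLinks(text)) would then put each link on its own line.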

Related

Remove all lines from a .txt file, except for lines beginning with specific word

I am attempting to create a program that will read all lines from a text file and remove all text, except for the lines beginning with 'Line 1:, Line 2:, Line 3:' etc.
UPDATE
Thank you for all your suggestions. Here is the final working code:
//PROCEDURE
private void Procedure()
{
// READ AND APPEND LINES
var file_path = @"Tags.txt";
var sb = new StringBuilder();
foreach (var line in File.ReadLines(file_path))
{
if (Regex.IsMatch(line, @"^Line\s+[0-9]+:") || Regex.IsMatch(line, @"^Zeile\s+[0-9]+:") || Regex.IsMatch(line, @"^Linea\s+[0-9]+:"))
{
sb.AppendLine(line);
}
}
// SAVE BACK
File.WriteAllText(file_path, sb.ToString());
}
private void btnRefine_Click(object sender, RoutedEventArgs e)
{
Procedure();
}
Any improvements to the code are always welcome.
void ProcessFile()
{
var file_path = @"Tags.txt";
var sb = new StringBuilder();
foreach (var line in File.ReadLines(file_path))
{
// keep only the lines that start with "Line <number>:"
if (Regex.IsMatch(line, @"^Line\s+[0-9]+:"))
{
sb.AppendLine(line);
}
}
// Save back
File.WriteAllText(file_path, sb.ToString());
}
UPDATE
You could use LINQ instead. Then the previous code will look like this:
void ProcessFile()
{
var file_path = @"Tags.txt";
// ReadAllLines loads the whole file first, so it is closed before WriteAllLines reopens it
File.WriteAllLines(file_path, File.ReadAllLines(file_path).Where(line => Regex.IsMatch(line, @"^Line\s+[0-9]+:")));
}
I would make use of File.ReadAllLines and File.WriteAllLines to do the file IO. They are convenient in that they allow you to easily use LINQ-style operations on all the lines of a file. This comes at the cost of the entire file being read into memory -- which may not be practical for a file that is many GB in size.
The LINQ Where clause would allow you to filter lines according to a predicate of your choosing.
The criterion for keeping a line is that it starts with your Line 123: pattern. That can be expressed with a regular expression like ^Line\s+\d+:, which calls for the line to begin with Line, followed by some whitespace, some digits, then a colon. Regex.IsMatch lets you test whether each line matches the regular expression.
Here's a one-liner:
File.WriteAllLines("output.txt", File.ReadAllLines("input.txt")
.Where(line => Regex.IsMatch(line, @"^Line\s+\d+:")));
After getting all lines as a List you can simply use RemoveAll to remove lines like this,
List<string> lines = new List<string> (File.ReadAllLines("Tags.txt"));
lines.RemoveAll(line => !Regex.IsMatch(line, @"^Line\s+\d+:"));
using (StreamWriter fw = new StreamWriter(new FileStream("TagsNew.txt", FileMode.CreateNew, FileAccess.Write)))
{
foreach (string line in lines)
{
fw.WriteLine(line);
}
}
Hope this helps.

Importing .csv file in to listview

I'm trying to load a .csv file into a listview:
ofDialog.Filter = @"CSV Files|*.csv";
ofDialog.Title = @"Select your backlink file...";
ofDialog.FileName = "backlinks.csv";
// is cancel pressed?
if (ofDialog.ShowDialog() == DialogResult.Cancel)
return;
try
{
string filename = ofDialog.FileName;
var lines = File.ReadAllLines(filename);
foreach (string line in lines)
{
var parts = line.Split(' ');
ListViewItem lvi = new ListViewItem(parts[0]);
lvi.SubItems.Add(parts[1]);
listViewMain.Items.Add(lvi);
}
// update count
Helpers.returnMessage(File.ReadAllLines(ofDialog.FileName).Count() + " rows imported.");
}
catch (Exception ex)
{
Helpers.returnMessage(ex.Message);
}
The csv contents looks like:
URL Rating Domain Rating IP From Referring Page URL Referring Page Title Internal Links Count External Links Count Link URL TextPre Link Anchor TextPost Size Type NoFollow Site-wide Image Encoding Alt First Seen Previous Visited Last Check Original
24 89 91.198.174.192 http://en.wikipedia.org/wiki/Humbug_(sweet) "Humbug (sweet) - Wikipedia, the free encyclopedia" 118 16 http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg 12163 href True False False utf8 2013-09-08T15:14:50Z 2015-03-11T01:48:40Z 2015-03-11T01:48:40Z True
There is no "," delimiter like in regular .csv files, and the amount of whitespace between fields varies. I'm stuck on the best way to split each section and add it to the listview; I have a mental block lol
any help would be appreciated :)
cheers guys
Graham
For opening the CSV file, I would first check it is not a tab separated file, where you can use \t as the delimiter to read the file in a similar method as you are.
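A minimal sketch of the tab-delimited approach, assuming the export really is tab-separated (the question does not confirm this). BacklinkParser and ParseLine are hypothetical names used only for illustration; the quote trimming is there because the page-title column in the sample is wrapped in double quotes.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class BacklinkParser
{
    // Splits one line of a tab-separated export into its fields and
    // strips surrounding whitespace and double quotes from each field.
    public static string[] ParseLine(string line)
    {
        string[] fields = line.Split('\t');
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Trim().Trim('"');
        }
        return fields;
    }
}
```

Inside the existing foreach loop, var parts = BacklinkParser.ParseLine(line); would replace line.Split(' '), and every remaining entry of parts can be added with lvi.SubItems.Add.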
Failing this you could use a (very long and complicated) regex string to match the different "columns" as different parts. The regex string would look something like:
\s+([0-9]*)\s+([0-9]*)\s+([0-9]*.[0-9]*.[0-9]*.[0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+(\"[a-zA-Z0-9 \-\(\),]*\")\s+([0-9]*)\s+([0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+([a-zA-Z:\/._\(\)]*)\s+([0-9]*)\s+([a-zA-Z]*)\s+(True|False)\s+(True|False)\s+(True|False)\s+([a-z0-9]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+(True|False)
This would return each column as a different group, which you can access as detailed below:
var regex = new Regex(regexString);
foreach(var line in lines)
{
var match = regex.Match(line);
// Groups[0] is the whole match; the captured columns start at index 1
var urlRating = match.Groups[1].Value;
var domainRating = match.Groups[2].Value;
var ip = match.Groups[3].Value;
// ...
}
You can see more about the regex string I have created (and possibly simplify it/extend it for the additional lines) here: https://regex101.com/r/oN4tW3/1
For more on C# regex look here: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx
Edit: I would avoid the regex method if the file is tab separated, as the regex is more complex and fragile.

Verifying and parsing csv to 2D array in C# Visual Studios

Just trying out C# to make a button that loads csv files verify them and parse them:
protected void Upload_Btn_Click(object sender, EventArgs e)
{
string test = PNLdataLoader.FileName;
//checks if file is csv
Regex regex = new Regex("*.csv");
Match match = regex.Match(test);
if (match.Success)
{
string CSVFileAsString = System.Text.Encoding.ASCII.GetString(PNLdataLoader.FileBytes);
System.IO.MemoryStream MS = new System.IO.MemoryStream(PNLdataLoader.FileBytes);
System.IO.StreamReader SR = new System.IO.StreamReader(MS);
//Store each line in CSVlines array of strings
string[] CSVLines = new string[0];
while (!SR.EndOfStream)
{
System.Array.Resize(ref CSVLines, CSVLines.Length + 1);
CSVLines[CSVLines.Length - 1] = SR.ReadLine();
}
}
So far I got it to store the lines in CSVLines but I am not sure what is wrong with the regex. Is there a more efficient way to do this?
That isn't a valid expression. It says to match whatever character comes before the * zero or more times, and since no character comes before it, the pattern cannot be parsed.
This will probably match most file names; it does not allow special characters in the name.
Regex regex = new Regex(@"[a-zA-Z0-9]+\.csv$");
You could also do this instead:
if(test.EndsWith(".csv"))
Lastly, I would change your array to a List<T> or something like that, further explained here: What is more efficient: List<T>.Add() or System.Array.Resize()?
//Store each line in CSVlines array of strings
List<string> CSVLines = new List<string>();
while (!SR.EndOfStream)
{
CSVLines.Add(SR.ReadLine());
}
EDIT:
List<T> is in System.Collections.Generic
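Putting the two suggestions together might look like the sketch below. UploadChecks, IsCsvFileName and ReadLines are hypothetical names; the extension check uses Path.GetExtension with a case-insensitive comparison, which is a swapped-in variant of the EndsWith idea so that "DATA.CSV" is accepted too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helpers, not part of the original answer.
public static class UploadChecks
{
    // True when the file name ends in ".csv", ignoring case.
    public static bool IsCsvFileName(string fileName)
    {
        return string.Equals(Path.GetExtension(fileName), ".csv",
                             StringComparison.OrdinalIgnoreCase);
    }

    // Reads the uploaded bytes line by line into a List<string>,
    // which grows without the repeated Array.Resize calls.
    public static List<string> ReadLines(byte[] fileBytes)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(new MemoryStream(fileBytes)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

In the click handler this would be used as if (UploadChecks.IsCsvFileName(PNLdataLoader.FileName)) { var csvLines = UploadChecks.ReadLines(PNLdataLoader.FileBytes); }.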

Search and replace values in text file with C#

I have a text file with a certain format. First comes an identifier followed by three spaces and a colon. Then comes the value for this identifier.
ID1 :Value1
ID2 :Value2
ID3 :Value3
What I need to do is search for, e.g., ID2 : and replace Value2 with a new value NewValue2. What would be a way to do this? The files I need to parse won't get very large. The largest will be around 150 lines.
If the file isn't that big, you can use File.ReadAllLines to get a collection of all the lines and then replace the line you're looking for, like this:
using System.IO;
using System.Linq;
using System.Collections.Generic;
List<string> lines = new List<string>(File.ReadAllLines("file"));
int lineIndex = lines.FindIndex(line => line.StartsWith("ID2 :"));
if (lineIndex != -1)
{
lines[lineIndex] = "ID2 :NewValue2";
File.WriteAllLines("file", lines);
}
Here's a simple solution which also creates a backup of the source file automatically.
The replacements are stored in a Dictionary object. They are keyed on the line's ID, e.g. 'ID2' and the value is the string replacement required. Just use Add() to add more as required.
StreamWriter writer = null;
Dictionary<string, string> replacements = new Dictionary<string, string>();
replacements.Add("ID2", "NewValue2");
// ... further replacement entries ...
using (writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadLines("input.txt"))
{
bool replacementMade = false;
foreach (var replacement in replacements)
{
if (line.StartsWith(replacement.Key))
{
writer.WriteLine(string.Format("{0} :{1}",
replacement.Key, replacement.Value));
replacementMade = true;
break;
}
}
if (!replacementMade)
{
writer.WriteLine(line);
}
}
}
File.Replace("output.txt", "input.txt", "input.bak");
You'll just have to replace input.txt, output.txt and input.bak with the paths to your source, destination and backup files.
Ordinarily, for any text searching and replacement, I'd suggest some sort of regular expression work, but if this is all you're doing, that's really overkill.
I would just open the original file and a temporary file; read the original a line at a time, and just check each line for "ID2 :"; if you find it, write your replacement string to the temporary file, otherwise, just write what you read. When you've run out of source, close both, delete the original, and rename the temporary file to that of the original.
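The steps just described can be sketched roughly as follows. The names ValueReplacer and ReplaceLine, and the ".tmp" suffix for the temporary file, are assumptions made for illustration only.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class ValueReplacer
{
    // Copies sourcePath to a temporary file line by line, writing
    // replacementLine in place of every line that starts with prefix,
    // then deletes the original and renames the temporary file.
    public static void ReplaceLine(string sourcePath, string prefix, string replacementLine)
    {
        string tempPath = sourcePath + ".tmp"; // assumed temp-file name

        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                bool isTarget = line.StartsWith(prefix, StringComparison.Ordinal);
                writer.WriteLine(isTarget ? replacementLine : line);
            }
        }

        // Both files are closed here, so the swap is safe.
        File.Delete(sourcePath);
        File.Move(tempPath, sourcePath);
    }
}
```

For the file in the question, the call would be ValueReplacer.ReplaceLine(path, "ID2 :", "ID2 :NewValue2"), with path standing for the location of your file.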
Something like this should work. It's very simple, not the most efficient thing, but for small files, it would be just fine:
private void setValue(string filePath, string key, string value)
{
string[] lines= File.ReadAllLines(filePath);
for(int x = 0; x < lines.Length; x++)
{
string[] fields = lines[x].Split(':');
if (fields[0].TrimEnd() == key)
{
lines[x] = fields[0] + ':' + value;
File.WriteAllLines(filePath, lines);
break;
}
}
}
You can use regex and do it in 3 lines of code
string text = File.ReadAllText("sourcefile.txt");
text = Regex.Replace(text, @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)", "NewValue2",
RegexOptions.Multiline);
File.WriteAllText("outputfile.txt", text);
In the regex, (?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$) means: find a line that starts with id2 (case-insensitively), with any amount of whitespace before and after the :, and replace the value that follows (any alphanumeric characters, excluding punctuation) up to the end of the line. If you want to include punctuation, replace \w*? with .*?
You can use regexes to achieve this.
Regex re = new Regex(@"^ID\d+ :Value(\d+)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] lines = File.ReadAllLines("mytextfile");
foreach (string line in lines) {
string replaced = re.Replace(line, processMatch);
//Now do whatever you need with the replaced value
}
string processMatch(Match m)
{
var number = m.Groups[1];
return String.Format("ID{0} :NewValue{0}", number);
}

C# get text from file between two hashes

In my C# program (at this point) I have two fields in my form. One is a word list using a listbox; the other is a textbox. I have been able to successfully load a large word list into the listbox from a text file. I can also display the selected item in the listbox into the textbox this way:
private void wordList_SelectedIndexChanged(object sender, EventArgs e)
{
string word = wordList.Text;
concordanceDisplay.Text = word;
}
I have another local file I need to get at to display some of its contents in the textbox. In this file each headword (as in a dictionary) is preceded by a #. So, I would like to take the variable 'word' and search in this local file to put the entries into the textbox, like so:
#headword1
entry is here...
...
...
#headword2
entry is here...
...
...
#headword3
entry is here...
...
...
You get the format of the text file. I just need to search for the correct headword with # before that word, and copy all info from there until the next hash in the file, and place it in the text box.
Obviously, I am a newbie, so be gentle. Thanks much.
P.S. I used StreamReader to get at the word list and display it in the listbox like so:
StreamReader sr = new StreamReader("C:\\...\\list-final.txt");
string line;
while ((line = sr.ReadLine()) != null)
{
MyList.Add(line);
}
wordList.DataSource = MyList;
var sectionLines = File.ReadAllLines(fileName) // shortcut to read all lines from file
.SkipWhile(l => l != "#headword2") // skip everything before the heading you want
.Skip(1) // skip the heading itself
.TakeWhile(l => !l.StartsWith("#")) // grab stuff until the next heading or the end
.ToList(); // optional convert to list
string getSection(string sectionName)
{
StreamReader sr = new StreamReader(@"C:\Path\To\file.txt");
string line;
var MyList = new List<string>();
bool inCorrectSection = false;
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("#"))
{
if (inCorrectSection)
break;
else
inCorrectSection = Regex.IsMatch(line, @"^#" + Regex.Escape(sectionName) + @"($| -)");
}
else if (inCorrectSection)
MyList.Add(line);
}
return string.Join(Environment.NewLine, MyList);
}
// in another method
textBox.Text = getSection("headword1");
Here are a few alternate ways to check if the section matches, in rough order of how accurate they are in detecting the right section name:
// if the separator after the section name is always " -", this is the best way I've thought of; Regex.Escape makes it work regardless of what's in sectionName
inCorrectSection = Regex.IsMatch(line, @"^#" + Regex.Escape(sectionName) + @"($| -)");
// as long as the section name can't contain # or spaces, this will work
inCorrectSection = line.Split('#', ' ')[1] == sectionName;
// as long as only alphanumeric characters can ever make up the section name, this is good
inCorrectSection = Regex.IsMatch(line, @"^#" + sectionName + @"\b");
// the problem with this is that if you are searching for "head", it will find "headOther" and think it's a match
inCorrectSection = line.StartsWith("#" + sectionName);
