C# String manipulation

I am working on an application that gets text from a text file on a page.
Example link: http://test.com/textfile.txt
This text file contains the following text:
1 Milk Stuff1.rar
2 Milk Stuff2.rar
3 Milk Stuff2-1.rar
4 Union Stuff3.rar
What I am trying to do is remove everything from each line except the "words" that start with 'Stuff' and end with '.rar'.
The problem is that most of the simple solutions, such as .Remove, .Split or .Replace, end up failing. For example, splitting the string on spaces returns this:
1
Milk
Stuff1.rar\n2
Milk
Stuff2.rar\n3
Milk
Stuff2-1.rar\n4
Union
Stuff3.rar\n
I bet it's not as hard as it looks, but I'd appreciate any help you can give me.
Ps: Just to be clear, this is what I want it to return:
Stuff1.rar
Stuff2.rar
Stuff2-1.rar
Stuff3.rar
I am currently working with this code:
client.HeadOnly = true;
string uri = "http://test.com/textfile.txt";
byte[] body = client.DownloadData(uri);
string type = client.ResponseHeaders["content-type"];
client.HeadOnly = false;
if (type.StartsWith(@"text/"))
{
string[] text = client.DownloadString(uri);
foreach (string word in text)
{
if (word.StartsWith("Patch") && word.EndsWith(".rar"))
{
listBox1.Items.Add(word.ToString());
}
}
}
This is obviously not working, but you get the idea.
Thank you in advance!

This should work:
using (var writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadAllLines("input.txt"))
{
var match = Regex.Match(line, "Stuff.*?\\.rar");
if (match.Success)
writer.WriteLine(match.Value);
}
}

I would be tempted to use a regular expression for this sort of thing.
Something like
Stuff[^\s]*\.rar
will pull out just the text you require.
How about a function like:
public static IEnumerable<string> GetStuff(string fileName)
{
var regex = new Regex(@"Stuff[^\s]*\.rar");
using (var reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var match = regex.Match(line);
if (match.Success)
{
yield return match.Value;
}
}
}
}

foreach (string line in text)
{
if(line.EndsWith(".rar"))
{
int index = line.LastIndexOf("Stuff");
if(index != -1)
{
listBox1.Items.Add(line.Substring(index));
}
}
}
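Since the question obtains the file contents as one string from DownloadString rather than from a file on disk, the regex idea from the answers above can also be run over the whole string at once. The following is only a minimal sketch: the class and method names (StuffExtractor, Extract) are made up for illustration, and it assumes the wanted names never contain whitespace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StuffExtractor
{
    // "Stuff", then the shortest run of non-whitespace, then a literal ".rar".
    private static readonly Regex StuffPattern = new Regex(@"Stuff\S*?\.rar");

    // Returns every match found in the text, in order of appearance.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        foreach (Match match in StuffPattern.Matches(text))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

Each returned string could then be added to the list box, for example with listBox1.Items.Add(name) inside a foreach over StuffExtractor.Extract(client.DownloadString(uri)).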

Related

Is it possible in C# to return a array back to the calling program?

Is it possible in C# to return an array back to the calling program? If it is not possible, please say so. Another alternative is to create a long string and use string.Split(), but that does not look nice.
ExamnationOfReturnsFiled("ABCDE1234E") //Calling program.
public yearsfiled[] ExamnationOfReturnsFiled(string panreceived) //function.
{
int k = 0; //to increment the array element.
string item = panreceived; //string value received call program.
string[] yearsfiled = new string[20];//Declaring a string array.
Regex year = new Regex(@"[0-9]{4}-[0-9]{2}");//to capture 2012-13 like entries.
using (StreamReader Reader1 = new StreamReader(@"C: \Users\Unnikrishnan C\Documents\Combined_Blue_Book.txt"))
{
Regex tofindpan = new Regex(item);//Regular Expression to catch the string from the text file being read.
bool tosearch = false;
Regex blank = new Regex(@"^\s*$"); //to detect a blank line.
while ((str.line1 = Reader1.ReadLine()) != null)
{
Match Tofindpan = tofindpan.Match(@"[A-Z]{5}[0-9]{4}[A-Z]{1}");
Match Blank = blank.Match(line1);
if (Blank.Success)
{
tosearch = false;
}
if (Tofindpan.Success)
{
tosearch = true; //when true the
}
if (tosearch == true)
{
Match Year = year.Match(str.line1);
if (Year.Success)
{
yearsfiled[k] = Year.Value;
k++;
}
}
}
return yearsfiled;
}
}
public string[] ExamnationOfReturnsFiled(string panreceived) //function
The return type in your signature is a variable name, not a type; change the method signature as shown above.
You should be returning a string[]. Your return type, yearsfiled[], is a variable name, not a type name.
// From the calling program. Tested and succeeded.
string[] yearsfiled = new string[20];
yearsfiled = ExamnationOfReturnsFiled(item1);
// The function signature was modified as follows.
public static string[] ExamnationOfReturnsFiled(string panreceived)
{
// Everything else as in the original post.
}
// It was tested and found successful. Thanks so much to @Midhun Mundayadan and @Eavidan.
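One side effect of the fixed-size string[20] in the question is that unused slots come back as null. A possible alternative, sketched here under the assumption that only the year entries matter, is to collect matches in a List<string> and convert to an array at the end. The class and method names (ReturnsFiled, FindYears) are hypothetical, and the PAN and blank-line logic from the question is left out for brevity.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReturnsFiled
{
    // Matches entries such as "2012-13".
    private static readonly Regex YearPattern = new Regex(@"[0-9]{4}-[0-9]{2}");

    // Hypothetical helper: collects the first year entry found on each line.
    public static string[] FindYears(IEnumerable<string> lines)
    {
        var years = new List<string>();
        foreach (string line in lines)
        {
            Match match = YearPattern.Match(line);
            if (match.Success)
            {
                years.Add(match.Value);
            }
        }
        // The array holds exactly as many elements as were found.
        return years.ToArray();
    }
}
```

The caller receives the array directly, for example string[] years = ReturnsFiled.FindYears(File.ReadLines(path)); and can use years.Length without checking for null entries.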

searching for strings in a text file using foreach loops

I am trying to search a file for some specific strings. These strings are stored in lists called Universities, Courses and UGPG, and I am using a StreamReader to load the file.
The issue I am having is that after the first foreach loop has executed, the remaining searches return N/A as if the strings were not present in the text file. However, I know they are in the text file.
Is there a reason for this, or a better way to code this?
My code is below.
Any help would be greatly appreciated.
validdirectory = new DirectoryInfo(path);
Vfiles = validdirectory.GetFiles("*.txt");
foreach (FileInfo file in Vfiles)
{
//reads the file contents
bool Stepout = false;
bool nextouterloop = false;
using (StreamReader ReadMessage = new StreamReader(file.FullName))
{
String MessageContents = ReadMessage.ReadToEnd();
Message_Viewer.Text = MessageContents;
foreach (string Uni_Name in Universities)
{
if (MessageContents.Contains(Uni_Name))
{
Display_Uni.Text = Uni_Name;
}
}
foreach (string course in Courses)
{
if (MessageContents.Contains(course))
{
Display_Course.Text = course;
}
Display_Course.Text = "N/A";
}
if (MessageContents.Contains("Postgraduate"))
{
Display_UGPG.Text = "Postgraduate";
}
else if (MessageContents.Contains("Undergraduate"))
{
Display_UGPG.Text = "Undergraduate";
}
Display_UGPG.Text = "N/A";
}
}
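Remove the assignment of N/A inside the loop and let the loop run to completion. As written, the N/A assignment runs unconditionally after each check, so it overwrites any match that was found.
At the end you can test the content of the textboxes to see whether your loops found something and, if not, set the N/A text.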
foreach (string course in Courses)
{
if (MessageContents.Contains(course))
Display_Course.Text = course;
}
if (MessageContents.Contains("Postgraduate"))
Display_UGPG.Text = "Postgraduate";
else if (MessageContents.Contains("Undergraduate"))
Display_UGPG.Text = "Undergraduate";
if(string.IsNullOrWhiteSpace(Display_Course.Text))
Display_Course.Text = "N/A";
if(string.IsNullOrWhiteSpace(Display_UGPG.Text))
Display_UGPG.Text = "N/A";
By the way, since you use arrays or lists for the universities and courses, I suppose that you want to see all the matching names. As written, your code always shows only the last matching course and university, because each match overwrites the previous one.
To show them all, replace the assignment to the Text property with a call to AppendText (adding a newline if the textboxes have Multiline = true).
....
Display_Uni.AppendText(Uni_Name + Environment.NewLine);
...
Display_Course.AppendText(course + Environment.NewLine);
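The "show every match, otherwise N/A" idea can also be kept out of the UI code entirely. The following is a rough sketch, not the answerer's code: MatchFinder and Describe are hypothetical names, and it assumes a plain case-sensitive Contains check is good enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchFinder
{
    // Returns the candidates that occur in the text, one per line,
    // or "N/A" when none of them occur.
    public static string Describe(string text, IEnumerable<string> candidates)
    {
        List<string> found = candidates.Where(c => text.Contains(c)).ToList();
        return found.Count > 0 ? string.Join(Environment.NewLine, found) : "N/A";
    }
}
```

Usage would look like Display_Course.Text = MatchFinder.Describe(MessageContents, Courses); which also resets the textbox for every file, so text left over from a previous file cannot hide a missing match.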
Here is a possible solution without the foreach loops (it needs using System.Linq):
// FirstOrDefault returns the first matching name, or null when nothing matches.
string uni = Universities.FirstOrDefault(u => MessageContents.Contains(u));
Display_Uni.Text = uni ?? "N/A";
string course = Courses.FirstOrDefault(c => MessageContents.Contains(c));
Display_Course.Text = course ?? "N/A";
if (MessageContents.Contains("Postgraduate"))
{
Display_UGPG.Text = "Postgraduate";
}
else if (MessageContents.Contains("Undergraduate"))
{
Display_UGPG.Text = "Undergraduate";
}
else
{
Display_UGPG.Text = "N/A";
}

C# HtmlDecode Specific tags only

I have a large HTML-encoded string and I want to decode only specific whitelisted HTML tags.
Is there a way to do this in C#? WebUtility.HtmlDecode() decodes everything.
I am looking for an implementation of DecodeSpecificTags() that will pass the test below.
[Test]
public void DecodeSpecificTags_SimpleInput_True()
{
string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
List<string> whiteList = new List<string>(){ "strong","br" } ;
Assert.IsTrue(DecodeSpecificTags(whiteList,input) == output);
}
You could do something like this
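public string DecodeSpecificTags(List<string> whiteListedTagNames, string encodedInput)
{
foreach (string s in whiteListedTagNames)
{
// Match an encoded opening or closing tag, e.g. &lt;strong ...&gt; or &lt;/strong&gt;
string regex = "&lt;" + @"\s*/?\s*" + s + ".*?" + "&gt;";
// Decode only the matched tag; the rest of the input stays encoded.
encodedInput = Regex.Replace(encodedInput, regex, m => WebUtility.HtmlDecode(m.Value));
}
return encodedInput;
}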
A better approach could be to use an HTML parser such as HtmlAgilityPack, CsQuery or NSoup to find the specific elements and decode them in a loop.
check this for links and examples of parsers
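For small inputs, the regex idea can be tightened a little before reaching for a parser. The sketch below is one possible variant, not a definitive implementation: it builds a single pattern for all whitelisted names and adds a word boundary so that "br" does not also match a tag such as "brand". The class name TagDecoder is hypothetical, and the approach assumes attribute values never contain an encoded "&gt;".

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class TagDecoder
{
    public static string DecodeSpecificTags(IEnumerable<string> whiteList, string encodedInput)
    {
        // Build one alternation such as "strong|br"; Regex.Escape guards
        // against regex metacharacters in the tag names.
        string names = string.Join("|", whiteList.Select(n => Regex.Escape(n)));
        if (names.Length == 0)
        {
            return encodedInput; // nothing is whitelisted
        }

        // Encoded "<", optional "/", a whitelisted name ending at a word
        // boundary, then the shortest run up to the next encoded ">".
        string pattern = @"&lt;\s*/?\s*(?:" + names + @")\b.*?&gt;";

        // Only the matched tag text is decoded; everything else stays encoded.
        return Regex.Replace(encodedInput, pattern,
            m => WebUtility.HtmlDecode(m.Value), RegexOptions.IgnoreCase);
    }
}
```

With the test input from the question and the whitelist { "strong", "br" }, the span tags stay encoded while the strong and br tags are decoded. Anything more irregular than that is a reason to prefer a real parser.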
Check this out; I did it using CsQuery:
string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
var decoded = HttpUtility.HtmlDecode(output);
var encoded =input ; // HttpUtility.HtmlEncode(decoded);
Console.WriteLine(encoded);
Console.WriteLine(decoded);
var doc=CsQuery.CQ.CreateDocument(decoded);
var paras=doc.Select("strong").Union(doc.Select ("br")) ;
var tags=new List<KeyValuePair<string, string>>();
var counter=0;
foreach (var element in paras)
{
HttpUtility.HtmlEncode(element.OuterHTML).Dump();
var key ="---" + counter + "---";
var value= HttpUtility.HtmlDecode(element.OuterHTML);
var pair= new KeyValuePair<String,String>(key,value);
element.OuterHTML = key ;
tags.Add(pair);
counter++;
}
var finalstring= HttpUtility.HtmlEncode(doc.Document.Body.InnerHTML);
finalstring.Dump();
foreach (var element in tags)
{
finalstring=finalstring.Replace(element.Key,element.Value);
}
Console.WriteLine(finalstring);
Or you could use HtmlAgilityPack with a blacklist or a whitelist, depending on your requirement. I'm using the blacklist approach.
My blacklisted tags are stored in a text file, for example "script|img".
public static string DecodeSpecificTags(this string content, List<string> blackListedTags)
{
if (string.IsNullOrEmpty(content))
{
return content;
}
blackListedTags = blackListedTags.Select(t => t.ToLowerInvariant()).ToList();
var decodedContent = HttpUtility.HtmlDecode(content);
var document = new HtmlDocument();
document.LoadHtml(decodedContent);
decodedContent = blackListedTags.Select(blackListedTag => document.DocumentNode.Descendants(blackListedTag))
.Aggregate(decodedContent,
(current1, nodes) =>
nodes.Select(htmlNode => htmlNode.WriteTo())
.Aggregate(current1,
(current, nodeContent) =>
current.Replace(nodeContent, HttpUtility.HtmlEncode(nodeContent))));
return decodedContent;
}

C#: What's an efficient way of parsing a string with one delimiter through ReadLine() of TextReader?

C#: What's an efficient way to parse a string with one delimiter for each ReadLine() of TextReader?
My objective is to load a list of proxies into a ListView with two columns (Proxy|Port), reading from a .txt file. How would I go about splitting each ReadLine() result into the proxy and port variables on the delimiter ":"?
This is what I've got so far,
public void loadProxies(string FilePath)
{
string Proxy; // example/temporary place holders
int Port; // updated at each readline() loop.
using (TextReader textReader = new StreamReader(FilePath))
{
string Line;
while ((Line = textReader.ReadLine()) != null)
{
// How would I go about directing which string to return whether
// what's to the left of the delimiter : or to the right?
//Proxy = Line.Split(':');
//Port = Line.Split(':');
// listview stuff done here (this part I'm familiar with already)
}
}
}
If not, is there a more efficient way to do this?
string [] parts = line.Split(':');
string proxy = parts[0];
string port = parts[1];
You could split them this way:
string line;
string[] tokens;
while ((line = textReader.ReadLine()) != null)
{
tokens = line.Split(':');
Proxy = tokens[0];
Port = int.Parse(tokens[1]); // Port is declared as int in the question
// listview stuff done here (this part I'm familiar with already)
}
By convention, local variables in C# use camelCase names; PascalCase is used for types, namespaces, methods and properties.
How about running a Regex on the whole file?
var parts=
Regex.Matches(input, @"(?<left>[^:]*):(?<right>.*)",RegexOptions.Multiline)
.Cast<Match>()
.Where(m=>m.Success)
.Select(m => new
{
left = m.Groups["left"],
right = m.Groups["right"]
});
foreach(var part in parts)
{
//part.left
//part.right
}
Or, if it's too big, why not Linqify the ReadLine operation with yielding method?
static IEnumerable<string> Lines(string filename)
{
using (var sr = new StreamReader(filename))
{
while (!sr.EndOfStream)
{
yield return sr.ReadLine();
}
}
}
And run it like so:
var parts=Lines(filename)
.Select(
line=>Regex.Match(line, @"(?<left>[^:]*):(?<right>.*)")
)
.Where(m=>m.Success)
.Select(m => new
{
left = m.Groups["left"],
right = m.Groups["right"]
});
foreach(var part in parts)
{
//part.left
//part.right
}
In terms of efficiency I expect you'd be hard-pressed to beat:
int index = line.IndexOf(':');
if (index < 0) throw new InvalidOperationException();
Proxy = line.Substring(0, index);
Port = int.Parse(line.Substring(index + 1));
This avoids the array construction / allocation associated with Split, and only looks as far as the first delimiter. But I should stress that this is unlikely to be a genuine performance bottleneck unless the data volume is huge, so pretty much any approach should be fine. In fact, perhaps the most important thing (I've been reminded by the comment below) is to suspend the UI while adding:
myListView.BeginUpdate();
try {
// TODO: add all the items here
} finally {
myListView.EndUpdate();
}
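The IndexOf approach above throws on a malformed line. If the proxy list may contain blank or broken lines, a more forgiving variant might look like the sketch below. ProxyLine and TryParse are hypothetical names, and skipping bad lines rather than failing is an assumption about what the application wants.

```csharp
public static class ProxyLine
{
    // Splits "host:port" at the first ':' and reports failure by returning
    // false instead of throwing on bad input.
    public static bool TryParse(string line, out string host, out int port)
    {
        host = null;
        port = 0;
        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }

        int index = line.IndexOf(':');
        if (index <= 0)
        {
            return false; // no delimiter, or nothing before it
        }

        host = line.Substring(0, index).Trim();
        // int.TryParse returns false for a missing or non-numeric port.
        return int.TryParse(line.Substring(index + 1).Trim(), out port);
    }
}
```

Inside the ReadLine loop this becomes if (ProxyLine.TryParse(line, out host, out port)) { /* add the ListView item */ }, so a single bad line does not abort the whole load.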
You might want to try something like this.
var items = File.ReadAllText(FilePath)
.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Split(':'))
.Select(pieces => new {
Proxy = pieces[0],
Port = int.Parse(pieces[1])
});
If you know that you won't have a stray newline at the end of the file you can do this.
var items = File.ReadAllLines(FilePath)
.Select(line => line.Split(':'))
.Select(pieces => new {
Proxy = pieces[0],
Port = Convert.ToInt32(pieces[1])
});

Formatting Twitter text (TweetText) with C#

Is there a better way to format text from Twitter to link the hyperlinks, username and hashtags? What I have is working but I know this could be done better. I am interested in alternative techniques. I am setting this up as a HTML Helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = @"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("<a href=\"{0}\">{0}</a>", url));
}
}
}
if (result.Contains("@"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("@" + screenName,
String.Format("@{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}
That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do are:
1) looking up @name and replacing it with Real Name;
2) multiple @name's in a row get commas, if they don't have them;
3) Tweets that start with @name(s) are formatted "To @name:".
I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.
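One weakness of running several Replace passes, as the question does, is that a later pass can touch text an earlier pass already handled (for example the "#anchor" part of a URL). A possible alternative is a single Regex.Replace with a match evaluator, sketched below. This is an illustration only: TweetFormatter is a hypothetical name, and the two link targets on example.com are placeholders, not real Twitter URLs.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class TweetFormatter
{
    // One alternation, so each piece of the tweet is matched exactly once
    // and a '#' inside a URL is never mistaken for a hashtag.
    private static readonly Regex Token = new Regex(
        @"(?<url>https?://\S+)|@(?<user>\w+)|#(?<tag>\w+)");

    // Hypothetical link targets; substitute whatever the site should link to.
    private const string UserUrl = "https://example.com/user/";
    private const string TagUrl = "https://example.com/search?q=";

    public static string Format(string text)
    {
        return Token.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
            {
                string url = WebUtility.HtmlEncode(m.Value);
                return "<a href=\"" + url + "\">" + url + "</a>";
            }
            if (m.Groups["user"].Success)
            {
                string user = m.Groups["user"].Value;
                return "<a href=\"" + UserUrl + user + "\">@" + user + "</a>";
            }
            string tag = m.Groups["tag"].Value;
            return "<a href=\"" + TagUrl + WebUtility.UrlEncode("#" + tag) + "\">#" + tag + "</a>";
        });
    }
}
```

The pattern is deliberately simple: \S+ will swallow trailing punctuation after a URL, and text between the tokens is passed through unchanged, so it should be HTML-encoded separately if it comes from an untrusted source.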
I created a helper method to shorten text to 140 chars with the URL included. You can set the share length to 0 to exclude the URL from the tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, @"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count + 1;
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}
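The same goal, trimming text to a character budget without splitting a word, can be reached without tracking per-word counts. The sketch below is a simplified take and not a drop-in replacement for the helper above: TweetShortener and Shorten are hypothetical names, and the caller is assumed to subtract the URL length from 140 before calling.

```csharp
using System;

public static class TweetShortener
{
    // Trims text to at most maxLength characters, cutting at a space where
    // possible and appending ".." when anything was removed.
    public static string Shorten(string text, int maxLength)
    {
        const string ellipsis = "..";
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        text = text.Trim();
        if (text.Length <= maxLength)
        {
            return text; // already fits
        }
        if (maxLength <= ellipsis.Length)
        {
            return text.Substring(0, Math.Max(0, maxLength)); // no room for dots
        }

        int budget = maxLength - ellipsis.Length;
        // Last space at or before the budget, so the final word is not split.
        int cut = text.LastIndexOf(' ', budget);
        if (cut <= 0)
        {
            cut = budget; // one very long word: hard cut
        }
        return text.Substring(0, cut).TrimEnd() + ellipsis;
    }
}
```

A call might look like TweetShortener.Shorten(text, 140 - sharepath.Length - 1), leaving one character for the space before the URL.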
There is a good resource for parsing Twitter messages at this link, which worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
Urls
#hashtags
@usernames
BTW: Regex in the ParseURL() method needs reviewing, it parses stock symbols (BARC.L) into links.
