I would like to count the number of lines in a file based on the CRLF (0D0A) count. My current code also treats a lone CR (0D) as a line break. Can anybody give a suggestion?
public static int Countline(string file)
{
var lineCount = 0;
using (var reader = File.OpenText(file))
{
while (reader.ReadLine() != null)
{
lineCount++;
}
}
return lineCount;
}
Usage:
Countline("text.txt", "\r\n");
Method:
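public static int Countline(string file, string lineSeparator)
{
string text = File.ReadAllText(file);
// Regex.Escape makes the separator match literally; this counts separators, so add 1 if the last line has no trailing separator
return System.Text.RegularExpressions.Regex.Matches(text, System.Text.RegularExpressions.Regex.Escape(lineSeparator)).Count;
}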
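string content = System.IO.File.ReadAllText(fileName);
// needs using System.Linq; allocates one substring per character, so it is slow on large files
int numMatches = content.Select((c, i) => content.Substring(i)).Count(sub => sub.StartsWith(Environment.NewLine, StringComparison.Ordinal));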
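Note: I'm using Environment.NewLine for the line ending, but you can replace it with the literal string "\r\n" if you prefer. On Linux and macOS Environment.NewLine is "\n", so use "\r\n" explicitly if you need CRLF specifically.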
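The Select/Substring one-liner above allocates a new substring for every character, so its cost grows quadratically with file size. A minimal sketch of the same count with an ordinal IndexOf loop instead, assuming the file fits in memory; SubstringCounter and its methods are illustrative names, not an existing API:

```csharp
using System;
using System.IO;

public static class SubstringCounter
{
    // Counts non-overlapping occurrences of `separator` in `text` using ordinal comparison.
    public static int CountOccurrences(string text, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        int count = 0;
        int index = text.IndexOf(separator, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // continue searching right after the match just found
            index = text.IndexOf(separator, index + separator.Length, StringComparison.Ordinal);
        }
        return count;
    }

    // Reads the whole file and counts CR LF pairs.
    public static int CountCrLfInFile(string path)
    {
        return CountOccurrences(File.ReadAllText(path), "\r\n");
    }
}
```

Because the comparison is ordinal, "\r\n" is matched as two plain characters rather than through culture-sensitive rules.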
public int CountLines(string Text)
{
int count = 0;
foreach (ReadOnlySpan<char> _ in Text.AsSpan().EnumerateLines())
{
count++;
}
return count;
}
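StreamReader.ReadLine and EnumerateLines both treat a lone CR, a lone LF and CR LF as line breaks, so neither counts 0D0A pairs specifically. A minimal sketch that streams the file's bytes and counts only CR LF pairs without loading the file into memory. It assumes an ASCII-compatible encoding such as UTF-8 or Windows-1252 (in UTF-16 the bytes would be 0D 00 0A 00), and it counts terminators, so add one if the last line may lack a trailing CR LF; CrLfCounter is an illustrative name:

```csharp
using System.IO;

public static class CrLfCounter
{
    // Counts CR LF (0x0D 0x0A) byte pairs; a lone CR or a lone LF is not counted.
    public static int CountCrLf(Stream stream)
    {
        int count = 0;
        bool previousWasCr = false; // carried across buffer boundaries
        var buffer = new byte[81920];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                if (b == 0x0A && previousWasCr)
                    count++;
                previousWasCr = b == 0x0D;
            }
        }
        return count;
    }

    // Convenience overload that opens the file for reading.
    public static int CountCrLf(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return CountCrLf(stream);
        }
    }
}
```

Usage: `int lines = CrLfCounter.CountCrLf("text.txt");`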
I need to parse a React build file (main.451e57c9.js) to retrieve the version number with C#.
This file contains mixed data; here is a small part of it:
.....inally{if(s)throw i}}return a}}(e,t)||xe(e,t)||we()}var Se=
JSON.parse('{"shortVersion":"v3.1.56"}')
,Ne="
AASAAAAAqCAYAAAATb4ZSAAAACXBIWXMAAAsTAAALEw.....
I need to extract the JSON data {"shortVersion":"v3.1.56"}.
Last time I simply searched for the string shortVersion and returned a certain number of characters after it, but it feels like I'm reinventing the wheel. Is there a proper way to identify and extract JSON from mixed text?
public static void findVersion()
{
var partialName = "main.*.js";
// pathToFile: placeholder for the folder that contains the build output
string[] filesInDir = Directory.GetFiles(pathToFile, partialName);
foreach (var line in File.ReadLines(filesInDir[0]))
{
string keyword = "shortVersion";
int indx = line.IndexOf(keyword);
if (indx != -1)
{
string code = line.Substring(indx + keyword.Length);
Console.WriteLine(code);
}
}
}
RESULT
":"v3.1.56"}'),Ne=".....
string findJson(string input, string keyword) {
int keywordIndex = input.IndexOf(keyword);
if (keywordIndex < 2) return null; //keyword missing, or no room for the {" (or {') that should precede it
int startIndex = keywordIndex - 2; //Find the starting point of shortVersion, then subtract 2 to start at the { bracket
input = input.Substring(startIndex); //Grab everything from the start index onward
int endIndex = input.IndexOf('}'); //Capture the first instance of the closing bracket in the new trimmed input string
if (endIndex < 0) return null; //no closing bracket
return input.Remove(endIndex + 1); //Drop everything after the closing bracket
}
Console.WriteLine(findJson("fwekjfwkejwe{'shortVersion':'v3.1.56'}wekjrlklkj23klj23jkl234kjlk", "shortVersion"));
You will receive {'shortVersion':'v3.1.56'} as output.
Note that you may have to use line.Replace('"', '\'');
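findJson cuts at the first '}', so it would truncate an object that contains a nested object or a '}' inside a string value. A minimal sketch of a brace-counting alternative; JsonScanner and ExtractBalancedObject are hypothetical names, and it assumes the keyword is a key of the object opened by the nearest '{' before it (true for the shortVersion sample):

```csharp
using System;

public static class JsonScanner
{
    // Starting at the nearest '{' before `keyword`, returns the balanced {...} span, or null.
    // Braces inside double-quoted strings are skipped; single quotes are not treated as strings.
    public static string ExtractBalancedObject(string input, string keyword)
    {
        if (string.IsNullOrEmpty(keyword)) return null;
        int keywordIndex = input.IndexOf(keyword, StringComparison.Ordinal);
        if (keywordIndex < 0) return null;

        int start = input.LastIndexOf('{', keywordIndex);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false;
        for (int i = start; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                if (c == '\\') i++;              // skip the escaped character
                else if (c == '"') inString = false;
            }
            else if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return input.Substring(start, i - start + 1);
        }
        return null; // no matching closing brace
    }
}
```

The returned text can then be handed to a JSON parser such as JsonConvert.DeserializeObject.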
Try the method below:
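// requires the Newtonsoft.Json package (using Newtonsoft.Json;)
public static object ExtractJsonFromText(string mixedStrng)
{
// pair every '{' (left to right) with every later '}' (right to left) and return the first span that parses
for (var i = mixedStrng.IndexOf('{'); i > -1; i = mixedStrng.IndexOf('{', i + 1))
{
// j > i keeps the Substring length positive
for (var j = mixedStrng.LastIndexOf('}'); j > i; j = mixedStrng.LastIndexOf('}', j - 1))
{
var jsonProbe = mixedStrng.Substring(i, j - i + 1);
try
{
return JsonConvert.DeserializeObject(jsonProbe);
}
catch
{
// not valid JSON; try a shorter span
}
}
}
return null;
}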
Fiddle
https://dotnetfiddle.net/N1jiWH
You should not use GetFiles(), since you only need one file and GetFiles() returns all matches before you can do anything with them. This should give you something to work with, and it should be about as fast as it can be with big files and/or folders containing many files (to be fair, I have not tested it on such a large file system or file).
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
public class Program
{
public static void Main()
{
var path = @"c:\SomePath";
var jsonString = GetFileVersion(path);
if (!string.IsNullOrWhiteSpace(jsonString))
{
// deserialize the single {"shortVersion":"..."} object
var result = JsonConvert.DeserializeObject<Version>(jsonString);
var vers = result.shortVersion;
Console.WriteLine(vers);
}
}
private static string GetFileVersion(string path)
{
var partialName = "main.*.js";
// JSON string fragment to find: quotes are doubled up for the @ verbatim string
string matchString = @"{""shortVersion"":";
// the object ends at the brace just before the quote that closes JSON.parse('...')
string matchEndString = "}'";
DirectoryInfo dir = new DirectoryInfo(path);
if (!dir.Exists)
{
throw new DirectoryNotFoundException("The directory does not exist.");
}
// EnumerateFiles is lazy, so FirstOrDefault stops on the first match
FileInfo info = dir.EnumerateFiles(partialName).FirstOrDefault();
if (info != null)
{
// walk the file lazily and stop at the first line that contains the fragment
var line = File.ReadLines(info.FullName).FirstOrDefault(l => l.Contains(matchString));
if (line == null)
{
return string.Empty;
}
var indexStart = line.IndexOf(matchString);
var indexEnd = line.IndexOf(matchEndString, indexStart);
if (indexEnd < 0)
{
return string.Empty;
}
// Substring takes a length, not an end index; keep the '}' and drop the trailing quote
return line.Substring(indexStart, indexEnd - indexStart + 1);
}
return string.Empty;
}
public class Version
{
public string shortVersion { get; set; }
}
}
Use this; it should be faster: https://dotnetfiddle.net/sYFvYj
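// requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public static string ExtractJsonFromText(string mixedStrng)
{
// match ('{...}') and capture only the JSON object; the lazy .*? stops at the first }')
string pattern = @"\('(\{.*?\})'\)";
var matches = new List<string>();
foreach (Match match in Regex.Matches(mixedStrng, pattern))
{
matches.Add(match.Groups[1].Value);
}
return string.Join(Environment.NewLine, matches);
}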
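If the version is always embedded as JSON.parse('...'), another option is to capture that string literal with a regex and hand it to a real JSON parser instead of slicing text by hand. A minimal sketch, assuming .NET Core 3.0 or later (for System.Text.Json), that the literal contains no backslash escapes or single quotes, and that shortVersion is a top-level string property; BundleVersion and ReadShortVersion are hypothetical names:

```csharp
using System.Text.Json;
using System.Text.RegularExpressions;

public static class BundleVersion
{
    // Captures the single-quoted argument of JSON.parse('...') when it mentions "shortVersion".
    private static readonly Regex JsonParseCall =
        new Regex(@"JSON\.parse\('(\{[^']*""shortVersion""[^']*\})'\)");

    // Returns the shortVersion value, or null if no matching JSON.parse call is found.
    // JsonDocument.Parse throws a JsonException if the captured text is not valid JSON.
    public static string ReadShortVersion(string scriptText)
    {
        Match match = JsonParseCall.Match(scriptText);
        if (!match.Success) return null;

        using (JsonDocument doc = JsonDocument.Parse(match.Groups[1].Value))
        {
            return doc.RootElement.TryGetProperty("shortVersion", out JsonElement value)
                ? value.GetString()
                : null;
        }
    }
}
```

Usage: `string version = BundleVersion.ReadShortVersion(File.ReadAllText(path));`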
I'm learning about string utilities in C#, and I have a method that replaces parts of a string.
Using the replace method, I need to get output such as:
"Old file name: file00"
"New file name: file01"
Depending on what the user wants to change it to.
I am looking for help on making the method (NextImageName) replace only the digits, but not the file name.
class BuildingBlock
{
public static string ReplaceOnce(string word, string characters, int position)
{
word = word.Remove(position, characters.Length);
word = word.Insert(position, characters);
return word;
}
public static string GetLastName(string name)
{
string result = "";
int posn = name.LastIndexOf(' ');
if (posn >= 0) result = name.Substring(posn + 1);
return result;
}
public static string NextImageName(string filename, int newNumber)
{
if (newNumber > 9)
{
return ReplaceOnce(filename, newNumber.ToString(), filename.Length - 2);
}
if (newNumber < 10)
{
}
if (newNumber == 0)
{
}
return filename; // placeholder until the other cases are written
}
}
The other "if" statements are empty for now until I find out how to do the first one.
The correct way to do this would be to use Regular Expressions.
Ideally you would separate "file" from "00" in "file00". Then take "00", convert it to an Int32 (using Int32.Parse()) and then rebuild your string with String.Format().
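The steps described above (split off the digits, parse them, rebuild the name) can be sketched as follows. This assumes the number is a run of ASCII digits at the very end of the name and that the original width (two digits in file00) should be kept with zero-padding; ImageNames and its methods are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ImageNames
{
    // Group 1: everything before the trailing digits ("file"); group 2: the digits ("00").
    private static readonly Regex TrailingDigits = new Regex(@"^(.*?)([0-9]+)$");

    // Replaces the trailing number, zero-padding to at least the original digit count.
    public static string ReplaceTrailingNumber(string filename, int newNumber)
    {
        Match match = TrailingDigits.Match(filename);
        if (!match.Success)
            return filename + newNumber; // no trailing number: append one

        string prefix = match.Groups[1].Value;
        int width = match.Groups[2].Length;
        // "D" + width pads with leading zeros, e.g. 1 -> "01" when width is 2
        return string.Format("{0}{1}", prefix, newNumber.ToString("D" + width));
    }

    // Parses the current number with int.Parse and bumps it by one ("file00" -> "file01").
    public static string IncrementTrailingNumber(string filename)
    {
        Match match = TrailingDigits.Match(filename);
        int current = match.Success ? int.Parse(match.Groups[2].Value) : -1;
        return ReplaceTrailingNumber(filename, current + 1);
    }
}
```

The lazy `(.*?)` prefix makes the digit group take the whole trailing run, so a name like img2file007 keeps its inner 2 untouched.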
public static string NextImageName(string filename, int newNumber)
{
string oldnumber = "";
foreach (var item in filename.ToCharArray().Reverse())
if (char.IsDigit(item))
oldnumber = item + oldnumber ;
else
break;
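// keep the text before the trailing digits and append the new number (string.Replace would also hit earlier matches)
return filename.Substring(0, filename.Length - oldnumber.Length) + newNumber.ToString();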
}
public static string NextImageName(string filename, int newNumber)
{
int i = 0;
foreach (char c in filename) // get index of first number
{
if (char.IsNumber(c))
break;
else
i++;
}
string s = filename.Substring(0,i); // remove original number
s = s + newNumber.ToString(); // add new number
return s;
}
I am dynamically editing a regex for matching text in a PDF, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",@"cuti?(-\s+)?")
.Replace("con",@"con(-\s+)?")
.Replace("consecu",@"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
What's the best way to perform such a multiple replace so that it produces the desired output?
So far I have considered using Regex.Replace and zipping the word to replace with an optional (-\s+)? between each character, but that would not work, because the word to replace already contains characters with special meaning in a regex context.
EDIT: My current code doesn't work when replace rules overlap, as in the example above:
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + @"(-\s+)?");
}
return regex;
}
The output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely",
and as I understand your problem, this code solves it completely.
using System.Collections.Generic;
using System.Linq;
class Program
{
class somefields
{
public string first;
public string secound;
public string Add;
public int index;
public somefields(string F, string S)
{
first = F;
secound = S;
index = -1; // -1 marks a rule whose text was not found in the input
}
}
static void Main(string[] args)
{
//declaring output
string input = "consecuti?vely";
List<somefields> rules=new List<somefields>();
//declaring rules
rules.Add(new somefields("cuti?",@"cuti?(-\s+)?"));
rules.Add(new somefields("con",@"con(-\s+)?"));
rules.Add(new somefields("consecu",@"consecu(-\s+)?"));
// finding the string which must be added to output string and index of that
foreach (var rul in rules)
{
var index=input.IndexOf(rul.first);
if (index != -1)
{
var add = rul.secound.Remove(0,rul.first.Count());
rul.Add = add;
rul.index = index+rul.first.Count();
}
}
// sort rules by index
for (int i = 0; i < rules.Count(); i++)
{
for (int j = i + 1; j < rules.Count(); j++)
{
if (rules[i].index > rules[j].index)
{
somefields temp;
temp = rules[i];
rules[i] = rules[j];
rules[j] = temp;
}
}
}
string output = input.ToString();
int k=0;
foreach(var rul in rules)
{
if (rul.index != -1)
{
output = output.Insert(k + rul.index, rul.Add);
k += rul.Add.Length;
}
}
System.Console.WriteLine(output);
System.Console.ReadLine();
}
}
You should probably write your own parser; it's probably easier to maintain :).
Maybe you could surround the patterns with "special characters" such as "##" to protect them, provided the strings don't already contain that sequence.
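Try this one, which strips hyphens (and any whitespace after them) that follow a word in the page text, instead of editing the regex:
var final = Regex.Replace(originalTextOfThePage, @"(\w+)(?:\-[\s\r\n]*)?", "$1");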
I had to give up on an easy solution and edit the regex myself. As a side effect, the new approach goes through the string only twice.
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
var hyphenationToken = @"(-\s+)?";
return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
for (int i = 0; i < sourceString.Length; i++)
{
if (!linkedInsertionPositions.Any())
{
sb.Append(sourceString.Substring(i));
break;
}
if (i == linkedInsertionPositions.First.Value)
{
sb.Append(insertionToken);
}
if (i >= linkedInsertionPositions.First.Value)
{
linkedInsertionPositions.RemoveFirst();
}
sb.Append(sourceString[i]);
}
return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
var indexesToInsertPossibleHyphenation = new List<int>();
//....
// Aho-Corasick to find all occurrences of all
//strings in "hyphenatedParts" in the "regex" string
// ....
return indexesToInsertPossibleHyphenation;
}
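The Aho-Corasick step is elided above. As a stand-in, a minimal sketch that finds the same insertion positions with a plain IndexOf loop per fragment (simpler than Aho-Corasick, and its cost grows with regex length times the number of fragments, which is usually fine for short patterns), then inserts the token from the highest index down so earlier positions stay valid. HyphenationHelper and its methods are hypothetical names, and fragments are matched literally against the regex source, as in the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class HyphenationHelper
{
    // Returns, in ascending order, every index in `regex` that directly follows an
    // occurrence of one of the fragments (naive IndexOf search instead of Aho-Corasick).
    public static List<int> FindInsertionPositions(string regex, IEnumerable<string> fragments)
    {
        var positions = new SortedSet<int>(); // sorted and de-duplicated
        foreach (string fragment in fragments.Where(f => !string.IsNullOrEmpty(f)))
        {
            int index = regex.IndexOf(fragment, StringComparison.Ordinal);
            while (index >= 0)
            {
                positions.Add(index + fragment.Length);
                index = regex.IndexOf(fragment, index + 1, StringComparison.Ordinal);
            }
        }
        return positions.ToList();
    }

    // Inserts `token` at each position, highest first, so the remaining indexes stay valid.
    public static string InsertAt(string source, IEnumerable<int> positions, string token)
    {
        var sb = new StringBuilder(source);
        foreach (int pos in positions.Where(p => p >= 0 && p <= source.Length)
                                     .Distinct()
                                     .OrderByDescending(p => p))
        {
            sb.Insert(pos, token);
        }
        return sb.ToString();
    }
}
```

With the example rules, the positions for "consecuti?vely" are 3, 7 and 10, and inserting (-\s+)? there yields the desired con(-\s+)?secu(-\s+)?ti?(-\s+)?vely, because overlapping rules only contribute positions instead of rewriting each other's output.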
I'm trying to count the number of words in a RichTextBox in C#. The code I have below only works if the text is a single line. How do I do this without relying on regex or any other special functions?
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
foreach(string av in split_text)
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
new_text = new_text.TrimEnd(',');
split_text = new_text.Split(',');
MessageBox.Show(split_text.Length.ToString ());
char[] delimiters = new char[] { ' ', '\r', '\n', '\t' };
int wordCount = whole_text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Length;
Since you are only interested in word count, and you don't care about individual words, String.Split could be avoided. String.Split is handy, but it unnecessarily generates a (potentially) large number of String objects, which in turn creates an unnecessary burden on the garbage collector. For each word in your text, a new String object needs to be instantiated, and then soon collected since you are not using it.
For a homework assignment, this may not matter, but if your text box contents change often and you do this calculation inside an event handler, it may be wiser to simply iterate through characters manually. If you really want to use String.Split, then go for a simpler version like Yonix recommended.
Otherwise, use an algorithm similar to this:
string text = richTextBox1.Text;
int wordCount = 0, index = 0;
// skip whitespace until first word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
while (index < text.Length)
{
// check if current char is part of a word
while (index < text.Length && !char.IsWhiteSpace(text[index]))
index++;
wordCount++;
// skip whitespace until next word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
}
This code handles multiple spaces between words better; you can test the code online.
There are some better ways to do this, but in keeping with what you've got, try the following:
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
// new line split here
string[] lines = trimmed_text.Split(Environment.NewLine.ToCharArray());
// don't need this here now...
//string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
Now use two nested foreach loops: one for each line, and one for counting the words within each line.
foreach (string line in lines)
{
// Modify the inner foreach to do the split on ' ' here
// instead of split_text
foreach (string av in line.Split(' '))
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
}
new_text = new_text.TrimEnd(',');
// use lines here instead of split_text
lines = new_text.Split(',');
MessageBox.Show(lines.Length.ToString());
}
This was a phone-screen interview question I just had (from a large company in CA that sells all kinds of devices starting with the letter "i"), and I think I flunked it... After I got off the call, I wrote this. I wish I had been able to do it during the interview.
static void Main(string[] args)
{
Debug.Assert(CountWords("Hello world") == 2);
Debug.Assert(CountWords(" Hello world") == 2);
Debug.Assert(CountWords("Hello world ") == 2);
Debug.Assert(CountWords("Hello   world") == 2);
}
public static int CountWords(string test)
{
int count = 0;
bool wasInWord = false;
bool inWord = false;
for (int i = 0; i < test.Length; i++)
{
if (inWord)
{
wasInWord = true;
}
if (Char.IsWhiteSpace(test[i]))
{
if (wasInWord)
{
count++;
wasInWord = false;
}
inWord = false;
}
else
{
inWord = true;
}
}
// Count a final word that runs to the end of the string (inWord is still true)
if (inWord)
{
count++;
}
return count;
}
Have a look at the Lines property mentioned in @Jay Riggs' comment, along with this overload of String.Split, to make the code much simpler. Then the simplest approach is to loop over each line in the Lines property, call String.Split on it, and add the length of the array it returns to a running count.
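A minimal sketch of that loop. It takes a string[] (for example richTextBox1.Lines) so it can run outside WinForms; treating spaces and tabs as the only word separators is an assumption, and WordCounter is an illustrative name:

```csharp
using System;

public static class WordCounter
{
    private static readonly char[] Separators = { ' ', '\t' };

    // Sums the word count of each line; RemoveEmptyEntries ignores runs of separators.
    public static int CountWords(string[] lines)
    {
        int total = 0;
        foreach (string line in lines)
        {
            total += line.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
        }
        return total;
    }
}
```

Usage: `int words = WordCounter.CountWords(richTextBox1.Lines);`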
EDIT: Also, is there any reason you're using a RichTextBox instead of a TextBox with Multiline set to True?
I use an extension method for grabbing word count in a string. Do note, however, that double spaces will mess the count up.
public static class StringExtensions
{
public static int CountWords(this string line)
{
var wordCount = 0;
for (var i = 0; i < line.Length; i++)
if (line[i] == ' ' || i == line.Length - 1)
wordCount++;
return wordCount;
}
}
Your approach is on the right track. I would do something like the following, passing the Text property of richTextBox1 into the method. However, this won't be accurate if your rich text box contains HTML, so you'll need to strip out any HTML tags before running the word count:
public static int CountWords(string s)
{
int c = 0;
for (int i = 1; i < s.Length; i++)
{
if (char.IsWhiteSpace(s[i - 1]) == true)
{
if (char.IsLetterOrDigit(s[i]) == true ||
char.IsPunctuation(s[i]))
{
c++;
}
}
}
// count the first word, which has no whitespace in front of it
if (s.Length > 0 && !char.IsWhiteSpace(s[0]))
{
c++;
}
return c;
}
We used an adapted form of Yoshi's answer, in which we fixed the bug where it would not count the last word in a string if there was no whitespace after it:
public static int CountWords(string test)
{
int count = 0;
bool inWord = false;
foreach (char t in test)
{
if (char.IsWhiteSpace(t))
{
inWord = false;
}
else
{
if (!inWord) count++;
inWord = true;
}
}
return count;
}
using System.Collections;
using System;
class Program{
public static void Main(string[] args){
//Enter the value of n
int n = Convert.ToInt32(Console.ReadLine());
string[] s = new string[n];
ArrayList arr = new ArrayList();
//enter the elements
for(int i=0;i<n;i++){
s[i] = Console.ReadLine();
}
//Filter out duplicate values and store in arr
foreach(string i in s){
if(!arr.Contains(i)){
arr.Add(i);
}
}
//Count the string with arr and s variables
foreach(string i in arr){
int count = 0;
foreach(string j in s){
if(i.Equals(j)){
count++;
}
}
Console.WriteLine(i+" - "+count);
}
}
}
int wordCount = 0;
// start as if preceded by whitespace so the first word is counted too
bool previousLetterWasWhiteSpace = true;
foreach (char letter in keyword)
{
if (char.IsWhiteSpace(letter))
{
previousLetterWasWhiteSpace = true;
}
else
{
if (previousLetterWasWhiteSpace)
{
previousLetterWasWhiteSpace = false;
wordCount++;
}
}
}
public static int WordCount(string str)
{
int num=0;
bool wasInaWord=false; // becomes true once a non-space character is seen
if (string.IsNullOrEmpty(str))
{
return num;
}
for (int i=0;i< str.Length;i++)
{
if (i!=0)
{
if (str[i]==' ' && str[i-1]!=' ')
{
num++;
wasInaWord=false;
}
}
if (str[i]!=' ')
{
wasInaWord=true;
}
}
if (wasInaWord)
{
num++;
}
return num;
}
using System;
using System.Text;
class Program
{
static void Main(string[] args)
{
string str;
int wrd, l;
StringBuilder sb = new StringBuilder();
Console.Write("\n\nCount the total number of words in a string :\n");
Console.Write("------------------------------------------------------\n");
Console.Write("Input the string : ");
str = Console.ReadLine();
l = 0;
wrd = 0;
foreach (var a in str)
{
sb.Append(a);
// a word starts at a non-whitespace character that is first or follows whitespace
if (!char.IsWhiteSpace(a) && (l == 0 || char.IsWhiteSpace(str[l - 1])))
{
wrd++;
}
l++;
}
Console.WriteLine(sb.Replace(' ', '\n'));
Console.Write("Total number of words in the string is : {0}\n", wrd);
Console.ReadLine();
}
}
This should work:
int count = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
This can show you the number of words in a line
string line = Console.ReadLine();
string[] word = line.Split(' ');
Console.WriteLine("Words " + word.Length);
You can also do it this way! Add this method to your extension methods:
public static int WordsCount(this string str)
{
return Regex.Matches(str, @"((\w+(\s?)))").Count;
}
And call it like this.
string someString = "Let me show how I do it!";
int wc = someString.WordsCount();
How can I delete the first n lines in a string?
Example:
String str = @"a
b
c
d
e";
String output = DeleteLines(str, 2);
//Output is "c
//d
//e"
You can use LINQ:
String str = @"a
b
c
d
e";
int n = 2;
string[] lines = str
.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None) // whole line breaks, so "\r\n" doesn't yield an extra empty entry
.Skip(n)
.ToArray();
string output = string.Join(Environment.NewLine, lines);
// Output is
// "c
// d
// e"
If you need to handle "\r\n", "\r", and "\n", it's better to use the following regex:
public static class StringExtensions
{
public static string RemoveFirstLines(string text, int linesCount)
{
var lines = Regex.Split(text, "\r\n|\r|\n").Skip(linesCount);
return string.Join(Environment.NewLine, lines.ToArray());
}
}
Here are some more details about splitting text into lines.
A combination of "Get the index of the nth occurrence of a string?" (searching for Environment.NewLine) and Substring should do the trick.
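A minimal sketch of that combination: locate the n-th newline with an ordinal IndexOf loop, then take a single Substring. The newline parameter (defaulting to Environment.NewLine) is an assumption; pass "\n" if the text may use Unix line endings. LineTrimmer and DeleteFirstLines are illustrative names:

```csharp
using System;

public static class LineTrimmer
{
    // Removes the first `count` lines, where lines are separated by `newline`.
    // Returns an empty string if the text has fewer than `count` separators.
    public static string DeleteFirstLines(string text, int count, string newline = null)
    {
        if (newline == null) newline = Environment.NewLine;
        int start = 0;
        for (int i = 0; i < count; i++)
        {
            int index = text.IndexOf(newline, start, StringComparison.Ordinal);
            if (index < 0) return string.Empty;
            start = index + newline.Length; // first character of the next line
        }
        return text.Substring(start);
    }
}
```

Only one Substring copy is made at the end, regardless of how many lines are skipped.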
Try the following:
public static string DeleteLines(string s, int linesToRemove)
{
return s.Split(new[] { "\r\n", "\r", "\n" },
linesToRemove + 1, StringSplitOptions.None
).Skip(linesToRemove)
.FirstOrDefault();
}
For example:
string str = @"a
b
c
d
e";
string output = DeleteLines(str, 2);
returns
c
d
e
Try this:
public static string DeleteLines (string text, int lineCount) {
// remove the first line (up to and including its '\n') lineCount times
for (int i = 0; i < lineCount && text.IndexOf('\n') >= 0; i++)
text = text.Remove(0, text.IndexOf('\n') + 1);
return text;
}
It might not be very efficient, but it works perfectly for the little project I've been working on recently.
Try the following:
private static string DeleteLines(string input, int lines)
{
var result = input;
for(var i = 0; i < lines; i++)
{
var idx = result.IndexOf('\n');
if (idx < 0)
{
// do what you want when there are less than the required lines
return string.Empty;
}
result = result.Substring(idx+1);
}
return result;
}
Note: This method is not ideal for extremely long multi-line strings, because each Substring call copies the rest of the string. If you are dealing with such strings, I suggest altering the method to use the StringBuilder class.
With the ability to delete either the first n lines or the last n lines:
public static string DeleteLines(
string stringToRemoveLinesFrom,
int numberOfLinesToRemove,
bool startFromBottom = false) {
string toReturn = "";
// RemoveEmptyEntries avoids empty entries between '\r' and '\n', but it also drops blank lines
string[] allLines = stringToRemoveLinesFrom.Split(
separator: Environment.NewLine.ToCharArray(),
options: StringSplitOptions.RemoveEmptyEntries);
if (startFromBottom)
toReturn = String.Join(Environment.NewLine, allLines.Take(allLines.Length - numberOfLinesToRemove));
else
toReturn = String.Join(Environment.NewLine, allLines.Skip(numberOfLinesToRemove));
return toReturn;
}
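public static string DeleteLines(string input, int linesToSkip)
{
int startIndex = 0;
for (int i = 0; i < linesToSkip; ++i)
{
int newline = input.IndexOf('\n', startIndex);
// fewer lines than requested: nothing is left (otherwise IndexOf's -1 would restart at 0)
if (newline < 0)
return string.Empty;
startIndex = newline + 1;
}
return input.Substring(startIndex);
}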