Split text file every 120,000 Lines? - c#

So I have a textfile I need to split every 120,000, when it's split at the 120,000th line I need the rest to into another text file. Any ideas on this guys?

You can use Batch from MoreLINQ to group your lines into batches of 120,000 lines, which can then each be handles separately.
foreach(var batch in File.ReadLines(inputFile).Batch(120000))
WriteToFile(batch);

var lines = new List<string>();
int counter = 0,i = 1;
string line;
using (var reader = new StreamReader("filePath"))
{
while ((line = reader.ReadLine()) != null)
{
lines.Add(line);
counter++;
if (counter == 120000)
{
string fileName = String.Format("file{0}.txt",i);
File.WriteAllLines(fileName,lines);
lines.Clear();
counter = 0;
i++;
}
}
}
if(lines.Count > 0) File.WriteAllLines("path", lines);
Note: You should use different file names when using the File.WriteAllLines, otherwise you will just overwrite a single file's content.For example you can use another counter for it and increment it for every file, "file1, file2 etc..".

Just another way using Enumerable.GroupBy and "integer division groups":
int batchSize = 120000;
var fileGroups = File.ReadLines(path)
.Select((line, index) => new { line, index })
.GroupBy(x => x.index / batchSize)
.Select((group, index) => new {
Path = Path.Combine(dir, string.Format("FileName_{0}.txt", index + 1)),
Lines = group.Select(x => x.line)
});
foreach (var file in fileGroups)
File.WriteAllLines(file.Path, file.Lines);

Related

Count how many files starts with the same first characters c#

I want to make function that will count how many files in selected folder starts with the same 10 characters.
For example in folder will be files named File1, File2, File3 and int count will give 1 because all 3 files starts with the same characters "File", if in folder will be
File1,File2,File3,Docs1,Docs2,pdfs1,pdfs2,pdfs3,pdfs4
will give 3, because there are 3 unique values for fileName.Substring(0, 4).
I've tried something like this, but it gives overall number of files in folder.
int count = 0;
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
if (fileName.Substring(0, 10) == fileName.Substring(0, 10))
{
count++;
}
}
Any idea how to count this?
You can try querying directory with a help of Linq:
using System.IO;
using System.Linq;
...
int n = 10;
int count = Directory
.EnumerateFiles(folderLocation, "*.*")
.Select(file => Path.GetFileNameWithoutExtension(file))
.Select(file => file.Length > n ? file.Substring(0, n) : file)
.GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
.OrderByDescending(group => group.Count())
.FirstOrDefault()
?.Count() ?? 0;
You could instantiate a list of strings of files with a unique name, and check if each file is in that list or not:
int count = 0;
int length = 0;
List<string> list = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
boolean inKnown = false;
string fileName = Path.GetFileName(file);
for (string s in list)
{
if (s.Length() < length)
{
// Add to known list just so that we don't check for this string later
inKnown = true;
count--;
break;
}
if (s.Substring(0, length) == fileName.Substring(0, length))
{
inKnown = true;
break;
}
}
if (!inKnown)
{
count++;
list.Add(s);
}
}
The limitation here is that you are asking if the first ten characters are the same, but your examples given showed the first 4, so just adjust the length variable according to how many characters you would like to check for.
#acornTime give me idea, his solution didn't work but this worked. Thanks for help!
List<string> list = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
list.Add(fileName.Substring(0, 10));
}
list = list.Distinct().ToList();
//count how many items are in list
int count = list.Count;

Remove duplicate combination of numbers in C# from csv

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct but it seems to stay the same.
string path;
string newcsvpath = #"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }
From you example, it seems that you want to treat each line as a sequence of numbers and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
When using Distinct on this, we have to pass a comparer which considers two array equal, if they have the same elements. Let's use the one from this SO question
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
Try it
HashSet<string> record = new HashSet<string>();
foreach (var row in dtCSV.Rows)
{
StringBuilder textEditor= new StringBuilder();
foreach (string col in columns)
{
textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
}
if (!record.Add(textEditor.ToString())
{
}
}

Read word list from text file with control word length

How can read word list text file but check word length before reading words?
var s = File.ReadAllText("words_alpha.txt").Where(r=>r.ToString().Length<6 &&
r.ToString()!="\n"
try this instead
var s = File.ReadAllLines(#"words_alpha.txt")
.Select((x, i) => new { Line = x, LineNumber = i })
.Where(x => x.Line.Length > 6 && x.ToString()!="\n")
.ToList();
hope this helps.
var lines = File.ReadAllLines("words_alpha.txt");
foreach (var line in lines)
{
if(line.Length > 6)
{
//YOUR CODE
}
}

Occurence of elements in the file with c# and Dictionary

I have a file as
outlook temperature Humidity Windy PlayTennis
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
I want to find occurence of each element e.g
sunny: 2
rain: 3
overcast:1
hot: 3
and so on
My code is:
string file = openFileDialog1.FileName;
var text1 = File.ReadAllLines(file);
StringBuilder str = new StringBuilder();
string[] lines = File.ReadAllLines(file);
string[] nonempty=lines.Where(s => s.Trim(' ')!="")
.Select(s => Regex.Replace(s, #"\s+", " ")).ToArray();
string[] colheader = null;
if (nonempty.Length > 0)
colheader = nonempty[0].Split();
else
return;
var linevalue = nonempty.Skip(1).Select(l => l.Split());
int colcount = colheader.Length;
Dictionary<string, string> colvalue = new Dictionary<string, string>();
for (int i = 0; i < colcount; i++)
{
int k = 0;
foreach (string[] values in linevalue)
{
if(! colvalue.ContainsKey(values[i]))
{
colvalue.Add(values[i],colheader[i]);
}
label2.Text = label2.Text + k.ToString();
}
}
foreach (KeyValuePair<string, string> pair in colvalue)
{
label1.Text += pair.Key+ "\n";
}
Output I get here is
sunny
overcast
rain
hot
mild
cool
N
P
true
false
I also want to find the occurence, which I am unable to get. Can u please help me out here.
This LINQ query will return Dictionary<string, int> which will contain each word in file as key, and word's occurrences as value:
var occurences = File.ReadAllLines(file).Skip(1) // skip titles line
.SelectMany(l => l.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(w => w)
.ToDictionary(g => g.Key, g => g.Count());
Usage of dictionary:
int sunnyOccurences = occurences["sunny"];
foreach(var pair in occurences)
label1.Text += String.Format("{0}: {1}\n", pair.Key, pair.Value);
Seems to me like you are implementing a simple Tag Cloud. I have used non-generic collection but you can replace it with generic. Replace the HashTable with Dictionary
Follow this code:
Hashtable tagCloud = new Hashtable();
ArrayList frequency = new ArrayList();
Read from a file and store it as array
string[] lines = File.ReadAllLines("file.txt");
//use the specific delimiter
char[] delimiter = new char[] { ' ' };
StringBuilder buffer = new StringBuilder();
foreach (string line in lines)
{
if (line.ToString().Length != 0)
{
buffer.Append((" " + line.Trim()));
}
}
string[] words = buffer.ToString().Trim().Split(delimiter);
Storing occurrence of each word.
List<string> listOfWords = new List<string>(words);
foreach (string i in listOfWords)
{
int c = 0;
foreach (string j in words)
{
if (i.Equals(j))
c++;
}
frequency.Add(c);
}
Store as key value pair. Value will be word and key will be its occurrence
for (int i = 0; i < listOfWords.Count; i++)
{
//use dictionary here
tagCloud.Add(listOfWords[i], (int)frequency[i]);
}
If all you want is the keyword and a count of how many times they appear in the file, then lazyberezovsky's solution is about as elegant of a solution as you will find. But if you need to do any other metrics on the file's data, then I would load the file into a collection that keeps your other metadata intact.
Something simple like:
var forecasts = File.ReadAllLines(file).Skip(1) // skip the header row
.Select(line => line.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)) // split the line into an array of strings
.Select (f =>
new
{
Outlook = f[0],
Temperature = f[1],
Humidity = f[2],
Windy = f[3],
PlayTennis = f[4]
});
will give you an IEnumerable<> of an anonymous type that has properties that can be queried.
For example if you wanted to see how many times "sunny" occurred in the Outlook then you could just use LINQ to do this:
var count = forecasts.Count( f => f.Outlook == "sunny");
Or if you just wanted the list of all outlooks you could write:
var outlooks = forecasts.Select(f => f.Outlook).Distinct();
Where this is useful is when you want to do more complicated queries like "How many rainy cool days are there?
var count = forecasts.Count (f => f.Outlook == "rain" && f.Temperature == "cool");
Again if you just want all words and their occurrence count, then this is overkill.

Clearing a line of a txt file by the line ID

I have looked all over for the answer to this, but I can't find it anywhere. I need to be able to clear a line from a txt file by the last integer in the line (the ID number), but I have no idea how to do that. Please help? Basically I was thinking that I need to find the last integer, and if it does not equal to the input, then it would move to the next line until it finds the right integer. Then that line is cleared. Here is some of my code that obviously doesn't work:
public static void TicID(CommandArgs args)
{
if (args.Parameters.Count == 1)
{
if (i == 1)
{
try
{
string idToDelete = args.Parameters[0];
StreamReader idreader = new StreamReader("Tickets.txt");
StreamWriter iddeleter = new StreamWriter("Tickets.txt");
string id = Convert.ToString(idreader.Read());
string line = null;
while (idreader.Peek() >= 0)
{
if (String.Compare(id, idToDelete) == 0)
{
iddeleter.WriteLine(line);
}
else
{
idreader.ReadLine();
}
}
}
The most straightforward way to delete lines is to write the lines that should not be deleted:
var idToDelete = "1";
var path = #"C:\Temp\Test.txt";
var lines = File.ReadAllLines(path);
using (var writer = new StreamWriter(path, false)) {
for (var i = 0; i < lines.Length; i++) {
var line = lines[i];
//assuming it's a CSV file
var cols = line.Split(',');
var id = cols[cols.Length - 1];
if (id != idToDelete) {
writer.WriteLine(line);
}
}
}
This is the LINQ-way:
var lines = from line in File.ReadAllLines(path)
let cols = line.Split(',')
let id = cols[cols.Length - 1]
where id != idToDelete
select line;
File.WriteAllLines(path, lines);
Create a StreamReader object and read line by line into a string array or something like that using StreamReader's instance method ReadLine() and now find your line of choice in your text array and delete it.
Important note:
do { /*see description above*/ } while (streamReader.Peak() != -1);

Categories