I'm working on a scanning engine and I'm trying to read the memory of a process. My code is below (it's a little messy), but for some reason, if I read the memory of an application in different states, or after it has loaded a lot of things into memory, I get the same memory size no matter what. Are my entry point addresses and length incorrect?
If I use a memory editor I don't get the same results I do with this.
Process process = Process.GetProcessesByName(processName)[0];
List<byte[]> moduleMemory = new List<byte[]>();
byte[] temp;
// assumes a ReadProcessMemory P/Invoke declaration elsewhere in the class
foreach (ProcessModule pm in process.Modules) {
    //MessageBox.Show(pm.FileName);
    temp = new byte[pm.ModuleMemorySize];
    int read;
    if (ReadProcessMemory(process.Handle, pm.BaseAddress, temp, temp.Length, out read)) {
        moduleMemory.Add(temp);
    }
}
//string d = Encoding.Default.GetString(moduleMemory[0]);
MessageBox.Show("Size: " + moduleMemory[0].Length);
Your problem is probably caused by the fact that the Process class caches values:
The process component obtains information about a group of properties
all at once. After the Process component has obtained information
about one member of any group, it will cache the values for the other
properties in that group and not obtain new information about the
other members of the group until you call the Refresh method.
Therefore, a property value is not guaranteed to be any newer than the
last call to the Refresh method. The group breakdowns are
operating-system dependent.
Therefore, after the target process loads some additional modules, the process instance will still return old values. Calling process.Refresh() should update all cached values and fix the issue.
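As a minimal sketch, reusing the processName variable from the question:

// Minimal sketch: refresh before enumerating modules so cached values are discarded.
Process process = Process.GetProcessesByName(processName)[0];
process.Refresh(); // drop cached property values so Modules reflects the current state

foreach (ProcessModule pm in process.Modules)
{
    Console.WriteLine($"{pm.ModuleName}: {pm.ModuleMemorySize} bytes at 0x{pm.BaseAddress.ToInt64():X}");
}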
As far as I can see, this code does nothing more than read the memory layout of the executable module (the .exe file) the process was created from, so it is no wonder you get the same size all the time.
I assume you actually want to read the "operational" memory of the process. If so, you should have a look at this discussion.
Related
I am writing a .NET application running on Windows Server 2016 that does an HTTP GET on a bunch of pieces of a large file. This dramatically speeds up the download process, since you can download them in parallel. Unfortunately, once they are downloaded, it takes a fairly long time to piece them all back together.
There are between 2k and 4k files that need to be combined. The server this will run on has PLENTY of memory, close to 800 GB. I thought it would make sense to use MemoryStreams to store the downloaded pieces until they can be sequentially written to disk, BUT I am only able to consume about 2.5 GB of memory before I get a System.OutOfMemoryException. The server has hundreds of GB available, and I can't figure out how to use them.
MemoryStreams are built around byte arrays. Arrays cannot be larger than 2GB currently.
The current implementation of System.Array uses Int32 for all its internal counters etc, so the theoretical maximum number of elements is Int32.MaxValue.
There's also a 2GB max-size-per-object limit imposed by the Microsoft CLR.
As you try to put the content in a single MemoryStream the underlying array gets too large, hence the exception.
Try to store the pieces separately, and write them directly to the FileStream (or whatever you use) when ready, without first trying to concatenate them all into 1 object.
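A minimal sketch of that idea; "pieces" is a placeholder for the downloaded chunks already in their final order:

// Hypothetical sketch: write each downloaded piece straight to the output file
// instead of concatenating everything into one MemoryStream.
var pieces = new List<byte[]>(); // filled by your download code

using (var output = new FileStream("result.bin", FileMode.Create, FileAccess.Write))
{
    foreach (byte[] piece in pieces)
    {
        output.Write(piece, 0, piece.Length); // append each chunk; no single giant buffer needed
    }
}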
According to the source code of the MemoryStream class you will not be able to store more than 2 GB of data into one instance of this class.
The reason for this is that the maximum length of the stream is set to Int32.MaxValue and the maximum length of a byte array is set to 0x7FFFFFC7, which is 2,147,483,591 in decimal (roughly 2 GB).
Snippet from MemoryStream:
private const int MemStreamMaxLength = Int32.MaxValue;
Snippet from Array:
// We impose limits on maximum array lenght in each dimension to allow efficient
// implementation of advanced range check elimination in future.
// Keep in sync with vm\gcscan.cpp and HashHelpers.MaxPrimeArrayLength.
// The constants are defined in this method: inline SIZE_T MaxArrayLength(SIZE_T componentSize) from gcscan
// We have different max sizes for arrays with elements of size 1 for backwards compatibility
internal const int MaxArrayLength = 0X7FEFFFFF;
internal const int MaxByteArrayLength = 0x7FFFFFC7;
The question More than 2GB of managed memory was already discussed a long time ago on the Microsoft forum, and it references a blog article about BigArray, getting around the 2GB array size limit.
Update
I suggest using the following code, which should be able to allocate more than 4 GB on an x64 build but will fail below 4 GB on an x86 build:
private static void Main(string[] args)
{
    List<byte[]> data = new List<byte[]>();
    Random random = new Random();

    while (true)
    {
        try
        {
            var tmpArray = new byte[1024 * 1024];
            random.NextBytes(tmpArray);
            data.Add(tmpArray);
            Console.WriteLine($"{data.Count} MB allocated");
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Further allocation failed.");
            return; // stop once the runtime can no longer satisfy allocations
        }
    }
}
As has already been pointed out, the main problem here is that MemoryStream is backed by a byte[], which has a fixed upper size.
The option of using an alternative Stream implementation has been noted. Another alternative is to look into "pipelines", the newer System.IO.Pipelines API. A "pipeline" is based around discontiguous memory, which means it isn't required to use a single contiguous buffer; the pipelines library will allocate multiple slabs as needed, which your code can process. I have written extensively on this topic; part 1 is here. Part 3 probably has the most code focus.
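A rough, self-contained sketch of that model, assuming the System.IO.Pipelines package is referenced; it is only meant to show the discontiguous-buffer idea, not a full download pipeline:

using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

// The writer fills buffers handed out by the pipe; the reader consumes a
// discontiguous ReadOnlySequence<byte>, so no single 2 GB array is ever required.
class PipeSketch
{
    static async Task Main()
    {
        var pipe = new Pipe();

        var writing = Task.Run(async () =>
        {
            for (int i = 0; i < 4; i++)
            {
                Memory<byte> buffer = pipe.Writer.GetMemory(1024);
                buffer.Span.Slice(0, 1024).Fill((byte)i);   // pretend this is a downloaded piece
                pipe.Writer.Advance(1024);
                await pipe.Writer.FlushAsync();
            }
            pipe.Writer.Complete();
        });

        while (true)
        {
            ReadResult result = await pipe.Reader.ReadAsync();
            foreach (var segment in result.Buffer)           // segments may live in separate slabs
            {
                Console.WriteLine($"Got a segment of {segment.Length} bytes");
            }
            pipe.Reader.AdvanceTo(result.Buffer.End);
            if (result.IsCompleted) break;
        }
        pipe.Reader.Complete();
        await writing;
    }
}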
Just to confirm that I understand your question: you're downloading a single very large file in multiple parallel chunks and you know how big the final file is? If you don't then this does get a bit more complicated but it can still be done.
The best option is probably to use a MemoryMappedFile (MMF). What you'll do is create the destination file via MMF. Each thread will create a view accessor to that file and write to it in parallel. At the end, close the MMF. This essentially gives you the behavior you wanted from MemoryStreams, but Windows backs the memory with the file on disk. One of the benefits of this approach is that Windows manages storing the data to disk in the background (flushing), so you don't have to, and it should result in excellent performance.
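A hedged sketch of that approach; "totalSize" and "chunks" are placeholders for values your download code already knows:

using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

// Hypothetical sketch: each downloaded chunk carries its target offset and its bytes.
long totalSize = 100L * 1024 * 1024;                  // known final file size (placeholder)
var chunks = new List<(long Offset, byte[] Data)>();  // filled by the download step

using (var mmf = MemoryMappedFile.CreateFromFile("result.bin", FileMode.Create, null, totalSize))
{
    Parallel.ForEach(chunks, chunk =>
    {
        // Each thread writes its chunk to the right region of the destination file.
        using (var view = mmf.CreateViewAccessor(chunk.Offset, chunk.Data.Length))
        {
            view.WriteArray(0, chunk.Data, 0, chunk.Data.Length);
        }
    });
}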
In Shredding files in .NET it is recommended to use Eraser or this code here on CodeProject to securely erase a file in .NET.
I was trying to make my own method of doing so, as the code from CodeProject had some problems for me. Here's what I came up with:
private static readonly Random rnd = new Random(); // shared RNG for the random name and the overwrite passes

public static void secureDelete(string file, bool deleteFile = true)
{
    string nfName = "deleted" + rnd.Next(1000000000, 2147483647) + ".del";
    string fName = Path.GetFileName(file);
    System.IO.File.Move(file, file.Replace(fName, nfName));
    file = file.Replace(fName, nfName);
    int overWritten = 0;
    while (overWritten < 7) // 7 overwrite passes
    {
        byte[] data = new byte[1 * 1024 * 1024];
        rnd.NextBytes(data);
        File.WriteAllBytes(file, data);
        overWritten += 1;
    }
    if (deleteFile) { File.Delete(file); }
}
It seems to work fine. It renames the file randomly and then overwrites it with 1 MB of random data 7 times. However, I was wondering how safe it actually is, and if there is any way I could make it safer?
A file system, especially when accessed through a higher-level API such as the ones found in System.IO, sits so many levels of abstraction above the actual storage implementation that this approach makes little sense for modern drives.
To be clear: the CodeProject article, which promotes overwriting a file by name multiple times, is absolute nonsense - for SSDs at least. There is no guarantee whatsoever that writing to a file at some path multiple times writes to the same physical location on disk every time.
Of course, opening a file with read-write access and overwriting it from the beginning, conceptually writes to the same "location". But that location is pretty abstract.
See it like this: hard disks, but especially solid state drives, might take a write, such as "set byte N of cluster M to O", and actually write an entire new cluster to an entirely different location on the drive, to prolong the drive's lifetime (as repeated writes to the same memory cells may damage the drive).
From Coding for SSDs – Part 3: Pages, Blocks, and the Flash Translation Layer | Code Capsule:
Pages cannot be overwritten
A NAND-flash page can be written to only if it is in the “free” state. When data is changed, the content of the page is copied into an internal register, the data is updated, and the new version is stored in a “free” page, an operation called “read-modify-write”. The data is not updated in-place, as the “free” page is a different page than the page that originally contained the data. Once the data is persisted to the drive, the original page is marked as being “stale”, and will remain as such until it is erased.
This means that somewhere on the drive, the original data is still readable, namely in the cluster M to which a write was requested. That is, until it is overwritten. The cluster is now marked as "free", but you'll need very low-level access to the disk to access that cluster in order to overwrite it, and I'm not sure that's possible with SSDs.
Even if you would overwrite the entire SSD or hard drive multiple times, chances are that some of your very private data is hidden in a now defunct sector or page on the disk or SSD, because at the moment of overwriting or clearing it the drive determined that location to be defective. A forensics team will be able to read this data (albeit damaged). So, if you have data on a hard drive that can be used against you: toss the drive into a fire.
See also Get file offset on disk/cluster number for some more (links to) information about lower-level file system APIs.
But all of this should be taken with quite a grain of salt, as it is partly hearsay and I have no actual experience with this level of disk access.
I have a console app that reads in a large text file with 40k+ lines; each line is a key that I use in a search, and the results are written to an output file. The issue is that I leave this console app running for a while until it suddenly closes, and I noticed that the process memory usage was really high: it was sitting at 1.6 GB when I last saw it crash.
I looked around and didn't find many answers. I did try to use gcAllowVeryLargeObjects, but that feels like I'm just dodging the problem.
Below is a snippet from my main() of where I write out to the file. I can't seem to understand why the memory usage gets so high. I flush the writer after every write (could it be because I'm keeping the file open for such a long period of time?).
TextWriter writer = new StreamWriter("output.csv", false);
foreach (var item in list)
{
    Console.WriteLine("{0}/{1}", count, numofitem);
    var result = TableServiceContext.Read(item.id);
    if (result != null)
    {
        writer.WriteLine(String.Join(",", result.id,
                                          result.code,
                                          result.hash));
    }
    count++;
    writer.Flush();
}
writer.Close();
Edit: I have 32 GB of RAM on my computer, so I am sure it's not running out of memory because I don't have enough RAM.
Edit2: changed the name of the repository as that was misleading.
If the average line length is 1 KB, then 40K lines is 40 MB, and that is nothing. That's why I'm pretty sure the problem is in your repository class. If it is an EF repository, try to recreate the DbContext for each line.
If you want to tune your program, you can use the following method: add timestamps to the console output (you can use the Stopwatch class) and try recreating your repository every 10, 100, or N lines. Then, by looking at the timestamps, you can find the optimal N to use.
var timer = Stopwatch.StartNew();
...
Console.WriteLine(timer.ElapsedMilliseconds);
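A hedged sketch of that idea; MyRepository and N are placeholders standing in for your own repository type and the batch size you want to measure:

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for the real EF repository / DbContext wrapper.
sealed class MyRepository : IDisposable
{
    public string Read(string key) => null; // placeholder for your table lookup
    public void Dispose() { }               // placeholder: dispose the underlying DbContext
}

static class BatchSketch
{
    // Recreate the repository every N lines and log timings so you can pick a good N.
    public static void Process(List<string> keys)
    {
        var timer = Stopwatch.StartNew();
        var repo = new MyRepository();
        const int N = 100;
        int processed = 0;

        foreach (var key in keys)
        {
            var result = repo.Read(key);
            // ... write the result to the output file ...
            processed++;

            if (processed % N == 0)
            {
                repo.Dispose();              // drop tracked entities and cached state
                repo = new MyRepository();
                Console.WriteLine($"{processed} lines in {timer.ElapsedMilliseconds} ms");
            }
        }
        repo.Dispose();
    }
}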
From looking at the code, I think the problem isn't the StreamWriter but some memory leak in your repository. Suggestions to check:
replace the repository with some dummy, e.g. a class dummy_repository with just the three properties id, code, hash (see the sketch after this list).
likewise create a long "list" e.g. 40k small entries.
run your program and see if it still consumes memory (I am pretty sure it will not)
then, step by step, add back your original parts and see which step causes the memory leak.
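A minimal sketch of such a dummy; the property names mirror the ones used in the question's loop, everything else is a placeholder:

// Hypothetical stand-in repository that allocates nothing beyond the returned record.
class dummy_repository
{
    public Record Read(string id)
    {
        return new Record { id = id, code = "code", hash = "hash" };
    }
}

class Record
{
    public string id { get; set; }
    public string code { get; set; }
    public string hash { get; set; }
}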
I'm trying to monitor memory usage.
I wrote some sample C# code to be certain that I'm measuring correctly:
var list = new List<byte[]>();
int INCREMENT = 100; // 100 MB per step
for (int i = 0; i < 10; i++)
{
    list.Add(new byte[INCREMENT * 1024 * 1024]); // allocate 100 MB
    Thread.Sleep(4000);
}
I used task manager and recorded the readings for "Private Working Set":
3'800k = 3.7M
3'900k = 3.8M
4'100k = 4M
4'300k = 4.2M
4'500k = 4.4M
5'200k = 5.07M
5'400k = 5.27M
5'600k = 5.47M
5'900k = 5.76M
6'100k = 5.96M
Does anyone know why the numbers make no sense?
Instead of looking at "Memory (Private Working Set)", look at "Commit Size".
You may have to add it with "/View/Select Columns..." then check "Commit Size".
For me it increased by about a GB, while working set went up by 3 MB.
If you look at the definition for "Memory (private working set)" in Task Manager, it says "Amount of physical memory in use by the process that cannot be used by other processes". This is very different from "private bytes" which is the number of virtual memory bytes that cannot be shared by other processes.
The data you allocate in your sample may or may not be backed by physical memory at any given time. That's what is reflected by "Memory (private working set)". Since you never write any of that memory, Windows has cleverly decided not to back the virtual memory with real memory pages. If you fill the arrays with data, you'll see that the corresponding memory pages are allocated.
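A small sketch of that check, using the same loop as in the question but touching each page after allocation:

// Writing once per 4 KB page forces Windows to commit physical pages, so
// "Private Working Set" in Task Manager should now grow by roughly 100 MB per step.
var list = new List<byte[]>();
int INCREMENT = 100; // 100 MB per step
for (int i = 0; i < 10; i++)
{
    var block = new byte[INCREMENT * 1024 * 1024];
    for (int j = 0; j < block.Length; j += 4096)
    {
        block[j] = 1; // touch each page
    }
    list.Add(block);
    Thread.Sleep(4000);
}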
When I run the code posted above, I actually see the "commit size" in task manager increase.
If I want to retrieve that in a script, I need to use the WMI API.
When I use a wmi query such as this:
SELECT PrivateBytes FROM Win32_PerfRawData_PerfProc_Process WHERE IDProcess=1234
it does not detect the increase.
You can run this query from PowerShell, Python, etc. while the test app is running.
I would appreciate it if someone could comment on this as well.
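For reference, a hedged C# sketch of issuing the same WMI query, assuming a reference to System.Management; 1234 is the placeholder PID from the query above:

using System;
using System.Management;

// Runs the Win32_PerfRawData_PerfProc_Process query and prints PrivateBytes.
var searcher = new ManagementObjectSearcher(
    "SELECT PrivateBytes FROM Win32_PerfRawData_PerfProc_Process WHERE IDProcess=1234");

foreach (ManagementObject mo in searcher.Get())
{
    Console.WriteLine("PrivateBytes: " + mo["PrivateBytes"]);
}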
I am currently running nginx on my windows system and am making a little control panel to show statistics of my web server.
I'm trying to get the performance counters for CPU usage and memory usage for the process, but nginx shows up as more than one process; it can vary from 2 to 5 depending on the setting in the configuration file. My setting shows two processes, i.e. nginx.exe and nginx.exe.
I know what performance counters to use, % Processor Time and Working Set - Private, but how would I be able to get the individual values of both processes so I can add them together for a final value?
I tried using the code found in Waffles' question, but it could only output the values for the first process of the two.
Thanks.
EDIT - Working Code
for (int i = 0; i < instances.Length; i++)
{
    //i = i + 1;
    if (i == 0)
    {
        toPopulate = new PerformanceCounter
            ("Process", "Working Set - Private",
             toImport[i].ProcessName,
             true);
    }
    else
    {
        toPopulate = new PerformanceCounter
            ("Process", "Working Set - Private",
             toImport[i].ProcessName + "#" + i,
             true);
    }
    totalNginRam += toPopulate.NextValue();
    instances[i] = toPopulate;
}
Look at the accepted answer to that question. Try running perfmon. Processes that have the same name will be identified as something like process#1, process#2, etc. In your case it could be nginx#1, nginx#2, etc.
Edit:
You need to pass the instance name to either the appropriate constructor overload or the InstanceName property. According to this, it looks like the proper format is to use an underscore, so process_1, process_2.
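As a hedged sketch (not the exact code from the linked answer), you can also enumerate the instance names that currently exist and sum over the ones that match, which sidesteps guessing the suffix format:

// Enumerate the "Process" counter instances whose name starts with "nginx"
// and sum their private working sets.
var category = new PerformanceCounterCategory("Process");
float totalNginxRam = 0;

foreach (string instance in category.GetInstanceNames())
{
    if (instance.StartsWith("nginx", StringComparison.OrdinalIgnoreCase))
    {
        using (var counter = new PerformanceCounter("Process", "Working Set - Private", instance, true))
        {
            totalNginxRam += counter.NextValue();
        }
    }
}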
When using Azure Log Analytics, you can specify a path such as
Process(nginx*)\% Processor Time
This seems to be collecting data from all processes that match the wildcard pattern at any time. I can confirm that it picks up data from new processes (started after changing the settings) and it does not pick up data from "dead" processes. However, the InstanceName (such as nginx#3) may be reused, making it hard to tell when a process was "replaced" by a new one.
I have not been able to do this in Performance Monitor. The closest thing is to type "nginx*" in the search box of the "Add Counters" dialog, then select <All searched instances>. This will create one counter per process, and counters will not be dynamically added or removed as processes are started or stopped.
Perhaps it can be done with data collector sets created via PowerShell. However, even if you are able to set a path with a wildcard in the instance part, it is not guaranteed that it will behave as you expect (i.e., automatically collect data from all processes that are running at any time).