Cannot create a capped collection larger than 500 Megabytes - c#

I'm using MongoDB on a 32-bit system and I need to create a large capped collection with a max size of 1 GB. Everything works fine on a 64-bit system, but on 32-bit I get the error:
com.mongodb.CommandResult$CommandFailure: command failed [command failed [create] {
"serverUsed" : "localhost:27017" ,
"errmsg" : "exception: assertion db\\pdfile.cpp:437" ,
"code" : 0 ,
"ok" : 0.0}
The total storage size for the server is 2 GB on the 32-bit system, but even with this size I can't create a collection larger than 500 MB. What does this magic number mean?
The MongoDB server version is 2.0.6.
Additional info:
I have a couple of database files whose total size is 34 MB. Before running MongoDB, I copy those files into the 'data' directory, start MongoDB, and then in the shell I see the same number for the total size - 35651584 bytes (34 MB) (the command used is taken from the comments below). If I try to create a collection of size 500 MB, I see a new file added (512 MB). But if, for example, I try to create a collection of size 600 MB, I get the error described above (although the 512 MB file is still added).
The MongoDB server log
MongoDB is started with the following command-line options:
> db.adminCommand("getCmdLineOpts")
{
"argv" : [
"mongod.exe",
"--dbpath",
"..\\data",
"-vvvvvv",
"--logpath",
"..\\log\\server.log"
],
"parsed" : {
"dbpath" : "..\\data",
"logpath" : "..\\log\\server.log",
"vvvvvv" : true
},
"ok" : 1
}
>

MongoDB runs much better on a 64-bit system; can you change to x64? As Stennie said, you're most likely hitting an mmap limit due to other data in your database.
Can you test this hypothesis by connecting with the mongo shell and trying to create a new collection that is 1 byte larger than 512 MB:
db.createCollection("mycoll6", {capped:true, size:536870913})
You should hopefully get the following error message -
"errmsg" : "exception: can't map file memory - mongo requires 64 bit build for larger datasets",
In the Mongo shell, connect to the admin database and view the size of your database to see how much data you have -
use admin
show dbs
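Since the question is tagged c#, the same test can be run from code. This is a sketch using the legacy 1.x C# driver current at the time of MongoDB 2.0.x; the builder names are from that era's API and the database name is assumed, so verify against your driver version:

using MongoDB.Driver;
using MongoDB.Driver.Builders;

var server = MongoServer.Create("mongodb://localhost:27017");
var database = server.GetDatabase("test"); // database name assumed

// Capped collection one byte larger than 512 MB, mirroring the shell test above.
var options = CollectionOptions
    .SetCapped(true)
    .SetMaxSize(536870913);
database.CreateCollection("mycoll6", options);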
Update: based on some additional testing (I used Ubuntu 12.04 32-bit), this seems like it could be a bug.
Ubuntu Testing
db.createCollection("my13", {capped:true, size:536608768})
{
"errmsg" : "exception: assertion db/pdfile.cpp:437",
"code" : 0,
"ok" : 0
}
db.createCollection("my13", {capped:true, size:536608767})
{ "ok" : 1 }`
536608767 bytes is a little under 512 MB, leaving room for some sort of header in the file.
I thought it might be related to smallfiles, as all 32-bit installs run with that option; however, an x64 build with smallfiles does not display the same symptoms.
I have logged SERVER-6722 for this issue.

Related

Get Volume Guid of EFI partition on Windows 2012 R2

I am trying to extract the VolumeGuid of the EFI partition. I can do this successfully on a Windows 10 machine, both with a WMI query and via C# code using ManagementObjectSearcher. I created a VHD with a GPT partition table and, within it, a recovery partition, an EFI system partition, and a basic data partition. The WMI query below is the one I run in PowerShell after mounting the VHD.
I am unable to extract the same on a Windows 2012 R2 machine, although I can extract the volume GUIDs of all the other partitions there.
Sample DiskPart script
CREATE PARTITION PRIMARY SIZE=450 OFFSET=1024 ID=de94bba4-06d1-4d40-a16a-bfd50179d6ac
FORMAT FS=NTFS LABEL="Recovery" UNIT=4096 QUICK
CREATE PARTITION PRIMARY SIZE=99 OFFSET=461824 ID=c12a7328-f81f-11d2-ba4b-00a0c93ec93b
FORMAT FS=FAT32 LABEL="" UNIT=512 QUICK
CREATE PARTITION PRIMARY SIZE=129481 OFFSET=579584 ID=ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
FORMAT FS=NTFS LABEL="" UNIT=4096 QUICK
WMI Query
"Get-WmiObject -Query "SELECT * FROM Msft_Volume" -Namespace Root/Microsoft/Windows/Storage"
In PowerShell on Windows 10, I see the following output for the EFI partition.
__GENUS : 2
__CLASS : MSFT_Volume
__SUPERCLASS : MSFT_StorageObject
__DYNASTY : MSFT_StorageObject
__RELPATH : MSFT_Volume.ObjectId="{1}\\\\computer\\root/Microsoft/Windows/Storage/Providers_v2\\WSP_Volume
.ObjectId=\"{efe10384-2fc4-11e9-bb16-806e6f6e6963}:VO:\\\\?\\Volume{f2f37b30-47b8-4553-804d-9b14
f6b32e1b}\\\""
__PROPERTY_COUNT : 18
__DERIVATION : {MSFT_StorageObject}
__SERVER : computer
__NAMESPACE : Root\Microsoft\Windows\Storage
__PATH : \\computer\Root\Microsoft\Windows\Storage:MSFT_Volume.ObjectId="{1}\\\\computer\\root/Micros
oft/Windows/Storage/Providers_v2\\WSP_Volume.ObjectId=\"{efe10384-2fc4-11e9-bb16-806e6f6e6963}:V
O:\\\\?\\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\\\""
AllocationUnitSize : 512
DedupMode : 4
DriveLetter :
DriveType : 3
FileSystem : FAT32
FileSystemLabel :
FileSystemType : 6
HealthStatus : 0
ObjectId : {1}\\computer\root/Microsoft/Windows/Storage/Providers_v2\WSP_Volume.ObjectId="{efe10384-2fc4-
11e9-bb16-806e6f6e6963}:VO:\\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\"
OperationalStatus : {2}
PassThroughClass :
PassThroughIds :
PassThroughNamespace :
PassThroughServer :
Path : \\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\
Size : 99614720
SizeRemaining : 99613696
UniqueId : **\\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\**
PSComputerName : computer
However, the above WMI query does not return details for the EFI partition when run on Windows 2012 R2. Even the same query run from C# code doesn't work.
Is there any restriction on Windows 2012 R2 that prevents it from displaying the EFI partition details?
Is there any other way to extract the volume GUID of the EFI partition?
Currently I have to assign a drive letter to the EFI partition in order to read it. I would prefer using the \\?\Volume{guid} syntax to open the volume and read it programmatically, as that avoids unnecessarily assigning a drive letter.
Kindly suggest.
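One WMI-free alternative (a sketch of mine, not something from the question) is to enumerate volume GUID paths with the Win32 FindFirstVolume/FindNextVolume APIs, which also return volumes that have no drive letter, such as the EFI system partition:

using System;
using System.Runtime.InteropServices;
using System.Text;

static class VolumeEnumerator
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstVolume(StringBuilder volumeName, uint bufferLength);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextVolume(IntPtr handle, StringBuilder volumeName, uint bufferLength);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindVolumeClose(IntPtr handle);

    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    static void Main()
    {
        var name = new StringBuilder(260);
        IntPtr handle = FindFirstVolume(name, (uint)name.Capacity);
        if (handle == INVALID_HANDLE_VALUE)
            return;

        do
        {
            // Prints \\?\Volume{guid}\ for every volume, lettered or not.
            Console.WriteLine(name.ToString());
        }
        while (FindNextVolume(handle, name, (uint)name.Capacity));

        FindVolumeClose(handle);
    }
}

Each returned path can then be opened with CreateFile (without the trailing backslash) to read the raw volume, or matched against the disk's partition layout to identify the EFI partition.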

MongoDb C# driver is slow even in simple queries

I need to retrieve all documents from a collection in MongoDB. There's nothing special about the query; I just need all the documents to be returned.
The stats of my collection are:
{
"ns" : "MyDb.MyCollection",
"size" : 206553804,
"count" : 123663,
"avgObjSize" : 1670,
"storageSize" : 30953472,
"capped" : false,
"nindexes" : 1,
"totalIndexSize" : 1122304,
"indexSizes" : {
"_id_" : 1122304
},
"ok" : 1.0
}
In my C# function, I wrote the following code:
var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("MyDb");
var collection = database.GetCollection<BsonDocument>("MyCollection");
var documents = collection.Find(new BsonDocument()).ToList();
The problem is that the Find() call takes about 60 seconds to return all the documents, and this is hurting my application's performance. This simple query also takes longer than usual on other collections. The database is running locally.
I'm using:
MongoDb 3.6.10;
MongoDb.Driver 2.7.0
.NET Framework 4.7
Windows 10
Also, my machine has a 7th-generation i7 with 16 GB RAM and a 256 GB SSD. Queries take the same time when the application is on the production server.
Is there anything I can do to improve this?
Thanks in advance
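One thing worth testing (a sketch of mine, not from the thread) is whether buffering all 123k documents with ToList() dominates the time. Iterating the cursor batch by batch starts processing as soon as the first batch arrives and avoids holding the whole ~200 MB result set in a single list:

var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("MyDb");
var collection = database.GetCollection<BsonDocument>("MyCollection");

// Stream the results instead of materializing them all at once.
using (var cursor = collection.Find(Builders<BsonDocument>.Filter.Empty).ToCursor())
{
    while (cursor.MoveNext())
    {
        foreach (var document in cursor.Current)
        {
            // process each document as it arrives
        }
    }
}

If the documents are consumed as typed classes elsewhere, a projection that fetches only the needed fields would also cut the transferred volume.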

GhostscriptRasterizer Objects Returns 0 as PageCount value

txtStatus.Text = "";
if (!File.Exists(txtOpenLocation.Text))
{
txtStatus.Text = "File Not Found";
return;
}
txtStatus.Text = "File Found";
const string DLL_32BITS = "gsdll32.dll";
const string DLL_64BITS = "gsdll64.dll";
//select DLL based on arch
string NomeGhostscriptDLL;
if (Environment.Is64BitProcess)
{
NomeGhostscriptDLL = DLL_64BITS;
}
else
{
NomeGhostscriptDLL = DLL_32BITS;
}
GhostscriptVersionInfo gvi = new GhostscriptVersionInfo(NomeGhostscriptDLL);
var rasterizer = new GhostscriptRasterizer();
try
{
rasterizer.Open(txtOpenLocation.Text, gvi, true);
Console.WriteLine(rasterizer.PageCount); //This line always prints 0
} catch(Exception er)
{
txtStatus.AppendText("\r\nUnable to Load the File: "+ er.ToString());
return;
}
I have googled this but found no solution, and no helpful documentation about the rasterizer.Open() function.
The Console.WriteLine(rasterizer.PageCount); always prints 0, regardless of which PDF file I load.
txtStatus is a multiline TextBox in the UI. txtOpenLocation is another TextBox in the UI, not editable by the user; its value is set by an OpenFileDialog.
I am using Visual Studio 2019 Community Edition.
Another observation worth mentioning: when I open any PDF file on my machine with either Adobe Acrobat DC or Foxit Reader, the reader first becomes 'not responsive' for about 10 to 15 seconds, and only then opens the file.
I had the same problem yesterday. I downloaded version 9.26 from here https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw32.exe, and it works!
I think this is a bug in the Ghostscript 9.27 release.
This isn't a bug at all, I suspect (I certainly do not believe it's a Ghostscript bug), but it's probably a change in behaviour. Due to reported security vulnerabilities, the Ghostscript developers have been removing access to many non-standard PostScript extensions (unique to Ghostscript). Most recently, access to the dictionary for processing PDF files has been secured.
My suspicion is that Ghostscript.NET (which is not maintained by the Ghostscript developers) is using one or more non-standard extensions to do the work of retrieving the count of pages. Without knowing what exactly is being used currently I can't be sure of course.
If the developer of Ghostscript.NET would like to contact us and confirm this is the problem then we can discuss the currently supported method for retrieving the count of pages in a PDF file.
It won't help at all to send me a project using Ghostscript.NET, since I don't know anything about it. I'm also not a C# or .NET developer, so the code would likely be meaningless to me.
Ghostscript returns considerable information on the back channel, stdout and/or stderr. These can be redirected to an application-defined data sink. I imagine that Ghostscript.NET will give you some means to retrieve these, and if you plan to do any real development involving Ghostscript I would very strongly recommend that you find out how to get this information.
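To make that concrete, here is a minimal sketch (mine, not the answerer's) that captures the back channel by invoking the console binary directly; the executable path and script name are placeholder assumptions:

using System;
using System.Diagnostics;

class GhostscriptBackChannel
{
    static void Main()
    {
        var psi = new ProcessStartInfo
        {
            FileName = @"C:\Program Files\gs\gs9.52\bin\gswin64c.exe", // assumed install path
            Arguments = "-q -sPDFname=test.pdf pdfpagecount.ps",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        };

        using (var gs = Process.Start(psi))
        {
            // Read both sinks so nothing Ghostscript reports is lost.
            string stdout = gs.StandardOutput.ReadToEnd();
            string stderr = gs.StandardError.ReadToEnd();
            gs.WaitForExit();

            Console.WriteLine("exit code: " + gs.ExitCode);
            Console.WriteLine("stdout: " + stdout);
            Console.WriteLine("stderr: " + stderr);
        }
    }
}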
When you say 'no error is thrown from Ghostscript' I think you may be confusing Ghostscript and Ghostscript.NET. Without seeing the back channel from Ghostscript I don't see how you can tell if Ghostscript is generating an error.
NB if you plan to distribute your application you must abide by the terms of the AGPL version 3 (which is the license applying to Ghostscript), and that includes shipping a copy of the license, and some means for informing users where they can get the original.
As with the OP and the primary answer to this question, I too encountered this exact issue just yesterday.
I just want to add that for me the suggested version of Ghostscript (9.26) wasn't working; it complained that I should be using a 64-bit version.
For those who need it, that build is here: https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw64.exe
I had to just guess at the URL. I'm amazed at how difficult it has been to find older versions.
This issue has been fixed in the latest release, v1.2.2, of GhostScript.NET.
The fix was to stop using pdfdict and GS_PDF_ProcSet when the Ghostscript version is above 9.26, as these two were made private by the Ghostscript team for security reasons.
I am not very familiar with Ghostscript or PostScript; however, I have traced the issue down within the GhostScript.NET code, which uses the gsapi to execute functions. The function that is being executed and failing is in the file GhostscriptViewerPdfFormatHandler.cs in the GhostScript.NET project.
Upon further testing, comparing gs 9.26 (as recommended by Oswaldo Cotes Solano) with gs 9.52 using a test script, I found that GS_PDF_ProcSet causes an 'Unrecoverable error, exit code 1' on gs 9.52.
This causes failure when using the gs 9.52 API, but it is by design since gs 9.27, to add security. While -dNOSAFER is not recommended for production-ready applications, it will get us by.
An example of the intended execution and result that works in gs9.26 should be similar to:
gswin32c.exe -q -dNOSAFER -sPDFname=c:/pdfs/test.pdf c:/pdfs/pdfpagecount.ps
Executing:
/GSNETViewer_PDFpage {
(%GSNET_VIEWER_PDF_PAGE: ) print dup == flush
pdfgetpage /Page exch store
Page /MediaBox pget
{ (%GSNET_VIEWER_PDF_MEDIA: ) print == flush }
if
Page /CropBox pget
{ (%GSNET_VIEWER_PDF_CROP: ) print == flush }
if
Page /Rotate pget not { 0 } if
(%GSNET_VIEWER_PDF_ROTATE: ) print == flush
} def
Executing:
/Page null def
/Page# 0 def
/PDFSave null def
/DSCPageCount 0 def
Executing:
GS_PDF_ProcSet begin
pdfdict begin
Executing: (C:/pdfs/Output.pdf) (r) file runpdfbegin
Executing: /FirstPage where { pop FirstPage } { 1 } ifelse
Executing: /LastPage where { pop LastPage } { pdfpagecount } ifelse
Executing: flush (%GSNET_VIEWER_PDF_PAGES: ) print exch =only ( ) print =only (
) print flush
%GSNET_VIEWER_PDF_PAGES: 1 1
Executing: process_trailer_attrs
Executing: 1 GSNETViewer_PDFpage
%GSNET_VIEWER_PDF_PAGE: 1
%GSNET_VIEWER_PDF_MEDIA: [0.0 0.0 612.0 792.0]
%GSNET_VIEWER_PDF_CROP: [0.0 0.0 612.0 792.0]
%GSNET_VIEWER_PDF_ROTATE: 0
Executing: Page pdfshowpage_init pdfshowpage_finish
Loading NimbusSans-Regular font from %rom%Resource/Font/NimbusSans-Regular... 4124032 2548352 5183568 3818848 3 done.
showpage, press <return> to continue
I ran 9.52 with -dNOSAFER, adding the -dNOSAFER argument both to the CLI invocation (to avoid the file-access error) and to the GhostScript.NET source to allow the same functionality. Although -dNOSAFER is not the ideal choice and may open vulnerabilities, I used it to test without diving further in.
C:\Program Files\gs\-\bin>gswin64c.exe -q -dNOSAFER -sPDFname=test.pdf c:/pdfs/pdfpagecount.ps
Error: /undefined in GS_PDF_ProcSet
Operand stack:
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:738/1123(ro)(G)-- --dict:0/20(G)-- --dict:84/200(L)--
Current allocation mode is local
Current file position is 992
GPL Ghostscript 9.52: Unrecoverable error, exit code 1
Ultimately, making a minor change in 3 locations of the source resulted in a working solution with 9.52. I'll submit a pull request with our changes and update the community once it has been issued; otherwise, you can pull directly from our fork.
I've had the same problem. I was using C# (.NET) with Ghostscript.NET (version 1.2.3). The problem was the PDF file name: if it contained a parenthesis, ( or ), the problem occurred.
I had to rename the PDF file to remove those characters.
using System.IO;
using System.Text.RegularExpressions;
using Ghostscript.NET.Rasterizer;

var strFilePath = @"C:\PdfFile(.pdf"; // verbatim string, otherwise the backslash must be escaped

using (var rasterizer = new GhostscriptRasterizer())
{
    rasterizer.Open(strFilePath);
    var strPageCount = rasterizer.PageCount; // returns 0
}

// Strip the offending characters from the file name only (applying the regex
// to the full path would also remove "C:\" and the directory separators),
// then rename the file on disk to match before reopening it.
var pattern = "[^A-Za-z0-9 .-]+";
var regEx = new Regex(pattern);
var safePath = Path.Combine(
    Path.GetDirectoryName(strFilePath),
    regEx.Replace(Path.GetFileName(strFilePath), ""));
File.Move(strFilePath, safePath);

using (var rasterizer = new GhostscriptRasterizer())
{
    rasterizer.Open(safePath);
    var strPageCount1 = rasterizer.PageCount; // returns the number of pages
}

Reading bytea data is slow in PostgreSQL

I store data in a bytea column in a PostgreSQL 9.5 database on Windows.
The data transmission speed is lower than I expected: about 1.5 MB per second.
The following code
using (var conn = ConnectionProvider.GetOpened())
using (var comm = new NpgsqlCommand("SELECT mycolumn FROM mytable", conn))
using (var dr = comm.ExecuteReader())
{
    var clock = Stopwatch.StartNew();
    while (dr.Read())
    {
        var bytes = (byte[])dr[0];
        Debug.WriteLine($"bytes={bytes.Length}, time={clock.Elapsed}");
        clock.Restart();
    }
}
Produces the following output
bytes=3895534, time=00:00:02.4397086
bytes=4085257, time=00:00:02.7220734
bytes=4333460, time=00:00:02.4462513
bytes=4656500, time=00:00:02.7401579
bytes=5191876, time=00:00:02.7959250
bytes=5159785, time=00:00:02.7693224
bytes=5184718, time=00:00:03.0613514
bytes=720401, time=00:00:00.0227767
bytes=5182772, time=00:00:02.7704914
bytes=538456, time=00:00:00.2996142
bytes=246085, time=00:00:00.0003131
Total: 00:00:22.5199268
The strange thing is that reading the last 246 KB took less than a millisecond, while reading the 720 KB row in the middle took just 22 ms.
Is a read speed of 5 MB per 3 seconds normal? How can I increase the reading speed?
Details.
My application starts the PostgreSQL server on startup and shuts it down on exit.
I start the server with the following code:
public static void StartServer(string dataDirectory, int port)
{
    Invoke("pg_ctl", $"start -w -D \"{dataDirectory}\" -m fast -o \"-B 512MB -p {port} -c temp_buffers=32MB -c work_mem=32MB\"");
}
Also, I changed the storage type of my column:
ALTER TABLE mytable ALTER COLUMN mycolumn SET STORAGE EXTERNAL;
I'm using Npgsql 3.0.4.0 and PostgreSQL 9.5 on Windows 10.
Running here with Npgsql 3.0.8 (which should behave the same), PostgreSQL 9.5 and Windows 10, I don't get your results at all:
bytes=3895534, time=00:00:00.0022591
bytes=4085257, time=00:00:00.0208912
bytes=4333460, time=00:00:00.0228702
bytes=4656500, time=00:00:00.0237144
bytes=5191876, time=00:00:00.0317834
bytes=5159785, time=00:00:00.0268229
bytes=5184718, time=00:00:00.0159028
bytes=720401, time=00:00:00.0130150
bytes=5182772, time=00:00:00.0153306
bytes=538456, time=00:00:00.0021693
bytes=246085, time=00:00:00.0005174
First, what server are you running against? Is it on localhost or on some remote machine?
The second thing that comes to mind is that you're stopping and starting the server as part of the test. The slow performance you're seeing may be part of a warm-up on PostgreSQL's side. Try removing that and see whether the results can be reproduced after running your tests several times.
Otherwise this looks like some environmental problem (client machine, server machine or network). I'd try to reproduce the problem on a different machine or in a different setting and go from there.
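As a side note not raised in the thread: if the goal is to consume large bytea values rather than hold them as arrays, CommandBehavior.SequentialAccess together with GetStream() lets Npgsql stream each column instead of buffering every row as one big byte[]. A sketch against the Npgsql 3.x API, reusing the ConnectionProvider helper from the question:

using (var conn = ConnectionProvider.GetOpened())
using (var comm = new NpgsqlCommand("SELECT mycolumn FROM mytable", conn))
using (var dr = comm.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
{
    var buffer = new byte[81920];
    while (dr.Read())
    {
        using (var stream = dr.GetStream(0))
        {
            int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // consume the chunk without allocating one large array per row
            }
        }
    }
}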

Not able to see the final result after the Reduce function got executed using Windows Azure Storage in MapReduce

I am using C#.NET to write the map and reduce functions. I have basically followed the example given here.
Final command
Hadoop jar hadoop-streaming.jar -files "hdfs:///example/apps/map.exe,hdfs:///example/apps/reduce.exe" -input "/example/apps/data.csv" -output "/example/apps/output.txt" -mapper "map.exe" -reducer "reduce.exe"
The Job ran successfully
Now from the Interactive JS mode, if I write
js> #cat /example/apps/output.txt
cat: File does not exist: /example/apps/output.txt
Whereas:
js> #ls /example/apps/output.txt
Found 3 items
-rw-r--r-- 3 xxxx supergroup 0 2013-02-22 10:23 /example/apps/output.txt/_SUCCESS
drwxr-xr-x - xxxx supergroup 0 2013-02-22 10:22 /example/apps/output.txt/_logs
-rw-r--r-- 3 xxxx supergroup 0 2013-02-22 10:23 /example/apps/output.txt/part-00000
What is the mistake I am making and how can I see the output?
The -output flag specifies an output folder, not a file. Since there can be multiple reducers, each one will produce a file in this folder.
In this case, you have one reducer, and it produced one file: part-00000. If there were more reducers, they would be named part-00001, part-00002, etc.
The command cat /example/apps/output.txt/part-00000 will display your output. In the future, don't name your output folders something.txt, as that will just confuse you and others :)
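If you want the reducer output as one local file, hadoop fs -getmerge concatenates every part file in the output folder for you (assuming a standard Hadoop streaming installation; the local file name is illustrative):

hadoop fs -getmerge /example/apps/output.txt combined_output.txt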
