Postgres 9.1 issues running data directory from VMware shared folder

Discussion:

Arze, Cesar

2014-08-26 22:08:14 UTC

Hi,

I've recently encountered an issue running Postgres (both 8.4 and 9.1) on a VMware VM running Ubuntu 10.04 LTS as the guest OS, with the data directory running out of a VMware shared folder. Previously on 8.4 this had worked for me, but after upgrading VMware and re-building my VM I've started to encounter this issue. The problem seems to occur when I run initdb; I get the following error:

# sudo -u postgres /usr/lib/postgresql/9.1/bin/initdb --noclean -D /mnt/pg_data/
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".

fixing permissions on existing directory /mnt/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /mnt/pg_data/base/1 ... FATAL: could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory

child process exited with exit code 1
initdb: data directory "/mnt/pg_data" not removed at user's request

Here is a snippet of an strace around where the error occurs:

write(4, "insert OID = 767 ( lo_import 11 "..., 141) = 141
write(4, "insert OID = 765 ( lo_export 11 "..., 132) = 132
write(4, "insert OID = 766 ( int4inc 11 10"..., 125) = 125
write(4, "insert OID = 768 ( int4larger 11"..., 134) = 134
write(4, "insert OID = 769 ( int4smaller 1"..., 136) = 136
write(4, "insert OID = 770 ( int2larger 11"..., 134) = 134
write(4, "insert OID = 771 ( int2smaller 1"..., 136) = 136
write(4, "insert OID = 774 ( gistgettuple "..., 142) = 142
write(4, "insert OID = 638 ( gistgetbitmap"..., 144) = 144
write(4, "insert OID = 775 ( gistinsert 11"..., 158) = 158
write(4, "insert OID = 777 ( gistbeginscan"..., 151) = 151
write(4, "insert OID = 778 ( gistrescan 11"..., 155) = 155
write(4, "insert OID = 779 ( gistendscan 1"..., 137) = 137
write(4, "insert OID = 780 ( gistmarkpos 1"..., 137) = 137
write(4, "insert OID = 781 ( gistrestrpos "..., 139) = 139
write(4, "insert OID = 782 ( gistbuild 11 "..., 143) = 143
write(4, "insert OID = 326 ( gistbuildempt"..., 143) = 143
write(4, "insert OID = 776 ( gistbulkdelet"..., 158) = 158
write(4, "insert OID = 2561 ( gistvacuumcl"..., 155) = 155
write(4, "insert OID = 772 ( gistcostestim"..., 187) = 187
write(4, "insert OID = 2787 ( gistoptions "..., 139) = 139
write(4, "insert OID = 784 ( tintervaleq 1"..., 138) = 138
write(4, "insert OID = 785 ( tintervalne 1"..., 138) = 138
write(4, "insert OID = 786 ( tintervallt 1"..., 138) = 138
write(4, "insert OID = 787 ( tintervalgt 1"..., 138) = 138
write(4, "insert OID = 788 ( tintervalle 1"..., 138FATAL: could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
) = -1 EPIPE (Broken pipe)

I probably should be posting to the VMware mailing list with this question but I wanted to see if anyone had any insight or suggestions here. I've seen many similar issues but none of the solutions proposed there worked for me.

Thanks for any help,

Cesar

Adrian Klaver

2014-08-26 23:30:24 UTC

Hi,
I’ve recently encountered an issue running Postgres (both 8.4 and 9.1)
on a VMware VM running Ubuntu 10.04 LTS as the guest OS with the data
directory running out of a VMware shared folder. Previously on 8.4 this
had worked out for me but after upgrading VMware and re-building my VM
I’ve started to encounter this issue. It seems like the problem occurs

So what is the host OS?

Where is the shared directory located, which OS?

What file system is the directory located on?

# sudo -u postgres /usr/lib/postgresql/9.1/bin/initdb --noclean -D /mnt/pg_data/
creating template1 database in /mnt/pg_data/base/1 ... FATAL: could not
open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No
such file or directory
child process exited with exit code 1
initdb: data directory "/mnt/pg_data" not removed at user's request

What is in /mnt/pg_data after the error?

Thanks for any help,
Cesar

--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Arze, Cesar

2014-08-27 00:36:42 UTC

So what is the host OS?

Where is the shared directory located, which OS?

What file system is the directory located on?

Host OS is Redhat 5.4

The shared directory is located on the host OS of Redhat 5.4 and is located on a local drive of the desktop.

The directory is on an ext3 filesystem.
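Note that ext3 is what the host sees; inside the guest the same directory is reached through the shared-folder (hgfs) driver. A minimal sketch for checking what the guest itself reports for the mount, assuming the type shows up as a string containing `vmhgfs` (that string and the helper names `fs_type` and `check_datadir` are assumptions for illustration):

```shell
# Print the filesystem type reported for the mount that holds a path.
# Uses POSIX output format (-P) plus the type column (-T).
fs_type() {
    df -PT "$1" 2>/dev/null | awk 'NR==2 { print $2 }'
}

# Hypothetical helper: returns 1 if the path is on a VMware shared folder,
# 0 if it is on some other filesystem, 2 if the type cannot be determined.
# Assumption: shared folders are mounted with a type containing "vmhgfs".
check_datadir() {
    t=$(fs_type "$1")
    if [ -z "$t" ]; then
        echo "cannot determine filesystem type of $1" >&2
        return 2
    fi
    case "$t" in
        *vmhgfs*) echo "$1 is on a VMware shared folder ($t)"; return 1 ;;
        *)        echo "$1 is on $t"; return 0 ;;
    esac
}
```

Run inside the guest, `check_datadir /mnt/pg_data` would then show whether initdb is writing through the shared-folder layer rather than a native filesystem.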

What is in /mnt/pg_data after the error?

drwx------ 1 postgres postgres 4096 2014-08-26 21:58 base
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 global
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_clog
-rw------- 1 postgres postgres 4476 2014-08-26 21:58 pg_hba.conf
-rw------- 1 postgres postgres 1636 2014-08-26 21:58 pg_ident.conf
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_multixact
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_notify
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_serial
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_stat_tmp
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_subtrans
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_tblspc
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_twophase
-rw------- 1 postgres postgres 4 2014-08-26 21:58 PG_VERSION
drwx------ 1 postgres postgres 4096 2014-08-26 21:58 pg_xlog
-rw------- 1 postgres postgres 19169 2014-08-26 21:58 postgresql.conf
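The listing above stops at the top level, while the error names a file inside pg_xlog. A small sketch for looking one level deeper, assuming the 9.1 layout shown above (the helper name `check_segment` is made up for illustration):

```shell
# Hypothetical helper: report whether a WAL segment file exists under
# <datadir>/pg_xlog and, if so, how large it is.
# $1 = data directory, $2 = segment file name
check_segment() {
    seg="$1/pg_xlog/$2"
    if [ -f "$seg" ]; then
        echo "present: $seg ($(wc -c < "$seg") bytes)"
    else
        echo "missing: $seg"
        return 1
    fi
}
```

Since the data directory is mode 0700, this would need to run as the postgres user, for example `check_segment /mnt/pg_data 000000010000000000000001`. A complete segment is 16 MB by default, so a missing or short file here would be consistent with the FATAL message.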


Steve Atkins

2014-08-27 00:33:20 UTC

I probably should be posting to the VMware mailing list with this question but I wanted to see if anyone had any insight or suggestions here. I’ve seen many similar issues but none of the solutions proposed there worked for me.

This might not be what you're seeing, but there was a hideous bug in the shared folder (hgfs) driver for linux guest OSes that'll silently corrupt your filesystem if it's accessed via more than one filehandle (e.g. multiple opens, multiple processes, ...).

If you're using the VMware Tools bundled with Workstation 10.0.1 or Fusion 6.0.2, you have that bug and cannot safely use hgfs mounts for any files, let alone postgresql. (There was a different bug, with similar results, for earlier versions too, including at least Fusion 5.0.1.) VMware claims it's fixed in the tools bundled with 10.0.2 / 6.0.3 (I've not tested it). If you're not running the very latest VMware, upgrade to it and install the latest tools (or avoid using hgfs).
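A rough way to probe for the multiple-filehandle problem described above is to write to one file through two separate descriptors and check that the result is what was written. This is only a sketch: the function name `probe_two_handles` is made up, and a clean result does not prove the mount is safe, since the bug may depend on timing or caching that this does not exercise.

```shell
# Hypothetical probe: write to a single file through two open descriptors,
# then compare the file against the expected contents.
# Returns 0 if they match, 1 if they differ, 2 if the file cannot be opened.
probe_two_handles() {
    dir="$1"
    f="$dir/two_handle_probe.$$"
    (
        exec 3>"$f"  || exit 2    # first handle: create/truncate
        exec 4>>"$f" || exit 2    # second handle: append mode
        printf 'first\n'  >&3     # lands at offset 0
        printf 'second\n' >&4     # appended after the first write
    ) || return 2
    expected=$(printf 'first\nsecond\n' | cksum)
    actual=$(cksum < "$f")
    rm -f "$f"
    [ "$expected" = "$actual" ]
}
```

It is better pointed at a scratch directory on the same share than at the data directory itself, for example `probe_two_handles /mnt/scratch && echo consistent` (the path here is a placeholder).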

Cheers,
Steve


Arze, Cesar

2014-08-27 00:51:24 UTC

Thanks for the info, will look into what version of Workstation I am running (think I have 9.0) and will see if I can't get an upgraded copy and see if it alleviates the issue.


John R Pierce

2014-08-27 03:05:54 UTC

Post by Arze, Cesar
Thanks for the info, will look into what version of Workstation I am
running (think I have 9.0) and will see if I can’t get an upgraded
copy and see if it alleviates the issue.

Also, there are several years of patches since RHEL 5.4 was released; I
think it's up to 5.9.
--
john r pierce 37N 122W
somewhere on the middle of the left coast


Arze, Cesar

2014-08-27 03:18:22 UTC

My mistake, the host OS is RHEL 5.9


Jacob Bunk Nielsen

2014-08-29 11:11:43 UTC

Post by Arze, Cesar
creating template1 database in /mnt/pg_data/base/1 ... FATAL: could
not open file "pg_xlog/000000010000000000000001" (log file 0, segment
1): No such file or directory

We've seen something slightly similar when running PostgreSQL in a Linux
container. See this thread for more details:
http://www.postgresql.org/message-id/spamdrop+***@atom.bunk.cc

We have not solved this problem yet, but currently I'm leaning towards
blaming the container layer, so next time we experience problems I think
we'll try to remove the virtualization.

Best regards

Jacob
