July 18, 2006 Recovering removed files for Trident server
We recently had an incident in which some server-critical files under the /usr directory were accidentally removed from trident. The following notes detail how the machine was restored.
The first symptom was that many commands entered at the trident command line came back with 'command not found'. Echoing the PATH environment variable and checking its entries showed that directories under /usr listed in the PATH no longer existed, which was causing many failures in the server processing.
vi /etc/profile
echo $PATH
#to change path
PATH=$PATH:/mypath
export PATH
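A quick way to see which PATH entries have gone missing is to test each one in turn. The sketch below is only an illustration; the function name missing_path_dirs is made up here and is not part of the original notes.

```shell
# List every entry of a colon-separated path list that is not an
# existing directory. The body runs in a subshell so the IFS and
# globbing changes do not leak into the calling shell.
# Usage: missing_path_dirs "$PATH"
missing_path_dirs() (
    IFS=':'     # split the argument on colons
    set -f      # do not expand wildcards while splitting
    for dir in $1; do
        # skip empty entries (e.g. from a leading or doubled colon)
        [ -n "$dir" ] || continue
        [ -d "$dir" ] || echo "$dir"
    done
)
```

Any directory it prints is still named in PATH but no longer exists on disk.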
Looking at the /var/log/secure files, a 'sudo rm -rf' had been logged for the missing /usr directories. The user login name, IP and times were documented, and a reverse lookup of the IP helped confirm the user's identity.
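The log search described above can be sketched as two small grep helpers. This assumes the usual sudo and sshd syslog line formats ('sudo: user : TTY=... ; COMMAND=...' and 'sshd[pid]: Accepted ... for user from IP'); the function names are hypothetical and the patterns may need adjusting for a particular syslog setup.

```shell
# Print sudo log entries whose logged command is an rm invocation.
# Usage: find_sudo_rm /var/log/secure /var/log/secure.1
find_sudo_rm() {
    grep -E 'sudo:.*COMMAND=([^ ]*/)?rm( |$)' "$@"
}

# Print accepted ssh logins for one user, which include the source IP.
# Usage: logins_from_log USERNAME /var/log/secure
logins_from_log() {
    grep -E "sshd\[[0-9]+\]: Accepted [^ ]+ for $1 from " "$2"
}
```

The IP from the second helper can then be passed to a reverse lookup (for example with host) as was done above.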
Trying to restore using mondo rescue
The initial restoration approach was to try the 'mondo rescue' (http://www.mondorescue.org) files stored on /blazer_backup and created by user root via cron-called shell scripts listed under /root/scripts/mondobackup.sh . The 20 CD image .iso files created by the mondo backup processes were burned to CD and the server was booted from CD using mondo's 'interactive' mode for file restoration. Unfortunately, the burned CDs had problems being read. I was also unsuccessful in getting the interactive mode to read the .iso images from the hard drive. One must also be very careful in interactive mode not to accidentally start a total restore via the yes/no prompts.
A knoppix Live image CD (http://www.knoppix.net) was also used to boot into a knoppix session for analyzing some of the server issues. The following command sequence was used to get write access to some system files:
mount /dev/hdc2
mount -o remount,rw /dev/hdc2
#see also
fdisk -l
e2label /dev/hdc2
#to change runlevel edit & suffix grub boot choice
#/etc/inittab determines initial runlevel, can also specify init=/bin/myinit
Note that on the trident command line console (or a linux command line console in general) a line like the following may pop up periodically:
INIT: Id "x" respawning too fast: disabled for 5 minutes
It is safe to ignore the above message, as it relates to monitor-specific display settings used with the command line session.
Note that you can use Shift+PgUp to page back through previous command line screens (especially on boot-up, where everything may fly by too fast to read).
Note that you can enter 'knoppix 2' at the knoppix boot: prompt to get just a command line shell.
Dump / Restore saves the day
After spending way too much time trying to restore from the mondo rescue .iso files, I was finally able to restore using the binary 'dump' archive files which were also thankfully part of the backup process.
See
dump: http://www.die.net/doc/linux/man/man8/dump.8.html
restore: http://www.die.net/doc/linux/man/man8/restore.8.html
http://howtos.linux.com/guides/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap29sec310.shtml
Below is part of the bash script /root/scripts/level_0_backup.sh, which shows how dump is called via a cron job.
echo "starting level 0 backup (usr)"
export DATE=`date +%y%m%d`
export DESCRIP="trident_full_0_usr"
export BACKUP_DEST_DIR="/blazer_backup/"$DATE"."$DESCRIP
mkdir $BACKUP_DEST_DIR
/sbin/dump -0 -A $BACKUP_DEST_DIR/dump.log -f $BACKUP_DEST_DIR/ \
-L $DESCRIP -u -M -z -B 600000 /usr
echo "done level 0 backup (usr)"
Had to copy the files below one by one (using a wildcard (*) to specify all of them seemed to cause problems) from the source directory (/blazer_backup/060302.trident_full_0_usr/) to a directory (/nautilus_usr2/jerr_tmp_backup) shared between nautilus and trident.
#on nautilus
cd /nautilus_usr2/jerr_tmp_backup
cp /blazer_backup/060302.trident_full_0_usr/001 . &
cp /blazer_backup/060302.trident_full_0_usr/002 . &
cp /blazer_backup/060302.trident_full_0_usr/003 . &
cp /blazer_backup/060302.trident_full_0_usr/004 . &
#on trident
mkdir /home/jerr_tmp_target
cd /home/jerr_tmp_target
restore rf /nautilus_usr2/jerr_tmp_backup/001
On trident, you'll get a message about decompressing from tape and see the hard drive light start up. Each 600 MB archive file took about 10 minutes to restore.
at prompt for next archive enter: /nautilus_usr2/jerr_tmp_backup/002
at prompt for next archive enter: /nautilus_usr2/jerr_tmp_backup/003
at prompt for next archive enter: /nautilus_usr2/jerr_tmp_backup/004
You should be returned to a command prompt, and the restored files will be in your new directory (/home/jerr_tmp_target), which you can check using 'ls'. Move (mv) the directories or files back to their appropriate locations as needed.
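The one-by-one copy of the dump volumes could also be written as a loop that copies a single file per step and compares it with the source before moving on. This is a sketch only: copy_volumes is a hypothetical helper, and it assumes the wildcard trouble noted above was with a single large cp rather than with shell globbing in general.

```shell
# Copy each regular file in SRC to DST one at a time and verify
# every copy byte-for-byte before continuing.
# Usage: copy_volumes /blazer_backup/060302.trident_full_0_usr /nautilus_usr2/jerr_tmp_backup
copy_volumes() {
    cv_src=$1
    cv_dst=$2
    for cv_vol in "$cv_src"/*; do
        [ -f "$cv_vol" ] || continue
        cv_name=$(basename "$cv_vol")
        cp "$cv_vol" "$cv_dst/$cv_name" || return 1
        if ! cmp -s "$cv_vol" "$cv_dst/$cv_name"; then
            echo "mismatch: $cv_name" >&2
            return 1
        fi
        echo "copied $cv_name"
    done
}
```

Verifying each volume up front avoids finding a bad copy only when restore prompts for it.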
After all the missing directories were restored the server was rebooted and things were back to normal.
PostgreSQL database listing
nautilus
sea_coos_obs=# \l
List of databases
Name | Owner | Encoding
---------------+-----------+-----------
GISdb2 | postgres | SQL_ASCII
biology | postgres | SQL_ASCII
buoys | postgres | SQL_ASCII #carocoops buoys
bymeasurand | postgres | SQL_ASCII
castnet | postgres | SQL_ASCII #earlier meta-door(inactive)
castnet_prod | postgres | SQL_ASCII
castnet_stage | postgres | SQL_ASCII
cpurvis | postgres | SQL_ASCII
cpurvisGIS | postgres | SQL_ASCII
discharge | postgres | SQL_ASCII #USGS water level, etc db for Ed Yu's scripts
fgdc | metadata | SQL_ASCII
jerrtest | postgres | SQL_ASCII
land_data | postgres | SQL_ASCII
metadata | postgres | SQL_ASCII #meta-door database
metadata_dev | postgres | SQL_ASCII #developer db
metadata_test | postgres | SQL_ASCII #test db
mkanoth | postgres | SQL_ASCII
mkanoth2 | postgres | SQL_ASCII
mkanoth_prod | postgres | SQL_ASCII
myGISdb | postgres | SQL_ASCII
phyto | postgres | SQL_ASCII #earlier phytoweb project(inactive)
postgres | postgres | SQL_ASCII
scinstrument | seacoosui | SQL_ASCII #seacoos sensor inventory
sea_coos_dd | postgres | SQL_ASCII #seacoos data dictionary
sea_coos_obs | postgres | SQL_ASCII
seacoosbb | postgres | SQL_ASCII #supports phpbb at http://carocoops.org/bb ?
wls | postgres | SQL_ASCII #carocoops wls; waves; multi_obs SpringDO
nemo
List of databases
Name | Owner | Encoding
----------------------+----------+-----------
etopo2_bathy | postgres | SQL_ASCII
hurricane | postgres | SQL_ASCII
older storm surge try, ran out of room; junk
sea_coos_model | postgres | SQL_ASCII
seacoos model data (elevation,depth avg. currents,particle traj.)
sea_coos_obs | postgres | SQL_ASCII
latest 2 weeks of seacoos data
sea_coos_obs_archive | postgres | SQL_ASCII
data older than 2 weeks
neptune
List of databases
Name | Owner | Encoding
---------------------------------+----------+-----------
db_multi_obs | postgres | SQL_ASCII
monisha ctd
db_multi_obs_alpha | postgres | SQL_ASCII
??? - last data 2006-08-18 to 2006-09-30
db_multi_obs_carocoops | postgres | SQL_ASCII
waves measurements, some met(ignored)
db_multi_obs_carolinas_test | postgres | SQL_ASCII
carolinas coast
db_multi_obs_carolinas_test_aux | postgres | SQL_ASCII
supports some related postgis functions on latest_obs
db_multi_obs_delta | postgres | SQL_ASCII
'archive' CC database
db_xenia_northinlet | postgres | SQL_ASCII
nerrs northinlet station, based on xenia_v1
db_xenia_v2 | postgres | SQL_ASCII
latest development test db
hurricane | postgres | SQL_ASCII
older storm surge try, ran out of room; junk
nws | postgres | SQL_ASCII
think this was initially used to produce air_pressure map for Carolinas Coast, but think this not used anymore as air_pressure map done using GMT directly
sea_coos_obs | postgres | SQL_ASCII
just hourly 'map' tables for seacoos on the temporary round-robin switch-off while nautilus is loading
some support tables for Carolinas Coast
torino
psql -U postgres -d hurricane -h 129.252.139.237
List of databases
Name | Owner | Encoding
-----------+----------+----------
hurricane | postgres | LATIN1
storm surge results; future junk when switch to shapefiles
LATIN1 encoding?
database (psql) fails to connect on nautilus
If the postmaster goes down, or something else causes a failure to connect to the database, the following data streams may be affected. Check the older time series graphs to see that the missing data is filled.
seacoos, carolinas coast
http://seacoos.org,
http://carocoops.org/carolinas
You should receive 'WFS counts are low' in most problem cases from Jesse's script running at UNC.
Check that the scout is running ok (remove all old /tmp/*lock* files or nautilus:usr2/pg_lock or nautilus:/tmp/*lock*). If the scout runs ok, the scout should catch up on missed observations, since the data files cover a 48 hour (2 day) span.
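Before removing old lock files it can help to list only those older than some age, so that a lock held by a job that is currently running is left alone. A minimal sketch, assuming GNU find; list_stale_locks and the 60-minute threshold in the usage line are illustrative choices, not values from these notes.

```shell
# List files named *lock* directly inside DIR that were last
# modified more than MINUTES minutes ago.
# Usage: list_stale_locks /tmp 60
list_stale_locks() {
    find "$1" -maxdepth 1 -type f -name '*lock*' -mmin +"$2" -print
}
```

Review the listed files before deleting any of them.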
carocoops
http://carocoops.org
For carocoops you'll need to move all missed files (2 days in the example below) back to nautilus:/usr2/carocoops_data for reprocessing. Buoy filenames start with 'Buoy' and water level station filenames start with 'WLS'.
#on nautilus
cd /usr2/prod/buoys/processed
find . -maxdepth 1 -name '*.dat' -atime -2 -print | xargs -I {} mv {} /usr2/carocoops_data
File processing is split between the database and the displays(html, graphs).
When only the database needs catching up, you can comment out the processBuoys.pl line in nautilus:/usr2/home/buoy/scripts/job1.sh (remember to uncomment it when caught up). This skips the file processing used for the displays, which isn't needed in that case, and speeds things up.
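The move of recently accessed files back to the incoming directory can be wrapped in a helper with a dry-run switch, so the file list can be checked before anything is moved. This is a sketch assuming GNU find; requeue_recent and its --dry-run argument are hypothetical names introduced for illustration.

```shell
# Move files in SRC_DIR matching PATTERN and accessed within the
# last DAYS days into DEST_DIR. With --dry-run as the fifth
# argument, only print what would be moved.
# Usage: requeue_recent /usr2/prod/buoys/processed '*.dat' 2 /usr2/carocoops_data --dry-run
requeue_recent() {
    find "$1" -maxdepth 1 -type f -name "$2" -atime -"$3" -print |
    while IFS= read -r rq_file; do
        if [ "${5:-}" = "--dry-run" ]; then
            echo "would move $rq_file"
        else
            mv "$rq_file" "$4/"
        fi
    done
}
```

Run it once with --dry-run, then again without it once the list looks right.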
voulgaris waves, longbay SpringDO
http://carocoops.org/waves
For waves you'll need to move all missed files(2 days in the example below) back to the source directory for reprocessing.
#on nautilus
cd /usr2/prod/buoys/processed/waves/springmaid
find . -maxdepth 1 -name '*.000' -atime -2 -print | xargs -I {} mv {} /usr2/home/ncsuftp/pub/waves/springmaid
cd /usr2/prod/buoys/processed/waves/folly
find . -maxdepth 1 -name '*.000' -atime -2 -print | xargs -I {} mv {} /usr2/home/ncsuftp/pub/waves/follywaves
You can edit the buoy crontab job waves.sh to run every minute instead of 15 minutes to speed the processing up(remember to switch back when finished processing).
For longbay, just touch (gives the file a new timestamp) the affected files to get them to reprocess
http://carocoops.org/longbay/realtime.php
#on nautilus
cd /usr2/home/ncsuftp/pub/waves/SpringDO
find . -maxdepth 1 -name '*.dat' -atime -2 -print | xargs -I {} touch {}
northinlet?
http://nautilus.baruch.sc.edu/data/nerrs/platform/nerrs_northinlet_water/html/graph.php?time_interval=3_days#water_level:m
http://nautilus.baruch.sc.edu/data/nerrs/platform/nerrs_northinlet_met/html/graph.php?time_interval=3_days#air_temperature:celsius
new daylight savings time change for 2007 onward for linux servers
see
http://www.linuxquestions.org/questions/showthread.php?t=518752 which references the below line:
I went to
http://www.twinsun.com/tz/tz-link.htm and downloaded
ftp://elsie.nci.nih.gov/pub/tzdata2006p.tar.gz
In a temporary folder as root user:
tar -zxvf tzdata2007b.tar.gz
zic -d /tmp/zoneinfo northamerica
cp /tmp/zoneinfo/EST5EDT /etc/localtime
Note that I made a backup of the old /etc/localtime file before copying over it, in case of problems. You could also make /etc/localtime a symbolic link to the new EST5EDT file.
#confirm
zdump -v /etc/localtime | grep 2007
should show:
/etc/localtime Sun Mar 11 06:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 EST isdst=0 gmtoff=-18000
/etc/localtime Sun Mar 11 07:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 EDT isdst=1 gmtoff=-14400
/etc/localtime Sun Nov 4 05:59:59 2007 UTC = Sun Nov 4 01:59:59 2007 EDT isdst=1 gmtoff=-14400
/etc/localtime Sun Nov 4 06:00:00 2007 UTC = Sun Nov 4 01:00:00 2007 EST isdst=0 gmtoff=-18000
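The backup-then-copy step mentioned above can be sketched as one helper, so the old file is never overwritten without a saved copy. replace_with_backup is a hypothetical name; the date suffix extends the `date +%y%m%d` style used in the backup script earlier in these notes.

```shell
# Copy NEW over TARGET. If TARGET already exists, first save it as
# TARGET.bak.<timestamp> and print the name of that backup.
# Usage: replace_with_backup /tmp/zoneinfo/EST5EDT /etc/localtime
replace_with_backup() {
    rb_new=$1
    rb_target=$2
    rb_backup="$rb_target.bak.$(date +%y%m%d%H%M%S)"
    if [ -e "$rb_target" ]; then
        cp -p "$rb_target" "$rb_backup" || return 1
        echo "$rb_backup"
    fi
    cp "$rb_new" "$rb_target"
}
```

If the zdump check does not show the expected transitions, copy the printed backup file back over the target.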
Debian notes
//====debian notes
==debian install
only need 1st CD, can download rest online after DHCP configured
burn 1st CD at 16x speed
mo mail, smarthost, torino.baruch.sc.edu
vega video adapter (default)
http://users.pandora.be/Asterisk-PBX/InstallDebian.htm
#eth0 networking problem
https://dcse.dell.com/selfstudy/Associates_7_0/Enterprise/PowerEdge/PE2850/printer_friendly.asp
http://www.ubuntuforums.org/archive/index.php/t-66190.html
run base-config ?
aptitude, apt-get
aptitude search <blah> #seems too aggressive on removing packages for install so instead
apt-get install <blah>
//updatedb to setup locate
//install postgis
turn off unnecessary services/ports
//setup rc.local
//rdate,run postgresql on 5432
optimize postgresql, postgresql.conf
test php, usr2 public_html, phpinfo()
add user jcothran /usr2/home dirs
development save script, backup script
*apt
http://en.wikipedia.org/wiki/Advanced_Packaging_Tool
http://www.debian.org/doc/manuals/apt-howto/ch-apt-get.en.html
http://www.debianhelp.co.uk/pkgadm.htm
http://newbiedoc.sourceforge.net/tutorials/apt-get-intro/index-apt-get-intro.html.en
#left hand column reading
http://forums.debian.net/viewtopic.php?p=28880&sid=0c56e94a245760980017a28ce81a6cd8
#misc
http://schwuk.com/articles/tag/linux
#boot startup
https://invaleed.wordpress.com/2006/07/05/debian-starting-up-stuff-at-boot-time-rclocals/
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch07_:_The_Linux_Boot_Process#Table_7-1_Linux_Runlevels
#2.6 vs 2.4 kernel
http://ask.slashdot.org/article.pl?sid=05/08/30/2314258
#see last installed directories for apt-get
ls -sort /usr/share/doc
dselect update #seemed to shake up apt-get so not so many removed with rdate, etc
http://wiki.linuxquestions.org/wiki/Control_keys
psql template1 as postgres
==
http://raz.cx/blog/2006/07/ssh-debian-and-system-bootup-in.html
If a Debian ssh server is returning a "System bootup in progress - please wait" message to clients connecting during server boot, then the problem is that the server's /etc/default/rcS contains DELAYLOGIN=yes but needs to contain DELAYLOGIN=no
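The DELAYLOGIN change can be made with a small helper that keeps a copy of the original file. A sketch only; disable_delaylogin is a hypothetical name, and it assumes the setting sits at the start of a line exactly as DELAYLOGIN=yes.

```shell
# If FILE contains a line starting with DELAYLOGIN=yes, save the
# original as FILE.bak and rewrite that line to DELAYLOGIN=no.
# Prints "changed" or "unchanged".
# Usage (as root): disable_delaylogin /etc/default/rcS
disable_delaylogin() {
    dl_file=$1
    if grep -q '^DELAYLOGIN=yes' "$dl_file"; then
        cp -p "$dl_file" "$dl_file.bak" || return 1
        sed 's/^DELAYLOGIN=yes/DELAYLOGIN=no/' "$dl_file.bak" > "$dl_file"
        echo "changed"
    else
        echo "unchanged"
    fi
}
```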
http://www.debian-administration.org/articles/165
mount -t smbfs //accord.asg.sc.edu/accord_backup /accord_backup -o username=backup,password=b@ckup,workgroup=ACCORD
http://www.ma.utexas.edu/users/stirling/computergeek/NFS_samba.html
mount 129.252.37.88:/usr2 /nautilus_usr2
http://semweb.weblog.ub.rug.nl/node/61
change pg_hba.conf to 'local all all trust'
/usr/lib/postgresql/8.1/bin/pg_ctl -D /var/lib/postgresql/8.1/main reload
==
#kernel - use later like 2.6 smp - > 2GB RAM
http://www.linuxforums.org/forum/debian-linux-help/49827-only-1gb-2gb-new-ram-i-installed-showing.html
http://www.debian-administration.org/users/sebastian/weblog/12
http://www.debian.org/doc/manuals/reference/ch-kernel.en.html
#from
http://aruljohn.com/info/kernel.php
apt-get install kernel-image-2.6.17-6-686 kernel-source-2.6.17 kernel-headers-2.6.17-6-686
cd /usr/src
tar xjvf kernel-source-2.6.17.tar.bz2
rm linux
ln -s kernel-source-2.6.17 linux
#used kernel-image-2.4.27-2-686-smp
apt-get install kernel-image-2.4.27-2-686-smp kernel-source-2.4.27 kernel-headers-2.4.27-2-686
had to increase kernel shmmax,shmall settings for postgres
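On Linux, kernel.shmmax is given in bytes while kernel.shmall is given in pages, so the two are easy to set inconsistently. The helper below converts a byte size to a page count. The 128 MB figure in the comments is purely illustrative; the values actually used on this server are not recorded in these notes.

```shell
# Convert a size in BYTES to a number of pages of PAGE_SIZE bytes,
# rounding up.
# Usage: shm_pages 134217728 "$(getconf PAGE_SIZE)"
shm_pages() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Illustrative use as root (example values, not from this server):
#   sysctl -w kernel.shmmax=134217728
#   sysctl -w kernel.shmall="$(shm_pages 134217728 "$(getconf PAGE_SIZE)")"
```

Settings made with sysctl -w last only until reboot; they also need to go into /etc/sysctl.conf to persist.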
cpan install for perl modules
#example install of package 'DBD::Pg'
sudo perl -MCPAN -e 'install DBD::Pg'
#force install - if needed
sudo perl -MCPAN -e shell
cpan>force install blah::blah
Partition full leads to database shutdown and PANIC because of incomplete clog
The server partition hosting the postgres data files became full, resulting in an incomplete commit log (clog) file and a message like the one below when trying to restart postgres.
PANIC: could not access status of transaction 14286850
DETAIL: could not read from file "/usr2/pg_data/data/pg_clog/065A" at offset 163840
Answer found at
http://groups.google.com/group/comp.databases.postgresql.hackers/browse_thread/thread/c97c853f640b9ac1/d6bc3c75eed6c2a4?q=could+not+access+status+of+transaction#d6bc3c75eed6c2a4
dd if=/dev/zero bs=8k count=1 >>/usr2/pg_data/data/pg_clog/065A
fixed the clog so that postgres could restart
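The dd fix works because the clog file ended before the 8 kB page that postgres tried to read at the reported offset. A sketch of the same idea that works out how many zero pages are needed from the file size and the offset in the PANIC message; pad_clog_segment is a hypothetical helper. Stop postgres and copy the file somewhere safe before trying anything like this, and note that zero bytes leave the affected transactions marked as not committed.

```shell
# Append 8 kB zero pages to FILE until it covers the page that
# starts at OFFSET. Prints the number of pages appended.
# Usage: pad_clog_segment /usr2/pg_data/data/pg_clog/065A 163840
pad_clog_segment() {
    pc_file=$1
    pc_offset=$2
    pc_page=8192
    [ -f "$pc_file" ] || return 1
    pc_size=$(wc -c < "$pc_file" | tr -d ' ')
    pc_needed=$(( pc_offset + pc_page ))
    if [ "$pc_size" -ge "$pc_needed" ]; then
        echo 0      # file already covers the requested page
        return 0
    fi
    pc_pages=$(( (pc_needed - pc_size + pc_page - 1) / pc_page ))
    dd if=/dev/zero bs="$pc_page" count="$pc_pages" 2>/dev/null >> "$pc_file" || return 1
    echo "$pc_pages"
}
```

With a file of exactly 163840 bytes and the offset 163840 from the message above, this appends one page, which matches the single dd call.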