crazy little thing called love: 2008

Wednesday, August 27, 2008

First time SNMPv3 ... ProCurve Switch 2650 as example

I'm fairly new to network stuff and thus network management, so snmp too ... is fairly new to me, not to mention snmpv3 :) ...

After some digging around and efforts to understand what snmp is ... it turns out that snmp (v2) is simply a list of variables that can be read by supplying a password (called the community string, conventionally "public" for read-only access), and also read and written by supplying another password (conventionally "private" for read-write access) ...

It was clear that snmp version 1 and 2 had serious security issues, since the community string travels over the network in clear text. So when i found this HP ProCurve Switch 2650 that supports snmpv3, i decided to play around with it!

The game was not as straightforward as i thought! it was not only security enhancements that snmpv3 introduced, but also a more complex and robust authorization and permissions system.

So, let's start with describing the hands-on with this switch: i telnet to the switch, i get in without a password, i switch to enabled mode and i set the passwords for operator and manager (enabled).

Now, i enabled SNMPv3 by doing :

ProCurve Switch 2650(config)# snmpv3 enable

and the switch created a user called "initial", used MD5 as the authentication protocol and asked for an authentication password. it set the privacy protocol to DES and asked for a privacy password (i'll talk a little more about those in a minute). afterwards i was asked if i wanted to create a user with the SHA authentication protocol, i chose not to.

Now from the Linux shell, i used snmpwalk to test my settings, following snmp v2 syntax, i tried:

snmpwalk -v 3 -c MyCommunityName 192.168.254.1 sysUptime

and i got:

snmpwalk: No securityName specified (Sub-id not found: (top) -> sysUptime)

So fiddling around a little more, it turned out i need to supply the user name (securityName) with -u, which i found in the snmpcmd manual pages, so the next thing i tried was this:

snmpwalk -v 3 -u initial -c MyCommunityName 192.168.254.1 sysUptime

and i got:

Error in packet.
Reason: authorizationError (access denied to that object)
Failed object: SNMPv2-MIB::sysUpTime

So authorization is the problem. looking for a way to send the password got me to the -A option, also from the snmpcmd man pages, which is used to pass the authPassword. the man page says it's insecure to specify the pass phrase on the command line, but i'll leave it for now, so i try:

snmpwalk -v 3 -u initial -A password123 -c MyCommunityName 192.168.254.1 sysUptime

but i still got the error:

Error in packet.
Reason: authorizationError (access denied to that object)
Failed object: SNMPv2-MIB::sysUpTime

Now i was a little frustrated, this looked like enough to get things to work, and i couldn't see why it wasn't! so fiddling around more and googling for examples of snmpwalk -v 3 syntax i found one that got things going! and here it is:
snmpwalk -v 3 -u initial -A password123 -l AuthNoPriv -c MyCommunityName 192.168.254.1 sysUptime

So what is the story with this -l AuthNoPriv ? again, the man pages came to the rescue, according to the man pages:

 -l secLevel
              Set  the  securityLevel  used  for  SNMPv3  messages  (noAuthNoPriv|authNoPriv|authPriv).  Appropriate pass
              phrase(s) must provided when using any level higher  than  noAuthNoPriv.   Overrides  the  defSecurityLevel
              token in the snmp.conf file.

So it seems that this option tells snmpwalk which security level to use for the request: here we authenticate with the auth pass phrase but do not use the privacy pass phrase, which reminds me of the 2 passwords i was asked for when creating the user "initial"! although i didn't understand why snmpwalk didn't guess that this is what i wanted when i passed the authPass using the -A option :S. anyway, i was happy things worked for me ... for now!
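For what it's worth, the level follows directly from which pass phrases you have at hand. Here is a small sketch in Python of that rule. The helper name snmpwalk_args is made up for illustration; the options -v, -u, -l, -A and -X (the privacy pass phrase) are the ones documented in the snmpcmd man page. It only builds the argument list, it does not run anything.

```python
def snmpwalk_args(user, host, oid, auth_pass=None, priv_pass=None):
    """Build an snmpwalk argument list for SNMPv3 (hypothetical helper)."""
    if priv_pass is not None and auth_pass is None:
        # there is no "privacy without authentication" level
        raise ValueError("a privacy pass phrase requires an auth pass phrase")
    if auth_pass is None:
        level = "noAuthNoPriv"
    elif priv_pass is None:
        level = "authNoPriv"
    else:
        level = "authPriv"
    args = ["snmpwalk", "-v", "3", "-u", user, "-l", level]
    if auth_pass is not None:
        args += ["-A", auth_pass]
    if priv_pass is not None:
        args += ["-X", priv_pass]
    return args + [host, oid]


# Print the command line for the authNoPriv case
print(" ".join(snmpwalk_args("initial", "192.168.254.1", "sysUpTime",
                             auth_pass="password123")))
```

The resulting list could be handed to something like subprocess.run, but keep in mind the man page warning about pass phrases on the command line.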

So apparently, the default security level (since i don't have an snmp.conf file) is noAuthNoPriv, according to the snmp.conf man page! which made me try the following:

snmpwalk -v 3 -u initial -A password123 -l AuthPriv -c MyCommunityName 192.168.254.1 sysUptime

and i got the error:
snmpwalk: USM generic error (Sub-id not found: (top) -> sysUptime)

The error was not really meaningful to me, but logically i had to supply the privacy pass phrase. again man snmpcmd came to the rescue: the option to supply the privacy pass phrase is -X, so now i try:

snmpwalk -v 3 -u initial -X password321 -A password123 -c MyCommunityName -l AuthPriv 192.168.254.1 sysUptime

And voilà! it works :) And voilà! i think i have a very good post about snmpv3 ! frankly i had a hard time finding quick info about the errors i got on google, so if this info helped you, and you feel thankful, i would be thankful to you if you google a little about palestine, about the separation wall and the injustice it's causing !

Oh! the privacy pass phrase is apparently used to encrypt the communication, so it's a good idea to use it !
That was it for today, and i think i'll go crash into my pillow :) and apologies for the politics .

Tuesday, August 12, 2008

Windows for Linux Administrators I

I've never been a fan of M$ Windows, but lately i'm forced to deal with Windows. So since i'm having a lot of trouble doing even simple things, i've decided to write down a few notes i've learned that would help those who are familiar with linux to administer windows machines. you should also know that i'm not an expert in either system, and any information provided here is my own interpretation of the similarities between these two systems.

First, let's put together a list of commands and their equivalents that i learned recently, i won't be talking about "dir" and "rem".

Starting with the ugly "cmd" of windows, we can see that we can use the command "set" to display environment variables, which is the same as what we have in Linux! nice! that's a good starting point. and an interesting example of how things might be a little different: let's try to print the current directory. in linux we would simply type pwd, in windows you'll do "echo %CD%", where %CD% is an environment variable that holds the "Current Directory".

Variables in windows are put between percent signs, %VARNAME%, and are not case sensitive.

Now the first thing i wanted to do was to list the users i have and gather info about them, but in my case the machine i have access to, via rdesktop, is an Active Directory server, which i hope to be able to switch to SAMBA 4 when the latter is ready ... so ... how do we do that in active directory?

the command to do so is called dsquery, it stands for "Directory Service Query" and is one tool from the "Directory Service" tools suite that comes with windows 2003, i dont know about older versions.

Now ill try to see how it works, so i look at :

dsquery /? | more

Looks unixish ... heh :), reading a little there i managed to list users (first 100) using the command:

dsquery user

and then i filtered out myself using the command find! an interesting tool that provides similar functionality to some unix tools. lets suppose my name was "Edward Saeed", i do :

dsquery user| find "Edward Saeed"

but note that you have to use the quotes, and the string IS case sensitive ... inconsistency ... i believe ... but i could use /I to make it case insensitive!

So now we know that find "something" is similar to the "grep" command in unix!

reading find /? shows that find can also count lines! so find /C "something" is equivalent to "grep "something" | wc -l" (or simply grep -c "something"). that's good to hear ... who can live without grep and wc :)
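To make the matching rules concrete, here is a tiny Python sketch of the behaviour described above: a plain substring match that is case sensitive unless asked otherwise (like /I), and a count of the matching lines (like /C). The function name and the sample directory entries are made up for illustration.

```python
def find_lines(lines, needle, ignore_case=False):
    """Return the lines containing needle, like find "needle" (or find /I)."""
    if ignore_case:
        needle = needle.lower()
        return [line for line in lines if needle in line.lower()]
    return [line for line in lines if needle in line]


# Hypothetical dsquery-style output
users = [
    '"CN=Edward Saeed,CN=Users,DC=example,DC=com"',
    '"CN=edward saeed jr,CN=Users,DC=example,DC=com"',
    '"CN=Someone Else,CN=Users,DC=example,DC=com"',
]

# Counting the matches is what find /C (or grep | wc -l) gives you
print(len(find_lines(users, "Edward Saeed")))
print(len(find_lines(users, "Edward Saeed", ignore_case=True)))
```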

Thats it for today :), i'll be packing and hitting the road .

Maybe next time ill be trying to rewrite a few bash scripts in this windowish fashion.

;)

Saturday, August 2, 2008

all printable ASCII characters in one c++ like string

Today i was looking for a string with all printable ASCII characters to use in some C++ code, i could not find one quickly on google, so i thought i'll post it here :) I will probably need it sometime in the future ... here it goes :)

// all 94 visible ASCII characters (0x21-0x7E); the space (0x20) is left out
char fullset[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
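A string like this is easy to regenerate (or double-check) from the character codes instead of typing it by hand. A minimal Python sketch, assuming "printable" means the visible range 0x21 to 0x7E, i.e. without the space:

```python
# Printable ASCII is 0x20 (space) through 0x7E (~); starting at 0x21 drops
# the space and leaves only the visible characters.
visible = "".join(chr(code) for code in range(0x21, 0x7F))

print(len(visible))
print(visible)
```

The order differs from the hand-written string (it follows the ASCII table), which does not matter if the string is only used as a character set.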

Monday, June 23, 2008

Reverse DNS records for subnets with a prefix longer than /24

I've been playing lately with DNS, and it seems there is a need to create reverse zones with fewer than 256 IPs ... so i had to dig around a little: i took a look at the mkrdns tool and RFC2317, and i wrote a little php script that might be useful for understanding what should be done, without deploying an automation tool ... the thing is ... i can't test it :(, but anyway, i'll post it here, so please, if you have any comments, if you find anything that is incorrect, or have any enhancements, please let me know.




<?php
/*
* Author: Maysara A. Abdulhaq
* Contact: maysara(dot)abdulhaq(at)gmail(dot)com
* Usage: Guides how to add reverse domains
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*/
function reverse_domain_name($ip,$class=24)
{
 $octets= explode(".",$ip);
 $octetcount=(int) ($class/8);
 $dn= ""; // start empty so the concatenation below does not use an undefined variable
 for ($i=$octetcount -1 ;$i >= 0 ;$i--)
  $dn= $dn."$octets[$i].";
 $dn= $dn."in-addr.arpa";
 $mask= (~((1<<(8-$class%8))-1)&255);
 $rangebeg = (((int)$octets[$octetcount]) & $mask);
 $rangeend = (((int)$octets[$octetcount]) & $mask)+((1<<8-$class%8) -1);

 echo "You have Entered IP: ".$ip." with subnet: ".$class."\n";
 echo "So the reverse domain is : ".$dn."\n";
 echo "and the range of ips is $rangebeg to $rangeend\n";
 echo "If you wish to use a domain for mapping similar to mkrdns, you can use domain name \n";
 echo "A: $rangebeg-$rangeend.$dn\n";
 echo "B: $rangebeg.$dn\n";
 echo "Or, similar for what is suggested in RFC2317:\n";
 echo "C: $rangebeg/$class.$dn\n";

 echo "for each entry, a CNAME record must be added to $dn \n";
 echo "A: $octets[$octetcount]  CNAME $octets[$octetcount].$rangebeg-$rangeend.$dn\n";
 echo "B: $octets[$octetcount]  CNAME $octets[$octetcount].$rangebeg.$dn\n";
 echo "C: $octets[3] CNAME $octets[3].$rangebeg/$class.$dn\n";
 echo "in addition to the PTR in the above mentioned domain name\n";
 echo "$octets[3] PTR some.domain.tld\n";
 
}

 if ( $argc != 3){
  echo "Usage: $argv[0] IP MASK\n";
  echo "Example: $argv[0] 111.222.121.212 27\n";
  exit(1);
 }
 // a /32 would make $octetcount 4 and index past the last octet
 if ( $argv[2] < 24 || $argv[2] > 31){
  echo "Error:the mask should be between 24 and 31\n";
  exit(1);
 }
 reverse_domain_name($argv[1],$argv[2]);

?>
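The arithmetic behind the script can be checked on its own. Below is a minimal Python sketch of the same calculation for the RFC2317-style name (option C in the script). The function name is made up, and it only accepts /25 to /31, which is an assumption of mine since a plain /24 needs no sub-delegation.

```python
def rfc2317_zone(ip, prefix):
    """Return (first, last, zone) for a /25../31 reverse delegation."""
    if not 25 <= prefix <= 31:
        raise ValueError("prefix must be between 25 and 31")
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address")
    size = 1 << (32 - prefix)               # addresses in the subnet
    first = octets[3] & ~(size - 1) & 255   # clear the host bits
    last = first + size - 1
    parent = "{}.{}.{}.in-addr.arpa".format(octets[2], octets[1], octets[0])
    return first, last, "{}/{}.{}".format(first, prefix, parent)


first, last, zone = rfc2317_zone("111.222.121.212", 27)
print(first, last, zone)
```

For the example from the usage message (111.222.121.212 with mask 27) the subnet holds 32 addresses, so the last octet 212 falls in the block starting at 192.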

Thursday, May 8, 2008

what the hell is a LUN

Well, playing around with iSCSI i stumbled upon LUN ... and according to wikipedia, LUN stands for Logical Unit Number, it's apparently related to scsi and/or raid, and apparently LUNs are logical portions of disks. still, what does that mean ?

Well, after looking around a bit, and finding a lot of talking about LUN and LUN explained ... etc, it turns out that from the iscsi-initiator host OS (client) point of view, a LUN is simply a disk ... every LUN regardless of being the only one in a given iscsi-target or whether there are multiple LUNs in an iscsi-target, its a disk, and if things are configured correctly, every LUN will give you a disk device like /dev/sdx or something similar .

So why have multiple targets each with one LUN, or multiple LUNs in one iscsi-target? again, as far as i understood from this very helpful post, from the client (initiator) perspective it's the same thing ... all are disks, just accessed with separate iscsi-initiator logins or with a single one. from the server (target) perspective, distributing LUNs is just a managerial issue: separating LUNs into multiple targets helps the administrator make policies about who is allowed to access which LUNs by distributing them among targets, and possibly setting authentication.

So ... what is all the talk about LVM and software raid and other stuff about ? well, once you understand that the LUN is simply a disk to the client, you can do with it whatever you do with a disk! or multiple disks :) ... so you can format the LUN (now it's a disk, like /dev/sdf), or you can use LVM to create a logical volume from multiple disks, which might be iscsi LUNs or just good old regular disks :), or set up software raid. again, all this can be done on regular disks, scsi or sata or ata ... and now on LUNs too. you basically stop talking about LUNs after the iscsi-initiator is done with logging in to the target and getting the LUN; afterwards, you treat it as a disk.

Now possibly you have RAID and LVM ...etc on the server side ... what's the story there ? and what's the story with the word "carved" that is used in a couple of places online when talking about LUNs? ... the idea is fairly simple: when you define a LUN on the target, you define where the real storage is going to happen, so on the target you set a Path to where the actual data will be stored, for example Path=/iscsi-exports/2GBDISK1.iso or Path=/dev/sdc or Path=/dev/sda5 ...etc. as you can see, you can use any file that you can usually read from and/or write to, and as you might already know, even regular files can be treated as disk volumes and you can mount them using a loop device. so these partitions, disks, regular files or even iscsi LUNs from a different iscsi-target (which would appear as devices) can be used for a LUN. now these files, partitions, disks ..etc can reside on a hardware raid, or be part of an LVM, or software raid, or anything else you can think of that you can store data on, and might as well be a regular disk partition or a whole disk that is used as a LUN :).

Note:please feel free to correct me if there is something wrong or unclear in this post

Oracle RAC on ISCSI ... the small details ... round two

Now i'm starting over with installing oracle 11g RAC on iSCSI. the first round (actually that was probably the 5th round, but the first round i write about) failed due to A) a lack of storage on the iscsi target and B) too little memory on one of the nodes.

So now i have a new iSCSI target setup on debian etch from backports, and im setting up a new powerful node to serve as a node in the Oracle RAC.

So i installed Enterprise Linux fine and i am following the same steps described in the "round one" post.

While trying to configure the iSCSI target, a question popped into my mind: what difference would it make if i configure a single target with multiple LUNs, or multiple targets each with a single LUN? i had no idea what difference it makes.

So after a while of digging around, i realized that the script provided in the oracle howto will only work properly if you have one LUN in each target, but i had multiple LUNs in my iscsi-targets to have more space ... so i had to modify the script that returns the names of devices for persistent naming, here is the diff:

--- iscsidev.sh.bak 2008-05-10 19:44:49.000000000 +0300
+++ iscsidev.sh 2008-05-10 19:44:25.000000000 +0300
@@ -4,6 +4,7 @@
 
 BUS=${1}
 HOST=${BUS%%:*}
+LUN=${BUS##*:}
 
 [ -e /sys/class/iscsi_host ] || exit 1
 
@@ -16,5 +17,5 @@
    exit 1
 fi
 
-echo "${target_name##*.}"
+echo "${target_name##*.}LUN$LUN"
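The two parameter expansions in the diff are easy to misread, so here is the same string handling as a small Python sketch. It assumes BUS arrives in the usual host:channel:target:lun form (for example 3:0:0:2); the function name and the sample target name are made up for illustration.

```python
def device_name(bus, target_name):
    """Mimic the shell expansions used in the patched iscsidev.sh."""
    host = bus.split(":", 1)[0]              # ${BUS%%:*}  -> text before the first ':'
    lun = bus.rsplit(":", 1)[-1]             # ${BUS##*:}  -> text after the last ':'
    short = target_name.rsplit(".", 1)[-1]   # ${target_name##*.} -> text after the last '.'
    return host, short + "LUN" + lun


print(device_name("3:0:0:2", "iqn.2006-01.com.example:racdb.crs"))
```

With the LUN appended, two LUNs of the same target no longer map to the same name.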

Sunday, May 4, 2008

traffic control ... learning tc

Well, what do i want to do today, i want to try to distribute traffic in a fair way between clients on my network, so it seemed as a starting point i should start playing with tc.

So i started reading tc man page thoroughly, after a while of playing around with it i tried creating a qdisc using
tc qdisc add dev eth1 root pfifo
notice that anything less didn't work for me! that went fine. now i wanted to view what i did, and the command tc qdisc show dev eth1 did it fine. then i tried to delete the qdisc i created using tc qdisc del dev eth1 root and it went well too.

Now i tried tc qdisc show dev eth1 and apparently there is a qdisc there!! i try to delete that but i get RTNETLINK answers: No such file or directory !!! later i realized this is probably a mandatory qdisc for every interface ... although i found nothing about it in the man page, a quick google search for "default qdisc" yielded this very informative link.

Now, what i understand so far that we have 3 elements working together, qdisc, class and rule ... and to put it in simple words, i think that qdisc ( a general framework for what we are talking about ) classifies packets into classes using some rules, and decides later according to the classification which packets should go out to the network interface.

As Expected, a year and a half later (sep 2009), i get back to reading on the topic, and i have no clue where i got the word rule! the word is "filter"!

this time i have actually used tc: i used iptables to "mark" certain packets, then a tc "filter" captured the marked packets and sent them to a "class" that limits the bandwidth !!
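To show how the pieces hang together, here is a Python sketch that only builds the command strings for such a setup, without running anything. The choice of HTB as the classful qdisc, the handle 1: and class id 1:10, and the function name are my assumptions for illustration and not necessarily what was used at the time.

```python
def shaping_commands(dev, mark, rate, match):
    """Return the mark -> filter -> class commands as strings (nothing is executed)."""
    return [
        # 1. iptables tags the packets of interest with a firewall mark
        "iptables -t mangle -A POSTROUTING -o {} {} -j MARK --set-mark {}".format(dev, match, mark),
        # 2. a classful root qdisc (HTB here) that can hold classes
        "tc qdisc add dev {} root handle 1: htb".format(dev),
        # 3. a class that caps the bandwidth
        "tc class add dev {} parent 1: classid 1:10 htb rate {}".format(dev, rate),
        # 4. an "fw" filter that sends packets carrying the mark to that class
        "tc filter add dev {} parent 1: protocol ip prio 1 handle {} fw flowid 1:10".format(dev, mark),
    ]


for command in shaping_commands("eth1", 6, "512kbit", "-p tcp --sport 80"):
    print(command)
```

The mark value passed to iptables and the handle of the fw filter have to be the same number; that is the only link between the two tools.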

Saturday, May 3, 2008

Setting up iSCSI target on Debian etch

Today's adventure is setting up an iSCSI target on debian !

First off, a normal installation of debian etch, using the defaults and nothing particularly special, afterwards, i search google for a debian package of iscsi target, and well, i find debian provides one for testing, one for unstable and one in backports :D yay, i quickly add the debian backports repository, and try to install it by doing apt-get install iscsitarget, and i get the following

Setting up iscsitarget (0.4.15+svn145-1~bpo40+1) ...
Starting iSCSI enterprise target service: FATAL: Module iscsi_trgt not found.
netlink fd: Connection refused
failed.

i looked for this module using apt-file and googled around a little, but found nothing particularly helpful. but i noticed that there is a package called iscsitarget-source, which is unusual unless the source is needed to install a package, otherwise the source is usually provided separately. so i apt-got it and there you have it in /usr/src/iscsitarget.tar.bz2, and i extract the archive. now what? i see no doc or README in that archive. i look back at the packs installed, and notice that debian got module-assistant as a dependency for iscsitarget-source ... sure thing, that's the way to go, so i run module-assistant, try to select iscsitarget to build, and then fall back to the main menu and do "prepare", which gets a bunch of new packages and installs them.

Back to module-assistant ... select ... iscsitarget ... ok ... build .... build fails!! :( i look at the errors, the first one i can see is:

CC [M] /usr/src/modules/iscsitarget/kernel/tio.o
In file included from /usr/src/modules/iscsitarget/kernel/tio.c:7:
/usr/src/modules/iscsitarget/kernel/iscsi.h:236: error: field ‘rx_hash’ has incomplete type

nothing before it seems to be relevant to the error or to trigger it, so i started asking around, and there you go, the guys on #debian-backports on the oftc irc network tell me that it needs a newer kernel, and so does /usr/share/doc/iscsitarget-source/README.Debian which i didn't read before. so i need to install a newer kernel, which i still find in etch-backports. i look up which kernels are available, do apt-get install linux-image-2.6.22-4-686 linux-headers-2.6.22-4-686 and reboot.

now back to module-assistant, select, build, install,exit ... /etc/init.d/iscsitarget start ... all starts successfully :) now i can go ahead and play with the /etc/ietd.conf file (iscsi enterprise target daemon conf file).

Note:I wonder ... why would iSCSI target have anything in the kernel space ?

Sunday, April 27, 2008

IP takeover, IP failover, how it works ...

I've been interested lately in ip failover of Linux/FreeBSD machines, and been taking a look from time to time at HA linux webpage, frankly i didn't find any document how it is done "the right way" or "how it is usually done" ... probably this is due to the fact that i usually pick up the wrong keywords to google for.

Anyway, i decided i'll try to understand how it works and maybe implement something that does it. my theory was that it should work using a ping test and ifconfig for an alias ip on the interfaces that i want to fail over for each other, and i might need to reload some services with new configs. anyway, i don't really know, so i decided to start looking at how this could be done, and as usual writing a post over the span of multiple days ... or weeks :)

The quick search on google brought up this page, which is a little old (written in 2003) but made me feel good since it very much described what i thought is the idea behind ip takeover, with some very important additions: mainly send_arp (so the other machines on the network learn the new MAC address for the ip), and that there must be a mechanism to bring things back to normal when the other ip starts to respond again.

Fiddling around the net for a while, i found a very nice article/document, probably the best i found so far, you can see it here on linux-ha.org. it mentions a piece of software called fake, so i quickly searched the packages provided by ubuntu, and found the package fake. i installed it (apt-get install fake) and looked up where it resides: i found it in /usr/sbin/fake and it turns out to be a bash script! (YAY) ... having a quick look at it shows that it's using the above mentioned send_arp ... it's nice when you find consistent information on the net ;)

So the basic idea is that you have an ip you wish to keep up, let's say 192.168.1.1 ... and you have two boxes, A and B, each with its own IP, let's say A has 192.168.1.10 and B has 192.168.1.20. each box monitors the other box (let's say via ping or arping ..etc). initially you have 192.168.1.1 on box A alongside 192.168.1.10. whenever box B stops getting a response from box A (on ip 192.168.1.10), it runs ifconfig to acquire 192.168.1.1 ! and additionally any other desired scripts and services. whenever 192.168.1.10 starts responding again (box A is up again, and it's the default machine for 192.168.1.1), box B brings 192.168.1.1 down in order to allow box A to respond to requests again ... tada !
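The decision box B has to make after every health check fits in a few lines. This is only a sketch of the logic in Python, with made-up names; the actual check (ping or arping) and the actual takeover (ifconfig plus send_arp) are left out on purpose.

```python
def next_action(peer_alive, holding_vip):
    """Decide what the backup box (B) should do after one health check of box A."""
    if not peer_alive and not holding_vip:
        return "acquire"   # bring the shared ip up here and announce it
    if peer_alive and holding_vip:
        return "release"   # box A is back, hand the shared ip over again
    return "none"


# Simulate a short outage of box A as a sequence of check results
holding = False
for alive in [True, False, False, True]:
    action = next_action(alive, holding)
    if action == "acquire":
        holding = True
    elif action == "release":
        holding = False
    print(alive, action)
```

A real monitor would likely want a few consecutive failed checks before acquiring the address, so that a single lost ping does not trigger a takeover.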

Saturday, April 5, 2008

Oracle RAC on ISCSI ... the small details

Lately i've been trying to install oracle 11g RAC on ISCSI (following this document from oracle ... a very good document ; i have to say.) but with little success, now i think ill be sharing what i've been trying to do ... other than following the document.

Starting with debian etch, moving to RHES, Centos and now trying with Enterprise Linux (Oracle Unbreakable), i figured out how much i hate rpm package management, how annoying RH key thing is ... well, maybe this is because i come from .deb based distro, or simply a personal preference.

Anyway, after installing the OS according to the steps described in the above mentioned document, i had to install 18 packages, and i really didn't want to search each and every one of them and install them manually, so i did the following:

* i copied the iso images, mounted them, copied rpms, created a repo, and installed them all in one shot, here is how to do this quickly :

#create directories to mount the 4 iso images i have
mkdir -p /mnt/cd{1..4}
#mount the iso images (located /root in my case)
loopid=1
for i in `ls /root/Ent*`; do
 mount $i /mnt/cd$loopid -o loop=/dev/loop$loopid
 loopid=` expr $loopid + 1`
 sleep 2
done
#copied the rpm files to create a local repo
mkdir /tmp/repo
#quote the pattern so the shell does not expand it before find sees it
find /mnt/ -name "*.rpm" -exec cp {} /tmp/repo/ \;
#now generate file list ( i suppose its called that way!)
createrepo /tmp/repo
#add it to yum repos[localrepo2]
cat << EOF > /etc/yum.repos.d/handcrafted2.repo
[localrepo2]
name=Enterprise releasever - My Local Repo
baseurl=file:///tmp/repo
enabled=1
gpgcheck=0
#gpgkey=file:///path/to/you/RPM-GPG-KEY
EOF
#update yum repo
yum update

Question:anyone knows what the story with the 8 loop devices limit ?
now, lets get to install the 18 + iscsi-initiator packs, here are their names (in a friendly way):

binutils compat-libstdc++-33 elfutils-libelf elfutils-libelf-devel glibc-2.5 glibc-common-2.5 glibc-devel-2.5 gcc gcc-c++ libaio libaio-devel libgcc libstdc++ libstdc++-devel make sysstat unixODBC unixODBC-devel iscsi-initiator-utils

you can as well add iscsi-initiator-utils package, which will be needed later when configuring iscsi on the nodes.

all you have to do is to run "yum install [paste line here]", or you can copy it to a file, oracle-rac-pack-list for example, and run "yum install `cat oracle-rac-pack-list`". if for some reason (i had one!) you want every package on a separate line, just run "cat oracle-rac-pack-list | sed -e 's/ /\n/g' ".

Now we have the packs installed on the first node, what about the others, i decided not to repeat the steps above, so i thought about using sshfs, but it quickly turns out that its not included as a package with Enterprise Linux 5.1, i've decided to copy needed packs to the other nodes, so i did:

scp `cat oracle-rac-pack-list | sed -e 's/ /-\[0-9\]\*\n/g' | xargs -l find /mnt/ -name ` myusername@remote-ip-addr:~

Note:make sure there is a space at the end of the line in the file
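What the sed in that command does is turn every package name into a file name pattern of the form name-[0-9]*, so that "glibc" picks up the glibc rpm but not glibc-devel (the pattern is appended at each space, hence the trailing space). The same matching can be sketched in Python with fnmatch; the function name and the rpm file names below are made up for illustration.

```python
from fnmatch import fnmatch


def pick_rpms(package_names, rpm_files):
    """Keep the files whose name is '<package>-<digit>...', like find -name 'pkg-[0-9]*'."""
    patterns = [name + "-[0-9]*" for name in package_names]
    return [f for f in rpm_files if any(fnmatch(f, p) for p in patterns)]


# Hypothetical file names
files = [
    "glibc-2.5-18.i686.rpm",
    "glibc-devel-2.5-18.i386.rpm",
    "gcc-4.1.2-14.el5.i386.rpm",
    "gcc-c++-4.1.2-14.el5.i386.rpm",
]
print(pick_rpms(["glibc", "gcc"], files))
```

Requiring a digit right after the dash is what keeps sub-packages like -devel from sneaking in.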

but this apparently was not enough, missing dependencies ! i thought ill cut the crap and get them once into a file and post it here, and ill copy it later from here if i need it ... i don't like the solution, but i need to get going. so the list that emerged was the following packages:

libgomp glibc-headers elfutils-libelf-devel-static kernel-headers

Now you can use the same line from above to copy the files , just replace oracle-rac-pack-list with a file name containing the missing dependencies files.
Note:make sure there is a space at the end of the line

on the remote machine, go to the user home directory (or to whatever empty directory you copied the .rpm files to) and run "rpm -i *.rpm". you might have to delete one glibc-2.5-18.iXXX.rpm package since you will have 2, one for 386 and one for 686 ... pick one according to your arch. (this section is ugly! any suggestions on how to do it in a cleaner way are very welcome).

Next i configured the iSCSI initiator, i already had the iscsi target configured on a debian machine, so i wont go into how i did it here, maybe some other time, ive checked if i can see the volumes i created on it using the command:

iscsiadm -m discovery -t sendtargets -p iscsi-storage-priv

apparently i could see them, so i wanted to login to them, instead of writing multiple lines, i decided to use the output of the previous command, so i did:

iscsiadm -m discovery -t sendtargets -p iscsi-storage-priv| cut -f2 -d\ | xargs -l -I '{}' iscsiadm -m node -T {} -p XX.XX.XX.XX -l

notice that in the last command i've put the IP; for some reason using the hostname, which was defined in /etc/hosts, did not work (!). and of course, using the iscsiadm discovery results assumes that the volumes defined are all for your current services or your oracle nodes, meaning if you are using the iscsi server for other things, you will need to run your commands one by one for the desired volumes, or use grep if there is something in the names used explicitly for your oracle RAC storage.

Now to have them logged in automatically when the system starts up, similar to the above command i did:

iscsiadm -m discovery -t sendtargets -p iscsi-storage-priv| cut -f2 -d\ | xargs -l -I '{}' iscsiadm -m node -T {} -p XX.XX.XX.XX --op update -n node.startup -v automatic
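The cut and xargs pipeline relies on the shape of the discovery output: each line is "portal,group target-name", so the second space-separated field is the target name. Here is the same idea as a Python sketch that just builds the login commands from some sample output; the sample lines, the portal address and the function name are hypothetical.

```python
def login_commands(discovery_output, portal_ip):
    """Turn 'portal,tpgt target-name' lines into iscsiadm login command strings."""
    commands = []
    for line in discovery_output.splitlines():
        fields = line.split(" ")
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        target = fields[1]  # the same field that cut -f2 -d' ' picks
        commands.append("iscsiadm -m node -T {} -p {} -l".format(target, portal_ip))
    return commands


# Hypothetical discovery output
sample = (
    "192.168.2.195:3260,1 iqn.2006-01.com.example:racdb.crs\n"
    "192.168.2.195:3260,1 iqn.2006-01.com.example:racdb.asm1\n"
)
for command in login_commands(sample, "192.168.2.195"):
    print(command)
```

Filtering the targets by name (the grep idea mentioned above) would be one extra condition inside the loop.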

Now getting the ocfs2! again, i hate rpm, and no, i dont want to set a local repo on two nodes, and so far, still sticking with copying the required rpms, here are the packs and dependencies:
ocfs2-2.6.18-53.el5 ocfs2-tools ocfs2-tools-devel e2fsprogs-devel glib2-devel ocfs2console

Now comes the OCFS2 configuration step, using ocfs2console tool, the configuration goes well on the first node, no problems, on the second node, i get an error message:

Could not start cluster stack. This must be resolved before any OCFS2 filesystem can be mounted

The search now begins for why this happens, i thought ill start by checking the sysctl values, so i do sysctl -a on both nodes and compare the results, i see nothing abnormal, anyway, i decide to copy the sysctl.conf file from the first node to the second one and use it, run sysctl -p, and thats what i did, still same problem.

checking /var/log/messages yields an interesting error:

modprobe: FATAL: Module ocfs2_nodemanager

but why? the same packs should be on both nodes! so i started checking if i have different packs using "yum list| grep installed|wc" ... i have 726 packs on the "working node" and 737 packs on the "failing node". i need to do a comparison, so i create two lists and look at the difference! the difference was that for some reason i didn't have the ocfs2 kernel module installed on the second node, so installing ocfs2-2.6.18-53.el5.i686 did it! and the ocfs2console config completed successfully... i think :)
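Comparing the two package lists is a plain set difference. A minimal Python sketch, assuming each list has one package name per line; the function name and the short sample lists are made up, the real lists would come from the yum output on each node.

```python
def package_diff(working, failing):
    """Return (missing on the failing node, extra on the failing node)."""
    working, failing = set(working), set(failing)
    return sorted(working - failing), sorted(failing - working)


# Hypothetical, heavily shortened lists
working_node = ["ocfs2-2.6.18-53.el5", "ocfs2-tools", "ocfs2console"]
failing_node = ["ocfs2-tools", "ocfs2console", "some-extra-package"]

missing, extra = package_diff(working_node, failing_node)
print("missing on failing node:", missing)
print("only on failing node:", extra)
```

Note that the raw counts alone can mislead: the failing node had more packages in total and was still missing the one that mattered.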

Installing asm stuff went fine, i installed oracleasm-support-2.0.4-1.el5.i386.rpm , oracleasm-2.6.18-53.el5-2.0.4-1.el5.i686.rpm and oracleasmlib-2.0.3-1.el5.i386.rpm ; the last one i had to download from oracle website.

Then i downloaded and extracted the oracle clusterware software and the oracle database, installed the cvuqdisk on both nodes following the document mentioned at the beginning of this post and on the first node, i did exec for ssh-agent and ssh-add, i did not set any passwords for now, so i was not prompted for a passphrase, then, on first node linux1 i did:

./runcluvfy.sh stage -pre crsinst -n linux1,linux2 -verbose

and it failed! Check: User equivalence for user "oracle" failed for node linux1, the same node im performing the test on, easy, i did :

cat id_rsa.pub >> authorized_keys

and i added a swapfile to fix another warning related to the swap size, i still had Total memory check failed, but i ignore this, one of the nodes has 1027200KB but the other node only has 512MB of ram, i pray ;) and continue.

the next test went fine, and now i'm at "20. Install Oracle 11g Clusterware Software". everything is going quite smoothly and i'm following the document from oracle ... the wizard brings up the two scripts that i need to run as root. i run the first on both nodes ... successfully. the second one runs successfully on the first node and fails on the other! i look at the logs, and there you go ... Failed to get IP for linux2.mydomain.tld ... :( ... sure, i have nothing like this in the dns, but why did it append the domain name ? anyway, i cd to /etc/sysconfig/ , vi network and remove the domain name from the HOSTNAME line ... and retry running the root.sh script! it exits quickly, and outputs only two lines saying that Oracle CRS is already configured and that it will be running under init(1M) ! i'm now not sure if it's working correctly or not!

i perform the following test described in the doc, and here what i get :

[oracle@linux1 ~]$ $ORA_CRS_HOME/bin/crs_stat -t -v
Name            Type         R/RA  F/FT  Target  State    Host
----------------------------------------------------------------------
ora.linux1.gsd  application  0/5   0/0   ONLINE  ONLINE   linux1
ora.linux1.ons  application  0/3   0/0   ONLINE  ONLINE   linux1
ora.linux1.vip  application  0/0   0/0   ONLINE  ONLINE   linux1
ora.linux2.gsd  application  0/5   0/0   ONLINE  ONLINE   linux2
ora.linux2.ons  application  0/3   0/0   ONLINE  OFFLINE
ora.linux2.vip  application  0/0   0/0   ONLINE  ONLINE   linux2

aaaaahhhh ... there is something incorrect ... i cd to /u01/app/crs/bin and try to guess what executable can be used, i try ./onsctl start ... and its still trying to use the domainname attached to hostname ... i switch to root, do hostname linux2, exit root retry ... now here is what i get :

[oracle@linux2 bin]$ ./onsctl start
globalInitNLS: NLS boot file not found or invalid -- default linked-in boot block used
Number of onsconfiguration retrieved, numcfg = 0
globalInitNLS: NLS boot file not found or invalid -- default linked-in boot block used
globalInitNLS: NLS boot file not found or invalid -- default linked-in boot block used
Number of onsconfiguration retrieved, numcfg = 0
onsctl: ons started

looks good .. but retrying the $ORA_CRS_HOME/bin/crs_stat -t -v yields the same output as before.
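Eyeballing the crs_stat table gets old fast, so here is a small sketch that picks out the resources whose State does not match their Target. It assumes the output is whitespace-separated columns exactly like what i pasted earlier (Name, Type, R/RA, F/FT, Target, State, Host); parse_crs_stat and mismatched are names i made up, and the sample text is just the output from this post:

```python
SAMPLE = """\
Name            Type         R/RA  F/FT  Target  State    Host
----------------------------------------------------------------------
ora.linux1.gsd  application  0/5   0/0   ONLINE  ONLINE   linux1
ora.linux1.ons  application  0/3   0/0   ONLINE  ONLINE   linux1
ora.linux1.vip  application  0/0   0/0   ONLINE  ONLINE   linux1
ora.linux2.gsd  application  0/5   0/0   ONLINE  ONLINE   linux2
ora.linux2.ons  application  0/3   0/0   ONLINE  OFFLINE
ora.linux2.vip  application  0/0   0/0   ONLINE  ONLINE   linux2
"""


def parse_crs_stat(text):
    """Parse 'crs_stat -t -v' style output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Resource rows start with "ora."; the header and dashes are skipped.
        if len(fields) < 6 or not fields[0].startswith("ora."):
            continue
        rows.append({
            "name": fields[0],
            "type": fields[1],
            "target": fields[4],
            "state": fields[5],
            # An offline resource has no Host column.
            "host": fields[6] if len(fields) > 6 else "",
        })
    return rows


def mismatched(rows):
    """Names of resources whose State differs from their Target."""
    return [r["name"] for r in rows if r["target"] != r["state"]]


if __name__ == "__main__":
    print(mismatched(parse_crs_stat(SAMPLE)))
```

To use it on a live node you would feed it the real command output instead of SAMPLE, for example through subprocess.run, which i left out to keep the sketch self-contained.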

a quick google search brings me to this blog post, and srvctl grabs my attention! well, i played around a little and did:

./srvctl stop asm -n linux2
$ORA_CRS_HOME/bin/crs_stat -t -v
./srvctl stop nodeapps -n linux2
$ORA_CRS_HOME/bin/crs_stat -t -v

then i switch to root and do:

/etc/init.d/init.crs stop
/etc/init.d/init.crs start

and now $ORA_CRS_HOME/bin/crs_stat -t -v reports everything to be online :) ... i have no clue which if any of the above was sufficient, nor do i know if this will screw my installation later on ! but the oracle clusterware software installation reported finishing successfully.

I went ahead and started installing the oracle database. i got a warning about ip_local_port_range; again, although i thought i followed the doc step by step, it seemed this step was not done, and now i fear there is something else i did not do ... :( anyway, i fixed it using sysctl and wrote the values to /etc/sysctl.conf too ... and proceeded ... database installed, seemed to go successfully, examples installed, seemed to go successfully, TNS listener too. now i started creating a database, following the doc, with ASM ... ASM creator working ... BINGO! i get an error:

PRKS-1009: Failed to start ASM instance "+ASM2" on node "linux2", [CRS-0215: Could not start resource ora.linux2.ASM.asm".]

i go to the linux2 node, and i try /etc/init.d/oracleasm listdisks ... i get none ... :( ... i try /etc/init.d/oracleasm scandisks ... it finishes successfully ... but still no good, i get nothing when i do listdisks! checking for a permission issue, i did:

cd app/
[root@linux2 app]# ls
crs oracle oraInventory
[root@linux2 app]# ls -alh
total 20K
drwxrwxr-x  5 root   oinstall 4.0K Apr 27 18:57 .
drwxr-xr-x  3 root   root     4.0K Apr  9 15:34 ..
drwxr-xr-x 35 root   oinstall 4.0K Apr 27 18:57 crs
drwxrwxr-x  5 oracle oinstall 4.0K Apr 28 16:24 oracle
drwxrwx---  4 oracle oinstall 4.0K Apr 28 15:37 oraInventory

As you can see, the owner of crs was root; according to the document it should have been oracle, so i did chown -R oracle:oinstall /u01/app. still this did not help! i look into the logs, and i find in /u01/app/crs/log/linux2/crsd/crsd.log:

2008-04-28 16:25:23.484: [ CRSRES][128224144] startRunnable: setting CLI values
2008-04-28 16:26:01.377: [ CRSAPP][128224144] StartResource error for ora.linux2.ASM2.asm error code = 1

fiddling more in the logs (whose structure i dont understand btw), i found this "warning":

Starting ORACLE instance (normal)
WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 285212672 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 263954432 and used is 0 bytes.
memory_target needs larger /dev/shm

so i manually change the size by setting it in /etc/fstab by inserting the line:

tmpfs /dev/shm tmpfs size=300m 0 0

and now things go fine again :), but i risk Linux deadlocking ;) ... now i try to create a disk group and i get a message saying:

Could not mount the diskgroup on remote node linux2 using connection service linux2:1521:+ASM2. Ensure that the listener is running on this node and the ASM instance is registered to the listener. Received the following error:

ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ORCL_DATA1"

i get the same error when i try to create the group FLASH_RECOVERY_AREA ... so i check /etc/init.d/oracleasm listdisks ... and it yields nothing! i try to do scandisks, still the same thing, so then i stop init.crs and start it again ... listdisks still brings nothing! then i did mount -a as root, then scandisks and listdisks, and i get the volumes .... good ... i click ok on the message, and i get a message something like "no more data to read from socket" ... no idea if this is normal or not! i pray ;) and check the box beside ORCL_DATA1 and press next!

I followed the doc and everything went fine, till again, it complained about the size of the shm, it was too small, so i had to increase it manually ... and i continued and things went fine and the database installation started ...
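Since /dev/shm bit me twice, a check before starting dbca would have saved some time. Below is a rough sketch, assuming the required size is whatever the MEMORY_TARGET warning reports (285212672 bytes in my case) and that the free space reported by statvfs is what matters. The function names are hypothetical, and the numbers in the example are the ones from the warning above:

```python
import os


def shm_shortfall(required_bytes: int, available_bytes: int) -> int:
    """Bytes still missing; 0 if the available space already covers it."""
    return max(0, required_bytes - available_bytes)


def available_bytes(path: str = "/dev/shm") -> int:
    """Free space on the file system at path, as seen by a normal user."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    required = 285212672  # taken from the MEMORY_TARGET warning above
    # Numbers from the warning: 263954432 available, so about 20 MiB short.
    print("shortfall from the log:", shm_shortfall(required, 263954432))
    if os.path.isdir("/dev/shm"):
        print("shortfall right now:", shm_shortfall(required, available_bytes()))
```

With size=300m in /etc/fstab the mount has 300 * 1024 * 1024 = 314572800 bytes, which covers the 285212672 the warning asked for, so the shortfall drops to 0 as long as nothing else is using /dev/shm.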

Well, the database installation took too long, so i packed up, left it running and decided to continue the next day ... what happened was that there was a power failure (bad ... :( bad ...) and the node linux1 did not finish the creation of the database ... so i ran dbca again, deleted the database, got a few warnings ... and started creating a new database. this time i got an error that there is not enough disk space on the ASM!! i think that the data created by the first attempt is still there ... to make a long story short, the database installation failed and i decided i need to do things differently, starting by using better hardware and larger iscsi targets.
