Notes on Installing Ganglia via Yum and Monitoring Hadoop

I first tried building from source, but ran into one compile error after another. Then I found that EPEL provides yum packages, so I installed from EPEL instead. Here is the epel.repo configuration:

  [epel]
  name=Extra Packages for Enterprise Linux 6 - $basearch
  baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch
  #mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch
  failovermethod=priority
  enabled=1
  gpgcheck=1
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6

  [epel-debuginfo]
  name=Extra Packages for Enterprise Linux 6 - $basearch - Debug
  baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch/debug
  #mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-debug-6&arch=$basearch
  failovermethod=priority
  enabled=0
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
  gpgcheck=1

  [epel-source]
  name=Extra Packages for Enterprise Linux 6 - $basearch - Source
  baseurl=http://download.fedoraproject.org/pub/epel/6/SRPMS
  #mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-source-6&arch=$basearch
  failovermethod=priority
  enabled=0
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
  gpgcheck=1
A few RPMs shipped with CentOS 6 are also needed. Below is the configuration of my local repository (bill.repo); only the DVD1 and DVD2 repos matter here, please ignore the other entries:
  [centos6.6-d1]
  name=centos6.6-dvd1
  enabled=1
  baseurl=http://yum-bill/centos6.6/Packages/
  gpgcheck=0
  #baseurl=file:///mnt/centos6.6
  #baseurl=http://192.168.24.49/centos6.6/Packages
  #gpgcheck=1
  #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

  [centos6.6-d2]
  name=centos6.6-dvd2
  enabled=1
  gpgcheck=0
  baseurl=http://yum-bill/centos6.6/dvd2/Packages/

  [cloud]
  name=cloudstack4.5.1
  enabled=1
  gpgcheck=0
  baseurl=http://yum-bill/cloudstack4.5.1/

  [openvswitch]
  name=openvswitch
  enabled=1
  gpgcheck=0
  baseurl=http://yum-bill/openvswitch/

  [ceph6]
  name=ceph6
  enabled=1
  gpgcheck=0
  baseurl=http://yum-bill/ceph6/

Installation

    Server:

    The ganglia* glob below pulls in ganglia, ganglia-gmetad, ganglia-gmond and ganglia-web:

  yum install rrdtool ganglia* pcre httpd php

    Client:

  yum install ganglia-gmond
Configuration

    Server:

  # 1. gmetad.conf
  vi /etc/ganglia/gmetad.conf
  # with the comments stripped, it looks like this:
  data_source "hadoop-cluster" v1:8649  # only the cluster name and the host:port were changed; the rest are defaults. The cluster name and host are referenced later in gmond.conf
  setuid_username ganglia
  case_sensitive_hostnames 0

  # 2. gmond.conf; only the modified settings are listed below. Anything not shown keeps its default value
  vi /etc/ganglia/gmond.conf
  cluster {
    name = "hadoop-cluster"   # must match the data_source name in gmetad.conf above
    owner = "unspecified"
    latlong = "unspecified"
    url = "unspecified"
  }
  udp_send_channel {
    host = v1   # host means unicast; use mcast_join for multicast
    port = 8649
    ttl = 1
  }
  udp_recv_channel {   # for unicast, remove (or comment out) "mcast_join" and "bind"
    #mcast_join = 239.2.11.71
    port = 8649
    #bind = 239.2.11.71
    retry_bind = true
    # Size of the UDP buffer. If you are handling lots of metrics you really
    # should bump it up to e.g. 10MB or even higher.
    # buffer = 10485760
  }

  # 3. Start on boot
  # data collection daemon
  chkconfig --levels 235 gmond on
  # data aggregation/storage daemon
  chkconfig --levels 235 gmetad on
  # Apache
  chkconfig --levels 235 httpd on
    Client:

  # run scp on the server to push the config to each client; in my case v2, v3 and v4, so four machines in total counting v1
  scp /etc/ganglia/gmond.conf {ip}:/etc/ganglia/gmond.conf
  # start the data collection daemon on boot
  chkconfig --levels 235 gmond on
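The scp-and-enable step above can be wrapped in a small loop. This is only a sketch: the node list comes from the text (v2, v3, v4), and the DRY_RUN=echo prefix is an assumption that makes the script print the commands instead of executing them; remove it to actually copy.

```shell
#!/bin/sh
# Sketch: push gmond.conf from the server to each client node and
# enable gmond on boot. NODES is assumed; DRY_RUN=echo makes this a
# dry run that only prints the commands it would execute.
NODES="v2 v3 v4"
DRY_RUN="echo"

push_gmond_conf() {
  for node in $NODES; do
    $DRY_RUN scp /etc/ganglia/gmond.conf "$node:/etc/ganglia/gmond.conf"
    $DRY_RUN ssh "$node" chkconfig --levels 235 gmond on
  done
}

push_gmond_conf
```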
   

Run

    Server:

  service gmond start
  service gmetad start
  service httpd start

    Client:

  service gmond start

Testing

  # list the currently active clients from the command line
  gstat -a
  # view client status in the browser
  http://{your_ip}/ganglia

Apache Password Protection

    The Ganglia web UI requires no password by default, so we add HTTP Basic authentication through Apache for some basic security.

  # create the password file (-c creates it)
  htpasswd -c /etc/httpd/conf.d/passwords {your_name}
  cd /usr/share/ganglia
  vi .htaccess   # create the per-directory auth file and add the following content
  AuthType Basic
  AuthName "Restricted Files"
  AuthUserFile /etc/httpd/conf.d/passwords
  Require user {your_name}

  # then allow .htaccess overrides:
  vi /etc/httpd/conf/httpd.conf
  # change:
  <Directory />
      Options FollowSymLinks
      AllowOverride None
  </Directory>
  # to:
  <Directory />
      Options FollowSymLinks
      AllowOverride AuthConfig
  </Directory>
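The AllowOverride edit can also be done with sed. The sketch below operates on a self-contained sample fragment so it is safe to try anywhere; on a real box you would point it at /etc/httpd/conf/httpd.conf after taking a backup. Note that the stock httpd.conf contains several "AllowOverride None" lines, so review what a blanket substitution would touch before running it for real.

```shell
#!/bin/sh
# Demonstrate the AllowOverride change on a sample fragment (the sample
# content mirrors the stock root <Directory /> block shown above).
cat > /tmp/httpd-sample.conf <<'EOF'
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>
EOF

# Flip None -> AuthConfig so the .htaccess auth rules take effect.
sed -i 's/AllowOverride None/AllowOverride AuthConfig/' /tmp/httpd-sample.conf
grep AllowOverride /tmp/httpd-sample.conf
```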
If visiting http://v1/ganglia still fails with a 403 error at this point, adjust the following file:

  vi /etc/httpd/conf.d/ganglia.conf
  Alias /ganglia /usr/share/ganglia
  <Location /ganglia>
    Order deny,allow
    #Deny from all   # comment this line out and add the "Allow from all" line below
    Allow from all
    Allow from 127.0.0.1
    Allow from ::1
    # Allow from .example.com
  </Location>
Test the page again: enter the username and password and you are in; all four nodes show up in the monitoring view (screenshots omitted).
One lesson learned: whenever something can be installed with yum, use yum and save yourself the time. That wraps up installation; next, let's use it to monitor Hadoop.
---------------------------------- Monitoring Hadoop ----------------------------------
The setup above put all four machines into a single group and has nothing to do with Hadoop yet. Now let's monitor the Hadoop cluster itself. My four machines are currently laid out as follows:
v1:Active Namenode/ResourceManager
v2:Standby Namenode/ResourceManager、DataNode
v3:DataNode
v4:DataNode

I will modify the existing gmetad.conf and gmond.conf, and also edit Hadoop's hadoop-metrics2.properties. Once that file is changed, a wealth of Hadoop metrics becomes visible in Ganglia. Great stuff!
Make the following changes:
    1. gmetad.conf on v1
        The original data_source was a single line; split it into two lines using two different ports, as follows:
  data_source "hadoop-namenodes" v1:8649 v2:8649
  data_source "hadoop-datanodes" v3:8650 v4:8650   # note the port here is 8650; the gmond.conf on the datanodes will need it
  setuid_username ganglia
  case_sensitive_hostnames 0
    2. gmond.conf on v1 and v2, which I treat here as the configuration of the hadoop-namenodes cluster.
    Only the cluster name changes in the original gmond.conf; everything else, including the port, stays at 8649:
    
  /*
  * The cluster attributes specified will be used as part of the <CLUSTER>
  * tag that will wrap all hosts collected by this instance.
  */
  cluster {
    name = "hadoop-namenodes"   # the only change
    owner = "nobody"
    latlong = "unspecified"
    url = "unspecified"
  }
  /* The host section describes attributes of the host, like the location */
  host {
    location = "unspecified"
  }
  /* Feel free to specify as many udp_send_channels as you like. Gmond
  used to only support having a single channel */
  udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
    # This option tells gmond to use a source address
    # that resolves to the machine's hostname. Without
    # this, the metrics may appear to come from any
    # interface and the DNS names associated with
    # those IPs will be used to create the RRDs.
    #mcast_join = 239.2.11.71
    host = v1
    port = 8649
    ttl = 1
  }
  /* You can specify as many udp_recv_channels as you like as well. */
  udp_recv_channel {
    #mcast_join = 239.2.11.71
    port = 8649
    #bind = 239.2.11.71
    retry_bind = true
  }
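gmetad polls each host listed in data_source over TCP, so the gmond on v1 and v2 also serves its XML on the default tcp_accept_channel. Nothing needs to change there; for reference, the stock block (unchanged on the namenodes, unlike the datanode config below where its port becomes 8650) looks like this:

```
/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
  gzip_output = no
}
```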
    3. gmond.conf on v3 and v4: here both the cluster name and the ports change. Set all three ports (udp_send_channel, udp_recv_channel and tcp_accept_channel) to 8650, and point udp_send_channel's host at v3, treating these nodes as the hadoop-datanodes cluster:
  /*
  * The cluster attributes specified will be used as part of the <CLUSTER>
  * tag that will wrap all hosts collected by this instance.
  */
  cluster {
    name = "hadoop-datanodes"   # changed
    owner = "nobody"
    latlong = "unspecified"
    url = "unspecified"
  }
  /* The host section describes attributes of the host, like the location */
  host {
    location = "unspecified"
  }
  /* Feel free to specify as many udp_send_channels as you like. Gmond
  used to only support having a single channel */
  udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
    # This option tells gmond to use a source address
    # that resolves to the machine's hostname. Without
    # this, the metrics may appear to come from any
    # interface and the DNS names associated with
    # those IPs will be used to create the RRDs.
    #mcast_join = 239.2.11.71
    host = v3      # changed
    port = 8650    # changed
    ttl = 1
  }
  /* You can specify as many udp_recv_channels as you like as well. */
  udp_recv_channel {
    #mcast_join = 239.2.11.71
    port = 8650    # changed
    #bind = 239.2.11.71
    retry_bind = true
    # Size of the UDP buffer. If you are handling lots of metrics you really
    # should bump it up to e.g. 10MB or even higher.
    # buffer = 10485760
  }
  /* You can specify as many tcp_accept_channels as you like to share
  an xml description of the state of the cluster */
  tcp_accept_channel {
    port = 8650    # changed
    # If you want to gzip XML output
    gzip_output = no
  }
    4. Edit Hadoop's hadoop-metrics2.properties and push it to the other three machines. The modified configuration is below (only the Ganglia entries near the end of the file were enabled; the unrelated settings earlier in the file keep their defaults):
  #
  # Below are for sending metrics to Ganglia
  #
  # for Ganglia 3.0 support
  # *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
  #
  # for Ganglia 3.1 support
  *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
  *.sink.ganglia.period=10
  # default for supportsparse is false
  *.sink.ganglia.supportsparse=true
  *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
  *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
  # Tag values to use for the ganglia prefix. If not defined no tags are used.
  # If '*' all tags are used. If specifying multiple tags separate them with
  # commas. Note that the last segment of the property name is the context name.
  #
  #*.sink.ganglia.tagsForPrefix.jvm=ProcesName
  #*.sink.ganglia.tagsForPrefix.dfs=
  #*.sink.ganglia.tagsForPrefix.rpc=
  #*.sink.ganglia.tagsForPrefix.mapred=
  namenode.sink.ganglia.servers=v1:8649
  datanode.sink.ganglia.servers=v3:8650
  resourcemanager.sink.ganglia.servers=v1:8649
  nodemanager.sink.ganglia.servers=v3:8650
  mrappmaster.sink.ganglia.servers=v1:8649
  jobhistoryserver.sink.ganglia.servers=v1:8649
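Pushing the file to the other three machines can be scripted the same way as the gmond.conf distribution earlier. A sketch, with assumptions: the node list is v2-v4, HADOOP_CONF_DIR defaults to /etc/hadoop/conf (adjust to wherever your hadoop-metrics2.properties lives), and the DRY_RUN=echo prefix prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: distribute hadoop-metrics2.properties from v1 to the other nodes.
# NODES and HADOOP_CONF_DIR are assumptions; adjust for your cluster.
NODES="v2 v3 v4"
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
DRY_RUN="echo"   # dry run: print the commands; remove to actually copy

push_metrics_conf() {
  for node in $NODES; do
    $DRY_RUN scp "$HADOOP_CONF_DIR/hadoop-metrics2.properties" \
      "$node:$HADOOP_CONF_DIR/hadoop-metrics2.properties"
  done
}

push_metrics_conf
```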
    5. Restart gmetad on v1 and gmond on all four machines, then restart the whole Hadoop cluster.

  # on v1
  service gmetad restart
  # on all four machines
  service gmond restart

    6. Visit Ganglia on v1 again and you can now see the two clusters, along with plenty of Hadoop metrics. Very handy!
    Open 192.168.30.31/ganglia (v1's IP in my setup).
    Result (screenshots omitted): selecting hadoop-datanodes shows the datanode hosts, selecting hadoop-namenodes shows the namenode hosts, and the Hadoop-specific metrics are available for each.
That about wraps it up; this is enough to monitor a Hadoop cluster. Next I will look into integrating it with Nagios for alerting.