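The definition of what constitutes Enterprise Management depends largely on your environment, but could include the following:
- Network management & monitoring
- Systems management & monitoring
- Database management & monitoring
- Application management & monitoring (including application integration)
- Incident management
- Notification management
- Performance & capacity management
- Service management
- Reporting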
The exact scope depends on a number of factors, such as your environment, your products, and what you are trying to achieve. Regardless, I'll run through each of these areas briefly to give you an idea of the things I think about when implementing an Enterprise Management solution using HP OpenView. I then provide my recommended 3-phase approach to effective Enterprise Management, and finally I summarise my thoughts and areas of potential concern.
 |
Enterprise Management - The Building Blocks
| |
 |
Network Management & Monitoring
The ability to monitor your network devices & network links is the foundation of any Enterprise Management task. It is often overlooked, or assumed to be the responsibility of another team within the organisation. Even so, network monitoring is a relatively easy task given the appropriate focus, time & resources. It usually involves identifying & classifying network devices, configuring SNMP traps, loading & configuring MIB files, and then configuring the alerting conditions within HP OpenView. An effective Enterprise Management system must include network management & fault reporting, otherwise the whole "enterprise" monitoring concept is undermined. What is the point in monitoring thousands of servers, all showing a status of "green" with no associated faults, if your underlying network is down?
 |
Systems Management & Monitoring
The monitoring of servers is critical to the success of any Enterprise Monitoring strategy. The scope of this monitoring depends heavily on your environment, but you might want to consider the monitoring of Windows, HP-UX, Solaris, AIX and Linux platforms, both physical & virtual. Monitoring of servers will typically involve the installation of HP OpenView monitoring agents, which proactively monitor the systems according to well-defined & agreed monitoring baselines. Typical examples are the monitoring of filesystems, disks, critical processes and important logfiles, as well as availability monitoring of the servers themselves. All servers of importance within your environment should be monitored effectively.
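To make the idea of an agreed monitoring baseline concrete, here is a minimal Python sketch of a filesystem check. It is not how the HP OpenView agents are configured; it simply illustrates the kind of threshold logic a baseline encodes. The threshold values and function names are assumptions chosen for illustration.

```python
import shutil

# Hypothetical baseline thresholds (percent used); real values should be
# agreed with the teams who own each platform.
WARNING_PCT = 85.0
CRITICAL_PCT = 95.0


def classify_usage(used_pct, warning=WARNING_PCT, critical=CRITICAL_PCT):
    """Map a percent-used figure to a severity string."""
    if used_pct >= critical:
        return "critical"
    if used_pct >= warning:
        return "warning"
    return "normal"


def check_filesystem(path):
    """Return (percent used, severity) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return used_pct, classify_usage(used_pct)


if __name__ == "__main__":
    pct, severity = check_filesystem(".")
    print(f"Current filesystem is {pct:.1f}% used: {severity}")
```

Keeping the thresholds in one place is the point of the exercise: the same baseline can then be reviewed, agreed and deployed consistently across every server of a given platform.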
 |
Database Management & Monitoring
Once you have your servers monitored using a well-defined and agreed strategy, the natural progression is to consider the next tier, for example databases such as MS SQL, Oracle, Sybase, or DB2. All critical databases should be monitored to some degree, whether via in-house scripting and basic logfile/process monitoring, or via a more complete mechanism such as that provided by the HP OpenView Database Smart Plug-Ins (SPIs). You may also consider integration with tools such as Oracle Enterprise Manager (Grid Control), but I don't recommend this as the sole method of database monitoring, for many reasons (not least that it becomes a single point of failure).
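As an illustration of the "in-house scripting and basic logfile monitoring" end of the scale, the following Python sketch scans database log lines for Oracle-style error codes (which take the form ORA-NNNNN). The set of codes treated as critical and the function name are assumptions for illustration only, not a recommended baseline and not part of any HP OpenView SPI.

```python
import re

# Oracle error codes take the form ORA-NNNNN. The codes treated as critical
# here are an illustrative choice, not a recommended baseline.
ORA_ERROR = re.compile(r"\bORA-(\d{5})\b")
CRITICAL_CODES = {"00600", "01555", "04031"}


def scan_alert_log(lines):
    """Yield (line_number, code, severity) for each ORA- error found."""
    for number, line in enumerate(lines, start=1):
        for match in ORA_ERROR.finditer(line):
            code = match.group(1)
            severity = "critical" if code in CRITICAL_CODES else "warning"
            yield number, code, severity


if __name__ == "__main__":
    sample = [
        "Thread 1 advanced to log sequence 42",
        "ORA-00600: internal error code",
        "ORA-12012: error on auto execute of job 7",
    ]
    for hit in scan_alert_log(sample):
        print(hit)
```

A script like this is easy to write but covers only what it has been told to look for, which is why the more complete mechanisms are worth considering for critical databases.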
 |
Application Management & Monitoring
Now that you have considered network devices, servers and databases, the next thing to consider is application monitoring. This does not need to be complicated; it could mean the monitoring of simple application processes or logfiles, or the running of scripts. You should identify the critical applications in use within your organisation (or consider the top 5 initially) and engage with those teams to ascertain what level of monitoring would be useful. If they can identify current issues, or where they have had problems/outages in the past, this can help drive the requirements for some basic, but ultimately proactive, monitoring in the application space. Once you have engaged with a team in this manner it might become apparent that more detailed monitoring is required, which may involve using an HP OpenView SPI module, integrating with a 3rd party tool, or writing some more complex monitoring functionality.
 |
To further enhance your monitoring capability you might also consider integrating applications within your environment into your Enterprise Monitoring system. For example, where an application is already performing a monitoring function, you might want its alerts integrated with your primary monitoring tool. This could take the form of an application forwarding alerts into HP OpenView, where the application provides additional benefits such as local suppression, correlation or enhanced functionality. With any single-point integration you should always consider the associated risks and try to mitigate them where possible - single-point integration can sometimes mean single-point-of-failure! An alternative integration method is to utilise a local monitoring agent (e.g. HP OpenView), so that an application agent sends alerts locally to the resident HP OpenView agent, which in turn alerts your central monitoring console using standard mechanisms. The benefit of this approach is that you can eliminate the single-point-of-failure scenario and also reduce network traffic. Examples of possible applications for integration are: Systems Insight Manager (Windows), EMC Control Centre (SAN), Cisco Call Manager (Voice), SolarWinds (Networks), Oracle Enterprise Manager Grid Control (OEM). You should investigate which applications are used in your company and review them thoroughly.
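The "local suppression" benefit mentioned above can be sketched in a few lines of Python. This is a generic illustration of suppressing duplicate alerts before they are forwarded to a central console; the class name, the alert key and the 300-second window are all assumptions, and it does not represent how any of the named products implement suppression.

```python
class DuplicateSuppressor:
    """Suppress repeats of the same alert key inside a time window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._last_forwarded = {}  # alert key -> time it was last forwarded

    def should_forward(self, key, timestamp):
        """Return True if the alert should go to the central console."""
        last = self._last_forwarded.get(key)
        if last is not None and timestamp - last < self.window_seconds:
            return False  # duplicate inside the window: suppress locally
        self._last_forwarded[key] = timestamp
        return True


if __name__ == "__main__":
    suppressor = DuplicateSuppressor(window_seconds=300)
    key = ("db01", "filesystem /u01 full")
    for t in (0, 100, 300):
        print(t, suppressor.should_forward(key, t))
```

Suppressing at the source is also what reduces network traffic: only the first alert in each window ever leaves the monitored host.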
 |
Incident Management
Incident management, in the context of Enterprise Monitoring, relates to how alerts & faults highlighted in the monitoring environment are escalated and recorded in your incident management system (for example HP Service Centre, BMC Remedy, ServiceNow). Once an alert is produced by your monitoring environment it should be escalated to the incident management system, so that a record is created for the failure and the relevant team can update the trouble-ticket with information on cause, effect, impact and resolution/remediation details (where possible). Initially this may be a manual process of raising a trouble-ticket in the relevant system, but semi or full automation should be considered, for example allowing datacentre operations teams to right-click on an alert to raise a trouble-ticket from it. This could progress naturally to raising those alerts destined for a trouble-ticket only (i.e. no callout required) as "auto-tickets". Tickets are then created automatically and datacentre operations teams no longer need to see such alerts, leaving them to concentrate on the more critical issues. Once such a process is underway, tested and working satisfactorily, you could then consider automating trouble-ticket generation for all alerts seen by the datacentre operations team, thereby reducing the chances of alerts being missed. This should be a gradual, phased approach to ensure the entire process works as your business requires.
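The gradual move from manual tickets to auto-tickets described above can be expressed as a simple routing rule. The Python sketch below is only an illustration: the `Alert` fields, the numbered automation stages and the returned route names are all hypothetical, and a real integration would call whatever interface your incident management system provides.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    node: str
    severity: str           # e.g. "warning", "major", "critical" (assumed scale)
    callout_required: bool  # False means the alert is "ticket only"


def route_alert(alert, automation_stage):
    """Decide how an alert reaches the incident management system.

    automation_stage (hypothetical numbering):
      1 = operators raise every ticket by hand from the console
      2 = ticket-only alerts are auto-ticketed, the rest go to the console
      3 = every alert is auto-ticketed; callout alerts also go to the console
    """
    if automation_stage >= 3:
        return "auto_ticket_and_console" if alert.callout_required else "auto_ticket"
    if automation_stage == 2 and not alert.callout_required:
        return "auto_ticket"
    return "console"


if __name__ == "__main__":
    disk_warning = Alert("web01", "warning", callout_required=False)
    db_down = Alert("db01", "critical", callout_required=True)
    for stage in (1, 2, 3):
        print(stage, route_alert(disk_warning, stage), route_alert(db_down, stage))
```

Making the stage an explicit parameter mirrors the phased approach: each step can be tested and agreed before the next one is switched on.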
 |
Notification Management & Strategy
Once your Enterprise Monitoring solution has matured you may naturally consider the use of a notification system. Support teams within your organisation might also be requesting this from an early stage, especially if you have improved the monitoring capabilities for their particular domain! Your organisation might already have some form of notification tool in use within the business, so this should be investigated and reviewed accordingly. Examples of where such a tool might be used include helpdesk & incident management, business continuity planning (BCP) and, indeed, current or previous monitoring systems. You might decide to use the resident notification tool (if it is fit for purpose), or you may take a wider view and review whether a new corporate notification system should be introduced, phasing out the old system in due course. From an Enterprise Monitoring point of view, you might want to configure various HP OpenView alerts to automatically notify teams using SMS messages to mobile phones, pagers etc. It is also possible to configure responses from such tools, so that the recipient can respond to the notification, escalate issues, or even take remediation steps remotely. The benefits of automated notifications are numerous - faster alerting of faults to the correct teams, improved accuracy of information received (rather than a confusing telephone call), along with full auditing of the notification process itself. Any notification strategy should be planned, agreed and implemented in phases to ensure it delivers exactly what is required and does not become a burden on resources, whether IT or people-based.
 |
Performance & Capacity Management
Performance monitoring in this context does not simply mean monitoring servers for CPU or disk utilisation; it is more concerned with creating a function whereby you can deliver performance data to teams if they require it. This could be in the form of performance graphs/reports for reviewing testing cycle phases, for analysing environments during major faults/incidents, or for providing evidence to support trending or capacity planning exercises. The HP OpenView monitoring agents collect and store a core set of metrics by default, which can then be graphed using tools like HP OpenView Performance Manager. These tools are intended to be used on an ad-hoc basis, as and when the need arises to produce performance graphs or on-demand reports.
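As a small worked example of the trending and capacity planning use case, the Python sketch below fits a straight line (ordinary least squares) to daily disk-utilisation samples and estimates how many days remain before a limit is reached. The data format, the function name and the assumption of linear growth are all illustrative; it is not a feature of HP OpenView Performance Manager.

```python
def days_until_full(samples, limit=100.0):
    """Estimate days until usage reaches `limit`.

    `samples` is a list of (day, percent_used) pairs. Returns None when there
    are too few samples, or when usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # percentage points of growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    # Day on which the fitted line crosses the limit, relative to the last sample.
    return max(0.0, (limit - intercept) / slope - last_day)


if __name__ == "__main__":
    history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
    print(days_until_full(history))
```

With the sample history above, usage grows by two percentage points per day from 50%, so the fitted line reaches 100% on day 25, which is 22 days after the last sample.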
 |
Service Management
Senior management are so often desperate for a Service Management view of their environment. They are keen to be able to show the critical services within the organisation, with alerts & status to indicate service health/degradation, but this is sometimes considered prematurely. There is no benefit in showing services as "green" (traffic light indicators are often a request!) if the underlying infrastructure components are not managed and monitored completely and effectively. Showing a service as "healthy" is inaccurate if you are not monitoring the network (or other critical components). Once you have a mature Enterprise Management and monitoring solution deployed, or at the very least the structure and plans to deliver such a solution, then consideration can be given to providing a service management capability. There are various tools that can help with providing a view of the health of your core business services, but a very simple method is to utilise the data held within your incident management system and represent this data graphically. The benefit of using the incident management system itself is that it will record alarms originating from your Enterprise Monitoring solution (e.g. HP OpenView Operations Manager) but will also record incidents created from other sources, including those raised by clients and users.
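The "very simple method" of deriving service health from incident data can be sketched as follows. This Python example is a hedged illustration only: the service-to-component mapping, the incident record fields and the severity scale are all hypothetical, and in practice the incidents would come from an export or query of your incident management system.

```python
# Hypothetical mapping of business services to the components they depend on.
SERVICES = {
    "online-banking": {"web01", "db01", "core-switch-1"},
    "email": {"mail01", "core-switch-1"},
}

# Assumed severity scale; unknown severities are ignored.
SEVERITY_RANK = {"warning": 1, "major": 2, "critical": 3}


def service_health(services, open_incidents):
    """Return {service: "green" | "amber" | "red"} from open incidents."""
    health = {}
    for service, components in services.items():
        worst = 0
        for incident in open_incidents:
            if incident["component"] in components:
                worst = max(worst, SEVERITY_RANK.get(incident["severity"], 0))
        health[service] = "red" if worst >= 3 else "amber" if worst >= 1 else "green"
    return health


if __name__ == "__main__":
    incidents = [{"component": "core-switch-1", "severity": "critical"}]
    print(service_health(SERVICES, incidents))
```

Note that a single network incident turns both services red in this example, which is exactly why a service view is only as trustworthy as the infrastructure monitoring beneath it.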
 |
Reporting
Reports are of no use if nobody is going to read them, but that does not mean reporting should be dismissed out of hand. Effective reporting on metrics or KPIs can help identify trends, show project benefits & ultimately justify projects & resourcing. Examples of useful types of reporting include incident reports (tickets per team, for example), notification reports (frequency of out-of-hours callouts), performance & capacity reports (CPU utilisation, free disk space etc.) and ultimately service reports (indicating service levels, SLAs etc.). Reporting may not need to be considered at the outset of an Enterprise Management programme, but it should be considered carefully and investigated completely when the requirement presents itself.
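Two of the report types mentioned above (tickets per team and out-of-hours callouts) are simple aggregations, as this Python sketch shows. The record formats and the 08:00-18:00 weekday definition of business hours are assumptions for illustration; real data would come from your incident and notification systems.

```python
from collections import Counter
from datetime import datetime


def tickets_per_team(tickets):
    """Count tickets by owning team, most frequent first."""
    return Counter(ticket["team"] for ticket in tickets).most_common()


def out_of_hours_callouts(callout_times, start_hour=8, end_hour=18):
    """Count callouts outside weekday business hours (assumed 08:00-18:00)."""
    count = 0
    for when in callout_times:
        is_weekend = when.weekday() >= 5  # Saturday = 5, Sunday = 6
        in_hours = start_hour <= when.hour < end_hour
        if is_weekend or not in_hours:
            count += 1
    return count


if __name__ == "__main__":
    tickets = [{"team": "unix"}, {"team": "dba"}, {"team": "unix"}]
    callouts = [
        datetime(2012, 3, 5, 3, 0),    # Monday, early morning
        datetime(2012, 3, 5, 10, 0),   # Monday, business hours
        datetime(2012, 3, 10, 10, 0),  # Saturday
    ]
    print(tickets_per_team(tickets))
    print(out_of_hours_callouts(callouts))
```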
 |
Enterprise Management - My Recommended Approach

|
Phase 1
- Review the current environment, including hardware, software, personnel, processes & procedures
- Define the project, the plan, the migration strategy (if applicable) and start to engage teams
- Purchase new software as required (for example agents, SPI modules, new products)
- Deliver the monitoring infrastructure - build, install, patch, review, refine, test, document
- Create standardised OS monitoring baselines for all platforms - plan, test, review, refine
- Deploy OS monitoring baselines to all platforms - plan, deploy, review, refine, document
- Monitor network devices (and any other SNMP based devices)
- Monitor databases - plan, define baselines, test, review, refine, deploy, document
- Define & document all processes & procedures
 |
|
Phase 2
- Define incident management requirements & strategy and deliver appropriate solution
- Define notification requirements & strategy and deliver appropriate solution
- Monitor applications (basic)
- Monitor applications (advanced)
- Application integrations
- Investigate all other critical areas that require monitoring and plan/address accordingly
 |
|
Phase 3
- Define patching strategy, product roadmaps, long-term plans etc.
- Review support - review agreements, consolidate as required, assess support effectiveness & suitability
- Review licensing - bulk purchases, license management, true-up exercises etc.
- Define performance management requirements & strategy and deliver appropriate solution
- Define service management requirements & strategy and deliver appropriate solution
- Define reporting requirements & strategy and deliver appropriate solution
 |
Enterprise Management - Summary
| |
 |
In my experience, if you can achieve all of these things as part of your Enterprise Management & Monitoring solution you are in a very good place, and in a very small minority! I have seen very few organisations deliver a complete Phase 1 monitoring solution, and even fewer fully implement a Phase 2 level solution. I have not seen ANY organisation successfully deliver a complete Enterprise Management solution covering all of the phases and scope described here. Many organisations simply do not address all components of the solution, either because disparate teams do not integrate with the overall solution or because core areas are not considered at all. However, if you take the time to consider all of these aspects within your environment, and ensure you have the right people on board to drive the solution forward into reality, you will have a firm foundation on which to build your Enterprise Management & Monitoring programme.
Copyright © Protocol Limited 2012
Registered in England No. 3182190 | VAT No. 677 7764 63