DCE performance troubleshooting guide

Picard EcoStruxureIT
‎2019-11-20 06:13 AM

Last Updated: jdutra Cadet ‎2024-05-07 09:53 AM

 
This guide covers some of the symptoms you might see when a DCE server has a performance bottleneck, tools you can use to better understand the resource that is being strained, and steps to try to mitigate the issue.
 

Symptoms of performance problems

 

Several symptoms you might see while interacting with DCE can suggest a performance problem in your environment. This list is by no means exhaustive. Some common symptoms are:

 

Missed sensor update values

The nbc.xml log from the DCE server contains ERROR level messages about dropped sensor updates coming from com.apc.isxc.vb.listeners.sensor.impl.SensorQProcessorRunnable or com.netbotz.server.services.repository.impl.RepositoryEventServiceImpl

Log in to the DCE web client and click Logs in the upper right corner to view the nbc.xml log.

 

Delay in receiving alarm data

Alarms come into the system significantly after they were triggered on the monitored device.

 

Server hang, crash, or timeouts

This error message is displayed on the DCE server: Hung_task_timeout error

Contact technical support to capture server logs.

See https://www.apc.com/us/en/faqs/index?page=content&id=FA303596

 

Performance analysis from DCE

 

Top

Top is a standard Linux diagnostic tool used to monitor system performance. Direct access to the DCE server is not allowed. Contact technical support to capture server logs that include a top_output.

Note: Prior to DCE 7.7, the top output from captured server logs is averaged across all CPU cores. Starting with 7.7, the output per core is available, which is more insightful.

 

Support looks at a few different values in the top output:

 

CPU load average

 

This lists the load average for the last one-, five-, and fifteen-minute periods. If this number is abnormally high relative to the number of cores you have defined for the system, it is a good indicator that the system is running with a lot of CPU load. The exact cause of the CPU load won’t be clear from this data. If this value remains high for an extended period of time, the system could be CPU starved.

 

It is expected that this value is elevated for some period of time after a system reboot or during a large discovery. To get a percent utilization, divide the load average by the number of cores and multiply by 100. Each physical core counts as 1; a hyperthreaded core counts as ½.

 

For example, an 8 core / 8 thread virtual machine should be able to sustain a load average of 8.0 without being considered oversubscribed. If you are using an 8 core, 16 thread configuration, your acceptable load average is more like 12.0 because not all 16 threads are backed by physical cores.
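The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in this guide (physical core = 1, hyperthread = ½); the function names are hypothetical and are not part of DCE.

```python
def effective_cores(physical_cores: int, threads: int) -> float:
    """Count each physical core as 1 and each additional hyperthread as 1/2."""
    if threads < physical_cores:
        raise ValueError("threads cannot be fewer than physical cores")
    hyperthreads = threads - physical_cores
    return physical_cores + 0.5 * hyperthreads


def load_utilization_percent(load_average: float, physical_cores: int, threads: int) -> float:
    """Convert a load average from top into a percent utilization."""
    return load_average / effective_cores(physical_cores, threads) * 100


print(effective_cores(8, 8))                 # 8.0
print(effective_cores(8, 16))                # 12.0
print(load_utilization_percent(6.0, 8, 16))  # 50.0
```

A result that stays near or above 100 for an extended period suggests the VM is oversubscribed on CPU.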

 

Mitigation in this case consists of either reducing the load on your DCE (fewer devices, longer poll period) or allocating more CPU resources to your DCE to get your load average to a more acceptable level.

 

Make sure to review the DCE sizing guide for insight on the best starting values for CPU configuration to use based on the system workload.

 

Wait average

 

This represents the amount of time your system is stopped waiting for the underlying storage device to service requests. DCE is extremely sensitive to IO path delays, so even a slightly elevated value for wait average that persists for any extended period of time can be an issue for the system.

 

Ideally you want to see this value listed by CPU core. If you see any one individual core with a %wa continually over 20, your storage is likely not keeping up with DCE. If the system is allowed to stay in this state for an extended period of time, you usually start to see the missed sensor update processing symptom listed above. Bear in mind that if you are reviewing the average top output instead of by core, this value can deceptively appear to be much lower due to averaging and the number of cores in the system.
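To illustrate why the averaged view can be misleading, here is a minimal Python sketch. It assumes you have already copied the per-core %wa values out of several top snapshots into lists by hand; the data and function name are made up for the example, and the threshold of 20 is the one given above.

```python
from statistics import mean

WA_THRESHOLD = 20.0  # percent, per the guidance in this section


def cores_over_threshold(snapshots, threshold=WA_THRESHOLD):
    """Return the indexes of cores whose %wa is above threshold in every snapshot.

    snapshots: list of lists, one inner list of per-core %wa values per top sample.
    """
    if not snapshots:
        return []
    core_count = len(snapshots[0])
    return [
        core
        for core in range(core_count)
        if all(snapshot[core] > threshold for snapshot in snapshots)
    ]


# Hypothetical samples from a 4-core system: core 2 is stuck waiting on storage.
snapshots = [
    [2.0, 1.0, 36.0, 1.0],
    [1.0, 0.0, 42.0, 1.0],
    [3.0, 2.0, 28.0, 3.0],
]

print(cores_over_threshold(snapshots))   # [2]
print([mean(s) for s in snapshots])      # [10.0, 11.0, 9.0]
```

The averaged values all sit well under 20 even though one core is continually above it, which is the effect described above.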

 

Mitigation requires a deeper dive into your storage path. If you are using network storage, review the latency and utilization of the storage array. If you are using local ESXi storage, review the host performance data in VMware. Usually, reducing the load on the DCE by either decreasing the device count or increasing the poll period will help.

If the storage is truly subpar, upgrading to SSDs, removing other load from the storage system, or improving the network path between the DCE and storage may be required. Reference the DCE sizing guide for more details on appropriate storage sizing.

 

SensorQstats

 

This is a statistic that the DCE server keeps track of. It represents the amount of sensor processing the server is doing every hour. This value can be monitored at:

http://<dce server ip>/nbc/compress/support/sensorqstats

 

The dataset can also be retrieved by technical support through a capture server logs request.

Regardless of where you view the data, this statistic is published once every hour. This metric is good to monitor because it shows whether the DCE is keeping up with the current workload or falling behind.

 

These values are of particular interest:

 

Processed

This is the number of sensor updates that the server has processed in the last hour. This value is directly impacted by the number of devices in your system, your poll period, and the number of sensor changes that are occurring.

 

This value represents the total number of unique events that the system completed within that 1-hour period of time. It is best observed during steady system processing. Events like discovering a large quantity of new devices can skew this number for a period or two. Use this value when you review the DCE sizing guide to determine CPU / RAM / Storage sizing.

 

Dropped

This value should always be zero on a healthy system. Any non-zero value indicates a sensor data point that was dropped because a component of the system cannot keep up. When this value is not zero, we often see %wa elevated in top output.

 

Remember, DCE is very intolerant of storage latency. If the value for dropped is a recurring non-zero value, some amount of data is constantly lost. If there is a non-zero value occasionally, look into the system during those times; it is likely running near the edge of its capabilities and is pushed beyond its limits. Events such as a large alarm storm, a discovery pulling in a large number of devices, or similar high load events can all push the system temporarily into this state.

 

A properly configured system should always have zero drops. Anything dropped will be lost forever, so it’s important to monitor and adjust resources accordingly to prevent this.

 

Remaining

This value represents the amount of sensor data still in the queue to be processed when the qstats report was run. This is not dropped data; it is data that had not yet finished being processed. On smaller systems, this will likely always be zero. As the workload increases, this value could start to become non-zero.

 

By itself, having some non-zero values here is not cause for alarm. If you are regularly seeing non-zero values, or the value is growing in size every hour, it’s a sign that the system is starting to have trouble keeping up.
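As a rough illustration of how the Dropped and Remaining guidance can be applied to several hours of data, here is a small Python sketch. The dictionary keys and the function name are assumptions made for the example; they are not the actual format returned by the sensorqstats page, so you would need to transcribe the hourly values yourself.

```python
def assess_sensorq(history):
    """Summarize hourly sensor queue statistics, oldest entry first.

    history: list of dicts with hypothetical keys 'processed', 'dropped', 'remaining'.
    """
    hours_with_drops = sum(1 for hour in history if hour["dropped"] > 0)
    remaining = [hour["remaining"] for hour in history]
    # "Growing every hour" means each value is larger than the one before it.
    remaining_growing = len(remaining) >= 2 and all(
        later > earlier for earlier, later in zip(remaining, remaining[1:])
    )
    return {
        "hours_with_drops": hours_with_drops,
        "total_dropped": sum(hour["dropped"] for hour in history),
        "remaining_growing": remaining_growing,
    }


# Hypothetical three-hour history.
history = [
    {"processed": 1200000, "dropped": 0, "remaining": 150},
    {"processed": 1250000, "dropped": 0, "remaining": 900},
    {"processed": 1190000, "dropped": 340, "remaining": 4200},
]

print(assess_sensorq(history))
# {'hours_with_drops': 1, 'total_dropped': 340, 'remaining_growing': True}
```

Any non-zero `hours_with_drops` means data was lost, and `remaining_growing` being true matches the "growing in size every hour" warning sign described above.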

 

Performance analysis from DCE VM

 

This section focuses on DCE run as a virtual machine. Information that is not hypervisor-centric also applies to DCE physical servers.

Sometimes there are delays whose cause is not readily visible from the DCE or DCE OS point of view. In these cases, it helps to review the performance data from the hypervisor to see if there are any performance issues.

 

Resource Limits

 

It is good to verify whether any resource limits are defined for the DCE virtual machine. Resource limits are a throttling mechanism that allows a VM administrator to restrict the amount of resources a virtual machine can consume. These limits can be imposed on CPU, RAM, and storage resources, effectively restricting the virtual machine's use of these resources.

 

If there are resource limits in place, try removing them or raising them to a higher value. Then monitor the system utilization values to see whether the change improves performance.

 

Disk Latency

 

To start, identify the DCE VM from within the hypervisor and review the details of the virtual machine. Specifically, look for the disk drive(s) of the DCE, and the storage backing that drive.

 

If your DCE has more than one disk drive, they should ALL be located on the same storage destination. Splitting DCE drives among multiple storage backings almost always results in decreased VM performance and should be avoided as a general rule.

 

VMware

In VMware, you can monitor the real-time disk performance of the storage that is backing the DCE VM. The specifics of finding this data differ a bit between versions of VMware, and depend on whether you investigate from the ESXi host locally or from within vCenter. All versions support monitoring the disk latency.

 

Look for the Advanced Performance Monitoring section of the ESXi host running the DCE VM. In that section, you can view the real time latency of all IO operations that host is sending to disk.

 

Hyper-V

In Hyper-V, you can use the Windows Performance Monitor to track the latency of the target VM. Latency can be found under the Hyper-V Virtual Storage Device category when you add counters to the Performance Monitor.

 

DCE is very sensitive to disk latency. Make sure the latency value, in ms, is less than 1 for the datastore backing the DCE VM. While some short-lived spikes can be tolerated, it is best to make sure the steady state and average response time remains below 1ms.

 

If response times exceed 1ms, look for ways to lower that value. You can reduce the number of systems that also use that shared volume, isolate the DCE VM to be the only system using that volume, or upgrade the target volume to have more disks, faster disks, or preferably SSDs.
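A minimal Python sketch of the check described above, assuming you have exported a series of latency readings in milliseconds from your hypervisor's monitoring tool. The sample values and the function name are hypothetical; the 1 ms limit is the one stated in this guide.

```python
from statistics import mean

LATENCY_LIMIT_MS = 1.0  # steady-state limit from this guide


def latency_report(samples_ms, limit_ms=LATENCY_LIMIT_MS):
    """Compare the average latency with the limit and count individual spikes."""
    average_ms = mean(samples_ms)
    return {
        "average_ms": average_ms,
        "spike_count": sum(1 for sample in samples_ms if sample >= limit_ms),
        "within_guidance": average_ms < limit_ms,
    }


# Hypothetical readings: one short-lived spike, otherwise well under 1 ms.
readings = [0.4, 0.5, 0.6, 2.5, 0.5, 0.5]
report = latency_report(readings)
print(report["spike_count"], report["within_guidance"])  # 1 True
```

A single spike with an average under the limit fits the "short-lived spikes can be tolerated" case; a failing average points to the mitigation steps above.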

 

VMware Esxtop

 

Drilling into another level of the hypervisor, you can run esxtop, a real-time performance analysis tool provided by VMware. This utility is very similar to Linux top and is used in much the same way.

 

To start, SSH must be enabled on the ESXi host running your DCE VM, and you must have the proper credentials to SSH into it. This is a real-time analysis, so the information gathered is only applicable if your DCE is in the performance degraded state while you run this tool. For intermittent issues, start the tool and then cause the event that triggers the degraded system state.

 

As an example, the following steps cover how to perform a 30-minute esxtop capture from the ESXi. There is additional documentation about running esxtop interactively in Additional resources below.

To capture a 30-minute data set from esxtop:

 

  1. Enable SSH on the ESXi server hosting the DCE VM.
  2. SSH to the ESXi server hosting the DCE VM.
  3. Run the command:
    esxtop -b -d 5 -n 360 -a |gzip >esxtopOutput.csv.gz

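    The command runs esxtop in batch mode (-b), sampling all counters (-a) every 5 seconds (-d 5) for 360 iterations (-n 360). This monitors the ESXi for 30 minutes and writes a compressed report of all the performance counters.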

  4. After the collection completes, SCP the file from the ESXi host and put the output on a Windows machine. 
  5. From the Windows machine, use Performance Monitor to analyze the data set collected. 
    To do this:
    1. Launch Performance Monitor.
    2. From the left navigation, under Monitoring Tools, right click Performance Monitor and choose Properties.
    3. Under the Source tab, change Data Source to Log Files and point it to the extracted contents you gathered from running esxtop.
    4. Click OK.
    5. Right click the graph and choose Add Counters.

 

You can now choose which data from the log collection you want to graph to determine signs of stress from typical system resources: CPU, RAM, drives. Some values of interest are:

 

  • CPU %Used of the DCE VM
    • Returns data similar to the Linux top data collected before, just another way to reference it
  • CPU load average of the host
    • Like with top, gives you insight into how much load the ESXi CPU is under
  • %VMwait
    • Percentage of time the VM is waiting for kernel activity, usually disk IO
  • DAVG / KAVG / GAVG
    • Stats for latency of disk commands.
    • DAVG: Latency at the device driver level
    • KAVG: Latency at the VMkernel level
    • GAVG: Latency as seen by the guest; GAVG = DAVG + KAVG

For additional analysis of the esxtop data, see Additional resources below.
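If a Windows machine is not at hand, the batch output can also be inspected with a short script. The following Python sketch averages every column whose header ends with a given counter name. It assumes the capture is a gzip-compressed CSV with one header row, as produced by the command above; the function name is hypothetical, and the counter name used in the usage note is an assumption you should verify against the headers in your own capture.

```python
import csv
import gzip
from statistics import mean


def average_counters(csv_gz_path, counter_suffix):
    """Return {column header: mean value} for headers ending with counter_suffix."""
    with gzip.open(csv_gz_path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
        values = {i: [] for i in wanted}
        for row in reader:
            for i in wanted:
                try:
                    values[i].append(float(row[i]))
                except (ValueError, IndexError):
                    # Skip blank or malformed cells rather than aborting the run.
                    continue
    return {header[i]: mean(column) for i, column in values.items() if column}
```

For example, `average_counters("esxtopOutput.csv.gz", "Average Guest MilliSec/Command")` would return the average guest latency per matching device, assuming that counter name appears in your capture. A full 30-minute capture with all counters can be large, so expect this to take a while.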

 

Hyper-V Performance Monitor

 

To track CPU usage in Windows Performance Monitor, follow these steps:

 

  1. Launch the Performance Monitor.
  2. In the left pane, expand Monitoring Tools and select Performance Monitor.
  3. Click the green plus icon (+) in the toolbar or right-click in the graph area and select Add Counters.
  4. In the Add Counters window, select Processor from the Performance object dropdown menu.
  5. Choose the specific counter you want to track, such as % Processor Time, to monitor overall CPU usage.
  6. Select the instance of the processor, for example, _Total for overall CPU usage, and click Add.
  7. Click OK. You'll see the CPU usage graph in real time in Performance Monitor.

 

Additional resources

 

Additional resources to help you better understand some of the performance tools, what they mean, and how to use them:

 

ESXTOP

 

ESXTOP quick overview

http://www.running-system.com/wp-content/uploads/2015/04/ESXTOP_vSphere6.pdf

ESXTOP metrics

https://www.virten.net/vmware/esxtop/

ESXTOP interpretation

https://communities.vmware.com/docs/DOC-9279

 

VMware

 

VMware KB: Troubleshooting ESXi virtual machine performance issues

https://kb.vmware.com/s/article/2001003

VMware KB: Troubleshooting ESXi storage performance issues

https://kb.vmware.com/s/article/1008205

 

Hyper-V

 

https://www.smikar.com/troubleshooting-hyper-v/

 

Questionnaire for performance escalation

 

Use these questions as a starting point for data that you should gather from the site if you suspect a DCE VM performance issue. If you open a case to diagnose this problem, technical support and engineering will request this data. Proactively gathering the data will help expedite issue resolution.

 

The questions are written to gain a better understanding of the environment hosting the DCE virtual machine. The goal is to understand the capabilities of the hypervisor, the storage supporting DCE, and resource utilization.

 

Configuration

 

VMware

 

  • What version of VMware vCenter are you running?
  • What version of VMware ESXi are you running?
  • What VM hardware version is applied to the DCE VM?
  • Are your ESXi hosts in a cluster supporting vMotion of your VMs for load balancing?
    • How many hosts are in the cluster?
    • Is DRS enabled such that VMs can migrate between ESXi hosts?
    • How often is your DCE VM migrating?
  • Which ESXi hosts are running the DCE VMs in question? (If multiple, please list them.)
  • What are the make, model, and hardware specs of the ESXi server?
    • Specifically interested in CPU type and quantity, RAM quantity
  • Are your DCE VMs configured with multiple drives?
    • If yes, are all the drives located on the same storage location?
  • Are there any resource limit restrictions being set on your DCE VM?
    • CPU Limits? CPU Shares?
    • Memory Limits? Memory Shares?
    • For all DCE VM disk drives: Disk Shares? Disk IOPS Limit?

 

Hyper-V

 

  • What version of Windows is running Hyper-V?
  • What are the make, model, and hardware specs of the Hyper-V server?
    • Specifically interested in CPU type and quantity, RAM quantity
  • Are your DCE VMs configured with multiple drives?
    • If yes, are all the drives located on the same storage location?
  • Are there any resource limit restrictions being set on your DCE VM?
    • CPU Limits? CPU Shares?
    • Memory Limits? Memory Shares?
    • For all DCE VM disk drives: Disk Shares? Disk IOPS Limit?

 

Local Storage

 

  • Are the hypervisors using local storage to run any of the VMs? If no, skip this section.
  • If VMs are using local hypervisor storage, what are the disk types (HDD / SSD)?
    • If HDD, what is the disk speed (RPM)?
    • If multiple disks are being used, what is the RAID scheme?
    • What is the size of your RAID controller cache?

 

Network Storage

 

If your DCE VM is leveraging network storage for its disk backing:

  • What are the make and model of the shared storage disk array?
  • What protocol is your network storage running (NFS / VMFS / SCSI)?
  • How many disks are there in the storage solution?
  • What are the drive types? SSD? HDD?
    • If HDD, what are the disk speeds?
  • How is the array provisioned (single disk pool, multiple disk pools)?
    • If multiple pools, how many disks per pool?
  • What is the RAID configuration on the volume?
  • Is the DCE VM using an isolated volume or is it shared with other VMs?

 

Network Topology

 

Please describe the network topology where this DCE is deployed. Link speeds between nodes of the system are of specific interest.

 

Running System Data Collection

 

While running your typical DCE workload, use the esxtop tool to collect a snapshot of your system. Ideally, the collection should cover the period of time where you are experiencing the performance issue.

 

Esxtop collection

 

  1. Enable SSH on the ESXi server hosting DCE and connect to it over SSH.
  2. Run the command: 
    esxtop -b -d 5 -n 360 -a |gzip >esxtopOutput.csv.gz 
  3. Wait for the capture to finish. The command monitors the ESXi for 30 minutes and creates a report of all the performance counters.
  4. SCP the output from the ESXi and send it to support.
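The 30-minute figure comes from the sample delay multiplied by the iteration count (5 seconds × 360 = 1800 seconds). If support asks for a different capture length, a small Python helper like the following can work out the iteration count. This is only a sketch: the function name and default output file name are chosen for illustration, and it simply rebuilds the same command shown above with a different -n value.

```python
import math


def esxtop_capture_command(duration_minutes, delay_seconds=5, output="esxtopOutput.csv.gz"):
    """Build the esxtop batch command for a capture of the requested length."""
    # Round up so the capture covers at least the requested duration.
    iterations = math.ceil(duration_minutes * 60 / delay_seconds)
    return f"esxtop -b -d {delay_seconds} -n {iterations} -a |gzip >{output}"


print(esxtop_capture_command(30))
# esxtop -b -d 5 -n 360 -a |gzip >esxtopOutput.csv.gz
```

Keep in mind that longer captures with all counters enabled produce larger files on the ESXi host.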
 