Troubleshoot Performance Issues in a Hyperflex Cluster

Updated: July 26, 2023

Document ID: 220651

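Introduction

This document describes the performance impact in a Hyperflex environment from the perspective of the guest virtual machine (VM), the ESXi host, and the Storage Controller VM (SCVM).

Identify

To troubleshoot performance in a Hyperflex environment, identify the type of cluster, the operation in which performance is degraded, how often the degradation occurs, and the level at which the performance impact originates.

In a Hyperflex cluster there are multiple levels of impact: the guest VM, the ESXi host, and the Storage Controller VM.

Cluster Type

● Hybrid nodes: Use solid-state drives (SSDs) for caching and HDDs for the capacity layer.

● All-flash nodes: Use SSDs or Non-Volatile Memory Express (NVMe) storage for caching, and SSDs for the capacity layer.

● All-NVMe nodes: Use NVMe storage for both caching and the capacity layer. All-NVMe nodes deliver the highest performance for the most demanding workloads.

Performance Charts Explanation

Hyperflex systems include a performance monitoring feature. The charts display the read and write performance of the storage cluster.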

IOPS

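Input/output operations per second (IOPS) is a common performance metric used to measure computer storage devices, including HDDs. This metric is used to evaluate performance for random I/O workloads.

IOPS performance chart.

Throughput

The chart shows the rate of data transfer in the storage cluster, measured in Mbps.

Throughput performance chart.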

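Latency

Latency is a measure of how long a single I/O request takes to complete. It is the duration between the time a request is issued and the time the response is received, measured in milliseconds.

Latency performance chart.

Frequency

Define the frequency and duration of the performance impact in order to review what can affect the environment.

If performance is impacted all the time, determine when performance began to degrade and check for any configuration changes or issues in the cluster.

If performance is impacted intermittently, check whether an operation or service was running at that time.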

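External Factors

Cluster performance can be affected by external factors such as snapshot and backup operations.

Review these links for more information on external factors:

VMware vSphere Snapshots: Performance and Best Practices.

Cisco HyperFlex Systems and Veeam Backup and Replication White Paper.

Identify Performance Issues at the Guest VM Level

This is the most visible level of impact in a Hyperflex environment. It directly affects the services that the VMs provide and is most apparent to the users who are directly affected.

Here are common tests to identify performance issues on common operating systems.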

Windows

Review the available tools to identify performance issues in Windows guest VMs:

Performance Monitor

Resource Monitor

ESXi

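After you identify the performance impact and review the possible causes of the degradation, perform these checks to improve performance.

Check for over-provisioning (the total number of vCPUs assigned to all VMs must not exceed the total number of physical cores available on the ESXi host).
Even if the guest operating system does not use some of its vCPUs, a VM configured with those vCPUs still imposes small resource requirements on ESXi that translate to real CPU consumption on the host.
Over-allocated memory also unnecessarily increases the VM memory overhead and can lead to memory contention, especially if reservations are used.
Verify that the balloon driver does not hold on to memory; see this link for details.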
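
The over-provisioning check above is simple arithmetic, so it can be sketched as a small helper. This is a minimal illustration only: the function name `vcpu_overcommit` and the VM inventory are hypothetical, and the vCPU counts are assumed to have been collected by hand or from your own inventory tool.

```python
def vcpu_overcommit(vm_vcpus, physical_cores):
    """Return (total_vcpus, ratio, exceeds) for a single ESXi host.

    vm_vcpus maps a VM name to its configured vCPU count.
    exceeds is True when the total is above the physical core count.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    total = sum(vm_vcpus.values())
    ratio = total / physical_cores
    return total, ratio, total > physical_cores


if __name__ == "__main__":
    # Hypothetical inventory for one host: VM name -> configured vCPUs.
    vms = {"app01": 8, "db01": 16, "web01": 4}
    total, ratio, exceeds = vcpu_overcommit(vms, physical_cores=24)
    print(f"vCPUs={total} ratio={ratio:.2f} exceeds_cores={exceeds}")
```

A host that reports `exceeds_cores=True` is a candidate for the right-sizing review described above.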

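Review Troubleshooting ESX/ESXi virtual machine performance issues.

PVSCSI Check

Paravirtual SCSI (PVSCSI) adapters are high-performance storage adapters that provide greater throughput and lower CPU utilization for VMs with high disk I/O requirements. The PVSCSI controller is a virtualization-aware SCSI adapter that delivers the lowest possible latency and the highest throughput with the lowest CPU overhead. Use of PVSCSI adapters is recommended.

PVSCSI adapter.

Network Adapter Check

VMXNET 3 is a paravirtualized NIC designed for performance. It provides high-performance features commonly used on modern networks, such as jumbo frames, multiqueue support (also known as Receive Side Scaling in Windows), IPv6 offloads, MSI/MSI-X interrupt delivery, and hardware offloads.

Ensure that the adapter type is VMXNET3.

Network adapter.

RSS Check

Note: This check applies only to guest VMs that run a Windows operating system.

Receive Side Scaling (RSS) is a network driver technology that enables the efficient distribution of network receive processing across multiple CPUs in multiprocessor systems.

Windows Server has a driver configuration that distributes the kernel-mode network processing load across multiple CPUs.

To make sure that RSS is enabled, run this command in Windows PowerShell (it sets the global RSS state to enabled):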

netsh interface tcp set global rss=enabled

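To enable RSS, review this link.

CPU Hot-Plug Check

The CPU hot-plug feature allows VM administrators to add CPUs to a VM without powering it off. CPU resources can be added on the fly with no disruption to service. When CPU hot plug is enabled on a VM, the vNUMA capability is disabled.

CPU hot plug disabled.

Review best practices for common operating systems and applications:

Windows.

Performance Tuning Guidelines for Windows Server 2022.

Red Hat.

3 tips for improving Linux process performance with priority and affinity.

SQL Server.

Architecting Microsoft SQL Server on VMware.

Red Hat.

Performance Tuning Guide.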

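Identify Performance Issues at the Host Level

To identify the performance impact at the host level, review the performance charts built into the ESXi hypervisor and check how many hosts are affected.

You can view the performance charts in vCenter: open the Monitor tab, then click the Performance tab.

vCenter performance charts.

These charts show performance related to CPU, memory, and disk. Refer to this link to understand the charts.

Note: CRC errors and MTU mismatches, especially in the storage network, cause latency issues. Storage traffic must use jumbo frames.

Storage I/O Control and Queue Depth Check

Storage I/O Control (SIOC) is used to control the I/O usage of a VM and to gradually enforce the predefined I/O share levels. This feature must be disabled in Hyperflex clusters.

Queue depth is the number of pending input/output (I/O) requests that a storage resource can handle at any one time.

Use these steps to verify that SIOC is disabled and to check the queue depth configuration.

Verify SIOC State and Queue Depth Configuration on ESXi

Step 1. Connect to an HX ESXi host through SSH and issue the command to list the datastores.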

[root@] vsish -e ls /vmkModules/nfsclient/mnt
encrypted_app/
Prod/                                        <----- Datastore name 
Dev/
App/

Step 2. Use the datastore name and issue the command.

vsish -e get /vmkModules/nfsclient/mnt/<datastore name>/properties

[root@] vsish -e get /vmkModules/nfsclient/mnt/Prod/properties
mount point information {
   volume name:Prod
   server name:7938514614702552636-8713662604223381594
   server IP:127.0.0.1
   server volume:172.16.3.2:Prod
   UUID:63dee313-dfecdf62
   client src port:641
   busy:0
   socketSendSize:1048576
   socketReceiveSize:1048576
   maxReadTransferSize:65536
   maxWriteTransferSize:65536
   reads:0
   readsFailed:0
   writes:285
   writesFailed:0
   readBytes:0
   writeBytes:10705
   readTime:0
   writeTime:4778777
   readSplitsIssued:0
   writeSplitsIssued:285
   readIssueTime:0
   writeIssueTime:4766494
   cancels:0
   totalReqsQueued:0
   metadataReqsQueued(non IO):0
   reqsInFlight:0
   readOnly:0
   hidden:0
   isPE:0
   isMounted:1
   isAccessible:1
   unstableWrites:0
   unstableNoCommit:0
   maxQDepth:1024            <-------- Max Qdepth configuration
   iormState:0               <-------- I/O control disabled
   latencyThreshold:30
   shares:52000
   podID:0
   iormInfo:0
   NFS operational state: 0 -> Up
   enableDnlc:1
   closeToOpenCache:0
   highToAvgLatRatio:10
   latMovingAvgSmoothingLevel:2
   activeWorlds:55
   inPreUnmount:0
}

Step 3. In the output, look for these lines.

iormState:0 (0 = disabled, 2 = enabled)

The maxQDepth line must be 1024.

Step 4. Repeat the same steps for the rest of the datastores.
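
When there are many datastores, the two values from Step 3 can be pulled out of saved command output with a short script. The sketch below assumes you have captured the text of the properties output yourself. The function name `check_datastore_properties` is hypothetical, and the only field names it relies on are `iormState` and `maxQDepth` as they appear in the sample output in this document.

```python
import re


def check_datastore_properties(text, expected_qdepth=1024):
    """Extract iormState and maxQDepth from saved properties output.

    Works on both the multi-line and the single-line form of the output.
    A field that is not found is reported as None.
    """
    fields = {}
    for key in ("iormState", "maxQDepth"):
        match = re.search(rf"\b{key}:(\d+)", text)
        fields[key] = int(match.group(1)) if match else None
    return {
        "sioc_disabled": fields["iormState"] == 0,  # 0 = disabled, 2 = enabled
        "qdepth_ok": fields["maxQDepth"] == expected_qdepth,
        **fields,
    }


if __name__ == "__main__":
    sample = "unstableNoCommit:0 maxQDepth:1024 iormState:0 latencyThreshold:30"
    print(check_datastore_properties(sample))
```

A datastore passes the check only when both `sioc_disabled` and `qdepth_ok` are true.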

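Disable SIOC

To disable SIOC, perform these steps.

Step 1. Log in to vSphere with the HTML client.

Step 2. From the drop-down menu, select Storage, and then select the applicable HX datastore in the left pane.

Select datastore.

Step 3. In the upper section of the right pane for the datastore, select the Configure tab.

Configure tab.

Step 4. In the middle section of the right pane, under More, select General. On the right side, scroll down to Datastore Capabilities and click Edit.

Edit datastore capabilities.

If the Disable Storage I/O Control and statistics collection radio button is not selected, select it.

Disable storage I/O control.

If the Disable Storage I/O Control and statistics collection radio button is already selected, toggle to Enable Storage I/O Control and statistics collection and then back to Disable Storage I/O Control and statistics collection.

Storage I/O control disabled.

Step 5. Repeat Steps 1 through 4 as needed for all other datastores.

Modify MaxQDepth

To modify maxQDepth, issue the next command for each datastore.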

vsish -e set /vmkModules/nfsclient/mnt/<datastore name>/properties maxQDepth 1024

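Check rx_no_bufs

Hyperflex servers with heavy network traffic, or network traffic with microbursts, can experience packet loss that appears in the form of rx_no_bufs.

To identify this issue, run this command on the ESXi host to check the rx_no_bufs counters.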

/usr/lib/vmware/vm-support/bin/nicinfo.sh | egrep "^NIC:|rx_no_buf"
NIC: vmnic0
rx_no_bufs: 1
NIC: vmnic1
rx_no_bufs: 2
NIC: vmnic2
rx_no_bufs: 2
NIC: vmnic3
rx_no_bufs: 71128211 <---------Very high rx_no_bufs counter
NIC: vmnic4
rx_no_bufs: 1730
NIC: vmnic5
rx_no_bufs: 897
NIC: vmnic6
rx_no_bufs: 24952
NIC: vmnic7
rx_no_bufs: 2

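Wait a few minutes, run the command again, and check whether the rx_no_bufs counters have increased.

If these counters are low (< 1,000), packet loss due to the default queue configuration is minimal and tuning is probably not needed.
If these counters are high (> 10,000), the queue configuration has some impact and tuning can help.
If these counters are very high (> 1,000,000), the impact is significant and an increase of the queues is strongly recommended.
If rx_no_bufs actively increments, packets make it across the network all the way to the virtualization layer and are then dropped.

If you see counters with these values, contact Cisco TAC to tune the vNIC configuration for better performance.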
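
The thresholds above, together with the comparison of two runs, can be applied to saved command output. This is a minimal sketch: the function names are hypothetical, and the input is assumed to be the text produced by the nicinfo.sh command shown earlier. The source gives no guidance for counters between 1,000 and 10,000, so the sketch labels that range "unclassified" rather than guess.

```python
import re


def parse_rx_no_bufs(text):
    """Map NIC name -> rx_no_bufs counter from saved command output."""
    counters = {}
    nic = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"NIC:\s*(\S+)", line)
        if m:
            nic = m.group(1)
            continue
        m = re.match(r"rx_no_bufs:\s*(\d+)", line)
        if m and nic is not None:
            counters[nic] = int(m.group(1))
    return counters


def classify(count):
    """Apply the thresholds quoted in this document."""
    if count > 1_000_000:
        return "very high"
    if count > 10_000:
        return "high"
    if count < 1_000:
        return "low"
    return "unclassified"  # 1,000 to 10,000: not covered by the source


def increasing(before, after):
    """NICs whose counter grew between two runs of the command."""
    return sorted(n for n in after if n in before and after[n] > before[n])


if __name__ == "__main__":
    first = parse_rx_no_bufs("NIC: vmnic3\nrx_no_bufs: 71128211\n")
    second = parse_rx_no_bufs("NIC: vmnic3\nrx_no_bufs: 71128400\n")
    for nic, count in second.items():
        print(nic, count, classify(count))
    print("still increasing:", increasing(first, second))
```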

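Review best practices and additional checks at the ESXi level.

Performance Best Practices for VMware vSphere 7.0.

Identify Performance Issues at the Storage Controller VM (SCVM) Level

Cluster Health

Verify that the cluster is healthy.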

hxshell:~$ sysmtool --ns cluster --cmd healthdetail
Cluster Health Detail:
---------------------:
State: ONLINE                       <---------- State of the cluster 
HealthState: HEALTHY                <---------- Health of the cluster 
Policy Compliance: COMPLIANT
Creation Time: Tue May 30 04:48:45 2023
Uptime: 7 weeks, 19 hours, 45 mins, 51 secs
Cluster Resiliency Detail:
-------------------------:
Health State Reason: Storage cluster is healthy.
# of nodes failure tolerable for cluster to be fully available: 1
# of node failures before cluster goes into readonly: NA
# of node failures before cluster goes to be crticial and partially available: 3
# of node failures before cluster goes to enospace warn trying to move the existing data: NA
# of persistent devices failures tolerable for cluster to be fully available: 2
# of persistent devices failures before cluster goes into readonly: NA
# of persistent devices failures before cluster goes to be critical and partially available: 3
# of caching devices failures tolerable for cluster to be fully available: 2
# of caching failures before cluster goes into readonly: NA
# of caching failures before cluster goes to be critical and partially available: 3
Current ensemble size: 3
Minimum data copies available for some user data: 3
Minimum cache copies remaining: 3
Minimum metadata copies available for cluster metadata: 3
Current healing status:
Time remaining before current healing operation finishes:
# of unavailable nodes: 0

hxshell:~$

This output shows an unhealthy cluster due to an unavailable node.

hxshell:~$ sysmtool --ns cluster --cmd healthdetail
Cluster Health Detail:
---------------------:
State: ONLINE                   <-------State of the cluster
HealthState: UNHEALTHY          <-------Health of the cluster 
Policy Compliance: NON-COMPLIANT
Creation Time: Tue May 30 04:48:45 2023
Uptime: 7 weeks, 19 hours, 55 mins, 9 secs
Cluster Resiliency Detail:
-------------------------:
Health State Reason: Storage cluster is unhealthy.Storage node 172.16.3.9 is unavailable.                  <----------- Health state reason
# of nodes failure tolerable for cluster to be fully available: 0
# of node failures before cluster goes into readonly: NA
# of node failures before cluster goes to be crticial and partially available: 2
# of node failures before cluster goes to enospace warn trying to move the existing data: NA
# of persistent devices failures tolerable for cluster to be fully available: 1
# of persistent devices failures before cluster goes into readonly: NA
# of persistent devices failures before cluster goes to be critical and partially available: 2
# of caching devices failures tolerable for cluster to be fully available: 1
# of caching failures before cluster goes into readonly: NA
# of caching failures before cluster goes to be critical and partially available: 2
Current ensemble size: 3
Minimum data copies available for some user data: 2
Minimum cache copies remaining: 2
Minimum metadata copies available for cluster metadata: 2
Current healing status: Rebuilding/Healing is needed, but not in progress yet. Warning: Insufficient node or space resources may prevent healing. Storage Node 172.16.3.9 is either down or initializing disks.
Time remaining before current healing operation finishes:
# of unavailable nodes: 1

hxshell:~$

This output shows an unhealthy cluster due to a rebuild in progress.

Cluster Health Detail:
---------------------:
State: ONLINE
HealthState: UNHEALTHY
Policy Compliance: NON-COMPLIANT
Creation Time: Tue May 30 04:48:45 2023
Uptime: 7 weeks, 20 hours, 2 mins, 4 secs
Cluster Resiliency Detail:
-------------------------:
Health State Reason: Storage cluster is unhealthy.
# of nodes failure tolerable for cluster to be fully available: 1
# of node failures before cluster goes into readonly: NA
# of node failures before cluster goes to be crticial and partially available: 2
# of node failures before cluster goes to enospace warn trying to move the existing data: NA
# of persistent devices failures tolerable for cluster to be fully available: 1
# of persistent devices failures before cluster goes into readonly: NA
# of persistent devices failures before cluster goes to be critical and partially available: 2
# of caching devices failures tolerable for cluster to be fully available: 1
# of caching failures before cluster goes into readonly: NA
# of caching failures before cluster goes to be critical and partially available: 2
Current ensemble size: 3
Minimum data copies available for some user data: 3
Minimum cache copies remaining: 2
Minimum metadata copies available for cluster metadata: 2
Current healing status: Rebuilding is in progress, 58% completed.
Time remaining before current healing operation finishes: 18 hr(s), 10 min(s), and 53 sec(s)
# of unavailable nodes: 0

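These commands show an overall summary of cluster health and let you know whether anything affects cluster operation, for example, a blacklisted disk, an offline node, or a cluster that is healing.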
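
To compare several clusters, or several points in time, the key fields of the healthdetail output can be read from saved text. The sketch below is illustrative only: the function names are hypothetical, and the field labels (`State`, `HealthState`, `# of unavailable nodes`) are taken from the sample outputs in this document rather than from a formal specification.

```python
def parse_health_detail(text):
    """Turn 'label: value' lines of saved healthdetail output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.split("<--")[0]  # drop the annotation arrows used in this document
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cluster_is_healthy(fields):
    """True only when the cluster is online, healthy, and has no unavailable node."""
    return (
        fields.get("State") == "ONLINE"
        and fields.get("HealthState") == "HEALTHY"
        and fields.get("# of unavailable nodes") == "0"
    )


if __name__ == "__main__":
    sample = (
        "State: ONLINE\n"
        "HealthState: UNHEALTHY\n"
        "Current healing status: Rebuilding is in progress, 58% completed.\n"
        "# of unavailable nodes: 0\n"
    )
    fields = parse_health_detail(sample)
    print(cluster_is_healthy(fields), fields.get("Current healing status"))
```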

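Nodes That Participate in Input/Output

Performance can be impacted by a node that does not participate in input and output operations. To check which nodes participate in I/O, issue these commands.

Tip: From version 5.0(2a), the diag user is available to give users more privileges to troubleshoot, with access to restricted folders and commands that are not accessible through the priv command line, which was introduced in Hyperflex version 4.5.x.

Step 1. Enter the diag shell on a Storage Controller VM.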

hxshell:~$ su diag
Password:
 _   _ _                      _  _             _____ _                      ___
| \ | (_)_ __   ___          | || |           |  ___(_)_   _____           / _ \ _ __   ___
|  \| | | '_ \ / _ \  _____  | || |_   _____  | |_  | \ \ / / _ \  _____  | | | | '_ \ / _ \
| |\  | | | | |  __/ |_____| |__   _| |_____| |  _| | |\ V /  __/ |_____| | |_| | | | |  __/
|_| \_|_|_| |_|\___|            |_|           |_|   |_| \_/ \___|          \___/|_| |_|\___|


Enter the output of above expression: -1
Valid captcha

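Step 2. Issue this command to verify the nodes that participate in I/O operations. The number of IP addresses must equal the number of converged nodes in the cluster.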

diag# nfstool -- -m | cut -f2 | sort | uniq
172.16.3.7
172.16.3.8
172.16.3.9
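
The comparison in Step 2 can be sketched as a set difference between the IP addresses you expect and the ones the command printed. The function name `missing_io_nodes` is hypothetical, and the list of expected converged-node IP addresses is an input you supply yourself.

```python
def missing_io_nodes(nfstool_output, expected_nodes):
    """Return the expected node IPs that do not appear in the command output."""
    seen = {line.strip() for line in nfstool_output.splitlines() if line.strip()}
    return sorted(set(expected_nodes) - seen)


if __name__ == "__main__":
    output = "172.16.3.7\n172.16.3.8\n"
    expected = ["172.16.3.7", "172.16.3.8", "172.16.3.9"]
    print("not participating in I/O:", missing_io_nodes(output, expected))
```

An empty result means that every expected node was listed.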

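Internal Services Check

Cleaner

One of the primary goals of the cleaner is to identify dead and live storage blocks in the system and remove the dead ones, which frees the storage space they occupy. It is a background job, and its aggressiveness is set based on a policy.

You can check the cleaner service with the next command.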

bash-4.2# stcli cleaner info
{ 'name': '172.16.3.7', 'id': '1f82077d-6702-214d-8814-e776ffc0f53c', 'type': 'node' }: OFFLINE                <----------- Cleaner shows as offline 
{ 'name': '172.16.3.8', 'id': 'c4a24480-e935-6942-93ee-987dc8e9b5d9', 'type': 'node' }: OFFLINE
{ 'name': '172.16.3.9', 'id': '50a5dc5d-c419-9c48-8914-d91a98d43fe7', 'type': 'node' }: OFFLINE

To start the cleaner process, issue this command.

bash-4.2# stcli cleaner start                                                                                  
WARNING: This command should be executed ONLY by Cisco TAC support as it may have very severe consequences. Do you want to proceed ? (y/n): y
bash-4.2# stcli cleaner info
{ 'type': 'node', 'id': '1f82077d-6702-214d-8814-e776ffc0f53c', 'name': '172.16.3.7' }: ONLINE
{ 'type': 'node', 'id': 'c4a24480-e935-6942-93ee-987dc8e9b5d9', 'name': '172.16.3.8' }: ONLINE
{ 'type': 'node', 'id': '50a5dc5d-c419-9c48-8914-d91a98d43fe7', 'name': '172.16.3.9' }: ONLINE           <---------All nodes need to be online
bash-4.2#

Note: This command must be executed only with Cisco TAC approval.
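
To spot offline cleaner instances quickly, saved `stcli cleaner info` output can be scanned per node. This is a read-only sketch that reports state and does not start anything. The function names are hypothetical, and the line layout is assumed to match the samples shown above, where the order of the keys inside the braces can vary.

```python
import re

# One output line looks like: { 'name': '172.16.3.7', ... }: OFFLINE
LINE = re.compile(r"\{(?P<body>.*)\}:\s*(?P<state>\w+)")
NAME = re.compile(r"'name':\s*'([^']+)'")


def cleaner_states(text):
    """Map node name -> cleaner state from saved `stcli cleaner info` output."""
    states = {}
    for line in text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        name = NAME.search(m.group("body"))
        if name:
            states[name.group(1)] = m.group("state")
    return states


def offline_nodes(states):
    """Nodes whose cleaner is in any state other than ONLINE."""
    return sorted(n for n, s in states.items() if s != "ONLINE")


if __name__ == "__main__":
    sample = "{ 'name': '172.16.3.7', 'id': '1f82077d', 'type': 'node' }: OFFLINE\n"
    print(offline_nodes(cleaner_states(sample)))
```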

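Rebalance

The storage cluster is rebalanced on a regular schedule. Rebalance realigns the distribution of stored data when the available storage changes and restores storage cluster health.

Rebalance runs in a cluster for different reasons:

A physical resource (node or disk) went down, and HX relocates those vNodes to other physical resources in the cluster.
Individual drives across the cluster are not all utilized comparably, which creates hot spots in data placement within the HX cluster.
Even if the cluster is healthy, rebalance can run if the cluster is not zone compliant.
New nodes were added to an existing cluster. The added nodes take on new writes as soon as they join the cluster.

Verify that rebalance is enabled on the cluster.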

hxshell:~$ stcli rebalance status
rebalanceStatus:
    percentComplete: 0
    rebalanceState: cluster_rebalance_not_running
rebalanceEnabled: True     <---------Rebalance should be enabled 
hxshell:~$

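Note: Any operation related to rebalance must be performed with Cisco TAC approval.

Disk Failures

For proper operation, the cluster must not have any blacklisted disks or offline resources.

Check in the HX Connect interface whether the cluster has any blacklisted disks.

Blacklisted disk.

On the CLI, check each converged node for any offline resources.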

sysmtool --ns cluster --cmd offlineresources 
UUID                                Type         State      InUse      Last modified            
----                                ----         -----      -----      -------------            
000cca0b019b4a80:0000000000000000   DISK         DELETED    YES          <------- Offline disk                       
5002538c405e0bd1:0000000000000000   DISK         BLOCKLISTED NO          <------- Blacklisted disk                         
5002538c405e299e:0000000000000000   DISK         DELETED    NO                                  
Total offline resources: 3, Nodes: 0, Disks: 3

Verify whether there are any blacklisted resources.

hxshell:~$ sysmtool --ns disk --cmd list | grep -i blacklist
Blacklist Count: 0
Blacklist Count: 0
Blacklist Count: 0
Blacklist Count: 0
State: BLACKLISTED
Blacklist Count: 5
Blacklist Count: 0
Blacklist Count: 0

Use this command on each converged node to check for any failed disks.

admin:~$ cat /var/log/springpath/diskslotmap-v2.txt
0.0.1:5002538e000d59a3:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302248:HXT76F3Q:SATA:SSD:3662830:Inactive:/dev/sdj    <---------Inactive disk
1.0.2:5002538c40be79ac:Samsung:SAMSUNG_MZ7LM240HMHQ-00003:S4EGNX0KC04551:GXT51F3Q:SATA:SSD:228936:Active:/dev/sdb
1.0.3:5002538e000d599e:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302243:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdc
1.0.4:5002538e000d59a0:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302245:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdd
1.0.5:5002538e000eb00b:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302480:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdi
1.0.6:5002538e000d599b:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302240:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdf
1.0.7:5002538e000d57f6:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M301819:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdh
1.0.8:5002538e000d59ab:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302256:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sde
1.0.9:5002538e000d59a1:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M302246:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdg
1.0.10:5002538e0008c68f:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:S4F3NY0M200500:HXT76F3Q:SATA:SSD:3662830:Active:/dev/sdj
0.1.192:000cca0b01c83180:HGST:UCSC-NVMEHW-H1600:SDM000026904:KNCCD111:NVMe:SSD:1526185:Active:/dev/nvme0n1
admin:~$

Example of a node without any disk failures.

hxshell:~$ sysmtool --ns cluster --cmd offlineresources
No offline resources found              <-------- No offline resources 

hxshell:~$ sysmtool --ns disk --cmd list | grep -i blacklist
hxshell:~$                              <-------- No blacklisted disks
hxshell:~$ cat /var/log/springpath/diskslotmap-v2.txt
1.14.1:55cd2e404c234bf9:Intel:INTEL_SSDSC2BX016T4K:BTHC618505B51P6PGN:G201CS01:SATA:SSD:1526185:Active:/dev/sdc
1.14.2:5000c5008547c543:SEAGATE:ST1200MM0088:Z4009D7Y0000R637KMU7:N0A4:SAS:10500:1144641:Active:/dev/sdd
1.14.3:5000c5008547be1b:SEAGATE:ST1200MM0088:Z4009G0B0000R635L4D3:N0A4:SAS:10500:1144641:Active:/dev/sde
1.14.4:5000c5008547ca6b:SEAGATE:ST1200MM0088:Z4009F9N0000R637JZRF:N0A4:SAS:10500:1144641:Active:/dev/sdf
1.14.5:5000c5008547b373:SEAGATE:ST1200MM0088:Z4009GPM0000R634ZJHB:N0A4:SAS:10500:1144641:Active:/dev/sdg
1.14.6:5000c500854310fb:SEAGATE:ST1200MM0088:Z4008XFJ0000R6374ZE8:N0A4:SAS:10500:1144641:Active:/dev/sdh
1.14.7:5000c50085424b53:SEAGATE:ST1200MM0088:Z4008D2S0000R635M4VF:N0A4:SAS:10500:1144641:Active:/dev/sdi
1.14.8:5000c5008547bcfb:SEAGATE:ST1200MM0088:Z4009G3W0000R637K1R8:N0A4:SAS:10500:1144641:Active:/dev/sdj
1.14.9:5000c50085479abf:SEAGATE:ST1200MM0088:Z4009J510000R637KL1V:N0A4:SAS:10500:1144641:Active:/dev/sdk
1.14.11:5000c5008547c2c7:SEAGATE:ST1200MM0088:Z4009FR00000R637JPEQ:N0A4:SAS:10500:1144641:Active:/dev/sdl
1.14.13:5000c5008547ba93:SEAGATE:ST1200MM0088:Z4009G8V0000R634ZKLX:N0A4:SAS:10500:1144641:Active:/dev/sdm
1.14.14:5000c5008547b69f:SEAGATE:ST1200MM0088:Z4009GG80000R637KM30:N0A4:SAS:10500:1144641:Active:/dev/sdn
1.14.15:5000c5008547b753:SEAGATE:ST1200MM0088:Z4009GH90000R635L5F6:N0A4:SAS:10500:1144641:Active:/dev/sdo
1.14.16:5000c5008547ab7b:SEAGATE:ST1200MM0088:Z4009H3P0000R634ZK8T:N0A4:SAS:10500:1144641:Active:/dev/sdp  <------All disks are active
hxshell:~$
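
On nodes with many drives, the inactive entries in diskslotmap-v2.txt can be filtered from a saved copy of the file. This sketch makes one assumption that the source does not state: based only on the sample lines above, the last colon-separated field is the device path and the field before it is the disk state. The function name `inactive_disks` is hypothetical.

```python
def inactive_disks(slotmap_text):
    """Return (slot, state, device) for every entry whose state is not Active.

    Field positions are inferred from the sample output in this document:
    first field = slot, second-to-last = state, last = device path.
    """
    bad = []
    for line in slotmap_text.splitlines():
        line = line.split("<--")[0].strip()  # drop annotation arrows
        parts = line.split(":")
        if len(parts) < 3:
            continue  # prompt lines and blanks
        slot, state, device = parts[0], parts[-2], parts[-1]
        if state != "Active":
            bad.append((slot, state, device))
    return bad


if __name__ == "__main__":
    sample = (
        "0.0.1:5002538e000d59a3:Samsung:SAMSUNG_MZ7LH3T8HMLT-00003:"
        "S4F3NY0M302248:HXT76F3Q:SATA:SSD:3662830:Inactive:/dev/sdj\n"
        "1.0.2:5002538c40be79ac:Samsung:SAMSUNG_MZ7LM240HMHQ-00003:"
        "S4EGNX0KC04551:GXT51F3Q:SATA:SSD:228936:Active:/dev/sdb\n"
    )
    print(inactive_disks(sample))
```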

Free Memory

Use this command to check free memory. Free memory (free plus cache) must be greater than 2048 MB.

hxshell:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:       74225624    32194300    38893712        1672     3137612    41304336
Swap:             0           0           0
hxshell:~$

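If free plus cache memory is less than 2048 MB, identify the process that generates the out-of-memory (OOM) condition.

Note: You can use the top command to identify processes that consume a lot of memory. However, any change must be made with TAC approval. Contact Cisco TAC to troubleshoot OOM conditions.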
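
The free-plus-cache rule can be applied to saved `free -m` output with a few lines of code. This sketch assumes that the first non-empty line is the column header and that the values are in MiB, as `free -m` reports them. The function names and the sample numbers are hypothetical.

```python
def free_plus_cache_mb(free_m_output):
    """Sum the 'free' and 'buff/cache' columns of the Mem: row."""
    lines = [l for l in free_m_output.splitlines() if l.strip()]
    header = lines[0].split()  # total used free shared buff/cache available
    mem = next(l for l in lines if l.startswith("Mem:")).split()[1:]
    row = dict(zip(header, (int(v) for v in mem)))
    return row["free"] + row["buff/cache"]


def memory_ok(free_m_output, minimum_mb=2048):
    """True when free plus cache is above the minimum quoted in this document."""
    return free_plus_cache_mb(free_m_output) > minimum_mb


if __name__ == "__main__":
    # Hypothetical values in MiB, laid out like `free -m` output.
    sample = (
        "              total        used        free      shared  buff/cache   available\n"
        "Mem:          72486       31439       37982           1        3064       40336\n"
        "Swap:             0           0           0\n"
    )
    print(free_plus_cache_mb(sample), memory_ok(sample))
```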

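End-of-Space Condition

The best practice for storage cluster space utilization is to stay at or below 76 percent in the HX Connect capacity view. Utilization above 76 percent in the HX Connect capacity view results in performance degradation.

If the storage cluster experiences an ENOSPC condition, the cleaner automatically runs at high priority, which can create performance issues in the cluster. The priority is determined by cluster space usage.

If the storage cluster reaches an ENOSPC WARN condition, the cleaner increases its intensity by raising the number of I/Os it uses to collect garbage. With an ENOSPC set condition, it runs at the highest priority.

You can check the ENOSPCINFO status of the cluster with this command.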

hxshell:~$ sysmtool --ns cluster --cmd enospcinfo
Cluster Space Details:
---------------------:
Cluster state: ONLINE
Health state: HEALTHY
Raw capacity: 42.57T
Usable capacity: 13.06T
Used capacity: 163.08G
Free capacity: 12.90T
Enospc state: ENOSPACE_CLEAR    <--------End of space status
Space reclaimable: 0.00
Minimum free capacity
required to resume operation: 687.12G
Space required to clear
ENOSPC warning: 2.80T           <--------Free space until the end of space warning appears 
Rebalance In Progress: NO
Flusher in progress: NO
Cleaner in progress: YES
Disk Enospace: NO

hxshell:~$

Review Capacity Management in the Cisco HyperFlex white paper to identify best practices for managing space in your Hyperflex cluster.
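
As a rough cross-check against the 76 percent guideline, utilization can be estimated from the Used capacity and Usable capacity values in the enospcinfo output. This is only an approximation under two assumptions that the source does not confirm: the size suffixes are treated as binary multiples (1T = 1024G), and the HX Connect capacity view can calculate its percentage differently. The function names are hypothetical.

```python
# Assumed binary multiples for the size suffixes printed by the command.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def to_bytes(text):
    """Convert a value such as '13.06T' or '163.08G' to bytes."""
    text = text.strip()
    return float(text[:-1]) * UNITS[text[-1].upper()]


def utilization_percent(used, usable):
    """Used capacity as a percentage of usable capacity."""
    return 100.0 * to_bytes(used) / to_bytes(usable)


def exceeds_best_practice(used, usable, limit=76.0):
    """True when the estimate is above the guideline quoted in this document."""
    return utilization_percent(used, usable) > limit


if __name__ == "__main__":
    pct = utilization_percent("163.08G", "13.06T")
    print(f"{pct:.2f}% used, above guideline: {exceeds_best_practice('163.08G', '13.06T')}")
```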

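Troubleshoot the Performance Charts

Sometimes the Hyperflex performance charts do not display information.

Hyperflex performance charts.

If you encounter this behavior, check whether the stats services are running in the cluster.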

hxshell:~$ priv service carbon-cache status
carbon-cache stop/waiting

hxshell:~$ priv service carbon-aggregator status
carbon-aggregator stop/waiting

hxshell:~$ priv service statsd status
statsd stop/waiting

If the processes are not running, start the services manually.

hxshell:~$ priv service carbon-cache start
carbon-cache start/running, process 15750

hxshell:~$ priv service carbon-aggregator start
carbon-aggregator start/running, process 15799

hxshell:~$ priv service statsd start
statsd start/running, process 15855
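
To see at a glance which stats services need attention, the status lines can be scanned from a saved terminal session. The sketch below only reports service names and starts nothing. The function name `stopped_services` is hypothetical, and the status format (`name goal/state`) is assumed from the sample output above.

```python
import re

# A status line looks like "carbon-cache stop/waiting" or
# "statsd start/running, process 15855". Prompt lines do not match.
STATUS = re.compile(r"^(?P<name>[\w-]+) (?P<goal>start|stop)/(?P<state>\w+)")


def stopped_services(text):
    """Return the services whose status is anything other than start/running."""
    stopped = []
    for line in text.splitlines():
        m = STATUS.match(line.strip())
        if m and not (m.group("goal") == "start" and m.group("state") == "running"):
            stopped.append(m.group("name"))
    return stopped


if __name__ == "__main__":
    session = (
        "hxshell:~$ priv service carbon-cache status\n"
        "carbon-cache stop/waiting\n"
        "hxshell:~$ priv service statsd status\n"
        "statsd start/running, process 15855\n"
    )
    print(stopped_services(session))
```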