Wednesday, July 6, 2016

Oracle on Nutanix ABS (Acropolis Block Services)


    Nutanix released AOS 4.7 last week at the .NEXT event, and it announced several new features. Today I upgraded my Nutanix NX-1465G4 to AOS 4.7. The feature I was most excited to test is ABS.

    I worked for EMC for 7 years, so I know SAN, iSCSI, and NAS storage well. When I was at the EMC solution center, I had to maintain all of the demo environments, which used two Brocade SAN switches, two McDATA switches, and one Cisco MDS SAN switch. What troubled me most was zoning. I had to maintain hundreds of zoning configurations and check every WWPN myself. If anything was wrong, my demo servers could not see their LUNs, and some of the demo environments would be affected.

    After iSCSI storage came out, things got easier: I no longer had to maintain zoning information. But I still had to take care of multipathing. I need redundant paths so that a single failed NIC does not cut the server off from its storage. If my server has two NICs and the iSCSI array has two controllers, each with two front-end ports, a single LUN may have 4 to 8 paths. So I need multipath I/O (MPIO) software to manage those paths. At EMC, we used PowerPath to manage multipathing.
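    Where does "4 to 8 paths" come from? Each path is one host NIC paired with one target front-end port it can reach. The sketch below assumes a simplified model in which a NIC can reach a port only when both sit on the same subnet; the NIC names, port names, and subnets are hypothetical. A flat network gives every NIC every port (2 x 4 = 8 paths), while two separate subnets give each NIC one port per controller (2 x 2 = 4 paths).

```python
from itertools import product


def enumerate_paths(host_nics, target_ports):
    """Return every (nic, port) pair whose subnets match; each pair is one iSCSI path.

    Simplified model (an assumption): reachability depends only on sharing a subnet.
    """
    return [
        (nic, port)
        for (nic, nic_net), (port, port_net) in product(host_nics.items(), target_ports.items())
        if nic_net == port_net
    ]


# Hypothetical layouts: two host NICs; two controllers (SPA, SPB) with two front-end ports each.
flat_nics = {"eth0": "net1", "eth1": "net1"}
flat_ports = {"SPA-0": "net1", "SPA-1": "net1", "SPB-0": "net1", "SPB-1": "net1"}

split_nics = {"eth0": "net1", "eth1": "net2"}
split_ports = {"SPA-0": "net1", "SPA-1": "net2", "SPB-0": "net1", "SPB-1": "net2"}

print(len(enumerate_paths(flat_nics, flat_ports)))    # 8: every NIC reaches every port
print(len(enumerate_paths(split_nics, split_ports)))  # 4: each NIC reaches one port per controller
```

    Every one of those paths leads to the same LUN, which is why the operating system needs multipath software to merge them into a single device.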

    Everything seems fine, since we have multipathing solutions. But sometimes customers ask us not to install any extra software (PowerPath, DMP, and so on) on their servers. Then the only way to resolve this is the operating system's native MPIO.
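    Whichever MPIO product is used, its basic job is the same: spread I/O across the healthy paths and stop using a path when it fails. Below is a toy round-robin path selector that illustrates that idea. It is not how PowerPath, DMP, or any native MPIO stack is actually implemented, and the class name and path names are hypothetical. Both paths point at SPA because, on an active/passive array, I/O should only go to the controller that owns the LUN.

```python
class RoundRobinSelector:
    """Toy round-robin path selector: rotate I/O across healthy paths, skipping failed ones."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0  # index of the path to try first on the next pick()

    def mark_failed(self, path):
        self.failed.add(path)

    def mark_restored(self, path):
        self.failed.discard(path)

    def pick(self):
        # Visit each path at most once per call, resuming where the previous call stopped.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("all paths to the LUN are down")


selector = RoundRobinSelector(["eth0:SPA-0", "eth1:SPA-1"])
print([selector.pick() for _ in range(4)])  # ['eth0:SPA-0', 'eth1:SPA-1', 'eth0:SPA-0', 'eth1:SPA-1']
selector.mark_failed("eth0:SPA-0")          # simulate a NIC or cable failure
print([selector.pick() for _ in range(2)])  # ['eth1:SPA-1', 'eth1:SPA-1']
```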

    That is only the front-end side of the problem. On the back end, VNX and other arrays use an active/passive configuration for each LUN: one storage processor owns the LUN and the other stands by. So you need to rebalance LUN ownership across the controllers on the back end. When I was at VMware, I visited a customer site in China. They complained that performance on VMware was terrible. When I checked their EMC storage, I found that all of their LUNs used SPA as the active owner, while SPB sat on standby.
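    Rebalancing means choosing an owner SP for each LUN so that both controllers carry similar work. The sketch below uses a simple greedy heuristic (largest LUN first, each assigned to the currently least-loaded SP); this is my own illustration, not the procedure VNX or its management tools use, and the LUN names and IOPS figures are hypothetical. With every LUN on SPA, SPA would serve all 11000 IOPS while SPB served none.

```python
def balance_owners(lun_iops, sps=("SPA", "SPB")):
    """Greedy balancing: assign each LUN, busiest first, to the SP with the lowest load so far."""
    load = {sp: 0 for sp in sps}
    owner = {}
    for lun, iops in sorted(lun_iops.items(), key=lambda kv: kv[1], reverse=True):
        sp = min(load, key=load.get)  # ties go to the first SP listed
        owner[lun] = sp
        load[sp] += iops
    return owner, load


luns = {"LUN0": 4000, "LUN1": 3000, "LUN2": 2500, "LUN3": 1500}  # hypothetical IOPS per LUN
owner, load = balance_owners(luns)
print(owner)  # {'LUN0': 'SPA', 'LUN1': 'SPB', 'LUN2': 'SPB', 'LUN3': 'SPA'}
print(load)   # {'SPA': 5500, 'SPB': 5500}
```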

    These are traditional SAN/iSCSI problems, and traditional storage is still designed this way today.
    When Nutanix came out with ABS (Acropolis Block Services), I wondered why my company was going back to supporting an iSCSI SAN solution. Every day we talk about the disadvantages of SAN, and now we support connecting physical servers to Nutanix. Then I read Josh Odgers' blog article about ABS (http://www.joshodgers.com/2016/06/15/whats-next-2016-acropolis-block-services-abs/). ABS makes iSCSI connections much simpler than before, and the article explains how ABS works internally. If you are interested in ABS, please read Josh Odgers' blog.

    My role at Nutanix is BCA (Business Critical Applications): my responsibility is to make sure business-critical applications run well on the Nutanix platform. The first thing that came to my mind was high availability. When most of my customers migrate their BCA workloads to Nutanix, their first question is about HA. Does Nutanix ABS provide high-availability protection? When a node fails, can ABS fail its LUNs over to another node? The answer is yes.

    So I ran this test, using an Oracle database as my critical application.
Here is my test setup:
   - Nutanix NX-1465G4 with 256 GB RAM per node (4 nodes total)
   - Oracle Linux 6.7 with Oracle 12c installed
   - Oracle database using ASM
   - Nutanix AOS 4.7
   - Swingbench (to generate Oracle database load)

Scenario 1 - Simulated node failure (failover)
    I ran Swingbench with 300 user connections to generate load. Once the Swingbench TPS was stable, I shut down Node D. As you can see in the following video, Swingbench reported no errors and all connections kept working. During failover there was a brief spike in CPU I/O wait lasting only a few seconds. Node D's LUN failed over to Node A.
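    To make the failover step concrete, here is a toy model of LUN ownership across the four nodes: each LUN owned by the failed node is handed to the surviving node that owns the fewest LUNs. This is an assumption for illustration only, not the placement logic ABS actually uses (Josh Odgers' article describes the real mechanism), and the LUN names are hypothetical. In this toy the tie happens to resolve to Node A, which matches what I observed, but the real choice may depend on other factors.

```python
from collections import Counter


def fail_over(owner, failed_node, nodes):
    """Toy failover: move each LUN owned by failed_node to the least-loaded surviving node."""
    survivors = [n for n in nodes if n != failed_node]
    new_owner = dict(owner)
    for lun in sorted(lun_id for lun_id, node in owner.items() if node == failed_node):
        counts = Counter(node for node in new_owner.values() if node in survivors)
        # min() keeps the first node on ties, so ties resolve in `nodes` order.
        new_owner[lun] = min(survivors, key=lambda n: counts[n])
    return new_owner


nodes = ["A", "B", "C", "D"]
owner = {"lun-a": "A", "lun-b": "B", "lun-c": "C", "lun-d": "D"}  # hypothetical: one LUN per node
print(fail_over(owner, "D", nodes))  # {'lun-a': 'A', 'lun-b': 'B', 'lun-c': 'C', 'lun-d': 'A'}
```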


Scenario 2 - Simulated failed node rejoining the cluster (failback)
    Continuing from Scenario 1, I powered on Node D to simulate it being repaired and rejoining the cluster. You will see that ABS automatically fails the LUN back from Node A to Node D, and the load is rebalanced across the nodes.
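    Failback can be modeled the same way: remember each LUN's preferred node, and once that node is online again, move the LUN back. The "preferred node" bookkeeping and all names below are hypothetical and only illustrate the behavior seen in the test. Counting LUNs per node shows why Node A carried extra load until Node D returned.

```python
from collections import Counter


def fail_back(owner, preferred, online):
    """Toy failback: return each LUN to its preferred node whenever that node is online."""
    return {lun: (preferred[lun] if preferred[lun] in online else node)
            for lun, node in owner.items()}


preferred = {"lun-a": "A", "lun-b": "B", "lun-c": "C", "lun-d": "D"}
after_failover = {"lun-a": "A", "lun-b": "B", "lun-c": "C", "lun-d": "A"}

print(dict(Counter(after_failover.values())))  # {'A': 2, 'B': 1, 'C': 1}
restored = fail_back(after_failover, preferred, online={"A", "B", "C", "D"})
print(dict(Counter(restored.values())))        # {'A': 1, 'B': 1, 'C': 1, 'D': 1}
```

    If Node D were still down (online={"A", "B", "C"}), lun-d would simply stay on Node A.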


Conclusion
    ABS makes iSCSI simple and supports connecting physical machines. It reduces the multipathing issues of traditional iSCSI storage and automatically balances load across nodes.
When I shut down Node D, its LUN failed over to Node A, which put more load on Node A. That is why you can see the response time and CPU I/O wait change. But after Node D rejoined, performance returned to what it was before. This means you can scale out when your BCA workloads need more I/O or faster response times. With traditional storage, by contrast, when the controllers become a bottleneck you have to replace the whole array. Nutanix ABS will be your best iSCSI storage for critical applications. Keeping it simple is the key to success!!

 Albert Chen 2016/07/06