Client For :
Insurer
Service :
Technical
Overview:
Join me as I share my firsthand experience and insights gained in navigating the complexities of financial operations (FinOps). #FinOps #Cloud #FinOpsEngineering #ITServiceManagement
FinOps Journey
After our migration from on-premises to the cloud, cost advisors identified significant cost optimisation opportunities. That finding raised a critical question: how much of the recommended savings is realistically achievable given the current setup, its restrictions, and other limitations?
We concentrated on the following areas because we believed they offered the most significant benefits:
Shutting down unused or idle instances on all stages
Right-sizing instances, especially in development environments
Purchasing reservation plans for cloud resources
In addition, we explored further possibilities for cost optimization, including:
Automating the identification and management of underutilized instances
Conducting a thorough cleanup of unattached disks
Deleting inactive databases and streamlining network IPs
Removing unused load balancers
Selecting alternative disk classes
Decreasing data traffic costs
Utilizing elastic volumes
To gain a holistic understanding of the cloud landscape, we conducted a comprehensive FinOps survey of applications running on various public clouds. These applications mainly use infrastructure services such as VMs, network, storage, and backup, and they range from business-critical to non-business-critical.
Purchasing Reservations
1/3 of applications (apps) plan to move to another target platform within the next few years. This figure reminded us not to overbuy reservations for specific cloud service provider (CSP) services, and to have well-established Business Capacity Management in place.
Right Sizing
1/3 of all apps have supplier requirements. Additionally, a few apps have limitations due to CPU/RAM usage, emphasizing the impact of resource constraints on app performance. Moreover, 1/4 of apps have specific I/O throughput requirements, highlighting the importance of efficient data transfer for optimal app performance.
Start Stop Overall incl. Production and Pre-Production
In summary, 3/4 of apps have limitations of some kind that prevent them from being stopped freely. Half of the apps have VMs that are used only for testing, most of which can be launched on demand. However, only 1/4 of all apps can stop their services without any restrictions.
Start Stop Development Environment
In smaller regions, there is a risk that some resources are not available in larger volumes at a specific time without a capacity reservation. We therefore asked what would happen if teams had no dev environment. Overall, 3/4 of teams could continue working for more than 2 days without one. The main impact would be a growing backlog.
Let’s summarize the key elements for setting realistic expectations about saving opportunities in your company:
Ensure clear Business Capacity Management is in place, so that you understand your application lifecycle and can purchase the right number of reservations. Also, consider the return on investment within your risk management strategy. Reservation plans offer an attractive return on investment, which can compensate for the risk of over-purchasing reservations.
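The trade-off between the reservation discount and the risk of over-purchasing can be made concrete with a small calculation. The following is a minimal sketch, not a description of any provider's pricing: the function names and the 40% discount in the usage lines are hypothetical, and it assumes a reservation is billed for every hour of the term whether or not the resource is used.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the term a resource must run so that the reservation
    costs no more than on-demand. `discount` is a fraction, e.g. 0.40."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    # Reserved cost = rate * (1 - discount) * term (paid in full).
    # On-demand cost = rate * utilization * term.
    # Both are equal when utilization = 1 - discount.
    return 1.0 - discount


def reservation_saving(on_demand_hourly: float, discount: float,
                       utilization: float, term_hours: int = 8760) -> float:
    """Saving (positive) or loss (negative) versus on-demand over the term."""
    on_demand_cost = on_demand_hourly * term_hours * utilization
    reserved_cost = on_demand_hourly * (1.0 - discount) * term_hours
    return on_demand_cost - reserved_cost


# Hypothetical figures: 40% discount, resource retired after 30% of the term.
print(break_even_utilization(0.40))
print(reservation_saving(on_demand_hourly=1.0, discount=0.40, utilization=0.30))
```

Under these assumptions, a reservation for an application that leaves the platform early turns into a loss once its usage falls below the break-even fraction, which is why the application lifecycle matters for the purchase decision.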
In theory, it is recommended to right-size resources first, before purchasing reservation plans. In practice this is difficult, because right sizing must be done accurately and with heavy involvement of the DevOps team to ensure the operational stability of the applications. We therefore recommend buying reservations first, but building a margin into the commitments.
Most standard software comes with minimum hardware requirements. Consider this when evaluating recommendations that appear promising, as downsizing below these requirements risks losing the warranty for your standard software.
Some applications are RAM- or CPU-intensive, which recommendations may not take into account. Downsizing a VM that has, for example, low CPU utilization but high RAM utilization can lead to degraded performance due to a lack of RAM or CPU. So choose the right VM family.
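To illustrate why both dimensions matter, here is a minimal sketch of a right-sizing check that looks at CPU and RAM together and respects a supplier's minimum hardware requirements. Everything in it is an assumption for illustration: the `VmSize` catalog, the size names, the prices, and the 20% headroom are hypothetical and do not come from any cloud provider or advisor tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    ram_gib: float
    hourly_cost: float


def pick_size(sizes: Sequence[VmSize], peak_vcpus: float, peak_ram_gib: float,
              min_vcpus: float = 0, min_ram_gib: float = 0.0,
              headroom: float = 0.2) -> Optional[VmSize]:
    """Return the cheapest size covering peak CPU *and* peak RAM plus headroom,
    never going below the supplier minimums. None if nothing fits."""
    need_cpu = max(peak_vcpus * (1 + headroom), min_vcpus)
    need_ram = max(peak_ram_gib * (1 + headroom), min_ram_gib)
    candidates = [s for s in sizes
                  if s.vcpus >= need_cpu and s.ram_gib >= need_ram]
    return min(candidates, key=lambda s: s.hourly_cost, default=None)


# Hypothetical catalog and workload: low CPU usage, high RAM usage.
catalog = [
    VmSize("general-2", vcpus=2, ram_gib=8, hourly_cost=0.10),
    VmSize("memory-2", vcpus=2, ram_gib=16, hourly_cost=0.13),
    VmSize("general-4", vcpus=4, ram_gib=16, hourly_cost=0.20),
]
choice = pick_size(catalog, peak_vcpus=1, peak_ram_gib=12)
print(choice.name)  # memory-2
```

A recommendation based on CPU alone would suggest the smallest general-purpose size here, which has too little RAM for this workload; checking both dimensions points to a memory-oriented family instead.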
For 'burstable' VM family types, make sure you understand the terms and conditions exactly.
Before halting your cloud services, ensure you have checked the various limitations and their impact on other services. That said, every team has VMs that are used only sporadically and have no restrictions. These VMs are quick wins for starting the optimization.
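For such quick wins, the potential of a start/stop schedule can be estimated with simple arithmetic. The sketch below is an illustration only: the function name and the 12-hours-on-weekdays schedule are hypothetical, and it covers compute hours only, since storage attached to a stopped VM is typically still billed.

```python
HOURS_PER_WEEK = 7 * 24


def schedule_saving(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of compute hours avoided compared with running 24/7."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    running_hours = hours_per_day * days_per_week
    return 1.0 - running_hours / HOURS_PER_WEEK


# Hypothetical schedule: a test VM runs 12 hours a day, Monday to Friday.
saving = schedule_saving(hours_per_day=12, days_per_week=5)
print(f"{saving:.0%} of compute hours avoided")  # 64% of compute hours avoided
```

The estimate only holds for VMs that can actually be stopped, so it should be applied after the restrictions above have been checked.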
Depending on where your cloud services are hosted, there might be a risk that services are unavailable on a specific weekday if demand exceeds the capacity of smaller regional data centers. Although this may seem highly improbable, the discussions around this topic were very valuable and helped develop a sense of risk appetite within the organization.
Finally, I would like to address the question of how much cost optimization was achievable compared to the proposed options. Thanks to the survey and the measures that followed, we were able to implement around half of the recommendations, mainly right sizing in development environments and purchasing reservations across all stages. Of course, with additional resources and expertise, there are opportunities to implement more of the recommendations. It is also important that FinOps principles are applied from the design phase of your development process.
Let me know about your experiences, and let’s connect via LinkedIn to exchange know-how.
Daniel Berger