U.S. Citizenship and Immigration Services (USCIS) was one of the first federal agencies to adopt cloud computing back in 2014. Much of the agency’s cloud movement was “lift and shift” – simply moving workloads as-is to the cloud – so IT operations and configurations were generally not optimized for the cloud.
To get products into the hands of end users as quickly as possible, USCIS opted for a decentralized governance approach that shifted most operations activities “left” to development teams. Striking the right balance between governance and developer autonomy is always tricky. This approach benefited USCIS in many ways, but one downside was “cloud sprawl” – the existence of over-provisioned, over-scheduled, underutilized, or orphaned cloud resources.
As a fee-funded agency, USCIS was hit hard by the COVID-19 pandemic. With USCIS offices closed, fee collection dropped precipitously, and cost savings became the agency’s number one priority to avoid furloughs and keep the agency operational. As detailed in a recent MeriTalk story, USCIS leaders understood that addressing the agency’s lack of cloud governance and standards was a likely source of cost savings.
They needed to inventory and remediate cloud sprawl while continuing to allow staff to take advantage of the flexibility and scalability that cloud computing offers.
What is RCA?
To address cloud sprawl, USCIS Office of Information Technology leadership turned to Simple Technology Solutions (STS) and Robotic Cloud Automation (RCA).
RCA is a suite of serverless cloud automation solutions that use AWS managed services, native tagging capabilities, and Lambda scripts to:
1. Detect and monitor cloud sprawl;
2. Report and notify affected parties; and
3. Automatically remediate.
RCA establishes governance guardrails and enforces usage standards across the enterprise, resulting in significant cost savings, better visibility into cloud spend, and more opportunities to control cost. In contrast to other managed services and third-party tools, RCA actually remediates instances, resources, and environments that are over-provisioned, underutilized, orphaned, or generally out of compliance with agency standards.
Examples of How RCA Works
RCA works by automatically identifying the following orphaned resources:
● Elastic Block Store (EBS) volumes
● Elastic Network Interfaces (ENI)
● Amazon Machine Images (AMI)
● Elastic Load Balancer (ELB)
● ElastiCache
● Snapshots
● Old instance types
For detected orphaned resources, account teams receive a notice via Amazon Simple Notification Service (SNS) asking them to address the issue. If nothing is done within 14 days, these resources are automatically shut down or deprovisioned. Using AWS Trusted Advisor, RCA also automatically identifies resources for rightsizing based on memory, CPU, and disk space. Lastly, RCA applies the Data Lifecycle Tool to S3 buckets to ensure data is stored at the right tier based on archiving policy and usage.
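To make the detect, notify, and remediate flow more concrete, here is a minimal Python sketch of the decision logic for one resource type, EBS volumes. It is an illustration under stated assumptions, not RCA's actual Lambda code: the function names are hypothetical, the volume records are plain dictionaries shaped like the entries returned by the EC2 DescribeVolumes API, and the only value taken from the text above is the 14-day grace period.

```python
from datetime import datetime, timedelta, timezone

# Grace period described in the article; everything else here is illustrative.
GRACE_PERIOD = timedelta(days=14)


def find_orphaned_volumes(volumes):
    """Return volumes that are not attached to any instance.

    `volumes` is a list of dicts shaped like the "Volumes" entries of an
    EC2 DescribeVolumes response. An unattached volume reports the state
    "available" and has an empty "Attachments" list.
    """
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def next_action(first_notified_at, now):
    """Decide what to do with an orphaned resource (hypothetical helper).

    Returns "notify" if the owner has not been told yet, "remediate" once
    the grace period has elapsed, and "wait" otherwise.
    """
    if first_notified_at is None:
        return "notify"
    if now - first_notified_at >= GRACE_PERIOD:
        return "remediate"
    return "wait"


if __name__ == "__main__":
    now = datetime(2021, 1, 20, tzinfo=timezone.utc)
    sample = [
        {"VolumeId": "vol-1", "State": "available", "Attachments": []},
        {"VolumeId": "vol-2", "State": "in-use",
         "Attachments": [{"InstanceId": "i-1"}]},
    ]
    # Hypothetical record of when each owner was first notified.
    notified = {"vol-1": now - timedelta(days=15)}
    for vol in find_orphaned_volumes(sample):
        print(vol["VolumeId"], next_action(notified.get(vol["VolumeId"]), now))
```

In a real deployment, the volume list would likely come from a call such as boto3's `describe_volumes`, the "notify" branch would publish to an SNS topic, and the first-notification time would need to be stored somewhere durable, for example in a resource tag. Those choices are assumptions here and are not described in the source.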
“USCIS’s design-cost principles and policies are manifested in the Lambda scripts,” explained Aaron Kilinski, principal and chief technologist at STS. “Not only do the scripts ensure good cloud hygiene across the enterprise, but they also enable USCIS to take advantage of the operational agility and economic advantages of AWS’s consumption-based model.”
Results
Working with STS and AWS, USCIS leaders developed a cost-management strategy, as well as a 90-day Operation Cloud Control (OCC) project to realize immediate savings and establish processes to lock in those savings moving forward. RCA was the centerpiece of OCC. Moving to Reserved Instances alone saved more than $2.5 million during 2020-2021. In total, RCA and OCC projects saved more than $4 million during the same time frame.
RCA is a powerful standards and enforcement tool for any CIO managing a multi-tenant cloud environment. In addition to reducing cloud waste and establishing governance guardrails, RCA establishes and enforces a “cloud hygiene baseline” to avoid future cloud sprawl and non-compliance.
Right now, STS is offering special, end-of-fiscal-year pricing on RCA proof of concept pilot programs. These can be executed quickly and deliver a report detailing where and how you can recover wasted cloud costs.