Balancing speed and completeness, catering to diverse skill levels, avoiding vendor lock-in, and measuring success are key challenges in platform engineering.
Platform engineering is a specialized discipline that focuses on creating scalable, reliable, and efficient building platforms for developers. Unlike DevOps, which is more about the deployment and operation of applications, platform engineering is about building the underlying infrastructure and tools that developers use. Platform engineers are the unsung heroes, creating the tools that others need to be the best. In this article, we will explore some lessons learned from the journey of platform engineering, the top challenges faced by platform engineering teams, and strategies to overcome them.
Adoption vs. Completeness:
Problem: Our devs can’t wait a year
One of the biggest challenges for platform engineering teams is finding the right balance between building a comprehensive, feature-rich platform and getting it into the hands of users quickly. A platform that is too basic may not meet the needs of developers, while an overly complex platform can take too long to build and hinder early adoption. The key is to prioritize features that offer the most value to the largest number of users and roll them out in a way that encourages early engagement.
Solution: Focus on incremental value
Instead of aiming for a fully-featured platform from the start, focus on delivering incremental value to users. Identify the most pressing use cases and build features that solve those specific problems. This approach speeds up the development cycle and encourages early adoption. Users are more likely to engage with a platform that solves their immediate needs, even if it’s not yet feature-complete.
Diverse Backgrounds in Operations:
Problem: Not every dev is a ‘Nix nerd
A platform may be used by both novice and expert developers, requiring a design that caters to different skill levels without becoming too simplistic or too complicated. Creating a platform that is intuitive for beginners while offering depth and flexibility for experienced developers is a challenge.
Solution: Two-layer design
Consider implementing a two-layer API design to cater to a diverse user base. The foundational layer should provide the raw functionality needed for complex use cases, giving experienced developers the flexibility they seek. For beginners, a web GUI for basic tasks can be provided, while more experienced developers can utilize the command line for complete configuration. This approach ensures that all engineers have access to powerful tools without feeling left out.
Vendor Lock-In:
Problem: I like this vendor; I don’t “LIKE them like them”
Platform engineering teams often rely on third-party services, which can lead to vendor lock-in. This dependency can make it difficult to switch providers or adopt new technologies, limiting the platform’s adaptability.
Solution: Vendor abstraction or open source
To mitigate the risks of vendor lock-in, consider employing abstraction layers or wrappers around third-party services. This allows for switching vendors with minimal impact on the platform or its users. Another option is to adopt open source tools as the developer platform, ensuring that time spent on adoption is a good investment and avoiding the risk of rising SaaS costs.
Measuring Success:
Problem: It’s hard to prove the benefits of platform engineering
Determining the success of a platform is not straightforward. Traditional metrics like uptime or latency are important but don’t provide a complete picture. The challenge is to identify the right set of Key Performance Indicators (KPIs) and use them to guide ongoing improvements to the platform.
Solution: Consider DORA metrics
DORA metrics, developed by Google Cloud, provide a better way to measure the effectiveness of platform engineering. These metrics focus on developer velocity, measuring how easy it is for developers to write, test, and ship code. By answering questions about deployment frequency, lead time, change failure rate, and mean time to recovery, teams can gain insights into how well they are enabling developers.
Real-World Case Studies:
Stitch Fix, Uber, and Netflix
Real-world case studies provide valuable insights into successful platform engineering. Stitch Fix created a platform for data scientists, focusing on adoption rather than completeness. Uber’s platform engineering team handled ongoing concerns about technical debt and developer enablement, improving organizational efficiency. Netflix moved to polyglot development by creating a developer platform, allowing teams to work in the best language for their challenges.
Conclusion:
Platform engineering is a crucial discipline that empowers developers and improves organizational efficiency. Balancing speed and completeness, catering to diverse skill levels, avoiding vendor lock-in, and measuring success are key challenges. By focusing on incremental value, implementing a two-layer design, mitigating vendor lock-in, and using DORA metrics, platform engineering teams can overcome these challenges and create successful developer platforms.
Leave a Reply