Securing Netflix Studios At Scale | Netflix TechBlog

Once we had worked together to integrate our SSO with Wall-E, we had established a pretty exciting pattern of adding security requirements as filters. We thought back to our checklist through this lens: which of these requirements are consistent enough across applications to add as a required filter? Our web application firewall (WAF), DDoS prevention, security header validation, and durable logging all fit the bill. One by one, we saw our checklist requirements bite the dust, shifting from “individual app developer-owned” to “Wall-E-owned” (and consistently implemented!).
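
To make the “requirements as filters” pattern concrete, here is a minimal sketch in Python. It is not Wall-E's implementation: the `Request` and `Response` types, the filter functions, and the specific header values are all assumptions chosen for illustration. The point it shows is that each checklist item becomes one function applied to every application's traffic, so no individual app has to implement it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


# A filter returns a Response to stop the request early, or None to let it continue.
Filter = Callable[[Request], Optional[Response]]


def block_suspicious_paths(request: Request) -> Optional[Response]:
    # Hypothetical stand-in for a WAF rule: reject path traversal attempts.
    if ".." in request.path:
        return Response(status=403, body="blocked")
    return None


def require_authentication(request: Request) -> Optional[Response]:
    # Hypothetical check: a real gateway would validate an SSO session or token.
    if "Authorization" not in request.headers:
        return Response(status=401, body="authentication required")
    return None


def add_security_headers(response: Response) -> Response:
    # Applied to every response, so each app gets the same headers for free.
    response.headers.setdefault("Strict-Transport-Security", "max-age=31536000")
    response.headers.setdefault("X-Content-Type-Options", "nosniff")
    return response


def handle(request: Request, filters: List[Filter],
           app: Callable[[Request], Response]) -> Response:
    # Run each inbound filter in order; the first one that returns a
    # Response short-circuits the chain and the app is never called.
    for inbound_filter in filters:
        early = inbound_filter(request)
        if early is not None:
            return add_security_headers(early)
    return add_security_headers(app(request))
```

In this sketch, adding a new requirement means appending one function to the `filters` list rather than changing every application behind the gateway.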

At this point, it was clear that we had achieved the vision in the AppSec team’s original request. We were eventually able to add so much security leverage to Wall-E that the lion’s share of the “going Internet-facing” checklist for Studio applications boiled down to one item: Will you use Wall-E?

A small portion of our go-external security questionnaire and checklist for Studio apps, before and after Wall-E.

Wall-E’s early adopters were hand-picked by the Application Security team. At the time, the Cloud Gateway team had to work closely with application developers to provide a seamless migration without disrupting users. These joint efforts took weeks on both sides. During our initial consultations, it was clear that developers preferred to prioritize product work over security or infrastructure improvements. Our meetings usually ended like this: “Security suggested that we talk to you, and we like the idea of improving our security posture, but we have product goals to meet. Let’s talk again next quarter.” These conversations surfaced two problems that we knew we had to overcome to get past this early adopter stage:

  1. Setting up Wall-E for an application took too much time and effort, and the hands-on approach would not scale.
  2. In Netflix’s “context not control” culture, security improvements alone were not enough to drive organic adoption.

We were under pressure to improve our adoption numbers and decided to focus on setup friction first by improving the developer experience and automating the onboarding process.

Developers in the Netflix streaming world compose the customer-facing Netflix experience out of hundreds of microservices, reachable through complex routing rules. On the Netflix Studio side, in Content Engineering, each team develops distinct products with simpler routing needs. To support that different model, we did another thing that seemed small at the time but has had a big impact over the years: we asked app teams to integrate with us by creating a version-controlled YAML file. Originally this was intended as a simplified, developer-friendly way to collect domain names and a few routing rules into a versionable package, but we quickly realized that we had stumbled upon a powerful model: we were harvesting developer intent.

An interactive Wall-E configuration wizard, and a concise declarative format for an application’s routing, resource, and authentication decisions.

This small change was a kind of magic, and completely flipped our relationship with development teams: since we had a concise, standardized definition of the app they intended to expose, we could proactively automate much of the setup. Specify a domain name? Wall-E can ensure that it automatically exists, with DNS and TLS configured correctly. Iterating on this experience eventually led to other intent-based streamlining, such as asking about the intended user population and related applications (to select OAuth configurations and claims). We could now tell developers that setting up Wall-E would take only a few minutes and that our tooling would automate everything.
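
As a rough illustration of what “harvesting intent” buys, the sketch below turns a small declaration into the setup steps that tooling could then carry out. Everything in it is an assumption made for the example: the `AppDeclaration` fields, the population names, and the OAuth configuration names are hypothetical and do not reflect the actual Wall-E YAML schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AppDeclaration:
    """Hypothetical stand-in for an app's version-controlled declaration file."""
    name: str
    domains: List[str]
    user_population: str  # illustrative values: "employees" or "partners"


# Hypothetical mapping from intended user population to an OAuth configuration.
OAUTH_CONFIG_BY_POPULATION = {
    "employees": "internal-sso",
    "partners": "partner-sso",
}


def plan_setup(decl: AppDeclaration) -> List[str]:
    """Derive the setup steps that tooling could automate from declared intent."""
    if decl.user_population not in OAUTH_CONFIG_BY_POPULATION:
        raise ValueError(f"unknown user population: {decl.user_population!r}")
    steps: List[str] = []
    for domain in decl.domains:
        # Declaring a domain is enough to imply both DNS and TLS work.
        steps.append(f"ensure DNS record exists for {domain}")
        steps.append(f"ensure TLS certificate covers {domain}")
    oauth_config = OAUTH_CONFIG_BY_POPULATION[decl.user_population]
    steps.append(f"attach OAuth config {oauth_config} to {decl.name}")
    return steps


if __name__ == "__main__":
    declaration = AppDeclaration(
        name="scheduling-ui",
        domains=["scheduling.example.com"],
        user_population="partners",
    )
    for step in plan_setup(declaration):
        print(step)
```

The developer states what they want (a domain, an audience), and the platform owns how to get there, which is what lets the platform change the “how” later without touching every app.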

As all of these pieces came together, app teams outside Studio took notice. For a typical paved road application without any unusual security complications, a team could go from “git init” to a production-ready, fully authenticated, Internet-accessible application in less than 10 minutes. Automating the infrastructure setup, combined with reducing risk enough to streamline security review, saves developers days, if not weeks, on every application. Developers didn’t necessarily care that the original motivating factor was security: what they saw in practice was that apps using Wall-E could get in front of users sooner and iterate faster.

This created the kind of virtuous cycle that core engineering product teams get incredibly excited about: more users make the amortized platform investment more valuable, and they also bring more ideas and clarity for features, which in turn attract more users. It set the tone for the next year of development along two tracks: fixing adoption blockers, and turning more “developer intent” into product features that simply handle things for developers.

For adoption, both the security team and our team were asking developers the same question: Is there anything that prevents you from using Wall-E? Each time we got an answer to this question, we tried to figure out how to address it. Nearly all of the blockers related to systems in which (usually for historical reasons) some application team was solving both authentication and application routing in a custom way. Examples include legacy mTLS and various webhook schemes. With Wall-E as a clear, sustainable, paved road choice, we finally had enough of a carrot to move these teams away from supporting unique, potentially risky features. The value proposition was not just “let us help you migrate and you will only ever have to deal with incoming traffic that has already been properly authenticated”, but also “offload all responsibility for authentication, WAF integration and monitoring, and DDoS protection to the platform”. Overall, we cannot overstate the value of organizationally committing to a single paved road product to address these kinds of concerns. It creates an amazing clarity and strategic pressure that helps align the actual services that teams operate with the charters and expertise that define them. The difference between 2–4 “right-ish” ways and a single paved road is powerful.

Also, with fewer exceptions and clearer criteria for which apps should take this paved road, our AppSec Engineering and User Focused Security Engineering (UFSE) teams could automate security guidance to give more appropriate automated nudges for adoption. Every leader’s security risk dashboard now includes a Wall-E adoption metric, and most of the recommended apps have chosen to adopt it. Wall-E now fronts more than 350 applications, and is adding about 3 new production applications (mostly Internet-facing) every week.

Automated guidance data, showing the percentage of applications recommended to use Wall-E that have adopted it. The jumpiness in the number of apps recommended for adoption is real: as adoption blockers were discovered and eventually resolved, and as we standardized guidance across the company, our automated recommendations reflected these developments.
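
The adoption percentage described in the caption above can be read as adopted apps divided by recommended apps, which is why it jumps whenever the set of recommended apps changes. The sketch below illustrates that arithmetic with made-up data; the field names `recommended` and `uses_wall_e` and the app names are assumptions, not the dashboard's real schema.

```python
from typing import Dict


def adoption_rate(apps: Dict[str, Dict[str, bool]]) -> float:
    """Fraction of apps recommended for Wall-E that actually use it."""
    recommended = [app for app in apps.values() if app["recommended"]]
    if not recommended:
        return 0.0  # nothing recommended yet, so report zero rather than divide by zero
    adopted = sum(1 for app in recommended if app["uses_wall_e"])
    return adopted / len(recommended)


# Illustrative inventory only.
apps = {
    "scheduling-ui": {"recommended": True, "uses_wall_e": True},
    "budgeting-ui": {"recommended": True, "uses_wall_e": True},
    "casting-tool": {"recommended": True, "uses_wall_e": False},
    "batch-job": {"recommended": False, "uses_wall_e": False},
}
rate_before = adoption_rate(apps)  # 2 of 3 recommended apps have adopted

# Resolving an adoption blocker makes another app eligible, so the
# denominator grows before that app has had a chance to migrate.
apps["legacy-mtls-app"] = {"recommended": True, "uses_wall_e": False}
rate_after = adoption_rate(apps)  # 2 of 4 recommended apps have adopted
```

A drop like the one from `rate_before` to `rate_after` reflects a wider recommendation, not apps leaving the paved road.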

As adoption continued to grow, we looked at various signals of developer intent for useful functionality to move from development-team-owned to platform-owned. A particularly pleasing example is UI hosting: it came up again and again, both as an awkward exception to our “full authentication” goal and as the only thing that required single page app (SPA) UI teams to run cloud instances and be on call for infrastructure. It eventually matured into an opinionated, declarative asset service that abstracts static file hosting for application teams: developers get faster static asset deployments, security gets tighter guardrails around UI applications, and Netflix as a whole has fewer cloud instances to manage (and pay for!). Wall-E became a requirement for the best UI developer experience, and that drove even more adoption.

A productized approach also meant that we could efficiently enable many complex but “nice to have” features to enhance the developer experience, such as Atlas metrics for free and integration with our request tracing tool, Edgar.

You may have noticed a word sneak into the conversation up there… “platform”. Netflix has a Developer Productivity organization: teams dedicated to helping other developers be more effective. A big part of their job is to harvest developer intent and automate the necessary touchpoints across our systems. As these teams came to see Wall-E as the clear answer for many of their customers, they began to integrate their tools to configure Wall-E from the even higher-level developer intent they were harvesting. In effect, this moves authentication and traffic routing (and everything else Wall-E handles) from being a specific product that developers need to think about and make a choice about, to just a fact that developers can trust and usually ignore. In 2019, essentially 100% of Wall-E app configuration was done manually by developers. In 2021, that interaction has changed dramatically: more than 50% of app configuration in Wall-E is now done by automated tools (which act on higher-level abstractions on behalf of developers).

This scale and standardization again multiplies the value: our internal risk quantification forecasts show compelling annualized savings in risk and incident response costs across the Wall-E portfolio. These apps have fewer, less severe, and less exploitable bugs than non-Wall-E apps, and we rarely need an urgent response from app owners (we call this not-getting-paged-at-midnight-as-a-service). Developer time saved on initial application setup and on services that are no longer needed additionally adds up to team-months of productivity each year.

Looking back at the core needs that started us down this road (“Simplify any security process […]” and “Increase the overall security bar […]”), Wall-E’s evolution into being part of the platform cements and extends its early success. Going forward, more and more apps and developers can benefit from these security assurances while needing to think less and less about them. It is an outcome we are very proud of.

To summarize, here are some of the things we took away from this journey:

  • If you can do one thing to manage a large product security portfolio, do bulletproof authentication, preferably as a property of the architecture
  • Security teams and central engineering teams can and should have a collaborative, mutually supportive partnership
  • “Productizing” a capability (e.g. clearly articulated; defined value proposition; branded; measured), even for internal tools, is useful for driving adoption and finding further value
  • A specific product makes the “paved road” clearer; a boolean “uses / doesn’t use” is strongly preferable to various options with subtle caveats
  • Hitch the security wagon to developer productivity
  • Harvesting intent is powerful; it lets many teams add value

We see incredible power in this kind of security/infrastructure partnership, and we’re excited to carry these wins into our next goal: building a full-fledged gateway API so that we become a truly service-oriented provider for our partner development teams. This will allow us to focus on the challenges that come with our next milestone: 1000 applications behind Wall-E.

If this kind of thing is exciting to you, we are hiring for both of these teams: Senior Software Engineer and Engineering Manager on Application Networking; and Senior Security Partner and AppSec Senior Software Engineer.

Special thanks to Cloud Gateway and InfoSec team members past and present, especially Sunil Agarwal, Mikey Cohen, Luke Koseowski, Will Rose, Dilip Kancharla, and Grant Callaghan, to our partners on Studio and Developer Productivity, and to the early Wall-E adopters who provided valuable feedback and ideas. Thanks also to Queen for the lyrical references we slipped in; tell us if you find them all.
