Exploring Data – Netflix | By Netflix Technology Blog

Gim Mahasintunon on behalf of Data Platform Engineering.

Supporting a rapidly growing base of engineers from diverse backgrounds using different data stores can be challenging in any organization. Netflix’s internal teams strive to provide leverage by investing in easy-to-use tooling that integrates the user experience and incorporates best practices.

In this blog post, we are thrilled to share one such tool: Netflix Data Explorer. Data Explorer gives our engineers quick, secure access to data stored in our Cassandra and Dynomite/Redis data stores.

Netflix Data Explorer on GitHub

We started this project several years ago when we were onboarding many new Dynomite customers. Dynomite is a high-performance in-memory database that provides highly available cross-datacenter replication on top of storage engines such as Redis. We wanted to reduce adoption barriers: users shouldn't have to know datastore-specific CLI commands, they should be protected from incorrectly executed commands that could hurt performance, and they should be able to access the clusters they use every day.

Since the project launched, we have seen similar needs in our other datastores. Cassandra, our most notable footprint in the fleet, was a great candidate. Users often had questions about how they should configure replication, create tables with an appropriate compaction strategy, and craft CQL queries. We knew we could give our users an enhanced experience and, at the same time, address many of the common questions in our support channels.

We'll explore some of the Data Explorer features, and along the way, we'll highlight some of the ways we've enabled the OSS community while still supporting some unique Netflix-specific use cases.

By directing users to a single web portal for all their data stores, we significantly increase user productivity. In addition, in a production environment with hundreds of clusters, we can restrict the visible data stores to those the user is authorized to access; this is supported in OSS environments by a pluggable access-control provider responsible for fetching ownership information.

Browse your accessible clusters in different environments and regions

Writing CREATE TABLE statements can be an intimidating experience for new Cassandra users, so to help reduce the intimidation factor, we built a schema designer that lets users design a new table with a drag-and-drop interface.

The schema designer lets you create a new table using any primitive or collection data type, then select your partition key and clustering columns. It also provides tools for visualizing the on-disk storage layout, browsing supported sample queries (to help you design for efficient point queries), and guiding you through the choice of a compaction strategy and many other advanced settings.

Dragging and dropping your way to a new Cassandra table
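To make the schema designer's output concrete, here is a minimal sketch of how a designer UI might turn its drag-and-drop state into a CQL CREATE TABLE statement. The names (`TableSchema`, `buildCreateTable`) are illustrative assumptions, not Data Explorer's actual code.

```typescript
// Hypothetical sketch: turning schema-designer state into CQL DDL.
interface ColumnDef {
  name: string;
  type: string; // a CQL primitive or collection type, e.g. "uuid", "list<text>"
}

interface TableSchema {
  keyspace: string;
  table: string;
  columns: ColumnDef[];
  partitionKeys: string[];     // at least one required
  clusteringColumns: string[]; // may be empty
  compaction?: string;         // e.g. "TimeWindowCompactionStrategy"
}

function buildCreateTable(s: TableSchema): string {
  const cols = s.columns.map((c) => `${c.name} ${c.type}`).join(", ");
  // Composite partition keys get wrapped in their own parentheses.
  const partition =
    s.partitionKeys.length > 1
      ? `(${s.partitionKeys.join(", ")})`
      : s.partitionKeys[0];
  const pk = [partition, ...s.clusteringColumns].join(", ");
  const compaction = s.compaction
    ? ` WITH compaction = {'class': '${s.compaction}'}`
    : "";
  return `CREATE TABLE ${s.keyspace}.${s.table} (${cols}, PRIMARY KEY (${pk}))${compaction};`;
}
```

A schema with partition key `user_id` and clustering column `ts` would yield `PRIMARY KEY (user_id, ts)`, which is exactly the kind of detail the designer spares new users from typing by hand.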

You can quickly perform point queries against your cluster in Explore mode. Explore mode supports full CRUD of records and lets you export result sets to CSV or download them as CQL INSERT statements. Exporting CQL can be a handy way to quickly copy a bit of data from a PROD environment into your TEST environment.

Explore mode gives you quick access to table data
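The "export as CQL INSERT" idea can be sketched in a few lines. This is an illustrative assumption about the shape of such a routine (the `toInsertStatements` helper and `Row` type are hypothetical), not the tool's actual implementation; note the single-quote escaping that CQL string literals require.

```typescript
// Hypothetical sketch: exporting a result set as CQL INSERT statements.
type Row = Record<string, string | number | null>;

function quote(v: string | number | null): string {
  if (v === null) return "NULL";
  if (typeof v === "number") return String(v);
  return `'${v.replace(/'/g, "''")}'`; // CQL escapes ' by doubling it
}

function toInsertStatements(keyspace: string, table: string, rows: Row[]): string[] {
  return rows.map((row) => {
    const cols = Object.keys(row);
    const vals = cols.map((c) => quote(row[c]));
    return `INSERT INTO ${keyspace}.${table} (${cols.join(", ")}) VALUES (${vals.join(", ")});`;
  });
}
```

The resulting statements can be pasted directly into a CQL shell against a test cluster.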

Support for binary data is another popular feature among our engineers. Data Explorer does not fetch binary value data by default (since blob data can be large); instead, users can fetch these fields with their desired encoding.

Choosing how you want to decode blob data
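As a rough sketch of what "fetch with a desired encoding" can mean, the function below renders raw blob bytes in one of two user-selected encodings. The `decodeBlob` name and the two-encoding menu are illustrative assumptions; a real implementation would offer more encodings (base64, UTF-8, and so on).

```typescript
// Hypothetical sketch: rendering blob bytes in a user-selected encoding.
type BlobEncoding = "hex" | "ascii";

function decodeBlob(blob: Uint8Array, encoding: BlobEncoding): string {
  switch (encoding) {
    case "hex":
      // Each byte becomes two lowercase hex digits.
      return Array.from(blob)
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
    case "ascii":
      return Array.from(blob)
        .map((b) => String.fromCharCode(b))
        .join("");
  }
}
```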

Point queries are available in Explore mode, but some users still need the flexibility of CQL. Query mode provides a powerful CQL IDE with features like autocomplete and helpful snippets.

Example of a free-form Cassandra query with autocomplete

There are also guardrails to help prevent users from making mistakes. For example, if a user tries to run a “drop table …” command, we redirect them to a dedicated table-deletion workflow that ensures the operation is performed safely with additional validation. (See the Metrics integration later in this article.)
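A guardrail like this boils down to inspecting a statement before execution and routing destructive ones elsewhere. The sketch below is a hypothetical illustration of that routing decision; the regex, `routeQuery`, and `QueryDecision` names are assumptions, not the tool's actual code.

```typescript
// Hypothetical sketch: intercepting destructive CQL before it executes.
const DROP_TABLE_RE = /^\s*drop\s+table\s+(?:if\s+exists\s+)?([\w."]+)/i;

interface QueryDecision {
  action: "execute" | "redirect";
  tableName?: string; // set when redirecting to the guided drop workflow
}

function routeQuery(cql: string): QueryDecision {
  const match = DROP_TABLE_RE.exec(cql);
  if (match) {
    // Send the user to a workflow with extra validation instead of
    // running the statement directly.
    return { action: "redirect", tableName: match[1] };
  }
  return { action: "execute" };
}
```

The guided workflow can then confirm ownership and surface table metrics before anything is actually dropped.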

Queries you submit are also saved in the Recent Queries view, which is handy when you're trying to remember that query you crafted before the long weekend.

While C* may be the more feature-rich of the two and have the more extensive installed base, we also have plenty of good things in store for Dynomite and Redis users. Note: the terms Dynomite and Redis are used interchangeably below unless explicitly distinguished.

Since Redis is an in-memory data store, we must avoid operations that inadvertently load all keys into memory. Instead, we perform SCAN operations across all nodes in the cluster, ensuring we don't put undue pressure on the cluster.

Scanning for keys in Dynomite clusters
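The key property of SCAN is that it pages through the keyspace with a cursor rather than materializing every key at once (as Redis's KEYS command would). The sketch below simulates that cursor contract against in-memory arrays; `scanNode` and `scanAllNodes` are hypothetical stand-ins for issuing real SCAN commands to each node.

```typescript
// Hypothetical sketch: cursor-based scanning, one node at a time.
// scanNode mimics Redis SCAN against a single node's keyspace; a cursor
// of 0 on return means that node's iteration is complete.
function scanNode(
  keys: string[],
  cursor: number,
  count: number
): { cursor: number; batch: string[] } {
  const batch = keys.slice(cursor, cursor + count);
  const next = cursor + count >= keys.length ? 0 : cursor + count;
  return { cursor: next, batch };
}

// Iterate every node in the cluster, fetching a bounded batch per call
// so no single request loads the full keyspace into memory.
function scanAllNodes(nodes: string[][], count: number): string[] {
  const found: string[] = [];
  for (const nodeKeys of nodes) {
    let cursor = 0;
    do {
      const res = scanNode(nodeKeys, cursor, count);
      found.push(...res.batch);
      cursor = res.cursor;
    } while (cursor !== 0);
  }
  return found;
}
```

Bounding each batch with `count` is what keeps the scan from pressing on the cluster, at the cost of more round trips.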

In addition to simple string keys, Dynomite supports a rich collection of data types, including lists, hashes, sets, and sorted sets. The UI supports creating and manipulating these collection types as well.

Editing a Redis hash value

When we were building Data Explorer, we started to get signals that the ease of use and productivity gains we saw internally could also benefit folks outside of Netflix. We have tried to balance baking in some of our hard-learned best practices with maintaining the flexibility to support different OSS environments. To that end, we have created several adapter layers in the product where you can provide custom implementations as needed.

The application was architected with OSS use in mind, so users can provide their own implementations for discovery, access control, and datastore-specific connection settings. You can choose between the built-in providers or supply a custom provider.

The image below shows the server-side architecture. The server is a Node.js Express application written in TypeScript, and the client is a single-page app written in Vue.js.

High-level Data Explorer architecture and service adapter layers

Installing a new tool in any real-world environment takes a time commitment. We get it, and to help with that initial setup, we've included a Dockerized demo environment. It builds the app, pulls images for Cassandra and Redis, and runs everything in Docker containers so you can dive right in. Note: the demo environment is not intended for production use.

Data Explorer ships with many default behaviors, but since no two production environments are the same, we provide a mechanism for overriding the defaults and specifying your own values for various settings. These can range from the port number the app runs on to which features should be disabled in a production environment (the ability to drop a Cassandra table, for instance).
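Defaults-plus-overrides usually reduces to a shallow merge with care taken for nested sections. The config shape below (`port`, `features.allowDropTable`) is a made-up example to illustrate the pattern, not Data Explorer's actual configuration schema.

```typescript
// Hypothetical sketch: merging environment overrides onto default settings.
interface AppConfig {
  port: number;
  features: { allowDropTable: boolean };
}

const defaults: AppConfig = {
  port: 3000,
  features: { allowDropTable: true },
};

function withOverrides(base: AppConfig, overrides: Partial<AppConfig>): AppConfig {
  return {
    ...base,
    ...overrides,
    // Merge nested sections so a partial override doesn't wipe out
    // the other feature flags.
    features: { ...base.features, ...(overrides.features ?? {}) },
  };
}
```

A production configuration file might then contain only the handful of values that differ, such as disabling table drops.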

To further simplify creating your configuration file, we've built a CLI tool that walks you through a series of prompts. The CLI tool is the recommended way to create your configuration file, and you can re-run it at any time to create a new configuration.

CLI lets you create a custom configuration

You can create multiple configuration files and easily switch between them when working in different environments. We have instructions on GitHub for working with multiple configuration files.

It's no secret that Netflix is a big proponent of microservices: we have discovery services to detect the Cassandra and Dynomite clusters in an environment, access-control services that identify who owns a data store and who can access it, and an LDAP service for information about logged-in users. There's a good chance you have similar services in your environment.

To help enable these kinds of environments, we ship a number of pre-canned configurations with override values and adapter layers.


The first example of this adapter layer is how the application finds discovery information: the names and IP addresses of the clusters you want to access. The CLI lets you choose from a few simple options. For example, if you have a process that can write an updated JSON file to disk, you can select “File System.” If, instead, you have a REST-based microservice that provides this information, you can select “Custom” and write the few lines of code needed to fetch it.

Choosing to discover our data store clusters by reading a local file
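An adapter layer like this typically comes down to a small interface with interchangeable implementations. The names and shapes below (`DiscoveryProvider`, `ClusterInfo`, `FileSystemDiscoveryProvider`) are assumptions for illustration, not the tool's actual interface.

```typescript
// Hypothetical sketch of a pluggable discovery adapter.
interface ClusterInfo {
  name: string;
  instances: string[]; // instance IP addresses
}

interface DiscoveryProvider {
  getClusters(): ClusterInfo[];
}

// "File System" flavor: some external process keeps a JSON file up to
// date, and the provider simply parses it on demand.
class FileSystemDiscoveryProvider implements DiscoveryProvider {
  // The file reader is injected so the provider stays testable
  // without touching the disk.
  constructor(private readonly readFile: () => string) {}

  getClusters(): ClusterInfo[] {
    return JSON.parse(this.readFile()) as ClusterInfo[];
  }
}
```

A "Custom" provider would implement the same interface but call out to a REST-based discovery microservice instead of reading a file.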

Metrics

Another example of this service adapter layer is integration with an external metrics service. We progressively enhance the UI by displaying keyspace and table metrics when a metrics service adapter is provided. These metrics give at-a-glance insight into which tables are being used and help our users make an informed decision when dropping a table.

Without metrics support
With metrics support enabled

OSS users can enable metrics support via the CLI. All you have to do is write the code that fetches the metrics.

CLI enables customization of enhanced features

Although internationalization was not an explicit goal, we discovered that Netflix-specific messaging provided additional value for our internal users in some cases. We handle this in essentially the same way resource bundles handle different locales.

We provide en-NFLX.ts internally and en-US.ts externally. Enterprise customers can enhance their users' experience by creating custom resource bundles (en-ACME.ts) that link to other tools or improve the default messages. Only a small percentage of UI and server-side exception messages currently use these message bundles, usually to enhance a message in some way (e.g., providing links to internal Slack channels).
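The resource-bundle mechanism can be sketched as a keyed lookup with fallback to the default bundle. The bundle contents and the `getMessage` helper below are illustrative assumptions (en-ACME and the Slack channel are the article's own hypothetical examples, and the message strings are made up).

```typescript
// Hypothetical sketch: locale-style message bundles with fallback.
type MessageBundle = Record<string, string>;

// Default messages shipped externally (contents here are invented).
const enUS: MessageBundle = {
  "error.connection": "Unable to connect to the cluster.",
};

// A company-specific bundle overrides or extends the defaults,
// e.g. to point users at an internal support channel.
const enACME: MessageBundle = {
  "error.connection":
    "Unable to connect to the cluster. Ask for help in #data-support.",
};

function getMessage(key: string, bundle: MessageBundle): string {
  // Prefer the custom bundle, fall back to the default, then to the key.
  return bundle[key] ?? enUS[key] ?? key;
}
```

Because lookups fall back to the default bundle, a custom bundle only needs to define the handful of messages it wants to change.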

We invite you to try the project and let us know how it works for you. By sharing Netflix Data Explorer with the OSS community, we hope it will help you explore your data and inspire some new ideas.
