Kubernetes is an open-source platform for container orchestration. You can use it to deploy highly resilient, self-healing infrastructure through automation and infrastructure as code (IaC). Kubernetes includes features for zero-downtime deployments, scaling, automatic rollout and rollback of updates, and service discovery. It is designed to help you manage container deployments at scale via a REST API, runs in any environment, including public and private clouds, and enables flexible distribution of workloads and resources. All in all, Kubernetes is a great solution for container orchestration. However, it has a steep learning curve, one that gets steeper once you start running it in production.
According to Portworx’s 2019 Container Adoption Survey, 87% of respondents reported running container technologies, a 32% increase from two years prior. Of these users, nine out of ten reported running containers in production, a 67% increase over the same period.
Of the 501 IT professionals surveyed, two-thirds also reported using one or more orchestration tools in addition to Kubernetes, a finding at least partially reflected in growing interest in hybrid and multi-cloud environments.
Portworx’s findings align with those of the Cloud Native Computing Foundation’s biannual survey, which found that 78% of organizations using Kubernetes were running it in production.
Many organizations get started with Kubernetes by creating a proof of concept (POC) deployment on an on-premises machine or a few cloud instances. This is a good way for engineers to become familiar with Kubernetes and to test simple configurations and how long they take to set up.
These POCs are typically not suitable for production, because they do not account for the components needed to manage and monitor deployments at scale. When you are ready to move to production, you should consider incorporating the following components to ensure a successful and secure deployment.
The easiest way to ensure a successful move to production Kubernetes is with a managed service. Managed Kubernetes as a service (KaaS) offerings are services that take care of some or all of Kubernetes maintenance and configuration for you. This can include migration, dedicated support, hosting, use of pre-configured environments, or full operations.
These services are designed for organizations that want to outsource some of the labor involved with Kubernetes, either because they don’t have (or don’t want to hire) a team with the necessary expertise and experience, or because they want to minimize the lower-level tasks their teams are responsible for.
Using a KaaS provider can grant access to Kubernetes expertise and guidance that are otherwise difficult to come by. Additionally, some providers offer enterprise-grade tooling and support that the open-source project itself does not provide.
Monitoring your deployment is critical to ensure that configurations remain correct, performance meets standards, and traffic remains secure. Without continuous monitoring and logging, you cannot diagnose issues, identify areas for improvement, or ensure compliance with regulations.
When monitoring, you should focus on collecting granular metrics across your entire infrastructure. In practice, this means adopting an Application Performance Management (APM) tool with Kubernetes support, such as Stackify Retrace.
As part of your monitoring, you need to set up logging capabilities on every layer of your architecture. The logs generated enable you to ingest system data with security tooling, audit system functionality, and analyze performance. When setting up log tooling, you should prioritize tools that centralize log data and can provide custom visualizations of data for easy interpretation, such as Retrace.
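A common way to capture container logs on every node is the DaemonSet pattern: a log-shipping agent runs on each node and tails the log files the kubelet writes under /var/log. The sketch below uses the open-source fluent/fluent-bit image as a stand-in agent (not a Retrace-specific integration); a real deployment would also supply the agent’s input/output pipeline via a ConfigMap, which is omitted here for brevity.

```yaml
# Minimal sketch of a node-level log collector as a DaemonSet.
# Assumptions: a "logging" namespace exists, and fluent-bit is used purely
# as an example agent; its pipeline config (ConfigMap) is omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2   # example log-shipping agent
        volumeMounts:
        - name: varlog
          mountPath: /var/log          # container logs live under /var/log/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```

Because a DaemonSet schedules one Pod per node, this pattern keeps collecting logs as your cluster scales out, with no per-application changes.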
Production deployments require configuration of resource requests and limits. Requests reserve a baseline amount of a resource that the scheduler guarantees to a container, while limits are hard maximums on the amount it can consume. These specifications let you control how containers consume resources and ensure that resources are distributed efficiently.
You should set requests and limits for CPU, memory, and ephemeral storage to prevent individual containers from hogging your resources. These configurations help keep your services available and limit the damage a compromised container can cause.
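As a minimal sketch, the Pod spec below sets requests and limits for all three resource types; the image name and values are placeholders to adapt to your own workloads.

```yaml
# Minimal sketch: per-container requests (used for scheduling) and
# limits (enforced at runtime). Names and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: web-app
    image: nginx:1.25            # placeholder image
    resources:
      requests:
        cpu: "250m"              # a quarter of a CPU core
        memory: "256Mi"
        ephemeral-storage: "1Gi"
      limits:
        cpu: "500m"
        memory: "512Mi"
        ephemeral-storage: "2Gi"
```

Note the different enforcement behaviors: a container that exceeds its memory limit is terminated (OOM-killed), while CPU usage above the limit is throttled rather than killed.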
Your etcd cluster is where all configuration and status information about your deployment is stored. If the data in your cluster is lost or compromised, your deployment cannot function. While operating etcd in an external cluster can add some resiliency and security, this is not foolproof.
To truly preserve your data, you should take regular backups and save them to an external host. You can do this with etcd’s built-in snapshot command or by copying the snapshot file from your etcd member directory. If you are hosted on public cloud resources, taking a snapshot of your storage volume is the easiest method.
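One way to automate this is to run etcdctl snapshot save on a schedule. The CronJob below is a hedged sketch, not a drop-in solution: it assumes a kubeadm-style cluster (certificates under /etc/kubernetes/pki/etcd, etcd listening on the control plane node’s loopback address), and the image tag, backup directory, and node labels all need to match your own cluster.

```yaml
# Sketch of a scheduled etcd backup. Assumptions: kubeadm default cert
# paths, etcd reachable on 127.0.0.1:2379 from a control plane node,
# and an etcd image tag matching your cluster's etcd version.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"              # every six hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true            # reach etcd on the node's loopback
          restartPolicy: OnFailure
          nodeSelector:                # pin to a control plane node
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          containers:
          - name: etcd-backup
            image: registry.k8s.io/etcd:3.5.9-0   # match your etcd version
            command:
            - /bin/sh
            - -c
            - etcdctl snapshot save /backup/etcd-$(date +%Y%m%d%H%M).db
            env:                       # etcdctl reads ETCDCTL_* variables
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_ENDPOINTS
              value: https://127.0.0.1:2379
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/server.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /var/backups/etcd  # ship these files off-host as well
```

Writing snapshots to a hostPath alone does not satisfy the external-host requirement; pair this with a job or agent that copies the snapshot files off the node.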
When setting up your etcd cluster, take care with the election timeout and heartbeat parameters. The election timeout determines how long an etcd follower waits before initiating a new election when communication with the leader is cut off. A heartbeat is a signal sent by your etcd leader to the followers to confirm an active connection.
These parameters are key because they determine how long your etcd cluster may be without a leader before a new one is elected. In general, it is recommended that your heartbeat interval match the round-trip time for communications between leader and followers, and that your election timeout be around ten times the heartbeat interval.
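Both values are set as flags on the etcd binary, in milliseconds. The excerpt below shows only the relevant flags as they might appear in a kubeadm-managed etcd static Pod manifest (typically /etc/kubernetes/manifests/etcd.yaml); the values shown assume roughly a 100 ms round-trip time between members.

```yaml
# Excerpt: tuning flags in an etcd static Pod manifest. etcd's defaults
# are 100 ms and 1000 ms; adjust to your measured member-to-member RTT.
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --heartbeat-interval=100    # ms; roughly the leader-follower RTT
    - --election-timeout=1000     # ms; about 10x the heartbeat interval
```

Setting these too low causes spurious leader elections on transient network slowness; setting them too high extends the window in which the cluster cannot accept writes after a leader failure.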
Production deployments contain significant amounts of valuable and often sensitive data. Additionally, compromised deployments may allow attackers access to some or all of your system resources. Because of this, proper security is essential.
To secure your deployment, you need to account for infrastructure access, networking, container security, and application security. This should involve applying role-based access control (RBAC), multi-factor authentication (MFA), network encryption, vulnerability scanning, and endpoint protections.
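As a minimal RBAC sketch, the manifests below grant read-only access to Pods in a single namespace to a hypothetical deploy-bot service account; the namespace and names are placeholders.

```yaml
# Minimal RBAC sketch: a namespaced read-only role for Pods, bound to a
# hypothetical service account. Namespace and names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]                 # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: read-pods
subjects:
- kind: ServiceAccount
  name: deploy-bot                # hypothetical automation account
  namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Following the principle of least privilege, prefer namespaced Roles over ClusterRoles whenever a subject only needs access within a single namespace.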
You should evaluate and prioritize security throughout your development, configuration, and deployment processes. You should also leverage your monitoring to ensure that security is maintained and that configurations are not changed without your awareness.
When running Kubernetes in production, there are many aspects to consider. Kubernetes is a complex, feature-rich platform, and production operations should be treated with care. If you are facing an in-house talent shortage, you can outsource this work to a KaaS provider. If you run Kubernetes in-house, follow established best practices, and pay special attention to cluster monitoring and logging, resource request configuration, and etcd data protection.