Best Practices on Securing Big Data clusters

If you are a developer, machine learning engineer, big data engineer, or data scientist using Amazon EMR clusters, you face fast-changing workloads and data security concerns. With Amazon EMR's security features, engineers can enable data security out of the box when setting up a data lake.

Amazon EMR is a managed cluster platform from Amazon that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS. It lets organizations spin up a multi-instance cluster in a matter of minutes and run data engineering and business intelligence workloads in parallel. This can substantially reduce data processing time, as well as the effort and cost of establishing and scaling a cluster. One reason customers choose Amazon EMR is its security features. Customers in regulated industries such as financial services, healthcare, and manufacturing often make EMR part of their data strategy, largely because its security features help them meet PCI and HIPAA requirements.

At a high level, Amazon EMR provides encryption at rest and encryption in transit.

Encryption at rest

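There are several ways to encrypt data at rest. By default, EMR uses the EMR File System (EMRFS) to read data from Amazon S3 (the data lake). To encrypt that data in S3, AWS offers the following options:

Server-side encryption with S3-managed keys (SSE-S3) or AWS KMS keys (SSE-KMS), and client-side encryption with AWS KMS keys (CSE-KMS) or a custom key provider (CSE-Custom, also written CSE-C). All of these encryption modes are available out of the box in the console when setting up the data lake environment.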
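The same choice can also be expressed as an EMR security configuration document rather than through the console. Below is a minimal sketch in Python that builds the at-rest portion for the S3 modes named above. The function name `build_s3_at_rest_config` and the example key ARN are hypothetical, and the JSON field names reflect my understanding of the EMR security configuration format, so check them against the EMR documentation before relying on them. The custom-provider mode is left out because it needs extra provider settings.

```python
import json

# S3 encryption modes covered by this sketch. The custom key provider mode
# is omitted because it requires additional provider settings.
SUPPORTED_MODES = ("SSE-S3", "SSE-KMS", "CSE-KMS")


def build_s3_at_rest_config(mode, kms_key_arn=None):
    """Return a dict describing S3 (EMRFS) at-rest encryption for EMR.

    `mode` is one of SUPPORTED_MODES. `kms_key_arn` is required for the
    KMS-backed modes and ignored for SSE-S3.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode}")

    s3_config = {"EncryptionMode": mode}
    if mode in ("SSE-KMS", "CSE-KMS"):
        if not kms_key_arn:
            raise ValueError(f"{mode} requires a KMS key ARN")
        s3_config["AwsKmsKey"] = kms_key_arn

    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_config
            },
        }
    }


# Hypothetical key ARN, for illustration only.
example = build_s3_at_rest_config(
    "SSE-KMS", "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(json.dumps(example, indent=2))
```

The resulting JSON string could then be registered with boto3's `create_security_configuration` call on the EMR client, which takes a `Name` and a `SecurityConfiguration` string.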

Encryption in transit

Amazon EMR security configurations enable customers to choose a method for encrypting data in transit using Transport Layer Security (TLS) (as described in the EMR documentation).

You can do either of the following:
• Manually create PEM certificates, zip them into a single file, and reference that file in Amazon S3.
• Implement a custom certificate provider in Java and specify the S3 path to the JAR.
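The two options above map onto two shapes of the in-transit section of a security configuration. Here is a minimal Python sketch that builds either one. The function name `build_in_transit_config`, the bucket paths, and the Java class name are hypothetical placeholders, and the JSON field names are my understanding of the EMR security configuration format, so confirm them against the EMR documentation.

```python
import json


def build_in_transit_config(provider_type, s3_object, provider_class=None):
    """Return a dict describing TLS in-transit encryption for EMR.

    provider_type: "PEM" for a zip of PEM certificates in S3, or
                   "Custom" for a Java certificate provider JAR in S3.
    s3_object:     S3 path to the zip (PEM) or the JAR (Custom).
    provider_class: fully qualified Java class name, required for "Custom".
    """
    if provider_type not in ("PEM", "Custom"):
        raise ValueError(f"unsupported provider type: {provider_type}")

    tls = {"CertificateProviderType": provider_type, "S3Object": s3_object}
    if provider_type == "Custom":
        if not provider_class:
            raise ValueError("Custom provider requires a Java class name")
        tls["CertificateProviderClass"] = provider_class

    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": tls
            },
        }
    }


# Option 1: zipped PEM certificates (hypothetical bucket and file name).
pem_config = build_in_transit_config(
    "PEM", "s3://example-bucket/certs/my-certs.zip"
)

# Option 2: custom Java provider (hypothetical JAR path and class name).
custom_config = build_in_transit_config(
    "Custom",
    "s3://example-bucket/providers/my-provider.jar",
    provider_class="com.example.MyCertificateProvider",
)

print(json.dumps(pem_config, indent=2))
print(json.dumps(custom_config, indent=2))
```

Keeping the provider type as an explicit argument makes it easy to start with PEM certificates and move to a custom provider later without changing the rest of the configuration.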
