Posts

Showing posts from March, 2019

Data Quake

Data Quake. That's what it is. Dave Wells has just given this great definition, which clearly describes what has been happening in the data management world in recent years. I am greatly enjoying Dave’s session today at the Enterprise Data World summit and couldn't resist writing down a summary. Everything that we did in the last decade is now considered wrong. We used to believe that application logic runs faster and does better if it sits inside the database layer. Now this architecture is considered a wrong choice. The same goes for data normalization and strong schemas. Some people even say that data warehouses are dead. We need to rethink everything. A data schema used to be defined during the design phase. Now we define schema-on-read, after the data has been persisted. The good news, which I have always believed and Dave has just mentioned, is that there is no schema-less data. Despite the fact that we do not get to design the schema anymore, for Big Data we need to un

Serverless ETL: Read, Enrich and Transform Data with AWS Glue Service

More and more companies are aiming to move away from managing their own servers and towards a cloud platform. Going serverless offers a lot of benefits, like lower administrative overhead and server costs. In a serverless architecture, developers work with event-driven functions that are managed by cloud services. Such an architecture is highly scalable and boosts developer productivity. AWS Glue is an ETL service that utilizes a fully managed Apache Spark environment. Glue ETL can clean and enrich your data and load it into common database engines inside the AWS cloud (EC2 instances or Relational Database Service), or write files to S3 storage in a great variety of formats, including Parquet. I have recently published 3 blog posts on how to use the AWS Glue service when you want to load data into SQL Server hosted on the AWS cloud platform. 1. Serverless ETL using AWS Glue for RDS databases 2. Join and Import JSON files from s3 to SQL Server RDS instance Part 1