Our client, a major American music corporation, is a production house of media and entertainment data, operating in more than 60 countries. As part of its key business operations, the client maintains a comprehensive music catalogue covering various forms of music consumption. It also frequently ingests data from its key distribution partners (Spotify, Apple, Pandora, Rhapsody, YouTube and others) to obtain timely, country-wise insights on sales and revenue from its proprietary albums and recording labels. To handle the massive incoming music and entertainment (M&E) data, the client used the AWS Redshift data warehouse for storage and Amazon S3 as its data lake.
While trying to achieve its business needs, the client faced the following challenges:
- Increased Costs: Redshift bills on hourly usage, with pricing fixed by cluster size, making it costly at low query volumes. As the incoming M&E data from streaming partners was distributed across nodes on a periodic basis, the cluster could not handle the increased data volume.
- Frequent Performance Latencies: The existing system introduced a lag of nearly 24 hours, driven by data retrieval time, processing time and client waiting hours. Gaining timely insights from high-volume streaming data thus became difficult.
- Increased Complexity: A critical drawback of Redshift was that it required constant low-level tuning of the virtualized hardware and database configurations. Managing its data clusters demanded additional expertise and offered limited flexibility in resource management.
- Difficulty in Scalability: Due to the exponential increase in data volumes from streaming partners, the existing system could not scale up and process data faster, even with additional investment. This had a proportionate impact on downstream processes such as deep-dive analyses of sales data (LYSD - last year, same day), which feed key tactical business decisions.
To handle the above challenges, the client sought an offshore technical partner with the right business expertise to re-platform and migrate its data from the legacy infrastructure to a more scalable, cost-optimized data warehouse solution, i.e., a move from AWS to Google Cloud Platform. This offered faster data processing and insight generation, for better business agility.
The music and entertainment data, previously stored in the Redshift data warehouse and the Amazon S3 data lake, was migrated incrementally to Google BigQuery, with data lake support provided by Google Cloud Storage.
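The incremental setup described above can be sketched as a watermark-driven batching step: each run computes which daily partitions have not yet been copied into BigQuery, then points a load job at the matching Cloud Storage paths. The bucket layout, path scheme and dates below are purely illustrative, not the client's actual configuration.

```python
from datetime import date, timedelta

def pending_partitions(watermark: date, today: date) -> list[str]:
    """Return the daily partition suffixes (YYYYMMDD) that still need to be
    migrated, given the last partition already loaded (the watermark)."""
    days = (today - watermark).days
    return [
        (watermark + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(1, days + 1)
    ]

def gcs_load_uri(bucket: str, partition: str) -> str:
    """Build the Cloud Storage URI a BigQuery load job would read from.
    The bucket name and directory layout here are hypothetical."""
    return f"gs://{bucket}/streaming_sales/dt={partition}/*.avro"

# Example: three daily partitions remain between the watermark and today.
todo = pending_partitions(date(2020, 1, 10), date(2020, 1, 13))
uris = [gcs_load_uri("me-data-lake", p) for p in todo]
```

Each URI would then feed a separate BigQuery load job (via the `bq` CLI or the client libraries), so a failed day can be re-run without touching partitions that were already migrated.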
Google BigQuery offered the following advantages:
- Improved Features and Decreased Costs: BigQuery can handle and store massive data sets through a familiar RDBMS-style SQL interface. With its effective resource management, it abstracts away the underlying hardware, database and other configuration details. Its pricing is based on the amount of data processed by queries rather than on a provisioned cluster, decreasing overall costs significantly.
- Reduced Latencies: While offering superior usability, performance and cost for analytical use cases, especially at scale, BigQuery delivers quicker response times and better performance options.
- Manageability and Usability: Google BigQuery supports data optimization for fast queries, efficient resource utilization distributed over time, query response times of a few minutes on large data sets, and minimal database tuning. It also performs well with data distributed across a defined number of nodes. One caveat is that BigQuery's performance can fluctuate substantially: the same query against the same data set may run twice (or half) as fast on different days, especially for SQL-like queries against multi-terabyte data sets.
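The LYSD (last year, same day) analysis mentioned earlier maps naturally onto a single BigQuery Standard SQL query. The sketch below builds such a query string; the table and column names (`partner_sales`, `country`, `sale_date`, `revenue`) are illustrative assumptions, not the client's schema.

```python
def lysd_query(sales_table: str, report_date: str) -> str:
    """Build a BigQuery Standard SQL query comparing revenue on a given day
    with the same calendar day one year earlier (LYSD). Table and column
    names are hypothetical."""
    return f"""
SELECT
  country,
  SUM(IF(sale_date = DATE '{report_date}', revenue, 0)) AS revenue_ty,
  SUM(IF(sale_date = DATE_SUB(DATE '{report_date}', INTERVAL 1 YEAR),
         revenue, 0)) AS revenue_lysd
FROM `{sales_table}`
WHERE sale_date IN (DATE '{report_date}',
                    DATE_SUB(DATE '{report_date}', INTERVAL 1 YEAR))
GROUP BY country
ORDER BY revenue_ty DESC
""".strip()

sql = lysd_query("project.music.partner_sales", "2020-06-01")
```

Because BigQuery bills on bytes scanned, restricting the `WHERE` clause to the two relevant dates (ideally on a date-partitioned table) keeps the cost of this daily comparison low.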
The other key GCP components used were:
- Google Compute Engine: an infrastructure-as-a-service (IaaS) solution providing virtual machine instances to host the workloads driven by M&E data volumes from music consumers.
- Google Cloud Storage: a data lake solution analogous to Amazon S3, used to store large, unstructured sets of incoming media and entertainment data.
- Google Cloud Dataflow: a data processing service intended for ETL operations and analytics, supporting real-time big data processing of unstructured business data.
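The ETL work such a Dataflow pipeline performs boils down to parse, filter and aggregate steps (expressed in Apache Beam as ParDo, Filter and CombinePerKey transforms). The plain-Python sketch below shows that transform logic on in-memory data; the record fields (`country`, `partner`, `revenue`) are illustrative assumptions.

```python
import json
from collections import defaultdict

def parse_event(line: str):
    """Parse one raw partner record; return None for malformed input so the
    pipeline drops it instead of failing. Field names are hypothetical."""
    try:
        event = json.loads(line)
        return (event["country"], event["partner"]), float(event["revenue"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None

def aggregate_revenue(lines):
    """Parse -> filter -> group-and-sum: the shape of the per-key
    aggregation a Beam pipeline would run at scale on Dataflow."""
    totals = defaultdict(float)
    for parsed in filter(None, map(parse_event, lines)):
        key, revenue = parsed
        totals[key] += revenue
    return dict(totals)

raw = [
    '{"country": "US", "partner": "Spotify", "revenue": 12.5}',
    '{"country": "US", "partner": "Spotify", "revenue": 7.5}',
    '{"country": "DE", "partner": "YouTube", "revenue": 3.0}',
    'not-valid-json',
]
totals = aggregate_revenue(raw)
```

In the managed pipeline, the aggregated results would be written to BigQuery while the raw records remain in Cloud Storage, so malformed inputs can be inspected and replayed later.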
The migration delivered the following outcomes:
- Performance Improvement of up to 75%, due to the adoption of the robust, cost-optimized Google BigQuery, which offers consistent performance and pricing based on the amount of data processed per query.
- Reduced Data Latency: Post-migration to Google Cloud, the same tasks complete within 1 hour, a considerable reduction from the earlier lag of nearly 24 hours.
- Significant Reduction in Downtime: Following industry best practices, the client's infrastructure now keeps pace with its data scale and query patterns. The M&E data has been running on the new Google BigQuery system for the past year with minimal downtime.
- Near Real-Time Updates: Support for updates on roughly 4-hour cycles, based on inputs from 10-15 partners and over 70 priority partners, supporting downloads and other live streaming options.