Writing Samples

Thanks for checking out a bit of my work. I’ve added commentary to some of these pieces for context. Let me know what you think!

Implementing Change Data Capture using GCP Datastream

Datastream is a serverless and easy-to-use change data capture (CDC) and replication service. It allows you to synchronize data across heterogeneous databases and applications reliably, and with minimal latency and downtime.

Datastream supports streaming from Oracle and MySQL databases into Cloud Storage. The service offers streamlined integration with Dataflow templates to power up-to-date materialized views in BigQuery for analytics, replicate your databases into Cloud SQL or Cloud Spanner for database synchronization, or leverage the event stream directly from Cloud Storage to realize event-driven architectures.

Benefits of Datastream include:

a. Serverless, so there are no resources to provision or manage and the service scales automatically.

b. Easy-to-use setup and monitoring.

c. Reliable replication across heterogeneous databases and applications, with minimal latency and downtime.

d. Secure, with private connectivity options and data encrypted in transit and at rest.

Step 1: Create a MySQL instance

Create the MySQL instance as shown above.

Step 2: Configure MySQL for change data capture

If binary log retention is not already set to 7 days, use the steps below to set it to 7 days.

In the MySQL configuration file, add the following under the [mysqld] section:

[mysqld]
log-bin=mysql-bin
server-id=1
log-slave-updates=true

With this step, the MySQL configuration is complete. Later, we will add the Datastream IP addresses to the MySQL instance's allowed networks so its traffic is permitted.
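After restarting MySQL, the settings can be verified from a shell with the mysql client. This is only a sketch: the host and user values are placeholders, and it assumes a MySQL 5.7-style instance (on MySQL 8.0, binlog_expire_logs_seconds replaces expire_logs_days for retention).

# Confirm binary logging is enabled and row-based, which Datastream's MySQL CDC requires
mysql -h <mysql-host> -u <admin-user> -p -e "SHOW VARIABLES LIKE 'log_bin';"
mysql -h <mysql-host> -u <admin-user> -p -e "SHOW VARIABLES LIKE 'binlog_format';"

# Check that binary logs are kept for the 7 days mentioned above
mysql -h <mysql-host> -u <admin-user> -p -e "SHOW VARIABLES LIKE 'expire_logs_days';"

# The user Datastream connects as needs replication privileges (per the Datastream MySQL prerequisites);
# 'datastream'@'%' is a hypothetical account created for this purpose
mysql -h <mysql-host> -u <admin-user> -p -e "GRANT REPLICATION SLAVE, SELECT, REPLICATION CLIENT ON *.* TO 'datastream'@'%';"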

Step 3: Creating connection profiles in Datastream

Connection profiles need to be created for both the source and the sink/target before the streaming pipeline can be configured. Below are the connections to be created for this use case –

a. MySQL connection profile — the MySQL source connection

The Datastream IP addresses need to be whitelisted on the MySQL instance. Below are the IPs –

b. Storage connection profile — GCS as the sink/target

Add the IP addresses to the MySQL instance under "Add network" to allow traffic.
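For reference, the same setup can be done from the command line. The sketch below assumes a Cloud SQL-hosted MySQL instance and the gcloud CLI's Datastream commands; the profile IDs, region, bucket, and credential values are placeholders, and the exact flag names should be confirmed with gcloud datastream connection-profiles create --help.

# Allow traffic from the Datastream IPs on a Cloud SQL MySQL instance
gcloud sql instances patch my-mysql-instance \
    --authorized-networks=<datastream-ip-1>,<datastream-ip-2>

# MySQL connection profile (source)
gcloud datastream connection-profiles create mysql-source-profile \
    --location=us-central1 \
    --display-name="mysql source connection" \
    --type=mysql \
    --mysql-hostname=<mysql-host> \
    --mysql-port=3306 \
    --mysql-username=datastream \
    --mysql-password=<password> \
    --static-ip-connectivity

# Storage connection profile (GCS as the sink/target)
gcloud datastream connection-profiles create gcs-target-profile \
    --location=us-central1 \
    --display-name="gcs target connection" \
    --type=google-cloud-storage \
    --bucket=<bucket-name> \
    --root-path=/cdc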

Step 4: Creating the stream pipeline

Once the connection profiles are created, we can create the stream and configure change data capture –

Configuration steps –

Continue to configure the source –

Select the connection profile and run a test to check connectivity –

The test must pass before the connection can be configured –

Continue to configure the stream for tables/objects/schemas –

Define sink/target destination –

Configure destination –

Create stream –

Once the stream is created, it won't run until it is started. This can also be done in one step using "Create and start". Once the stream has started, it will be shown like this –
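The stream can also be created and started without the console. This sketch assumes gcloud's Datastream commands, reuses the hypothetical connection profile IDs from Step 3, and supplies the source and destination details as JSON files; check gcloud datastream streams create --help for the exact flags and JSON layout.

# mysql_source_config.json lists the databases/tables to include;
# gcs_destination_config.json sets the output path and file format.
gcloud datastream streams create mysql-to-gcs-stream \
    --location=us-central1 \
    --display-name="mysql to gcs stream" \
    --source=mysql-source-profile \
    --mysql-source-config=mysql_source_config.json \
    --destination=gcs-target-profile \
    --gcs-destination-config=gcs_destination_config.json \
    --backfill-all

# Starting the stream is a state update to RUNNING (equivalent to the console's "Create and start")
gcloud datastream streams update mysql-to-gcs-stream \
    --location=us-central1 \
    --state=RUNNING \
    --update-mask=state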

GCS target –

The stream output is stored in GCS as shown below –
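The files Datastream writes can be inspected directly; a small sketch with a placeholder bucket name and the root path used above:

# List the change files written under the stream's root path
gsutil ls -r gs://<bucket-name>/cdc/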

Once the feed is being captured in GCS, a follow-up job can be created and run to load these changes into one of the GCP database services. One way to implement this is to create a Dataflow job that captures the changes from GCS and writes them to BigQuery, Spanner, or Cloud SQL. Google provides these jobs as templates, which can be created and used quickly to build pipelines. Such a job runs as a real-time streaming job and writes the changes to the target table/database.
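One of those Google-provided templates is the Datastream to BigQuery Dataflow flex template. The sketch below assumes that template; the template path, parameter names, and the placeholder bucket and dataset values should be verified against the template's reference page before use.

# Run the Datastream to BigQuery flex template against the stream's GCS output
gcloud dataflow flex-template run datastream-to-bq \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_to_BigQuery \
    --parameters inputFilePattern=gs://<bucket-name>/cdc/,outputStagingDatasetTemplate=datastream_staging,outputDatasetTemplate=datastream_replica,deadLetterQueueDirectory=gs://<bucket-name>/dlq/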

