BigQuery: The story of my life
Me being a toddler:
I am BigQuery. I was born at Google. I am a cloud-based service that combines a NoSQL-style data store with “SQL-like” querying capabilities.
My mom said that, before I was born, querying massive datasets was hard without the right hardware and infrastructure. I solve this problem by enabling interactive analysis of massive datasets, working in conjunction with Google Cloud Storage. I am a fully managed, NoOps, low-cost analytics database: you can store petabytes of data, query it with familiar SQL, and pay only for what you use with a pay-as-you-go model.
When I was a crazy teen:
I had four remarkable basic concepts that still help me through life.
Don’t you want to know what they are? Chill, let me explain them to you…
Projects are top-level containers in Google Cloud Platform. Each project has a friendly name and a unique ID.
Tables contain your data. Each table has a schema that describes field names, types, and other information. In addition to tables containing data stored in managed storage, I also support views, which are virtual tables defined by a SQL query, and external tables, which are defined over data stored outside my managed storage, such as in Google Cloud Storage.
Datasets allow you to organize and control access to your tables. Because tables are contained in datasets, you’ll need to create at least one dataset before loading data into me.
Jobs are actions you construct and I execute on your behalf to load data, export data, query data, or copy data. Since jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status.
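The way these four concepts nest can be sketched as plain data: a project contains datasets, a dataset contains tables described by schemas, and jobs refer to tables by a fully qualified ID. A minimal illustration (all project, dataset, and table names here are made up):

```python
# Minimal sketch of BigQuery's resource hierarchy (all names hypothetical).
# A project contains datasets; a dataset contains tables; jobs act on them.

def table_ref(project_id, dataset_id, table_id):
    """Build the fully qualified table reference used in queries."""
    return f"{project_id}.{dataset_id}.{table_id}"

project = {
    "id": "my-analytics-project",          # unique project ID
    "friendlyName": "My Analytics",        # human-readable name
    "datasets": {
        "sales": {                         # dataset: organizes and scopes access
            "tables": {
                "orders": {                # table: rows described by a schema
                    "schema": [
                        {"name": "order_id", "type": "STRING"},
                        {"name": "amount",   "type": "FLOAT"},
                    ]
                }
            }
        }
    },
}

print(table_ref(project["id"], "sales", "orders"))
# my-analytics-project.sales.orders
```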
Now, a responsible grown-up:
Interacting with me is quite easy. There are three main ways to work with me:
(i) Loading and exporting data:
In most cases, you load data into my Storage. If you want to get the data back from me, you can export the data. You can also set up a table as a federated data source which allows you to use a query to transform your data as you load it.
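A load from Cloud Storage boils down to a job configuration. As a sketch, here is roughly what the JSON body of such a load job looks like when sent to my jobs.insert method (field shapes follow my v2 REST API; the project, dataset, and bucket names are invented):

```python
# Sketch of a jobs.insert request body that loads a CSV from Cloud Storage.
# Field names follow the BigQuery v2 REST API; all IDs here are made up.

def make_load_job(project_id, dataset_id, table_id, gcs_uri):
    """Build a load-job configuration for jobs.insert."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [gcs_uri],             # where the data lives
                "sourceFormat": "CSV",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "writeDisposition": "WRITE_APPEND",  # append to existing rows
            }
        }
    }

job = make_load_job("my-project", "sales", "orders", "gs://my-bucket/orders.csv")
print(job["configuration"]["load"]["sourceUris"])
```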
(ii) Querying and viewing data:
Once you load your data to me, there are a few ways to query or view the data in your tables:
- Calling the jobs.query() method
- Calling the jobs.insert() method with a query configuration
- Calling the tabledata.list() method
- Calling the jobs.getQueryResults() method
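The synchronous path can be sketched like this: you send a request body to jobs.query, and if the job has not finished within the timeout, you poll jobs.getQueryResults with the returned job ID. Request shapes follow my v2 REST API; the SQL and job ID below are made up:

```python
# Sketch of the jobs.query request body and the jobComplete check that
# decides whether to fall back to polling jobs.getQueryResults.
# (Field names follow the BigQuery v2 REST API; SQL and IDs are made up.)

def make_query_request(sql, timeout_ms=10000):
    """Build a jobs.query request body."""
    return {
        "query": sql,
        "useLegacySql": False,   # use standard SQL
        "timeoutMs": timeout_ms, # how long to wait before returning incomplete
    }

def needs_polling(response):
    """jobs.query responses carry jobComplete; poll getQueryResults if False."""
    return not response.get("jobComplete", False)

req = make_query_request("SELECT order_id, amount FROM sales.orders LIMIT 10")
print(needs_polling({"jobComplete": False, "jobReference": {"jobId": "job_123"}}))
# True: keep calling jobs.getQueryResults with that jobId until complete
```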
(iii) Managing data:
In addition to querying and viewing data, you can manage my data in the following ways:
- Listing projects, jobs, tables and datasets
- Getting information about jobs, tables and datasets
- Defining, updating or patching tables and datasets
- Deleting tables and datasets
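These management actions map onto my v2 REST surface as HTTP verbs over resource paths. A sketch of that mapping (paths follow the BigQuery API reference; the project, dataset, and table IDs are placeholders):

```python
# Sketch mapping the management actions above to BigQuery's v2 REST paths.
# Paths follow the public API reference; all IDs here are placeholders.

BASE = "https://bigquery.googleapis.com/bigquery/v2"

def dataset_path(project_id, dataset_id=None):
    """Path for listing datasets, or for one dataset if an ID is given."""
    path = f"{BASE}/projects/{project_id}/datasets"
    return f"{path}/{dataset_id}" if dataset_id else path

def table_path(project_id, dataset_id, table_id=None):
    """Path for listing tables in a dataset, or for one table."""
    path = f"{dataset_path(project_id, dataset_id)}/tables"
    return f"{path}/{table_id}" if table_id else path

# Listing:   GET    dataset_path("my-project")
# Getting:   GET    table_path("my-project", "sales", "orders")
# Patching:  PATCH  dataset_path("my-project", "sales")
# Deleting:  DELETE table_path("my-project", "sales", "orders")
print(table_path("my-project", "sales", "orders"))
```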
My recent Relationship with Congruent:
Phase 1 with Congruent:
My first task with Congruent was to ingest data from specific Salesforce tables.
- Congruent developed the necessary API routines to retrieve data from Salesforce and transfer it to me using the ‘chunking’ approach.
- They also made it possible to configure the time of day at which the download and export process runs.
Boo-ya!!! Nothing can hide from me: even if new fields are added to a Salesforce table, those fields get imported into me as well.
Phase 2 with Congruent:
The scope of this project with Congruent is to:
- Download the Segment files and Crosswalk files from an FTP site, extract them, and upload them to me.
- Process the data from these two inputs to identify households which are potential buyers of a product segment and generate an ‘Audience Builder User’ table in me.
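The processing step is essentially a join between the two inputs. As a purely hypothetical sketch of what that query could look like inside me, with every table and column name invented for illustration (these are not Congruent’s actual names):

```python
# Hypothetical sketch of the Phase 2 processing step: join Crosswalk
# households to Segment records to build the 'Audience Builder User' table.
# Every dataset, table, and column name below is invented for illustration.

AUDIENCE_SQL = """
CREATE OR REPLACE TABLE audience.audience_builder_user AS
SELECT c.household_id, s.segment_code
FROM inputs.crosswalk AS c
JOIN inputs.segment AS s
  ON c.match_key = s.match_key
WHERE s.buyer_score > 0.5
"""

print(AUDIENCE_SQL.strip().splitlines()[0])
```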
On completion of each process, I notify the users by email about the success or failure of the procedure.
A Start from an end:
I am mesmerizing a lot of data enthusiasts with my lightning-fast analytics database. Customers find my performance liberating, allowing them to experiment with enormous datasets without compromise. Finally, I found my purpose in this big data world. Have you?!