
Me being a toddler:
I am BigQuery, and I come from Google. I am a cloud-based service that combines a NoSQL-style data store with “SQL-like” querying capabilities.
My mom said that before I was born, querying massive datasets was a hurdle without the proper hardware and infrastructure. Now I solve this problem by enabling interactive analysis of massive datasets, working in conjunction with Google Cloud Storage. I am a fully managed, NoOps, low-cost analytics database where you can store petabytes of data, query it with familiar SQL, and pay only for what you use on a pay-as-you-go model.
When I was a crazy teen:
I picked up four basic concepts that still help me through life.
- Projects
- Tables
- Datasets
- Jobs
Don’t you want to know what they all are? Chill, let me explain them to you…
- Projects:
Projects are top-level containers in Google Cloud Platform. Each project has a friendly name and a unique ID.
- Tables:
Each table has a schema that describes field names, types, and other information. In addition to tables containing data stored in managed storage, I also support views, which are virtual tables defined by a SQL query, and external tables, which are tables defined over data stored outside my managed storage.
- Datasets:
Datasets allow you to organize and control access to your tables. Because tables are contained in datasets, you’ll need to create at least one dataset before loading data into me.
- Jobs:
Jobs are actions you construct and I execute on your behalf to load data, export data, query data, or copy data. Since jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status.
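To picture what "polled for their status" could look like, here is a minimal sketch in Python. `FakeJob` and `wait_for_job` are made-up names for illustration and are not part of my API; the only thing borrowed from me is the set of job states (PENDING, RUNNING, DONE). A real client would ask me for the job's status instead of reading a scripted list.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a job; replays a scripted sequence of states."""

    def __init__(self, states):
        self._states = iter(states)
        self.state = "PENDING"

    def reload(self):
        # Advance to the next scripted state; stay on the last one afterwards.
        self.state = next(self._states, self.state)


def wait_for_job(job, poll_interval=0.0, max_polls=10):
    """Poll the job until it reports DONE; return how many polls it took."""
    for attempt in range(1, max_polls + 1):
        job.reload()
        if job.state == "DONE":
            return attempt
        time.sleep(poll_interval)
    raise TimeoutError(f"job still {job.state} after {max_polls} polls")


job = FakeJob(["PENDING", "RUNNING", "RUNNING", "DONE"])
polls_needed = wait_for_job(job)
print(polls_needed, job.state)
```

In practice you would pick a non-zero `poll_interval` so that a long-running job is not checked in a tight loop.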
Now, a responsible grown-up:
Interacting with me is quite easy. There are three ways to do it:
(i) Loading and exporting data:
In most cases, you load data into my storage. If you want to get the data back out of me, you can export it. You can also set up a table as a federated data source, which lets you use a query to transform your data as you load it.
(ii) Querying and viewing data:
Once you load your data to me, there are a few ways to query or view the data in your tables:
Querying data
- Calling the jobs.query() method
- Calling the jobs.insert() method with a query configuration
Viewing data
- Calling the tabledata.list() method
- Calling the jobs.getQueryResults() method
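Whichever of these methods you call, I hand rows back in a compact JSON shape: each row has an `f` list of cells, each cell has a `v` value, and the column names live separately under `schema.fields`. The sketch below shows one way to turn that into plain dictionaries. The helper name `rows_to_dicts`, the sample query and the sample values are invented for illustration, and no request is actually sent to me here.

```python
def rows_to_dicts(response):
    """Turn the f/v row encoding into one dict per row, keyed by field name."""
    names = [field["name"] for field in response["schema"]["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in response.get("rows", [])
    ]


# A request body of the kind jobs.query() accepts (illustrative query text).
request_body = {"query": "SELECT word, total FROM my_dataset.my_table",
                "useLegacySql": False}

# Hand-written sample shaped like a query response; the values are invented.
sample_response = {
    "jobComplete": True,
    "schema": {"fields": [{"name": "word", "type": "STRING"},
                          {"name": "total", "type": "INTEGER"}]},
    "rows": [
        {"f": [{"v": "hello"}, {"v": "3"}]},
        {"f": [{"v": "world"}, {"v": "5"}]},
    ],
}

records = rows_to_dicts(sample_response)
print(records)
```

Note that the sketch keeps every value exactly as it arrives. In the sample, the integer column comes back as strings, so converting values according to the `type` in the schema would be a separate step.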
(iii) Managing data:
In addition to querying and viewing data, you can manage my data in the following ways:
- Listing projects, jobs, tables and datasets
- Getting information about jobs, tables and datasets
- Defining, updating or patching tables and datasets
- Deleting tables and datasets
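All of these management calls address my resources through the same nesting: projects contain datasets, and datasets contain tables. As a rough sketch, the helper below builds REST-style resource URLs that follow that nesting. The function name `resource_url` and the example IDs are hypothetical; treat the base URL as an assumption to check against my current API reference.

```python
from urllib.parse import quote

# Assumed base URL for the v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def resource_url(project_id, dataset_id=None, table_id=None):
    """Build the URL of a project, dataset, or table resource."""
    if table_id is not None and dataset_id is None:
        raise ValueError("a table must be addressed through its dataset")
    parts = ["projects", project_id]
    if dataset_id is not None:
        parts += ["datasets", dataset_id]
    if table_id is not None:
        parts += ["tables", table_id]
    # Percent-encode each path segment so unusual IDs cannot break the path.
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


table_url = resource_url("my-project", "sales", "orders")
datasets_url = resource_url("my-project") + "/datasets"  # collection to list
print(table_url)
print(datasets_url)
```

Getting or deleting a resource would use its own URL, while listing would use the collection URL one level up.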
My recent Relationship with Congruent:
Phase 1 with Congruent:
My first task with Congruent was to ingest data from specific Salesforce tables.
- Congruent developed the necessary routines using APIs to retrieve data from Salesforce and transfer it to me, using the ‘chunking’ approach.
- They also made it possible to configure the time of day at which the download and export process runs.
Boo-ya!!! Nothing can hide from me. Even if new fields are added to a Salesforce table, those fields get imported into me too.
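I can't show you Congruent's actual routines, but the two ideas above can be sketched under some loud assumptions: that "chunking" means splitting the retrieved records into fixed-size batches, and that new fields are picked up by comparing each batch against the field names already known. The function names, the field names and the sample records below are all invented for illustration.

```python
def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def merged_field_names(known_fields, chunk):
    """Return the known fields plus any new ones seen in the chunk, in order."""
    fields = list(known_fields)
    for record in chunk:
        for name in record:
            if name not in fields:
                fields.append(name)
    return fields


# Invented sample records; the last one carries a newly added field.
records = [
    {"Id": "001", "Name": "A"},
    {"Id": "002", "Name": "B"},
    {"Id": "003", "Name": "C", "Region__c": "West"},
]

fields = ["Id", "Name"]
chunk_sizes = []
for chunk in chunked(records, 2):
    fields = merged_field_names(fields, chunk)
    chunk_sizes.append(len(chunk))  # a real routine would upload the chunk here

print(chunk_sizes, fields)
```

Working in batches keeps each transfer small, and checking field names per batch means a new column is noticed before its data reaches me.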
Phase 2 with Congruent:
The scope of this project with Congruent is to:
- Download from an FTP site, extract and upload the Segment files and Crosswalk files to me.
- Process the data from these two inputs to identify households which are potential buyers of a product segment and generate an ‘Audience Builder User’ table in me.
I will notify users by email about the success or failure of each process as soon as it completes.
A Start from an end:
I am mesmerizing a lot of data enthusiasts with my lightning-fast analytics database. Customers find my performance liberating, allowing them to experiment with enormous datasets without compromise. Finally, I found my purpose in this big data world. Have you?!