Automate CSV Import from Drive to BigQuery via Apps Script

Introduction

Importing CSV files into BigQuery is a common task for data analysts and engineers who need to transform raw data into a query‑able warehouse. While the BigQuery UI and the bq command‑line tool can handle one‑off loads, automating the process becomes essential when files are regularly updated or stored in Google Drive. Google Apps Script offers a lightweight, server‑less way to bridge Drive and BigQuery, allowing you to schedule imports, handle errors, and keep your datasets fresh without leaving the Google ecosystem. In this article we will walk through the complete workflow: preparing the CSV, configuring the script, executing the load job, and applying best‑practice tips to ensure reliable, scalable imports.

Preparing the CSV and Google Drive

Before writing any code, make sure the CSV file follows BigQuery’s formatting rules. Each row should be delimited by a newline, fields by commas (or another delimiter you specify), and the file must be encoded in UTF‑8. If the source data contains headers, decide whether to keep them (BigQuery can skip them) or remove them during preprocessing.

  • Store the file in a dedicated folder on Google Drive. This makes it easy to reference the file ID programmatically.
  • Set appropriate sharing permissions – the account under which the Apps Script runs (typically your own Google account, or a service account if you use one) needs at least read access to the file.
  • Optional: split very large files. BigQuery caps a single compressed CSV file at 4 GB, and Apps Script can only pass blobs of roughly 50 MB to advanced services, so files loaded through this approach should stay well under those limits.
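As a sketch of the first bullet, the helper below (the function name and folderId parameter are illustrative, not from the original article) finds the most recently updated CSV in a dedicated Drive folder using the DriveApp service:

```javascript
function latestCsvIdInFolder(folderId) {
  // Scan the folder for CSV files and return the ID of the newest one.
  var files = DriveApp.getFolderById(folderId).getFilesByType(MimeType.CSV);
  var newest = null;
  while (files.hasNext()) {
    var file = files.next();
    if (!newest || file.getLastUpdated() > newest.getLastUpdated()) {
      newest = file;
    }
  }
  return newest ? newest.getId() : null;
}
```

This avoids hard-coding a file ID when the folder receives a fresh export on each run.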

Creating the Apps Script Project

Open Google Apps Script (script.google.com) and start a new project. Add the BigQuery advanced service and enable the BigQuery API in the Google Cloud console. The core function will:

  • Locate the CSV by its Drive file ID.
  • Build a jobConfiguration object that defines the destination dataset, table name, schema (or autodetect), and load options such as skipLeadingRows.
  • Call BigQuery.Jobs.insert to start the load job.

Example snippet (simplified for clarity; replace YOUR_PROJECT_ID and YOUR_FILE_ID with your own values):

function loadCsvToBigQuery() {
  var projectId = 'YOUR_PROJECT_ID';
  var fileId = 'YOUR_FILE_ID';
  var datasetId = 'my_dataset';
  var tableId = 'my_table_' + Utilities.formatDate(new Date(), 'GMT', 'yyyyMMdd');

  // Load jobs cannot read Drive URIs directly, so fetch the CSV
  // contents from Drive and upload them along with the job request.
  var csvBlob = DriveApp.getFileById(fileId).getBlob();

  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        autodetect: true,
        skipLeadingRows: 1
      }
    }
  };

  var insertedJob = BigQuery.Jobs.insert(job, projectId, csvBlob);
  Logger.log('Started load job: ' + insertedJob.jobReference.jobId);
}

Running, Monitoring, and Automating the Load Job

After deploying the script, you can run it manually from the Apps Script editor or set up a time‑driven trigger (e.g., daily at 02:00 UTC) to automate recurring imports. Monitoring is essential: the Jobs.get method returns the job’s status, errors, and statistics such as rows processed and bytes loaded.
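Time-driven triggers can be configured from the editor's Triggers panel, or created programmatically with ScriptApp, as in this short sketch (it assumes the load function is named loadCsvToBigQuery, as in the snippet above):

```javascript
function createDailyTrigger() {
  // Run loadCsvToBigQuery once a day around 02:00 in the script's time zone.
  ScriptApp.newTrigger('loadCsvToBigQuery')
    .timeBased()
    .everyDays(1)
    .atHour(2)
    .create();
}
```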

  • Success handling – log the job ID and timestamp, optionally send an email confirmation.
  • Error handling – inspect job.status.errors, retry on transient failures, and alert stakeholders if the load repeatedly fails.
  • Cleanup – move processed CSVs to an archive folder or delete them to keep Drive tidy.
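A minimal polling helper along the lines of the bullets above might look like the following sketch (the function name and five-second interval are arbitrary choices, and the email or archiving steps from the list are left out for brevity):

```javascript
function waitForLoadJob(projectId, jobId) {
  // Poll Jobs.get until the load job leaves the RUNNING/PENDING states.
  var job = BigQuery.Jobs.get(projectId, jobId);
  while (job.status.state !== 'DONE') {
    Utilities.sleep(5000); // wait five seconds between polls
    job = BigQuery.Jobs.get(projectId, jobId);
  }
  if (job.status.errors && job.status.errors.length > 0) {
    // The job finished but rejected some or all rows; surface the first error.
    throw new Error('Load failed: ' + job.status.errors[0].message);
  }
  Logger.log('Rows loaded: ' + job.statistics.load.outputRows);
  return job;
}
```

A wrapper can catch the thrown error to retry transient failures or alert stakeholders.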

Best Practices and Troubleshooting Tips

To keep your pipeline robust, follow these guidelines:

  • Schema management – while autodetect is convenient, defining an explicit schema prevents unexpected type changes when new columns appear.
  • Partitioning and clustering – load data into partitioned tables (e.g., by date) to improve query performance and reduce cost.
  • Quota awareness – monitor BigQuery load job quotas; batching multiple small files into one load can avoid hitting limits.
  • Testing with sample data – run the script on a subset of rows first to verify column mapping and data quality.
  • Logging – use Logger.log() or Stackdriver Logging for detailed audit trails, especially when the script runs unattended.
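To illustrate the first two guidelines, a load configuration can carry an explicit schema and daily partitioning in place of autodetect. The field names and types below are purely illustrative; match them to your own CSV columns:

```javascript
function buildLoadConfigWithSchema(projectId, datasetId, tableId) {
  return {
    configuration: {
      load: {
        destinationTable: { projectId: projectId, datasetId: datasetId, tableId: tableId },
        skipLeadingRows: 1,
        // Explicit schema: no surprises if autodetect would guess differently.
        schema: {
          fields: [
            { name: 'order_id', type: 'STRING' },
            { name: 'order_date', type: 'DATE' },
            { name: 'amount', type: 'NUMERIC' }
          ]
        },
        // Partition the table by the date column to cut query cost.
        timePartitioning: { type: 'DAY', field: 'order_date' },
        writeDisposition: 'WRITE_APPEND'
      }
    }
  };
}
```

This object can be passed to BigQuery.Jobs.insert in place of the autodetect configuration shown earlier.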

Conclusion

By leveraging Google Apps Script, you can turn a manual CSV upload into a fully automated, auditable pipeline that moves data from Drive straight into BigQuery. The process starts with clean, properly formatted CSV files, continues with a concise script that builds a load job configuration, and ends with systematic monitoring and best‑practice safeguards. Once the script is scheduled, new data lands in your warehouse on a predictable cadence, freeing you to focus on analysis rather than data wrangling. Implement the steps outlined above, adapt them to your organization’s naming conventions and security policies, and you’ll have a reliable, scalable solution for keeping BigQuery up‑to‑date with the latest CSV exports.
