Move data from Hive tables in Google Dataproc to BigQuery

Question:

We are doing our data transformations with Google Dataproc, and all of our data resides in Dataproc Hive tables. How do I transfer this data to BigQuery?

Answer1:

Transferring data from Hive to BigQuery usually follows a standard pattern:

<ul><li>Dump your Hive tables into Avro files</li> <li>Load those files into BigQuery</li> </ul>
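The two steps above can be sketched as a small Python script that only builds the commands, so nothing runs against a real cluster. It assumes one possible way of doing the dump: an external Hive table declared `STORED AS AVRO` with a `LOCATION` on Cloud Storage (Dataproc ships the GCS connector, so `gs://` paths work from Hive), filled with `INSERT OVERWRITE TABLE ... SELECT *`. The Avro files are then loaded with `bq load --source_format=AVRO`. The helper names, table names, bucket and dataset are hypothetical, and the column list must match the order in which `SELECT *` returns the source columns.

```python
import shlex

# Hypothetical helper names for illustration; nothing here is an official API.
def hive_avro_export_sql(src_table, export_table, columns, gcs_dir):
    """Return HiveQL that copies src_table into an Avro-backed external table on GCS.

    `columns` is a list of (name, hive_type) pairs in the same order that
    `SELECT * FROM src_table` returns them (partition columns come last).
    """
    col_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {export_table} (\n  {col_defs}\n)\n"
        "STORED AS AVRO\n"
        f"LOCATION '{gcs_dir}';\n"
        f"INSERT OVERWRITE TABLE {export_table} SELECT * FROM {src_table};"
    )


def bq_load_argv(bq_table, gcs_dir):
    """Return a `bq load` argument list for the Avro files Hive wrote to gcs_dir."""
    return [
        "bq", "load",
        "--source_format=AVRO",
        # Load Avro logical types (e.g. date, timestamp-millis) as DATE/TIMESTAMP
        # instead of their underlying int/long.
        "--use_avro_logical_types",
        bq_table,
        gcs_dir.rstrip("/") + "/*",
    ]


if __name__ == "__main__":
    # Example values only: table, bucket and dataset names are made up.
    gcs_dir = "gs://my-bucket/exports/sales"
    print(hive_avro_export_sql("sales", "sales_avro_export",
                               [("id", "BIGINT"), ("name", "STRING")], gcs_dir))
    print(shlex.join(bq_load_argv("my_dataset.sales", gcs_dir)))
```

The generated HiveQL could be run with Beeline or `gcloud dataproc jobs submit hive --cluster=CLUSTER --region=REGION --execute=...`, and the `bq` command from any machine with the Cloud SDK. Avro is used because it carries its own schema, so `bq load` needs no separate schema file.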

See an example here: <a href="https://stackoverflow.com/questions/46958916/migrate-hive-table-to-google-bigquery/47038501#47038501" rel="nofollow">Migrate hive table to Google BigQuery</a>

As mentioned in the linked answer, pay attention to type compatibility between Hive, Avro and BigQuery.
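One way to act on this before migrating is to list each table's Hive column types and separate the simple ones from those that need a closer look. The sketch below assumes the commonly documented chain of Hive primitive type to Avro primitive (via Hive's AvroSerDe) to BigQuery type. DECIMAL, DATE, TIMESTAMP and complex types (ARRAY, MAP, STRUCT, UNIONTYPE) depend on the Hive version and on whether logical types are enabled during the load, so the sketch only flags them. The function and dictionary names are hypothetical.

```python
import re

# Hive primitive types whose Avro encoding loads into BigQuery as a plain
# column type. Anything not listed here is flagged for manual review.
STRAIGHTFORWARD = {
    "tinyint": "INTEGER",   # Avro int
    "smallint": "INTEGER",  # Avro int
    "int": "INTEGER",       # Avro int
    "bigint": "INTEGER",    # Avro long
    "float": "FLOAT",       # Avro float
    "double": "FLOAT",      # Avro double
    "boolean": "BOOLEAN",
    "string": "STRING",
    "varchar": "STRING",    # Avro string
    "char": "STRING",       # Avro string
    "binary": "BYTES",      # Avro bytes
}


def review_hive_columns(columns):
    """Split (name, hive_type) pairs into mapped BigQuery types and columns to review."""
    mapped, needs_review = {}, []
    for name, hive_type in columns:
        # "varchar(50)" -> "varchar", "decimal(10,2)" -> "decimal", "array<string>" -> "array"
        base = re.match(r"[a-z]+", hive_type.strip().lower())
        bq_type = STRAIGHTFORWARD.get(base.group(0)) if base else None
        if bq_type is None:
            needs_review.append(name)
        else:
            mapped[name] = bq_type
    return mapped, needs_review


if __name__ == "__main__":
    cols = [("id", "BIGINT"), ("price", "decimal(10,2)"), ("created", "timestamp")]
    mapped, review = review_hive_columns(cols)
    print(mapped)  # {'id': 'INTEGER'}
    print(review)  # ['price', 'created']
```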

For the first migration, it also would not hurt to validate the result by checking that the tables in Hive and BigQuery contain the same data, for example with this tool: <a href="https://github.com/bolcom/hive_compared_bq" rel="nofollow">https://github.com/bolcom/hive_compared_bq</a>
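The validation idea can be illustrated with a much simpler, independent sketch; this is not the algorithm used by hive_compared_bq. It assumes you can stream rows from both sides with whatever client you already use, and it computes a row count plus an order-independent checksum (the sum of per-row SHA-256 prefixes modulo 2^64). Values are normalized to text, so types that the two systems render differently (floats, decimals, timestamps) should be cast to a common format in the queries first. All function names are hypothetical.

```python
import hashlib
from collections import Counter


def row_digest(row):
    """Hash one row after normalizing values to text (NULL rendered as \\N)."""
    text = "\x1f".join("\\N" if v is None else str(v) for v in row)
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def table_fingerprint(rows):
    """Return (row_count, order-independent checksum) for an iterable of rows."""
    count, checksum = 0, 0
    for row in rows:
        count += 1
        checksum = (checksum + row_digest(row)) % (1 << 64)
    return count, checksum


def diff_rows(hive_rows, bq_rows):
    """For small samples: rows only in Hive and rows only in BigQuery (with multiplicity)."""
    hive_counts = Counter(tuple(r) for r in hive_rows)
    bq_counts = Counter(tuple(r) for r in bq_rows)
    return hive_counts - bq_counts, bq_counts - hive_counts


if __name__ == "__main__":
    hive = [(1, "a"), (2, None)]
    bq = [(2, None), (1, "a")]
    print(table_fingerprint(hive) == table_fingerprint(bq))  # True
```

Because the checksum is a sum, it does not depend on row order, so it can be computed while streaming each table without sorting. When fingerprints differ, `diff_rows` on a filtered subset helps locate the mismatching rows.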