


While Google App Engine has many strengths, as with all platforms, there are some challenges to be aware of. Over the last two years, one of our biggest challenges at Pulse has been how difficult it can be to export large amounts of data for migration, backup, and integration with other systems. While there are several options and tools, so far none have been feasible for large datasets (10GB+).

Since we have many TBs of data in Datastore, we’ve been actively looking for a solution to this for some time. I’m excited to share a very effective approach based on Google Cloud Storage and Datastore Backups, along with a method for converting the data to other formats!

Existing Options For Data Export

These options have been around for some time. They are often promoted as making it easy to access datastore data, but the reality can be very different when dealing with big data.

1. Using the Remote API Bulk Loader. Although convenient, this official tool only works well for smaller datasets. Large datasets can easily take 24 hours to download and often fail without explanation. This tool has remained essentially unchanged (without any further development) since App Engine’s early days, yet all official Google instructions point to this approach.
2. Writing a map reduce job to push the data to another server. This approach can be painfully manual and often requires significant infrastructure elsewhere.
3. Using the Remote API directly, or writing a handler that accesses datastore entities one query at a time. Either way, you can run a parallelizable script or map reduce job to pull the data to where you need it. Unfortunately, this has the same issues as #2.

A New Approach – Export Data via Google Cloud Storage
