
Data science workflow with large geospatial datasets?


I am quite new to the Docker approach, so please bear with me.

The objective is to ingest large geospatial datasets into Google Earth Engine using an open source, replicable approach. I have everything working on my local machine and on a Google Compute Engine instance, but I would like to make the approach available to others as well.

The large static geospatial files (NetCDF4) are currently stored on Amazon S3 and on Google Cloud Storage (GeoTIFF). I need a couple of Python-based modules to convert and ingest the data into Earth Engine using a command-line interface. This needs to happen only once. The data conversion isn't heavy and can be done by one fat instance (32 GB RAM, 16 cores, takes 2 hours); there is no need for a cluster.
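For context, the one-off NetCDF4-to-GeoTIFF conversion step can be scripted by wrapping the `gdal_translate` command-line tool with `subprocess`. This is only a sketch: the file names and the `precip` variable name are hypothetical placeholders, and the exact creation options depend on your data.

```python
# Sketch: wrap gdal_translate (GDAL command-line tools) for a one-off
# NetCDF -> GeoTIFF conversion. File and variable names are placeholders.
import subprocess
from pathlib import Path

def netcdf_to_geotiff_cmd(src: Path, dst: Path, variable: str) -> list[str]:
    """Build the gdal_translate argv converting one NetCDF subdataset
    to a compressed GeoTIFF; the command is returned, not executed."""
    return [
        "gdal_translate",
        "-of", "GTiff",
        "-co", "COMPRESS=DEFLATE",        # smaller files for the later upload
        f"NETCDF:{src}:{variable}",       # select a single variable/subdataset
        str(dst),
    ]

def convert(src: Path, dst: Path, variable: str) -> None:
    """Run the conversion, raising if gdal_translate exits non-zero."""
    subprocess.run(netcdf_to_geotiff_cmd(src, dst, variable), check=True)

# Example of the command that would be run (not executed here):
print(" ".join(netcdf_to_geotiff_cmd(Path("precip.nc"), Path("precip.tif"), "precip")))
```

Because the conversion runs once on a single large instance, plain sequential `subprocess` calls (or a `multiprocessing.Pool` over the file list) are usually sufficient.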

My question is how I should deal with large static datasets in Docker. I thought of the following options, but would like to know best practices.

1) Use Docker and mount the Amazon S3 and Google Cloud Storage buckets into the Docker container.

2) Copy the large datasets into a Docker image and use Amazon ECS.

3) Just use the AWS CLI.

4) Use Boto3 in Python.

5) A fifth option that I am not yet aware of.
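To make option 4 concrete, here is a minimal sketch of pulling the static inputs from S3 with Boto3 at container start-up, instead of baking them into the image or mounting the bucket. The bucket and key names are hypothetical, and standard AWS credentials (environment variables or an instance role) are assumed.

```python
# Sketch of option 4: download the static S3 inputs with boto3 inside the
# container at start-up. Bucket and key names below are hypothetical.
from pathlib import Path

def local_path(key: str, dest: Path) -> Path:
    """Map an S3 key to a flat local filename under dest."""
    return dest / Path(key).name

def download_inputs(bucket: str, keys: list[str], dest: Path) -> list[Path]:
    """Fetch each key from the bucket into dest; returns the local paths."""
    import boto3  # imported lazily so local_path() stays dependency-free
    s3 = boto3.client("s3")
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for key in keys:
        target = local_path(key, dest)
        s3.download_file(bucket, key, str(target))
        paths.append(target)
    return paths

# e.g. download_inputs("my-climate-bucket", ["climate/2018/precip.nc"], Path("/data"))
print(local_path("climate/2018/precip.nc", Path("/data")))
```

This keeps the image small and rebuildable (a strong argument against option 2), at the cost of a download step each time the container runs; for a one-off job that trade-off is usually fine.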

The Python modules that I use include: python-GDAL, pandas, earthengine-api, and subprocess.

Thank you



Replies to This Discussion

How do you wish to query the datasets?

Does PostgresGeo give you the queries you want? It is the most pleasant GEO database API I've seen.




© 2018 Data Science Central ®
