
How to access Google Sheet data using the Python API and convert to Pandas dataframe
Daniel Barker
Apr 28, 2018 · 3 min read

For many data science/visualization projects I work on, setting up a
SQL database (or something similar) is overkill. On the opposite side of
the spectrum, using local Excel files makes things more difficult to
share and replicate. Google Sheets are often an excellent middle
ground, providing an easy-to-use collaborative platform with a familiar
Excel-like interface.

Accessing Google sheet data using OAuth and the Google Python API is
a straightforward process, thanks to the (per usual) excellent Google
documentation. First, we need to set up OAuth credentials on our
Google Drive account in order to access the worksheet.

Next, we need to install the Google API client libraries for Python. We
can do this (ideally, in an activated Python virtual environment)
using pip.
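A quick way to confirm the install worked is to check that the modules the script imports can actually be found. This is a minimal sketch, assuming the pip packages involved are google-api-python-client, oauth2client, and pandas. The helper name missing_modules is my own and is not part of any of these libraries.

```python
import importlib.util

# Top-level modules imported by the script later in this post.
REQUIRED_MODULES = ("apiclient", "httplib2", "oauth2client", "pandas")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, the usual suspect is that pip installed into a different environment than the one running the script.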

Obviously, in a tutorial about accessing Google sheet data, you’re going
to need a sheet to work with. I’ll be using my “Volcanic Wines” sheet
(based on the awesome volcanic activity data available from the
Smithsonian). For the next steps, you’re going to need the sheet ID,
which you can get from the URL, and then the name of the worksheet.

Get the spreadsheet ID from the Google Docs URL

Get the Google Sheet name of interest
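If you would rather not copy the ID by hand, it can be pulled out of the URL programmatically. The sketch below assumes the common URL shape in which the ID sits between /spreadsheets/d/ and the next slash. The function name extract_spreadsheet_id and the example URL are made up for illustration.

```python
import re

# Assumes the ID is the path segment that follows "/spreadsheets/d/".
_SHEET_URL_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_spreadsheet_id(url):
    """Return the spreadsheet ID embedded in a Google Sheets URL."""
    match = _SHEET_URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in URL: {url!r}")
    return match.group(1)


# Placeholder URL, not a real spreadsheet.
example_url = "https://docs.google.com/spreadsheets/d/EXAMPLE_ID_123/edit#gid=0"
print(extract_spreadsheet_id(example_url))  # prints EXAMPLE_ID_123
```

The worksheet name is simply the label on the tab at the bottom of the sheet, so there is nothing to parse for that one.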

Now, create a new Python script to retrieve the data (make sure the
‘client_secret.json’ file is saved in the same working directory as the
script, or provide an explicit path). Update the spreadsheet ID and
worksheet names in the code below with the relevant values for your
spreadsheet.

from __future__ import print_function
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
import pandas as pd


SPREADSHEET_ID = # <Your spreadsheet ID>
RANGE_NAME = # <Your worksheet name>


def get_google_sheet(spreadsheet_id, range_name):
    """ Retrieve sheet data using OAuth credentials and Goo
    scopes = 'https://www.googleapis.com/auth/spreadsheets.
    # Setup the Sheets API
    store = file.Storage('credentials.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secre
        creds = tools.run_flow(flow, store)
    service = build('sheets', 'v4', http=creds.authorize(Ht

    # Call the Sheets API
    gsheet = service.spreadsheets().values().get(spreadshee
    return gsheet


def gsheet2df(gsheet):
    """ Converts Google sheet data to a Pandas DataFrame.
    Note: This script assumes that your data contains a hea

    Also note that the Google API returns 'none' from empty
    below to work, you'll need to make sure your sheet does
    or update the code to account for such instances.

    """

Final Python code for accessing Google sheet data and converting to Pandas dataframe
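The API call at the end of get_google_sheet is cut off in the listing, so here is a hedged sketch of what a request for one range can look like with the Sheets v4 client. It assumes service is the object returned by build('sheets', 'v4', ...), and that the worksheet name on its own is passed as the range to ask for the whole sheet. Both fetch_sheet_values and FakeSheetsService are names I made up, and the fake only mimics the call chain so the function can be tried without credentials.

```python
def fetch_sheet_values(service, spreadsheet_id, range_name):
    """Request the values for one range and return the response dict."""
    request = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id,
        range=range_name,
    )
    # Nothing is sent until execute() is called on the request object.
    return request.execute()


class FakeSheetsService:
    """Hypothetical stand-in for the real service, for offline testing only."""

    def __init__(self, response):
        self._response = response
        self.last_kwargs = None

    def spreadsheets(self):
        return self

    def values(self):
        return self

    def get(self, **kwargs):
        # Record the keyword arguments so a test can inspect them.
        self.last_kwargs = kwargs
        return self

    def execute(self):
        return self._response


fake = FakeSheetsService({"values": [["name"], ["example"]]})
print(fetch_sheet_values(fake, "EXAMPLE_ID_123", "Sheet1"))
```

With a real service object, the returned dict should carry the cell contents under the "values" key as a list of rows.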

Run the script, and you should get your sheet data returned as a
dataframe. Stay tuned for an upcoming set of tutorials that will walk
through the creation and deployment of a Plotly Dash web app using
this Volcanic Wine data!

Final Pandas dataframe returned from Google Sheet
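The listing above stops at the docstring of gsheet2df, so the conversion step itself is not shown. One way it might look is sketched below, assuming the response is a dict whose "values" key holds a list of rows with the header first, and that rows can come back shorter than the header when their trailing cells are empty. The names split_header_and_rows and gsheet_to_dataframe are hypothetical, and the sample data is invented.

```python
def split_header_and_rows(gsheet):
    """Split a values response into a header and rows of equal length."""
    values = gsheet.get("values", [])
    if not values:
        return [], []
    header, *rows = values
    width = len(header)
    # Pad short rows with None; cells beyond the header width are dropped.
    padded = [(row + [None] * width)[:width] for row in rows]
    return header, padded


def gsheet_to_dataframe(gsheet):
    """Build a DataFrame using the first row of the sheet as column names."""
    # Imported here so the helper above stays usable without pandas.
    import pandas as pd

    header, rows = split_header_and_rows(gsheet)
    return pd.DataFrame(rows, columns=header)


# Invented sample response: the last row is missing its trailing cells.
sample = {"values": [["name", "region", "score"], ["a", "x", "1"], ["b"]]}
print(split_header_and_rows(sample))
```

Every value arrives as a string in this sketch, so numeric columns would still need converting (for example with pd.to_numeric) before doing any arithmetic on them.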