
Extracting Data

Extracts are saved subsets of a data source that you can use to improve performance, upgrade
your data to allow for more advanced capabilities, and analyze offline. You can create an extract
by defining filters and limits that include the data you want in the extract. After you create an
extract, you can refresh it with data from the original data source. You can either fully refresh the
data, replacing all of the extract contents, or increment the extract, which adds only the
rows that are new since the last refresh.
Extracts can:
Improve performance. For file-based data sources such as Excel or Access, a full extract
takes advantage of the Tableau data engine. For large data sources, a filtered extract
can limit the load on the server when you only need a subset of data.
Add functionality to file-based data sources, such as the ability to compute Count
Distinct.
Provide offline access to your data. If you are traveling and need to access your data
offline, you can extract the relevant data to a local data source.
Creating an Extract
Using Extracts
Refreshing Extracts
Adding Rows from a File
Upgrading Legacy Extracts
Optimizing Extracts
Updating Extracts on Tableau Server
Tableau Data Extract Command-Line Utility
Tableau Data Extract API

Creating an Extract
1. Select a data source on the Data menu and then select Extract Data to open the Extract
Data dialog box.

2. Optionally define filters to limit the data that will be extracted. Any fields that are hidden
in the Data window will be automatically excluded from the extract. Click the Hide All
Unused Fields button to quickly remove them from the extract.
To add filters, click the Add button under the Filters list.
3. Specify whether to Aggregate data for visible dimensions. When you select this
option the measures are aggregated using their default aggregation. Aggregating the
data can minimize the size of the extract file and increase performance.
When you choose to aggregate the data you can also choose to Roll up dates to a
specified date level such as Year, Month, etc.

The examples below show how the data will be extracted for each aggregation option.
Original Data: Each record is shown as a separate row. There are 7 rows in the data source.
Aggregate Data (no roll up): Records with the same date and region have been aggregated
into a single row. There are 5 rows in the data source.
Aggregate Data (roll up dates to Month): Dates have been rolled up to the Month level and
records with the same region have been aggregated into a single row. There are 3 rows in the
data source.
4. Select the number of rows you want to extract. You can extract All, the Top N rows, or a
Sample from the data source. Tableau first applies any filters and aggregation and then
extracts the number of rows from the filtered and aggregated results.
The number of rows options depend on the type of data source you are extracting from.
For example, not all data sources support sampling so that option is not always
available.
5. When finished, click Extract.
6. In the subsequent dialog box, select a location to save the extract into and give the file a
name. Then click Save.
Depending on the size of your data source, extracting data can take a long time. However, after
you have extracted the data and saved it to your hard drive, performance will improve.
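As a rough illustration of the aggregation options in step 3, the following Python sketch groups seven made-up sales records first by date and region, and then by month and region. The records, the field layout, and the use of SUM as the default aggregation are assumptions for illustration only; they are not the data from the original example, and this is not how the Tableau data engine is implemented.

```python
from collections import OrderedDict
from datetime import date

# Hypothetical records: (order date, region, sales). Seven rows in total.
ROWS = [
    (date(2013, 1, 5), "East", 10),
    (date(2013, 1, 5), "East", 20),
    (date(2013, 1, 12), "East", 5),
    (date(2013, 1, 5), "West", 7),
    (date(2013, 1, 5), "West", 3),
    (date(2013, 1, 20), "West", 8),
    (date(2013, 2, 1), "West", 4),
]


def aggregate(rows, roll_up_to_month=False):
    """Sum sales for each distinct (date, region) pair.

    With roll_up_to_month=True, each date is first truncated to the first
    day of its month, so all rows in the same month share one date value.
    """
    totals = OrderedDict()
    for day, region, sales in rows:
        key_date = day.replace(day=1) if roll_up_to_month else day
        key = (key_date, region)
        totals[key] = totals.get(key, 0) + sales
    return [(d, r, s) for (d, r), s in totals.items()]


no_roll_up = aggregate(ROWS)
by_month = aggregate(ROWS, roll_up_to_month=True)
print(len(ROWS), len(no_roll_up), len(by_month))
```

With this sample data the script prints 7 5 3, matching the row counts described for the three cases.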

Using Extracts
After you create an extract, the current workbook begins using the extract. However, the extract
connection is not saved with the workbook until the next time you save. That means, if you close
the workbook without saving first, the workbook will connect to the original data source the next
time you open it.
You may want to create an extract with a sample of the data so you can set up the view and
then switch to the whole data source, thus avoiding long queries every time you place a field on
the shelf. You can toggle between using the extract and using the entire data source by
selecting a data source on the Data menu and then selecting Use Extract.
You can remove an extract at any time by selecting a data source on the Data menu and then
selecting Extract > Remove. When you remove an extract you can choose to Remove the
extract from the workbook only or Remove and delete the extract file, which will delete the
extract from your hard drive.
You can see when the extract was last updated and other details by selecting a data source on
the Data menu and then selecting Extract > History.


Refreshing Extracts
When the underlying data changes, you can refresh the extract by selecting a data source on
the Data menu and then selecting Extract > Refresh. Extracts can be configured to be fully
refreshed, replacing all of the data with what's in the underlying data source, or incrementally
refreshed, adding just the new rows since the last refresh.
Full Extracts
By default, extracts are fully refreshed. That means that every time you refresh the extract, all of
the rows are replaced with the data in the underlying data source. While this kind of refresh
ensures you have an exact copy of what is in the underlying data source, it can sometimes take
a long time and be expensive on the database depending on how big the extract is.
If the extract is not set up for incremental refresh, refreshing the extract performs a full
refresh. If you're publishing the data source to Tableau Server, you can specify the
type of refresh in the Scheduling & Passwords dialog box.
Incremental Extracts
Rather than refreshing the entire extract, you can set it up to only add the rows that are new
since the last time you extracted data. For example, you may have a data source that is
updated daily with new sales transactions. Rather than rebuild the entire extract each day, you
can just add the new transactions that occurred that day. Then once a week you may want to do
a full refresh to be sure you have the most up-to-date data.
Follow the steps below to set up an extract to be incrementally refreshed.
1. Select a data source on the Data menu and then select Extract.
2. In the Extract Data dialog box, select All rows as the number of Rows to extract.
Incremental refresh can only be defined when you are extracting all rows in the
database. You cannot increment a sample extract.
3. Select Incremental refresh and then specify a column in the database that will be used
to identify new rows. For example, if you select a Date field, refreshing will add all rows
whose date is after the last time you refreshed. Alternatively, you can use an ID column
that increases as rows are added to the database.

4. When finished, click Extract.
The steps above can be used to define a new extract or configure an existing extract for
incremental refresh. If you are editing an existing extract, the last refresh is shown so you can
be sure you are updating the extract with the correct data.
If you publish the data source to Tableau Server you can specify a schedule for incremental
refresh as well as full refresh in the Schedules & Passwords dialog box.
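The difference between the two refresh types can be sketched in a few lines of Python. This is a conceptual model only, not a description of how Tableau implements refreshes; the function names and the row layout (dictionaries with an increasing id column) are assumptions made for the example.

```python
def full_refresh(source_rows):
    """Replace the extract with a copy of every row in the source."""
    return list(source_rows)


def incremental_refresh(extract_rows, source_rows, key):
    """Append only the source rows whose key value is greater than the
    largest key already in the extract. Rows that were changed or deleted
    in the source are not detected."""
    if not extract_rows:
        return list(source_rows)
    last_seen = max(row[key] for row in extract_rows)
    new_rows = [row for row in source_rows if row[key] > last_seen]
    return extract_rows + new_rows


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
extract = full_refresh(source)

# Later, the source gains a new row and an existing row is corrected.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 30}, {"id": 3, "amount": 5}]

after_incremental = incremental_refresh(extract, source, key="id")
stale = after_incremental[1]["amount"]   # still the old value for id 2

after_full = full_refresh(source)
fresh = after_full[1]["amount"]          # the corrected value
```

In this model the incremental refresh picks up the new row but keeps the outdated amount for the existing one, which is one reason to run an occasional full refresh.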
Extract History
You can see a history of when the extract was refreshed by selecting a data source on
the Data menu and then selecting Extract > History.
The Extract History dialog box shows the date and time for each refresh, whether it was full or
incremental, and the number of rows that were added. If the refresh was from a file, it also
shows the source file name.


Adding Rows from a File
You can add new data to an extract from a file. For example, you may take an extract from a
data warehouse that has the past ten years' worth of data. However, new data has been kept in
an Excel workbook. You can add the new data to the extract so that you can analyze the most
recent information against the historical data.
Follow the steps below to add data from a file.
1. Select a data source on the Data menu and then select Extract > Add Data From File.
2. Browse to and select the file that has the new data.
3. Specify any Joins or Custom SQL necessary. The columns in the file must match the
columns in the extract.
4. When finished, click OK.
The new rows are added to the extract. You can see a summary of the number of rows that
were added by selecting a data source on the Data menu and then selecting Extract > History.
When you refresh this extract, the data will be replaced with the data from the original data
source.
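Because the columns in the file must match the columns in the extract, it can help to check a delimited text file before adding it. The sketch below compares the header row of a CSV file with a list of expected column names. The column names are hypothetical, and the check compares names only, ignoring order; the exact matching rules that Tableau applies are not stated here.

```python
import csv
import io

# Hypothetical column names for the existing extract.
EXTRACT_COLUMNS = ["Order Date", "Region", "Sales"]


def header_matches(csv_text, extract_columns):
    """Return True when the first CSV row contains exactly the expected
    column names (surrounding whitespace and column order are ignored)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return sorted(name.strip() for name in header) == sorted(extract_columns)


good_file = u"Region,Sales,Order Date\nEast,10,2013-01-05\n"
bad_file = u"Region,Amount\nEast,10\n"
print(header_matches(good_file, EXTRACT_COLUMNS))
print(header_matches(bad_file, EXTRACT_COLUMNS))
```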

Upgrading Legacy Extracts
If you have data extracts that were created before version 6.0, you should upgrade the extracts
to use the data engine. When you open the workbook, you are given the option to upgrade the
extracts.

You can also upgrade the extracts by selecting a data source on the Data menu and then
selecting Upgrade Extract.

Optimizing Extracts
To improve performance when working with extracts you can optimize the extract. Optimizing an
extract creates secondary structures in the extract that speed up future queries.
Optimize the extract by selecting a data source on the Data menu and then selecting Extract
> Optimize.
The following types of optimizations are made:
Materialized Calculated Fields
Calculated fields are computed in advance and stored in the extract. In future queries, Tableau
can look up the already computed value rather than running the computation again. The
following types of calculated fields ARE NOT materialized:
Calculations that use unstable functions such as NOW() and TODAY()
Calculations that use external functions such as RAWSQL and R
Table calculations
In addition, if the formula for a materialized calculation changes or the calculation is deleted
from the data source, the materialized calculation is dropped from the extract until the extract is
optimized again.
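The idea of materializing a calculated field can be pictured with a small Python sketch: eligible calculations are evaluated once and their results stored with each row, while calculations that use unstable functions are skipped. This is a conceptual illustration with hypothetical names and data, not Tableau's implementation, and it models only the unstable-function rule from the list above.

```python
UNSTABLE_FUNCTIONS = ("NOW(", "TODAY(")   # results change from query to query


def can_materialize(formula):
    """A formula is eligible only if it avoids unstable functions."""
    upper = formula.upper()
    return not any(name in upper for name in UNSTABLE_FUNCTIONS)


def optimize(rows, calculations):
    """Store the result of each eligible calculation alongside each row.

    `calculations` maps a field name to (formula text, Python function).
    Returns the names of the fields that were materialized.
    """
    materialized = []
    for name, (formula, func) in calculations.items():
        if can_materialize(formula):
            for row in rows:
                row[name] = func(row)
            materialized.append(name)
    return materialized


rows = [{"sales": 100.0, "cost": 60.0}, {"sales": 80.0, "cost": 50.0}]
calculations = {
    "Profit": ("[Sales] - [Cost]", lambda r: r["sales"] - r["cost"]),
    "Age": ("DATEDIFF('day', [Order Date], TODAY())", lambda r: None),
}
stored = optimize(rows, calculations)
```

After the call, each row carries a precomputed Profit value, while Age is left to be computed at query time.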
Acceleration Views
When a workbook contains filters that are set to show only relevant values, computing the
available values for that filter can be an expensive query. For these filters, Tableau must
evaluate the other filters in the workbook first and then compute the relevant values based on
their filter selections. To speed up these queries, a view can be created that computes the
possible filter values and caches them for faster lookup later.

Updating Extracts on Tableau Server
You have the following options for updating extracts published to Tableau Server or
Tableau Online:
You can add the extract or a workbook that connects to it to a refresh schedule in
Tableau Server or Tableau Online (cloud-based data sources only).
You can update the extract in Tableau Desktop and then republish it.
You can add to or refresh the extract in Tableau Server or Tableau Online without first
adding to or refreshing the extract in Tableau Desktop.
The remainder of this topic describes the third option.
Refreshing Extracts Using Tableau Desktop
Before you attempt to update an extract, verify the following:
The data source was originally published as an extract.
Tableau Desktop is connected to the published data source, as indicated by the Tableau
Server icon next to the data source name in the Data window:

To refresh an extract on Tableau Server or Tableau Online from Tableau Desktop, right-click
the data source in the Tableau Desktop Data window, select Tableau Data Server, and
choose one of the following options:
Refresh from Source
Refreshes the entire extract using the data in the original data source.
This command is available only for extracts that include a connection to the original data
source. If you connected directly to a Tableau Data Extract file (.tde) and then published
it, the connection to the original data source is not included.
Append from File
Updates the extract from the contents of a file.
If you do not see the Tableau Data Server option, your data source may not be on Tableau
Server or Tableau Online (in which case it will not show the icon above). If you see the
Tableau Data Server option, but both commands are unavailable, the data source exists on
the server, but it is not an extract.
It is also possible to update an extract on Tableau Server using a command-line utility.
See Tableau Data Extract Command-Line Utility.

Tableau Data Extract Command-Line Utility
The Tableau Data Extract Utility is installed with Tableau Desktop. You can use this utility at
a Windows command prompt to refresh or add files to extracts published to Tableau Server
or Tableau Online.
To run the utility:
1. Open the Command Prompt as an administrator and change to the Tableau Desktop bin
directory. For example:
cd C:\Program Files\Tableau\Tableau 8.1\bin
2. Use either of the following commands, adding parameters from the command-options
tables below.
o tableau refreshextract
o tableau addfiletoextract
Note: When using the utility, always specify tableau on the command line or in scripts,
never tableau.exe.
Syntax and parameters for the tableau refreshextract command
Use tableau refreshextract to refresh an extract on Tableau Server or Tableau Online.
Refreshing an extract updates an existing extract with any modifications that have been
made to the data source since the last refresh.
To see help for this command, at the Windows command prompt, type the following
command:
tableau refreshextract --help
All options have a full form that you use with a double hyphen (for example, --server).
Some options also have a short form that you use with a single hyphen (for example, -s). If
the value for an option contains spaces, enclose it in quotation marks.
tableau refreshextract command options
Each entry lists the short form (if any) and the full form, followed by the description.

--source-username <username>
A valid username for the data source connection. Use this option with --source-password, or use
--original-file instead of the username and password options.

--source-password <password>
The password for the data source user.

--original-file <path and filename>
Path and filename information for the data source that is to be refreshed on the server.

--force-full-refresh
If the data source is set up for incremental refreshes, use this option to force a full extract
refresh. If this option is not included, an incremental refresh is performed. Not all data sources
support incremental refresh.

-s <server http address>, --server <URL>
The URL for the Tableau server on which the data is published. To specify Tableau Online,
use http://online.tableausoftware.com.

-t <siteid>, --site <siteid>
In a multiple-site environment, specifies the site to which the command applies. For Tableau
Online, use this argument if your username is associated with more than one site. For Tableau
Server, if you do not specify a site, the default site is assumed.
The site id is independent of the site name, and it is indicated in the URL when you view the site
in a browser. For example, if the URL for the page you see after signing in to Tableau Online is
https://online.tableausoftware.com/t/vernazza/views
the site id is vernazza.

--datasource <datasource>
The name of the data source, as published to Tableau Server or Tableau Online.

--project <projectname>
The project to which the data source belongs. If this option is not included, the default project is
assumed.

-u <username>, --username <username>
Valid Tableau Server or Tableau Online user.

-p <password>, --password <password>
The password for the specified Tableau Server or Tableau Online user.

--proxy-username <username>
The username for a proxy server.

--proxy-password <password>
The password for a proxy server.

-c <path and filename>, --config-file <path and filename>
Path and filename information for a file containing configuration options for the command. See
Using a Config File below for details.
Sample tableau refreshextract command
The following command refreshes an extract named CurrentYrOverYrStats that has been
published to an on-premise Tableau Server. This command specifies the following:
The name of your Tableau Server.
Server user name and password.
Project name.
The name of the data source to refresh, along with the data source username and
password.
C:\Program Files\Tableau\Tableau 8.1\bin>tableau refreshextract --server https://our_server_name --username OurServerSignIn --password OurServerPwd --project "New Animations" --datasource "CurrentYrOverYrStats" --source-username OurDatabaseSignIn --source-password OurDatabasePassword
The following command refreshes an extract named CurrentYrOverYrStats that has been
published to Tableau Online. This command specifies the following:
Tableau Online user and password.
Tableau Online site and project names.
The data source, which in this case is hosted by a cloud-based data source provider (for
example, Salesforce.com), and the username and password to sign in to the hosted data
source.
C:\Program Files\Tableau\Tableau 8.1\bin>tableau refreshextract --server https://online.tableausoftware.com --username email@domain.com --password OurServerPwd --site vernazza --project "New Animations" --datasource "CurrentYrOverYrStats" --source-username database_user@hosted_datasource_provider.com --source-password db_password
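If you generate these commands from a script, a small helper can assemble the command line and apply the quoting rule for values that contain spaces. The following Python sketch uses only options described above; the function names are hypothetical, and the helper only builds the command text (for use at the prompt or in a batch file) rather than running the utility.

```python
def quote(value):
    """Enclose a value in quotation marks if it contains a space."""
    return '"%s"' % value if " " in value else value


def build_refresh_command(server, username, password, datasource,
                          project=None, site=None,
                          source_username=None, source_password=None,
                          force_full_refresh=False):
    """Return a `tableau refreshextract` command line as a string."""
    parts = ["tableau", "refreshextract",
             "--server", server,
             "--username", username,
             "--password", password,
             "--datasource", datasource]
    optional = [("--project", project),
                ("--site", site),
                ("--source-username", source_username),
                ("--source-password", source_password)]
    for flag, value in optional:
        if value is not None:
            parts += [flag, value]
    if force_full_refresh:
        parts.append("--force-full-refresh")
    return " ".join(quote(part) for part in parts)


command = build_refresh_command(
    server="https://our_server_name",
    username="OurServerSignIn",
    password="OurServerPwd",
    datasource="CurrentYrOverYrStats",
    project="New Animations",
)
print(command)
```

Note that the helper writes tableau, not tableau.exe, in keeping with the note earlier in this topic.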
Syntax for tableau addfiletoextract
Use tableau addfiletoextract to append file content to an extract that has been
published to Tableau Server or Tableau Online. This command combines the two files.
If you want simply to update an existing extract with the latest changes, use
the refreshextract command instead. Using addfiletoextract to update an existing
extract will duplicate data instead.
To see help for this command, at the Windows command prompt, type the following
command:
tableau addfiletoextract --help
All options have a full form that you use with a double hyphen (for example, --server).
Some options also have a short form that you use with a single hyphen (for example, -s). If
the value for an option contains spaces, enclose it in quotation marks.
tableau addfiletoextract command options
Each entry lists the short form (if any) and the full form, followed by the description.

--file <path and filename>
Path and filename information for the data file containing data to append. The file can be from
Excel, Access, a Tableau data extract, or a delimited text file. It cannot be password protected.
Use UNC format if the file is on a network share. For example, \\server\path\filename.csv

-s <server http address>, --server <URL>
The URL for the Tableau server on which the data is published. To specify Tableau Online,
use http://online.tableausoftware.com.

-t <siteid>, --site <siteid>
In a multiple-site environment, specifies the site to which the command applies. For Tableau
Online, you must include this argument if your username is associated with more than one site.
For Tableau Server, if you do not specify a site, the default site is assumed.

--datasource <datasource>
The name of the data source, as published to Tableau Server or Tableau Online.

--project <projectname>
The project to which the data source belongs. If this option is not included, the default project is
assumed.

-u <username>, --username <username>
Valid Tableau Server or Tableau Online user.

-p <password>, --password <password>
The password for the specified Tableau Server or Tableau Online user.

--proxy-username <username>
The username for a proxy server.

--proxy-password <password>
The password for a proxy server.

-c <path and filename>, --config-file <path and filename>
Path and filename information for a file containing configuration options for the command. See
Using a Config File below for details.
Sample tableau addfiletoextract command
C:\Program Files\Tableau\Tableau 8.1\bin>tableau addfiletoextract --server https://our_server_name --username OurServerSignIn --password OurServerPwd --project "New Animations" --datasource "CurrentYrOverYrStats" --file "C:\Users\user1\Documents\DataUploadFiles\AprMay.csv"
C:\Program Files\Tableau\Tableau 8.1\bin>tableau addfiletoextract --server https://online.tableausoftware.com --username email@domain.com --password OurServerPwd --site vernazza --project "New Animations" --datasource "CurrentYrOverYrStats" --file "C:\Users\user2\Documents\DataUploadFiles\AprMay.csv"
Using a Config File
You can use a plain text editor, such as Notepad or TextEdit, to create a config
(configuration) file that you can use with either tableau refreshextract or tableau
addfiletoextract. A config file can be useful if you expect to update the same data source
regularly over time. Instead of having to type the same options each time you run a
command, you specify the config file. A config file also has the advantage of not exposing
user names and passwords on the command line.
Create the Config File
For example, suppose you create a file called config.txt, save it to your Documents folder,
and include the parameter information shown below.
For an extract published to an on-premise Tableau Server:
server=https://our_server_name
username=OurServerSignIn
password=OurServerPwd
project=New Animations
datasource=CurrentYrOverYrStats
For an extract from a hosted data source, published to Tableau Online:
server=https://online.tableausoftware.com
username=email@domain.com
password=OurPassword
project=New Animations
datasource=CurrentYrOverYrStats
source-username=database_user@hosted_datasource_provider.com
source-password=db_password
Reference the Config File from the Command Line
After you create the config file, you run the tableau refreshextract or tableau addfiletoextract
command, pointing to the config file as the only option you use on the command line.
For example, to refresh the extract specified in the sample in the Create the Config
File section, you would run the following command (making sure that you are working in the
bin directory for your version of Tableau Desktop):
C:\Program Files\Tableau\Tableau 8.1\bin>tableau refreshextract --config-file C:\Users\user1\Documents\config.txt
Syntax Differences for Config Files
The syntax for specifying options inside a config file differs from the syntax you use on the
command line in the following ways:
Option names do not begin with dashes or hyphens.
You use an equals sign (with no spaces) to separate option names from option values.
Quotation marks are not necessary (or allowed) around values, even when they include
spaces (as for the project option in the example shown earlier).
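The syntax differences listed above are mechanical, so a short script can convert command-line style options into config-file lines. The Python sketch below is one possible way to do this; the function name is hypothetical, and it simply strips leading hyphens and surrounding quotation marks and joins each name and value with an equals sign.

```python
def to_config_text(options):
    """Convert (option name, value) pairs into config-file text.

    Leading hyphens are removed from names, surrounding quotation marks
    are removed from values, and each pair is written as name=value with
    no spaces around the equals sign.
    """
    lines = []
    for name, value in options:
        key = name.lstrip("-")
        lines.append("%s=%s" % (key, value.strip('"')))
    return "\n".join(lines) + "\n"


config_text = to_config_text([
    ("--server", "https://our_server_name"),
    ("--username", "OurServerSignIn"),
    ("--password", "OurServerPwd"),
    ("--project", '"New Animations"'),
    ("--datasource", '"CurrentYrOverYrStats"'),
])
print(config_text)
```

You could then write the resulting text to a file such as config.txt and pass that file with --config-file.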
Use Windows Task Scheduler to Refresh Extracts
You can use Windows Task Scheduler, in combination with the Tableau Data Extract
Command-Line Utility, to automate regular updates to Tableau Desktop data sources from
within your corporate firewall. You can configure a task to occur once per day, week, or
month, or after a specific system event. For example, run the task when the computer
starts.
To learn more, see the Task Scheduler How To... page in the Microsoft TechNet library.

Tableau Data Extract API
Use the Tableau Data Extract API to connect to data that is not currently a supported
Tableau data source. With the Tableau Data Extract API, you create a program that
accesses and processes your data. You then use that program to create a Tableau Data
Extract (TDE) file.
The Data Extract API is available for developers on Windows and Linux platforms. Go to
the Get the Data Extract API page on the Tableau website, and choose the appropriate
version for your platform and programming language:
Data Extract API Python 32-bit
Data Extract API Python 64-bit
Data Extract API - C/C++/Java 32-bit
Data Extract API - C/C++/Java - 64bit
Notes for Developers
The Data Extract API includes a sample program, makeorder, coded in each supported
language to demonstrate a typical usage scenario: creating an extract containing product
orders. The application creates the extract order.tde with several columns of different types.
The general flow of the sample programs is:
1. Open an Extract object to create a new file.
2. Define the extract's schema using a TableDefinition.
3. Add the Extract table.
4. Insert rows.
5. Close all objects.
It is important to free memory by closing all objects. Ensuring that Extract objects are
cleaned up properly is critical, especially in non-native execution environments.
See the note sections below for language-specific details.
String columns in a Data Extract can be 8- or 16-bit and can be sorted according to many
available collations. By default, strings are sorted according to their binary representation,
though this can be changed on a per-table or per-column basis.
Python Notes
Objects in the Data Extract API are automatically closed by __del__ when necessary. While
garbage collection handles the vast majority of concerns related to releasing resources, it is
important to note that the virtual machine provides no guarantee that any particular object
will ever be freed. While most objects are merely memory, Extract objects represent
physical files created when close is invoked. Therefore, it is not safe to rely on garbage
collection to close Extract objects. We recommend using with statements to ensure Extract
instances are cleaned up. Alternatively, you can explicitly call close.
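The following sketch shows why a with statement is safer than relying on garbage collection: close is called even when an error interrupts the work. It uses a hypothetical stand-in class rather than the real Extract class, and it wraps the object in contextlib.closing from the Python standard library, which gives the same guarantee for any object that has a close method.

```python
from contextlib import closing


class StandInExtract(object):
    """Hypothetical stand-in for an Extract object; not the real API.

    It only records whether close() was called.
    """

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


def add_rows(extract, fail=False):
    """Pretend to insert rows; optionally fail part-way through."""
    if fail:
        raise ValueError("bad input row")


extract = StandInExtract("order.tde")
try:
    # closing() calls extract.close() when the block exits, even on error.
    with closing(extract) as ex:
        add_rows(ex, fail=True)
except ValueError:
    pass

closed_after_error = extract.closed
print(closed_after_error)
```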
Java Notes
Data Extract API objects are automatically closed by finalize() as necessary. The Java
Virtual Machine does not guarantee that any particular object is ever garbage collected.
While most objects are merely memory that can be safely reclaimed by the operating
system at JVM shutdown, Extract objects represent physical files that are created
when close() is invoked. Therefore, it is important to invoke Extract.close() for all
Extract instances. We recommend using the try-with-resources construct introduced in Java
7. For earlier versions of Java, you must call close() explicitly.
C++ Notes
Data Extract API objects should be managed according to standard memory management
best practices, such as using stack variables or smart pointers. As in other languages, all
objects have a Close() method to free internal resources. Close() is invoked by the
destructor when necessary. However, it is important to note that Extract::Close() may
throw an exception, so it is safer to call it explicitly, rather than allowing an exception to
potentially escape the destructor.
C Notes
Objects in the Data Extract C API are managed through opaque TAB_HANDLEs. Every
created object must be closed. It is advisable to free objects in the reverse order of creation.
Using the Data Extract API in Microsoft Visual Studio 2010
Follow these steps to build a C or C++ project in Visual Studio 2010:
1. Extract the non-Python package to an installation directory, for
example, C:\dataextract-8.0. This directory is the $(InstallRoot).
2. Add $(InstallRoot)\bin to the %PATH% environment variable.
3. Open Visual Studio and create a new console project.
4. Open the Property Manager and create a new Property Sheet for use with all targets in
the project. Add these paths:
Path Description
$(InstallRoot)\include C/C++|General|Additional Include Directories
$(InstallRoot)\lib VC++ Directories|Library Directories
dataextract.lib Linker|Input|Additional Dependencies
5. Add MakeOrder.c or MakeOrder.cpp from $(InstallRoot)\docs\samples.
6. Compile and run the application.
Using the Data Extract API in Eclipse
Follow these steps to build a Java project in Eclipse:
1. Extract the non-Python package to an installation directory, for
example, C:\dataextract-8.0. This directory is the $(InstallRoot). Verify that the
package is compatible with your Java platform; 64-bit libraries do not work with a 32-bit
JVM and vice-versa.
2. Open Eclipse and create a new project. Add dataextract.jar and jna.jar to the build
path as External JARs.
3. Right-click the project's src package and import $(InstallRoot)\docs\samples as
a local file system resource. Verify that
samples/com/tableausoftware/demos/MakeOrder.java is selected before
dismissing the dialog.
4. In the Run Configuration for MakeOrder, add an Environment
Variable PATH=$(InstallRoot)\bin.
Using the Data Extract API with Python
Follow these steps:
1. Verify that your installed Python is version 2.x, where x is 7 or higher. Version 3 is not
supported. Also verify that it matches the package you have downloaded: the 64-bit
module is incompatible with 32-bit Python and vice-versa.
2. Extract the Python package, open a command prompt, and navigate to the directory that
contains setup.py.
3. Run setup.py to install the module into site-packages.
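The checks in step 1 can be scripted. The sketch below reports the version and bitness of the running interpreter using the standard sys and struct modules; the function names are hypothetical, and the package bitness (64 here) is an assumption that you would replace with the download you chose.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python."""
    return struct.calcsize("P") * 8


def check_environment(version, python_bits, package_bits):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    major, minor = version[0], version[1]
    if major != 2 or minor < 7:
        problems.append("Python 2.7 or a later 2.x release is required; "
                        "found %d.%d" % (major, minor))
    if python_bits != package_bits:
        problems.append("%d-bit package does not match %d-bit Python"
                        % (package_bits, python_bits))
    return problems


# package_bits=64 is an assumption about which package was downloaded.
for problem in check_environment(sys.version_info, interpreter_bits(), 64):
    print(problem)
```

Run the script with the interpreter you plan to use for the module; no output means both checks passed.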