How To Use Tableau Prep

 

Tableau Prep (2018.1.2)
Note
: All the information in this post is based on the Tableau Prep (2018.1.2) that was released on June 12 ,2018. Functionality and options available might change as time progresses. Tableau Prep 2018.1.2 is now available to everyone with the Creator license or current maintenance. You can download it  here.

Tableau Prep is a new product from Tableau designed to help the users combine, shape and clean their data for analysis. It’s seamlessly integrated with Tableau Analytical workflow, making it easy to go from data prep to analysis. A lot of our data needs some preparation before starting our analysis and visualization, or maybe we want to combine it with multiple data sources. These tasks can be done in tableau desktop itself, however it can become a bit messy with the changes we need to apply to data. The workbook performance may not be up to the mark because of lot of calculated fields and filters applied on data as part of data manipulation for analysis. Tableau Prep allows us to perform all these tasks in a much simpler way, to easily replicate the workflow in future and to share it with others.  

Tableau prep works in a step by step basis where we connect to dataset(s) we want to prepare then use options to filter, rename, join, create and aggregate etc, and then verify the flow, and at the end output the data to Tableau extract or publish it to Tableau Server or Tableau Online. We can easily track, review and edit the steps independently in the work flow.

Tableau Prep Workspace
The Tableau Prep workspace consists of the Connections pane and three coordinated areas that help to interact and explore data. 

Connections Pane: On the left side of the workspace is the connections pane which shows the databases and files you are connected to. Add connections to one or more databases and then drag the tables you want to work with into the Flow pane (To the right) 


Tableau Prep 1

 

Flow pane: At the top of the workspace is Flow pane which is a visual representation of our operation steps as we prepare our data and aligns from left to right. This workflow explains what operation is performed and in what order and how data is being manipulated. Files can be dragged from connections pane to Flow pane to join, pivot, manipulate, aggregate data, and generate output files (. tde, .tds, and .hyper ) .These output files can be used in Tableau Desktop for further analysis and visualization.

Tableau Prep 2

 

Profile pane : In the center of the workspace is the Profile pane. The Profile pane shows you the structure of your data at any point in the flow. The structure of your data can be represented in different ways depending on the operation you want to perform on your data or the step that you select in the Flow pane. This is where you will do most of your data cleaning.

At the top of the Profile pane is a toolbar that shows you the cleaning operations that you can perform for each step in your flow.  A drop-down menu also appears on each card in the Profile pane where you can select the different operations that you can perform on the data.

For example:

  • Search, sort, and split fields
  • Filter, include, or exclude values
  • Find and fix null values
  • Rename fields
  • Clean up data entry errors using group and replace or quick cleaning operations
  • Use automatic data parse to change data types
  • Rearrange the order of your field columns by dragging and dropping them where you want them

Tableau Prep keeps track of any changes you make, in the order you make them, so you can always go back and review or edit those changes if needed. Use drag and drop to re-order those operations to experiment and apply changes in a different order.

Tableau Prep 3

Click the arrow on the upper right of the pane to expand and collapse the Changes pane for more room to work with the data in the Profile pane.

Data grid : At the bottom of the workspace is the Data grid, which shows you the row level detail in your data. The values displayed in the Data grid reflect the operations defined in the Profile pane. You can filter, keep only and exclude individual field values in this grid, and drag and drop fields to change the display order.

Tableau Prep 4

 

How to Use Tableau Prep 

Use Case: For a Superstore to analyze product sales and profits over the last four years for the company using Tableau. For this Superstore data has been collected and tracked differently for each region. The data is entered differently for different regions for which lot of data cleaning is required.

 

        Connecting to Data

Let us use the sample Superstore data files located here:

 (Windows) C:\ProgramFiles\Tableau\TableauPrep<version>\help\Samples\en_US\Superstore Files

1. Sales data files for the different regions are stored in different formats, and the orders from the South are in multiple files. Since South has multiple files lets collate those files first.

Tableau Prep 5

 

 

 

 

 

 

 

2. On the Connections pane, click the Add connection   Button Photos  button. We must create an input step to start a flow. We can include multiple input steps and multiple data files as required.

Tableau Prep 6

3. Since these files are .csv files, select Text files in Connections pane.

Tableau Prep 7

4. Navigate to the directory for superstore files and select the first file orders_south_2015.csv to add it to the flow.

Tableau Prep 8

Once the file is connected, the Tableau Prep workspace opens, and it is divided into two main sections. The Flow pane at the top and the Input pane at the bottom. In Tableau Flow Pane the first step is created. 

 

Tableau Prep 9

Flow Pane: We can interact with Data visually and build flow. For single Tables Tableau Prep creates an input step when the file is added through connections or we can drag and drop the tables to add to the pane.

Input Pane: The Input pane contains configuration options about how the data is ingested. It also shows you the fields, data types, and sample values for your data set.

5. To add remaining 3 files, click on the Multiple files tab in input files section.

Tableau Prep multiple Files

6. Select Wild Card Union Option.

To select multiple files in the same parent/child directory at once, wild card option is useful. However, the files should be with similar name and structure.

Tableau Prep 11
We can see that all files in the south folder are included by using wild card union option.

7. The files for the other regions are all single table files, so you can select all the files at once and add them to your flow. (Drag and drop to Flow pane / Use connections button to select from the folder as shown in step 2)

Tableau Prep 12

 

Tableau Prep 13

 

Exploring Data

To prepare analysis -ready data, understanding the data is an important step to spot any issues. We can look over the data and make quick fixes in the input pane.

1. In the Flow pane, click the Orders_Central Input step to select it.

Tableau Prep 14

2. In the Input pane data fields are available and we can scroll down to spot issues.

Tableau Prep 15

 

Identified Issues Sample.

1. The order dates and ship dates are separated out into fields for month, day, and year.

Tableau Prep 16

2. Some of the fields have different data types than the same fields in other files.

Order date in Orders_Central file is number, however it’s Date& time in Orders_West file as shown below: 

Tableau Prep 17Tableau Prep 18

We need to clean data and fix issues. But we can’t do it in input step yet. Keep inspecting for other files as well and spot issues to make changes in later steps.

 

Cleaning & Shaping Data

In Tableau Prep, examining and cleaning data is an iterative process. To clean, combine and shape data we need to add steps in the flow. Different step types (clean, add, split, aggregate, merge, rename and so on) are used based on the operation required for the data.

Lets clean Orders_central  

1.  Select Orders_central file in the flow pane and click on the + symbol, a pop-up window opens with option as shown below. Click on Add Step.

Tableau Prep 19

When we add a cleaning step to the flow, the workspace changes and we can see the details of our data.

Tableau Prep 20

The Profile pane shows the structure of our data, summarizing the field values into bins so that we can quickly see related values and spot outliers and null values. This is where we will perform most of our cleaning tasks.

 

Adding new filed (Using calculated filed)

This data set is missing a field for Region. Since the other data sets have this field we will need to add it so that we can combine our data later. We will need a calculated field for that.

1. Profile pane toolbar -> Create Calculated filed

Tableau Prep 24

2. Name the filed as Region and enter “Central”, click Save.

Tableau Prep 22

 

Tableau Prep 23

 

Fixing Dates  (Order Year, Order Month, and Order Day fields into -> format "MM/DD/YYYY".)

In Orders_Central file order date and ship date fields are separate fields for date, year and month. We want to combine them into two single fields, one for Order Date and one for Ship Date so they align with the same fields in the other data sets.

1. Profile pane toolbar -> Create Calculated filed

Tableau Prep 24

2. Name the calculated field Order Date. Then enter the following calculation into the Calculation editor and click Save:

MAKEDATE ([Order Year], [Order Month], [Order Day])

Tableau Prep 25

Order Date filed is created

Tableau Prep 26

 

Since we have separate Order Date field, we can delete Order Year, Order Month, and Order Day fields.

3. In the In the Profile pane, in the search box, type Order. Tableau Prep scrolls all the fields with the name Order included in it.

Tableau Prep 27

4. Ctrl+click (Command+click on Mac) to select the fields for Order Year, Order Month, and Order Day. Then right-click on the selected fields and select Remove Field from the menu to remove them.

Tableau Prep 28

Repeat 2-4 steps for Ship Date

Tableau Prep 29

 

Reviewing Changes

To shape and clean the data we have performed may tasks. To review the changes we made, click the arrow on the left side on the profile pane to open it. We can see the list of changes we made with details.

Tableau Prep 30

 

Combining Data

After making sure that the files have similar fields, you want to union the files together to add the rows from each file into a single table.

1. Drag Orders_west file to Orders_East and use Union step.

Tableau Prep 31

 

2. Add Orders_Central’s cleaned extract to the Union1.

Tableau Prep 32

 

3. Drag the orders_south_2015 step to the new Union step. Drop it on Add to add it to the existing union. Now all our files are combined into a single table. In the Flow pane, select the Union step to see your results.

Tableau Prep 37

 

4. Now all our files are combined into a single table. In the Flow pane, select the Union step to see results.

Tableau Prep 34

We notice that Tableau Prep automatically matched up the fields that had the same names and types. We also see that the colors assigned to the steps in the flow are used in the union profiles to indicate where the field came from and appear in the colored band across the top of each field to show you if that field exists in that table.

5. We notice that a new field called Table Names was added that lists the tables where all the rows in the union come from.

Tableau Prep 35

6. We see a check box at the top of the Union profile to see only mismatched fields, and you see under Resulting Fields on the left that you have a few, so you want to see what's there. In the Union profile pane, select the Show only mismatched fields check box.

Tableau Prep 36

 

 

Running Flow & Generating Output Data

1. In the Flow pane, select Union1, click the plus + icon and select Add Output.

Tableau Prep 37

2. In the left pane select Save to file.

Tableau Prep 38

 

3. Click the Browse button, then in the Save Extract As dialog, enter a name for the file, for example Orders_Superstore, and click Accept.

4. In the Output type field, select the output type. Depending on the version of Tableau Desktop you use you can choose from the following options:

         Tableau Data Extract (. hyper), Tableau Data Extract (.tde), Comma Separated Values (.csv).

Tableau Prep 39

5. Click the Run Flow button to generate your output.

Tableau Prep 40

Tableau Prep 41

 

Tableau Prep 42

 

The Analysis-ready extract is ready for Tableau Desktop (.tde )

 

 


 

Published on 07/25/2018

Authored by Jamie Deadmond