ETL - BI - Users (Ops Package)

Last Updated 5/26/2021

Overview
Location
Schedule
Details
 Packages
    master
    user-stage-dw
    tableau-sync
 Python Modules
    ad_load_dw.py
    tableau_load_dw.py
    tableau_update.py
    tableau_helper.py    
    ad_helper.py
    dw_helper.py
Change Log
Troubleshooting Notes

Overview:

The project that stages BI users is the data warehouse's link to Azure AD and uses Microsoft Graph to stage all members of "BI-" AD groups into staging tables in the warehouse and also to sync that membership to Tableau Online.

The ETL has two main components.  A python package that pulls down data from Azure AD (Microsoft Graph) and from Tableau, and then a python package that syncs membership lists in Tableau.

The primary purpose of this job is to generate user lists for user dashboard and to sync membership with Tableau.  Uses include:

  1. Populate staging table schemas with azure ad and tableau membership data and sync back to Tableau
  2. dataMarq procs to build usership table 
    1. sp_d_bi_users
      Build d_bi_users table that stores membership in various BI groups

 Schemas in staging that are populated with this data include:

  • Base schema: ad (for azure ad data), tab (for tableau data)
  • CDC schema: N/A

Location:

The project – ETL-BI-Users– contains all the packages and python code and resides in the BI-Ops folder in the SSIS catalog.

Schedule:

The jobs in this project run as part of a single SQL Agent jobs (full dataMarq schedule).  

ETL – BI - Users – Everyday at 12:30pm and 8:30pm

Project and Package Details:

The packages in ETL – BI - Users work with the both the Microsoft Graph API and the Tableau API and use python modules to interact with these web services. 

Project Parameters

dbEnvr: INT, STG, PRD – which DB should be loaded with the data. Passed to python program as –dbEvnr argument

dwConnStr: connection to datamarq

dwStgConnStr: connection to datamarq_staging

gitEnvr: DEV or PROD. Because these jobs use python packages they python code is stored in our git folder on the BI shared drive.  For testing, we are able to point at the python code in DEV or in PROD.

groupFilter: This is passed to the Microsoft Graph API to only pull membership information for groups that match this filter. Generall we pass "BI-" to get all BI groups but can do specific groups for testing.

logLevel: DEBUG, INFO, WARNING, ERROR, CRITICAL – This is passed to the python scripts as an argument to set the log level for the jobs

pythonPath: The path to the python executable – saved in the D:/ drive on servers but usually in C:/ if executing locally/testing.

reportsOnly: True or False, passed to the AD stage job to only stage reports for the other jobs that use these same python codes.

Packages

MASTER

Overview

This primary orchestration package, it executes other "worker" packages in the project.

Package Parameters

None

Package Variables

None

Package Flow

  1. Execute the user_stage_dw package
  2. Execute the tableau_sync package
  3. Execute the user table stored procedure
    bi.sp_d_bi_users 

USER-STAGE-DW

Overview

This package has two tasks that fire of the python scripts that stage the data from Azure AD and Tableau.

Package Parameters

None

Package Variables

None

Package Flow

  1. Fire the python module ad_load_dw.py that stages all the BI groups and members into the staging database. Note the use of the project parameters and the need to encapsulate every command in quotation marks  


    "\"//marqnet.mu.edu/depts/ITS/BI/git/" + @[$Project::gitEnvr] + "/DW/ETL-BI/ad-load-sync/ad_load_dw.py\"  \"-dbEnvr\" \"" + @[$Project::dbEnvr]  + "\" \"-groupFilter\" \"" + @[$Project::groupFilter] + "\" \"-log\" \"" + @[$Project::logLevel]   + "\""

  2. Fire the python module tableau_load_dw.py that stages all Tableau groups and members into the staging database. Note the use of the project parameters and the need to encapsulate every command in quotation marks  


    "\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" + @[$Project::gitEnvr] + "\\DW\\ETL-BI\\ad-load-sync\\tableau_load_dw.py\"  \"-dbEnvr\" \"" + @[$Project::dbEnvr]  +  "\" \"-reportsOnly\" \"" + @[$Project::reportsOnly]  + "\" \"-log\" \"" + @[$Project::logLevel] + "\""

TABLEAU-SYNC

Overview

This package has two tasks, it execute the python module that syncs BI group users and members to Tableau, and also re-stages the Tableau membership detail after the sync is done so the staging tables are aligned.  Note that it is membership in the BI-Tableau AD group that gets a person added to Tableau as a view.  Also, to be made a group in Tableau, a group has to be listed on the staging table tab.tab_auth_groups.   Only groups that are listed on this table will be made groups in Tableau Online.

Package Parameters

None

Package Variables

None

Package Flow

  1. Fire the python module tableau_update.py that syncs Azure AD group membership and users into Tableau Online. Note the use of the project parameters and the need to encapsulate every command in quotation marks  
    "\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" +  @[$Project::gitEnvr]  + "\\DW\\ETL-BI\\ad-load-sync\\tableau_update.py\"  \"--dbEnvr\" \"" + @[$Project::dbEnvr] +  "\" \"--log\" \"" +  @[$Project::logLevel]  + "\""
  2. Fire the python module tableau_load_dw.py that stages all Tableau groups and members into the staging database. This realigns what is in the staging database after the sync. Note the use of the project parameters and the need to encapsulate every command in quotation marks  
    "\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" + @[$Project::gitEnvr] + "\\DW\\ETL-BI\\ad-load-sync\\tableau_load_dw.py\"  \"--dbEnvr\" \"" + @[$Project::dbEnvr]  +  "\" \"--reportsOnly\" \"" + @[$Project::reportsOnly]  + "\" \"--log\" \"" + @[$Project::logLevel] + "\""

 

Python Modules

Qualtrics data is accessed through a web service which dataMarq achieves through the use of python modules.  These all reside in the ETL-Qualtrics solution – and therefore in the git repository – alongside the SSIS packages, which call them.  They also require a config file that is stored in the BI shared drive, config folder. 

Qualtrics API resource: https://api.qualtrics.com/api-reference/

ad_load_dw.py

This program loads the BI group data (users, groups and group members) into the ad staging schema in the data warehouse.

Repo link

 Parameters

dbEnvr – INT, STG, PRD

groupFilter – This is passed to the Microsoft Graph API to only pull membership information for groups that match this filter. Generall we pass "BI-" to get all BI groups but can do specific groups for testing.

logLevel – the level to log to the log folder for the job (WARNING, INFO, ERROR, DEBUG)

 Quasi code flow

  1. Set the config file path based on the dbEnvr. Configs are stored in the BI shared drive folder "config".
  2. Get the refresh token to authenticate to Microsoft graph.  This calls the get_token function in the ad_helper.py module to refresh the token on each run.The tokens are stored in the BI shared drive folder "tokens"
  3. Get the groups list using the token and web endpoint by calling the get_group_dict fuction in the ad_helper.py module.  This calls the webservice and gets every group that matches the given groupFilter.
  4. Get the membership lists from Azure AD by calling the get_update_list function in the ad_helper.py module to get a list of members for each BI group.
  5. Connect to the staging database in the data warehouse by calling db_connect in the dw_helper.py module
  6. Get the tables, fields and values to be inserted into the ad staging tables.  The tables and fields are stored in the config file, the values come from the lists generate in the step above.  
  7. Truncate tables and load all values just pulled down from Azure AD into the staging tables for group, users and group members.  This is done by calling functions from the dw_helper.py module, including trunc_table, insert_rows and various "update" functions.

tableau_load_dw

This program loads the user and group data (users, groups and group members) from Tableau into the tab staging schema in the data warehouse.

Repo link

 Parameters

dbEnvr – INT, STG, PRD

reportsOnly – True or False, passed to the API to only load reportst for the reports only load

logLevel – the level to log to the log folder for the job

 Quasi code flow

  1. Set the config file path based on the dbEnvr
  2. Create a connection to Tableau Online using the tableau_connect function in the tableau_helper.py module
  3. Populate lists of views and permission using the functions in the tableau_helper.py module (get_workbook_views, get_workbook_permissions)
  4. If the reportsOnly variable is "False", populate userList and groupList with lists from Tableau API
  5. Create a list of user details for insertion into the staging table
  6. Create a list of groups and member details for insertion into the staging table
  7. Connect to the staging database in the data warehouse by calling db_connect in the dw_helper.py module
  8. Get the tables, fields and values to be inserted into the tableau staging tables.  The tables and fields are stored in the config file, the values come from the lists generate in the step above.  
  9. Truncate tables and load all values just pulled down from Tableau into the staging tables for group, users and group members.  This is done by calling functions from the dw_helper.py module, including trunc_table, insert_rows and various "update" functions.

tableau_ad_sync.py (decomissioned)

 

No longer in use.

tableau_update.py

This program sync data from the ad staging tables for users and groups members to Tableau Online.  It only syncs groups that are identified for upload to Tableau and group members who have been identified as Tableau viewers.

Repo link

 Parameters

dbEnvr– INT, STG, PRD

logLevel – the level to log to the log folder for the job

 Quasi code flow

  1. Set the config file path based on the dbEnvr
  2. Create a connection to Tableau Online using the tableau_connect function in the tableau_helper.py module
  3. Populate a list of groups and users that need changing in Tableau Online by using the get_group_compare and get_user_compare functions in the dw_helper.py module
  4. Add and delete groups, users and group membership using the update_tableau_groups and update_tableau_users function from the tableau_helper.py module

ad_helper.py

 All of the functions that are used to interact with the Azure Microsoft Graph APIs.  These are mostly derived from the python module "adal" which is open source and created for interacting with Azure AD.   Details are here: https://adal-python.readthedocs.io/en/latest/aa

Repo link

api_endpoint

Parameters:

url - the url to turn into an approprirate endpoint

Function: Converts a relative path into a full uri based on the adal endpoint

Output:  server connection object

get_token

Parameters:

None

Function: Takes a credential - stored in the config file - and acquires an authentication token to interact with Azure AD

Output:  token, tokenType

get_response

Parameters:

url - The url endpoint for a webservice call

headers - The headers to pass to the request to the web service

Function: A generic function that calls the requests module to get a response from a web service.  The response is transformed into JSON for the output variable

Output:  nextPage - if the response has a next page url, JSON - the response in JSON format

get_group_dict

Parameters:

token - The token for Azure AD access

tokenType - The token type that is passed (refresh, access, etc.)

params - Any specific parameters to tag onto the request, many Azure AD request take specific parameters

Function: Takes a token to Azure AD and calls the groups request end point to get a list of all the groups that match a certain param. For this process we generally pass through the filter for group that start with "BI-"

Output:  groupDict - a dictionary of groups from Azure AD

get_member_dict

Parameters:

groupId - The Azure AD group id for a specific group

token - The token for Azure AD access

tokenType - The token type that is passed (refresh, access, etc.)

params - Any specific parameters to tag onto the request, many Azure AD request take specific parameters

Function: Takes a token to Azure AD and calls the groups members request end point to get a list of all the groups members for a given group. These are then stored in a dictionary with specific user details

Output:  memberDict - a dictionary of group members, with upn as the key and various other pieces of info on the user like displayName, department, jobTitle, etc.

get_update_lists

Parameters:

token - The token for Azure AD access

tokenType - The token type that is passed (refresh, access, etc.)

groupDict - dictionary object of AD groups

Function: Takes a token to Azure AD and a dictionary of groups and calls the get_member_dict function (see above) for each group, creating lists of groups, group members and individual users in separate lists

Output:  groupList, memberList, userList

 

tableau_helper.py

 All of the functions that are used to interact with the tableau APIs.  These are mostly derived from the Tableau python package created by Tableau and offered for free.  Details are here: https://tableau.github.io/server-client-python/docs/

Repo link

tableau_connect

Parameters:

None

Function: Uses data from the config file to connect to the Tableau Online instance

Output:  server connection object

 get_tableau_user_list

Parameters:

server – The connected Tableau server object

Function: Takes the server connection object and returns a list of all users

Output:  userList -  list of all users on the Tableau Online instance

 get_tableau_group

Parameters:

server – The connected Tableau server object

Function: Takes the server connection object and returns a list of all groups

Output:  groupList -  list of all groups on the Tableau Online instance

get_tableau_group_members

Parameters:

server – The connected Tableau server object

groupList - The list of groups on Tableau

Function: Takes the server connection and a list of groups and loops through each group populating a list of members of that group.

Output:  groupDetails -  the details for each group, memberDetails - the details for each member of the group

get_user_item

Parameters:

server – The connected Tableau server object

upn - The email of the user

Function: Takes a server connection and the upn of a user and gets the details for that user

Output:  userItem - the user Object from Tableau

get_group_item

Parameters:

server – The connected Tableau server object

group_name - The name of the group

Function: Gets the group object  for the group passed to the server object

Output:  groupItem - the group Object from Tableau

get_tableau_workbook_list

Parameters:

server – The connected Tableau server object

Function: Calls Tableau API to get list of all workbooks

Output:  workbookList - a list of all workbooks on the Tableau instance

get_workbook_permissions

Parameters:

server – The connected Tableau server object

wbObject – the workbook object from Tableau

Function: Takes a workbook object and returns a list of users with their permissions

Output:  permissionList - a list with the detail permissions for the workbook

get_workbook_views

Parameters:

server – The connected Tableau server object

wbObject – the workbook object from Tableau 

Function: Takes a workbooks object and returns a list of all the views in the workbook

Output:  viewList - a list of views with some details

update_tableau_groups

Parameters:

server – The connected Tableau server object

groupCompareList – a list of groups generated from the data warehouse function in the dw_helper.py module

Function: This function takes a group compare list and uses it to add or delete groups within Tableau Online so they are synced with the information in the data warehouse

Output:  Number added, Number deleted

update_tableau_users

Parameters:

server – The connected Tableau server object

userCompareList – a list of users generated from the data warehouse function in the dw_helper.py module

Function: This function takes a user compare list and uses it to add or delete users within Tableau Online so they are synced with the information in the data warehouse

Output:  Number added, Number deleted, Number with role changes

update_tableau_group_membership

Parameters:

server – The connected Tableau server object

group – the group Object for Tableau Online

memberList - a list of members taken from the data warehouse that should be in the given group

Function: This function takes a group object and a member list and makes sure the members in Tableau Online align with what is in the data warehouse.

Output:  Number added, Number deleted

dw_helper.py

All of the functions that are used to interact with the datamarq database

Repo link

insert_rows

Parameters:

connection – the python connection object to datamarq

table – the table name for insert

fieldList – the list of fields to be inserted

valueLIst – the list of values to be inserted

Function: Generic function that inserts values into a given table and fields

Output:  s – number of successful rows inserted, f – number of failed rows caught in exception

get_user_compare

Parameters:

connection – the python connection object to datamarq

Function: Executes a query in datamarq to get the userCompareList. The query is specified within the function.  It compare what is on the ad staging tables with what is on the Tableau staging tables

Output:  userCompareList – a list of users

get_group_compare

Parameters:

connection – the python connection object to datamarq

Function: Executes a query in datamarq to get the groupCompareList. The query is specified within the function.  It compare what is on the ad staging tables with what is on the Tableau staging tables

Output:  groupCompareList – a list of groups

get_member_compare

Parameters:

connection – the python connection object to datamarq

Function: Executes a query in datamarq to get the memberCompareList. The query is specified within the function.  It compare what is on the ad staging tables with what is on the Tableau staging tables

Output:  memberCompareList – a list of group members

 

get_user_membership

Parameters:

connection – the python connection object to datamarq

upn - a users unique upn (marquette email)

Function: Take a user's upn and returns the ad groups they are members of (with tab id if also a member in tableau) based on what is in the staging tables

Output:  groups – a list of groups a given user is a member in

 

update_hash

Parameters:

connection – the python connection object to datamarq

table – the table name

fieldList–List of fields in the table

Function: takes a field list and builds a hash statement to get a unique row hash then runs an update statement

Output:  None

cursor_list

Parameters:

cursor – a python sql cursor object (select statement usually)

name – the name of a tuple for each output row of the cursor

fields–list of fields in the output row

Function: Creates a result set (python list) from a cursor execution as a each row a named tupled with defined fields

Output:  resultSet – python list of named tuples as each row
 

update_audit

Parameters:

connection – the python connection object to datamarq

table – the table name

stageSource – The string to be put into the stage_source field in the staging table

Function: Runs commands to update the audit fields in a staging table with the default getdate() for the modified and created date

Output:  None

update_dt

Parameters:

connection – the python connection object to datamarq

table – the table name

dt_field – The date to be entered into a datefield

Function: Runs commands to update the given date field to the given date

Output:  None

alter_index

Parameters:

connection – the python connection object to datamarq

table – the table name

method – The method to be run on the index (disable, rebuild)

Function: Runs commands to alter all indexes on the table for the given method

Output:  None

trunc_table

Parameters:

connection – the python connection object to datamarq

table – the table name

Function: Runs command to truncate the given table

Output:  None

  

Change Log:

Date Developer Change Record
5/24/2021 N/A Initial Documentation


Troubleshooting Notes:

  1. Timing of Adding Members to Tableau

    Group members need to be added first in the AD group.  Then the AD group needs to be synced to Azure AD before it can be pulled back down and the membership/user and by synced to Tableau. Sometime the timing of this is off when the ETL - BI - Users job is run and there is delay adding someone.  Usually they are up in Azure AD within 30 minutes and the job can be run without a problem manually.  But ocassionally timing will be a problem
  2. Member Compare Query Issues

    This should be resolved but there have been issues fixed with the member compare query. It is a bit of a complicated query because it has to check for membership in a group and also membership in the BI-Tableau group.  Membership in the BI-Tableau group is what gets people ultimately into Tableau.  If someone isn't being added for some reason, check this query to make sure things show up correclty from the staging tables.  Members to be added shoudl be missing the tab data from the query and members to be deleted should be missing the ad data. Members with both sets of data shouldn't be changed.