• dazfuller

Dropping a SQL table in your Synapse Spark notebooks (Python Edition)

So since writing the original post about how to drop a SQL table from a Spark Notebook I've been meaning to follow it up with a version for those using PySpark. It might seem a straight forward thing to do, but connectivity and quirks of Synapse don't make it quite so simple. But anyway, here's the short and (hopefully) easy-to-follow post on how to do it in PySpark. If you want to know why these posts exist then I encourage you to read the original.

This time we're going to use PyODBC which is already deployed with Spark pools, so no need to install additional libraries. The first thing we need to do is, as before, use Synapse to get our connection string and an access token.

import struct
import pyodbc

Okay, so that actual first thing is a couple of imports, but then it's getting our information.

server = f"{mssparkutils.env.getWorkspaceName()}.sql.azuresynapse.net"
dbname = "curation"
auth_key = bytes(mssparkutils.credentials.getToken("DW"), 'utf8')
driver = "{ODBC Driver 17 for SQL Server}"
table = "dbo.example_table"

connection_string = f"DRIVER={driver};SERVER={server};DATABASE={dbname}"

Again we're using mssparkutils to get our workspace name, and we're using it to get an access token.

Connecting using PyODBC and an access token is possible in newer versions with the latest drivers, along with a host of other Active Directory based connectivity options. But connecting with a token is pretty poorly documented right now with the best information coming from a github issue. The short version of this though is that you need to take the authentication key and convert it to the right type.

exp_token = b""
for i in auth_key:
    exp_token += bytes({i})
    exp_token += bytes(1)
token_struct = struct.pack("=i", len(exp_token)) + exp_token

Once that's done though, connecting is pretty easy, and once connected we can check for our table and drop it if it exists (1256 is the SQL_COPT_SS_ACCESS_TOKEN option).

with pyodbc.connect(connection_string, attrs_before={ 1256:token_struct }) as conn:
    conn.autocommit = True
    cursor = conn.cursor()
    cursor.execute("SELECT OBJECT_ID(?, 'U')", table)
    object_id = cursor.fetchone()[0]
    if object_id is not None:
        print(f"Dropping table {table}")
        cursor.execute(f"DROP TABLE {table}")
        print(f"{table} does not exist")

There is (surprise) an issue with connecting to Azure Synapse through the ODBC driver however which can cause issues with transactions (the default for the driver). To work around this we have to set the connection auto-commit to true.

And that's it. Have fun and remember to double check which table you're dropping before you run the script 😉

25 views0 comments

Recent Posts

See All

For the Python version of the code below, see the follow-up post. One of the nice things with Spark Pools in Azure Synapse Analytics is how easy it is to write a data frame into a dedicated SQL Pool.