Dropping a SQL table in your Synapse Spark notebooks

dazfuller
Oct 21, 2021
4 min read

Updated: May 1, 2022

For the Python version of the code below, see the follow-up post.

One of the nice things with Spark Pools in Azure Synapse Analytics is how easy it is to write a data frame into a dedicated SQL Pool.

curated_df.write.mode("overwrite").synapsesql("curation.dbo.feed_in_tarrifs", Constants.INTERNAL)

There's couple of gotchas though. The first is that this is only available through Scala (though you can work around this to a degree with Python). But the one which normally catches people pretty quickly is that you can only do it once.

The once thing is a pretty big gotcha, in that it normally comes when you try to run the cell again, but you get an error. So you check the documentation and then spot the following written in there.

The table must not exist in the dedicated SQL pool or an error will be returned stating that "There is already an object named..."

So what can you do?

Well, one way could be to add an activity in your Synapse pipeline which deletes the table first? Or you could just drop in and manually drop the table? Both would work but you don't really want to be running a pipeline or switching views when working on a problem, so what else can we do?

Whilst Spark 3 in Synapse currently doesn't support SQL with JDBC connectivity, it does come the Microsoft JDBC Driver for SQL Server, and this gives us options.

In the solution here, I'm going to show how to drop the table using the JDBC driver, and using the Synapse Managed Identity. This means that the code will work when run as part of a pipeline, and it means that you don't have to start assigning permissions to others, as long as the managed identity has sufficient permissions in the first place. So what's the first step?

N.B. I'm doing all of this in Scala as it's much easier to work with for this purpose

I'm going to be using the TokenLibrary here, but you could do something similar with mssparkutils. The reason I'm not? Well the Token Library gives us a map back which makes this a lot easier to work with. To use it we're just need to do the following (we're also going to bring in some Java SQL libraries for use later)

import com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

Next we want to get the server name that we want to connect to. When you create a Synapse Analytics workspace it automatically creates a default Linked Service for us, we're going to use it to get the server endpoint. We're going to use mssparkutils to get the workspace name so that we can re-use the same code in multiple workspaces.

val connectionProperties = TokenLibrary.getPropertiesAsMap(s"${mssparkutils.env.getWorkspaceName}-WorkspaceDefaultSqlServer")

val server = connectionProperties.get("Endpoint").get.asInstanceOf[String].split("[:,]")(1)

The split operation is because the endpoint takes the form of "tcp:<server name>,<port>", and we just want the name part. We're also having to convert to string because the value in the map is an Any type.

Importantly we also need to authenticate with the endpoint. We already have managed identity access but to use this we'd need to know the client id. We can get that from the workspace properties but I really don't want to hard-code anything here. So instead we're going to use Token Based Authentication. So how do we get a token?

Well, the Linked Service makes this really easy for us as it's already done the hard work. Using the connectionProperties we collected previously we can simply access the AuthKey property and get our token.

val authKey = connectionProperties.get("AuthKey").get.asInstanceOf[String]

Once we have that, and we've set our database name (SQL Pool name), we can connect and drop our table.

val ds = new SQLServerDataSource()
ds.setServerName(server)
ds.setDatabaseName(dbname)
ds.setAccessToken(authKey)

var connection: Connection = null

try {
    connection = ds.getConnection
    val stmt = connection.createStatement

    stmt.executeUpdate("DROP TABLE dbo.my_example_table")
} catch {
    case e: Throwable => println(e.getMessage)
} finally {
    if (connection != null && !connection.isClosed) {
        connection.close()
    }
}

And that's it :)

There's just a word of warning here.

This connection will come through as being from the managed identity. So you don't want people using the environment to follow this same approach generally, otherwise you won't know who is actually logging in. It's the same as using a single SQL account for all your devs. So this best left to processes which will run as part of a pipeline.

Otherwise, have fun making the most out of the synapsesql writer.

Update 2022-04-09, working around breaking changes

Microsoft changed the linked services which are created by default to be parameterised. This breaks the above solution because TokenLibrary currently can't work with parameterised Linked Services. Fortunately there's a way to work around this, but it's not using TokenLibrary which sort of breaks with the title of the post, but it works and we're all about solutions right ;)

In the above, instead of using TokenLibrary to get at the connection string and the authentication key, we can use mssparkutils instead. It's still possible in TokenLibrary but the code is a bit cleaner.

val server = s"${mssparkutils.env.getWorkspaceName}.sql.azuresynapse.net"
val dbname = "my_db_name"
val authKey = mssparkutils.credentials.getToken("DW")

Using this we can programmatically set the connection string, and we can still get an authentication token which is scoped to Dedicated SQL pools (this is in the documentation for mssparkutils if you look at the getToken method description).

11 Comments

UUdolfiJelenai

Apr 19

Cancellara wasn't always a Tudor man, but he was always destined to be a watch collector, starting with a Portugieser he received as an anniversary gift from his link wife. He would spend much of his later career wearing other IWCs — Aquatimers and Big link Pilots, even racing with them on-wrist. When pressed on why one would wear a mechanical watch while link cycling, he replied with a shrug "Because it's cool."

Apr 18

This is one I had to learn as I progressed through my dive and dive-watch career – your dive link watch does not have to be a giant lump of steel link that only works over a wetsuit or when you're surrounded by a literal ocean's worth of free space. Sure, a hardcore and extra toolish dive watch can link be a ton of fun, but as outlined above, you just need a watch that fits a pretty achievable list of specifications.

KaleoxKendax

Apr 07

Secondary market resource Chrono24 recently link went so far as to label the F.P. Journe Élégante 48 as one of its "Top Five Investment Watches link Of 2022", citing consistent sales over link $50,000. When it was first introduced, the Élégante started around $13,000. With the 2022 price increase, the collection now begins at $14,500 for the Élégante 40 and $16,700 for the Élégante 48 – that is, if you can find an example at retail.

VYanislJennyl

Mar 26

A pilot in western Australia, Rob link bought this Tudor link Black Bay GMT after being upgraded to captain. Along with commemorating a career achievement, this watch also gets utilized how it was intended: as a tool in the cockpit in flight and in the water when link Rob goes surfing or free-diving in the Indian Ocean.

qezoruzic

Feb 17

Exceptional service paired with best salon in naples florida high-quality hair products ensures lasting shine and strength. Whether for styling, coloring, or treatment, satisfaction is guaranteed.