• azurecoder

Azure Data Explorer: my new favourite thing

Over the past few months I've been working with Azure Data Explorer (ADX) and liaising with the Microsoft product team (who incidentally gave a great talk to @ukazure, which can be found here: https://youtu.be/SRLZ6KkoaOM).

It has been a while since I dug deep into a new technology, warts and all, but the best way is always to build something with a tool, use it, and discover the issues as you go.

Data Explorer's core capability is querying billions of data points through a linearly scalable cluster. Whilst you can query data held "externally", such as files in ADLS, it's not that quick, so there are two general storage layers which help:

  • Ingestion

  • Caching

You can ingest as much as you want and the ADX cluster will run queries against the ingested data. Ingested data is indexed and partitioned into "extents" so that a query can be effectively parallelised across them. There's also a lot you can do to control how the data is ingested, which hopefully we'll explore in this series.
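To make the idea of extents a bit more concrete, here is a toy model in Python. It is not how ADX works internally; the function names, the extent size and the sample rows are all made up for illustration. It only shows the general shape: chop the data into chunks, scan each chunk independently, then merge the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_extents(rows, extent_size):
    """Chop a list of rows into fixed-size chunks (our stand-in for extents)."""
    return [rows[i:i + extent_size] for i in range(0, len(rows), extent_size)]


def count_matching(extent, predicate):
    """Partial result for one extent: how many rows satisfy the predicate."""
    return sum(1 for row in extent if predicate(row))


def parallel_count(extents, predicate, workers=4):
    """Scan every extent independently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda extent: count_matching(extent, predicate), extents)
        return sum(partials)


# Hypothetical sample data: 1000 readings spread over three devices.
rows = [{"device": f"d{i % 3}", "reading": i} for i in range(1000)]
extents = split_into_extents(rows, extent_size=100)
high_readings = parallel_count(extents, lambda r: r["reading"] >= 900)
```

Because each extent is scanned on its own, adding more workers (or, in a real cluster, more nodes) lets more of them be scanned at the same time.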

Caching is a way of keeping relatively recent data cached so that queries against it run faster. Think of Spark and the cache() function, but implicit. The more cache you have, the faster things are. You'll find that ingestion and caching are the two things you'll need to scale up for, and how far depends on the volume of data you have and the number of users on the cluster.
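How much recent data stays in the cache can be tuned with a caching policy, set through a control command. As a minimal sketch, the Python helper below just builds the command text; the helper name and the table name `Telemetry` are hypothetical, and the command still has to be sent to the cluster by someone with sufficient permissions.

```python
import re


def caching_policy_command(table, hot_days):
    """Build the control command that sets a table's hot cache window in days."""
    # Keep to simple identifiers so nothing unexpected ends up in the command.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if hot_days < 0:
        raise ValueError("hot_days must be non-negative")
    return f".alter table {table} policy caching hot = {hot_days}d"


# Hypothetical table: keep the last 7 days of Telemetry in the hot cache.
command = caching_policy_command("Telemetry", 7)
```

A longer window means more data is served from the cache, at the cost of needing more cache on the cluster.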

I've created dexcmd so that I can control and manipulate everything around the edges of ADX. It's in its infancy at the moment, but it has been built to return information from ADX on clusters, tables and permissions. Today I also added a function which allows you to add users to a table (the lowest level of permission granularity). Give the tool a go; it's been built for people who need to support ADX and I hope to make it more intuitive over the coming months. You can see it here: https://github.com/elastacloud/dexcmdline. Packages are available to install directly and I'll look to add it to choco going forward.
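For anyone curious what adding a user to a table looks like at the Kusto level, here is a rough Python sketch that builds the control command text. This is not dexcmd's code or its command line; the helper name, table name and email address are all made up for illustration.

```python
# Table-level roles that the .add table command accepts.
ALLOWED_TABLE_ROLES = {"admins", "ingestors"}


def add_table_principal_command(table, role, user_email):
    """Build the control command that grants an AAD user a role on a table."""
    if role not in ALLOWED_TABLE_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_TABLE_ROLES)}")
    if "'" in user_email or "@" not in user_email:
        raise ValueError(f"unexpected user email: {user_email!r}")
    return f".add table {table} {role} ('aaduser={user_email}')"


# Hypothetical user and table.
grant = add_table_principal_command("Telemetry", "admins", "someone@example.com")
```

The resulting text is what would be run against the database that holds the table.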

Anyway, I'm planning to blog more in the coming months, so I've decided to blog about things that are more palatable and build a series called ADX: 30 facts in 60 days, which will allow me to explore:

  • command and control

  • ingestion

  • caching

  • kusto

Stay tuned. I hope you get something out of this and I hope I stick with it.
