The new new normal
Well the number of people looking for work in the cloud and data sector has now shot up as everyone adjusts their forecast expectations...
Pies, Lies and AIs
Showing the struggle of organisations to address the data revolution
Exploring the world of data and organisational intransigence
The nature of Elastacloud is beginning to change. Most of my advisors and sales team have never written a line of code in their life and...
I was stuck in traffic today for a few hours on the way to New Scientist Live. Missed loads of it. My bad planning. Worst luck. Really...
I had a great time last weekend at a garden party with my friend and one of my longstanding tech leads, David. It was great to spend the...
I've taken the last week off. For the first time in about a decade I haven't responded to emails (at least I've picked and chosen), Teams...
I was mulling over whether to derive a new jocular term for a Data Mesh. I pondered Data Mess but that seemed too obvious so I've opted...
I felt compelled to write about this because it's a phenomenon I'm seeing more and more. Cloud and data is revolutionary because it's...
So since writing the original post about how to drop a SQL table from a Spark Notebook I've been meaning to follow it up with a version...
For the Python version of the code below, see the follow-up post. One of the nice things with Spark Pools in Azure Synapse Analytics is...
Note: this post is mainly about Azure but it can apply to any cloud. You get to a stage when you hear enough definitions of what people...
So a slight departure from Spark (sort of) for this post, but I wanted to look at one of the most commonly overlooked aspects of building...
Having recently released the Excel data source for Spark 3, I wanted to follow up with a "let's use it to process some Excel data" post....
People have data in Excel, so let's have a look at how we can read that data using Spark
Sometimes, when you go scanning through the documentation for Spark, you come across notes about certain functions. These tend to offer...
Something that we often see as Spark jobs are moved into production is that handling of bad data is either ignored, or a lot of effort...
Getting data into good shape is a key part of Data Engineering, and we often get data in all sorts of shapes and quality
The command line is a powerful environment that lets you do a lot of work quickly, easily, and in a repeatable way
It's now easier to see the documentation directly in the notebook, rather than having to call "help" or go searching the Internet
Back in October 2020 I had the amazing opportunity to work with Richard on a small project where we wanted to speed up processing of CEF...
So input-output tables are an economic kludge format which groups together a bunch of coefficients per country per year (these are...