Snowflake took the world by storm when it introduced its horizontally scaling warehouse and its all-things-in-SQL approach. It freed data engineers from tedious infrastructure management and machine-tuning duties and let them focus on core data modelling and exploration. Snowflake also abstracts away the layers of data partitioning, data governance and data accessibility, further focusing effort on data-driven initiatives with far less hassle. Snowpark, however, takes things “back” to the code-heavy world, and in many interesting ways.
Snowpark allows developers to bring their own code, in their preferred programming languages, and execute it on Snowflake’s virtual warehouses. It is as powerful as it sounds. But why was this needed? Because the classic big data world has real advantages, especially when data feeds use cases like machine learning, and because languages like Java, Python and Scala are more testable than SQL, with ready-made unit-testing frameworks and established testing practices. Snowflake, for its part, offers advantages like ease of management and less operational hassle. Snowpark is the promise of the best from both worlds.
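As an illustrative sketch of what “bring your own code” looks like, here is a minimal Snowpark Python pipeline. The table, column and connection values are placeholders (not from this article), and running it requires the snowflake-snowpark-python package plus a real Snowflake account:

```python
# Sketch of a Snowpark Python pipeline. The filter and aggregation below are
# translated to SQL and executed inside Snowflake's virtual warehouses, not on
# the client machine. Table/column names and credentials are placeholders.

def shipped_totals(session):
    """Build a lazy Snowpark DataFrame; the heavy lifting runs on Snowflake."""
    from snowflake.snowpark.functions import col, sum as sum_
    return (
        session.table("ORDERS")
        .filter(col("STATUS") == "SHIPPED")
        .group_by("CUSTOMER_ID")
        .agg(sum_("AMOUNT").alias("TOTAL"))
    )

def main():
    from snowflake.snowpark import Session
    # Placeholder connection parameters -- substitute real account details.
    session = Session.builder.configs({
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
    }).create()
    shipped_totals(session).show()
```

Nothing here runs locally beyond building the query plan: Snowpark DataFrames are lazy, and the work is pushed down to Snowflake when an action like `show()` or `collect()` is called.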
The following are some considerations to be aware of when evaluating Snowpark.
Supporting initiatives like machine learning
Fewer overheads and management worries
It is still Snowflake, after all. The usual problem of installing different tools and waiting for everything to boot up isn’t there: Snowflake comes ready with the packages needed to get up and running with your choice of stack. Snowpark also takes care of aspects like garbage collection and partitioning as usual, so while your code brings in your favourite features, you don’t need to engage in maintenance activities. This can be a concern for some, as a few data engineers do want those levers under their own control.
Ease of scaling and performance
Snowflake offers its out-of-the-box scaling features in Snowpark, so your data models can easily scale to meet new user demand or more concurrent requests. It also means your computations get the compute they need to run, in the classic serverless manner.
Security and compliance
Snowflake, and hence Snowpark, brings its ability to manage and govern data and to apply access restrictions (even at column level) where needed. This means you wouldn’t need to tinker with tools like Apache Ranger when bringing in your favourite Python code.
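Column-level restrictions in Snowflake are typically expressed as masking policies in SQL; from Snowpark the same DDL can be submitted through `session.sql()`. The helper below is illustrative (not part of the Snowpark API) and simply builds such a statement following Snowflake’s documented masking-policy syntax:

```python
# Illustrative helper: build DDL for a Snowflake masking policy that reveals a
# column's value only to one role and masks it for everyone else.

def build_masking_policy_ddl(policy: str, column_type: str, allowed_role: str) -> str:
    """Return CREATE MASKING POLICY DDL for column-level access control."""
    return (
        f"CREATE MASKING POLICY {policy} AS (val {column_type}) "
        f"RETURNS {column_type} -> "
        f"CASE WHEN CURRENT_ROLE() IN ('{allowed_role}') THEN val "
        f"ELSE '***MASKED***' END"
    )

ddl = build_masking_policy_ddl("EMAIL_MASK", "STRING", "ANALYST")
print(ddl)
# With an open Snowpark session, this would be executed as:
#   session.sql(ddl).collect()
```

Once created, such a policy is attached to a column with `ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY`, and governance stays inside Snowflake rather than in an external tool.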
Better managed pipelines and workflows
All of this also means a more familiar and testable approach to CI/CD. It also means better unit testing overall, closer to the practices of the classic big data world.
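As a sketch of that testability: transformation logic written in plain Python can be exercised with standard tooling (pytest, unittest) without touching the warehouse at all. The function and thresholds below are illustrative, not from the article:

```python
# Illustrative example: business logic kept in plain Python is easy to unit-test
# with standard frameworks, unlike equivalent logic buried inside SQL statements.

def categorize_order(amount: float) -> str:
    """Bucket an order amount -- the kind of logic you might run via Snowpark."""
    if amount < 0:
        raise ValueError("amount cannot be negative")
    if amount < 100:
        return "small"
    if amount < 1000:
        return "medium"
    return "large"

# Plain assertions; in a real project these would live in a pytest test module.
assert categorize_order(10) == "small"
assert categorize_order(100) == "medium"
assert categorize_order(5000) == "large"
```

A function like this can be tested in an ordinary CI pipeline and then registered with Snowpark as a UDF to run against warehouse data, keeping one tested implementation for both paths.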
Single standardized way of accessing the data
Snowpark allows you to bring in different programming languages, but the idea is still to use Snowflake’s data warehouse. The advantage is a standardised way of accessing the data; the flip side is that you may spend some time migrating your data from other engines like Presto DB to Snowflake. This is the “effort cost” of the Snowflake-Snowpark architecture.
Snowpark is here to stay. It will be interesting to see the kind of adoption it gets in the data community, and we will also watch for the support it receives from Snowflake. With its promise of bringing any language while still harnessing the power of Snowflake, it can be a good bet for a data team.
If you are looking for help with Snowflake or Snowpark, or want to go serverless, you can connect with our team now. With extensive skill and experience, we will help you achieve your technology goals on time.