What Snowpark means for the data community

Snowflake took the data world by storm with its horizontally scaling warehouse and its all-things-in-SQL approach. It freed data engineers from tedious infrastructure management and machine-tuning duties and let them focus on core data modelling and exploration. Snowflake also abstracts away data partitioning, data governance and data accessibility, which keeps effort on data-driven initiatives with far less hassle. Snowpark, however, takes things “back” to the code-heavy world, and in many interesting ways.

Snowpark allows developers to bring their own code, in their preferred programming language, and execute it on Snowflake’s virtual warehouses. It is as powerful as it sounds. But why was this needed? One reason is that the classic big data world has some advantages, especially when data is applied to use cases like machine learning. Another is that languages like Java, Python and Scala are easier to test than SQL, with ready-made unit-testing frameworks and established testing practices. That said, Snowflake has advantages of its own, such as ease of management. Snowpark is the promise of the best of both worlds.

The following are some considerations to be aware of when evaluating Snowpark.

Supporting initiatives like machine learning

It turns out that many of the advanced statistical modelling techniques and calculations needed for machine learning and artificial intelligence are better built in Python or Scala. Snowflake uses SQL or JavaScript for nearly everything, and while SQL is much closer to the business than Java or Python, the latter certainly bring more firepower to the data game. Snowpark lets you bring the power of those languages into your Snowflake world.
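
As a rough illustration of the kind of calculation that is short in Python but clumsy in SQL, here is a minimal sketch of an ordinary least squares line fit. The function name `fit_line` is made up for this example, and the step of registering such a function with Snowpark is left out because it needs a live Snowflake session.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    fit_line is a hypothetical helper used only for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")

    x_bar, y_bar = mean(xs), mean(ys)

    # Sum of squared deviations of x; zero means every x value is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")

    # Sum of cross deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

Calling `fit_line([1, 2, 3], [2, 4, 6])` gives a slope of 2 and an intercept of 0. Plain functions like this are the natural unit to carry into Snowflake through Snowpark.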

Less overhead and fewer management worries

It is still Snowflake, after all. There is no loading of separate tools and no waiting for everything to boot. Snowflake comes ready with the packages needed to get up and running with your choice of stack. Snowpark also takes care of aspects like garbage collection and partitioning as usual, so while your code brings in your favorite features, you don’t need to do the maintenance. This can be a concern for some, as a few data engineers do want those levers under their own control.

Ease of scaling and performance

Snowflake offers its out-of-the-box scaling features in Snowpark, so your data models can easily scale to meet demand from new users or more concurrent requests. It also means that your computations get the compute they need to run, in the classic serverless manner.

Security and compliance

Snowflake, and hence Snowpark, brings its ability to manage and govern data and to apply access restrictions (even at the column level) where needed. This means you don’t need to tinker with tools like Ranger just because you are bringing in your favorite Python code.

Better managed pipelines and workflows

All this also means that CI/CD becomes more familiar and testable. Unit testing improves overall and moves closer to what the classic big data world is used to.
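
To make the testing point concrete, here is a minimal sketch that assumes the transformation logic is kept in a plain Python function, so it can be unit tested in CI without a Snowflake connection. `clean_email` and its rules are hypothetical and exist only for this example.

```python
import unittest


def clean_email(raw):
    """Hypothetical row-level transformation: trim and lowercase an email.

    Returns None for missing values and for values without an "@" sign.
    """
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    return cleaned if "@" in cleaned else None


class CleanEmailTest(unittest.TestCase):
    def test_trims_and_lowercases(self):
        self.assertEqual(clean_email("  Ada@Example.COM "), "ada@example.com")

    def test_rejects_values_without_at_sign(self):
        self.assertIsNone(clean_email("not-an-email"))

    def test_passes_none_through(self):
        self.assertIsNone(clean_email(None))
```

Saved in a test module, this runs with `python -m unittest` like any other Python test suite, which is what lets it slot into an existing CI/CD pipeline.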

Single standardized way of accessing the data

Snowpark allows you to bring in different programming languages, but the idea is still to use Snowflake’s data warehouse. The advantage is a standardized way to access the data; the flip side is that you may spend some time migrating your data from other systems, such as Presto DB, to Snowflake. This is the “effort cost” of using the Snowflake-Snowpark architecture.
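
As a minimal sketch of what that standardized access can look like from Python, the function below builds a query through the Snowpark DataFrame API. It assumes `session` is an already connected `snowflake.snowpark.Session`; the table `ORDERS` and the columns `AMOUNT` and `REGION` are hypothetical names chosen for illustration.

```python
def large_orders_by_region(session, min_amount=100):
    """Count orders at or above min_amount per region.

    `session` is assumed to be a snowflake.snowpark.Session. ORDERS, AMOUNT
    and REGION are hypothetical names, not taken from any real schema.
    """
    orders = session.table("ORDERS")

    # filter() accepts a SQL expression string, so no extra imports are needed.
    large = orders.filter(f"AMOUNT >= {int(min_amount)}")

    # The result is a lazy DataFrame: nothing is executed in Snowflake
    # until an action such as collect() or show() is called on it.
    return large.group_by("REGION").count()
```

With a real session, `large_orders_by_region(session, 250).show()` would push the work down to Snowflake, so the data stays in the warehouse while the logic lives in Python.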

Conclusion

Snowpark is here to stay. It will be interesting to see what kind of adoption it gets in the data community and how much support it receives from Snowflake. With its promise of letting you bring any language and still harness the power of Snowflake, it can be a good bet for a data engineering team.

If you are looking for help with Snowflake or Snowpark, or with going serverless, you can connect with our team now. With extensive skill and experience, we will help you achieve your technology goals on time.
