Structure your Machine Learning project source code like a pro

How to structure projects when working with Python in an ML project

Facundo Santiago
16 min read, May 16, 2022

As Machine Learning solutions gain traction and make their way into mission-critical systems, the pressure on their quality and robustness increases as well.

Software engineering went through the same process decades ago, when computer systems started to become more complex. However, data science and the Machine Learning practice somehow didn’t pick up from there. It is true that machine learning solutions differ technically from traditional software systems, but it is also true that we have been reluctant to adopt those engineering practices despite the potential gain. Don’t you think it is about time to go pro?

One of the key lessons learned in software engineering is the use of a solid source control solution. However, it is not always clear how best to structure a project to work with git, especially once Jupyter Notebooks, data, code assets, and models are involved. In this post we will go over how to structure a git repository for an ML project, particularly one using Python. To make it easier, I will start small and then gradually add pieces so the project can scale as challenges appear. I hope you find this useful (although it may look like a long blog post). Bear with me :).


Written by Facundo Santiago

Product Manager @ Microsoft AI. Graduate adjunct professor at University of Buenos Aires. Frustrated sociologist.
