Recently I’ve been working on a data scraping project that handles a small amount of data, small enough that the free tiers of the most popular cloud providers could easily hold it, but I don’t like the risk of being billed for a personal project. To avoid that, I’ve been looking for a free alternative that I can share and that runs automatically with GitHub Actions.
If you want to check out the repo that contains the code discussed in this post, follow this link.
I’ll illustrate how to integrate SQLite databases with GitHub Actions using Python, but if you know how to modify a file with another programming language, this post is still relevant to you.
Writing Your Data Generator/Scraper
First, your project needs to live in a repository; in my case, I’m using GitHub. I wrote a Python script that scrapes a webpage and saves the data to a SQLite database; in this post I’ll illustrate the idea with a much simpler script.
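Here is a minimal sketch of what such a script can look like. The database file name (data.db) and the table schema are placeholders; in a real scraper you would insert whatever data you collect. The only requirement is that each run modifies a file tracked by the repository.

```python
import sqlite3
from datetime import datetime, timezone

DB_PATH = "data.db"  # placeholder file name; use whatever file your repo tracks

def main():
    # Open (or create) the SQLite database that lives in the repository.
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, ran_at TEXT)"
    )
    # Stand-in for real scraped data: record a timestamp on each run.
    conn.execute(
        "INSERT INTO runs (ran_at) VALUES (?)",
        (datetime.now(timezone.utc).isoformat(),),
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```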
Setting Up the Workflow
If you run the above code multiple times on a local machine, it works as expected, but on GitHub the changes won’t persist. That’s because the runner’s filesystem is discarded after each job, so you need to commit the changes back to the repository. To do this, create a workflow: in your repo, add a YAML file under .github/workflows. This file is going to be your workflow; you can choose any name you want.
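Below is a minimal sketch of such a workflow. The schedule, Python version, script name (main.py), and database file name (data.db) are assumptions; adjust them to match your project. The key part is the final step, which commits and pushes the updated database file back to the repository.

```yaml
name: scrape-and-commit

on:
  schedule:
    - cron: "0 0 * * *"   # daily at midnight UTC; adjust to taste
  workflow_dispatch:       # also allow manual runs from the Actions tab

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Run the scraper
        run: python main.py   # assumed script name

      - name: Commit and push the updated database
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data.db
          git diff --staged --quiet || git commit -m "Update data"
          git push
```

The `git diff --staged --quiet ||` guard skips the commit when the database didn’t change, so the job doesn’t fail on runs that produce no new data.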
Don’t forget to enable workflow permissions: in your repo, go to Settings > Actions > General and select Read and write permissions.
Conclusion
This can be a good free alternative if you want to share the data you are scraping or generating, but you still need to keep an eye on GitHub’s limitations on the free tier. See the current usage limits on their official website.
If you would like to see a real-life application of this, you can go to this repo, where I’ve implemented a monthly scraper that saves its data to a SQLite database that is available to everyone.