SQL is coming to The Graph
Anyone who has been developing on The Graph knows that we use GraphQL for queries. The Guild maintains an excellent GraphQL library and there are many benefits. During the last core dev call StreamingFast announced that SQL is coming the The Graph via an upgrade to their data processing suite that already includes superstars like the FireHose and Substreams. During the call they gave a demo of what it would look like to run these new features on your own setup which is exciting as it continues our work towards a truly decentralized future. StreamingFast encouraged devs to contact them on their discord to get help setting up the beta and also to share any specific needs so that they can add features that the community needs.
The call covered other topics as well but in this blog I want to focus on SQL. StreamingFast sepparated their presentation into 3 sepparate parts:
11:30 - Bringing SQL Queries to The Graph | Semiotic Labs
22:20 - How DBT (Data Build Tool) Takes SQL to the Next Level
30:30 - The Backend: Infrastructure for SQL and DBT Integration | feat. StreamingFast
*from the video discription. Video linked below.
I will introduce each part in this blog and link to some additional resources. My intention is not to teach about this topic but rather to give exposure to the new feature, cover what is in the video without you needing to watch it all, as well as to connect you to information that will be helpful.
Why SQL?
StreamingFast and others would like to expand the capability of The Graph to cover not only the needs of developers making dapps but also the data science community. GraphQL is great for dapp developers because they don't need to maintain an API backend and because dapps often use fixed queries which can be well defined in advance. SQL offers speed and flexibility as well as brand new applications for the decentralized data marketplace we are building. This is also good for the economy of the market, since more applications means more possible customers.
This also comes off the back of recent advancements in the tools for SQL. The team at StreamingFast compared the development of tools for GraphQL and other popular web languages to the development of tools for SQL by comparing the teams that typically use these languages. Web focused languages like GraphQL are designed by developers for developers who love to make use and maintain IDE's and all sorts of flashy easy to use tools. Conversely SQL is mostly use by "number crunchers" finance and marketing professionals or data scientists. Because of this difference SQL has been lagging in attractive and useful tools. Data Build Tool (DBT) has emerged as a popular solution to fill this gap in recent years. The team has chosen to also go with clickhouse as a choice of SQL integration for their products but know that it is possible and desirable to include other tools in the future.
Timeline
The timeline as shown in the dev call is as follows
SQL Data Service Timeline
Define the SQL API (Prototype Done)
Integrate DBTs into Substreams (Work In Progress)
Test Deployment with Pinax and Streaming Fast (Q4 2023)
Integrate with Gateway & Billing (Q1 2024)
-from The Graph Forum
DBT and Substreams
In many ways this SQL announcement from StreamingFast is an upgrade announcement for Substreams. Integration of DBT into substreams is ongoing but a beta is available along with a tutorial linked below. DBT's philosophy is comparable to the philosophy behind substreams. From the Forum Post and Video: "This modularization facilitates data lineage visualization, showing the interconnectivity between different data models."
DBT is modular like Substreams
DBT is Open Source and has an active community especially in web2
Both DBT and Substreams can be a part of a larger data pipeline where consumers and processors of data can enjoy interoperability between many different services and programs to facilitate their work
Both DBT and Substreams are well supported with documentation and tutorials which will empower future developers
New StreamingFast Product and Backend Discussion
There was a brief mention of competetors to the graph in this section of the discussion. It was asserted that there is currently no "great real-time solution" to many of these problems from The Graph or from any competitor such as Dune. There is a need for solutions to the data processing needs of the community:
Desire real-time API access
Scalability
Time, effort, and cost to deploy infrastructure
Learning curve for Substreams is steep
Difficult to join on-chain and off-chain data
Many current solutions are hosted on a proprietary platform
The proposed solution is something they are calling "deployable units" which they hope will simplify the deployment of both the front end and back end needs while minimizing scaling and hosting complexities. The unit is a stack: Firehose and Substreams perfom the data extraction and initial processing, then SQL sync is used to transfer data into databases like Postgress or ClickHouse. DBT is involved in data transformation as an alternative to Substreams for data manipulation.
"This setup allows direct querying of the data via Graph QL, REST API, or direct SQL access, facilitating easier data consumption for analytics and front-end applications." - Pedro, Summarizing StreamingFast
Live Demo of Deployable Units
StreamingFast gave a live demo, showing a deployable unit through Substreams with SQL directly integrated. Key features include:
Database output (DB out) is designed to feed into an SQL database
can package and share the entire schema as an .spkg file
Two different examples were given showing off the versatility of the units. One example used Postgress to process Bored Ape data while the other used ClickHouse to show CryptoPunks. In both cases they showed off the SQL queries in these deployed systems and how they allow for "direct, efficient data access and manipulation".
Q&A
Viewers asked several questions to the developers which were answered live during the call. Pablo also shared the answers on the forum post
Links and Resources
StreamingFast discord for discussion and help from the team
ping me on twitter or discord if you have any feedback @athsrueas