RapidCanvas is a multi-tenant application designed to meet the needs of business analysts and data scientists by offering a comprehensive data science platform. It can be operated through a user interface (UI) and through Jupyter Notebooks, with deep notebook integration available out of the box. Users within a tenant can collaborate on projects (modifying content, exploring data, analyzing outcomes, and so on) directly from the UI. But what about collaboration within Jupyter Notebooks?
Expert data scientists often prefer coding, as it offers greater control over the platform. They rely heavily on notebooks for creating projects, experimenting with new problems, building data apps, and developing templates. When several people in a tenant work on the same problem, each in a private notebook, how can they collaborate effectively?
We explored the out-of-the-box options available with JupyterHub. We enabled collaborative mode through the c.LabApp.collaborative setting, which adds a share button at the top of the interface for creating a shareable link to the notebook server. However, because our authentication is custom and driven by the RapidCanvas auth flow, the share link didn't work as intended.
To address the collaboration issue, we implemented several features, including creating a shared space within notebooks, providing deep GitHub integration, and establishing a project publishing flow. Let's delve into the details.
We weighed two broad options: making JupyterHub's built-in collaboration work with our authentication flow, or building collaboration features of our own around the hosted notebooks. The second option was more feasible, easier, and faster to implement. The first had a steep learning curve and would have pulled our focus away from our core business and into JupyterHub's intricacies.
We created a shared space within hosted notebooks in the form of a directory, available out-of-the-box for every user. This shared directory serves as a common space for users within a given tenant. Data scientists working on a project can place all related materials in the shared directory, making them easily accessible to all users.
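A shared directory like this can be sketched in a few lines. The example below is a minimal illustration, not the RapidCanvas implementation: the on-disk layout (`<tenant_root>/shared`, `<tenant_root>/users/<name>`), the function name `ensure_shared_space`, and the use of symlinks on a POSIX filesystem are all assumptions made for the example.

```python
from pathlib import Path


def ensure_shared_space(tenant_root: Path, usernames: list[str],
                        link_name: str = "shared") -> Path:
    """Create one shared directory per tenant and link it into each user's home.

    The layout used here is hypothetical and exists only for illustration.
    """
    shared_dir = tenant_root / "shared"
    shared_dir.mkdir(parents=True, exist_ok=True)
    # Group-writable so tenant members can edit each other's files
    # (assumes all users of the tenant share a POSIX group).
    shared_dir.chmod(0o775)

    for name in usernames:
        home = tenant_root / "users" / name
        home.mkdir(parents=True, exist_ok=True)
        link = home / link_name
        # Leave the link alone if it is already there, so the call is idempotent.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(shared_dir, target_is_directory=True)
    return shared_dir
```

Calling `ensure_shared_space(Path("/srv/tenants/acme"), ["alice", "bob"])` (a made-up path) would give both users a `shared` entry in their home directory that points at the same folder, so a file one of them drops there is immediately visible to the other.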
The second enhancement was GitHub integration. To support change tracking and version control on GitHub, we integrated Git deeply into the notebook environment. After a one-time authentication setup, data scientists can run Git commands without worrying about authentication and other complexities.
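One common way to get a one-time setup of this kind is Git's built-in `store` credential helper, which reads HTTPS credentials from `~/.git-credentials`. The sketch below is an illustration under that assumption and not necessarily how RapidCanvas does it; the function name `write_git_credentials` and the use of a personal access token are hypothetical.

```python
import os
from pathlib import Path
from urllib.parse import quote

# Standard Git command that tells Git to use the `store` credential helper.
ENABLE_STORE_HELPER = ["git", "config", "--global", "credential.helper", "store"]


def write_git_credentials(home: Path, username: str, token: str,
                          host: str = "github.com") -> Path:
    """Write one HTTPS credential in the format the `store` helper reads."""
    cred_file = home / ".git-credentials"
    # Username and token are URL-encoded, one credential URL per line.
    line = f"https://{quote(username, safe='')}:{quote(token, safe='')}@{host}\n"
    # Create the file with owner-only permissions before writing the secret.
    fd = os.open(cred_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(line)
    os.chmod(cred_file, 0o600)  # also tighten a pre-existing file
    return cred_file
```

After the file is written, running `ENABLE_STORE_HELPER` once (for example with `subprocess.run`) lets later `git pull` and `git push` calls over HTTPS proceed without prompting. The helper keeps the token in plain text on disk, so this approach only suits environments where each user's home directory is private.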
The third piece grew out of an internal requirement: letting system admins publish sample or demo projects to any user in any tenant. This feature is particularly useful when onboarding new customers and sharing custom projects to help them get started.
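Conceptually, publishing amounts to copying a template project into the workspace of each target user. The following is a rough sketch of that idea only; the workspace layout (`<workspaces_root>/<tenant>/<user>/projects/`), the function name `publish_project`, and the skip-if-present rule are assumptions for illustration rather than a description of the actual flow.

```python
import shutil
from pathlib import Path


def publish_project(template_dir: Path, workspaces_root: Path,
                    targets: list[tuple[str, str]]) -> list[Path]:
    """Copy a sample project into each (tenant, user) workspace.

    Returns the destinations that were newly created. Existing copies are
    skipped so a user's own edits are never overwritten.
    """
    published = []
    for tenant, user in targets:
        dest = workspaces_root / tenant / user / "projects" / template_dir.name
        if dest.exists():
            continue  # already published to this user
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(template_dir, dest)
        published.append(dest)
    return published
```

An admin tool could call this with a list such as `[("acme", "alice"), ("globex", "bob")]` to seed a demo project across tenants during onboarding. Calling it again with the same targets is a no-op.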
Combining these solutions addressed almost all collaboration use cases on our platform. RapidCanvas now empowers data scientists and users to collaborate on complex problems, initiate discussions within their notebooks, and solve issues end-to-end efficiently.
In summary, shared directories, GitHub integration, and a project publishing flow have significantly improved collaboration within Jupyter Notebooks for the data scientists and business analysts who use RapidCanvas. Users can share, track, and manage their work more easily, and teams can tackle complex problems together in a more productive and cohesive working environment. RapidCanvas continues to prioritize user needs so that collaboration on modern data science projects stays effective and efficient.