Galaxy server for complex TDM analysis – Part 1

Published by Olha NAHORNA

General overview

Galaxy is an open-source, web-based platform for data-intensive analysis. Originally built for biomedical research, Galaxy can now be applied in other fields as well. In Visa-TM, we are looking to apply it to Text and Data Mining (TDM).

Several public Galaxy servers are available for free (UseGalaxy.eu in Europe, UseGalaxy.org in the US, UseGalaxy.org.au in Australia), and many other public and private projects are built on Galaxy (https://galaxyproject.org/use/). Anyone can also set up their own local Galaxy instance for custom purposes.

The Galaxy project has an active and growing community. Conferences, training workshops and collaboration festivals are held on a regular basis. Many tutorials and FAQs are provided for users, as well as for workshop trainers and their students. Communication channels (forum, Gitter chat, Twitter, mailing lists) are actively used by community members to exchange ideas at regional and international levels. As a result, Galaxy is cited in ~8000 publications, ~200 Galaxy platforms are running, and ~8000 tools are available in the main Tool Shed (https://toolshed.g2.bx.psu.edu/). It is also possible to add a custom tool if necessary.

Galaxy was originally designed for users who need to run complex, multistage, reproducible analyses. Its graphical interface makes this task accessible even to users without strong development skills. In the workflow engine, a chain of tools can be assembled and then executed in one click. The workflow and the analysis results can be saved for future use and shared with other Galaxy users.

Example of a simple Galaxy workflow

Galaxy in Visa-TM

A Text and Data Mining analysis involves multiple steps: data cleaning, various data manipulations, different kinds of data processing, the analysis itself and visualization of the results. With Galaxy, all these steps can be run in the same workflow. This is very useful when the same analysis must be executed many times, or when a parameter in a complex workflow needs tuning. In the latter case, it is easy to re-execute the full chain as many times as necessary, changing only one parameter of one tool between runs.
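The parameter tuning described above can be sketched in a few lines of Python. This is only an illustration: the step identifiers ("2", "3") and the parameter names (`language`, `min_frequency`) are hypothetical, and the nested mapping of step to parameter values is assumed to follow the shape Galaxy accepts for workflow parameters. The helper produces one parameter set per candidate value and leaves the baseline untouched, so each set could drive one re-execution of the same workflow.

```python
import copy


def sweep_parameter(base_params, step, name, values):
    """Return one parameter mapping per candidate value of a single parameter."""
    variants = []
    for value in values:
        params = copy.deepcopy(base_params)  # keep the baseline unchanged
        params.setdefault(step, {})[name] = value
        variants.append(params)
    return variants


# Hypothetical baseline: step "2" cleans the text, step "3" filters rare terms.
base = {"2": {"language": "fr"}, "3": {"min_frequency": 5}}
runs = sweep_parameter(base, step="3", name="min_frequency", values=[2, 5, 10])

for params in runs:
    print(params)
```

Only the swept value differs between the resulting mappings, which keeps the runs directly comparable.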

The Visa-TM team uses Galaxy mostly as a background server that any application can call to execute TDM workflows via the Galaxy API. In this machine-to-machine scenario, the workflow must be created beforehand by a TDM expert so that the external application can call it. Several applications can call the Galaxy server in parallel and retrieve the results once the analysis is finished. The results can also be sent to an external database. When the number of requests is high, Galaxy queues them and executes them as resources become available.
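As a rough illustration of such a machine-to-machine call, the sketch below builds, without sending, an HTTP request that asks Galaxy to invoke a stored workflow. It is not the Visa-TM implementation. It assumes the standard Galaxy REST endpoint `POST /api/workflows/{workflow_id}/invocations` with the API key passed in the `x-api-key` header, and a workflow with a single input dataset at step 0. The server URL and all identifiers are placeholders.

```python
import json
import urllib.request


def build_invocation_request(galaxy_url, api_key, workflow_id, history_id,
                             dataset_id, step_params=None):
    """Build, without sending, a request asking Galaxy to run a stored workflow."""
    payload = {
        "history_id": history_id,
        # Assumption: the workflow takes one input dataset at step 0.
        "inputs": {"0": {"src": "hda", "id": dataset_id}},
        "parameters": step_params or {},
    }
    return urllib.request.Request(
        url=f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders, not real identifiers.
request = build_invocation_request(
    galaxy_url="https://galaxy.example.org/",
    api_key="YOUR_API_KEY",
    workflow_id="WORKFLOW_ID",
    history_id="HISTORY_ID",
    dataset_id="DATASET_ID",
)
print(request.get_method(), request.full_url)
```

A calling application would send the request with `urllib.request.urlopen(request)` and then check back with Galaxy until the analysis is finished. Client libraries such as BioBlend wrap the same API for Python applications.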

Galaxy was used in the same way in the OpenMinTed project, so the Visa-TM team kept the same logic to stay consistent with OpenMinTed and ease integration. As a reminder, the Visa-TM project was funded to study how to deploy the OpenMinTed platform as a local instance in France.

Visa-TM Galaxy can return results to the calling application as well as store them in external storage

In summary, the Visa-TM approach uses Galaxy as an internal server that external applications can contact to execute complex analyses and receive the results.