NLP_tools toolbox

The NLP-tools library (built on top of …) provides components that let you run basic lexical, morphological and syntactic processing on text corpora.

The available components perform the following tasks:

  • Stemming [engine = stemmer], in French and English = reduction of each word to its stem.
       Ex: "dissection"    ->    "dissect"


  • Part-of-speech tagging [engine = postagger] with lemmatization, in French and English = assignment of a syntactic category to each word and production of its lemma.
      Ex: disséquera -> {"orth": "disséquera", "pos": "V", "lemma": "disséquer"}


  • Noun chunking, in English:
    • [engine = npchunker]  classic extraction of noun phrases (constituency analysis)
    • [engine = npchunkerdp]  extraction of noun phrases based on dependency analysis
      Ex:   "fleur_bleue", "muscle_strié_cardiaque",
             "production_intérieur_de_gaz_de_la_Russie"
  • Term recognition [engine = termmatcher] on the MX2015 resource = tagging of the terms of the resource found in the text, marked with the MX_ prefix.
      Ex (input):  "Non-local effects by homogenization or 3D–1D dimension reduction in elastic materials reinforced by stiff fibers.We first consider an elastic thin heterogeneous cylinder of radius of order ε: the interior of the cylinder is occupied by a stiff material (fiber) that is surrounded by a soft material"
      Ex (output): "Non-MX_local_effects by MX_homogenization or 3D–1D MX_dimension_reduction in MX_elastic_materials reinforced by stiff MX_fibers .We first consider an elastic thin heterogeneous MX_cylinder of MX_radius of MX_order ε: the interior of the MX_cylinder is occupied by a stiff MX_material (MX_fiber ) that is surrounded by a MX_soft_material"
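
As an illustration, the annotated text returned by the term matcher can be post-processed to list the recognized terms. The sketch below assumes that every recognized term carries the MX_ prefix shown in the example above and that the words of a multi-word term are joined with underscores. The function name extract_terms is hypothetical and not part of NLP_tools.

```python
import re

# Assumption: recognized terms are prefixed with "MX_" and multi-word
# terms are joined with underscores, as in the example output above.
TERM_PATTERN = re.compile(r"MX_(\w+)")


def extract_terms(annotated_text):
    """Return the recognized terms in order of appearance, without the prefix."""
    return [term.replace("_", " ") for term in TERM_PATTERN.findall(annotated_text)]


sample = "Non-MX_local_effects by MX_homogenization in MX_elastic_materials"
print(extract_terms(sample))
```

Duplicates are kept on purpose, so the list can also be used to count how often each term occurs.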

Depending on the output parameter and on the kind of processing, the result is:

  • the text resulting from the transformation of the original text (output=doc)
  • a more complete JSON structure (output=json) that contains all the metadata produced by the analysis.
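
To give an idea of how a JSON result might be consumed, here is a minimal sketch. It assumes that the result is a list of token records with the orth, pos and lemma keys shown in the part-of-speech example above; the structure actually returned by the service may differ or contain more fields. The helper lemmas_by_pos is hypothetical.

```python
import json

# Hypothetical JSON result, modelled on the part-of-speech example above;
# the real structure returned by the service may contain more fields.
raw = '[{"orth": "disséquera", "pos": "V", "lemma": "disséquer"}]'

tokens = json.loads(raw)


def lemmas_by_pos(tokens, pos):
    """Collect the lemmas of all tokens carrying the given part-of-speech tag."""
    return [token["lemma"] for token in tokens if token.get("pos") == pos]


print(lemmas_by_pos(tokens, "V"))
```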

Web service URLs: /v1/{langue}/{engine}/analyze?output={val}
    • {langue}                       language to analyze            [en, fr]
    • {engine}                       name of the processing pipeline to apply:
                     English:            [stemmer, postagger, termmatcher, npchunker, npchunkerdp]
                     French:             [stemmer, postagger]
    • parameters:
      output={val}                   format of the result           [doc, json]
      doc = the result is reinserted into the document
      json = the result of the analysis in JSON format

      List of routes

      Task                                        French                      English                        engine
      Stemming                                    /v1/fr/stemmer/analyze      /v1/en/stemmer/analyze         stemmer
      Part-of-speech tagging                      /v1/fr/postagger/analyze    /v1/en/postagger/analyze       postagger
      Controlled-term recognition                 -                           /v1/en/termmatcher/analyze     termmatcher
      Noun chunking                               -                           /v1/en/npchunker/analyze       npchunker
      Noun chunking from a dependency analysis    -                           /v1/en/npchunkerdp/analyze     npchunkerdp
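
      The routes above all follow the same pattern, so a request URL can be assembled programmatically. The sketch below is only an illustration: the function build_url and the host "http://localhost:8080" are placeholders, since the actual address of the service is not given here.

```python
from urllib.parse import urlencode

# Engines per language, as listed in the routes above.
ENGINES = {
    "en": {"stemmer", "postagger", "termmatcher", "npchunker", "npchunkerdp"},
    "fr": {"stemmer", "postagger"},
}
OUTPUTS = {"doc", "json"}


def build_url(base_url, langue, engine, output="doc"):
    """Assemble /v1/{langue}/{engine}/analyze?output={val} under base_url."""
    if engine not in ENGINES.get(langue, set()):
        raise ValueError(f"engine {engine!r} is not available for {langue!r}")
    if output not in OUTPUTS:
        raise ValueError(f"unknown output format {output!r}")
    path = f"/v1/{langue}/{engine}/analyze"
    query = urlencode({"output": output})
    return base_url.rstrip("/") + path + "?" + query


# "http://localhost:8080" is a placeholder host, not the real service address.
print(build_url("http://localhost:8080", "en", "npchunker", "doc"))
```

      Rejecting an unsupported language and engine pair before sending the request avoids a round trip that would only end in an error.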

      Return codes

      • 200 if OK
      • 404 if the service was not reached
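
      A client would typically check these codes when calling the service. The following sketch uses only the Python standard library; it assumes that the documents are sent as a JSON body with a Content-Type header, which is not confirmed by this documentation, and the names build_request and send are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_request(url, documents):
    """Prepare a POST request carrying the documents as a JSON body.

    Assumption: the service accepts a JSON body; adapt if it expects
    another serialization.
    """
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Return the response body on success; raise a readable error on 404."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        if error.code == 404:
            raise RuntimeError("404: service not reached, check the route") from error
        raise
```

      Other HTTP errors and connection failures are re-raised unchanged, so the caller still sees the original exception.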

      Linguistic analysis with spaCy

      For a better understanding of the analysis formats and of the mechanisms involved in nlptools, refer to the spaCy documentation.


      Worked example

      Input format: each document is sent as a record with an identifier ("idt") and the text to analyze ("value").

      Example query to the English chunker, doc output:

      • route:  /v1/en/npchunker/analyze
      • output format:  output=doc

      These settings build the URL passed to the request:

      cat <<EOF | curl --proxy "" -X POST --data-binary @- ""
      "idt":"08-0245642","value":"Random walk of passive tracers among randomly moving obstacles. Background: This study is mainly motivated by the need of understanding how the diffusion behaviour of a biomolecule (or even of a larger object) is affected by other moving macromolecules, organelles, and so on, inside a living cell, whence the possibility of understanding whether or not a randomly walking biomolecule is also subject to a long-range force field driving it to its target. Method: By means of the Continuous Time Random Walk (CTRW) technique the topic of random walk in random environment is here considered in the case of a passively diffusing particle in a crowded environment made of randomly moving and interacting obstacles. Results: The relevant physical quantity which is worked out is the diffusion cofficient of the passive tracer which is computed as a function of the average inter-obstacles distance. Coclusions: The results reported here suggest that if a biomolecule, let us call it a test molecule, moves towards its target in the presence of other independently interacting molecules, its motion can be considerably slowed down. Hence, if such a slowing down could compromise the efficiency of the task to be performed by the test molecule, some accelerating factor would be required. Intermolecular electrodynamic forces are good candidates as accelerating factors because they can act at a long distance in a medium like the cytosol despite its ionic strength."
      "idt":"08-040289","value":"Planck 2015 results. XIII. Cosmological parameters.We present results based on full-mission Planck observations of temperature and polarization anisotropies of the CMB. These data are consistent with the six-parameter inflationary LCDM cosmology. From the Planck temperature and lensing data, for this cosmology we find a Hubble constant, H0= (67.8 +/- 0.9) km/s/Mpc, a matter density parameter Omega_m = 0.308 +/- 0.012 and a scalar spectral index with n_s = 0.968 +/- 0.006. (We quote 68% errors on measured parameters and 95% limits on other parameters.) Combined with Planck temperature and lensing data, Planck LFI polarization measurements lead to a reionization optical depth of tau = 0.066 +/- 0.016. Combining Planck with other astrophysical data we find N_ eff = 3.15 +/- 0.23 for the effective number of relativistic degrees of freedom and the sum of neutrino masses is constrained to < 0.23 eV. Spatial curvature is found to be |Omega_K| < 0.005. For LCDM we find a limit on the tensor-to-scalar ratio of r <0.11 consistent with the B-mode constraints from an analysis of BICEP2, Keck Array, and Planck (BKP) data. Adding the BKP data leads to a tighter constraint of r < 0.09. We find no evidence for isocurvature perturbations or cosmic defects. The equation of state of dark energy is constrained to w = -1.006 +/- 0.045. Standard big bang nucleosynthesis predictions for the Planck LCDM cosmology are in excellent agreement with observations. We investigate annihilating dark matter and deviations from standard recombination, finding no evidence for new physics. The Planck results for base LCDM are in agreement with BAO data and with the JLA SNe sample. However the amplitude of the fluctuations is found to be higher than inferred from rich cluster counts and weak gravitational lensing. Apart from these tensions, the base LCDM cosmology provides an excellent description of the Planck CMB observations and many other astrophysical data sets."
      Result:
      "idt": "08-0245642",
      "value": "random_walk passive_tracer move_obstacle diffusion_behaviour live_cell walking_biomolecule range_force_field continuous_time_random time_random_walk random_walk random_environment diffuse_particle crowded_environment interact_obstacle relevant_physical_quantity diffusion_cofficient passive_tracer test_molecule interact_molecule test_molecule accelerate_factor intermolecular_electrodynamic_force good_candidate accelerate_factor long_distance ionic_strength"
      "idt": "08-040289",
      "value": "cosmological_parameter mission_planck_observation observation_of_temperature polarization_anisotropy inflationary_lcdm_cosmology planck_temperature lense_datum matter_density_parameter scalar_spectral_index measured_parameter planck_temperature lense_datum planck_lfi_polarization lfi_polarization_measurement optical_depth depth_of_tau combine_planck effective_number relativistic_degree degree_of_freedom neutrino_masse spatial_curvature scalar_ratio ratio_of_r mode_constraint keck_array bkp_datum constraint_of_r evidence_for_isocurvature isocurvature_perturbation cosmic_defect equation_of_state dark_energy standard_big_bang big_bang_nucleosynthesis bang_nucleosynthesis_prediction planck_lcdm_cosmology excellent_agreement agreement_with_observation annihilate_dark_matter standard_recombination new_physics planck_result result_for_base base_lcdm agreement_with_bao bao_datum rich_cluster_count base_lcdm_cosmology excellent_description planck_cmb_observation astrophysical_datum_set"
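
      A doc-style result like the one above lends itself to simple frequency counts. The sketch below assumes that chunks are separated by whitespace and that the words of a chunk are joined with underscores, as in the result shown; chunk_counts is a hypothetical helper, not part of the service.

```python
from collections import Counter


def chunk_counts(value):
    """Count the noun chunks in a doc-style "value" string."""
    # Assumption: chunks are separated by whitespace and the words of a
    # chunk are joined with underscores, as in the result above.
    return Counter(value.split())


excerpt = "random_walk passive_tracer move_obstacle random_walk passive_tracer"
counts = chunk_counts(excerpt)
print(counts.most_common(2))
```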