Differences

This shows you the differences between two versions of the page.

interfacing_with_c_-_pcre_library [2012/09/26 17:56]
fjfabien created by fj fabien
interfacing_with_c_-_pcre_library [2012/10/10 12:06] (current)
fjfabien Improvement of text and code
Line 8: Line 8:
  
 As an alternative, interfacing with PCRE will show some techniques for dealing with a C library.
+There are enough primitives in the package Interfaces.C.Strings to avoid writing a wrapper in C.
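A minimal sketch of what "no wrapper in C" means, assuming a GNAT toolchain: the C function is bound directly with pragma Import, and the Ada string is handed over as a chars_ptr built by Interfaces.C.Strings. Here libc's strlen stands in for a PCRE entry point so that the example links without libpcre; the procedure name No_Wrapper_Demo is illustrative only.

<code ada>
with Ada.Text_IO;          use Ada.Text_IO;
with Interfaces.C;         use Interfaces.C;
with Interfaces.C.Strings; use Interfaces.C.Strings;

procedure No_Wrapper_Demo is
   --  Direct binding to a C function: no C glue file is compiled.
   --  libc's strlen is a stand-in for a library call such as pcre_compile.
   function C_Strlen (S : chars_ptr) return size_t;
   pragma Import (C, C_Strlen, "strlen");

   --  New_String makes a nul-terminated heap copy of the Ada string
   P : chars_ptr := New_String ("pattern");
   N : size_t;
begin
   N := C_Strlen (P);
   Free (P);  --  the copy belongs to the Ada side and must be released
   Put_Line ("strlen =" & size_t'Image (N));
end No_Wrapper_Demo;
</code>

The program prints "strlen = 7". The same pattern (Import, chars_ptr, New_String, Free) is what the binding uses for pcre_compile.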
  
  
Line 46: Line 47:
 ===== Interface of the thin binding =====

-The objective of the interface is to hide the dependency on the package Interfaces.C, and the types exposed by the interface are: Integer, String, Pcre_Type, Pcre_Extra_type, and also System.Address.
+The objective of the interface is to hide the dependency on the package Interfaces.C; the types exposed by the interface are Integer, Natural, String, Pcre_Type and Extra_type, together with the binding's own Options, Message, Table_Type and Match_Array (and also System.Address in the complete binding).

-The types Pcre and Pcre_Extra are opaque pointers and should not be accessible outside the interface, so they are made private.
-No operation on the components of pcre_extra is necessary, so pcre and pcre_extra are implemented as System.Address.
+The types Pcre and Extra are opaque pointers and should not be accessible outside the interface, so they are made private.
+No operation on the components of pcre_extra is necessary, so pcre and pcre_extra are simply declared as System.Address.

-In GNAT, a string is implemented with its bounds followed by the content of the string, and we must pass
-to the C code a pointer to char. To avoid the function Interfaces.C.Strings.New_String, which makes a new copy of the
-data (when the data weigh 50 MB, that is a burden), the trick is to point to the first element of the Ada String,
-that is, to give the address of the first element of the string.
+The complete cycle in PCRE is compile/study/exec, whereas Gnat.Regex has 2 phases (compile/match). The study phase is an optimization of the pattern that outputs an object of type Extra. Here we bypass the study phase.

-The Ada interface for PCRE replicates the specifications of Gnat.Regex: 2 phases (compile/match) instead of 3 phases in pcre.h (compile/study/exec).
+Compile allocates and returns a pointer to the compiled pattern, which is null if an error occurred. In that case, an error message is available, as well as the position of the error.
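Once Compile has filled Error_Msg, Last_Msg and Error_Offset, a caller might report the failure with a caret under the offending position. The helper Show_Error below is hypothetical (it is not part of the binding), and the message and offset passed to it are illustrative values, not output captured from a PCRE run.

<code ada>
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Error_Demo is
   --  Hypothetical helper: print a compile error with a caret under
   --  the zero-based offset reported by pcre_compile.
   procedure Show_Error
     (Pattern      : String;
      Message      : String;
      Error_Offset : Natural)
   is
   begin
      Put_Line ("error: " & Message);
      Put_Line ("  " & Pattern);
      --  Error_Offset spaces, then the caret
      Put_Line ("  " & String'(1 .. Error_Offset => ' ') & "^");
   end Show_Error;
begin
   --  Illustrative values: an unclosed group, error reported at the end
   Show_Error ("([A-Za-z]+", "missing )", 10);
end Show_Error_Demo;
</code>

In real code the two middle arguments would be Msg (1 .. Last_Msg) and the Error_Offset returned by Compile.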
  
-Some Ada extras:
+Free is used to deallocate the compiled pattern.

-An exception if something goes wrong, and Null values to check the validity of any opaque pointer (this could be replaced by a function Is_Valid).
+Match takes as inputs the compiled pattern and the Ada subject string to parse.
+The parameter Length (the number of characters of the subject to scan) is needed in case of a partial scan.

+The procedure Match outputs a return code (Result) that is negative if there is no match or an error.
+For a zero or positive return code, Match_Vec holds the same offsets as the ovector filled in by the C library.
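To turn a successful Match into Ada slices, each pair (Match_Vec (2*G), Match_Vec (2*G+1)) holds the zero-based start and one-past-the-end offsets of group G, as in PCRE's ovector, and a positive Result is one more than the highest group set. The sketch below hard-codes the values such a call should report for the pattern "([A-Z])([0-9])" on "Z2345A789B" from offset 1; they were worked out by hand, not captured from a run, and Match_Array is redeclared locally so the example compiles without the binding.

<code ada>
with Ada.Text_IO; use Ada.Text_IO;

procedure Slices_Demo is
   --  Same declaration as Pcre.Match_Array, repeated so the sketch compiles alone
   type Match_Array is array (Natural range <>) of Natural;

   Subject : constant String := "Z2345A789B";

   --  Hand-computed values for a match of "A7" (groups "A" and "7")
   Result : constant Integer := 3;
   Vec    : constant Match_Array (0 .. 8) := (5, 7, 5, 6, 6, 7, others => 0);
begin
   if Result < 0 then
      Put_Line ("no match or error, code" & Integer'Image (Result));
   elsif Result = 0 then
      Put_Line ("Match_Array too small");
   else
      for Group in 0 .. Result - 1 loop
         declare
            --  Convert zero-based C offsets to Ada indexes of Subject
            First : constant Positive := Subject'First + Vec (2 * Group);
            Last  : constant Natural  := Subject'First + Vec (2 * Group + 1) - 1;
         begin
            Put_Line ("group" & Integer'Image (Group) & ": " & Subject (First .. Last));
         end;
      end loop;
   end if;
end Slices_Demo;
</code>

Adding Subject'First rather than 1 keeps the conversion correct for strings whose lower bound is not 1.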
  
 === pcre.ads ===

 <code ada>
-with System; use System;
+-----------------------------------------------------------------------
+--  Interface to PCRE
+-----------------------------------------------------------------------
+with System;
+with Interfaces;

 package Pcre is

-   Pcre_Error : exception;
+   type Options is new Interfaces.Unsigned_32;
+
+   PCRE_CASELESS          : constant Options := 16#00000001#;  -- compile option

    type Pcre_Type is private;
-   type Pcre_Extra_type is private;
+   type Extra_type is private;

-   Null_Pcre       : constant Pcre_Type;
-   Null_Pcre_Extra : constant Pcre_Extra_type;
+   Null_Pcre  : constant Pcre_Type;
+   Null_Extra : constant Extra_type;

-   procedure Compile
-     (Pattern       : in String;
-      Options       : in Integer;
-      Matcher       : out Pcre_Type;
-      Matcher_Extra : out Pcre_Extra_type);
+   type Table_Type is private;
+   Null_Table : constant Table_Type;

-   procedure Match
-     (Matcher             : in Pcre_Type;
-      Matcher_Extra       : in Pcre_Extra_type;
-      Subject             : System.Address;
-      -- Address of the first element of the string to be searched;
-      Length, Startoffset : in Integer;
-      Options             : in Integer;
-      Match_0, Match_1    : out Integer;
-      Result              : out Integer);
+
+   --  Buffer for the error message; 80 characters should normally be enough
+   subtype Message is String (1 .. 80);
+
+   procedure Compile
+     (Matcher      : out Pcre_Type;
+      Pattern      : in String;
+      Option       : in Options;
+      Error_Msg    : out Message;
+      Last_Msg     : out Natural;
+      Error_Offset : out Integer;
+      Table        : in Table_Type := Null_Table);

    procedure Free (M : Pcre_Type);

-   procedure Free (M : Pcre_Extra_type);
+   -----------------
+   -- Match_Array --
+   -----------------
+   --  Result of a match: the same offsets as the ovector of PCRE.
+   --  Its length must be 3 * (number of capturing parentheses + 1):
+   --    no parentheses : range 0 .. 2
+   --    N parentheses  : range 0 .. 3 * (N + 1) - 1
+   --  If Match_Array is too small, the Result of Match is 0.
+   --
+   type Match_Array is array (Natural range <>) of Natural;
+
+   procedure Match
+     (Result              : out Integer;
+      Match_Vec           : out Match_Array;
+      Matcher             : in Pcre_Type;
+      Extra               : in Extra_type;
+      Subject             : in String;
+      Length, Startoffset : in Integer;
+      Option              : in Options := 0);

 private

    type Pcre_Type is new System.Address;
-   type Pcre_Extra_type is new System.Address;
+   type Extra_type is new System.Address;
+
+   Null_Pcre  : constant Pcre_Type  := Pcre_Type (System.Null_Address);
+   Null_Extra : constant Extra_type := Extra_type (System.Null_Address);

-   Null_Pcre       : constant Pcre_Type       := Pcre_Type (Null_Address);
-   Null_Pcre_Extra : constant Pcre_Extra_type :=
-      Pcre_Extra_type (Null_Address);
+   type Table_Type is new System.Address;
+   Null_Table : constant Table_Type := Table_Type (System.Null_Address);

 end Pcre;
 </code>
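The sizing rule in the Match_Array comment can be captured in a small helper. Last_Index is a hypothetical function, not part of the package. Per the PCRE documentation, pcre_exec uses the first two-thirds of the vector for offset pairs and the last third as workspace, hence the factor 3.

<code ada>
with Ada.Text_IO; use Ada.Text_IO;

procedure Vector_Size_Demo is
   --  Same declaration as Pcre.Match_Array, repeated so the sketch compiles alone
   type Match_Array is array (Natural range <>) of Natural;

   --  Hypothetical helper: last index needed for a pattern with Groups
   --  capturing parentheses (group 0, the whole match, is always present)
   function Last_Index (Groups : Natural) return Natural is
   begin
      return 3 * (Groups + 1) - 1;
   end Last_Index;

   --  A pattern such as "([A-Za-z]+)" has one group, hence 6 slots: 0 .. 5
   Vec : constant Match_Array (0 .. Last_Index (1)) := (others => 0);
begin
   Put_Line ("Vec'Length =" & Natural'Image (Vec'Length));
end Vector_Size_Demo;
</code>

A Match_Array of only 0 .. 2 still receives the whole-match pair for a one-group pattern, but Match then returns 0 instead of a positive count.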
- 
  
  
 ===== Implementation of the thin binding =====

+In C, a string is implemented as a pointer to char, terminated by a nul.
+With GNAT, an Ada string is implemented with its 2 bounds first, followed by the content of the string.
+The package Interfaces.C.Strings provides:
+<code ada>
+   function New_String (Str : String) return chars_ptr;
+</code>

-The procedure Compile combines pcre_compile and pcre_study with sanity checks. No big deal.
+This function allocates a new copy of the data and appends a terminating nul. The data are therefore duplicated, which can be a burden when they weigh 50 MB.

-The procedure Match deals with the return of a vector from the C code.
-Ada allocates this vector, which is used by the C code, so a pragma Convention (C) is required, as well as a pragma Volatile so that the Ada compiler does not interfere with or optimize it.
+Also, to avoid a memory leak, this copy must be freed after use (with Interfaces.C.Strings.Free).

-The 2 procedures Free are used for deallocation.
-The whole package has been tested for memory leaks with Valgrind and does not leak.
+The procedure Match deals with two issues:

-For the sake of simplicity, no error handling is done in Compile; it is left to the reader.
+  1/ passing the content of an Ada string by reference.
+Due to the difference between the Ada string and the C string, the trick is to point to the first element of the Ada String. In this case, there is no terminating nul, but as we pass the length of
+the data, this is no trouble.
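A minimal sketch of the first-element trick, assuming GNAT, where chars_ptr and System.Address have the same size (the binding relies on the same unchecked conversion in To_chars_ptr). No copy is made: the pointer designates the Ada characters in place, and Interfaces.C.Strings.Value with an explicit Length reads exactly that many characters, so no terminating nul is needed. The string and its bounds are illustrative.

<code ada>
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with System;
with Interfaces.C.Strings;     use Interfaces.C.Strings;

procedure First_Element_Demo is
   --  Reinterpret an address as a C char pointer (same size with GNAT)
   function To_Chars_Ptr is
     new Ada.Unchecked_Conversion (System.Address, chars_ptr);

   --  Bounds need not start at 1; the C side only sees the characters
   Data : constant String (10 .. 20) := "Hello, PCRE";

   Whole : constant chars_ptr := To_Chars_Ptr (Data (Data'First)'Address);
   Tail  : constant chars_ptr := To_Chars_Ptr (Data (17)'Address);
begin
   --  No nul terminator exists; the explicit length bounds the read
   Put_Line (Value (Whole, Data'Length));
   Put_Line (Value (Tail, 4));
end First_Element_Demo;
</code>

This is why Match passes Length to pcre_exec: the C side never looks for a terminator.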
 + 
+  2/ getting back a vector from the C code.
+Ada allocates this vector, which is then filled in by the C code.
+Therefore a pragma Convention (C) is required for the vector, as well as a pragma Volatile, so that the Ada compiler does not interfere with or optimize away the accesses to it.
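The second point can be tried without libpcre. In this sketch libc's memset plays the role of pcre_exec: a C routine writes through an address into an array allocated on the Ada side and declared with the same two pragmas as m in the body of Match. The array size and fill byte are illustrative assumptions.

<code ada>
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;
with System;

procedure C_Fills_Ada_Buffer is
   --  libc's memset stands in for pcre_exec: C code writing into Ada storage
   function C_Memset
     (S : System.Address; Value : int; N : size_t) return System.Address;
   pragma Import (C, C_Memset, "memset");

   M : array (0 .. 2) of int := (others => 0);
   pragma Convention (C, M);  --  plain contiguous C ints
   pragma Volatile (M);       --  modified outside the compiler's view

   Ignored : System.Address;
begin
   --  Set every byte to 16#FF#: each two's-complement int then reads -1
   Ignored := C_Memset (M (0)'Address, 16#FF#, M'Size / System.Storage_Unit);
   for I in M'Range loop
      Put_Line ("element" & Integer'Image (I) & " = " & int'Image (M (I)));
   end loop;
end C_Fills_Ada_Buffer;
</code>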
 + 
+The whole package has been tested for memory leaks with Valgrind and does not leak.

 === pcre.adb ===
Line 130: Line 168:
 with Interfaces.C;             use Interfaces.C;
 with Ada.Unchecked_Conversion;
+with System;                   use System;

 package body Pcre is

    pragma Linker_Options ("-lpcre");
-   pragma Assert (int'Size = Integer'Size); -- always true with Gnat

    use Interfaces;
Line 144: Line 182:
    function Pcre_Compile
      (pattern   : chars_ptr;
-      options   : Integer;
+      option    : Options;
       errptr    : access chars_ptr;
       erroffset : access Integer;
-      tableptr  : chars_ptr)
+      tableptr  : Table_Type)
       return      Pcre_Type;
    pragma Import (C, Pcre_Compile, "pcre_compile");
-
-   function Pcre_Study
-     (code    : Pcre_Type;
-      options : Integer;
-      errptr  : access chars_ptr)
-      return    Pcre_Extra_type;
-   pragma Import (C, Pcre_Study, "pcre_study");

    function Pcre_Exec
      (code        : Pcre_Type;
-      extra       : Pcre_Extra_type;
+      extra       : Extra_type;
       subject     : chars_ptr;
       length      : Integer;
       startoffset : Integer;
-      options     : Integer;
+      option      : Options;
       ovector     : System.Address;
-      ovecsize    : C.int)
+      ovecsize    : Integer)
       return        Integer;
    pragma Import (C, Pcre_Exec, "pcre_exec");

    procedure Compile
-     (Pattern       : in String;
-      Options       : in Integer;
-      Matcher       : out Pcre_Type;
-      Matcher_Extra : out Pcre_Extra_type)
+     (Matcher      : out Pcre_Type;
+      Pattern      : in String;
+      Option       : in Options;
+      Error_Msg    : out Message;
+      Last_Msg     : out Natural;
+      Error_Offset : out Integer;
+      Table        : in Table_Type := Null_Table)
    is
-      Regexp       : Pcre_Type;
-      Regexp_Extra : Pcre_Extra_type;
-      Error_Ptr    : aliased chars_ptr;
-      Error_Offset : aliased Integer;
-      Pat          : chars_ptr := New_String (Pattern);
+      Error_Ptr : aliased chars_ptr;
+      ErrOffset : aliased Integer;
+      Pat       : chars_ptr := New_String (Pattern);
    begin
-      Regexp :=
+      Matcher :=
          Pcre_Compile
            (Pat,
-            Options,
+            Option,
             Error_Ptr'Access,
-            Error_Offset'Access,
-            Null_Ptr);
+            ErrOffset'Access,
+            Table);
       Free (Pat);

-      if Regexp = Null_Pcre then
-         raise Pcre_Error;
+      if Matcher = Null_Pcre then
+         --  Copy the message, truncated to the length of Error_Msg if needed
+         Last_Msg := Natural'Min (Natural (Strlen (Error_Ptr)), Error_Msg'Length);
+         Error_Msg (1 .. Last_Msg) := Value (Error_Ptr, size_t (Last_Msg));
+         Error_Offset := ErrOffset;
+      else
+         Last_Msg     := 0;
+         Error_Offset := 0;
       end if;
-      Matcher      := Regexp;
-      Regexp_Extra := Pcre_Study (Regexp, 0, Error_Ptr'Access);
-      if Regexp_Extra = Null_Pcre_Extra then
-         raise Pcre_Error;
-      end if;
-      Matcher_Extra := Regexp_Extra;
    end Compile;
+

    procedure Match
-     (Matcher             : in Pcre_Type;
-      Matcher_Extra       : in Pcre_Extra_type;
-      Subject             : System.Address;
-      -- Address of the first element of a string;
+     (Result              : out Integer;
+      Match_Vec           : out Match_Array;
+      Matcher             : in Pcre_Type;
+      Extra               : in Extra_type;
+      Subject             : in String;
       Length, Startoffset : in Integer;
-      Options             : in Integer;
-      Match_0, Match_1    : out Integer;
-      Result              : out Integer)
+      Option              : in Options := 0)
    is
-      Vecsize : constant := 3; -- top-level matching
-
-      m : array (0 .. Vecsize - 1) of C.int;
+      Match_Size : constant Natural := Match_Vec'Length;
+      m          : array (0 .. Match_Size - 1) of C.int := (others => 0);
       pragma Convention (C, m);
       pragma Volatile (m); -- used by the C library

-      Start  : constant chars_ptr :=
-         To_chars_ptr (Subject);
+      Start : constant chars_ptr :=
+         To_chars_ptr (Subject (Subject'First)'Address);
    begin

-      Result  :=
+      Result :=
          Pcre_Exec
            (Matcher,
-            Matcher_Extra,
+            Extra,
             Start,
             Length,
             Startoffset,
-            Options,
+            Option,
             m (0)'Address,
-            C.int (Vecsize));
-      Match_0 := Integer (m (0));
-      Match_1 := Integer (m (1));
+            Match_Size);
+      --  Copy the C vector back; Match_Vec may have any lower bound.
+      --  PCRE marks unset groups with -1, stored as 0 in the Natural Match_Array.
+      for I in 0 .. Match_Size - 1 loop
+         if m (I) > 0 then
+            Match_Vec (Match_Vec'First + I) := Natural (m (I));
+         else
+            Match_Vec (Match_Vec'First + I) := 0;
+         end if;
+      end loop;
    end Match;
  
Line 242: Line 276:

    procedure Free (M : Pcre_Type) is
-   begin
-      Pcre_Free (System.Address (M));
-   end Free;
-
-   procedure Free (M : Pcre_Extra_type) is
    begin
       Pcre_Free (System.Address (M));
Line 258: Line 287:
 ===== Test of Pcre binding =====

-A simple program: compiling a pattern and showing positions in the subject string.
+Example adapted from the Regex examples on the Rosetta Code site (rosettacode.org).

-=== test_pcre.adb ===
+=== test_0.adb ===
 <code ada>
 --
--- A simple test to show the values of m0 & m1
+-- Basic test: splitting a sentence into words
 --
-with Text_IO; use Text_IO;
-with Pcre;    use Pcre;
+with Ada.Text_IO; use Ada.Text_IO;
+with Pcre;        use Pcre;

-procedure Test_Pcre is
+procedure Test_0 is

-   Regexp          : Pcre_Type;
-   Regexp_Extra    : Pcre_Extra_type;
-   Retcode         : Integer;
-   Position, Count : Integer         := 0;
-   m0, m1          : Integer;
-   Subject         : constant String := "Z2345A789B123456789AA";
+   procedure Search_For_Pattern
+     (Compiled_Expression : in Pcre.Pcre_Type;
+      Search_In           : in String;
+      Offset              : in Natural;
+      First, Last         : out Positive;
+      Found               : out Boolean)
+   is
+      --  One capturing group: 3 * (1 + 1) slots
+      Result  : Match_Array (0 .. 5);
+      Retcode : Integer;
+   begin
+      Match
+        (Retcode,
+         Result,
+         Compiled_Expression,
+         Null_Extra,
+         Search_In,
+         Search_In'Length,
+         Offset);
+
+      if Retcode < 0 then
+         Found := False;
+      else
+         Found := True;
+         --  Convert zero-based C offsets to Ada indexes of Search_In
+         First := Search_In'First + Result (0);
+         Last  := Search_In'First + Result (1) - 1;
+      end if;
+   end Search_For_Pattern;
+
+   Word_Pattern : constant String := "([A-Za-z]+)";
+
+   Subject          : constant String := ";-)I love PATTERN matching!";
+   Current_Offset   : Natural         := 0;
+   First, Last      : Positive;
+   Found            : Boolean;
+   Regexp           : Pcre_Type;
+   Msg              : Message;
+   Last_Msg, ErrPos : Natural         := 0;

 begin
-   Compile
-     (Pattern       => "[A-Z][0-9]",
-      Options       => 0,
-      Matcher       => Regexp,
-      Matcher_Extra => Regexp_Extra);
+   Compile (Regexp, Word_Pattern, 0, Msg, Last_Msg, ErrPos);
+   if Regexp = Null_Pcre then
+      Put_Line ("Error at offset" & Natural'Image (ErrPos) & ": " & Msg (1 .. Last_Msg));
+      return;
+   end if;

+   -- Find all the words in the Subject string
    loop
-      Match
+      Search_For_Pattern
         (Regexp,
-         Regexp_Extra,
-         Subject (1)'Address,
-         Subject'Length,
-         Position,
-         0,
-         m0,
-         m1,
-         Retcode);
-      exit when Retcode < 0;
-      Put_Line
-        ("m0:=" &
-         Integer'Image (m0) &
-         " m1:=" &
-         Integer'Image (m1) &
-         " character => " &
-         Subject (m1));
-      Count    := Count + 1;
-      Position := m1;
+         Subject,
+         Current_Offset,
+         First,
+         Last,
+         Found);
+      exit when not Found;
+      Put_Line ("<" & Subject (First .. Last) & ">");
+      --  Back to a zero-based offset, just past the word found
+      Current_Offset := Last - Subject'First + 1;
    end loop;
-   Put_Line ("Count is" & Integer'Image (Count));

    Free (Regexp);
-   Free (Regexp_Extra);
-end Test_Pcre;
+end Test_0;
 </code>
  
 Output :
 <code>
-m0:= 0 m1:= 2 character => 2
-m0:= 5 m1:= 7 character => 7
-m0:= 9 m1:= 11 character => 1
-Count is 3
+<I>
+<love>
+<PATTERN>
+<matching>
 </code>
 +
+===== Complete code of the binding =====
+
+The complete code of the binding and some examples can be downloaded from [[http://sourceforge.net/projects/lorenz/files/|sourceforge.net]].
  
