Differences

This shows you the differences between two versions of the page.

good_text_i_o_practice [2013/01/09 09:14]
sparre appended a bit to the explanations
good_text_i_o_practice [2013/01/16 08:08] (current)
sparre [POSIX.Memory_Mapping] It is not an _exact_ solution. ;-)
Line 3:
 The Ada standard library provides a general-purpose text I/O package named [[http://www.ada-auth.org/standards/12rm/html/RM-A-10.html|Ada.Text_IO]].  This is good for small problems where efficiency is not an issue, and its parsing and formatting routines are excellent.  One of the challenges for the compiler writers is that Ada.Text_IO is required to keep track of column, line and page numbers.  Another challenge in the Ada.Text_IO design is that it requires "many" system calls or copying of the data being read/written from/to a file.
  
-Test runs of the examples at <http://​rosettacode.org/​wiki/​Read_entire_file#​Ada>​ on three different text files (respectively 1 Mb, 32 Mb and 1024 Mb in size):+Below are some patterns for good text I/O practice. ​ The tasks they solve are derived from [[http://​rosettacode.org/​|Rosetta Code]].
  
-Using Unbounded_Strings:​ +===== Read entire file =====
-------------------------+
  
-<​code>​ +[[http://rosettacode.org/wiki/​Read_entire_file|Task description]].
-1024 kb +
-./using_unbounded_strings > copy  0,02s user 0,27s system 16% cpu 1,836 total +
-./using_unbounded_strings > copy  0,02s user 0,01s system 91% cpu 0,031 total +
-./using_unbounded_strings > copy  0,02s user 0,02s system 96% cpu 0,046 total+
  
-32 Mb +==== Ada.Direct_IO + Ada.Directories ====
-./​using_unbounded_strings > copy  0,68s user 0,41s system 38% cpu 2,809 total +
-./​using_unbounded_strings > copy  0,70s user 0,40s system 99% cpu 1,100 total +
-./​using_unbounded_strings > copy  0,71s user 0,34s system 99% cpu 1,067 total+
  
-1024 Mb +Using Ada.Directories to first ask for the file size and then Ada.Direct_IO to read the whole file in one chunk:
-./​using_unbounded_strings > copy  20,99s user 11,72s system 96% cpu 33,861 total +
-./​using_unbounded_strings > copy  21,07s user 11,83s system 90% cpu 36,170 total +
-./​using_unbounded_strings > copy  21,28s user 12,15s system 73% cpu 45,767 total +
-</​code>​+
  
-Using Direct_IO: +<code Ada>with Ada.Directories,​ 
-----------------+     Ada.Direct_IO, 
 +     Ada.Text_IO;​
  
-<​code>​ +procedure Read_Entire_File is 
-1024 kb +   File_Name : String ​ := "​read_entire_file.adb";​ 
-./​using_direct_io > copy  0,00s user 0,00s system 68% cpu 0,006 total +   File_Size : Natural := Natural (Ada.Directories.Size (File_Name));​ 
-./​using_direct_io > copy  0,01s user 0,01s system 84% cpu 0,019 total +   subtype File_String ​   is String (1 .. File_Size); 
-./​using_direct_io > copy  0,00s user 0,00s system 64% cpu 0,012 total+   package File_String_IO is new Ada.Direct_IO (File_String);​
  
-32 Mb +   ​File ​    : File_String_IO.File_Type;​ 
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +   Contents : File_String;
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +begin
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +   File_String_IO.Open  (File, Mode => File_String_IO.In_File,
 +                               Name => File_Name); 
 +   File_String_IO.Read  ​(File, Item => Contents); 
 +   File_String_IO.Close ​(File);
  
-1024 Mb +   ​Ada.Text_IO.Put ​(Contents); 
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +end Read_Entire_File;​</​code>​
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-</​code>​+
  
-Using recursion: +This kind of solution is somewhat limited by the fact that the GNAT implementation of Ada.Direct_IO first allocates a copy of the object being read on the stack inside Ada.Direct_IO.Read.  On Linux you can use the command <code>limit stacksize 1024M</code> to increase the available stack for your processes to 1 GB, which gives your program more room to allocate objects on the stack.  (''limit'' is a csh/tcsh builtin; in bash the corresponding command is ''ulimit -s'', with the size given in kilobytes.)
-----------------+
  
-<code> +This solution requires the Ada 2005 standard library (specifically package Ada.Directories) to work.
-1024 kb +
-raised STORAGE_ERROR : stack overflow ​(or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)+
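
One way around the stack limitation described above is to read into a buffer allocated on the heap.  The following is only a sketch of that idea, not part of the original examples: it swaps Ada.Direct_IO for Ada.Streams.Stream_IO, and the names ''Read_Entire_File_Heap'', ''Buffer_Access'' and ''Free'' are made up for illustration.  It assumes that a single call to Read returns the whole of a regular file.

<code Ada>with Ada.Directories;
with Ada.Streams.Stream_IO;
with Ada.Text_IO.Text_Streams;
with Ada.Unchecked_Deallocation;

procedure Read_Entire_File_Heap is
   use Ada.Streams;

   --  Hypothetical file name; as in the examples above, the program
   --  simply reads its own source file.
   File_Name : constant String := "read_entire_file_heap.adb";
   File_Size : constant Stream_Element_Offset :=
     Stream_Element_Offset (Ada.Directories.Size (File_Name));

   type Buffer_Access is access Stream_Element_Array;
   procedure Free is
     new Ada.Unchecked_Deallocation (Stream_Element_Array, Buffer_Access);

   File   : Stream_IO.File_Type;
   --  The buffer lives on the heap, so the file size is not limited
   --  by the size of the stack.
   Buffer : Buffer_Access := new Stream_Element_Array (1 .. File_Size);
   Last   : Stream_Element_Offset;

   Stdout : constant Ada.Text_IO.Text_Streams.Stream_Access :=
     Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output);
begin
   Stream_IO.Open  (File, Mode => Stream_IO.In_File, Name => File_Name);
   Stream_IO.Read  (File, Item => Buffer.all, Last => Last);
   Stream_IO.Close (File);

   --  Write the raw bytes that were read to standard output.
   Ada.Streams.Write (Stdout.all, Buffer (1 .. Last));

   Free (Buffer);
end Read_Entire_File_Heap;</code>

The price is that the contents are available as stream elements rather than as a String, so any text processing has to convert them first.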
  
-32 Mb +==== POSIX.Memory_Mapping ====
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)+
  
-1024 Mb +Map the whole file into the address space of your process and then overlay the mapped region with a String object:
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-raised STORAGE_ERROR : stack overflow (or erroneous memory access) +
-</​code>​+
  
-Using Memory_Mapping: +<code Ada>with Ada.Text_IO,​ 
----------------------+     ​POSIX.IO,​ 
 +     ​POSIX.Memory_Mapping, 
 +     System.Storage_Elements;​
  
-<​code>​ +procedure Read_Entire_File is 
-1024 kb +   use POSIX, POSIX.IO, POSIX.Memory_Mapping;
-./​using_memory_maps > copy  0,00s user 0,24s system 71% cpu 0,347 total +   use System.Storage_Elements;​
-./​using_memory_maps > copy  0,00s user 0,00s system 85% cpu 0,005 total +
-./​using_memory_maps > copy  0,00s user 0,01s system 74% cpu 0,011 total+
  
-32 Mb +   ​Text_File ​   : File_Descriptor;​ 
-./using_memory_maps > copy  0,00s user 0,07s system 79% cpu 0,086 total +   Text_Size    : System.Storage_Elements.Storage_Offset;
-./using_memory_maps > copy  0,00s user 0,07s system 96% cpu 0,075 total +   Text_Address : System.Address;
-./using_memory_maps > copy  0,00s user 0,06s system 94% cpu 0,068 total +begin
 +   Text_File := Open (Name => "read_entire_file.adb",
 +                      Mode => Read_Only);
 +   Text_Size := Storage_Offset (File_Size (Text_File));
 +   Text_Address := Map_Memory (Length     => Text_Size,
 +                               Protection => Allow_Read,
 +                               Mapping    => Map_Shared,
 +                               File       => Text_File,
 +                               Offset     => 0);
  
-1024 Mb +   ​declare 
-./​using_memory_maps > copy  0,00s user 1,38s system 42% cpu 3,227 total +      Text : String (1 .. Natural (Text_Size));​ 
-./​using_memory_maps > copy  0,00s user 1,63s system 91% cpu 1,776 total +      for Text'​Address use Text_Address;​ 
-./​using_memory_maps > copy  0,00s user 1,62s system 88% cpu 1,840 total +   ​begin 
-</​code>​+      Ada.Text_IO.Put (Text); 
 +   end;
  
 +   ​Unmap_Memory (First ​ => Text_Address,​
 +                 ​Length => Text_Size);
 +   Close (File => Text_File);
 +end Read_Entire_File;</​code>​
 +
 +This solution requires the POSIX Ada API (implemented as FLORIST or WPOSIX) to work.  (It has not been tested with an Ada 83 compiler.)
 +
 +==== Summary ====
 +
 +Using POSIX.Memory_Mapping is slightly faster than using Ada.Direct_IO, but you only really benefit from memory mapping if you don't actually need the whole file, since the operating system will only copy in the parts of the file that the application actually accesses.
 +
 +===== Process text file =====
 +
 +[[http://​rosettacode.org/​wiki/​File_IO|Task description]]. ​ In other words, read a file into a variable (possibly only a part of the file at a time) and write it out to another file.
 +
 +==== Line by line ====
 +
 +This solution reads from the file one line at a time.  One nice thing about this solution is that you can easily switch it to read from standard input, and possibly from anything else your operating system considers a file.
 +
 +<code Ada>with Ada.Command_Line,​ Ada.Text_IO;​ use Ada.Command_Line,​ Ada.Text_IO;​
 +
 +procedure Read_File_Line_By_Line is
 +   ​Read_From : constant String := "​input.txt";​
 +   ​Write_To ​ : constant String := "​output.txt";​
 +
 +   ​Input,​ Output : File_Type;
 +begin
 +   begin
 +      Open (File => Input,
 +            Mode => In_File,
 +            Name => Read_From);
 +   ​exception
 +      when others =>
 +         ​Put_Line (Standard_Error,​
 +                   "​Can not open the file '"​ & Read_From & "'​. Does it exist?"​);​
 +         ​Set_Exit_Status (Failure);
 +         ​return;​
 +   end;
 +
 +   begin
 +      Create (File => Output,
 +              Mode => Out_File,
 +              Name => Write_To);
 +   ​exception
 +      when others =>
 +         ​Put_Line (Standard_Error,​
 +                   "​Can not create a file named '"​ & Write_To & "'​."​);​
 +         ​Set_Exit_Status (Failure);
 +         ​return;​
 +   end;
 +
 +   loop
 +      declare
 +         Line : String := Get_Line (Input);
 +      begin
 +         -- You can process the contents of Line here.
 +         ​Put_Line (Output, Line);
 +      end;
 +   end loop;
 +exception
 +   when End_Error =>
 +      Close (Input);
 +      Close (Output);
 +end Read_File_Line_By_Line;</​code>​
 +
 +This solution requires the Ada 2005 standard library to work.
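
The switch to standard input mentioned above can be sketched as follows.  This is a minimal illustration, not part of the original example; the procedure name ''Copy_Standard_Input'' is made up.  Calling Get_Line and Put_Line without a File parameter makes them use the current default input and output files, which initially are standard input and standard output.

<code Ada>with Ada.Text_IO; use Ada.Text_IO;

procedure Copy_Standard_Input is
begin
   loop
      declare
         --  Get_Line without a File parameter reads from the current
         --  default input file (standard input unless it is redirected
         --  with Set_Input).
         Line : constant String := Get_Line;
      begin
         --  You can process the contents of Line here.
         Put_Line (Line);
      end;
   end loop;
exception
   when End_Error =>
      --  No more input.  The standard files need not be closed.
      null;
end Copy_Standard_Input;</code>

As in the file-based version, running out of input is detected by handling End_Error rather than by testing for it in advance.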
 +
 +Notice how we avoid explicit checks for read access to the input file, creation/write access to the output file, as well as availability of more data to process.  Even if we put in explicit checks, we would still have to handle the same exceptions, as another application can change the state of the file system in parallel with this application, creating a race condition.
 +
 +==== POSIX.Memory_Mapping ====
 +
 +The [[#​posixmemory_mapping|POSIX.Memory_Mapping]] solution for reading an entire file into memory practically solves this task as well.  Still, it has some limitations which may make it irrelevant for some purposes:
 +  * It only works for an actual file (i.e. one stored on a file system). ​ Specifically it doesn'​t work for standard input, pipes and TCP connections.
 +  * It is not line oriented (i.e. you have to parse line-breaks yourself).
 +  * It requires an implementation of the POSIX Ada API (for example FLORIST or WPOSIX).
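
Parsing the line-breaks yourself, as mentioned in the list above, can be sketched like this.  To keep the example free of the POSIX Ada API, an ordinary String constant stands in for the memory-mapped text; the name ''Split_Lines_Demo'' is made up, and the sketch assumes that lines are terminated by a single line feed character (ASCII.LF).

<code Ada>with Ada.Text_IO;

procedure Split_Lines_Demo is
   --  Stand-in for the String overlaid on the mapped file.
   Text : constant String :=
     "first line" & ASCII.LF & "second line" & ASCII.LF & "third line";

   --  Index of the first character of the line currently being scanned.
   First : Positive := Text'First;
begin
   for Index in Text'Range loop
      if Text (Index) = ASCII.LF then
         --  Text (First .. Index - 1) is one complete line.
         Ada.Text_IO.Put_Line (Text (First .. Index - 1));
         First := Index + 1;
      end if;
   end loop;

   --  Handle a last line which is not terminated by a line feed.
   if First <= Text'Last then
      Ada.Text_IO.Put_Line (Text (First .. Text'Last));
   end if;
end Split_Lines_Demo;</code>

Each line is handled as a slice of the original String, so no copy of the text is made until you actually do something with the slice.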
