Differences

good_text_i_o_practice [2013/01/09 09:09]
sparre created (still messy notes)
good_text_i_o_practice [2013/01/16 08:08] (current)
sparre [POSIX.Memory_Mapping] It is not an _exact_ solution. ;-)
Line 1:
 ====== Good text I/O practice ======
  
-The Ada standard library provides a general purpose text I/O package named [[http://www.ada-auth.org/standards/12rm/html/RM-A-10.html|Ada.Text_IO]].  This is good for small problems where efficiency is not an issue, and its parsing and formatting routines are excellent.  One of the challenges for the compiler writers is that Ada.Text_IO is required to keep track of column, line and page numbers.  Test runs of the examples at <http://rosettacode.org/wiki/Read_entire_file#Ada> on three different text files (respectively 1 Mb, 32 Mb and 1024 Mb in size):
 +The Ada standard library provides a general purpose text I/O package named [[http://www.ada-auth.org/standards/12rm/html/RM-A-10.html|Ada.Text_IO]].  This is good for small problems where efficiency is not an issue, and its parsing and formatting routines are excellent.  One of the challenges for the compiler writers is that Ada.Text_IO is required to keep track of column, line and page numbers.  Another challenge in the Ada.Text_IO design is that it requires "many" system calls or copying of the data being read/written from/to a file.
  
-Using Unbounded_Strings:
-------------------------
 +Below are some patterns for good text I/O practice.  The tasks they solve are derived from [[http://rosettacode.org/|Rosetta Code]].
  
-1024 kb
-./using_unbounded_strings > copy  0,02s user 0,27s system 16% cpu 1,836 total
-./using_unbounded_strings > copy  0,02s user 0,01s system 91% cpu 0,031 total
-./using_unbounded_strings > copy  0,02s user 0,02s system 96% cpu 0,046 total
 +===== Read entire file =====
  
-32 Mb
-./using_unbounded_strings > copy  0,68s user 0,41s system 38% cpu 2,809 total
-./using_unbounded_strings > copy  0,70s user 0,40s system 99% cpu 1,100 total
-./using_unbounded_strings > copy  0,71s user 0,34s system 99% cpu 1,067 total
 +[[http://rosettacode.org/wiki/Read_entire_file|Task description]].
  
-1024 Mb
-./using_unbounded_strings > copy  20,99s user 11,72s system 96% cpu 33,861 total
-./using_unbounded_strings > copy  21,07s user 11,83s system 90% cpu 36,170 total
-./using_unbounded_strings > copy  21,28s user 12,15s system 73% cpu 45,767 total
 +==== Ada.Direct_IO + Ada.Directories ====
  
-Using Direct_IO:
-----------------
 +Using Ada.Directories to first ask for the file size and then Ada.Direct_IO to read the whole file in one chunk:
  
-1024 kb
-./using_direct_io > copy  0,00s user 0,00s system 68% cpu 0,006 total
-./using_direct_io > copy  0,01s user 0,01s system 84% cpu 0,019 total
-./using_direct_io > copy  0,00s user 0,00s system 64% cpu 0,012 total
 +<code Ada>with Ada.Directories,
 +     Ada.Direct_IO,
 +     Ada.Text_IO;
  
-32 Mb
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
 +procedure Read_Entire_File is
 +   File_Name : String  := "read_entire_file.adb";
 +   File_Size : Natural := Natural (Ada.Directories.Size (File_Name));
 +   subtype File_String    is String (1 .. File_Size);
 +   package File_String_IO is new Ada.Direct_IO (File_String);
  
-1024 Mb
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
 +   File     : File_String_IO.File_Type;
 +   Contents : File_String;
 +begin
 +   File_String_IO.Open  (File, Mode => File_String_IO.In_File,
 +                               Name => File_Name);
 +   File_String_IO.Read  (File, Item => Contents);
 +   File_String_IO.Close (File);
  
-Using recursion:
-----------------
 +   Ada.Text_IO.Put (Contents);
 +end Read_Entire_File;</code>
  
-1024 kb
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
 +This kind of solution is somewhat limited by the fact that the GNAT implementation of Ada.Direct_IO first allocates a copy of the object being read on the stack inside Ada.Direct_IO.Read.  On Linux you can use the shell command <code>limit stacksize 1024M</code> (zsh and tcsh syntax; the bash equivalent is <code>ulimit -s 1048576</code>) to raise the stack limit for your processes to 1 GB, which gives your program more room for allocating objects on the stack.
  
-32 Mb
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
 +This solution requires the Ada 2005 standard library (specifically package Ada.Directories) to work.
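If raising the stack limit is not an option, one possible workaround is to keep the buffer on the heap and read it through Ada.Streams.Stream_IO instead of Ada.Direct_IO.  The following is only a sketch of that different technique, not part of the original notes: the procedure name and the file name are made up for illustration, it assumes that one Character corresponds to one stream element (as with GNAT), and whether the run-time library makes further copies on the stack is implementation dependent.

<code Ada>with Ada.Streams.Stream_IO,
     Ada.Text_IO,
     Ada.Unchecked_Deallocation;

procedure Read_Entire_File_Via_Stream is
   use Ada.Streams.Stream_IO;

   type String_Access is access String;
   procedure Free is
     new Ada.Unchecked_Deallocation (String, String_Access);

   --  Hypothetical file name; any readable file will do.
   File_Name : constant String := "read_entire_file_via_stream.adb";

   File     : File_Type;
   Contents : String_Access;
begin
   Open (File, Mode => In_File, Name => File_Name);

   --  Allocate the buffer on the heap, sized from the open file.
   Contents := new String (1 .. Natural (Size (File)));

   --  Read the whole file into the heap-allocated buffer.
   String'Read (Stream (File), Contents.all);
   Close (File);

   Ada.Text_IO.Put (Contents.all);
   Free (Contents);
end Read_Entire_File_Via_Stream;</code>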
  
-1024 Mb
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
-raised STORAGE_ERROR : stack overflow (or erroneous memory access)
 +==== POSIX.Memory_Mapping ====
  
-Using Memory_Mapping:
----------------------
 +This solution maps the whole file into the address space of your process and then overlays the mapped memory with a String object.
  
-1024 kb
-./using_memory_maps > copy  0,00s user 0,24s system 71% cpu 0,347 total
-./using_memory_maps > copy  0,00s user 0,00s system 85% cpu 0,005 total
-./using_memory_maps > copy  0,00s user 0,01s system 74% cpu 0,011 total
 +<code Ada>with Ada.Text_IO,
 +     POSIX.IO,
 +     POSIX.Memory_Mapping,
 +     System.Storage_Elements;
  
-32 Mb
-./using_memory_maps > copy  0,00s user 0,07s system 79% cpu 0,086 total
-./using_memory_maps > copy  0,00s user 0,07s system 96% cpu 0,075 total
-./using_memory_maps > copy  0,00s user 0,06s system 94% cpu 0,068 total
 +procedure Read_Entire_File is
 +   use POSIX, POSIX.IO, POSIX.Memory_Mapping;
 +   use System.Storage_Elements;
  
-1024 Mb
-./using_memory_maps > copy  0,00s user 1,38s system 42% cpu 3,227 total
-./using_memory_maps > copy  0,00s user 1,63s system 91% cpu 1,776 total
-./using_memory_maps > copy  0,00s user 1,62s system 88% cpu 1,840 total
 +   Text_File    : File_Descriptor;
 +   Text_Size    : System.Storage_Elements.Storage_Offset;
 +   Text_Address : System.Address;
 +begin
 +   Text_File := Open (Name => "read_entire_file.adb",
 +                      Mode => Read_Only);
 +   Text_Size := Storage_Offset (File_Size (Text_File));
 +   Text_Address := Map_Memory (Length     => Text_Size,
 +                               Protection => Allow_Read,
 +                               Mapping    => Map_Shared,
 +                               File       => Text_File,
 +                               Offset     => 0);
  
 +   ​declare
 +      Text : String (1 .. Natural (Text_Size));​
 +      for Text'​Address use Text_Address;​
 +   begin
 +      Ada.Text_IO.Put (Text);
 +   end;
  
 +   ​Unmap_Memory (First ​ => Text_Address,​
 +                 ​Length => Text_Size);
 +   Close (File => Text_File);
 +end Read_Entire_File;</​code>​
 +
 +This solution requires the POSIX Ada API (implemented as FLORIST or WPOSIX) to work.  (It has not been tested with an Ada 83 compiler.)
 +
 +==== Summary ====
 +
 +Using POSIX.Memory_Mapping is slightly faster than using Ada.Direct_IO, but you only get a real benefit from memory mapping if you don't actually need the whole file, since the operating system only copies in the parts of the file that the application actually accesses.
 +
 +===== Process text file =====
 +
 +[[http://​rosettacode.org/​wiki/​File_IO|Task description]]. ​ In other words, read a file into a variable (possibly only a part of the file at a time) and write it out to another file.
 +
 +==== Line by line ====
 +
 +This solution reads from the file one line at a time.  One nice thing about this solution is that you can easily switch it to read from standard input, and possibly from anything else which your operating system considers a file.
 +
 +<code Ada>with Ada.Command_Line,​ Ada.Text_IO;​ use Ada.Command_Line,​ Ada.Text_IO;​
 +
 +procedure Read_File_Line_By_Line is
 +   ​Read_From : constant String := "​input.txt";​
 +   ​Write_To ​ : constant String := "​output.txt";​
 +
 +   ​Input,​ Output : File_Type;
 +begin
 +   begin
 +      Open (File => Input,
 +            Mode => In_File,
 +            Name => Read_From);
 +   ​exception
 +      when others =>
 +         ​Put_Line (Standard_Error,​
 +                   "​Can not open the file '"​ & Read_From & "'​. Does it exist?"​);​
 +         ​Set_Exit_Status (Failure);
 +         ​return;​
 +   end;
 +
 +   begin
 +      Create (File => Output,
 +              Mode => Out_File,
 +              Name => Write_To);
 +   ​exception
 +      when others =>
 +         ​Put_Line (Standard_Error,​
 +                   "​Can not create a file named '"​ & Write_To & "'​."​);​
 +         ​Set_Exit_Status (Failure);
 +         ​return;​
 +   end;
 +
 +   loop
 +      declare
 +         Line : String := Get_Line (Input);
 +      begin
 +         -- You can process the contents of Line here.
 +         ​Put_Line (Output, Line);
 +      end;
 +   end loop;
 +exception
 +   when End_Error =>
 +      Close (Input);
 +      Close (Output);
 +end Read_File_Line_By_Line;</​code>​
 +
 +This solution requires the Ada 2005 standard library to work.
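As a rough illustration of how the line-by-line pattern above can be switched to standard input, the following sketch copies standard input to standard output one line at a time.  The procedure name is made up for this example; everything else comes from Ada.Text_IO, and like the procedure above it relies on the Ada 2005 function form of Get_Line.

<code Ada>with Ada.Text_IO; use Ada.Text_IO;

procedure Filter_Standard_Input is
begin
   loop
      declare
         Line : constant String := Get_Line (Standard_Input);
      begin
         --  You can process the contents of Line here.
         Put_Line (Standard_Output, Line);
      end;
   end loop;
exception
   when End_Error =>
      --  No more input; the standard files need no explicit Close.
      null;
end Filter_Standard_Input;</code>

Run it as a filter in a shell pipeline, or with its input redirected from a file.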
 +
 +Notice how we avoid explicit checks for read access to the input file, creation/write access to the output file, as well as availability of more data to process.  Even if we put in explicit checks, we would still have to handle the same exceptions, as another application can change the state of the file system in parallel with this application, creating a race condition.
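The procedure above catches every exception with "when others".  As a possible refinement, and only as a sketch with a made-up procedure name, the handlers can name the exceptions which Ada.Text_IO.Open is documented to raise, so that the error message can say a bit more about what went wrong.  There is still no check before the call to Open; the exceptions remain the only test.

<code Ada>with Ada.Command_Line,
     Ada.Text_IO;

procedure Open_With_Specific_Handlers is
   use Ada.Command_Line, Ada.Text_IO;

   Read_From : constant String := "input.txt";
   Input     : File_Type;
begin
   Open (File => Input,
         Mode => In_File,
         Name => Read_From);
   Put_Line ("Opened '" & Read_From & "'.");
   Close (Input);
exception
   when Name_Error =>
      --  The name does not identify an existing external file.
      Put_Line (Standard_Error,
                "No usable file named '" & Read_From & "' was found.");
      Set_Exit_Status (Failure);
   when Use_Error =>
      --  The file exists, but the environment will not let us read it.
      Put_Line (Standard_Error,
                "The file '" & Read_From & "' can not be opened for reading.");
      Set_Exit_Status (Failure);
end Open_With_Specific_Handlers;</code>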
 +==== POSIX.Memory_Mapping ====
 +
 +The [[#​posixmemory_mapping|POSIX.Memory_Mapping]] solution for reading an entire file into memory practically solves this task as well.  Still, it has some limitations which may make it irrelevant for some purposes:
 +  * It only works for an actual file (i.e. one stored on a file system). ​ Specifically it doesn'​t work for standard input, pipes and TCP connections.
 +  * It is not line oriented (i.e. you have to parse line-breaks yourself).
 +  * It requires an implementation of the POSIX Ada API (for example FLORIST or WPOSIX).
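To give an idea of what parsing the line-breaks yourself can look like, here is a minimal sketch which splits a String at line-feed characters.  The constant Text is only a stand-in for the String overlaid on the mapped file, the procedure name is made up, and the sketch assumes Unix-style line endings (a single LF); carriage returns are not handled.

<code Ada>with Ada.Characters.Latin_1,
     Ada.Text_IO;

procedure Split_Lines is
   LF : constant Character := Ada.Characters.Latin_1.LF;

   --  Stand-in for the String overlaid on the mapped file.
   Text : constant String :=
     "first line" & LF & "second line" & LF & "last";

   First : Positive := Text'First;
begin
   for Index in Text'Range loop
      if Text (Index) = LF then
         --  Text (First .. Index - 1) is one complete line.
         Ada.Text_IO.Put_Line (Text (First .. Index - 1));
         First := Index + 1;
      end if;
   end loop;

   --  Emit a final line which is not terminated by a line-feed.
   if First <= Text'Last then
      Ada.Text_IO.Put_Line (Text (First .. Text'Last));
   end if;
end Split_Lines;</code>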
