LSB Library Import Tools

From ISP_RAS

Description

Libtodb2 is a new set of tools for automating import of new libraries to the LSB database.

The original libtodb tool used debug versions of libraries and readelf to obtain the necessary data. libtodb2 initially followed this method as well; however, in many cases it is possible to obtain more accurate data using headers and gcc. Appropriate modifications to libtodb2 are being developed at the moment; please see Headers Analysis Tool for more information.

Note: We recommend trying out Headers Analysis Tool before proceeding with the upload procedure described on this page.

Prerequisites

  1. Perl
  2. Java 6
  3. 'readelf' command from either binutils or elfutils
  4. ctags
  5. mysql database loaded with the LSB data, see: Creating LSB database
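
Before starting, it can help to confirm that the command-line prerequisites are on the PATH. The following is a minimal sketch, not part of libtodb2; the function name `missing_tools` is hypothetical, and the check says nothing about tool versions (for example, whether the installed Java is Java 6).

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is not found on the PATH.
# Hypothetical helper; it only checks presence, not versions.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing_tools perl java readelf ctags mysql
# An empty result means all listed commands were found.
```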

Download

The development version of the tools can be obtained from LSB Bazaar.

Usage

Let's consider libtodb2 usage by adding libasound information to the database. We'll need two versions of the library file: a stripped one (let's say 'libasound.so.2.0.0') and an unstripped one with debug information (e.g. 'libasound.so.2.0.0.debug'). You may either compile the library from sources (compile with the '-g' option to get the unstripped file, then run the 'strip' command on a copy of it to obtain the stripped one) or get ready-made files; some distributions (e.g. Debian) provide debug versions of libraries as well as stripped ones. Make sure your stripped and debug files are for the same library version and the same architecture.
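
The architecture part of that last check can be sketched with 'readelf -h', which prints the ELF header. This is only an illustration under stated assumptions: the helper names `elf_identity` and `same_arch` are hypothetical, the field labels 'Class' and 'Machine' are those printed by binutils readelf, and the library version still has to be verified by other means.

```shell
#!/bin/sh
# Reduce `readelf -h` output (read from stdin) to its Class and Machine lines,
# with the column padding normalized to a single space.
elf_identity() {
    grep -E '^[[:space:]]*(Class|Machine):' \
        | sed -E 's/^[[:space:]]+//; s/:[[:space:]]+/: /'
}

# Succeed only if both files report the same ELF class and machine.
same_arch() {
    a=$(readelf -h "$1" | elf_identity)
    b=$(readelf -h "$2" | elf_identity)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (file names from the libasound example):
#   same_arch libasound.so.2.0.0 libasound.so.2.0.0.debug || echo "architecture mismatch" >&2
```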

Now proceed with the libtodb2 itself:

  • Run 'dump_interfaces.pl' on library file to obtain list of symbols exported by it; redirect its output to 'exported_list' file:
    ./dump_interfaces.pl libasound.so.2.0.0 > exported_list
    File 'exported_list' contains interface names that are exported by our library. 'int_data' file will be also created containing 'data' and 'common' interfaces exported by library.
  • 'exported_list' contains names and versions, semicolon separated. We'll also need list of names without versions:
    cut exported_list -d\; -f1 > exported_names
  • Run readelf on the file with debug information and write the result to some file, let's say 'debug_info':
    readelf -Wwi libasound.so.2.0.0.debug > debug_info
  • Information produced by different readelf versions may have different format. The following script tries to make it more uniform in order to simplify DWARFParser work:
    ./patch_debug_info.pl
  • Run DWARFParser on information obtained:
    cd java && java -classpath util.jar:libtodb.jar DWARFParser ../debug_info ../exported_names ../parse_result > ../parser_out; cd ../
  • Get information about interfaces
    • Run 'get_int_info.pl' to obtain information about functions:
      ./get_int_info.pl -l libasound
      By default information from 'exported_list' and 'parse_result' files will be used, but you can specify file names manually (call ./get_int_info.pl --help for more information). The following files will be created:
      • 'interfaces' file with general information about interfaces: name, version, and the Tname and Ttype of the return type.
      • 'guessed_ints' containing information about interfaces that are exported by library, cannot be found in debug information but can be associated with other interfaces found in the debug info.
      • 'int_params' - information about parameters.
      • 'not_found_ints' - list of interfaces that are exported by library but are missing debug information and cannot be associated with any other interface.
    • The files created may contain duplicate records, since some information is repeated in the debug info. Keep only the unique records:
      sort interfaces | uniq >interfaces_tmp && mv interfaces_tmp interfaces
      sort int_params | uniq >int_params_tmp && mv int_params_tmp int_params
      sort guessed_ints | uniq >guessed_ints_tmp && mv guessed_ints_tmp guessed_ints
      sort not_found_ints | uniq >not_found_ints_tmp && mv not_found_ints_tmp not_found_ints
    • Run 'get_data_info.pl' to obtain information about 'data' and 'common' interfaces:
      ./get_data_info.pl -l libasound
      This script works with the same files as get_int_info.pl and supports the same options. The resulting file 'data_ints' should be checked for correctness and joined with 'interfaces':
      cat data_ints >> interfaces
      File 'not_found_data_ints' will contain data interfaces not found in debug info
  • Run 'get_types_info.pl' to obtain information about types:
    ./get_types_info.pl -l libasound
    Information from 'parse_result' file will be processed. You may specify another file name as script argument. Files 'types', 'base_types' and 'type_members' will be created.
  • None of the previous steps required the database itself. However, the next stages use some data from it, and each step can be performed only after the results of the previous step have been uploaded to the database. To begin with, let's create Library and LibGroup entries for the new lib:
    INSERT INTO Library VALUES(<lib_id>,'libasound',NULL,NULL);
    INSERT INTO LibGroup VALUES(<lg_id>,'libasound Interfaces',<lib_id>,0,'');
    Note that it makes no sense to create different lib groups at this stage, since all interfaces will be automatically assigned to the same lib group. Such work can be performed later, when all other data has been uploaded and proved to be correct.
  • Collect information about headers. Create file, e.g. 'headers_list', containing path to library's header files. The path may contain both single files and directories, in the latter case all headers inside directory (and all its subdirectories) will be processed. For libasound, header files are located at '/usr/include/alsa':
    echo "/usr/include/alsa" > headers_list
    Now let's call 'header_to_db.pl':
    ./header_to_db.pl -l libasound -t types -i exported_names -h headers_list -p /usr/include/
    This line tells header_to_db to look for the interfaces from 'exported_names' and the types from the 'types' file in the library headers listed in 'headers_list'. The list may contain both individual file names and directories (for a directory, all files with the '.h' extension inside it are processed). The '-p' option tells the script to remove its value ('/usr/include/' in our example) from header names when uploading them to the database. By default full header names will be printed. The following files will be created:
    • HEADER_QUERY.sql
    • INTERFACE_HEADER
    • INTERFACE_HEADER_QUERY.sql
    • TYPE_HEADER
    • TYPE_HEADER_QUERY.sql
    • types_ORDERED file with type names in the order they appear inside headers (there are no restrictions on ordering of types from different headers, it is simply guaranteed that type declarations from the same header have the same order as inside header). Inside database, types will have the same ordering as in this file. In many cases this ordering is enough to generate correct headers from the database. So backup 'types' file and replace it with 'types_ORDERED' one:
      mv types types.initial && mv types_ORDERED types
  • Insert information about headers into the database:
    mysql lsb <HEADER_QUERY.sql
    (provide mysql with options such as user, dbhost, etc., if needed).
  • Upload type information to the database.
    • Upload general information
      ./get_types_db_id.pl -q types
      (Call it without '-q' to get messages about types not found in the database). 'add_new_types.sql' file will be created. Upload it to the database:
      mysql lsb <add_new_types.sql
    • Upload base types information
      ./basetypes_to_db.pl -l libasound >basetypes_insert.sql
      The script uses data from the base_types file. Examine the errors produced by this script carefully! The debug file may not contain enough information for some types, but in many cases records for such types can be safely omitted. If you get a message like 'Failed to add basetype', try deleting the erroneous entry from the base_types file and running the script again (making a backup of base_types first is strongly recommended). You may also get messages like 'No record for basetype'; these should also be examined carefully. In some cases the script tries to guess which types should be added to the database. All such suggestions are placed in the inserted_basetypes.sql file (there is no need to execute the statements in it; they were already executed while basetypes_to_db.pl was running). Note: Sometimes readelf gives several distinct base types for a given type; in this case the one that is present in the header files is chosen.
      When everything looks ok, upload collected data to the database:
      mysql lsb <basetypes_insert.sql
    • Upload type<->header mapping:
      mysql lsb <TYPE_HEADER_QUERY.sql
  • Upload interface information to the database:
    • Call 'int_to_db.pl' to process interface data:
      ./int_to_db.pl -l libasound >int_insert.sql
      Note that all interfaces will be assigned to the same lib group. By default the first lib group found for the given library is used, but you may specify the LGid with the '--libgroup' option. Sometimes you may get a "Can't deside to which header interface should be assign" error. In this case, provide the script with a headers priority file, 'headers_priority'. Each line of this file should contain a header name and a header priority, space separated. If an interface is found in several headers, it is assigned to the header with the smaller priority value. As a result of script execution, an 'INTERFACE_HEADER.priority' file will be created. The 'int_errors' file may contain the following messages:
      1. "Can't detect header for 'foo' interface": such interfaces will be inserted in the database, but they will not be assigned to headers.
      2. "Can't find Type record for 'foo' type": the script failed to find a record in the Type table corresponding to the interface return type. The interface will NOT be inserted in the database.
      If everything looks ok, the data can be uploaded:
      mysql lsb <int_insert.sql
    • Add interface parameters information:
      ./add_parameters.pl int_params
      SQL setting up interface parameters will be created. If everything looks ok, upload it:
      mysql lsb <int_parameters.sql
    • Call 'process_guessed_ints.pl' to process data about 'guessed' interfaces:
      ./process_guessed_ints.pl
      This script will insert necessary records directly in the database.
    • We have uploaded all interfaces as 'Functions', which is not correct - remember, we have 'data_ints' file with a list of data interfaces. Let's set Itype field to 'Data' for these symbols:
      echo "UPDATE Interface SET Itype='Data' WHERE Ilibrary='libasound' AND Iname IN (" >data_ints_update.sql
      cut data_ints -d\; -f1 | perl -e 'while(<STDIN>) { chomp; print "\n\"$_\","  }; print ");";' | sed s/\,\)/\)/ >> data_ints_update.sql
      mysql lsb <data_ints_update.sql
  • Upload information about type members.
    ./typemember_to_db.pl >typemembers_insert.sql
    The script uses data from the type_members file. Note that some types can have members whose types were not uploaded in the previous steps. This means that the debug information doesn't contain information about those types. As with the base type problems, in many cases such messages can be ignored. However, it can be useful to save them, since they may help if the header files generated from the database for the library being uploaded turn out to be incomplete. Now upload the data:
    mysql lsb <typemembers_insert.sql
  • Some complex types (usually structures) are opaque (should not be visible to users) and their contents need not be uploaded to the database:
    DELETE FROM TypeMember WHERE TMmemberof IN (SELECT Tid FROM Type WHERE Tlibrary='libasound' AND Theadgroup=0 AND Tname NOT LIKE 'anon%' AND Ttype <> 'FuncPtr');
  • During upload a huge number of FuncPtr types can be uploaded, but many of them are actually the same. Collapse function pointers that are actually pointers to the same function:
    ./collapse_funcptr.pl
  • Now it is time to decide which interfaces should actually be included in LSB. Based on this decision, the status of types should be set. For this purpose we have a script called 'set_appearedin.pl'. It can also mark interfaces as included (either all of them, if you specify the '-a' or '--all' option, or only selected ones, if you provide a file listing the interfaces to be included with the '-i' or '--ints' option). Additionally, all headers can be marked as included automatically with the '-f' or '--headers' option. For testing purposes, it is useful to mark all interfaces, all headers and all types required by interfaces as included:
    ./set_appearedin.pl -l libasound -a -f -v 4.0
    '-v' option says that all entries will be marked as 'included in LSB 4.0'.
  • Well, interfaces and types are uploaded. But header files also contain constants, and it can be useful for LSB headers to define them, too. To collect information about constant and macro definitions 'mkconstfile' from old libtodb tool is used with a wrapper called 'process_defines.pl'. The script should be provided at least with library name and path to directory with header files:
    ./process_defines.pl -l libasound --headers /usr/include/alsa
    If the headers are not inside some subdirectory of '/usr/include/', then a prefix should be specified with the '-p' option, designating which part of the header path should be removed when looking for the header in the database. For example, if we want to get libasound headers from the '/foo/bar/dir/alsa/' directory and all headers in the database are named like 'alsa/header.h', then we should call the script as follows:
    ./process_defines.pl -l libasound --headers /foo/bar/dir/alsa -p /foo/bar/dir/
    The script does not insert constants and their values into the database; instead it writes a separate SQL file for every header file, which should be examined and then uploaded. By default all constants are marked as 'Generic' (can be changed with the '-a' option) and as included in LSB 3.2 (can be changed with the '-v' option).
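
Returning to the step that sets Itype to 'Data': the perl and sed pipeline there builds an IN (...) list from the first field of 'data_ints'. The sketch below produces an equivalent statement on a single line using only cut, sed and paste. It assumes, as that pipeline does, that each line of 'data_ints' starts with the interface name followed by ';'. The function name `build_data_update` is hypothetical and is not part of libtodb2.

```shell
#!/bin/sh
# Build the UPDATE statement that marks data interfaces.
# $1: library name as stored in Ilibrary
# $2: a 'data_ints'-style file (assumed format: name first, ';'-separated)
build_data_update() {
    lib=$1
    file=$2
    # Take the name field, drop empty lines, quote each name, join with commas.
    names=$(cut -d';' -f1 "$file" | sed '/^$/d; s/.*/"&"/' | paste -sd, -)
    # An empty IN () list is not valid SQL, so refuse to print one.
    [ -n "$names" ] || { echo "no data interfaces in $file" >&2; return 1; }
    printf "UPDATE Interface SET Itype='Data' WHERE Ilibrary='%s' AND Iname IN (%s);\n" \
        "$lib" "$names"
}

# Usage:
#   build_data_update libasound data_ints >data_ints_update.sql
```

As with the other generated SQL files, the result should be looked over before it is passed to mysql.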

Collecting Information for BinOnly Data Symbols

For C++ libraries, some data symbols may be present that are visible at the binary level only (e.g. 'typeinfo for ...' or 'vtable for ...'). Nevertheless, we should store the sizes of such symbols (as for 'normal' data symbols) to generate proper stub libraries. To store this information, dummy entries are created in the Type table, with appropriate ArchType entries holding the sizes. To create such entries, one can use the mksymsize script.

First, create a file with list of symbols; every line of this file should contain interface library and name, space separated. For example, let's create 'bin_data_list' file with single entry:

echo "libstdcxx _ZNSt10ctype_base5punctE" > bin_data_list

And now call mksymsize:

cat bin_data_list | ./mksymsize -a x86-64 -v 4.0 libstdcxx: /usr/lib64/libstdc++.so.6 > fill_data_sizes.sql

In this call we say that we are processing data for symbols on the x86-64 architecture that appeared in LSB 4.0; then the location of the libstdcxx library on our system is specified.
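
How mksymsize determines sizes is not described here, but the value it stores can be cross-checked by hand against the symbol table. The sketch below extracts the Size column for one symbol from 'readelf -sW' output. It assumes the binutils column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); elfutils readelf may print a different layout. The function name `symbol_size` is hypothetical.

```shell
#!/bin/sh
# Print the Size column of the named OBJECT symbol, reading `readelf -sW` output on stdin.
# Assumption: binutils layout, where field 3 is Size, field 4 is Type, field 8 is Name.
symbol_size() {
    awk -v want="$1" '
        $4 == "OBJECT" {
            name = $8
            sub(/@.*/, "", name)   # drop an @VERSION or @@VERSION suffix, if any
            if (name == want) { print $3; exit }
        }'
}

# Usage:
#   readelf -sW /usr/lib64/libstdc++.so.6 | symbol_size _ZNSt10ctype_base5punctE
```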

Note 1: The script needs database access to create correct SQL queries and not to duplicate entries in the Type table.

Note 2: One can list symbols from different libraries in the bin_data_list file; all such libraries should be specified in mksymsize parameters.

Note 3: Binary-only data symbols usually have different sizes on different architectures. The mksymsize script creates SQL to fill in data for only one architecture; however, in most cases one can be sure that the symbol size differs only between 32-bit and 64-bit architectures. Thus, it is not necessary to collect data on every particular architecture. If you have a 64-bit operating system with 64-bit libs in /usr/lib64/ and 32-bit libs in /usr/lib/, then it should be enough to perform the following steps:

cat bin_data_list | ./mksymsize -a x86-64 -v 4.0 libstdcxx: /usr/lib64/libstdc++.so.6 > fill_data_sizes.sql
mysql lsb < fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a PPC64 -v 4.0 libstdcxx: /usr/lib64/libstdc++.so.6 >> fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a S390X -v 4.0 libstdcxx: /usr/lib64/libstdc++.so.6 >> fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a IA64 -v 4.0 libstdcxx: /usr/lib64/libstdc++.so.6 >> fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a IA32 -v 4.0 libstdcxx: /usr/lib/libstdc++.so.6 >> fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a PPC32 -v 4.0 libstdcxx: /usr/lib/libstdc++.so.6 >> fill_data_sizes.sql
cat bin_data_list | ./mksymsize -a S390 -v 4.0 libstdcxx: /usr/lib/libstdc++.so.6 >> fill_data_sizes.sql

Please note that we are inserting resulting fill_data_sizes.sql file into db right after the first call to mksymsize and only then this script is called for other 6 architectures. This is important, since during the first pass the script will create entry in the Type table, and on other passes it will only insert appropriate ArchType entries.
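
The six follow-up calls differ only in the architecture name and in the library directory, so they can be written as a loop. This is a sketch built from the commands above; the helper names `libdir_for_arch` and `fill_other_archs` are hypothetical, and the directory mapping simply mirrors the example system with 64-bit libs in /usr/lib64 and 32-bit libs in /usr/lib.

```shell
#!/bin/sh
# Map an architecture name to the library directory used in the commands above.
libdir_for_arch() {
    case $1 in
        x86-64|PPC64|S390X|IA64) echo /usr/lib64 ;;
        IA32|PPC32|S390)         echo /usr/lib ;;
        *) echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Append ArchType SQL for the six remaining architectures.
# Call this only after the x86-64 SQL has been loaded into the database,
# so that the Type entries already exist.
fill_other_archs() {
    for arch in PPC64 S390X IA64 IA32 PPC32 S390; do
        dir=$(libdir_for_arch "$arch") || return 1
        ./mksymsize -a "$arch" -v 4.0 libstdcxx: "$dir/libstdc++.so.6" \
            <bin_data_list >>fill_data_sizes.sql
    done
}
```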

Note 4: To obtain the list of symbols, one can use the mkstublibs script (the stub libraries generator). In this case, upload the library to the database first and then try to generate stub libraries. The script will produce warnings like 'liba: No data for symbol b (Iid)'. The list of such warnings is actually what you need; just remove the extra words using sed, for example:

mkstublibs -a x86-64 -v 4.0 2>&1 | grep "No data" | cut -f1 -d\( | sed s/:\ No\ data\ for\ symbol// > symbols_list
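
The text-processing part of that pipeline can be tried on a canned warning without running mkstublibs. The sketch below is a variant that also trims the space left before the parenthesis, so each output line has the 'library name' form expected in bin_data_list. The function name `warnings_to_symbols` is hypothetical, and the warning format is the one quoted in Note 4.

```shell
#!/bin/sh
# Turn "liba: No data for symbol b (Iid)" warnings (on stdin) into "liba b" lines.
warnings_to_symbols() {
    grep "No data for symbol" \
        | sed -e 's/ *(.*$//' -e 's/: No data for symbol / /'
}

# Usage:
#   mkstublibs -a x86-64 -v 4.0 2>&1 | warnings_to_symbols > symbols_list
```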

Importing symbol version

If you use the headertodb2.pl script, you may notice that it (by its design) doesn't pick up some important information - e.g., symbol versions (that just cannot be extracted from header files).

Symbol versions can be added independently; to do this, you need libtodb2/dump_interfaces.pl and libtodb2/add_symbol_versions.sh scripts, as well as shared object file to be analyzed.

The steps are like the following:

dump_interfaces.pl libfoo.so.1 >exported_list
add_symbol_versions.sh exported_list int_data >add_versions.sql

That's all: 'add_versions.sql' can be applied to the database to update information about symbol versions. This SQL assumes that information about the symbols themselves (Interface table entries) is already present in the database.
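
Before applying the SQL, it may be worth a quick look at which version names dump_interfaces.pl actually found. The sketch below counts symbols per version in an 'exported_list'-style file. It relies on the 'name;version' format described in the Usage section and additionally assumes that unversioned symbols have an empty second field, which is not confirmed here. The function name `version_summary` is hypothetical.

```shell
#!/bin/sh
# Print "count version" lines for an 'exported_list'-style file ("name;version" per line).
# Assumption: symbols without a version have an empty second field.
version_summary() {
    awk -F';' '
        { v = ($2 == "" ? "(unversioned)" : $2); n[v]++ }
        END { for (v in n) print n[v], v }' "$1" \
        | LC_ALL=C sort -k2
}

# Usage:
#   version_summary exported_list
```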
