B2FIND Search Guide
- The B2FIND Portal
- Command Line Interface
- Use Case Scenario
The EUDAT metadata service B2FIND can be used in two ways:
- The discovery web portal supports user-friendly navigation and filtering. It provides powerful search functions, including:
  - Free text search over the full text of all datasets indexed in the B2FIND catalogue
  - Geospatial and temporal search for all datasets that cover a chosen region or a chosen time period
  - Other 'faceted' search, i.e. selecting values from certain metadata fields
- The script searchB2FIND.py enables submitting search requests from the command line using the CKAN API.
These search requests can be combined and executed in one go. A successful search returns the list of all datasets in the B2FIND catalogue that fulfil the search criteria. The metadata fields of each dataset found can be displayed; they also include links to the underlying data objects. The following sections describe the usage of B2FIND step by step.
Clicking 'Communities' gives an overview of all communities that provide metadata to B2FIND.
There are two ways to start a search and reach the search result page (fig. 2), which lists all available datasets on the right side and offers the search and filter functions in the navigation bar on the left side:
- By clicking 'Faceted Search'
- By clicking the magnifying glass in the free text field 'Search your data'. (At this stage you can already enter a search string or choose one of the 'Popular Tags' shown.)
Like the entry page shown in fig. 1, the main search page provides a "Google-like" free text search: an input box where you can type your query.
Usually you simply type the keywords you are interested in and press Return. For example, if you are interested in documents or datasets on Climate and you know that somebody named Scott worked on the topic, you would type
and get this result:
You can also use the free text search field to search for specific values of 'keys' or 'facets' by using a colon. Facets are searchable B2FIND categories. The next section explains the faceted search interfaces in the navigation bar in detail; here we only address searching for facets via the free text search field.
For example, you can find all resources that originate from the discipline Biology by typing:
A full list of the facets that can be searched in the B2FIND catalogue, together with their descriptions, is found in the user documentation in the section Metadata field definitions. Please note that this search method is case sensitive and requires accurate spelling. E.g. typing
will return no results, because there is no facet discipline with a lowercase d.
To refine your search you can combine several search methods. With the Boolean operators AND and OR you can add keywords or facets as further requirements or as alternatives. In our example, you can search for all resources within the discipline Biology that include the word foraminifera or are associated with the name Schiebel by typing a query like this:
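How such a combined query is put together can also be sketched in Python. This is only an illustration: the helper names `facet_term`, `any_of` and `all_of` are hypothetical, and grouping the alternatives with parentheses assumes that the search field accepts the usual Solr-style query syntax.

```python
def facet_term(field, value):
    """Build a 'Field:value' term; values with spaces are wrapped in quotes."""
    if " " in value:
        value = '"' + value + '"'
    return f"{field}:{value}"


def any_of(*terms):
    """Join alternatives with OR and group them (assumes parentheses are accepted)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join requirements with AND."""
    return " AND ".join(terms)


# Facet names are case sensitive: 'Discipline', not 'discipline'.
query = all_of(
    facet_term("Discipline", "Biology"),
    any_of("foraminifera", "Schiebel"),
)
print(query)  # Discipline:Biology AND (foraminifera OR Schiebel)
```

Keeping the facet term and the Boolean grouping in separate helpers makes it easy to see which part of the query restricts a facet and which part is plain free text.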
The faceted search interface lets you filter your search by choosing 'facets'. This helps you narrow the search results down to your specific needs.
B2FIND lets you filter for datasets that have a given extent in space or in time. This is implemented by the following three graphical interfaces:
- 'Filter by location' searches for all datasets that intersect a region chosen on the world map.
- 'Filter by time' searches for all datasets that cover a chosen time period.
- 'Publication Year' searches for all datasets that were 'published' within a given period of years.
In the world map widget in the upper left corner you can select a region by dragging the mouse. This triggers a search for all datasets whose spatial extent intersects the selected region.
1. Clear any previous search request by clicking the 'Clear' button in the 'Filter by location' interface in the navigation bar.
2. Select a region on the world map:
2.a Click the 'Draw a rectangle' button in the upper right corner of the world map widget to start the spatial selection.
2.b Drag the mouse over the desired region; the rectangle's borders are marked in red. Then press the 'Apply' button to execute the search request.
3. Search result: Once the query has finished, all datasets whose spatial extent overlaps the selected region are listed in the right panel.
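The overlap criterion used by this filter can be illustrated with a small Python sketch. It is not the portal's implementation: the `BBox` type, the function name and the coordinates are hypothetical, and the sketch assumes boxes given in degrees that do not cross the 180° meridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the boxes share at least one point; touching edges count.

    Assumption: neither box crosses the 180 degree meridian.
    """
    return (
        a.west <= b.east and b.west <= a.east
        and a.south <= b.north and b.south <= a.north
    )


# Illustrative coordinates only, not taken from the catalogue.
selection = BBox(west=-10.0, south=35.0, east=40.0, north=70.0)
dataset_a = BBox(west=-4.0, south=51.0, east=9.0, north=61.0)
dataset_b = BBox(west=-170.0, south=-50.0, east=-120.0, north=-10.0)

print(intersects(selection, dataset_a))  # True
print(intersects(selection, dataset_b))  # False
```

A dataset is listed as soon as its box overlaps the selection; it does not have to lie completely inside the drawn rectangle.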
With the search widget 'Filter by time' you can select the time period that the research data relate to by zooming into date and time histograms, as described below.
1. Clear any previous search request by clicking the 'Clear' button of the 'Filter by time' interface in the navigation bar.
2. Click the 'Filter by time' button in the navigation panel to open the timeline chart.
2.a Select a base period by dragging the mouse, with the left button pressed, over the 'histogram' at the bottom. This opens the datasets/time diagram for the chosen period in the upper part of the chart.
2.b Zoom in time: drag with the left mouse button held down over a time interval in the upper time graph (the zoomed part is shown).
2.c Repeat zooming until the desired section is shown in the chart.
2.d Optionally, reset the last zoom by clicking the 'Reset zoom' button.
3. Select a time interval by holding down "Ctrl" (Win/Linux) or "Cmd" (Mac) and clicking on two points (start and end time) on the chart line. Close the timeline chart by clicking 'Apply'.
4. At this point the search request is applied but not yet executed. To perform the search, click the magnifying glass in the free text search field. The number of datasets shown is reduced to those matching the chosen time period, which is also displayed in the time boxes on the left side.
5. Please note that the time period is displayed in 'Seconds since/before Christ'. For better readability, the actual date is displayed when you hover the mouse over the digits. The start and end time can also be changed directly by editing the integers in the fields or by using the arrow buttons to increase or decrease the time span.
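To get a feeling for the size of these integers, the conversion between a date and a second count can be sketched in Python. The exact reference point used by the portal is not stated here, so the sketch assumes that second 0 corresponds to 0001-01-01T00:00:00 in the proleptic Gregorian calendar; the function names are hypothetical, and dates before year 1 are not covered because Python's `datetime` cannot represent them.

```python
from datetime import datetime, timedelta

# Assumption: second 0 is 0001-01-01T00:00:00 (proleptic Gregorian calendar).
# The reference point actually used by the portal may differ.
EPOCH = datetime(1, 1, 1)


def datetime_to_seconds(moment: datetime) -> int:
    """Seconds elapsed between the assumed epoch and the given moment."""
    delta = moment - EPOCH
    return delta.days * 86400 + delta.seconds


def seconds_to_datetime(seconds: int) -> datetime:
    """Inverse conversion; negative values (dates before year 1) are rejected."""
    if seconds < 0:
        raise ValueError("dates before year 1 cannot be represented by datetime")
    return EPOCH + timedelta(seconds=seconds)


# Period from the beginning of 1897 to the end of 2012.
start = datetime_to_seconds(datetime(1897, 1, 1))
end = datetime_to_seconds(datetime(2012, 12, 31, 23, 59, 59))
print(start, seconds_to_datetime(start).isoformat())
print(end, seconds_to_datetime(end).isoformat())
```

Under this assumption the values have eleven digits for recent dates, which is why the mouse-over display of the actual date is the more convenient way to check a selection.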
With the search widget 'Publication Year' you can search for datasets that were published within a certain period of time.
1. Clear any previous search request by clicking the 'Clear' button.
2. Select a period of publication years:
2.a Select a start year: Click on the left text field in the 'Publication Year' interface. This opens a selection panel showing the years of the current decade. With the '<<' and '>>' buttons you can switch to the previous or following decade. Then select the desired start year by clicking on it in the decade panel. This applies the chosen year to the search, and all datasets published in this year or later are listed.
2.b Select an end year: Click on the right text field in the 'Publication Year' interface. Selecting the end year works in the same manner as for the start year and executes the search for all datasets published between the start and end year.
3. Search result: Once the query has finished, all datasets published within the chosen years are listed in the right panel.
Filtering with autocomplete: Typing Morr into the filter field of the 'Creator' menu restricts the full value list to all names containing the string 'Morr'.
Selecting a name from the creator list: Click on Morris Riedel (10) and the ten matching datasets are listed. You can also see that Morris Riedel created two datasets for which Gabrielle Cavallero is listed as a 'Creator' as well. To narrow the results down to these two records, just click on that name.
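The effect of the filter field can be pictured as a simple substring match over the value list. The following Python sketch is an illustration only: the function name is hypothetical, the third name in the list is invented, and whether the portal ignores upper and lower case is an assumption.

```python
def filter_values(values, typed):
    """Keep the values that contain the typed string.

    Assumption: the match ignores upper/lower case; the portal may differ.
    """
    needle = typed.lower()
    return [value for value in values if needle in value.lower()]


# 'Jane Morrow' is an invented name that only serves as a second match.
creators = ["Morris Riedel", "Gabrielle Cavallero", "Jane Morrow"]
print(filter_values(creators, "Morr"))  # ['Morris Riedel', 'Jane Morrow']
```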
The datasets that fulfil the search criteria are shown on the result page on the right side of the portal. Consider, for example, the following use case: a user wants to search within the community B2SHARE for all data created by Morris Riedel, belonging to the discipline Remote sensing and tagged with the keyword cross validation.
Clicking the dataset title opens the dataset view. The textual metadata are shown on the right side: at the top the title, the description and the tags as clickable buttons, and underneath the other B2FIND fields with their values in tabular form. In the left sidebar the spatial extent is shown as a red bounding box.
Among these fields are links that give access to related data:
- The data resources the metadata are related to.
- The original metadata - as harvested from the provider by OAI-PMH and formatted in XML.
To download the data resource behind the metadata found, use the link provided in the field 'Source'. Where available, the associated 'PID' and/or 'DOI' is provided as well.
In our example the link leads to the landing page of the related data at the WDCC. This page provides further metadata and, under 'Data access', a further link that lets you download the data, provided you have the required authorisation.
To examine the 'raw' metadata as originally harvested by B2FIND from the data provider, click on the field 'MetadataAccess'; the associated XML record is then displayed in the browser.
In this case the metadata are provided in a community-specific format (ISO 19139), and namespaces are used to describe the georeferenced fields.
Besides searching via the web interface, you can also use the CKAN API to submit search requests directly from the command line.
Adapted to the needs and features of B2FIND, the Python script searchB2FIND.py provides a powerful and user-friendly tool for submitting complex search requests. The script resides in the git repository https://github.com/EUDAT-B2FIND/md-ingestion and can be downloaded from there.
For instance, to get a list of all records that belong to the discipline 'Earth Sciences', enter:
>./searchB2FIND.py Discipline:"Earth?Sciences"
----------------------------------------------------------------------------------------------------
Search in b2find.eudat.eu for pattern Discipline:Earth?Sciences .....
=> 3410 datasets found
The script writes the list of the IDs of the found records to the file results.txt.
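The resulting file can then be processed further, for example in Python. The layout of results.txt is not described in detail here, so the following sketch assumes one record per line and simply skips blank lines; the function names are hypothetical.

```python
from pathlib import Path


def parse_ids(text):
    """Return the non-empty lines of the text, stripped of surrounding whitespace.

    Assumption: the result file holds one record per line.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_ids(path="results.txt"):
    """Read a result file written by the search script and return its entries."""
    return parse_ids(Path(path).read_text(encoding="utf-8"))
```

Calling `read_ids()` in the directory where the search was run would return the entries as a list, so that `len(read_ids())` can be compared with the number of datasets reported on the console.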
Further options and arguments are shown by entering:
>./searchB2FIND.py -h
usage: searchB2FIND.py [-h] [--ckan IP/URL] [--output STRING]
                       [--community STRING] [--ids [IDS [IDS ...]]]
                       [PATTERN [PATTERN ...]]

Description: Lists identifers of datasets that fulfill the given search criteria

positional arguments:
  PATTERN               CKAN search pattern, i.e. by logical conjunctions
                        joined field:value terms.

optional arguments:
  -h, --help            show this help message and exit
  --ckan IP/URL         CKAN portal address, to which search requests are
                        submitted (default is b2find.eudat.eu)
  --output STRING, -o STRING
                        Output file name and format. Format is determined by
                        the extention, supported are 'txt' (plain ascii file)
                        or 'hd5' file. Default is the ascii file results.txt.
  --community STRING, -c STRING
                        Community where you want to search in
  --ids [IDS [IDS ...]], -i [IDS [IDS ...]]
                        Identifiers of found records outputed. Default is
                        'id'. Additionally 'Source','PID' and 'DOI' are
                        supported.

Examples:
  1. >./searchB2FIND.py -c aleph tags:LEP
     searchs for all datasets of community ALEPH with tag "LEP" in b2find.eudat.eu.
  2. >./searchB2FIND.py author:"Jones*" AND Discipline:"Crystal?Structure" --ckan eudat-b1.dkrz.de
     searchs in eudat-b1.dkrz.de for all datasets having an author starting with "Jones" and belongs to the discipline "Crystal Structure"
  3. >./searchB2FIND.py -c narcis DOI:'*' --ids DOI
     returns the list of id's and DOI's for all records in community "NARCIS" that have a DOI
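Since the script builds on the CKAN API, a comparable request can also be sketched directly in Python. The sketch below is not the code of searchB2FIND.py. It assumes that the portal exposes the standard CKAN action `package_search` under `/api/3/action/` and answers with the usual CKAN JSON envelope (`success`, `result.count`, `result.results`); the function names and the `rows` default are chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(pattern, host="b2find.eudat.eu", rows=10):
    """Build the URL of a CKAN 'package_search' request.

    Assumption: the host serves the standard CKAN action API under /api/3/action/.
    """
    params = urlencode({"q": pattern, "rows": rows})
    return f"https://{host}/api/3/action/package_search?{params}"


def search(pattern, host="b2find.eudat.eu", rows=10):
    """Send the request and return (number of hits, list of dataset ids)."""
    with urlopen(build_search_url(pattern, host, rows), timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError("the CKAN API reported an unsuccessful search")
    result = payload["result"]
    return result["count"], [dataset["id"] for dataset in result["results"]]


# Only the URL is built here; calling search() needs network access.
print(build_search_url('Discipline:"Earth?Sciences"'))
```

Separating URL construction from the network call makes it possible to inspect the request before sending it, which helps when a search pattern with quotes or wildcards does not behave as expected.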
To demonstrate a combined search using the faceted search functions, we take the following use case scenario:
A biologist is looking for research data in her discipline covering the period from 1897 to 2012 and the European region. She is interested only in datasets published between 2009 and 2016 and created by her colleague V. Neumann.
The results can be narrowed down step by step using the filter functions provided in the navigation bar, as illustrated in the following rows. The left column describes each search request in words; the other two columns show the related selection action and the resulting display page, respectively.
|Description of the search request||Selection action in the navigation bar||Search result page|
|She starts the search by choosing the 'Filter by time' tool in order to get only search results that are related to the period from 1897 to 2012.|
|To filter for all datasets that are related to the European area, she draws a bounding box around the European continent.|
|In the widget 'Publication Year' the researcher selects 2009 as start year and 2016 as end year.|
|Next our scientist restricts her query to her research area, i.e. she chooses 'Biology' from the facet 'Discipline' and is left with 443 datasets.|
|Finally she selects the 'Creator' 'Neumann, V.' and ends up with two datasets. Above the dataset titles, all chosen foci (values) of the textual facets, here for Discipline and Creator, are displayed. Clicking the 'x' of a value starts a new search query that excludes the closed facet.|