Example
Search result
Seamless XHTML Searcher
Restricted XHTML Searcher
Element Searcher
Search by EPUB CFI

Searcher

Searcher is currently experimental. Note that none of its interfaces are stable yet.

Example

require "epub/parser"
require "epub/searcher"

epub = EPUB::Parser.parse('childrens-literature.epub')
search_word = 'INTRODUCTORY'
results = EPUB::Searcher.search_text(epub, search_word)
# => [#<EPUB::Searcher::Result:0x007f80ccde9528
#   @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
#   @parent_steps=
#    [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
#   @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
#  #<EPUB::Searcher::Result:0x007f80ccebcb30
#   @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
#   @parent_steps=
#    [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
#     #<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
#   @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
puts results.collect(&:to_cfi).collect(&:to_fragment)
# epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
# epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
# => nil
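
The relationship between the steps and the CFI fragments printed above can be sketched in plain Ruby. This is only an illustration, not the library's implementation: the Step struct and steps_to_fragment method are hypothetical stand-ins, and the numbering rule (element index i becomes step 2 * (i + 1), text index i becomes step 2 * i + 1) is an assumption inferred from the sample output.

```ruby
# Hypothetical stand-in for EPUB::Searcher::Result::Step (not the library's class).
Step = Struct.new(:type, :index, :info) do
  # Render this step as one piece of a CFI path.
  def to_cfi_part
    case type
    when :element, :itemref
      part = "/#{(index + 1) * 2}"           # elements get even step numbers
      part += "[#{info[:id]}]" if info[:id]   # optional ID assertion
      part += "!" if type == :itemref         # indirection into the referenced document
      part
    when :text
      "/#{index * 2 + 1}"                     # text gets odd step numbers
    when :character
      ":#{index}"                             # character offset
    else
      raise ArgumentError, "unknown step type: #{type.inspect}"
    end
  end
end

# Build a range fragment of the form epubcfi(parent,start,end).
def steps_to_fragment(parent_steps, start_steps, end_steps)
  parts = [parent_steps, start_steps, end_steps].map { |steps| steps.map(&:to_cfi_part).join }
  "epubcfi(#{parts.join(',')})"
end

# The steps of the first result in the example above.
parent_steps = [
  Step.new(:element, 2, {name: "spine", id: nil}),
  Step.new(:itemref, 1, {id: nil}),
  Step.new(:element, 1, {name: "body", id: nil}),
  Step.new(:element, 0, {name: "nav", id: "toc"}),
  Step.new(:element, 1, {name: "ol", id: "tocList"}),
  Step.new(:element, 0, {name: "li", id: "np-313"}),
  Step.new(:element, 1, {name: "ol", id: nil}),
  Step.new(:element, 1, {name: "li", id: "np-317"}),
  Step.new(:element, 0, {name: "a", id: nil}),
  Step.new(:text, 0, {})
]
start_steps = [Step.new(:character, 0, {})]
end_steps   = [Step.new(:character, 12, {})]

puts steps_to_fragment(parent_steps, start_steps, end_steps)
# epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
```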

Search result

A search result is an array of EPUB::Searcher::Result objects. Each result can be converted to an EPUB CFI string with EPUB::Searcher::Result#to_cfi_s.

Seamless XHTML Searcher

The default searcher for XHTML is now the seamless searcher, which ignores tags when searching.

For example, it can find the phrase 'search word' in the XHTML document below:

<html>
  <head>
    <title>Sample document</title>
  </head>
  <body>
    <p><em>search</em> word</p>
  </body>
</html>
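
The idea of ignoring tags can be sketched as follows. This is a toy model and not the library's algorithm: text_nodes, locate and seamless_search are hypothetical helpers, and text nodes are approximated by splitting on tags with a regular expression, where a real implementation would walk a parsed XML tree. The sketch joins the text nodes, searches the joined string, and maps the match back to a [node index, character offset] pair for each end of the match.

```ruby
# Crude approximation of text nodes: whatever lies between tags.
def text_nodes(xhtml)
  xhtml.split(/<[^>]*>/).reject(&:empty?)
end

# Map an offset in the joined text back to [node index, offset within that node].
def locate(nodes, offset)
  nodes.each_with_index do |node, i|
    return [i, offset] if offset < node.length
    offset -= node.length
  end
  nil
end

# Find the first occurrence of phrase, allowing it to span several text nodes.
def seamless_search(xhtml, phrase)
  return nil if phrase.empty?
  nodes = text_nodes(xhtml)
  start = nodes.join.index(phrase)
  return nil unless start
  # Locate the last matched character, then step one past it for the end position.
  last_node, last_offset = locate(nodes, start + phrase.length - 1)
  {start: locate(nodes, start), end: [last_node, last_offset + 1]}
end

# The paragraph from the sample document: the phrase spans two text nodes.
result = seamless_search('<p><em>search</em> word</p>', 'search word')
p result[:start]  # [0, 0]
p result[:end]    # [1, 5]
```

The start and end positions play a role similar to the start and end steps of a search result: the match begins at character 0 of the first text node and ends at character 5 of the second.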

Restricted XHTML Searcher

You can also use the restricted searcher, which matches text only within a single element. For instance, it can find 'search word' in the XHTML document below:

<html>
  <head>
    <title>Sample document</title>
  </head>
  <body>
    <p>search word</p>
  </body>
</html>

But it cannot find it in the document below:

<html>
  <head>
    <title>Sample document</title>
  </head>
  <body>
    <p><em>search</em> word</p>
  </body>
</html>

because the words 'search' and 'word' are not in the same element.
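
This limitation can be illustrated with a small sketch. It is a toy model, not the library's implementation: text_nodes and restricted_search are hypothetical helpers, and text nodes are approximated by splitting on tags rather than by parsing the XML. Each text node is searched on its own, so a phrase split across elements is never seen as a whole.

```ruby
# Crude approximation of text nodes: whatever lies between tags.
def text_nodes(xhtml)
  xhtml.split(/<[^>]*>/).reject(&:empty?)
end

# Search every text node separately and return [node index, offset]
# for the first occurrence of phrase in each node that contains it.
def restricted_search(xhtml, phrase)
  text_nodes(xhtml).each_with_index.filter_map do |node, i|
    offset = node.index(phrase)
    [i, offset] if offset
  end
end

p restricted_search('<p>search word</p>', 'search word')           # [[0, 0]]
p restricted_search('<p><em>search</em> word</p>', 'search word')  # []
```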

To use the restricted searcher, pass the algorithm option to the search method:

results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)

Element Searcher

You can search for XHTML elements by CSS selector or XPath.

# epub.package is the package document of the book parsed in the first example
EPUB::Searcher::Publication.search_element(epub.package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
# => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
#    :
#    :

Search by EPUB CFI

You can fetch an XML node from an EPUB document by its EPUB CFI.

require "epub/parser"
require "epub/searcher"

epub = EPUB::Parser.parse("childrens-literature.epub")
cfi = EPUB::CFI("/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]")
itemref, node = EPUB::Searcher.search_by_cfi(epub, cfi)
puts itemref.item.full_path
puts node
# EPUB/nav.xhtml
# <li id="np-317" class="front">
#                                                         <a href="s04.xhtml#pgepubid00498">INTRODUCTORY</a>
#                                                 </li>
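
To show how such a CFI path breaks down into steps, here is a minimal sketch in plain Ruby. It is not the library's parser: parse_cfi_path is a hypothetical helper that handles only simple /N and /N[id] steps separated by "!", and it ignores character offsets, ranges and other CFI features. The index calculation (even step N is element N / 2 - 1, odd step N is text (N - 1) / 2) is an assumption that mirrors the @index values in the first example.

```ruby
# Split a CFI path at "!" (one part per document) and parse each /N[id] step.
def parse_cfi_path(path)
  path.split("!").map do |local_path|
    local_path.scan(%r{/(\d+)(?:\[([^\]]+)\])?}).map do |number, id|
      n = Integer(number)
      if n.even?
        {step: n, kind: :element, index: n / 2 - 1, id: id}
      else
        {step: n, kind: :text, index: (n - 1) / 2, id: id}
      end
    end
  end
end

package_steps, content_steps =
  parse_cfi_path("/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]")

# Steps inside the package document: the spine element, then the itemref.
p package_steps.map { |s| s[:index] }  # [2, 1]
# Steps inside the content document, with their ID assertions.
p content_steps.map { |s| s[:id] }     # [nil, "toc", "tocList", "np-313", nil, "np-317"]
```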