Searcher
Searcher is experimental now. Note that all interfaces are not stable at all.
Example
epub = EPUB::Parser.parse('childrens-literature.epub')
search_word = 'INTRODUCTORY'
results = EPUB::Searcher.search_text(epub, search_word)
# => [#<EPUB::Searcher::Result:0x007f80ccde9528
# @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
# @parent_steps=
# [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
# @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
# #<EPUB::Searcher::Result:0x007f80ccebcb30
# @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
# @parent_steps=
# [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
# ##<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
# @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
puts results.collect(&:to_cfi).collect(&:to_fragment)
# epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
# epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
# => nil
Search result
Search result is an array of EPUB::Searcher::Result and it may be converted to an EPUBCFI string by EPUB::Searcher::Result#to_cfi_s.
Seamless XHTML Searcher
Now default searcher for XHTML is seamless searcher, which ignores tags when searching.
You can search words 'search word' from XHTML document below:
<html>
<head>
<title>Sample document</title>
</head>
<body>
<p><em>search</em> word</p>
</body>
</html>
Restricted XHTML Searcher
You can also use restricted searcher, which means that it can search from only single elements. For instance, it can find 'search word' from XHTML document below:
<html>
<head>
<title>Sample document</title>
</head>
<body>
<p>search word</p>
</body>
</html>
But cannot do so from document below:
<html>
<head>
<title>Sample document</title>
</head>
<body>
<p><em>search</em> word</p>
</body>
</html>
because the words 'search' and 'word' are not in the same element.
To use restricted searcher, specify algorithm
option for search
method:
results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)
Element Searcher
You can search XHTML elements by CSS selector or XPath.
EPUB::Searcher::Publication.search_element(@package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
# => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
# "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
# "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
# "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
# "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
# "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
# :
# :
Search by EPUB CFI
You can fetch XML node from EPUB document by EPUB CFI.
require "epub/parser"
require "epub/searcher"
epub = EPUB::Parser.parse("childrens-literature.epub")
cfi = EPUB::CFI("/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]")
itemref, node = EPUB::Searcher.search_by_cfi(epub, cfi)
puts itemref.item.full_path
puts node
# EPUB/nav.xhtml
# <li id="np-317" class="front">
# <a href="s04.xhtml#pgepubid00498">INTRODUCTORY</a>
# </li>