DjVu: Difference between revisions
imported>AnomieBOT m Dating maintenance tags: {{Cn}} |
imported>Alexcalamaro →Format software: ce |
||
| Line 29: | Line 29: | ||
== History == | == History == | ||
The DjVu technology was originally developed by [[Yann LeCun]], [[Léon Bottou]], [[Patrick Haffner]], [[Paul G. Howard]], [[Patrice Simard]], and [[Yoshua Bengio]] at [[AT&T Labs]] | The DjVu technology was originally developed, from 1996 to 2001,<ref name="djvupaper" /> by [[Yann LeCun]], [[Léon Bottou]], [[Patrick Haffner]], [[Paul G. Howard]], [[Patrice Simard]], and [[Yoshua Bengio]] at [[AT&T Labs]] in [[Red Bank, New Jersey]].<ref name="yann.lecun/djvu">{{cite web |title=Yann's DjVu Page |url=http://yann.lecun.com/ex/djvu/ |website=yann.lecun.com |access-date=8 July 2025}}</ref> | ||
Prior to the standardization of [[PDF]] in 2008,<ref>{{cite web |url=https://www.iso.org/standard/51502.html |title=ISO 32000-1:2008 – Document management – Portable document format – Part 1: PDF 1.7 |website=Iso.org |date=2008-07-01 |access-date=2010-02-21}}</ref><ref>{{cite web |last=Orion |first=Egan |title=PDF 1.7 is approved as ISO 32000 |website=[[The Inquirer]] |publisher=[[Incisive Media]] |date=2007-12-05 |url=http://www.theinquirer.net/gb/inquirer/news/2007/12/05/pdf-approved-iso-32000 |access-date=2007-12-05 |url-status=dead |archive-url=https://web.archive.org/web/20071213004627/http://www.theinquirer.net/gb/inquirer/news/2007/12/05/pdf-approved-iso-32000 |archive-date=December 13, 2007}}</ref> DjVu was considered superior because it is an [[open file format]],{{cn|date=March 2025}} in contrast to the [[proprietary file format|proprietary]] nature of PDF at the time. The declared higher compression ratio (and thus smaller file size) and the claimed ease of converting large volumes of text into DjVu format were other arguments for DjVu's superiority over PDF in 2004. 
Independent technologist [[Brewster Kahle]] in a 2004 talk on IT Conversations discussed the benefits of allowing easier access to DjVu files.<ref>{{cite web |url=http://itc.conversationsnetwork.org/shows/detail400.html |author=Brewster Kahle |title=Universal Access to All Knowledge |format=Audio; Speech at 1h:31 m:20s |publisher=Conversations Network |date=December 16, 2004}}</ref><ref>{{cite web |url=https://www.ecmconnection.com/doc/lizardtech-to-open-source-a-djvu-java-viewer-0001 |title=LizardTech To Open Source A DjVu Java Viewer |website=ECM Connection |date=7 December 2004 |access-date= 18 August 2017}}</ref> | Prior to the standardization of [[PDF]] in 2008,<ref>{{cite web |url=https://www.iso.org/standard/51502.html |title=ISO 32000-1:2008 – Document management – Portable document format – Part 1: PDF 1.7 |website=Iso.org |date=2008-07-01 |access-date=2010-02-21}}</ref><ref>{{cite web |last=Orion |first=Egan |title=PDF 1.7 is approved as ISO 32000 |website=[[The Inquirer]] |publisher=[[Incisive Media]] |date=2007-12-05 |url=http://www.theinquirer.net/gb/inquirer/news/2007/12/05/pdf-approved-iso-32000 |access-date=2007-12-05 |url-status=dead |archive-url=https://web.archive.org/web/20071213004627/http://www.theinquirer.net/gb/inquirer/news/2007/12/05/pdf-approved-iso-32000 |archive-date=December 13, 2007}}</ref> DjVu was considered superior because it is an [[open file format]],{{cn|date=March 2025}} in contrast to the [[proprietary file format|proprietary]] nature of PDF at the time. The declared higher compression ratio (and thus smaller file size) and the claimed ease of converting large volumes of text into DjVu format were other arguments for DjVu's superiority over PDF in 2004. 
Independent technologist [[Brewster Kahle]] in a 2004 talk on IT Conversations discussed the benefits of allowing easier access to DjVu files.<ref>{{cite web |url=http://itc.conversationsnetwork.org/shows/detail400.html |author=Brewster Kahle |title=Universal Access to All Knowledge |format=Audio; Speech at 1h:31 m:20s |publisher=Conversations Network |date=December 16, 2004}}</ref><ref>{{cite web |url=https://www.ecmconnection.com/doc/lizardtech-to-open-source-a-djvu-java-viewer-0001 |title=LizardTech To Open Source A DjVu Java Viewer |website=ECM Connection |date=7 December 2004 |access-date= 18 August 2017}}</ref> | ||
The DjVu library distributed as part of the open-source package ''DjVuLibre'' has become the [[reference implementation]] for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.<ref name="sourceforge.net">{{cite web |url= | The DjVu library distributed as part of the open-source package ''DjVuLibre'' has become the [[reference implementation]] for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.<ref name="sourceforge.net">{{cite web |url=https://djvu.sourceforge.net/ |title=DjVuLibre: Open Source DjVu library and viewer |website=djvu.sourceforge.net}}</ref> | ||
The DjVu file format specification has gone through a number of revisions, the most recent being from 2005. | The DjVu file format specification has gone through a number of revisions, the most recent being from 2005. | ||
| Line 83: | Line 83: | ||
During a number of years, significantly overlapping with the period when DjVu was being developed, there were no PDF viewers for free operating systems—a particular stumbling block was the rendering of vectorised fonts, which are essential for combining small file size with high resolution in PDF. Since displaying DjVu was a simpler problem for which free software was available, there were suggestions that the [[free software movement]] should employ DjVu instead of PDF for distributing documentation; rendering for creating DjVu is in principle not much different from rendering for a device-specific printer driver, and DjVu can as a last resort be generated from scans of paper media. However, when [[FreeType]] 2.0 in 2000 began to provide rendering of all major vectorised font formats, that specific advantage of DjVu began to erode. | During a number of years, significantly overlapping with the period when DjVu was being developed, there were no PDF viewers for free operating systems—a particular stumbling block was the rendering of vectorised fonts, which are essential for combining small file size with high resolution in PDF. Since displaying DjVu was a simpler problem for which free software was available, there were suggestions that the [[free software movement]] should employ DjVu instead of PDF for distributing documentation; rendering for creating DjVu is in principle not much different from rendering for a device-specific printer driver, and DjVu can as a last resort be generated from scans of paper media. However, when [[FreeType]] 2.0 in 2000 began to provide rendering of all major vectorised font formats, that specific advantage of DjVu began to erode. | ||
In the 2000s, with the growth of the [[World Wide Web]] and before widespread adoption of [[broadband]], DjVu was often adopted by [[digital library|digital libraries]] as their format of choice, thanks to its integration with software like [[Greenstone (software)|Greenstone]]<ref>{{cite web |url=http://wiki.greenstone.org/doku.php?id=nzdl:projects |title=nzdl:projects - Greenstone |website=Wiki.greenstone.org |access-date=7 December 2021}}</ref> and the [[Internet Archive]],<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2008/09/05/google-books-vs-djvu-in-internet-archive/ |title=Google Books vs DjVu in Internet Archive |website=Blog.libuiowa.edu |date=2018-09-05 |author=Eric Rumsey |access-date=2018-08-21 |archive-date=2018-08-22 |archive-url=https://web.archive.org/web/20180822014943/https://blog.lib.uiowa.edu/hardinmd/2008/09/05/google-books-vs-djvu-in-internet-archive/ |url-status=dead }}</ref> browser plugins which allowed advanced online browsing, smaller file size for comparable quality of book scans and other image-heavy documents<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2008/09/10/djvu-again/ |title=DjVu again |date=2018-09-10 |author=Eric Rumsey |website=Blog.libuiowa.edu}}</ref> and support for embedding and searching full text from [[optical character recognition|OCR]].<ref>{{cite web |url=https://blog.archive.org/2004/12/09/new-book-collection-color-scans-djvu-some-pdf/ |title=New book collection: color scans, djvu, some pdf |format=PDF |website=Blog.archive.org |date=2004-12-09 |author=Jeff Kaplan}}</ref><ref>{{cite book |chapter=Efficient search in hidden text of large DjVu documents |date=2011-09-12 |author=Janusz S. 
Bień |title=Advanced Language Technologies for Digital Libraries |series=Lecture Notes in Computer Science |volume=6699 |pages=1–14 |doi=10.1007/978-3-642-23160-5_1 |isbn=978-3-642-23159-9 |s2cid=3095526 |url=http://bc.klf.uw.edu.pl/177/3/JSB_Alt4dl-2010u.pdf}}</ref> | In the 2000s, with the growth of the [[World Wide Web]] and before widespread adoption of [[broadband]], DjVu was often adopted by [[digital library|digital libraries]] as their format of choice, thanks to its integration with software like [[Greenstone (software)|Greenstone]]<ref>{{cite web |url=http://wiki.greenstone.org/doku.php?id=nzdl:projects |title=nzdl:projects - Greenstone |website=Wiki.greenstone.org |access-date=7 December 2021}}</ref> and the [[Internet Archive]],<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2008/09/05/google-books-vs-djvu-in-internet-archive/ |title=Google Books vs DjVu in Internet Archive |website=Blog.libuiowa.edu |date=2018-09-05 |author=Eric Rumsey |access-date=2018-08-21 |archive-date=2018-08-22 |archive-url=https://web.archive.org/web/20180822014943/https://blog.lib.uiowa.edu/hardinmd/2008/09/05/google-books-vs-djvu-in-internet-archive/ |url-status=dead }}</ref> browser plugins which allowed advanced online browsing, smaller file size for comparable quality of book scans and other image-heavy documents<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2008/09/10/djvu-again/ |title=DjVu again |date=2018-09-10 |author=Eric Rumsey |website=Blog.libuiowa.edu |access-date=2018-08-21 |archive-date=2018-08-22 |archive-url=https://web.archive.org/web/20180822014921/https://blog.lib.uiowa.edu/hardinmd/2008/09/10/djvu-again/ |url-status=dead }}</ref> and support for embedding<ref name="filosofie/solcan/odjv">{{cite web |last1=Solcan |first1=Mihail Radu |title=Insert OCRed text in DjVu (automatic method) |url=http://www.ub-filosofie.ro/~solcan/wt/gnu/d/odjv.html |website=www.ub-filosofie.ro |publisher=Faculty of Philosophy at the [[University of Bucharest]] 
|date=2009-02-03}}</ref> and searching full text from [[optical character recognition|OCR]].<ref>{{cite web |url=https://blog.archive.org/2004/12/09/new-book-collection-color-scans-djvu-some-pdf/ |title=New book collection: color scans, djvu, some pdf |format=PDF |website=Blog.archive.org |date=2004-12-09 |author=Jeff Kaplan}}</ref><ref>{{cite book |chapter=Efficient search in hidden text of large DjVu documents |date=2011-09-12 |author=Janusz S. Bień |title=Advanced Language Technologies for Digital Libraries |series=Lecture Notes in Computer Science |volume=6699 |pages=1–14 |doi=10.1007/978-3-642-23160-5_1 |isbn=978-3-642-23159-9 |s2cid=3095526 |url=http://bc.klf.uw.edu.pl/177/3/JSB_Alt4dl-2010u.pdf |archive-date=2021-11-03 |access-date=2021-10-16 |archive-url=https://web.archive.org/web/20211103125051/http://bc.klf.uw.edu.pl/177/3/JSB_Alt4dl-2010u.pdf |url-status=dead }}</ref> | ||
Some features such as the thumbnail previews were later integrated in the Internet Archive's BookReader<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2010/10/19/internet-archives-bookreader-thumbnail-view/ |title=Internet Archive's BookReader Thumbnail View |date=2010-09-10 |author=Eric Rumsey |website=Blog.libuiowa.edu}}</ref> and DjVu browsing was deprecated in its favour as around 2015 some major browsers stopped supporting [[NPAPI]] and DjVu plugins with them.<ref name=ia2016>{{cite web |url=https://archive.org/post/1053214/djvu-files-for-new-uploads |date=2016-02-26 |title=DjVu files for new uploads |author1=[[Brewster Kahle]] |author2=Jeff Kaplan |website=Archive.org}}</ref> | Some features such as the thumbnail previews were later integrated in the Internet Archive's BookReader<ref>{{cite web |url=https://blog.lib.uiowa.edu/hardinmd/2010/10/19/internet-archives-bookreader-thumbnail-view/ |title=Internet Archive's BookReader Thumbnail View |date=2010-09-10 |author=Eric Rumsey |website=Blog.libuiowa.edu |access-date=2018-08-21 |archive-date=2018-08-22 |archive-url=https://web.archive.org/web/20180822014854/https://blog.lib.uiowa.edu/hardinmd/2010/10/19/internet-archives-bookreader-thumbnail-view/ |url-status=dead }}</ref> and DjVu browsing was deprecated in its favour as around 2015 some major browsers stopped supporting [[NPAPI]] and DjVu plugins with them.<ref name=ia2016>{{cite web |url=https://archive.org/post/1053214/djvu-files-for-new-uploads |date=2016-02-26 |title=DjVu files for new uploads |author1=[[Brewster Kahle]] |author2=Jeff Kaplan |website=Archive.org}}</ref> | ||
== Design == | == Design == | ||
| Line 152: | Line 150: | ||
== Licensing == | == Licensing == | ||
DjVu is an [[open file format]] with patents.<ref name="DjVu" /> The file format specification is published, as well as source code for the reference library.<ref name="DjVu" /> The original authors distribute an [[open-source software|open-source]] implementation named "''DjVuLibre''" under the [[GNU General Public License]] and a patent grant.<ref>{{cite web | url=https://djvu.sourceforge.net/licensing.html | title=DjVuLibre: Open Source DjVu library and viewer }}</ref> The rights to the commercial development of the encoding software have been transferred to different companies over the years, including [[AT&T Corporation]], [[LizardTech]],<ref>{{cite web |url=https://www.lizardtech.com/company/about |title=Company – About – LizardTech |last=Extensis |website=Lizardtech.com}}</ref> ''Celartem''<ref name="bloomberg.com">{{cite web |url=https://www.bloomberg.com/profile/company/2590570Z:US |title=Celartem, Inc.: Private Company Information – Bloomberg |website=Bloomberg.com}}</ref> and ''ePapyrus Solutions K.K.'' (formerly ''Cuminas''<ref>{{cite web |url=https://www.cuminas.jp/company |title=会社情報 - Cuminas Corporation |website=Cuminas.jp |access-date=2018-01-14 |archive-url=https://web.archive.org/web/20180115001836/https://www.cuminas.jp/company |archive-date=2018-01-15 |url-status=dead}}</ref> before joining ePapyrus Solutions, Inc.<ref name="epapyrus">{{cite web |date=2022-06-03 |script-title=ja:株式譲渡および完全子会社化のお知らせ |trans-title=Notice regarding share transfer and becoming a wholly owned subsidiary |work=epapyrus.jp |language=ja |url=https://www.epapyrus.jp/cat_news/2366 |access-date=2024-12-08}}</ref>).<ref name="epapyrus.jp">{{cite web |date=2023-11-06 |script-title=ja:会社名変更のお知らせ |trans-title=Notice of company name change |work=epapyrus.jp |language=ja |url=https://www.epapyrus.jp/cat_news/2548 |access-date=2024-12-08}}</ref> Patents typically have an expiry term of about 20 years. 
| DjVu is an [[open file format]] with patents.<ref name="DjVu" /> The file format specification is published, as well as source code for the reference library.<ref name="DjVu" /> The original authors distribute an [[open-source software|open-source]] implementation named "''DjVuLibre''" under the [[GNU General Public License]] and a patent grant.<ref>{{cite web | url=https://djvu.sourceforge.net/licensing.html | title=DjVuLibre: Open Source DjVu library and viewer }}</ref> The rights to the commercial development of the encoding software have been transferred to different companies over the years, including [[AT&T Corporation]], [[LizardTech]],<ref>{{cite web |url=https://www.lizardtech.com/company/about |title=Company – About – LizardTech |last=Extensis |website=Lizardtech.com |access-date=2018-01-14 |archive-date=2018-01-15 |archive-url=https://web.archive.org/web/20180115001539/https://www.lizardtech.com/company/about |url-status=dead }}</ref> ''Celartem''<ref name="bloomberg.com">{{cite web |url=https://www.bloomberg.com/profile/company/2590570Z:US |title=Celartem, Inc.: Private Company Information – Bloomberg |website=Bloomberg.com}}</ref> and ''ePapyrus Solutions K.K.'' (formerly ''Cuminas''<ref>{{cite web |url=https://www.cuminas.jp/company |title=会社情報 - Cuminas Corporation |website=Cuminas.jp |access-date=2018-01-14 |archive-url=https://web.archive.org/web/20180115001836/https://www.cuminas.jp/company |archive-date=2018-01-15 |url-status=dead}}</ref> before joining ePapyrus Solutions, Inc.<ref name="epapyrus">{{cite web |date=2022-06-03 |script-title=ja:株式譲渡および完全子会社化のお知らせ |trans-title=Notice regarding share transfer and becoming a wholly owned subsidiary |work=epapyrus.jp |language=ja |url=https://www.epapyrus.jp/cat_news/2366 |access-date=2024-12-08}}</ref>).<ref name="epapyrus.jp">{{cite web |date=2023-11-06 |script-title=ja:会社名変更のお知らせ |trans-title=Notice of company name change |work=epapyrus.jp |language=ja |url=https://www.epapyrus.jp/cat_news/2548 
|access-date=2024-12-08}}</ref> Patents typically have an expiry term of about 20 years. | ||
Celartem acquired LizardTech and [[Extensis]].<ref>{{cite web |url=http://www.celartem.com/en/company-overview/ |title=Company Overview – Celartem Technology, Inc. |website=Celartem.com |access-date=7 December 2021 |archive-date=27 May 2019 |archive-url=https://web.archive.org/web/20190527231303/http://www.celartem.com/en/company-overview/ |url-status=dead }}</ref><ref>{{cite web |url=https://www.extensis.com/newsroom/celartem-technology-announces-merger-of-us-holdings/ |title=Celartem Technology Announces Merger of US Holdings – Extensis.com |access-date=2018-01-14 |archive-url=https://web.archive.org/web/20180115003244/https://www.extensis.com/newsroom/celartem-technology-announces-merger-of-us-holdings/ |archive-date=2018-01-15 |url-status=dead}}</ref><ref name="bloomberg.com" /><ref>{{cite web |url=https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=3046493 |title=Celartem Technology Inc.: Private Company Information – Bloomberg |website=Bloomberg.com}}</ref><ref>{{cite web |url=https://bigpicturemag.com/celartem-sells-extensis-and-lizardtech-plugins-and-xtensions-onone-software |title=Celartem Sells Extensis and LizardTech Plugins and XTensions to onOne Software – Big Picture – Wide Format Printing |website=bigpicture.net|date=28 July 2005 }}</ref> | Celartem acquired LizardTech and [[Extensis]].<ref>{{cite web |url=http://www.celartem.com/en/company-overview/ |title=Company Overview – Celartem Technology, Inc. 
|website=Celartem.com |access-date=7 December 2021 |archive-date=27 May 2019 |archive-url=https://web.archive.org/web/20190527231303/http://www.celartem.com/en/company-overview/ |url-status=dead }}</ref><ref>{{cite web |url=https://www.extensis.com/newsroom/celartem-technology-announces-merger-of-us-holdings/ |title=Celartem Technology Announces Merger of US Holdings – Extensis.com |access-date=2018-01-14 |archive-url=https://web.archive.org/web/20180115003244/https://www.extensis.com/newsroom/celartem-technology-announces-merger-of-us-holdings/ |archive-date=2018-01-15 |url-status=dead}}</ref><ref name="bloomberg.com" /><ref>{{cite web |url=https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=3046493 |title=Celartem Technology Inc.: Private Company Information – Bloomberg |website=Bloomberg.com}}</ref><ref>{{cite web |url=https://bigpicturemag.com/celartem-sells-extensis-and-lizardtech-plugins-and-xtensions-onone-software |title=Celartem Sells Extensis and LizardTech Plugins and XTensions to onOne Software – Big Picture – Wide Format Printing |website=bigpicture.net|date=28 July 2005 }}</ref> | ||
== | == Format adoption == | ||
Free creators, manipulators, converters, web browser plug-ins, and desktop viewers are available.<ref name=":0" /> | |||
Free creators, manipulators, converters, web browser plug-ins, and desktop viewers are available.<ref name=":0" /> | |||
In 2002, the DjVu file format was chosen by the [[Internet Archive]] as a format in which its ''[[Million Book Project]]'' provides scanned [[public-domain]] books online (along with [[TIFF]] and PDF).<ref>{{cite web |url=http://wiki.laptop.org/go/DJVU |title=Image file formats – OLPC |publisher=Wiki.laptop.org |access-date=2008-09-09}}</ref> In February 2016, the Internet Archive announced that DjVu would no longer be used for new uploads, among other reasons citing the format's declining use and the difficulty of maintaining their [[Java applet]] based viewer for the format.<ref name=ia2016/> | In 2002, the DjVu file format was chosen by the [[Internet Archive]] as a format in which its ''[[Million Book Project]]'' provides scanned [[public-domain]] books online (along with [[TIFF]] and PDF).<ref>{{cite web |url=http://wiki.laptop.org/go/DJVU |title=Image file formats – OLPC |publisher=Wiki.laptop.org |access-date=2008-09-09}}</ref> In February 2016, the Internet Archive announced that DjVu would no longer be used for new uploads, among other reasons citing the format's declining use and the difficulty of maintaining their [[Java applet]] based viewer for the format.<ref name=ia2016/> | ||
[[Wikimedia Commons]], a media repository used by [[Wikipedia]] among others, conditionally permits PDF and DjVu media files.<ref>[[c:Commons:Project scope#PDF and DjVu formats|Wikimedia Commons. Project scope: PDF and DjVu]].</ref> | [[Wikimedia Commons]], a media repository used by [[Wikipedia]] among others, conditionally permits PDF and DjVu media files.<ref>[[c:Commons:Project scope#PDF and DjVu formats|Wikimedia Commons. Project scope: PDF and DjVu]].</ref> | ||
==Format software== | |||
''any2djvu'' converts [[PostScript|.ps]] ''.ps[[.gz]]'' [[.pdf]] to .djvu (a DjVu file) via the Any2DjVu server, maintained by [[Léon Bottou]] and [[Yann LeCun]], hosted by the [[Courant Institute of Mathematical Sciences]] at [[New York University]], with hardware donated by Caminova, Inc.<ref>{{cite web |title=Welcome to the Any2DjVu Server |url=http://www.djvu.org/any2djvu/ |website=DjVu.org |access-date=8 July 2025 |language=en}}</ref><ref>{{cite web |title=help: What the Any2Djvu Server Does |url=http://any2djvu.djvu.org/help.php |website=any2djvu.djvu.org |access-date=8 July 2025}}</ref> | |||
Jakub Wilk's ''pdf2djvu'' creates DjVu files from PDF files for GNU/Linux OS<ref name="cnF/pdf2djvu">{{cite web |title=pdf2djvu |url=https://command-not-found.com/pdf2djvu |website=command-not-found.com |access-date=8 July 2025}}</ref> (archived),<ref>{{cite web |last1=Wilk |first1=Jakub |title=pdf2djvu |url=https://github.com/jwilk-archive/pdf2djvu |website=jwilk's archive |publisher=|via=[[github.com]] |access-date=8 July 2025 |date=15 April 2025}}</ref> including [[Ubuntu]], and [[Cygwin]] (''orphaned'').<ref name="cygwin/pdf2djvu">{{cite web |title=pdf2djvu |url=https://cygwin.com/packages/summary/pdf2djvu.html |website=Package Summary |publisher=[[Cygwin]] .com |access-date=8 July 2025}}</ref><ref name="oss4u/pdf-tricks">{{cite web |title=PDF Tricks for the Linux Command-Line |url=https://www.opensourceforu.com/2023/12/pdf-tricks-for-the-linux-command-line/ |website=Open Source For You |access-date=8 July 2025 |date=4 December 2023 |quote=Converting PDF to DjVu}}</ref> | |||
The selection of downloadable DjVu viewers is wider on [[Linux distributions]] than it is on Windows or macOS. Additionally, the format is rarely supported by proprietary scanning software. | |||
DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux ([[Okular]], [[Evince]], Zathura), Windows ([[Okular]] and [[SumatraPDF]]) and Android (''Document Viewer'',<ref>{{cite web |title=Document Viewer |date=2022-04-04 |url=https://github.com/SufficientlySecure/document-viewer |publisher=Sufficiently Secure |access-date=2022-04-09}}</ref> [[FBReader]], EBookDroid,<ref name="code/ebookdroid">{{cite web |title=EBookDroid |url=https://code.google.com/archive/p/ebookdroid/ |website=[[Google Code]] Archive - code.google.com |access-date=8 July 2025 |archive-url=https://web.archive.org/web/20210830070908/https://code.google.com/archive/p/ebookdroid/ |archive-date=30 August 2021 |language=en |quote=a document viewer for Android.}}</ref> PocketBook). | |||
''DjVu.js Viewer'' is a project that develops a [[program library]], a [[web application]], and [[browser extension]]s for ''[[Firefox]]''<ref>[https://addons.mozilla.org/en-US/firefox/addon/djvu-js-viewer/ ''DjVu.js Viewer'' Firefox]</ref> and ''[[Google Chrome]]'',<ref>[https://chromewebstore.google.com/detail/djvujs-viewer/bpnedgjmphmmdgecmklcopblfcbhpefm ''DjVu.js Viewer'' Google Chrome]</ref> to view DjVu files.<ref>[https://djvu.js.org/ DjVu.js Viewer] ([https://github.com/RussCoder/djvujs github]): "It requires access to third-party websites only to render embedded documents (<embed> tag) and open links to .djvu files (on any website). The extensions, by and large, are a local copy of the ''DjVu.js Viewer'' which is available on ''djvu.js.org''".</ref> | |||
== See also == | == See also == | ||
| Line 184: | Line 191: | ||
{{commonscat|DjVu file format}} | {{commonscat|DjVu file format}} | ||
* [https://djvu.js.org/ DjVu.js Viewer] | * [https://www2.cuminas.jp/en/downloads DjVu software downloads] – Cuminas Corporation | ||
* | * [https://djvu.js.org/ DjVu.js Viewer] used in: [[Firefox]] and [[Google Chrome]] | ||
* | * {{cite web |last1=Wilk |first1=Jakub |title=pdf2djvu |url=http://jwilk.net/software/pdf2djv |website= jwilk.net <!-- |access-date=8 July 2025 --> |date=2010-04-01 |archive-url=https://web.archive.org/web/20100401114447/http://jwilk.net/software/pdf2djv |archive-date=1 April 2010 }} | ||
* {{cite web |last1=Wilk |first1=Jakub |title=pdf2djvu |url=https://github.com/jwilk-archive/pdf2djvu |website=jwilk-archive |via=[[github.com]] <!-- |access-date=8 July 2025 --> |date=15 April 2025}} | |||
{{Compression formats}} | {{Compression formats}} | ||
Latest revision as of 10:17, 14 December 2025
DjVu is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.
DjVu has been promoted as providing smaller files than PDF for most scanned documents.[1] The DjVu developers report that color magazine pages compress to 40–70 kB, black-and-white technical papers compress to 15–40 kB, and ancient manuscripts compress to around 100 kB; a satisfactory JPEG image typically requires 500 kB.[2] Like PDF, DjVu can contain an OCR text layer, making it easy to perform copy and paste and text search operations.
History
The DjVu technology was originally developed, from 1996 to 2001,[2] by Yann LeCun, Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, and Yoshua Bengio at AT&T Labs in Red Bank, New Jersey.[3]
Prior to the standardization of PDF in 2008,[4][5] DjVu was considered superior because it is an open file format,[citation needed] in contrast to the proprietary nature of PDF at the time. The declared higher compression ratio (and thus smaller file size) and the claimed ease of converting large volumes of text into DjVu format were other arguments for DjVu's superiority over PDF in 2004. Independent technologist Brewster Kahle in a 2004 talk on IT Conversations discussed the benefits of allowing easier access to DjVu files.[6][7]
The DjVu library distributed as part of the open-source package DjVuLibre has become the reference implementation for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.[8]
The DjVu file format specification has gone through a number of revisions, the most recent being from 2005.
| Version | Release date | Notes |
|---|---|---|
| | 1996–1999 | Developmental versions by AT&T Labs preceding the sale of the format to LizardTech. |
| [9] | April 1999 | DjVu version 3. DjVu changed from a single-page format to a multipage format. |
| [9] | September 1999 | Indirect storage format replaced. The searchable text layer was added. |
| [9] | April 2001 | Page orientation, color JB2 |
| [9] | July 2002 | CID chunk |
| [9] | February 2003 | LTAnno chunk |
| [9] | May 2003 | NAVM chunk. Support for DjVu bookmarks (outlines) was added. Changes made by Versions 23 and 24 were made obsolete. |
| [9] | April 2005 | Text/line annotations |
The primary use of the DjVu format has been the electronic distribution of documents with a quality comparable to that of printed documents. Because PDF targets the same niche, the two formats became competitors. They approach the problem of delivering high-resolution documents in very different ways, however: PDF primarily encodes graphics and text as vectorised data, whereas DjVu primarily encodes them as pixmap images. PDF therefore places the burden of rendering the document on the reader, whereas DjVu places that burden on the creator.
For a number of years, significantly overlapping with the period when DjVu was being developed, there were no PDF viewers for free operating systems; a particular stumbling block was the rendering of vectorised fonts, which are essential for combining small file size with high resolution in PDF. Since displaying DjVu was a simpler problem for which free software was available, there were suggestions that the free software movement should employ DjVu instead of PDF for distributing documentation; rendering for creating DjVu is in principle not much different from rendering for a device-specific printer driver, and DjVu can as a last resort be generated from scans of paper media. However, when FreeType 2.0 began in 2000 to provide rendering of all major vectorised font formats, that specific advantage of DjVu began to erode.
In the 2000s, with the growth of the World Wide Web and before widespread adoption of broadband, DjVu was often adopted by digital libraries as their format of choice, thanks to its integration with software like Greenstone[10] and the Internet Archive,[11] browser plugins which allowed advanced online browsing, smaller file size for comparable quality of book scans and other image-heavy documents[12] and support for embedding[13] and searching full text from OCR.[14][15] Some features such as the thumbnail previews were later integrated in the Internet Archive's BookReader[16] and DjVu browsing was deprecated in its favour as around 2015 some major browsers stopped supporting NPAPI and DjVu plugins with them.[17]
Design
The DjVu file format is based on the Interchange File Format (IFF) and is composed of hierarchically organized chunks. The IFF structure is preceded by a 4-byte AT&T magic number. It is followed by a single FORM chunk with a secondary identifier of either DJVU or DJVM, for a single-page or a multi-page document, respectively.
All the chunks can be contained in a single file, in the case of so-called bundled documents, or can be spread across several files: one file for every page plus some files with shared chunks.
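The header layout described above can be checked with a few lines of Python. This is a minimal sketch, not DjVuLibre's parser: the function name `read_djvu_header` is hypothetical, the magic number is assumed to be the four ASCII bytes `AT&T`, and the FORM chunk length is assumed to be a 4-byte big-endian integer, as is conventional for IFF.

```python
import struct


def read_djvu_header(data: bytes) -> dict:
    """Parse the first 16 bytes of a DjVu file (illustrative sketch only).

    Assumed layout: 4-byte magic, 4-byte "FORM" chunk ID,
    4-byte big-endian chunk length, 4-byte secondary identifier.
    """
    if len(data) < 16:
        raise ValueError("need at least 16 bytes for the header")

    magic, chunk_id, length, secondary = struct.unpack(">4s4sI4s", data[:16])

    if magic != b"AT&T":
        raise ValueError("missing AT&T magic number")
    if chunk_id != b"FORM":
        raise ValueError("expected a FORM chunk after the magic number")

    # The secondary identifier distinguishes single-page from multi-page files.
    kinds = {b"DJVU": "single-page", b"DJVM": "multi-page"}
    if secondary not in kinds:
        raise ValueError(f"unexpected secondary identifier: {secondary!r}")

    return {
        "kind": kinds[secondary],
        "form_length": length,
        "secondary_id": secondary.decode("ascii"),
    }


# Synthetic 16-byte header for a multi-page document (not a real file).
sample = b"AT&T" + b"FORM" + struct.pack(">I", 4) + b"DJVM"
header = read_djvu_header(sample)
```

The sketch stops at the root chunk; walking the nested chunks listed in the table would repeat the same ID-plus-length read inside the FORM payload.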
| Chunk identifier | Contained by | Description |
|---|---|---|
| FORM:DJVU | FORM:DJVM | Describes a single page. Can either be at the root of a document and be a single-page document or referred to from a DIRM chunk. |
| FORM:DJVM | — | Describes a multi-page document. Is the document's root chunk. |
| FORM:DJVI | FORM:DJVM | Contains data shared by multiple pages. |
| FORM:THUM | FORM:DJVM | Contains thumbnails. |
| INFO | FORM:DJVU | Must be the first chunk. Describes the page width, height, format version, resolution, gamma, and rotation. |
| DIRM | FORM:DJVM | Must be the first chunk. References other FORM chunks. These chunks can either follow this chunk inside the FORM:DJVM chunk or be contained in external files. These types of documents are referred to as bundled or indirect, respectively. |
| NAVM | FORM:DJVM | If present, must immediately follow the DIRM chunk. Contains a BZZ-compressed outline of the document. |
| ANTa, ANTz | FORM:DJVI or FORM:DJVU | Annotations. |
| TXTa, TXTz | FORM:DJVU | Unicode text and layout information. |
| INCL | FORM:DJVU | The ID of an included FORM:DJVI chunk. |
| Sjbz | FORM:DJVU | BZZ-compressed JB2 bitonal data used to store the mask. |
| Djbz | FORM:DJVI or FORM:DJVU | Shared shape table. |
| WMRM | ? | JB2 data required to remove a watermark. |
|  | FORM:DJVU | Obsolete chunk with unknown content. |
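The chunk hierarchy in the table can be explored by walking the body of a FORM chunk. The following is a rough sketch rather than a reference parser: `iter_chunks`, `first_chunk_ok`, and `make_chunk` are hypothetical names, the demo payloads are placeholders, and the sketch assumes the general IFF convention that an odd-sized payload is followed by one pad byte.

```python
import struct
from typing import Iterator, Tuple

# Chunk that must come first inside each kind of FORM, per the table above.
REQUIRED_FIRST = {"DJVU": "INFO", "DJVM": "DIRM"}


def iter_chunks(body: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, payload) for each chunk in a FORM body."""
    offset = 0
    while offset + 8 <= len(body):
        chunk_id, size = struct.unpack_from(">4sI", body, offset)
        start = offset + 8
        payload = body[start:start + size]
        if len(payload) != size:
            raise ValueError("truncated chunk")
        yield chunk_id.decode("ascii"), payload
        # Assumed IFF convention: odd-sized payloads get one pad byte.
        offset = start + size + (size & 1)


def first_chunk_ok(form_kind: str, body: bytes) -> bool:
    """Check the 'must be the first chunk' rule for DJVU and DJVM forms."""
    first = next(iter_chunks(body), None)
    return first is not None and first[0] == REQUIRED_FIRST[form_kind]


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Hypothetical helper that builds one chunk for the demo below."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic page body; the payload bytes are placeholders, not real data.
page_body = (make_chunk(b"INFO", bytes(10))
             + make_chunk(b"Sjbz", b"abc")
             + make_chunk(b"TXTz", b"xy"))
print([cid for cid, _ in iter_chunks(page_body)])  # ['INFO', 'Sjbz', 'TXTz']
print(first_chunk_ok("DJVU", page_body))           # True
```

A nested FORM chunk (for example a FORM:DJVU inside a FORM:DJVM) would show up here with the identifier `FORM`, and its payload would begin with the 4-byte secondary identifier.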
DjVu divides a single image into many different images, then compresses them separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44.[2]
The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.
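The idea of storing each shape once and then recording where it occurs can be illustrated with a toy shape dictionary. This is only a sketch of the principle, assuming exact bitmap matches and tiny hand-made bitmaps; it is not the JB2 bitstream format, and the names `encode_marks` and `decode_marks` are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]          # rows of '#' (ink) and '.' (paper)
Mark = Tuple[Bitmap, int, int]    # a bitmap plus its x, y position on the page
Placement = Tuple[int, int, int]  # a shape index plus x, y


def encode_marks(marks: List[Mark]) -> Tuple[List[Bitmap], List[Placement]]:
    """Store every distinct bitmap once and refer to it by index."""
    shapes: List[Bitmap] = []
    index: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in marks:
        if bitmap not in index:          # first occurrence of this shape
            index[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index[bitmap], x, y))
    return shapes, placements


def decode_marks(shapes: List[Bitmap],
                 placements: List[Placement]) -> List[Mark]:
    """Rebuild the list of marks from the shape table and the placements."""
    return [(shapes[i], x, y) for i, x, y in placements]


# Toy 3x5 bitmaps standing in for scanned glyphs.
LETTER_E = ("###", "#..", "###", "#..", "###")
LETTER_L = ("#..", "#..", "#..", "#..", "###")

page = [(LETTER_E, 0, 0), (LETTER_L, 4, 0), (LETTER_E, 8, 0)]
shapes, placements = encode_marks(page)
print(len(shapes))   # 2
print(placements)    # [(0, 0, 0), (1, 4, 0), (0, 8, 0)]
assert decode_marks(shapes, placements) == page
```

Because only exact matches share a dictionary entry, this toy version is lossless; the saving comes from the repeated shape being stored a single time.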
Optionally, these shapes may be mapped to UTF-8 codes (either by hand or potentially by a text recognition system) and stored in the DjVu file. If this mapping exists, it is possible to select and copy text.
Since JB2 (also called DjVuBitonal) is a variation on JBIG2, working on the same principles,[18] both compression methods have the same problems when performing lossy compression. In 2013 it emerged that Xerox photocopiers and scanners had been replacing digits with similar-looking ones, for example replacing a 6 with an 8.[19] A DjVu document has been found in the wild with character substitutions, such as an n with bleeding serifs turning into a u and an o with a spot inside turning into an e.[20] Whether lossy compression has occurred is not recorded in the file.[9] Thus the DjView viewing application cannot warn the user that glyph substitutions might have occurred, either when opening a lossily compressed file or in the Information or Metadata dialogue boxes.[21]
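How approximate shape matching can lead to a substitution such as 6 becoming 8 can be shown with a toy encoder. The sketch below assumes a simple pixel-difference threshold (`max_diff`) as the matching rule; that rule, the function names, and the 3x5 bitmaps are illustrative assumptions and do not reproduce the criterion used by any actual JB2 or JBIG2 encoder.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]          # rows of '#' (ink) and '.' (paper)
Mark = Tuple[Bitmap, int, int]    # a bitmap plus its x, y position
Placement = Tuple[int, int, int]  # a shape index plus x, y


def same_size(a: Bitmap, b: Bitmap) -> bool:
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def pixel_diff(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode_lossy(marks: List[Mark],
                 max_diff: int) -> Tuple[List[Bitmap], List[Placement]]:
    """Reuse an already stored shape if it differs by at most max_diff pixels."""
    shapes: List[Bitmap] = []
    placements: List[Placement] = []
    for bitmap, x, y in marks:
        for i, known in enumerate(shapes):
            if same_size(known, bitmap) and pixel_diff(known, bitmap) <= max_diff:
                placements.append((i, x, y))  # the stored shape replaces this one
                break
        else:
            shapes.append(bitmap)
            placements.append((len(shapes) - 1, x, y))
    return shapes, placements


# Toy digits that differ in a single pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")

page = [(EIGHT, 0, 0), (SIX, 4, 0)]
exact_shapes, _ = encode_lossy(page, max_diff=0)
lossy_shapes, lossy_placements = encode_lossy(page, max_diff=1)
print(len(exact_shapes))   # 2
print(len(lossy_shapes))   # 1
print(lossy_placements)    # [(0, 0, 0), (0, 4, 0)]
```

With a threshold of one pixel the 6 is encoded as a second occurrence of the 8, so a decoder would draw an 8 in both positions, and nothing in the output records that a substitution took place.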
Licensing
DjVu is an open file format with patents.[1] The file format specification is published, as well as source code for the reference library.[1] The original authors distribute an open-source implementation named "DjVuLibre" under the GNU General Public License and a patent grant.[22] The rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T Corporation, LizardTech,[23] Celartem[24] and ePapyrus Solutions K.K. (formerly Cuminas[25] before joining ePapyrus Solutions, Inc.[26]).[27] Patents typically have an expiry term of about 20 years.
Celartem acquired LizardTech and Extensis.[28][29][24][30][31]
Format adoption
Free creators, manipulators, converters, web browser plug-ins, and desktop viewers are available.[32]
In 2002, the DjVu file format was chosen by the Internet Archive as a format in which its Million Book Project provides scanned public-domain books online (along with TIFF and PDF).[33] In February 2016, the Internet Archive announced that DjVu would no longer be used for new uploads, among other reasons citing the format's declining use and the difficulty of maintaining their Java applet based viewer for the format.[17]
Wikimedia Commons, a media repository used by Wikipedia among others, conditionally permits PDF and DjVu media files.[34]
Format software
any2djvu converts .ps, .ps.gz, and .pdf files to .djvu (a DjVu file) via the Any2DjVu server, maintained by Léon Bottou and Yann LeCun, hosted by the Courant Institute of Mathematical Sciences at New York University, with hardware donated by Caminova, Inc.[35][36]
Jakub Wilk's pdf2djvu creates DjVu files from PDF files. It is available for GNU/Linux[37] (archived),[38] including Ubuntu, and for Cygwin (orphaned).[39][40]
The selection of downloadable DjVu viewers is wider on Linux distributions than it is on Windows or macOS. Additionally, the format is rarely supported by proprietary scanning software.
DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux (Okular, Evince, Zathura), Windows (Okular and SumatraPDF) and Android (Document Viewer,[41] FBReader, EBookDroid,[42] PocketBook).
DjVu.js Viewer is a project that develops a program library, a web application, and browser extensions for Firefox[43] and Google Chrome,[44] to view DjVu files.[45]
See also
Notes
References
- [18] Artem Mikheev, Luc Vincent, Mike Hawrylycz & Léon Bottou: Electronic Document Publishing Using DjVu
- [19] See the JBIG2 article for more details and references.
- [34] Wikimedia Commons. Project scope: PDF and DjVu.
- [43] DjVu.js Viewer Firefox
- [44] DjVu.js Viewer Google Chrome
- [45] DjVu.js Viewer (github): "It requires access to third-party websites only to render embedded documents (<embed> tag) and open links to .djvu files (on any website). The extensions, by and large, are a local copy of the DjVu.js Viewer which is available on djvu.js.org".
External links
- DjVu software downloads – Cuminas Corporation
- DjVu.js Viewer used in: Firefox and Google Chrome