Nepomuk/zh-cn: Difference between revisions

From KDE Wiki Sandbox
mNo edit summary
(Updating to match new version of source page)
 
(30 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<languages />  


=Nepomuk=
=Semantic Search=


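The aim of this page is not to explain the Nepomuk technology and its every detail, but to give a short overview, show some examples, share the vision behind it and link to relevant information on the web.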
The aim of this page is not to fully explain KDE's Semantic Search technology and every detail, but to give a short overview, some examples, share the vision behind it and link to relevant information on the web.
 
== Baloo is the next generation of semantic search ==
 
From KDE Applications 4.13 onwards, the '[https://community.kde.org/Baloo Baloo]' file indexing and file search framework replaces Nepomuk. Read [http://dot.kde.org/2014/02/24/kdes-next-generation-semantic-search details on the changes for Applications 4.13 here]. Semantic Search no longer uses a single, big database, but separate, specialized databases for each type of data. The new search databases are in <tt>$HOME/.local/share/baloo</tt>. If you upgraded to KDE Applications 4.13 from an earlier KDE release, you can delete <tt>$KDEHOME/share/apps/nepomuk</tt>.


==Brief description==


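As the [[Special:myLanguage/Glossary#Nepomuk|Glossary]] mentions, Nepomuk is about the classification, organisation and presentation of data. It is not an application, but a component that developers can use within their applications.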
As the [[Special:myLanguage/Glossary#Nepomuk|Glossary]] mentions, Semantic Search is about classification, organisation and presentation of data. It is not an application, but a component which can be used by developers within applications.
 
=== Try out in Dolphin ===
For example, the [[Special:myLanguage/Dolphin|Dolphin]] file manager makes use of Search. In KDE Applications versions prior to 4.13, Semantic Search must be enabled from <menuchoice>System Settings -> Desktop Search</menuchoice>. The information sidebar of Dolphin (<menuchoice>Control -> Panels -> Information</menuchoice>, or press <keycap>F11</keycap>) presents information extracted by Search about the selected file, and also allows you to assign tags, ratings and comments to files. This information is then stored and indexed by Search. You can then search for metadata using the navigation bar in Dolphin. Click <menuchoice>Find</menuchoice>, or press <keycap>Ctrl+F</keycap>, and search for file names or file contents.
 
<!-- info about nepomuksearch:/ deleted per its developer: "Actually nepomuksearch:/ is an internal thing and should not be entered by the user." its developer -->


<span class="mw-translate-fuzzy">
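=== Try it out ===
For example, [[Dolphin_(zh_CN)|Dolphin]] makes use of Nepomuk. For the examples below, make sure that both Nepomuk and Strigi are enabled in System Settings -> Desktop Search. The information sidebar of Dolphin lets you assign tags, ratings and comments to files. This information is then stored in Nepomuk and indexed by Strigi. You can then search the metadata using the navigation bar in Dolphin: enter "nepomuksearch:/" followed by your search terms.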
</span>


==Features==


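Nepomuk offers several 'layers' of functionality to applications. The first and simplest of those is manual tagging, rating and commenting, as used in Dolphin. This helps you find your files faster, but it is also a lot of work.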
KDE's Semantic Search offers several 'layers' of functionality to applications. The first and most simple of those is manual tagging, rating and commenting, as used in Dolphin. This helps you to find your files faster, but is also a lot of work.
 
To make finding files containing text easier, Search offers a second functionality: indexing the text of files. You can find files by entering some words which you know are in there, or just (part of) their title.
 
The third layer is a very complex one, and the reason why the underlying technology, Nepomuk, was conceived as a research project of several companies and universities in the European Union. This is where you will find difficult words like 'semantic desktop' and 'ontologies'. Basically, it is about context and relationships.


<span class="mw-translate-fuzzy">
=== Indexing files ===
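To make it easier to find files that contain text, Nepomuk offers a second function: indexing the text in files. It does this using a technology called [[Glossary (zh CN) #Strigi|Strigi]]. You can now find files by entering some words you know they contain, or just (part of) their title.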
</span>


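The third layer is very complex, and it is the reason Nepomuk was conceived as a research project of several companies and universities in the European Union. Here you will meet difficult terms like 'semantic desktop' and 'ontologies'. Basically, it is about context and relationships.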
Search does not index every file on the hard drive. Its default configuration in most Linux distributions excludes some common patterns for backup files and configuration directories. You can change this in <menuchoice>System Settings -> Desktop Search</menuchoice>. Add folders to be excluded. If you want to turn off indexing of files entirely, just add your home folder there.
 
In '''System Settings''' you can also control whether Search indexes files on removable media such as USB drives and CD-ROMs. This option is not available in KDE Applications 4.13, where removable media are not indexed; future versions are planned to reintroduce it.


==Examples==


Let me try to explain what Nepomuk offers using two examples.
Let me try to explain what Semantic Search offers using two examples. These features are not fully available yet: the base is there, but application developers still need to integrate it into their applications.


===Relationships===
Line 32: Line 41:
Suppose you received a photo from a friend two weeks ago and saved it somewhere on your computer. How do you find that file now? If you do not remember where you saved it, you are out of luck.


Now Nepomuk aims to help you. You know that this file came from your friend, but your computer does not. Nepomuk, however, can remember this relationship. Search for your friend's name and the photo will show up!
Now Semantic Search aims to help you. You know this file came from that friend of yours, but your computer does not. Search, however, can remember this relationship. Searching for the name of your friend will therefore bring up the photo!


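Another potential relationship is between a web page you copied text from and the document you pasted it into, or between two images showing the same car. Such relationships can sometimes be derived from the files themselves (you could analyze photos and see who or what is in them) or supplied by the applications involved (as in the email example above). This part of Nepomuk is still under heavy development and needs to be integrated into applications, so you can expect it to take a few more years to really materialize.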
Another potential relationship is between a web page you copied text from and the document you pasted it into, or two images showing the same car. Such relations can sometimes be extracted from the files themselves (you could analyze photos and see who or what is on there) or supplied by the applications involved (as in the above email example). This part of Search is still under heavy development, and needs integration in applications, so you can expect it to take a few more years to really shine.


All in all, this part of Nepomuk is about making search smart. Think about how Google makes your searches smart: when you search for a hotel and a city name, it shows, above the website results, a Google map with the hotels in the city you mentioned! It might even suggest a better name in case you made a spelling mistake. Google uses complex calculations on the relationships (links) between websites to try to put the most relevant information at the top of the results. Nepomuk will be able to offer such smart search results and order them by relevance using relationship information.
All in all, this part of Semantic Search is about making search smart. Think about how Google tries to be smart with your searches: when you search for a hotel and a city name, it shows, above the website results, a Google map with the hotels in the city you mentioned! It might even suggest a better name in case you made a spelling mistake. Google also tries to put the most relevant information on top of the list of results, using complex calculations on relationships (links) between websites. Semantic Search will be able to offer such smart results and order them by relevance using relationship information.


===Context===


These relationships can not only help you search for files, but also influence applications and the information they present. Note that this way of using Nepomuk is more a vision than reality! Many of the components are already in place, but on the whole they are not yet integrated into applications and the desktop.
These relationships can not only help you while searching for files, but also influence applications and what information they present. Note that this way of using Search is still more a vision than reality! Many of the components are in place, but they are not yet integrated in applications and the desktop as a whole.


Here is an example of how bringing context awareness to the desktop can help you work more efficiently.
Line 48: Line 57:
It would be great if all this could be organized better, right?


<span class="mw-translate-fuzzy">
Enter '[[Special:myLanguage/Glossary#Activities|activities]]'. These have been introduced in [[Special:myLanguage/Plasma|Plasma]], and currently offer different 'desktops'. They are a bit like virtual desktops, except that the desktop itself changes, not the set of applications. Different widgets, background, things like that. Of course, since Plasma 4.3, each virtual desktop can have its own activity, bringing the two in sync.
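Enter '[[Glossary (zh CN)#Activities|activities]]'. These were introduced in [[Plasma/Introduction_to_Plasma_(zh_CN)|Plasma]] and currently offer different 'desktops'. They are a bit like virtual desktops, except that the desktop itself changes, not the set of applications: different widgets, wallpaper, things like that. Of course, since KDE SC 4.3, each virtual desktop can have its own activity, keeping virtual desktops and activities in sync.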
</span>


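If applications and the desktop are aware of activities, you can create an activity for each task you regularly work on. So if you often have to edit spreadsheets with price quotes, you create an activity for that: put a Folder View widget (or several) on the desktop, and add a calculator and a to-do widget to keep track of what still needs changing. You might also want a mail folder widget showing the emails about the quote spreadsheets!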


<span class="mw-translate-fuzzy">
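As soon as someone asks about a quote, you switch to this activity and start the spreadsheet application. The spreadsheet application is aware of your activity, so its list of recent files shows the quote spreadsheets, not the inventory lists you were working on in another activity! [[Special:myLanguage/Kopete|Kopete]], the chat application, shows the colleague who knows about the prices, because she is the person you usually chat with in this activity.
As soon as someone asks about a quote, you switch to this activity and start the spreadsheet application. The spreadsheet application is aware of your activity, so its list of recent files shows the quote spreadsheets, not the inventory lists you were working on in another activity! [[Kopete (zh CN)|Kopete]], the chat application, shows the colleague who knows about the prices, because she is the person you usually chat with in this activity.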
</span>


When you are done, you switch back to another activity, and all applications once again adjust their behaviour to fit what you are working on.
Line 64: Line 69:
Of course, how useful all this is depends largely on whether the person behind the computer works in an office or at home. Gamers or casual users are unlikely to make much use of activities.


Note that the scenario described above is still far from reality. The basics are already in KDE, but much of the rest is still missing.
The scenario described above is already partially implemented in Activities, but much work remains.
 
==Frequently Asked Questions==
 
The following is taken from a [http://forum.kde.org/viewtopic.php?f=154&t=97098&p=204592 KDE forums] post. Please feel free to add/remove/modify details if you have the time!
 
;What is the Nepomuk Semantic Desktop, and the Strigi Desktop File Indexer?
 
: Nepomuk and Strigi are among the technologies behind Semantic Search in KDE. Neither is used directly in the latest generation of KDE's Semantic Search ([http://dot.kde.org/2014/02/24/kdes-next-generation-semantic-search details]); however, their successors share much of their code and concepts. Semantic Search provides a way to organize, annotate and build relationships among data (not only file names and content but also, for example, which applications used a certain file, or how it is tagged). A number of KDE applications and workspaces use this basic infrastructure to deliver features such as email tagging ([[Special:myLanguage/KMail|KMail]]) or activity setup (Plasma).
 
:File indexing allows applications such as [[Special:myLanguage/Dolphin|Dolphin]] to search for files based on content, name, or other meta-data (e.g. tags) associated with indexed files. Such an indexer can also index non-text files, such as PDFs, by accessing the meta-data contained in these files (author, publication information, etc.). Some KDE components ship additional "analyzers" for more file types.
 
; Why do we need both Akonadi and Semantic Search?  Aren't they doing the same thing?
 
: In short, Akonadi provides a cache of PIM data like calendar items, contacts and email, which is used by applications like KMail and KOrganizer but also by the calendar built into Plasma. Semantic Search plugs into Akonadi to provide search functionality. How Baloo offers search is actually up to the application. In the case of KDE PIM, Xapian is used to provide indexing and search.
 
;How can I disable the semantic desktop?
 
: File indexing can be disabled by adding the user's home folder to the <menuchoice>System Settings -> Desktop Search -> Do not search in these locations</menuchoice> list. The other functionality is part of the applications that use it and thus can't be disabled without crippling these applications. For example, to not have any search in KMail, you would simply have to remove KMail...
 
In versions of the KDE Applications before 4.13, Semantic Search had components running separately from applications. This functionality could be disabled by unchecking <menuchoice>Enable Nepomuk File Indexer</menuchoice> in the [[Special:myLanguage/System_Settings/Search_Desktop|Desktop Search]] section of [[Special:myLanguage/System Settings|System Settings]]. In case you want to turn off all semantic features, uncheck <menuchoice>Enable Nepomuk Semantic Desktop</menuchoice>. Note that this will turn off search in [[Special:myLanguage/Dolphin|Dolphin]] as well.
 
:Note that with the latter option some programs that use Semantic Search for meta-data will offer reduced functionality: for example, [[Special:myLanguage/KMail|KMail]] will not be able to tag mail, and Plasma activities will not offer additional features such as icons or program data information.
 
;Baloo/Semantic Search is eating 100% CPU! What do I do?
 
:Just wait. Certain files are very hard or even impossible to index. At the moment, this includes, for example, text files of over 50 megabytes. When Search finds such a file, it will try for a fixed time. When it fails, it will try to find out which file is broken and disable indexing it in the future. As it indexes files in batches of about 40, it has to find the problematic file by indexing that batch in parts: first half, then second half, then the problematic half in pieces again, until the file is found. This can take up to 30 minutes of heavy CPU usage. Unfortunately, while Baloo will not start to index a new batch of 40 files while on battery power, it continues to determine the broken file while on battery. This behaviour has been fixed in KDE Applications 4.13.1 (it will stop indexing immediately when the power cord is unplugged), and the time the search for each file can take has been reduced to about 10 minutes. The Semantic Search team is working on improving the indexing tools to handle more difficult files.
 
;Why do I have nepomukservicestub processes even though I've disabled Nepomuk?
 
:It may be a bug. Please file a [http://bugs.kde.org bug report] with a complete description of your problem and the steps to trigger it.
 
;File indexing of PDF/some other file types doesn't work.
 
:PDF indexing is a known issue and it's being tracked in {{bug|231936}}. If you have issues with other files, open a bug, preferably adding a sample file that shows the problem.
 
;The program nepomukservicestub crashes at startup.
 
:A large number of crashes have been fixed in the 4.7.2 release of the KDE Workspaces and Applications. If you encounter more, please file bug reports with detailed instructions on how to reproduce the problem, as sometimes the developers are unable to trigger them in their test setups.
 
;The virtuoso-t process hangs at 100% CPU.
 
:Virtuoso-t is a key component of the old Semantic Search infrastructure, and on some occasions the commands sent by the other components end up taking too much time (hence showing the effect of 100% CPU).
 
Virtuoso is no longer used by Semantic Search starting with the Applications 4.13 release.
 
;Sometimes Nepomuk consumes too much RAM.
 
:Many of these problems have been fixed; in other cases, however, the developers are unable to reproduce the issues. In those cases, adding examples and test cases to [http://bugs.kde.org/ bug reports] increases the chances of getting these bugs fixed.
 
;Search accesses the disk too much on startup.
 
:A throttling mechanism has been implemented in the file indexer; versions after KDE SC 4.8 should no longer have this issue.
 
;My Search database has been corrupted. How do I clean it?
 
:In the extreme case that your database is really corrupted and all other attempts have failed, you can delete the <tt>$KDEHOME/share/apps/nepomuk</tt> directory (where <tt>$KDEHOME</tt> is usually <tt>.kde</tt> or <tt>.kde4</tt> in your home directory) while Nepomuk is not running. The database will be cleared, but you will also lose existing information such as tags, ratings and comments.
 
== Advanced troubleshooting ==


==Sharing and privacy==
Line 72: Line 135:
This issue is of course being considered and is an important topic of Nepomuk research. Because of these privacy concerns, plus the technical challenges, Nepomuk content is private for the time being. Rest assured, the Nepomuk team does everything it can to respect your privacy.


:''More information'':
==More information==
::[http://en.wikipedia.org/wiki/Semantic_desktop Wikipedia - Semantic Desktop]  
 
::[http://en.wikipedia.org/wiki/NEPOMUK_(framework)  Wikipedia - NEPOMUK Framework]  
The new Search technology (post KDE Applications 4.13):
::[http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/ NEPOMUK website]
* [http://dot.kde.org/2014/02/24/kdes-next-generation-semantic-search user information article on the dot]
::[http://nepomuk.kde.org/discover/user NEPOMUK KDE site]
* [http://community.kde.org/Baloo Developer information on community.kde.org]
::[http://dot.kde.org/2009/12/10/exploring-new-nepomuk-features-mandriva-linux-2010 article explaining Nepomuk on the DOT]
* [http://en.wikipedia.org/wiki/Semantic_desktop Wikipedia - Semantic Desktop]  
 
The old Search technology:
* [http://techbase.kde.org/Projects/Nepomuk Nepomuk pages for developers on KDE TechBase]
* [http://nepomuk.kde.org/discover/user NEPOMUK KDE site]
* [http://en.wikipedia.org/wiki/NEPOMUK_(framework)  Wikipedia - NEPOMUK Framework]  
* [http://nepomuk.semanticdesktop.org/nepomuk/ NEPOMUK website]
* [http://dot.kde.org/2009/12/10/exploring-new-nepomuk-features-mandriva-linux-2010 article explaining Nepomuk on the DOT]
* [http://kdenepomukmanual.wordpress.com Getting started user manual]


[[Category:系统_(zh_CN)]]
[[Category:系统/zh-cn]]

Latest revision as of 05:10, 9 May 2018

Semantic Search

The aim of this page is not to fully explain KDE's Semantic Search technology and every detail, but to give a short overview, some examples, share the vision behind it and link to relevant information on the web.

Baloo is the next generation of semantic search

From KDE Applications 4.13 onwards, the 'Baloo' file indexing and file search framework replaces Nepomuk. Read details on the changes for Applications 4.13 here. Semantic Search no longer uses a single, big database, but separate, specialized databases for each type of data. The new search databases are in $HOME/.local/share/baloo. If you upgraded to KDE Applications 4.13 from an earlier KDE release, you can delete $KDEHOME/share/apps/nepomuk.
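
As a rough illustration of the two locations mentioned above, the following Python sketch builds the paths and reports whether they exist. It is not part of Baloo or Nepomuk: the function names are invented for this example, and the fallback to .kde or .kde4 in the home directory when $KDEHOME is unset is an assumption based on the usual defaults. It only reports and never deletes anything.

```python
import os
from pathlib import Path
from typing import List, Optional


def baloo_data_dir(home: Path) -> Path:
    """Location of the new search databases: $HOME/.local/share/baloo."""
    return home / ".local" / "share" / "baloo"


def legacy_nepomuk_dirs(home: Path, kdehome: Optional[str] = None) -> List[Path]:
    """Candidate locations of the old Nepomuk data.

    If kdehome is given it is used as-is. Otherwise both usual defaults
    (.kde and .kde4 in the home directory) are returned; this fallback
    is an assumption of this sketch.
    """
    roots = [Path(kdehome)] if kdehome else [home / ".kde", home / ".kde4"]
    return [root / "share" / "apps" / "nepomuk" for root in roots]


if __name__ == "__main__":
    home = Path.home()
    candidates = [baloo_data_dir(home)]
    candidates += legacy_nepomuk_dirs(home, os.environ.get("KDEHOME"))
    for path in candidates:
        # Report only; removing the old directory is left to the user.
        print(path, "exists" if path.is_dir() else "not found")
```

Running the script prints each candidate directory together with whether it was found, which makes it easy to check what is present before removing anything by hand.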

Brief description

As the Glossary mentions, Semantic Search is about classification, organisation and presentation of data. It is not an application, but a component which can be used by developers within applications.

Try out in Dolphin

For example, the Dolphin file manager makes use of Search. In KDE Applications versions prior to 4.13, Semantic Search must be enabled from System Settings -> Desktop Search. The information sidebar of Dolphin (Control -> Panels -> Information, or press F11) presents information extracted by Search about the selected file, and also allows you to assign tags, ratings and comments to files. This information is then stored and indexed by Search. You can then search for metadata using the navigation bar in Dolphin. Click Find, or press Ctrl+F, and search for file names or file contents.


Features

KDE's Semantic Search offers several 'layers' of functionality to applications. The first and most simple of those is manual tagging, rating and commenting, as used in Dolphin. This helps you to find your files faster, but is also a lot of work.

To make finding files containing text easier, Search offers a second functionality: indexing the text of files. You can find files by entering some words which you know are in there, or just (part of) their title.

The third layer is a very complex one, and the reason why the underlying technology, Nepomuk, was conceived as a research project of several companies and universities in the European Union. This is where you will find difficult words like 'semantic desktop' and 'ontologies'. Basically, it is about context and relationships.

Indexing files

Search does not index every file on the hard drive. Its default configuration in most Linux distributions excludes some common patterns for backup files and configuration directories. You can change this in System Settings -> Desktop Search. Add folders to be excluded. If you want to turn off indexing of files entirely, just add your home folder there.
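
The exclusion logic described above can be pictured with a small Python sketch. This is an illustration of the idea only, not Baloo's actual implementation: the patterns and the folder below are made-up examples, and should_index is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical examples only; real defaults vary between distributions.
EXCLUDE_PATTERNS = ["*~", "*.bak", "*.tmp"]
EXCLUDE_FOLDERS = [PurePosixPath("/home/user/.config")]


def should_index(path, patterns=EXCLUDE_PATTERNS, folders=EXCLUDE_FOLDERS):
    """Return True if the file at `path` passes both exclusion checks."""
    p = PurePosixPath(path)
    # Skip files whose name matches an excluded pattern (e.g. backups).
    if any(fnmatch(p.name, pattern) for pattern in patterns):
        return False
    # Skip anything that is, or lives below, an excluded folder.
    for folder in folders:
        if p == folder or folder in p.parents:
            return False
    return True
```

Passing the home folder itself as an excluded folder makes should_index return False for every file below it, which mirrors the advice above on turning off file indexing entirely.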

In System Settings you can also control whether Search indexes files on removable media such as USB drives and CD-ROMs. This option is not available in KDE Applications 4.13, where removable media are not indexed; future versions are planned to reintroduce it.

Examples

Let me try to explain what Semantic Search offers using two examples. These features are not fully available yet: the base is there, but application developers still need to integrate it into their applications.

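Relationships

Suppose you received a photo from a friend two weeks ago and saved it somewhere on your computer. How do you find that file now? If you do not remember where you saved it, you are out of luck.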

Now Semantic Search aims to help you. You know this file came from that friend of yours, but your computer does not. Search, however, can remember this relationship. Searching for the name of your friend will therefore bring up the photo!
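
A toy Python sketch can make the idea of a remembered relationship concrete. It is not how Nepomuk or Baloo store data; the RelationStore class, the received_from relation name and the file paths are all invented for illustration.

```python
from collections import defaultdict


class RelationStore:
    """Toy store of (subject, relation, object) facts, looked up by object."""

    def __init__(self):
        self._by_object = defaultdict(set)

    def add(self, subject, relation, obj):
        # Index the fact under the lower-cased object so that a search
        # for a person's name is case-insensitive.
        self._by_object[obj.lower()].add((subject, relation))

    def find(self, term):
        """Return all subjects related to `term`, sorted for stable output."""
        facts = self._by_object.get(term.lower(), ())
        return sorted(subject for subject, _relation in facts)


store = RelationStore()
store.add("/home/user/Pictures/beach.jpg", "received_from", "Alice")
store.add("/home/user/Documents/offer.ods", "received_from", "Bob")
```

Calling store.find("alice") returns the photo's path even though the file name says nothing about who sent it; the relationship alone is enough to find it.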

Another potential relationship is between a web page you copied text from and the document you pasted it into, or two images showing the same car. Such relations can sometimes be extracted from the files themselves (you could analyze photos and see who or what is on there) or supplied by the applications involved (as in the above email example). This part of Search is still under heavy development, and needs integration in applications, so you can expect it to take a few more years to really shine.

All in all, this part of Semantic Search is about making search smart. Think about how Google tries to be smart with your searches: when you search for a hotel and a city name, it shows, above the website results, a Google map with the hotels in the city you mentioned! It might even suggest a better name in case you made a spelling mistake. Google also tries to put the most relevant information on top of the list of results, using complex calculations on relationships (links) between websites. Semantic Search will be able to offer such smart results and order them by relevance using relationship information.

Context

These relationships can not only help you while searching for files, but also influence applications and what information they present. Note that this way of using Search is still more a vision than reality! Many of the components are in place, but they are not yet integrated in applications and the desktop as a whole.

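Here is an example of how bringing context awareness to the desktop can help you work more efficiently.

Say you are tidying up the notes you took at a meeting. Then the phone rings: someone wants a spreadsheet with price quotes from you, updated to match a customer's requests. A few more interruptions later, your whole desktop is covered with files and windows...

It would be great if all this could be organized better, right?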

Enter 'activities'. These have been introduced in Plasma, and currently offer different 'desktops'. They are a bit like virtual desktops, except that the desktop itself changes, not the set of applications. Different widgets, background, things like that. Of course, since Plasma 4.3, each virtual desktop can have its own activity, bringing the two in sync.

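If applications and the desktop are aware of activities, you can create an activity for each task you regularly work on. So if you often have to edit spreadsheets with price quotes, you create an activity for that: put a Folder View widget (or several) on the desktop, and add a calculator and a to-do widget to keep track of what still needs changing. You might also want a mail folder widget showing the emails about the quote spreadsheets!

As soon as someone asks about a quote, you switch to this activity and start the spreadsheet application. The spreadsheet application is aware of your activity, so its list of recent files shows the quote spreadsheets, not the inventory lists you were working on in another activity! Kopete, the chat application, shows the colleague who knows about the prices, because she is the person you usually chat with in this activity.

When you are done, you switch back to another activity, and all applications once again adjust their behaviour to fit what you are working on.

The benefits of such an activity-based workflow go well beyond what you might first expect. It not only helps you find files and contacts, it also helps with the task switching itself. The human brain is not good at multitasking: after switching tasks, people need several minutes to get back up to speed. A change of 'environment' can speed this up considerably, even if it only happens on the screen. Compare it with the mood you get into when packing your bags for a holiday!

Of course, how useful all this is depends largely on whether the person behind the computer works in an office or at home. Gamers or casual users are unlikely to make much use of activities.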

The scenario described above is already partially implemented in Activities, but much work remains.

Frequently Asked Questions

The following is taken from a KDE forums post. Please feel free to add/remove/modify details if you have the time!

What is the Nepomuk Semantic Desktop, and the Strigi Desktop File Indexer?
Nepomuk and Strigi are among the technologies behind Semantic Search in KDE. Neither is used directly in the latest generation of KDE's Semantic Search (details); however, their successors share much of their code and concepts. Semantic Search provides a way to organize, annotate and build relationships among data (not only file names and content but also, for example, which applications used a certain file, or how it is tagged). A number of KDE applications and workspaces use this basic infrastructure to deliver features such as email tagging (KMail) or activity setup (Plasma).
File indexing allows applications such as Dolphin to search for files based on content, name, or other meta-data (e.g. tags) associated with indexed files. Such an indexer can also index non-text files, such as PDFs, by accessing the meta-data contained in these files (author, publication information, etc.). Some KDE components ship additional "analyzers" for more file types.
Why do we need both Akonadi and Semantic Search? Aren't they doing the same thing?
In short, Akonadi provides a cache of PIM data like calendar items, contacts and email, which is used by applications like KMail and KOrganizer but also by the calendar built into Plasma. Semantic Search plugs into Akonadi to provide search functionality. How Baloo offers search is actually up to the application. In the case of KDE PIM, Xapian is used to provide indexing and search.
How can I disable the semantic desktop?
File indexing can be disabled by adding the user's home folder to the System Settings -> Desktop Search -> Do not search in these locations list. The other functionality is part of the applications that use it and thus can't be disabled without crippling these applications. For example, to not have any search in KMail, you would simply have to remove KMail...

In versions of the KDE Applications before 4.13, Semantic Search had components running separately from applications. This functionality could be disabled by unchecking Enable Nepomuk File Indexer in the Desktop Search section of System Settings. In case you want to turn off all semantic features, uncheck Enable Nepomuk Semantic Desktop. Note that this will turn off search in Dolphin as well.

Note that with the latter option some programs that use Semantic Search for meta-data will offer reduced functionality: for example, KMail will not be able to tag mail, and Plasma activities will not offer additional features such as icons or program data information.
Baloo/Semantic Search is eating 100% CPU! What do I do?
Just wait. Certain files are very hard or even impossible to index. At the moment, this includes, for example, text files of over 50 megabytes. When Search finds such a file, it will try for a fixed time. When it fails, it will try to find out which file is broken and disable indexing it in the future. As it indexes files in batches of about 40, it has to find the problematic file by indexing that batch in parts: first half, then second half, then the problematic half in pieces again, until the file is found. This can take up to 30 minutes of heavy CPU usage. Unfortunately, while Baloo will not start to index a new batch of 40 files while on battery power, it continues to determine the broken file while on battery. This behaviour has been fixed in KDE Applications 4.13.1 (it will stop indexing immediately when the power cord is unplugged), and the time the search for each file can take has been reduced to about 10 minutes. The Semantic Search team is working on improving the indexing tools to handle more difficult files.
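
The halving search described in this answer can be sketched in Python as follows. This is a simplified illustration, not Baloo's code: index_ok is a hypothetical callback that reports whether a group of files indexes without trouble, and the sketch assumes exactly one problematic file per batch.

```python
def find_failing_file(batch, index_ok):
    """Return the one file in `batch` that makes indexing fail.

    `index_ok(files)` is a hypothetical callback returning True when the
    given files index cleanly. Assumes exactly one bad file in `batch`.
    """
    candidates = list(batch)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the first half indexes cleanly, the culprit must be in the
        # second half; otherwise it is somewhere in the first half.
        candidates = candidates[mid:] if index_ok(first_half) else first_half
    return candidates[0]
```

With a batch of 40 files, this version needs at most six calls to index_ok to single out the file, since each call halves the group that still has to be examined.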
Why do I have nepomukservicestub processes even though I've disabled Nepomuk?
It may be a bug. Please file a bug report with a complete description of your problem and the steps to trigger it.
File indexing of PDF/some other file types doesn't work.
PDF indexing is a known issue and it's being tracked in bug #231936. If you have issues with other files, open a bug, preferably adding a sample file that shows the problem.
The program nepomukservicestub crashes at startup.
A large number of crashes have been fixed in the 4.7.2 release of the KDE Workspaces and Applications. If you encounter more, please file bug reports with detailed instructions on how to reproduce the problem, as sometimes the developers are unable to trigger them in their test setups.
The virtuoso-t process hangs at 100% CPU.
Virtuoso-t is a key component of the old Semantic Search infrastructure, and on some occasions the commands sent by the other components end up taking too much time (hence showing the effect of 100% CPU).

Virtuoso is no longer used by Semantic Search starting with the Applications 4.13 release.

Sometimes Nepomuk consumes too much RAM.
Many of these problems have been fixed; in other cases, however, the developers are unable to reproduce the issues. In those cases, adding examples and test cases to bug reports increases the chances of getting these bugs fixed.
Search accesses the disk too much on startup.
A throttling mechanism has been implemented in the file indexer; versions after KDE SC 4.8 should no longer have this issue.
My Search database has been corrupted. How do I clean it?
In the extreme case that your database is really corrupted and all other attempts have failed, you can delete the $KDEHOME/share/apps/nepomuk directory (where $KDEHOME is usually .kde or .kde4 in your home directory) while Nepomuk is not running. The database will be cleared, but you will also lose existing information such as tags, ratings and comments.

Advanced troubleshooting

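Sharing and privacy

Before giving further links, there is one more thing to point out: sharing Nepomuk data. It would be great if your tags, ratings and comments could be shared along with a file when you send it to someone else. However, suppose you have tagged a contact as "annoying" ('boring in bed') and then send that contact's details to a mutual friend: you would not want that tag to be sent along...

This issue is of course being considered and is an important topic of Nepomuk research. Because of these privacy concerns, plus the technical challenges, Nepomuk content is private for the time being. Rest assured, the Nepomuk team does everything it can to respect your privacy.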

More information

The new Search technology (post KDE Applications 4.13):

The old Search technology: