opf-cc: an automatic Simplified/Traditional Chinese converter for epub and mobi

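Writing a tool that automatically converts epub or mobi files between Simplified and Traditional Chinese is something I have wanted to do for a long time. Many beautifully formatted books are only available in Traditional Chinese, and I am not used to reading Traditional Chinese at length, so I often had to convert them by hand and repackage them with calibre, which was tedious. I believe other people have similar needs.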

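Last weekend I had some free time, so I wrote the opf-cc project in Python. The name stands for Open Packaging Format Chinese Conversion, because epub and the mobi format used by Amazon are only packaging formats; the actual file layout follows the OPF specification. Here is the thinking behind the implementation.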

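The Simplified/Traditional conversion itself is relatively easy to solve. The existing OpenCC already handles one-to-many mappings (several Traditional characters for one Simplified character, or the reverse) very well, so I modified the OpenCC code slightly and used it directly. All the modifications have been submitted upstream as pull requests.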

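Unpacking epub is fairly simple: an epub is really just a zip archive, so Python's zipfile module can extract it directly. Unpacking mobi is a bit more troublesome; if you don't want to use calibre's huge set of libraries, mobiunpack is the best choice.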
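A minimal sketch of the epub unpacking step, assuming a standard epub whose `META-INF/container.xml` points at the OPF file. The function name `unpack_epub` is hypothetical and this is not necessarily how opf-cc itself does it:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def unpack_epub(epub_path, dest_dir):
    """Extract an epub (a zip archive) into dest_dir.

    Returns the archive-relative path of the OPF file, as declared in
    META-INF/container.xml. `unpack_epub` is an illustrative name.
    """
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(dest_dir)
        container = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")
```

The returned OPF path is the natural starting point for finding the files that need conversion.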

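After unpacking, the files that should be converted need to be found. The tricky part is that some of the file names referenced by href in the table of contents are themselves in Traditional Chinese. If the whole table-of-contents file were converted in one go, the files would have to be renamed to match, which is a hassle. Here I tried using lxml to parse the table-of-contents file, picking out the text and converting it with OpenCC's Python module, while leaving the contents of href attributes unconverted.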
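The idea of converting text but not attributes can be sketched roughly as follows. Two assumptions to note: the standard library's `xml.etree.ElementTree` is swapped in for lxml so the example has no dependencies, and `to_simplified` is a toy stand-in for the real OpenCC call, with a three-character table for illustration only:

```python
import xml.etree.ElementTree as ET

# Toy stand-in for OpenCC: a tiny illustrative table, NOT a real dictionary.
TOY_TABLE = str.maketrans({"書": "书", "這": "这", "說": "说"})


def to_simplified(text):
    """Hypothetical converter; the real tool would call OpenCC here."""
    return text.translate(TOY_TABLE)


def convert_text_only(xml_source, convert=to_simplified):
    """Convert text nodes while leaving every attribute (e.g. href) untouched."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        if element.text:
            element.text = convert(element.text)
        if element.tail:  # text that follows the element's closing tag
            element.tail = convert(element.tail)
    return ET.tostring(root, encoding="unicode")


sample = '<ul><li><a href="書.html">這書</a></li></ul>'
converted = convert_text_only(sample)
```

Because only `text` and `tail` are touched, the link still points at the original Traditional-named file and nothing has to be renamed on disk.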

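Repackaging epub is also simple: just compress it with the zip tool on OS X (update: fishy contributed an implementation that uses Python's zipfile directly instead of depending on a separate zip tool). Packaging mobi is more troublesome: you use either calibre or KindleGen, provided by Amazon. The advantage of KindleGen is that you only need to download and install a single binary; the disadvantage is that the generated file is more than twice the size of the original, a problem calibre does not have (update: on byelims's recommendation, kindlestrip is now used to process the files KindleGen generates, which removes the redundant data).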
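For reference, a zipfile-based repack might look roughly like the sketch below. This is not fishy's actual implementation; `pack_epub` is a hypothetical name. It relies on one rule from the EPUB container format: the `mimetype` entry must be the first one in the archive and must be stored uncompressed.

```python
import os
import zipfile


def pack_epub(source_dir, epub_path):
    """Zip an unpacked epub directory back into an epub file.

    Assumes epub_path lies outside source_dir, so the archive does not
    try to include itself. `pack_epub` is an illustrative name.
    """
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```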

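Overall, this project still has plenty of room for improvement. Besides the two points above, Simplified-to-Traditional conversion could also be added, which is just a matter of passing a different parameter to OpenCC. Still, all the features I had in mind are already there, and I will improve it as problems come up in actual use.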

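Also, anyone interested is welcome to provide more convenient packaging, for example using Automator or ThisService to turn it into an OS X Service, so that you can select a file in Finder and right-click to convert it.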

My new job

In my previous post I talked about leaving Nokia and the Qt community. So what am I joining? As it turns out, I’m staying in Oslo to work for Opera Software. Why? There are a few reasons.

  • When I applied for a job at Nokia, Qt Development Frameworks, I also sent my resume to Opera. But their response came too late (I got a “Your background looks very interesting…” letter after 4 months). By the time I received it, I had finished my interviews at Nokia and had almost decided to join them. So I joined the trolls for 2 years. But I have always wondered what it would be like to work on Opera instead. Now I have the chance.
  • I joined the trolls expecting to be a Mac developer, but as it turned out I actually focused on my other interest: typography. It’s wonderful to be one of the few typography engineers in the world, but I still want to sharpen my Cocoa skills from time to time. So now I’m actually working full time as a Mac developer for Opera.
  • Working on typography has been my dream job since I was a child. But I feared that I had become so familiar with the internals of Qt that I was afraid of change and of learning new things. Now I am exposed to a whole new area and have to quickly learn a lot of new things, which is exactly what I wanted.
  • Working on a framework is a great learning experience: the code has to be solid and stable, and I got to work with many great engineers. But from time to time I wanted to work on products that are closer to the end user, like a browser. Something where you can go to a party and tell the rest of the people what you are working on. (Explaining Qt to non-tech people is not exactly my strength.)

I have worked in the new Opera office for more than a month and so far it has been a really great experience. The work is fast-paced and challenging, and my colleagues are friendly. The best thing so far is that we have free beer every Friday 🙂 I will probably write again about my job after a few months and tell you more.

< /troll >

Two years ago, I started my first job at Nokia, Qt Development Frameworks. I originally planned to become a Mac developer, but ended up working on the text layout and font rendering part of Qt. Although it was not exactly what I had set out to do, it was still a fantastic job with many good memories. Some of my favorite parts of this job are:

  • Passionate co-workers with a level of dedication I had never seen before. I really enjoyed working with the fellows in the graphics team. Many of them have spent more than ten or even twenty years on graphics programming. It is truly a team I’m honored to have worked with, and I learnt a lot from them.
  • Really high quality code base with decent workflow (git and gerrit are the best tools and JIRA for bug tracking worked well for me).
  • Nice office layout with separated rooms.
  • Social activities such as climbing, skiing and movie nights. We also helped each other move and hung out together at housewarming parties – time spent with friendly and helpful trolls is one of my best memories.
  • Free drinks and snacks (I enjoyed the apple juice, chocolate milk and cookies).
  • Ice creams for the whole summer, and donuts on the first Friday of every month.
  • Lønningspils – a Norwegian tradition: monthly beer drinking on paycheck weekends.
  • Table tennis – I guess I let a lot of people down by being Chinese yet playing only so-so at this sport.
  • The 8-core Mac Pro I used as the main development machine.

Things that didn’t work that well include:

  • Pressure from Nokia to do some stuff I didn’t enjoy that much and that ended up not shipping anything. I surely learnt a lot from it, but it still felt like a waste of time.
  • Canteen – It was quite good when I joined, but got much worse from the end of last year when the previous chef left. The soup choices were terrible and the warm food usually tasted awful.

Anyway, after these wonderful two years, I think it’s time to move on. I want to work on something else now. So I didn’t join most of my colleagues in transferring to Digia, but I sincerely think that they have a very good chance of succeeding and making “Qt Everywhere” more true than it is now. It has been a great ride, and if anyone is looking for a job and Digia is hiring, I can recommend it without any hesitation.

I will try to cover what’s next for me in the next post, stay tuned 😉

Thoughts on Learnable Programming

After reading Bret Victor’s new essay Learnable Programming and some of the responses, the ideas keep lingering in my head.

While his famous lecture Inventing on Principle tackled the idea of making new principles, people are mostly interested in applying its visualization method to IDEs and the like. Although most of these efforts are largely experimental, I still find it harsh to dismiss them and consider them failures. Live coding is not the point of Victor’s lecture or essay, but it is still immensely better than the dumb editor and cold console that I used when I learned programming.

(My girlfriend, who majored in landscape design, learned most JavaScript concepts with the rudimentary web editor from Codecademy. She found it engaging and helpful. I doubt she would have found it that way had she been given a Unix console and gcc, checking the a.out output every time a change was made.)

In the area of learning a new way of thinking, things as simple as live coding can be useful.

What’s more important is how we progress after grasping basic concepts like iteration and functions. While most people think of learning programming as a one-time effort, I have programmed for more than 10 years and I’m still constantly learning.

Malcolm Gladwell’s Outliers brought us the concept of 10,000 hours of practice. He told us that Bill Joy became a master after practicing programming for 10,000 hours. But what exactly Bill Joy did during that practice is unknown to us. We all know that Bill wrote the vi editor and some of the BSD network stack. Did he start practicing by writing a smaller subset of vi or BSD?

The essential question here is how we learn higher levels of abstraction and design more complicated systems.

First, I personally believe that we need to understand the details to do higher level designs. You won’t be able to design an efficient system without knowing the implementation costs and performance impact of critical code paths. That’s why I usually choose to dive to the bottom and solve/evaluate a few critical issues before looking at the big picture.

However, the human mind is not a precise machine that can assemble low level details into high level constructs. After mastering the details, we sometimes need to set them aside and clear our minds to see the bigger picture; otherwise our heads will be occupied by low level details and have no room left for abstraction. Contradictory as it sounds, visualization becomes a really useful tool for hiding the details from us.

With a visualization as simple as a pie chart, we are suddenly able to free our minds, stand on our previous results and reach higher fruits. That, I think, is the real value of Bret Victor’s idea. Where I disagree with him is that his idea doesn’t have to be a groundbreaking rule that revolutionizes our ways of programming; existing tools and IDEs can still adopt it as a weapon to help us tackle even more complex problems.

Imagine code visualization as an amplifier. Without that it is possible to work on certain issues, but with that you are suddenly able to work on harder issues in a more efficient manner.

How do we make that really work? Let’s start small. Take your favorite IDE, for example: Xcode could have built-in visualizers for native Cocoa classes such as NSString, NSArray, NSColor and NSFont. The code to show them is already there; we just need to find a way to present it. Qt Creator could have visualizers for QString, QTransform, QColor, QFont, etc.

For visualizing the state of a program, it works just like the variable watching feature of our existing debuggers, except that the state is presented in a structured way. The specific presentation can be defined by the programmer, just like writing a test case. We could have a drag-and-drop layout interface to do that.

Some of Bret Victor’s ideas (like parameter tagging) can already be achieved with a static analyzer such as clang. Others can only be done during execution. How to speed up execution and avoid setting up a lot of state is the most interesting problem. I think we can try to reduce the amount of input and take an independent block of code (functions without side effects) for examination in an isolated sandbox environment, supplying the required parameters in a simpler way, much like preparing test data sets.

In summary, I still see this visualization tool as a programming assistant: an assistant that can take a piece of code, walk through its execution and explain to me how it works. Even if it can only tell how one variable changes over a loop, it will still be much better than me evaluating it by hand with paper and pencil. In short, hand the repetitive tasks over to your machine and focus on something more creative; that’s what we humans are good at.
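To make the “one variable in a loop” case concrete, here is a rough Python sketch of such an assistant, built on the interpreter’s standard `sys.settrace` hook. The names `trace_variable` and `sum_to` are made up for illustration, and consecutive identical values are collapsed, so this is only an approximation of what a real tool would show:

```python
import sys


def trace_variable(func, var_name, *args):
    """Run func(*args) and record how one local variable changes.

    Returns (result, history). Consecutive identical values are recorded
    once, so reassigning the same value is not visible.
    """
    history = []

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # do not trace any other function
        if event in ("line", "return") and var_name in frame.f_locals:
            value = frame.f_locals[var_name]
            if not history or history[-1] != value:
                history.append(value)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, history


def sum_to(n):
    """A small side-effect-free function to examine in isolation."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


result, history = trace_variable(sum_to, "total", 3)
```

Here `history` ends up as `[0, 1, 3, 6]`, the sequence one would otherwise work out with paper and pencil. The function under examination is fed a small, hand-picked input, in the same spirit as preparing test data.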


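Compared with the familiar countries of Western and Central Europe, the Nordic countries are less well known to friends back in China. Some of my friends can never remember that the country I moved to is Norway, and plenty of people cannot tell Sweden from Switzerland. This is not limited to Chinese people: according to an American colleague, people from his hometown always assumed he had gone to Norway, South Carolina. Some reports on the Nordic countries have appeared in the media recently, such as the Southern Weekly (南方周末) article “High Prices, High Taxes: Why Is Denmark Doing Better Than the US?” (《高物价高税负,丹麦为何走得比美国好》) and the New York Times article “Investors Seek Out Safer Shores”, which offer a partial glimpse. I have been in Norway for two years now. I would not dare speak for the other Nordic countries, but for Norway at least I can say a little about what I have learned that is rarely reported in China, for reference only.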

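  • It is often said that Norway is a “socialist country”. By the definition we learned in middle school, “the state controls the lifelines of the economy”, that is indeed the case. Norway’s core industries are basically all state-controlled or heavily state-supported, and wholly privately owned enterprises are very rare. For example, in oil the national oil company (Statoil) holds the overwhelming majority of the market; railways are run by NSB; and in aviation the veteran Scandinavian Airlines (SAS), although facing many challenges, still has the largest share.
  • Compared with other Nordic countries, Norway has very strong trade protection for agricultural products. Leaving aside the traditional fishing industry, the dairy industry is controlled by the domestic giant Tine. Last year, Tine’s misjudgment of milk output after the rainy season caused a nationwide “butter crisis”: throughout the Christmas season, shops across the country did not have enough butter and had to substitute margarine, while imported butter could hardly enter the Norwegian market. Tine is not only the largest dairy seller but also holds the position of setting the industry’s rules, akin to “the referee joining the game”, so it draws a lot of criticism within the country. In poultry, Prior occupies most of the egg shelves in supermarkets.
  • Alongside trade protection in industry and agriculture comes a closed retail sector. Although the gap between rich and poor in Norway is relatively small, the richest people are often the domestic retail giants, such as Rimi. Some argue that high prices come in large part from the greed of the retail industry: the same product, if sold in a Middle Eastern grocery store, is priced much lower than in the common supermarket chains.
  • Coming from China to Norway, perhaps the hardest thing to get used to is the huge difference in labor costs. Postage or courier fees for online shopping easily run to several hundred, and what skilled tradespeople charge is even more staggering: an electrician or plumber can easily charge over a thousand per hour, and repairs to houses and cars are extremely expensive as well. For example, installing a tow hitch on the back of a car costs upwards of ten thousand. This is hard to imagine in China.
  • Although Norway as a whole is vast and sparsely populated, the population density of the capital, Oslo, is higher than Beijing’s, and the high density comes to a large extent from immigrants abusing the welfare system. For instance, recent reports argue that people who come to Norway to work for a short period contribute far more to society than long-term immigrants, because short-term engineers and the like often earn relatively high wages and pay taxes but go home having enjoyed almost no benefits, whereas among long-term immigrants there are many who draw unemployment and childbirth benefits for years without ever working.
  • This has also brought many destabilizing factors. In my two years in Norway I have encountered theft on the bus three times; according to colleagues at the office, this would have been hard to imagine in Norway ten or so years ago, when it was no exaggeration to describe the cities with the saying “nobody pockets what others have dropped on the road”. As another example, the Oslo police used to pride themselves on not having to carry guns on patrol, but recently most officers have started carrying guns too. Of course, immigrants are not the only cause of the instability in public safety.
  • Norway’s IT industry is more outdated and closed than that of the US and many other European countries, and it falls far short even of neighboring Sweden’s support for startups. The industry consists mostly of Java and .NET enterprise consulting; startups applying new technology are very rare, and few foreign employees are hired. Negative news I have seen recently includes BankID, the whole country’s bank login system, insisting on using a Java applet, and Altinn, the system built at great expense to handle the whole country’s population and tax data, making a big fool of itself, and so on.
  • The Nordic healthcare system is widely praised, but people living in Norway report that its main problem is how slowly medical care is handled, to the point that many people cannot get treated unless they exaggerate their condition, which is much worse than in China. Some people also exploit loopholes in the system, for example threatening to switch to a private hospital to speed up treatment, while others have no choice but to pay for expensive private insurance to cut waiting times. To some extent this system is a very good safety net for serious illness, but for minor illnesses and emergencies the treatment is actually not as good as in China.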