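I really like using IPython's Notebook feature to learn and experiment with Python programs. Today I'm working on a new computer, so I need to install it again. There are two ways to install IPython:
- Install it with pip
- Install Anaconda, which bundles IPython (this option suits beginners better; I recommend downloading the Anaconda installer and getting everything set up in one go)
This post mainly covers installing with pip. IPython is made up of many modules, so to avoid missing any component I used this command to install all of them: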
$ pip install ipython[all]
Once the installation succeeds, launch the Notebook with this command:
$ ipython notebook
After that you may see this error (you will run into it with the Anaconda route as well):
ValueError, 'unknown locale: %s' % localename
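Someone on StackOverflow has already posted a solution: open .bash_profile (it lives in your home directory) from the command line and add the following two lines: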
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
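Then don't forget to reload .bash_profile so the new lines take effect (run this from your home directory, and note the space between the two dots):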
$ . .bash_profile
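To double-check that the two variables are actually visible to Python after reloading, something like the following could be used. This is only a minimal sketch: the helper name `missing_locale_vars` is made up for illustration and is not part of IPython or the StackOverflow answer.

```python
import os


def missing_locale_vars(env=None, names=("LC_ALL", "LANG")):
    """Return the variables from `names` that are unset or empty in `env`.

    `env` defaults to the environment of the current process.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# An empty list means both variables are set in this shell session.
print(missing_locale_vars())
```

If the list is not empty, the export lines were probably added to a different file, or the shell has not reloaded .bash_profile yet.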
Now run this command again and the IPython Notebook interface should show up in your browser:
$ ipython notebook
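IPython Notebook works by starting a server on your local machine; you connect to it and talk to it at localhost:8888/tree. This gives you a loop: you write code in the browser, it is sent to the local server for execution, and the server sends back the result, which is rendered on the page.
When you are done with the Notebook, closing the browser page is not enough. You need to press Ctrl + C in the Terminal where you started the Notebook and then confirm, in order to shut the server down.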
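One way to confirm whether the local server is still running is to try opening a TCP connection to its port. The sketch below assumes the default address localhost:8888 mentioned above; the function name `port_is_open` is hypothetical, and the check only tells you that something is listening on that port, not that it is the Notebook.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the connection fails or times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Something is listening on port 8888:", port_is_open())
```

After shutting the server down with Ctrl + C, the same call should report False.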
Let me use an example to illustrate this topic:
A Chinese character: 汉
Its Unicode value: U+6C49
Convert 6C49 to binary: 01101100 01001001
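The same three values can be reproduced in Python 3 with the built-ins `ord` and `format`. The variable names here are just for illustration:

```python
ch = "汉"
code_point = ord(ch)                   # the character's Unicode code point as an integer
hex_value = format(code_point, "04X")  # four upper-case hex digits
bits = format(code_point, "016b")      # sixteen binary digits, zero-padded
spaced = bits[:8] + " " + bits[8:]     # split into two groups of eight for readability

print(f"U+{hex_value}")  # U+6C49
print(spaced)            # 01101100 01001001
```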
Nothing magical so far, it's very simple. Now, let's say we decide to store this character on our hard drive. To do that, we need to store the character in binary format. We could simply store it as is: '01101100 01001001'. Done!
But wait a minute, is '01101100 01001001' one character or two characters? You know it is one character because I told you, but when a computer reads it, it has no idea. So we need some sort of "encoding" to tell the computer to treat it as one.
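The ambiguity is easy to see in Python: the same two bytes give different text depending on which rules are used to read them. This is only an illustration; the choice of ASCII and big-endian UTF-16 as the two readings is mine, not something the text above prescribes.

```python
raw = bytes([0b01101100, 0b01001001])  # the two bytes 6C 49

# Read as two separate one-byte ASCII characters
as_ascii = raw.decode("ascii")
print(as_ascii)   # lI  (lower-case L followed by upper-case i)

# Read as a single 16-bit unit (big-endian UTF-16)
as_utf16 = raw.decode("utf-16-be")
print(as_utf16)   # 汉
```

Without an agreed encoding, a program has no way to tell which of the two readings was intended.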
This is where the rules of 'UTF-8' come in: http://www.fileformat.info/info/unicode/utf8.htm
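Binary format of bytes in sequence:

| 1st Byte | 2nd Byte | 3rd Byte | 4th Byte | Number of Free Bits | Maximum Expressible Unicode Value |
|----------|----------|----------|----------|---------------------|-----------------------------------|
| 0xxxxxxx |          |          |          | 7                   | 007F hex (127)                    |
| 110xxxxx | 10xxxxxx |          |          | (5+6)=11            | 07FF hex (2047)                   |
| 1110xxxx | 10xxxxxx | 10xxxxxx |          | (4+6+6)=16          | FFFF hex (65535)                  |
| 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | (3+6+6+6)=21        | 10FFFF hex (1,114,111)            |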
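The maximum values in the table translate directly into a rule for choosing the row. A small sketch, where the function name `utf8_length` is hypothetical:

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 needs for a code point, following the table's limits."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    if code_point <= 0x7F:      # row 1: 7 free bits
        return 1
    if code_point <= 0x7FF:     # row 2: 11 free bits
        return 2
    if code_point <= 0xFFFF:    # row 3: 16 free bits
        return 3
    return 4                    # row 4: 21 free bits


print(utf8_length(0x6C49))            # 3
print(len("汉".encode("utf-8")))      # 3, Python's own encoder agrees
```

Note that the surrogate range D800 to DFFF falls inside row 3 numerically, but Python's encoder refuses to encode those code points; the sketch does not handle that case.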
According to the table above, if we want to store this character in the 'UTF-8' format, we need to prefix its bits with some 'headers'. Our Chinese character is 16 bits long (count the binary digits yourself) and its value, 6C49, is above the two-byte maximum of 07FF, so we use the format in row 3, which provides enough space:
| Header | Placeholder | Fill in our Binary | Result   |
|--------|-------------|--------------------|----------|
| 1110   | xxxx        | 0110               | 11100110 |
| 10     | xxxxxx      | 110001             | 10110001 |
| 10     | xxxxxx      | 001001             | 10001001 |
Writing out the result in one line:
11100110 10110001 10001001
This is the UTF-8 (binary) value of the Chinese character! (Confirm it yourself.)
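One way to confirm it is to redo the header-and-fill steps with bit operations and compare against Python's built-in encoder. This is a sketch for the three-byte row only; `encode_three_bytes` is a made-up name, and the surrogate range D800 to DFFF is not handled.

```python
def encode_three_bytes(code_point):
    """Pack a code point in the range 0800..FFFF into the 1110xxxx 10xxxxxx 10xxxxxx layout."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point does not belong to the three-byte row")
    b1 = 0b11100000 | (code_point >> 12)               # header 1110 + top 4 bits
    b2 = 0b10000000 | ((code_point >> 6) & 0b111111)   # header 10 + middle 6 bits
    b3 = 0b10000000 | (code_point & 0b111111)          # header 10 + low 6 bits
    return bytes([b1, b2, b3])


encoded = encode_three_bytes(0x6C49)
as_bits = " ".join(format(b, "08b") for b in encoded)

print(as_bits)                          # 11100110 10110001 10001001
print(encoded == "汉".encode("utf-8"))  # True
```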
Summary
A Chinese character: 汉
Its Unicode value: U+6C49
Convert 6C49 to binary: 01101100 01001001
UTF-8 binary: 11100110 10110001 10001001