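Configuring the character set and collation of MySQL 5.7 and 8.0 to handle Japanese properly
- TL;DR
- The detailed explanation
The contents of this article have been confirmed to work as intended in the following environments.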
TL;DR
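To use UTF-8 as the string encoding in MySQL, create /etc/mysql/conf.d/charset.cnf with the contents shown below. The file name charset.cnf is arbitrary, but the extension must be .cnf.
utf8mb4 is almost certainly the right character set, but the collation leaves room for choice depending on what the application requires. The following settings are representative examples.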
MySQL 5.7
[mysql]
default_character_set = utf8mb4

[mysqld]
character_set_server = utf8mb4
collation_server = utf8mb4_bin
MySQL 8.0
[mysql]
default_character_set = utf8mb4

[mysqld]
character_set_server = utf8mb4
collation_server = utf8mb4_ja_0900_as_cs_ks
The detailed explanation
Unicode, UTF-16, and UTF-8
Before looking at how MySQL handles character sets and collations, let us first survey Unicode and UTF themselves.
Unicode is a coded character set that aims to include every character humankind has ever used. A UTF (UCS/Unicode Transformation Format) is a character encoding scheme: a convention for mapping Unicode code points to bit combinations.
Unicode defines
- group 0
- planes 0, 1, ..., 16
- rows 0, 1, ..., 255
- cells 0, 1, ..., 255
for a total of 1 * 17 * 256 * 256 = 1,114,112 code points.
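Latin letters, Arabic numerals, hiragana, katakana, and most of the kanji that present-day Japanese users deal with daily are almost all contained in plane 0 of group 0, the Basic Multilingual Plane (BMP). A single plane holds 256 * 256 = 2^16 = 65,536 code points, so if the characters of the BMP are all we need, a naive encoding scheme that maps every character to 2 octets suffices. This is the idea behind UTF-16. (Strictly speaking, the fixed 2-octet scheme is UCS-2; UTF-16 additionally represents code points outside the BMP as a pair of 2-octet units called a surrogate pair.)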
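The arithmetic above can be checked with a short Python sketch. Python is only an assumption for illustration here, and the variable names are hypothetical. The sketch also shows that a character outside the BMP takes two 2-octet units (a surrogate pair) in UTF-16.

```python
# Total number of Unicode code points: 1 group * 17 planes * 256 rows * 256 cells.
PLANES, ROWS, CELLS = 17, 256, 256
total_code_points = 1 * PLANES * ROWS * CELLS
plane_size = ROWS * CELLS

print(total_code_points)  # 1114112, i.e. U+0000 .. U+10FFFF
print(plane_size)         # 65536 == 2 ** 16

# A BMP character fits in one 2-octet UTF-16 unit ...
bmp_char = "\u6F22"            # the kanji U+6F22
# ... while a character outside the BMP needs a surrogate pair (4 octets).
non_bmp_char = "\U0001F4A9"    # the emoji U+1F4A9

utf16_len_bmp = len(bmp_char.encode("utf-16-be"))
utf16_len_non_bmp = len(non_bmp_char.encode("utf-16-be"))
print(utf16_len_bmp, utf16_len_non_bmp)
```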
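UTF-16 needs 2 octets per character even for Latin letters, so English text takes twice the storage it would in the traditional Latin-1. This is inefficient, and compatibility with Latin-1 is lost as well. For this reason, modern web applications and their surrounding technologies often use a different encoding scheme, UTF-8. As the table below shows, UTF-8 maps each character to a bit combination whose length depends on the code point.
Octets | Lower bound | Upper bound |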
---|---|---|
1 | U+0000 | U+007F |
2 | U+0080 | U+07FF |
3 | U+0800 | U+FFFF |
4 | U+10000 | U+10FFFF |
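The 128 characters in the range U+0000 to U+007F of Unicode are identical to ASCII, so the Latin letters and Arabic numerals contained in ASCII are represented as 1 character = 1 octet. As a result, UTF-8 encodes strings of Latin letters in less space than UTF-16.
In a Linux environment that uses UTF-8, the Latin letters C, J, and K are each encoded in 1 octet, the kanji 統, 合, 漢, and 字 in 3 octets each, and the emoji 💩 in 4 octets.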
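The octet counts can also be reproduced with a minimal Python sketch. Python is an assumption here rather than the tool used for the original check, and the variable names are hypothetical. The second half encodes the boundary code points from the table.

```python
# Number of octets each character occupies when encoded as UTF-8.
samples = ["C", "J", "K", "統", "合", "漢", "字", "\U0001F4A9"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, utf8_lengths):
    print(f"U+{ord(ch):04X} {ch}: {n} octet(s)")

# The lower and upper bounds of each row of the table.
boundaries = [0x007F, 0x0080, 0x07FF, 0x0800, 0xFFFF, 0x10000, 0x10FFFF]
boundary_lengths = [len(chr(cp).encode("utf-8")) for cp in boundaries]
print(boundary_lengths)
```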
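Applying this to MySQL
In MySQL, you can configure the character set and collation of the database server and the character set of the database client separately. Here, a character set is the set of characters the database handles together with the convention that maps each character one-to-one to a bit combination, and a collation is the convention that determines by what criteria characters are compared with each other when strings are sorted.
The database server's character set
The table below shows the default character set and collation of MySQL 5.7 and 8.0.
Version | Default character set | Default collation |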
---|---|---|
MySQL 5.7 | latin1 | latin1_swedish_ci |
MySQL 8.0 | utf8mb4 | utf8mb4_0900_ai_ci |
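Latin-1 (ISO/IEC 8859-1) is a coded character set that can represent English and the languages of Western Europe. It cannot represent hiragana, katakana, or kanji, so anyone using MySQL 5.7 in a Japanese-language context will almost certainly need to revisit these settings.
MySQL 8.0's default character set, on the other hand, is utf8mb4, which can properly represent hiragana, katakana, kanji, and emoji alike. The collation utf8mb4_0900_ai_ci is still worth reconsidering.
utf8mb4 is quite an odd name, and the MySQL world also has a character set named utf8mb3. As described earlier, UTF-8 encodes one Unicode code point as a variable-length bit combination of 1 to 4 octets. Code points in the BMP take 1 to 3 octets, and code points in the other planes take 4 octets. In MySQL, utf8mb3 is an encoding that supports only the 1- to 3-octet bit combinations. In other words, utf8mb3 can represent only the characters of the BMP. Many emoji live in plane 1 of group 0, the Supplementary Multilingual Plane, and some kanji such as 𠮷 (U+20BB7) live in plane 2, the Supplementary Ideographic Plane. Representing these characters properly requires the "full-spec" UTF-8 that maps one code point to a bit combination of 1 to 4 octets. This "full-spec" UTF-8 is precisely utf8mb4, and using utf8mb4 exclusively is what is currently recommended.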
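Whether a given string would survive in utf8mb3 can be estimated with a small Python sketch. The helper name fits_in_utf8mb3 is hypothetical and not part of MySQL or any library; it only checks that every code point lies in the BMP, which is the condition described above.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every code point is in the BMP (at most 3 octets in UTF-8)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# "\U0001F4A9" is an emoji in plane 1; "\U00020BB7" is a kanji in plane 2.
for text in ["Tokyo", "東京", "\U0001F4A9", "\U00020BB7"]:
    longest = max(len(ch.encode("utf-8")) for ch in text)
    print(text, fits_in_utf8mb3(text), longest)
```

Strings for which the helper returns False need a utf8mb4 column.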
From the discussion so far, utf8mb4 is the best choice for the character set of a modern MySQL database server.
The database server's collation
One or more collations are associated with each character set, and each character set is assigned a default collation. In MySQL 5.7, the default collation of utf8mb4 is utf8mb4_general_ci; in MySQL 8.0, it is utf8mb4_0900_ai_ci.
utf8mb4 has multiple collations, which can be listed with SHOW COLLATION WHERE Charset = 'utf8mb4';. The table below is limited to the representative ones that are likely to be useful in a Japanese environment.
To check how each collation behaves, issue the following SQL statements against test MySQL 5.7 and 8.0 databases. Replace <collation> with a collation value from the table above; if a duplicate error occurs while inserting a row, ignore it and move on to the next row.
DROP DATABASE IF EXISTS `collation_test`;
CREATE DATABASE `collation_test` CHARACTER SET utf8mb4 COLLATE <collation>;
USE `collation_test`;
CREATE TABLE `cities` (`name` VARCHAR(10) PRIMARY KEY);

-- Are upper- and lower-case Latin letters distinguished?
-- Is the presence of diacritical marks distinguished?
INSERT INTO `cities` (`name`) VALUES ('Tokyo');
INSERT INTO `cities` (`name`) VALUES ('Tôkyô');
INSERT INTO `cities` (`name`) VALUES ('Tōkyō');
INSERT INTO `cities` (`name`) VALUES ('TOKYO');
INSERT INTO `cities` (`name`) VALUES ('TÔKYÔ');
INSERT INTO `cities` (`name`) VALUES ('TŌKYŌ');

-- Are hiragana and katakana distinguished?
-- Are contracted sounds (small kana) distinguished?
INSERT INTO `cities` (`name`) VALUES ('とうきょう');
INSERT INTO `cities` (`name`) VALUES ('とうきよう');
INSERT INTO `cities` (`name`) VALUES ('トウキョウ');
INSERT INTO `cities` (`name`) VALUES ('トウキヨウ');

-- Are voiced sounds distinguished?
INSERT INTO `cities` (`name`) VALUES ('なごや');
INSERT INTO `cities` (`name`) VALUES ('なこや');

-- Are semi-voiced sounds and geminate consonants distinguished?
INSERT INTO `cities` (`name`) VALUES ('さっぽろ');
INSERT INTO `cities` (`name`) VALUES ('さつぽろ');
INSERT INTO `cities` (`name`) VALUES ('さっほろ');
INSERT INTO `cities` (`name`) VALUES ('さつほろ');

-- How are kanji ordered?
INSERT INTO `cities` (`name`) VALUES ('東京');
INSERT INTO `cities` (`name`) VALUES ('名古屋');
INSERT INTO `cities` (`name`) VALUES ('札幌');

SELECT `name` FROM `cities` ORDER BY `name`;
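The results of the SQL statements above for each collation were as follows.
MySQL 5.7
collation: utf8mb4_general_ci
Upper- and lower-case Latin letters and the presence of diacritical marks are not distinguished. Hiragana, katakana, and kanji are all distinguished.
collation: utf8mb4_bin
This is a plain binary comparison, so all of the example strings are distinguished.
collation: utf8mb4_unicode_ci
Upper- and lower-case Latin letters and the presence of diacritical marks are not distinguished. Hiragana and katakana are not distinguished either, nor are voiced, semi-voiced, contracted, or geminate sounds.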
MySQL 8.0
collation: utf8mb4_0900_ai_ci
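Upper- and lower-case Latin letters and diacritical marks such as the macron (¯) and the circumflex (^) are not distinguished, and only Tokyo is inserted. Hiragana and katakana are not distinguished either, nor are voiced, semi-voiced, contracted, or geminate sounds.
collation: utf8mb4_0900_bin
This is a plain binary comparison, so all of the example strings are distinguished.
collation: utf8mb4_general_ci
Upper- and lower-case Latin letters and the presence of diacritical marks are not distinguished. Hiragana, katakana, and kanji are all distinguished.
collation: utf8mb4_ja_0900_as_cs
This is one of the collations for Japanese. According to MySQL :: MySQL 8.0 Reference Manual :: 10.10.1 Unicode Character Sets, utf8mb4_ja_0900_as_cs does not distinguish hiragana from katakana, while utf8mb4_ja_0900_as_cs_ks, described next, does. Diacritical marks are distinguished.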
For Japanese, the utf8mb4 character set includes utf8mb4_ja_0900_as_cs and utf8mb4_ja_0900_as_cs_ks collations. Both collations are accent-sensitive and case-sensitive. utf8mb4_ja_0900_as_cs_ks is also kana-sensitive and distinguishes Katakana characters from Hiragana characters, whereas utf8mb4_ja_0900_as_cs treats Katakana and Hiragana characters as equal for sorting. Applications that require a Japanese collation but not kana sensitivity may use utf8mb4_ja_0900_as_cs for better sort performance. utf8mb4_ja_0900_as_cs uses three weight levels for sorting; utf8mb4_ja_0900_as_cs_ks uses four.
collation: utf8mb4_ja_0900_as_cs_ks
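As with utf8mb4_0900_bin, all of the example strings are distinguished. The sort order differs, however.
collation: utf8mb4_unicode_ci
Upper- and lower-case Latin letters and the presence of diacritical marks are not distinguished. Hiragana and katakana are not distinguished either, nor are voiced, semi-voiced, contracted, or geminate sounds.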
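To build some intuition for why a binary collation keeps every string distinct while an accent- and case-insensitive one merges them, here is a rough Python sketch. It is explicitly NOT MySQL's algorithm (MySQL's Unicode collations are based on the Unicode Collation Algorithm); the two key functions are hypothetical stand-ins that only imitate the observable behavior for the sample strings.

```python
import unicodedata


def binary_key(text: str) -> bytes:
    """Comparison key in the spirit of a *_bin collation: the raw UTF-8 bytes."""
    return text.encode("utf-8")


def rough_ai_ci_key(text: str) -> str:
    """Very rough accent- and case-insensitive key (an assumption, not MySQL's rule)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks such as the circumflex, the macron, and the voiced-sound mark.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


names = ["Tokyo", "Tôkyô", "Tōkyō", "TOKYO", "TÔKYÔ", "TŌKYŌ"]
distinct_binary = len({binary_key(n) for n in names})
distinct_rough = len({rough_ai_ci_key(n) for n in names})
print(distinct_binary)  # every spelling stays distinct
print(distinct_rough)   # every spelling collapses into one key

same_voiced = rough_ai_ci_key("なごや") == rough_ai_ci_key("なこや")
print(same_voiced)
```

The sketch does not try to merge hiragana with katakana or small kana with full-size kana; those equivalences come from the collation's weight tables and are not modeled here.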
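The database client's character set
Not only for MySQL but for any client-server DBMS, there is a wide variety of client software to choose from. One of the most primitive clients is mysql. On Debian-based systems, for example, it is installed together with mysqld when you run apt-get install mysql-server.
In most cases, utf8mb4 should be fine for the character set used by mysql, just as for the database server.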
Difference between `su` and `su -` in Bash on Linux, or Login Shell and Interactive Shell
Although this topic has been discussed since time immemorial, I want to summarize once more the difference between su and su - while explaining what login shells and interactive shells are.
What are Login Shells and Interactive Shells?
A login shell is one whose first character of argument zero is ‘-’, or one invoked with the --login option.
An interactive shell is one started without non-option arguments, unless -s is specified, without specifying the -c option, and whose input and output are both connected to terminals (as determined by isatty(3)), or one started with the -i option.1
After we log in to a Linux system in the usual way, the shell we are in is a login shell (and also an interactive one). We can confirm this by issuing the command below:
$ echo $0
-bash  # <- the first character of the output is '-'
We can enter a non-login, interactive shell by simply typing bash after logging in as above.
$ echo $0
-bash  # <- we are in a login shell
$ printenv SHLVL
1
$
$ bash
$  # <- it looks as if nothing has happened...
$
$ echo $0
bash  # <- but we are now in a non-login, interactive shell
$ printenv SHLVL
2
OK. Then what is the difference between the two? On the surface, we can interact with both in exactly the same way.
The difference becomes clear when you consider how bash initializes itself. Have you ever had trouble deciding where to put a line like export PATH="/path/to/your/bin:$PATH"? Some documents say ~/.bash_profile, while others say ~/.profile, ~/.bashrc, or something similar. In fact, it depends on your situation, and the rules are as follows:2
- When launched, a login shell first looks for /etc/profile and loads it if it exists. After that, the shell searches for the files below in this order. It loads the first one it finds and stops searching, so at most one of them is loaded.
  - ~/.bash_profile
  - ~/.bash_login
  - ~/.profile
- When launched, an interactive non-login shell looks for ~/.bashrc and loads it if it exists.
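The lookup rules above can be modeled with a small Python sketch. This is a simplified illustration, not bash's actual implementation: the function name bash_startup_files is hypothetical, and system-wide files other than /etc/profile, command-line options such as --norc, and non-interactive shells are deliberately ignored.

```python
from typing import List, Set


def bash_startup_files(login: bool, interactive: bool, existing: Set[str]) -> List[str]:
    """Return the startup files bash would read, in order (simplified model)."""
    loaded: List[str] = []
    if login:
        if "/etc/profile" in existing:
            loaded.append("/etc/profile")
        # Only the first candidate that exists is read; the rest are skipped.
        for candidate in ("~/.bash_profile", "~/.bash_login", "~/.profile"):
            if candidate in existing:
                loaded.append(candidate)
                break
    elif interactive:
        if "~/.bashrc" in existing:
            loaded.append("~/.bashrc")
    return loaded


files = {"/etc/profile", "~/.profile", "~/.bashrc"}
print(bash_startup_files(login=True, interactive=True, existing=files))
print(bash_startup_files(login=False, interactive=True, existing=files))
```

In this model a login shell never reads ~/.bashrc on its own, which is why it has to be sourced explicitly from one of the login files.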
So it is better to put environment variable declarations in ~/.bash_profile or one of its siblings. ~/.bashrc is a good place for shell options, functions, and aliases. Since ~/.bashrc is NOT automatically loaded by a login shell, many Linux distributions ship default settings that load ~/.bashrc from /etc/profile, ~/.bash_profile, or the like. As an example, Ubuntu's ~/.profile is shown below:
# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
The script above loads $HOME/.bashrc if it exists.
What is the Difference between su and su -?
Now that we understand login shells and interactive shells, we can grasp the meaning of su and su -.
su stands for "substitute user" and lets us switch to another user after we have logged in. Suppose we are user alice and want to switch to user bob. We can do that as follows:
alice@host:~$ su bob
Password: ***  # <- input Bob's password
bob@host:/home/alice$  # <- we have switched to Bob, but we are still in Alice's home directory
The problem here, however, is that we are still in Alice's home directory, and the environment is not set up as it would be if Bob had logged in to the system directly.
In practice, Bob would have his own environment variables. For example, assume Bob has installed tj/n to manage multiple versions of Node.js, and that his ~/.profile sets the environment variable $N_PREFIX to /home/bob/n and puts $N_PREFIX/bin at the beginning of $PATH, like below.
# /home/bob/.profile

# ...some other settings...

# Initialize tj/n
export N_PREFIX="$HOME/n"
export PATH="$N_PREFIX/bin:$PATH"
When Alice switches to Bob as in the previous example, she would expect the node installed with tj/n to be on $PATH. In this case, however, it is not, and she cannot use node as smoothly as Bob would.
alice@host:~$ printenv N_PREFIX  # <- $N_PREFIX not set
alice@host:~$ which n >/dev/null 2>&1 && which node >/dev/null 2>&1 && node -v || echo "node unavailable"
node unavailable
alice@host:~$ su bob
Password: ***
bob@host:/home/alice$ printenv N_PREFIX  # <- $N_PREFIX still not set
bob@host:/home/alice$ which n >/dev/null 2>&1 && which node >/dev/null 2>&1 && node -v || echo "node unavailable"
node unavailable
To solve this problem, we can use su's - option. With this option, su spawns a brand new bash as a login shell when we switch users, just as if we had logged in to the system as the target user.
alice@host:~$ printenv N_PREFIX  # <- $N_PREFIX not set
alice@host:~$ which n >/dev/null 2>&1 && which node >/dev/null 2>&1 && node -v || echo "node unavailable"
node unavailable
alice@host:~$ su - bob
Password: ***
bob@host:~$  # <- unlike the above example, we are immediately in Bob's home directory
bob@host:~$ printenv N_PREFIX
/home/bob/n  # <- $N_PREFIX is set
bob@host:~$ which n >/dev/null 2>&1 && which node >/dev/null 2>&1 && node -v || echo "node unavailable"
v14.15.5  # <- now, node is available
For this reason, using the - option is safer and more natural. It is generally recommended to use the - option with su.
When we issue su or su - without a target username, we switch to the root user.
$ su -
Password: ***  # <- input root user's password
#  # <- switched to root user
In the example above, we type in the root user's password. In some Linux distributions, including Ubuntu, however, the root user has no password set by default. Even in that case, we can switch to the root user if our user is a sudoer, that is, a user who can invoke sudo.
$ sudo su -
Password: ***  # <- input your password (depending on settings, you may not have to input anything)
#  # <- switched to root user
In conclusion, we can issue sudo su - to switch to the root user without having to set the root user's password, while setting up the root user's environment naturally.
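The case for not needing pyenv
For writing ephemeral Python scripts and small experimental projects locally on macOS, I have come to the conclusion that pyenv is not needed. I install Python with Homebrew and use it directly. The following Python formulae are installed in my current environment.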
python@3.7
python@3.8
python@3.9
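I did not install these deliberately; they happened to be installed as dependencies.
For my purposes, having the latest version is enough in most cases, and being able to choose among these three versions is plenty.
In ~/.config/fish/config.fish, I added them to $PATH with the newer versions taking precedence.
set -p fish_user_paths \
    "/usr/local/opt/python@3.9/bin" \
    "/usr/local/opt/python@3.8/bin" \
    "/usr/local/opt/python@3.7/bin"
This makes the following commands of the form python3.x available.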
Executable | Version |
---|---|
python3 | 3.9 |
python3.9 | 3.9 |
python3.8 | 3.8 |
python3.7 | 3.7 |
I am still feeling my way around virtualenvs, but I go with Poetry first and Pipenv as the runner-up.
$ python3 -m pip install --user poetry
$ python3 -m pip install --user pipenv
$
$ cd /your/workspace/dir
$
$ # init virtualenv
$ python3 -m poetry init
$ # or
$ python3 -m pipenv --python (which python3)
$
$ # activate virtualenv
$ source .venv/bin/activate.fish
Chap 4 | Syntax in Functions
Pattern matching
With pattern matching, you can check whether a value conforms to some pattern and deconstruct it. A function can have several bodies for different patterns. The function below checks whether its parameter is in the range [1, 3]. If the parameter is 1, 2, or 3, it returns "One", "Two", or "Three" respectively. If not, it returns "Not in [1, 3]".
showInt :: Integral a => a -> String
showInt 1 = "One"
showInt 2 = "Two"
showInt 3 = "Three"
showInt x = "Not in [1, 3]"
Pattern matching is much like the switch statement in C-like languages. An argument is checked against the patterns from top to bottom. When the argument conforms to a pattern, the corresponding function body is used and all the following patterns and bodies are ignored, just like a switch statement with break keywords. If the argument doesn't conform to any of the first 3 specific patterns, it falls through to the last pattern and is bound to x.
We can define a factorial function as below. Had we swapped the two patterns, the factorial 0 = 1 pattern would never be reached and the recursion would never stop.
factorial :: Integral a => a -> a
factorial 0 = 1
factorial n = n * factorial (pred n)
If we don't place a catch-all pattern at the end, the function may fail. When such a function is called with an argument that doesn't match any of its patterns, it doesn't know what to do with it and raises a runtime error.
Patterns also work with tuples. We can define a function that takes two 2D vectors (pairs) and adds them up.
addVectors :: Num a => (a, a) -> (a, a) -> (a, a)
addVectors (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

ghci> addVectors (1,0) (0,1)
(1,1)
We can define our own version of the fst function as below.
myFst :: (a, b) -> a
myFst (a, _) = a
myFst takes a pair of values of any types and returns the first one. As we don't care about the types of the tuple members, we wrote the type declaration as (a, b) -> a, which means it takes a pair of any types and returns a value whose type is the same as that of the first member. Since we don't care about the second member of the tuple, the pattern says (a, _). _ in a pattern means that the value in that position is not used in the body.
As for lists, we can extract items or sublists with :.
myHead :: [a] -> a
myHead [] = error "Can't call myHead on an empty list."
myHead (x:_) = x

ghci> myHead "hello"
'h'
ghci> myHead [1,2,3,4]
1

--

myTail :: [a] -> [a]
myTail [] = error "Can't call myTail on an empty list."
myTail (_:x) = x

ghci> myTail "hello"
"ello"
ghci> myTail [1,2,3,4]
[2,3,4]
Notice that we have to surround these patterns with parentheses.
We can write a function that takes a list of any type in the Show typeclass and tells us something about it.
tell :: Show a => [a] -> String
tell [] = "The list is empty"
tell (x:[]) = "The list has one element: " ++ show x
tell (x:y:[]) = "The list has two elements: " ++ show x ++ " and " ++ show y
tell (x:y:_) = "This list is long. The first two elements are: " ++ show x ++ " and " ++ show y
As [1,2,3] is just syntactic sugar for 1:2:3:[], the second and third patterns can also be written as below.
tell [x] = "The list has one element: " ++ show x
tell [x,y] = "The list has two elements: " ++ show x ++ " and " ++ show y
Be careful: we cannot replace (x:y:_) with [x,y,_] or anything like that, because [x,y,_] matches only lists of exactly three elements.
There is also a thing called as-patterns. With them, we can break a parameter into components as above while keeping a reference to the whole. For example, by prefixing a pattern with all@ as below, you get a name all that refers to the whole parameter (the whole string in this example).
capital :: String -> String
capital "" = "Empty string"
capital all@(x:_) = "The first letter of '" ++ all ++ "' is '" ++ [x] ++ "'"
Guards, guards!
While patterns branch on the form of a function's parameters, guards branch inside a function body. Patterns make sure that parameters have some form and deconstruct them. Guards, on the other hand, test the deconstructed values, and you can then do something according to the results.
bmiTell :: RealFloat a => a -> a -> String
bmiTell weight height
    | weight / height ^ 2 <= 18.5 = "You're underweight"
    | weight / height ^ 2 <= 25.0 = "You're normal"
    | weight / height ^ 2 <= 30.0 = "You're fat"
    | otherwise = "You're fucking awesome"

ghci> bmiTell 60 1.7
"You're normal"
The function bmiTell takes 2 RealFloat numbers, calculates a BMI, and tells us something based on it. Conditions are written between | and = as above. Like switch statements in C-like languages, conditions are evaluated from top to bottom, and when one evaluates to True, the expression following = is returned from the function. The last guard is often otherwise, which simply evaluates to True and catches everything.
Guards check specific conditions on parameters, while patterns check whether parameters have a certain form. If all guards evaluate to False, evaluation falls through to the next pattern. If no suitable function body is found among all patterns and guards, an error is thrown.
Where!?
The previous code is not DRY. We can fix this with where.
bmiTell :: RealFloat a => a -> a -> String
bmiTell weight height
    | bmi <= 18.5 = "You're underweight"
    | bmi <= 25.0 = "You're normal"
    | bmi <= 30.0 = "You're fat"
    | otherwise = "You're fucking awesome"
    where bmi = weight / height ^ 2
Variables defined inside a where clause are visible across all guards but are not shared among the function bodies of different patterns. You can define as many variables as you like.
bmiTell :: RealFloat a => a -> a -> String
bmiTell weight height
    | bmi <= skinny = "You're underweight"
    | bmi <= normal = "You're normal"
    | bmi <= fat = "You're fat"
    | otherwise = "You're fucking awesome"
    where bmi = weight / height ^ 2
          skinny = 18.5
          normal = 25.0
          fat = 30.0
You can also bind components of values by pattern matching in a where clause, as we did with function parameters.
initials :: String -> String -> String
initials firstname lastname = [f] ++ ". " ++ [l] ++ "."
    where (f:_) = firstname
          (l:_) = lastname
We could do the same thing by pattern matching directly on the function parameters, and that is often the better way (this is just a demo). Functions can also be defined in a where clause.
calcBmis :: RealFloat a => [(a, a)] -> [a]
calcBmis xs = [bmi w h | (w, h) <- xs]
    where bmi weight height = weight / height ^ 2

ghci> calcBmis [(50, 1.7), (60, 1.7), (70, 1.7)]
[17.301038062283737,20.761245674740486,24.221453287197235]
where can be nested. It's a common idiom to define a function with helper functions, which in turn have helper functions of their own.
Let it be
Let bindings are expressions in which you can bind values to variables. They are used like:
let var1 = val1
    var2 = val2
in var1 + var2
In the example above, we assume that val1 and val2 have the same type and can be added. You can also define functions in the let part:
ghci> [let square x = x * x in (square 5, square 3, square 2)]
[(25,9,4)]
let can also be put inside list comprehensions.
calcBmis :: RealFloat a => [(a, a)] -> [a]
calcBmis xs = [bmi | (w, h) <- xs, let bmi = w / h ^ 2]
Case expressions
case expression of pattern -> result
                   pattern -> result
                   pattern -> result
                   ...
Pattern matching on parameters in function definitions is just syntactic sugar for case expressions. The two definitions below are equivalent and interchangeable.
myHead :: [a] -> a
myHead [] = error "Can't call myHead on an empty list."
myHead (x:_) = x

myHead :: [a] -> a
myHead xs = case xs of [] -> error "Can't call myHead on an empty list."
                       (x:_) -> x
Chap 3 | Types and Typeclasses
Believe the type
We can get the type of an expression with the :t command in GHCi.
ghci> :t 'a'
'a' :: Char
:: is read as "has type of". In the example above, Haskell says "'a' has type of Char". Explicit types always start with a capital letter, for example Char, Bool, or Int. Square brackets in a type name denote a list. [Char] is read as "a list of characters" or "a string". Each tuple has its own type. (Bool, Char) is different from (Char, Char, Char).
Functions also have types. We can optionally give a function an explicit type declaration when defining it. This is generally considered good practice.
removeNonUppercase :: [Char] -> [Char]
removeNonUppercase st = [c | c <- st, c `elem` ['A'..'Z']]
removeNonUppercase has type of [Char] -> [Char], meaning that it maps from a String to a String. In fact, String is a synonym for [Char], so we can also write the previous type declaration as below.
removeNonUppercase :: String -> String
For a function that takes several parameters, we do as below.
addThreeInts :: Int -> Int -> Int -> Int
addThreeInts x y z = x + y + z
It takes three Ints, adds them up, and returns the result. There is no explicit distinction between parameters and the return value in a type declaration. They are just separated by ->. The last item in the -> chain is simply the return value's type.
An overview of some common types
- Int is an N-bit signed integer on N-bit machines. On a 64-bit system, Int is bounded inside [-9223372036854775808, 9223372036854775807].
- Integer is also an integer, except that it isn't bounded in any range.
- Float is a single precision floating point number.
- Double is a double precision floating point number.
- Bool is a boolean. The only two values are True and False.
- Char is a character. It's denoted by single quotes. A list of Char is a String.
Type variables
The built-in function head has type of [a] -> a.
ghci> :t head
head :: [a] -> a
Here, a is a type variable. It isn't a type. (Remember, explicit types start with a capital letter.) A type variable can stand for any type. This is like generics in other languages. Functions that have type variables in their declaration are called polymorphic functions. head is polymorphic. It takes a list of any type and returns an element of that type.
Typeclasses 101
Typeclasses are like interfaces in other languages. If a type is a member of a typeclass, the type supports and implements the behavior the typeclass specifies.
In Haskell, == is an infix function, and it has a type. (Infix functions have to be surrounded by parentheses when they are not placed between operands.)
ghci> :t (==)
(==) :: Eq a => a -> a -> Bool
--      ^^^^    ^^^^^^    ^^^^
--      |       |         Return value
--      |       Parameters
--      Typeclass
Eq a => is new here. The part placed before => is called a class constraint. The declaration above means that the == function takes exactly two parameters of the same type and returns a value of type Bool. In addition, the type of the two parameters has to be a member of the typeclass Eq; that is, equality has to be defined for it.
Some basic typeclasses
- Eq is for types that support equality testing. Its members have to implement == and /=.
- Ord is for types that have an ordering. Its members have to implement >, <, >=, and <=.
- Show is for types that can be shown to a human. One of the most important functions that members of Show implement is show.
- Read is sort of the opposite typeclass of Show. The built-in read function takes a String value and returns a value whose type is a member of the Read typeclass.
- Enum members are sequentially ordered types. Values of types that implement Enum have successors and predecessors, which you can get with succ and pred respectively. Types in this typeclass are (), Bool, Char, Ordering, Int, Integer, Float, and Double.
- Bounded members have an upper and a lower bound. The built-in functions minBound and maxBound have type of Bounded a => a. They take no parameters. They're, so to speak, polymorphic constants. Tuples are bounded if their members are all bounded.
- Num is a numeric typeclass. For example, the type of the literal 20 is Num p => p, so it is also a polymorphic constant. Int, Integer, Float, and Double are members of the Num typeclass.
- Integral is also a numeric typeclass, but it only includes Int and Integer.
- Floating is also a numeric typeclass, but it only includes Float and Double.
Hierarchical structure of numeric typeclasses
![f:id:tmsick:20200119182309p:plain f:id:tmsick:20200119182309p:plain](https://cdn-ak.f.st-hatena.com/images/fotolife/t/tmsick/20200119/20200119182309.png)
Notes on the unhelpful parts of 『Rによる実証分析 回帰分析から因果分析へ』
ISBN: 978-4274219474
pp.68-69
\[\begin{eqnarray} MSE &=& \frac{1}{n} \sum_{i = 1}^n \underbrace{ (y_i - y^*)^2 }_{\text{expand}} \nonumber \\ &=& \frac{1}{n} \sum_{i = 1}^n (y_i^2 - 2 y_i y^* + {y^*}^2) \nonumber \\ &=& \frac{1}{n} \sum_{i = 1}^n y_i^2 - \frac{2}{n} y^* \sum_{i = 1}^n y_i + \frac{1}{n} {y^*}^2 \underbrace{ \sum_{i = 1}^{n} 1 }_{n} \nonumber \\ &=& \frac{1}{n} \sum_{i = 1}^n y_i^2 - \frac{2}{n} y^* \sum_{i = 1}^n y_i + {y^*}^2 \nonumber \\ &=& \underbrace{ (y^* - \frac{1}{n} \sum_{i = 1}^n y_i)^2 - (\frac{1}{n} \sum_{i = 1}^n y_i)^2 }_{\text{complete the square in } y^*} + \frac{1}{n} \sum_{i = 1}^n y_i^2 \nonumber\end{eqnarray}\]
In the last line, only the first term depends on \(y^*\), and it is a square, so \(MSE\) is smallest when that term is zero. The condition for minimizing \(MSE\) is therefore:
\[\begin{equation} y^* - \frac{1}{n} \sum_{i = 1}^n y_i = 0 \Leftrightarrow y^* = \frac{1}{n} \sum_{i = 1}^n y_i \end{equation}\]
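The derivation can be sanity-checked numerically. The Python sketch below uses made-up sample data (the list ys is an assumption, not taken from the book) and hypothetical helper names; it compares the definition of MSE with the completed-square form and confirms that the sample mean gives the smallest value among a few candidates.

```python
ys = [2.0, 3.0, 5.0, 10.0]  # made-up sample data
n = len(ys)
mean = sum(ys) / n


def mse(y_star: float) -> float:
    """MSE by definition: the mean of the squared deviations from y_star."""
    return sum((y - y_star) ** 2 for y in ys) / n


def completed_square(y_star: float) -> float:
    """Last line of the derivation: (y* - mean)^2 - mean^2 + mean of squares."""
    return (y_star - mean) ** 2 - mean ** 2 + sum(y * y for y in ys) / n


candidates = [mean + d for d in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
for c in candidates:
    print(c, mse(c), completed_square(c))

best = min(candidates, key=mse)
print(best == mean)
```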
Time Machine backups with the Synology DiskStation DS119j
TL;DR
- Time Machine backups work on the DS119j without any problems
- The video below is the easiest-to-follow guide to the setup
I bought a Synology DiskStation DS119j and a Seagate IronWolf 2TB. It is my first NAS.
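As its "for beginners" label suggests, the DS119j is extremely easy to set up. The packaging is solid. The screws are poorly made, though this may vary from unit to unit.
As for the file system, the DS119j supports only EXT4, and Btrfs cannot be used. However, as shown in the video mentioned earlier, if you create a user dedicated to Time Machine and set that user's quota appropriately, the volume will not be taken over by backup data. I simply allotted 1TB to the Time Machine user.
The initial Time Machine backup is extraordinarily slow.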