Postgresql 笔记
Postgresql 学习使用中的一些技巧和心得,供以后参考。
Backup & restore
pg_dump: https://stackoverflow.com/questions/24718706/backup-restore-a-dockerized-postgresql-database
docker exec -t oms-db pg_dumpall -c -U sa | gzip > ./dump_$(date +"%Y-%m-%d_%H_%M_%S").gz
gunzip < your_dump.sql.gz | docker exec -i oms-db psql -U sa -d ck
Column alias
-
Column aliases can be used in the SELECT list of a SQL query in PostgreSQL.
-
Like all objects, aliases will be in lowercase by default. If mixed-case letters or special symbols, or spaces are required, quotes must be used.
-
Column aliases can be used for derived columns.
-
Column aliases can be used with GROUP BY and ORDER BY clauses.
-
We cannot use a column alias with WHERE and HAVING clauses.
注意: 无法在 where 和 having 使用 column alias
ck=# select id, wx_nickname || ' ' || real_name as name, case when length(real_name) > 0 then real_name else wx_nickname end as name2 from auth_user where name2 like '2%' order by id desc limit 30;
ERROR: column "name2" does not exist
LINE 1: ...lse wx_nickname end as name2 from auth_user where name2 like..
原因及解决方案:
https://dba.stackexchange.com/questions/225874/using-column-alias-in-a-where-clause-doesnt-work
The (historic) reason behind this is the sequence of events in a SELECT query. WHERE and HAVING are resolved before column aliases are considered, while GROUP BY and ORDER BY happen later, after column aliases have been applied.
使用sub query https://www.postgresqltutorial.com/postgresql-subquery/
ck=# select id, name2 from (select id, wx_nickname || ' ' || real_name as name, case when length(real_name) > 0 then real_name else wx_nickname end as name2 from auth_user order by id desc) as t where name2 like '2%';
id | name2
----+-------
15 | 22
1 | 222
(2 rows)
Explain analyze
https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan
Time series data model (时序数据)
https://medium.com/@neslinesli93/how-to-efficiently-store-and-query-time-series-data-90313ff0ec20
https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae
Graph data model(图数据)
https://news.ycombinator.com/item?id=10316872
WITH Queries (Common Table Expressions CTE)
https://www.postgresql.org/docs/current/queries-with.html
https://www.postgresqltutorial.com/postgresql-subquery/
主要用来查询层级数据,例如返回id为2的用户的所有下属
CREATE TABLE employees (
employee_id serial PRIMARY KEY,
full_name VARCHAR NOT NULL,
manager_id INT
);
WITH RECURSIVE subordinates AS (
SELECT
employee_id,
manager_id,
full_name
FROM
employees
WHERE
employee_id = 2
UNION
SELECT
e.employee_id,
e.manager_id,
e.full_name
FROM
employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SELECT
*
FROM
subordinates;
JSONB column
https://rollout.io/blog/unleash-the-power-of-storing-json-in-postgres/
关系型数据库为什么要存储 json 数据?
-
Avoid complicated joins on data that is siloed or isolated. 避免对孤立的数据进行无意义的join操作。
-
Avoid transforming data before returning it via your JSON API. json 数据无需处理可以直接返回。
例如: 供应商有多种工艺属性,两种实现方案:
-
可以选择新建工艺表与供应商表关联;
-
也可以新增 data 字段,类型为 jsonb, {“process”: [“机加工”, “版金”]}
按照中文拼音排序
https://github.com/digoal/blog/blob/master/201612/20161205_01.md
select count(*)
数据量在 100w 级别的表上,执行该语句需要 100多毫秒,因为需要全表扫描。
https://www.cybertec-postgresql.com/en/postgresql-count-made-fast/
优化方案:
如果业务上对这个数字的实时性要求不高,可以在应用这边做缓存,1小时过期。
unnest (list join table)
SELECT t.*
FROM unnest(ARRAY[1,2,3,2,3,5]) item_id
LEFT JOIN items t on t.id=item_id
https://stackoverflow.com/questions/2486725/postgresql-join-with-array-type-with-array-elements-order-how-to-implement
Listen/Notify & Trigger
https://blog.lelonek.me/listen-and-notify-postgresql-commands-in-elixir-187c49597851
postgres 支持异步pub/sub,客户端可以收到数据库的消息通知。
使用场景:
-
数据表实时备份,对应用透明。
-
不修改A应用逻辑下,B应用能实时同步到A应用的数据。
缺点
数据库连接中断后,需要重新监听,存在数据丢失的情况。
解决方案:使用触发器把数据写入新的临时表,再异步消费。参考:https://billtian.github.io/digoal.blog/2018/07/13/03.html
实现步骤:
1. 创建数据库函数,发送通知到 test channel
CREATE OR REPLACE FUNCTION notify_op_log_changes()
RETURNS trigger AS $$
BEGIN
PERFORM pg_notify(
'test',
json_build_object(
'operation', TG_OP,
'record', row_to_json(NEW)
)::text
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
2. 创建触发器,数据插入或更新后执行上面的函数
AFTER INSERT OR UPDATE
ON cybase_op_log
FOR EACH ROW
EXECUTE PROCEDURE notify_op_log_changes();
3. 测试脚本,订阅 test channel,接收消息
conn = psycopg2.connect(...)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
curs = conn.cursor()
curs.execute("LISTEN test;")
print("Waiting for notifications on channel 'test'")
while True:
if select.select([conn],[],[],5) == ([],[],[]):
print("Timeout")
else:
conn.poll()
while conn.notifies:
notify = conn.notifies.pop(0)
print("Got NOTIFY:", notify.pid, notify.channel, notify.payload)