I recently ran into a nasty pitfall: pandas.read_json silently alters the digits when it reads long integers.
Here is the code that reproduces it:
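import json
from io import StringIO
import pandas as pd
# IDs stored as strings of digits
data = {
    "id1": "3661430294729648121",
    "id2": "1298519559306190850",
    "id3": "9999999999999999",
}
# Wrap the JSON text in StringIO: newer pandas versions deprecate passing a literal JSON string
df = pd.read_json(StringIO(json.dumps(data)), orient='index')
print(df)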
The output is:

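After digging for a while, I found the answer in https://github.com/pandas-dev/pandas/issues/20608 and https://github.com/pandas-dev/pandas/issues/33766.
The main cause is that, by default, pandas converts the values to float and then back to int, so integers too large for a float64 to represent exactly come back changed.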
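The float round trip can be reproduced in plain Python, without pandas. This is only a minimal sketch of the effect, not pandas' actual conversion code, and the helper name `float_round_trip` is made up for illustration. A float64 has a 53-bit significand, so integers above 2**53 generally cannot be stored exactly.

```python
def float_round_trip(digits: str) -> int:
    """Parse a digit string as a float, then cast back to int (hypothetical helper)."""
    return int(float(digits))


ids = ["3661430294729648121", "1298519559306190850", "9999999999999999"]

for s in ids:
    exact = int(s)                 # Python ints are arbitrary precision
    mangled = float_round_trip(s)  # limited to float64 precision
    print(s, "->", mangled, "changed:", exact != mangled)
```

All three IDs are larger than 2**53, which is why none of them survives the trip unchanged.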

The fix is to pass dtype={} when reading the data, so the code becomes:
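import json
from io import StringIO
import pandas as pd
# IDs stored as strings of digits
data = {
    "id1": "3661430294729648121",
    "id2": "1298519559306190850",
    "id3": "9999999999999999",
}
# dtype={} stops read_json from inferring dtypes, so the values are left as they are
df = pd.read_json(StringIO(json.dumps(data)), orient='index', dtype={})
print(df)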
With that change the output is correct.
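As a hedged alternative, the pandas documentation describes `dtype=False` as turning off dtype inference for the data, which should have the same effect here as the empty dict. The sketch below assumes a reasonably recent pandas and wraps the JSON text in `StringIO`; the variable names are illustrative only.

```python
import json
from io import StringIO

import pandas as pd

data = {
    "id1": "3661430294729648121",
    "id2": "1298519559306190850",
    "id3": "9999999999999999",
}

# dtype=False asks read_json not to infer dtypes, so the ID strings are left alone
df = pd.read_json(StringIO(json.dumps(data)), orient="index", dtype=False)

# The single data column should still hold the original digit strings
values = df.iloc[:, 0].tolist()
print(values)

# Convert with Python's arbitrary-precision int only where arithmetic is needed
as_ints = [int(v) for v in values]
```

Keeping IDs as strings is usually the safer choice, since they are identifiers rather than quantities.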
