Issue with joining repeated values/rows

Shared on 2023-04-15 · indexing pandas pandas-groupby python · Q&A
【Question title】: Issue with joining repeated values/rows
【Posted】: 2023-04-14 21:49:02
【Question】:

I'm new to Python and can't figure out how to proceed. After binning and editing my DataFrame, I ended up with this:

    Continents     % Renewable       Country
0   Asia           (15.753, 29.227]  China
1   North America  (2.212, 15.753]   United States
2   Asia           (2.212, 15.753]   Japan
3   Europe         (2.212, 15.753]   United Kingdom
4   Europe         (15.753, 29.227]  Russian Federation
5   North America  (56.174, 69.648]  Canada
6   Europe         (15.753, 29.227]  Germany
7   Asia           (2.212, 15.753]   India
8   Europe         (15.753, 29.227]  France
9   Asia           (2.212, 15.753]   South Korea
10  Europe         (29.227, 42.701]  Italy
11  Europe         (29.227, 42.701]  Spain
12  Asia           (2.212, 15.753]   Iran
13  Australia      (2.212, 15.753]   Australia
14  South America  (56.174, 69.648]  Brazil

Now, when I set Continents and % Renewable as a MultiIndex using:

Top15 = Top15.groupby(by=['Continents', '% Renewable']).sum()

I get the following:

                                   Country
Continents     % Renewable
Asia           (15.753, 29.227]    China
               (2.212, 15.753]     JapanIndiaSouth KoreaIran
Australia      (2.212, 15.753]     Australia
Europe         (15.753, 29.227]    Russian FederationGermanyFrance
               (2.212, 15.753]     United Kingdom
               (29.227, 42.701]    ItalySpain
North America  (2.212, 15.753]     United States
               (56.174, 69.648]    Canada
South America  (56.174, 69.648]    Brazil

Now I want a column that tells me the number of countries in each index entry, i.e.:

first row: China = 1,

second row: JapanIndiaSouth KoreaIran = 4

So in the end I want something like this:

Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1

I just don't know how to get there.

Also, the counts need to be sorted in descending order while still keeping the index grouping.


【Solution 1】:

Solution with size:

See also: What is the difference between size and count in pandas?

# Top15 here is the original, ungrouped DataFrame (before the .sum() from the question)
print (Top15.groupby(['Continents', '% Renewable']).size())
Continents     % Renewable     
Asia           (15.753, 29.227]    1
               (2.212, 15.753]     4
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (2.212, 15.753]     1
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
dtype: int64
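Since the linked question is about size versus count, here is a minimal sketch of the difference, using a hypothetical three-row frame in which one Country value is deliberately missing. size() counts every row in a group, while count() on the Country column skips missing values; for the question's data, where every row has a country, both should give the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second Country value is missing on purpose
df = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Asia"],
    "% Renewable": ["(2.212, 15.753]"] * 3,
    "Country": ["Japan", np.nan, "Iran"],
})

grouped = df.groupby(["Continents", "% Renewable"])

sizes = grouped.size()               # rows per group, missing values included
counts = grouped["Country"].count()  # non-missing Country values per group

print(sizes)   # the single group has size 3
print(counts)  # the single group has count 2
```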

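If you need to change the order, use sort_values; to sort by a column, first turn the result into a DataFrame with reset_index, and finally, if you need the MultiIndex back, add set_index: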

print(Top15.groupby(['Continents', '% Renewable'])
           .size()
           .reset_index(name='COUNT')
           .sort_values(['Continents', 'COUNT'], ascending=[True, False])
           .set_index(['Continents', '% Renewable'])['COUNT'])

Continents     % Renewable     
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (29.227, 42.701]    2
               (2.212, 15.753]     1
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: COUNT, dtype: int64
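If you also want to keep the country names from the question's summed output next to the count, the same reset_index / sort_values / set_index chain can be fed from a named aggregation instead of size(). This is a sketch, not part of the answer above: it uses only the Asia and Europe rows from the question, assumes % Renewable is stored as plain strings, and joins the names with ", " (an arbitrary choice) instead of running them together as sum() does. The column names Countries and COUNT are illustrative.

```python
import pandas as pd

# Asia and Europe rows from the question; bins stored as strings (assumption)
df = pd.DataFrame({
    "Continents": ["Asia"] * 5 + ["Europe"] * 6,
    "% Renewable": [
        "(15.753, 29.227]", "(2.212, 15.753]", "(2.212, 15.753]",
        "(2.212, 15.753]", "(2.212, 15.753]",
        "(2.212, 15.753]", "(15.753, 29.227]", "(15.753, 29.227]",
        "(15.753, 29.227]", "(29.227, 42.701]", "(29.227, 42.701]",
    ],
    "Country": [
        "China", "Japan", "India", "South Korea", "Iran",
        "United Kingdom", "Russian Federation", "Germany", "France",
        "Italy", "Spain",
    ],
})

result = (
    df.groupby(["Continents", "% Renewable"])
      # One pass over the ungrouped rows: joined names plus a per-group count
      # ("count" equals size here because no Country value is missing)
      .agg(Countries=("Country", ", ".join), COUNT=("Country", "count"))
      .reset_index()
      # Continents A-Z, then the largest count first inside each continent
      .sort_values(["Continents", "COUNT"], ascending=[True, False])
      .set_index(["Continents", "% Renewable"])
)
print(result)
```

The count has to be taken from the original rows: once sum() has concatenated the names into one string per group, the number of countries is no longer directly recoverable.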

【Discussion】:

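  • Thanks, but is there a way to sort it in descending order? For example, for Asia, the row with 4 should come first and the row with 1 next.
  • Sorry about this, but I need Asia 4 and Asia 1 to stay together, i.e. the first rows are Asia with value 4 and then 1, then Australia's rows with values 3, then 2, then 1. Basically the index takes priority, sorted A-Z, and then the values within it.
  • Thanks everyone for the help! Got the answer!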
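For the ordering described in the comments (continents A-Z, counts descending inside each continent), another option is to stay with the Series that size() returns and sort each continent's block separately. A sketch that uses only the question's Asia and Europe rows and assumes the bins are plain strings: groupby(level=...) with group_keys=False and apply sorts every group by value without adding an extra index level, and groupby's default sort=True returns the continents in A-Z order.

```python
import pandas as pd

# Asia and Europe rows from the question; bins stored as strings (assumption)
df = pd.DataFrame({
    "Continents": ["Asia"] * 5 + ["Europe"] * 6,
    "% Renewable": [
        "(15.753, 29.227]", "(2.212, 15.753]", "(2.212, 15.753]",
        "(2.212, 15.753]", "(2.212, 15.753]",
        "(2.212, 15.753]", "(15.753, 29.227]", "(15.753, 29.227]",
        "(15.753, 29.227]", "(29.227, 42.701]", "(29.227, 42.701]",
    ],
    "Country": [
        "China", "Japan", "India", "South Korea", "Iran",
        "United Kingdom", "Russian Federation", "Germany", "France",
        "Italy", "Spain",
    ],
})

sizes = df.groupby(["Continents", "% Renewable"]).size()

# Sort each continent's rows by count, largest first; group_keys=False keeps
# the original two-level index instead of prepending the continent again
ordered = sizes.groupby(level="Continents", group_keys=False).apply(
    lambda block: block.sort_values(ascending=False)
)
print(ordered)
```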