Charming ['ㅡ'] Ham !

Python | Structured Data (Dictionaries, pandas, DataFrames)

Charming_ham 2021. 1. 15. 14:19

Structured Data

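Structured data is data that carries its own internal substructure, and it is usually laid out in table form. A common building block is the mapping, also called an associative array. In Python this is the dictionary, written in {key: value} form.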

In [1]:

# Dictionary data
Country_PhoneNumber = {'Korea': 82, 'America': 1, 'Swiss': 41, 'Italy': 39, 'Japan': 81, 'China': 86, 'Rusia': 7}

# Look up a value by its key
Country_PhoneNumber['Korea']

Out[1]:

82
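Indexing with a key that does not exist raises an error, so it can help to know the safer lookup forms. The following is a minimal sketch that assumes a shortened version of the dictionary above; the 'France' entry is made up for illustration.

```python
# Shortened copy of the calling-code dictionary (keys spelled as in the post)
country_phone_number = {'Korea': 82, 'America': 1, 'Swiss': 41}

# Direct indexing raises KeyError when the key is missing
try:
    country_phone_number['France']
except KeyError as err:
    missing_key = err.args[0]

# get() returns a default value instead of raising
france_code = country_phone_number.get('France', None)

# The `in` operator tests keys, not values
has_korea = 'Korea' in country_phone_number

# Assigning to a new key adds an entry (33 is an illustrative value)
country_phone_number['France'] = 33

print(missing_key, france_code, has_korea, country_phone_number['France'])
```

Using get() is convenient when a missing key is an expected case rather than a bug.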

In [2]:

# A function that shows what a player obtains when opening a treasure box


# Contents of the box: item name -> quantity
treasure_box = {'rope': 2,
                'apple': 10,
                'torch': 6,
                'gold coin': 50,
                'knife': 1,
                'arrow': 30}


def display_stuff(treasure_box):
    print("Congratulations!! You got a treasure box")

    # Iterate over the box as (key, value) pairs
    for k, v in treasure_box.items():
        print("you have {} {}pcs".format(k, v))

display_stuff(treasure_box)

# A function that shows how much money the items are worth when sold

# Sale price per piece: item name -> coins
coin_per_treasure = {'rope': 1,
                     'apple': 2,
                     'torch': 2,
                     'gold coin': 5,
                     'knife': 30,
                     'arrow': 1}

def total_silver(treasure_box, coin_per_treasure):
    total_coin = 0
    for treasure in treasure_box:
        # Value of one item type = price per piece * number of pieces
        coin = coin_per_treasure[treasure] * treasure_box[treasure]
        print("{} : {}coins/pcs * {}pcs = {} coins".format(
            treasure, coin_per_treasure[treasure], treasure_box[treasure], coin))
        total_coin += coin
    print('total_coin : ', total_coin)

total_silver(treasure_box, coin_per_treasure)

Congratulations!! You got a treasure box
you have rope 2pcs
you have apple 10pcs
you have torch 6pcs
you have gold coin 50pcs
you have knife 1pcs
you have arrow 30pcs
rope : 1coins/pcs * 2pcs = 2 coins
apple : 2coins/pcs * 10pcs = 20 coins
torch : 2coins/pcs * 6pcs = 12 coins
gold coin : 5coins/pcs * 50pcs = 250 coins
knife : 30coins/pcs * 1pcs = 30 coins
arrow : 1coins/pcs * 30pcs = 30 coins
total_coin :  344
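When only the total is needed, the loop above can be condensed. This is a sketch, not the post's own function: the name total_value is hypothetical, and treating an item without a price as worth 0 coins is an assumption made for the example.

```python
# Quantities and prices copied from the cell above
treasure_box = {'rope': 2, 'apple': 10, 'torch': 6,
                'gold coin': 50, 'knife': 1, 'arrow': 30}
coin_per_treasure = {'rope': 1, 'apple': 2, 'torch': 2,
                     'gold coin': 5, 'knife': 30, 'arrow': 1}

def total_value(box, prices):
    """Return the total sale value (hypothetical helper).

    Items that have no entry in `prices` count as 0 coins (an assumption).
    """
    return sum(count * prices.get(item, 0) for item, count in box.items())

total = total_value(treasure_box, coin_per_treasure)
print(total)  # 344
```

Iterating with items() avoids looking up each quantity a second time, and get() keeps the function from failing when the two dictionaries have different keys.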

In [3]:

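# More simply, store price and quantity together in one structure from the start
# (note: this version has no 'arrow' entry, and 'knife' is stored as 1 coin x 30 pcs)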
treasure_box = {'rope': {'coin': 1, 'pcs': 2},
                'apple': {'coin': 2, 'pcs': 10},
                'torch': {'coin': 2, 'pcs': 6},
                'gold coin': {'coin': 5, 'pcs': 50},
                'knife': {'coin': 1, 'pcs': 30}}

treasure_box['rope']

Out[3]:

{'coin': 1, 'pcs': 2}
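Since each value is itself a dictionary, a nested field is reached by chaining two lookups. Here is a small sketch using a shortened copy of the data above; the variable names are chosen for illustration.

```python
# Shortened copy of the nested treasure box
treasure_box = {'rope': {'coin': 1, 'pcs': 2},
                'apple': {'coin': 2, 'pcs': 10}}

# First key selects the item, second key selects the field
rope_pcs = treasure_box['rope']['pcs']

# items() yields (item name, inner dictionary) pairs
value_per_item = {name: info['coin'] * info['pcs']
                  for name, info in treasure_box.items()}

# Nested values can be updated in place
treasure_box['apple']['pcs'] += 5

print(rope_pcs, value_per_item, treasure_box['apple'])
```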

In [4]:

# Rewrite the earlier functions for the data above

# Show the items obtained from the treasure box
def display_stuff(treasure_box):

    print("Congratulations!! You got a treasure box!!")
    for treasure in treasure_box:
        print("You have {} {}pcs".format(treasure, treasure_box[treasure]['pcs']))

display_stuff(treasure_box)

Congratulations!! You got a treasure box!!
You have rope 2pcs
You have apple 10pcs
You have torch 6pcs
You have gold coin 50pcs
You have knife 30pcs

In [5]:

# A function that shows how much the obtained items can be sold for.
# The price is read from the nested 'coin' field, so the separate
# coin_per_treasure dictionary is no longer needed.
def total_silver(treasure_box):

    total_coin = 0
    for treasure, info in treasure_box.items():
        coin = info['coin'] * info['pcs']
        print("{} : {}coins/pcs * {}pcs = {} coins".format(
            treasure, info['coin'], info['pcs'], coin))
        total_coin += coin
    print('total_coin : ', total_coin)

total_silver(treasure_box)

rope : 1coins/pcs * 2pcs = 2 coins
apple : 2coins/pcs * 10pcs = 20 coins
torch : 2coins/pcs * 6pcs = 12 coins
gold coin : 5coins/pcs * 50pcs = 250 coins
knife : 1coins/pcs * 30pcs = 30 coins
total_coin :  314

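Pandas

pandas is a library for representing structured data in table form. Its main features are:

Built on NumPy, so it works easily with applications that use NumPy
Data can be aligned by axis labels
Data can be indexed in many different ways
Integrated time-series functionality, with unified data structures that handle both time-series and non-time-series data
Convenient handling of missing data (null values)
Database-style merging and relational operations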
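A few of the features listed above can be seen in a short session. This is only a sketch with made-up numbers; the variable names are illustrative and pandas and NumPy are assumed to be installed.

```python
import numpy as np
import pandas as pd

# Built on NumPy: the values of a Series can be taken out as a NumPy array
sales = pd.Series([300, 200, 500], index=['Korea', 'America', 'Canada'])
sales_array = sales.to_numpy()

# Alignment by label: values are matched on the index, not on position
bonus = pd.Series([10, 20], index=['Canada', 'Korea'])
combined = sales + bonus        # 'America' has no partner, so it becomes NaN

# Missing data: fillna() replaces NaN with a chosen value
filled = combined.fillna(0)

# Time series: a date range can serve as the index
ts = pd.Series([1, 2, 3], index=pd.date_range('2021-01-13', periods=3))
second_day = ts.loc[pd.Timestamp('2021-01-14')]

# Database-style merge on a shared column
left = pd.DataFrame({'Region': ['Korea', 'America'], 'Sales': [300, 200]})
right = pd.DataFrame({'Region': ['Korea', 'America'], 'Employee': [20, 10]})
merged = pd.merge(left, right, on='Region')

print(combined)
print(merged)
```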

Installing pandas

pip install pandas

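Series

A Series is a key concept for representing structured data: a one-dimensional, array-like data structure that can hold objects. It can be created from a list or a tuple, and also from a NumPy array.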
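The three construction routes mentioned above look like this. It is a minimal sketch with arbitrary sample values; the exact integer dtype can depend on the platform, so no output is shown for it.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([1, 2, 3])                 # from a list
from_tuple = pd.Series((1.5, 2.5))               # from a tuple
from_array = pd.Series(np.array([10, 20, 30]))   # from a NumPy array

# Mixing strings and numbers falls back to the generic object dtype
mixed = pd.Series(['a', 3])

print(from_list.dtype, from_tuple.dtype, from_array.dtype, mixed.dtype)
```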

In [6]:

# Import pandas

import pandas as pd

# Create a Series
ser = pd.Series(['a', 'b', 'c', 3])
print(ser)


# A Series has an index and values.
# In the printed output, the left column is the index (the position or label
# of each element) and the right column holds the values.

# Check the values
# Note that the attribute is `values` (plural), with a trailing s.
print(ser.values)
print("-"*50)


# Check the index
print(ser.index)
print("-"*50)


# Set the index by passing it as an argument when creating the Series
# Form: pd.Series([values], index=[index labels])

ser2 = pd.Series(['a', 'b', 'c', 3], index = ['i', 'j', 'k', 'h'])
print(ser2)
print("-"*50)


# Change an existing index

ser2.index = ['Jhon', 'Steve', 'Jack', 'Bob']
print(ser2)
print("-"*50)


# Check the changed index
print(ser2.index)
print("-"*50)

# Convert dictionary data to a Series
# The dictionary keys become the index and the dictionary values become the data.

Country_PhoneNumber = {'Korea': 82, 'America': 1, 'Swiss': 41, 'Italy': 39, 'Japan': 81, 'China': 86, 'Rusia': 7}
ser3 = pd.Series(Country_PhoneNumber)
print(ser3)
print("-"*50)


# Indexing
print(ser3['Korea'])
print("-"*50)

# Slicing (by label, from 'Italy' to the end)
print(ser3['Italy':])
print("-"*50)


# The name attribute of a Series and of its index
# These are useful with DataFrames, where the Series name is used as the column name.

ser3.name = 'Country_PhoneNumber'
ser3.index.name = 'Country_Name'
print(ser3)

0    a
1    b
2    c
3    3
dtype: object
['a' 'b' 'c' 3]
--------------------------------------------------
RangeIndex(start=0, stop=4, step=1)
--------------------------------------------------
i    a
j    b
k    c
h    3
dtype: object
--------------------------------------------------
Jhon     a
Steve    b
Jack     c
Bob      3
dtype: object
--------------------------------------------------
Index(['Jhon', 'Steve', 'Jack', 'Bob'], dtype='object')
--------------------------------------------------
Korea      82
America     1
Swiss      41
Italy      39
Japan      81
China      86
Rusia       7
dtype: int64
--------------------------------------------------
82
--------------------------------------------------
Italy    39
Japan    81
China    86
Rusia     7
dtype: int64
--------------------------------------------------
Country_Name
Korea      82
America     1
Swiss      41
Italy      39
Japan      81
China      86
Rusia       7
Name: Country_PhoneNumber, dtype: int64
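One detail of the slicing step is easy to miss: a slice by label includes its end label, while a slice by position excludes its end position. The sketch below uses a shortened copy of ser3 and the explicit .loc and .iloc accessors to keep the two cases apart.

```python
import pandas as pd

# Shortened copy of the calling-code Series
ser3 = pd.Series({'Korea': 82, 'America': 1, 'Swiss': 41, 'Italy': 39})

# Label-based slice: both 'America' and 'Swiss' are included
by_label = ser3.loc['America':'Swiss']

# Position-based slice: position 1 is included, position 2 is not
by_position = ser3.iloc[1:2]

# Boolean mask: keep only entries whose value is greater than 40
large = ser3[ser3 > 40]

print(by_label)
print(by_position)
print(large)
```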

In [7]:

# pandas arrays and DataFrames

# pandas data structures
# A dictionary of columns can be passed to pandas directly.
# Passing it to pd.Series stores each list as a single element, not as a table.

data = {'Region' : ['Korea', 'America', 'Chaina', 'Canada', 'Italy'],
        'Sales' : [300, 200, 500, 150, 50],
        'Amount' : [90, 80, 100, 30, 10],
        'Employee' : [20, 10, 30, 5, 3]
        }
s = pd.Series(data)
print(s)
print("-"*50)


# Convert the data above into a table structure
# using a DataFrame

s = pd.DataFrame(data)
print(s)
print("-"*50)


# Access the column labels of a DataFrame
# using the .columns attribute

print(s.columns)
print("-"*50)


# Access the index of a DataFrame
# using the .index attribute

print(s.index)
print("-"*50)


# Change the index and the column labels

s.index=['one','two','three','four','five']
s.columns = ['a','b','c','d']
print(s)
print("-"*50)

Region      [Korea, America, Chaina, Canada, Italy]
Sales                      [300, 200, 500, 150, 50]
Amount                        [90, 80, 100, 30, 10]
Employee                         [20, 10, 30, 5, 3]
dtype: object
--------------------------------------------------
    Region  Sales  Amount  Employee
0    Korea    300      90        20
1  America    200      80        10
2   Chaina    500     100        30
3   Canada    150      30         5
4    Italy     50      10         3
--------------------------------------------------
Index(['Region', 'Sales', 'Amount', 'Employee'], dtype='object')
--------------------------------------------------
RangeIndex(start=0, stop=5, step=1)
--------------------------------------------------
             a    b    c   d
one      Korea  300   90  20
two    America  200   80  10
three   Chaina  500  100  30
four    Canada  150   30   5
five     Italy   50   10   3
--------------------------------------------------
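Once the data is in a DataFrame, individual columns, rows, and cells can be selected. The following sketch uses a smaller table with sample values; the names df, df_by_region, and the 'Quantity' label are assumptions for illustration.

```python
import pandas as pd

data = {'Region': ['Korea', 'America', 'Canada'],
        'Sales': [300, 200, 150],
        'Amount': [90, 80, 30]}
df = pd.DataFrame(data)

# A single column comes back as a Series
sales = df['Sales']

# A row selected by position
first_row = df.iloc[0]

# Use a column as the index, then select a cell by row and column label
df_by_region = df.set_index('Region')
korea_sales = df_by_region.loc['Korea', 'Sales']

# rename() returns a new DataFrame and leaves the original unchanged
renamed = df.rename(columns={'Amount': 'Quantity'})

print(korea_sales)
print(renamed.columns)
```

Unlike assigning a full list to .columns, rename() changes only the labels named in the mapping.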

License: Attribution, NonCommercial, NoDerivatives
