Study Log (2021.02)


2021-02-28

  • 바닥부터 배우는 강화학습
      10. AlphaGo and MCTS (UCT sketch below)
        • 10.1 AlphaGo
        • 10.2 AlphaGo Zero
      11. Building a Blade & Soul Bimu (PvP duel) AI
        • 11.1 Blade & Soul Bimu
        • 11.2 Applying reinforcement learning to Bimu
        • 11.3 A new style of self-play training that induces distinct combat styles
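
Chapter 10 builds on Monte Carlo Tree Search. As a reminder of the core selection step, a minimal UCT sketch of my own (a generic illustration, not the book's code):

```python
import math

def uct_score(child_visits, child_value_sum, parent_visits, c=1.41):
    """UCT = exploitation (mean value) + exploration bonus."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    mean_value = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + exploration

def select_child(children):
    """children: list of dicts with 'visits' and 'value_sum'."""
    parent_visits = sum(ch["visits"] for ch in children) + 1
    return max(children,
               key=lambda ch: uct_score(ch["visits"], ch["value_sum"], parent_visits))

# Example: three children with different statistics.
children = [
    {"visits": 10, "value_sum": 7.0},   # mean 0.70, well explored
    {"visits": 3,  "value_sum": 2.4},   # mean 0.80, less explored
    {"visits": 0,  "value_sum": 0.0},   # unvisited -> selected first
]
print(select_child(children))
```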

2021-02-27


2021-02-26


2021-02-25


2021-02-24


2021-02-23

  • 바닥부터 배우는 강화학습
      7. First steps into deep RL
        • 7.1 Approximation with functions
        • 7.2 Introducing artificial neural networks
      8. Value-based agents
        • 8.1 Training a value network (sketch below)
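
Chapters 7 and 8 move from tables to function approximation. A minimal PyTorch sketch of fitting a value network V(s; w) to sampled return targets (my own illustration, not the book's code):

```python
import torch
import torch.nn as nn

# A small MLP that maps a state vector to a scalar value estimate V(s).
value_net = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)

# Fake batch: states and their (e.g. Monte Carlo) return targets.
states = torch.randn(32, 4)
returns = torch.randn(32, 1)

for _ in range(100):
    values = value_net(states)                      # V(s; w)
    loss = nn.functional.mse_loss(values, returns)  # (G_t - V(s_t))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```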

2021-02-22

  • 수학으로 풀어보는 강화학습 원리와 알고리즘
    • Chapter 1. Mathematics for reinforcement learning
      • 1.4 The Gaussian distribution
      • 1.5 Random sequences
        • 1.5.1 Definition
        • 1.5.2 Mean function and autocorrelation function
        • 1.5.3 Markov sequences
      • 1.6 Linear stochastic difference equations
      • 1.7 Notation
      • 1.8 Importance sampling (sketch below)
      • 1.9 Entropy
      • 1.10 KL divergence
      • 1.11 Estimators
        • 1.11.1 Maximum a posteriori (MAP) estimator
        • 1.11.2 Maximum likelihood estimator
      • 1.12 Differentiation with vectors and matrices
        • 1.12.1 Differentiation with respect to a vector
        • 1.12.2 Differentiation with respect to a matrix
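
Sections 1.8–1.10 cover importance sampling, entropy, and KL divergence. A quick NumPy check of the two key formulas on discrete distributions (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two discrete distributions over the same 4 outcomes.
p = np.array([0.1, 0.2, 0.3, 0.4])       # target distribution
q = np.array([0.25, 0.25, 0.25, 0.25])   # sampling (proposal) distribution

# KL(p || q) = sum_x p(x) * log(p(x) / q(x))
kl_pq = np.sum(p * np.log(p / q))
print("KL(p||q) =", kl_pq)

# Importance sampling: estimate E_p[f(X)] from samples drawn under q,
# reweighting each sample by w(x) = p(x) / q(x).
f = np.array([1.0, 2.0, 3.0, 4.0])       # f(x) for each outcome
exact = np.sum(p * f)                    # true expectation under p

samples = rng.choice(4, size=100_000, p=q)
weights = p[samples] / q[samples]
estimate = np.mean(weights * f[samples])
print("exact:", exact, "IS estimate:", estimate)
```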

2021-02-21

  • 바닥부터 배우는 강화학습
      6. Finding the best policy when the MDP is unknown
        • 6.3 TD control 2 - Q-learning (sketch below)
  • 수학으로 풀어보는 강화학습 원리와 알고리즘
    • Chapter 1. Mathematics for reinforcement learning
      • 1.1 Probability and random variables
        • 1.1.1 Probability
        • 1.1.2 Random variables
        • 1.1.3 Cumulative distribution and probability density functions
        • 1.1.4 Joint probability functions
        • 1.1.5 Conditional probability functions
        • 1.1.6 Independent random variables
        • 1.1.7 Functions of random variables
        • 1.1.8 Bayes' theorem
        • 1.1.9 Sampling
      • 1.2 Expectation and variance
        • 1.2.1 Expectation
        • 1.2.2 Variance
        • 1.2.3 Conditional expectation and variance
      • 1.3 Random vectors
        • 1.3.1 Definition
        • 1.3.2 Expectation and the covariance matrix
        • 1.3.3 Sample mean
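
Section 6.3 is Q-learning. The update is Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]; a tiny tabular sketch (my own illustration, not the book's code):

```python
import random
from collections import defaultdict

alpha, gamma, eps = 0.1, 0.9, 0.1
Q = defaultdict(float)   # Q[(state, action)] -> value

def q_learning_update(s, a, r, s_next, actions):
    """One off-policy TD-control step: bootstrap from the greedy action in s'."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(s, actions):
    """Behavior policy used to collect experience."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

# Example transition on a toy chain: in state 0, action "right" gives reward 1
# and leads to state 1.
actions = ["left", "right"]
a = epsilon_greedy(0, actions)
q_learning_update(s=0, a="right", r=1.0, s_next=1, actions=actions)
print(Q[(0, "right")])   # 0.1 after a single update
```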

2021-02-20


2021-02-18

  • 바닥부터 배우는 강화학습
      5. Evaluating values when the MDP is unknown
        • 5.1 Monte Carlo learning
        • 5.2 Temporal Difference learning
        • 5.3 Monte Carlo vs. TD (sketch below)
        • 5.4 Somewhere between Monte Carlo and TD?
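
Chapter 5 contrasts Monte Carlo and TD targets. A side-by-side sketch of the two value updates on a single two-step episode (my own illustration):

```python
gamma, alpha = 0.9, 0.1
V_mc = {0: 0.0, 1: 0.0, 2: 0.0}
V_td = {0: 0.0, 1: 0.0, 2: 0.0}

# One episode as (state, reward, next_state); state 2 is terminal.
episode = [(0, 0.0, 1), (1, 1.0, 2)]

# Monte Carlo: wait until the episode ends, then update toward the full return G_t.
G = 0.0
for s, r, _ in reversed(episode):
    G = r + gamma * G
    V_mc[s] += alpha * (G - V_mc[s])

# TD(0): update online toward the bootstrapped target r + gamma * V(s').
for s, r, s_next in episode:
    V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

print("MC :", V_mc)   # state 0's target was the full discounted return 0.9
print("TD :", V_td)   # state 0 bootstrapped from the (still zero) estimate V(1)
```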

2021-02-17

  • 바닥부터 배우는 강화학습
      4. Planning when the MDP is known
        • 4.1 Evaluating values - iterative policy evaluation
        • 4.2 Finding the best policy - policy iteration
        • 4.3 Finding the best policy - value iteration (sketch below)
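
Chapter 4 is planning with a known MDP. A compact value-iteration sketch on a two-state MDP (my own illustration, not the book's example):

```python
import numpy as np

gamma = 0.9
# P[a][s][s'] : transition probabilities, R[a][s] : expected immediate reward.
P = {
    "stay": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "move": np.array([[0.0, 1.0], [1.0, 0.0]]),
}
R = {
    "stay": np.array([0.0, 2.0]),
    "move": np.array([1.0, 0.0]),
}

V = np.zeros(2)
for _ in range(200):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    V = np.max([R[a] + gamma * P[a] @ V for a in P], axis=0)

greedy = {s: max(P, key=lambda a: R[a][s] + gamma * P[a][s] @ V) for s in range(2)}
print(V, greedy)   # state 1 keeps collecting reward 2; state 0 moves there first
```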

2021-02-16


2021-02-15

  • 바닥부터 배우는 강화학습
      1. What is reinforcement learning?
        • 1.1 Supervised learning and reinforcement learning
        • 1.2 Sequential decision-making problems
        • 1.3 Rewards
        • 1.4 Agents and environments
        • 1.5 The power of reinforcement learning
      2. Markov Decision Processes
        • 2.1 The Markov Process
        • 2.2 The Markov Reward Process (MRP sketch below)
        • 2.3 The Markov Decision Process
        • 2.4 Prediction and Control
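
For a small MRP the Bellman expectation equation v = R + γPv can be solved directly as v = (I − γP)⁻¹R. A quick NumPy check (my own illustration):

```python
import numpy as np

gamma = 0.9
# A 3-state Markov Reward Process: transition matrix P and reward vector R.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],   # state 2 is absorbing
])
R = np.array([1.0, 2.0, 0.0])

# Bellman expectation equation for an MRP: v = R + gamma * P v  =>  v = (I - gamma P)^-1 R
v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v)
```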

2021-02-09

  • S-K RL
    • train_FT10_ppo_node_only.py (PPO loss sketch after this entry)
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
  • 팡요랩
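
agent.fit() above returns value_loss, action_loss, and dist_entropy, which matches the usual PPO bookkeeping. Below is a generic, self-contained sketch of how those three quantities are typically computed (clipped surrogate, value regression, entropy term); it illustrates standard PPO, not the actual S-K RL implementation:

```python
import torch

def ppo_losses(new_log_probs, old_log_probs, advantages, values, returns,
               entropy, clip_eps=0.2):
    """Standard PPO quantities: clipped policy (action) loss, value loss, entropy."""
    ratio = torch.exp(new_log_probs - old_log_probs)                # pi_new / pi_old
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    action_loss = -torch.min(surr1, surr2).mean()                   # clipped surrogate
    value_loss = torch.nn.functional.mse_loss(values, returns)      # critic regression
    dist_entropy = entropy.mean()                                   # exploration bonus term
    return value_loss, action_loss, dist_entropy

# Dummy batch just to show the call shape.
n = 8
out = ppo_losses(
    new_log_probs=torch.randn(n), old_log_probs=torch.randn(n),
    advantages=torch.randn(n), values=torch.randn(n), returns=torch.randn(n),
    entropy=torch.rand(n),
)
print([t.item() for t in out])
```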

2021-02-08

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
  • 팡요랩

2021-02-07

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
  • 팡요랩
  • Cross-entropy (related material)

2021-02-06

  • S-K RL
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
  • 팡요랩

2021-02-05

  • S-K RL
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR') (MTWR dispatching-rule sketch after this entry)
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
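
evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR') benchmarks the agent against a dispatching rule. Assuming MTWR stands for Most Total Work Remaining, a generic sketch of that rule (my own illustration, not the repo's code):

```python
# Each job is a list of remaining operation processing times; the head of the
# list is the operation that is currently ready to be dispatched.
jobs = {
    "J1": [3, 5, 2],   # total work remaining = 10
    "J2": [4, 4],      # total work remaining = 8
    "J3": [6, 1, 6],   # total work remaining = 13
}

def mtwr_dispatch(ready_jobs):
    """Most Total Work Remaining: pick the ready job with the largest sum of
    remaining processing times, breaking ties by job id."""
    return max(ready_jobs, key=lambda j: (sum(jobs[j]), j))

ready = ["J1", "J2", "J3"]   # jobs whose next operation could start now
chosen = mtwr_dispatch(ready)
print(chosen, "-> schedule operation with processing time", jobs[chosen][0])
```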

2021-02-04

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
  • 팡요랩

2021-02-03

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
  • 팡요랩

2021-02-02

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)

2021-02-01

  • S-K RL
    • train_FT10_ppo_node_only.py
      • do_simulate_on_aggregated_state()
      • value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
      • eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
      • val_performance = validation(agent, path, mode='node_mode')
    • SBJSSP_report_results.ipynb
      • def get_swapping_ops(blocking_op, machine_dict)
      • class blMachine(Machine)
      • class blMachineManager(MachineManager)
      • class blSimulator(Simulator)
      • def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
      • def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
      • def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
      • def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
      • def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
      • def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
      • def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
      • def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
      • def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
  • 팡요랩
