[Week 7 - Day 1] Retrospective

green_ne 2022. 3. 1. 00:00

# Today's experiment log

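I was told that vgg16_bn works much better than plain vgg16; the "bn" stands for batch normalization. As I understand it, it performs better because batch normalization normalizes each layer's activations over the mini-batch, which keeps extreme activation values (outliers) in check and makes training more stable.
I reduced the dimensionality with PCA, using 30 and 90000 components, but the performance came out very low. I had thought it would let the model focus on just the mask region.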
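
To make the batch normalization note above concrete, here is a minimal sketch of what `nn.BatchNorm2d` does to a batch in training mode. The tensor shape and the shift/scale values are made up for illustration; they are not taken from the actual experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 8 feature maps, 3 channels, deliberately shifted and scaled
x = torch.randn(8, 3, 16, 16) * 5.0 + 10.0

bn = nn.BatchNorm2d(num_features=3)  # modules start in training mode
y = bn(x)

# Statistics per channel, taken over the batch and spatial dimensions
in_mean, in_std = x.mean(dim=(0, 2, 3)), x.std(dim=(0, 2, 3))
out_mean, out_std = y.mean(dim=(0, 2, 3)), y.std(dim=(0, 2, 3))

print("input  mean/std:", in_mean, in_std)
print("output mean/std:", out_mean, out_std)
```

After the layer, each channel has roughly zero mean and unit standard deviation, so a later layer no longer sees the large offset that was present in the input.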

# Weight init

Reference (Stack Overflow): How to initialize weights in PyTorch?
https://stackoverflow.com/questions/49433936/how-to-initialize-weights-in-pytorch

The question asks how to initialize the weights and biases (for example, with He or Xavier initialization) in a network in PyTorch.

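import numpy as np
import torch.nn as nn

# Xavier (Glorot) uniform initialization for the final dense layer's weight
nn.init.xavier_uniform_(model.fc.weight)

# Dense-layer bias: uniform in [-1/sqrt(in_features), 1/sqrt(in_features)]
stdv = np.sqrt(1 / model.fc.in_features)
nn.init.uniform_(model.fc.bias, -stdv, stdv)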
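
The snippet above only touches one layer. A rough sketch of initializing every layer in one pass with `Module.apply` could look like the following. The small `nn.Sequential` model and the choice of He initialization for convolutions, Xavier for linear layers, and zero biases are assumptions made for illustration, not the setup used in the experiment.

```python
import math

import torch.nn as nn


def init_weights(m: nn.Module) -> None:
    """Initialize one module; meant to be passed to model.apply()."""
    if isinstance(m, nn.Conv2d):
        # He (Kaiming) initialization, commonly paired with ReLU
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        # Xavier (Glorot) uniform initialization
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)


# Hypothetical toy model, only here so the example runs on its own
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
model.apply(init_weights)  # visits every submodule recursively

# Xavier uniform samples from [-bound, bound] with bound = sqrt(6 / (fan_in + fan_out))
xavier_bound = math.sqrt(6 / (8 + 4))
```

`model.apply` calls the function on every submodule, so the same function can be reused for a larger network without listing its layers by hand.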

# Weight freeze

If the task the pretrained model was trained on matches the task we want to solve, we can freeze its weights.

model.features.requires_grad_(False)
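
As a sanity check on what freezing actually does, the sketch below freezes the `features` part of a small stand-in model and confirms that only the classifier still receives gradients. `TinyNet` and its layer sizes are hypothetical; the real model here would be a pretrained network with a `features` attribute.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained backbone plus classifier head."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = TinyNet()
model.features.requires_grad_(False)  # freeze the feature extractor

# Only parameters that still require gradients are handed to the optimizer
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)

# One backward pass on random input: frozen layers get no gradient
loss = model(torch.randn(2, 3, 8, 8)).sum()
loss.backward()
print(trainable)
```

Passing only the trainable parameters to the optimizer is optional, but it makes it explicit which part of the network is being fine-tuned.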

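# Input dimension reduction

The goal is to reduce the number of dimensions to a reasonable amount.

Choose the method according to the data distribution.

(If the number of features is very high, reducing it first with PCA or TruncatedSVD is recommended, e.g. before running t-SNE.)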

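t-SNE (t-distributed Stochastic Neighbor Embedding) (e.g. handwritten digit classification)
- a tool for visualizing high-dimensional data
- converts similarities between data points into joint probabilities
- minimizes the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data
- its cost function is not convex, so different initializations can give different results
PCA (Principal Component Analysis)
- for dense data
- linear dimensionality reduction using the Singular Value Decomposition of the data
- the input data is centered (but not scaled) for each feature before the SVD is applied
- uses either a full SVD or a randomized truncated SVD, depending on the shape of the input data and the number of components to extract
TruncatedSVD
- for sparse data
- does not center the data before computing the SVD, so it can work on sparse matrices efficiently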
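
For the sparse case in the list above, a minimal sketch of `TruncatedSVD` on a SciPy sparse matrix might look like this. The random matrix, its size, and `n_components=10` are arbitrary values chosen for illustration.

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse data: 100 samples, 500 features, about 1% non-zero entries
X_sparse = sp.random(100, 500, density=0.01, format="csr", random_state=0)

# TruncatedSVD accepts the sparse matrix directly, without densifying or centering it
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_reduced.shape)
print(svd.explained_variance_ratio_.sum())
```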

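from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# imgs: flattened images, shape (n_samples, h * w)
# n_components must be <= min(n_samples, h * w)
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(imgs)
# Reshape each principal axis back to image size (assumes h * w features per image)
eigenfaces = pca.components_.reshape((n_components, h, w))
img_pca = pca.transform(imgs)

# Embed the PCA features in 2D for visualization
# (n_iter was renamed to max_iter in scikit-learn 1.5)
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(img_pca)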
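
The snippet above depends on `imgs`, `h`, `w`, and `n_components` being defined elsewhere. A self-contained sketch of the same PCA then t-SNE pipeline, using scikit-learn's bundled handwritten digits dataset as a stand-in for the real images, could look like the following. The subset of 300 samples and the parameter values are assumptions chosen to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

digits = load_digits()
X = digits.data[:300]           # flattened 8x8 images, shape (300, 64)
h, w = digits.images.shape[1:]  # image height and width

# n_components cannot exceed min(n_samples, n_features) = min(300, 64)
n_components = 30
pca = PCA(
    n_components=n_components, svd_solver="randomized", whiten=True, random_state=0
).fit(X)
eigen_digits = pca.components_.reshape((n_components, h, w))
X_pca = pca.transform(X)

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X_pca)

print(X_pca.shape, X_2d.shape)
```

Checking `n_components` against `min(n_samples, n_features)` before fitting is a cheap way to catch one of the common PCA errors early.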

# Comparing SOTA.csv and my_result.csv

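import pandas as pd
from sklearn.metrics import accuracy_score

pred = pd.read_csv('./my_result.csv')
sota = pd.read_csv('./SOTA.csv')

# Fraction of rows where my prediction matches the SOTA submission
# (agreement with SOTA, not true accuracy; assumes both files share the same row order)
print(accuracy_score(sota.ans, pred.ans))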
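
The comparison above relies on both files listing the samples in the same order. If the files also carry an identifier column, a slightly more defensive sketch is to join on it first. The `ImageID` column name and the inline CSV contents below are hypothetical; only the `ans` column comes from the original snippet.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical file contents; "ImageID" is an assumed identifier column
pred_csv = io.StringIO("ImageID,ans\na.jpg,0\nb.jpg,2\nc.jpg,1\nd.jpg,1\n")
sota_csv = io.StringIO("ImageID,ans\nd.jpg,1\nc.jpg,1\nb.jpg,0\na.jpg,0\n")

pred = pd.read_csv(pred_csv)
sota = pd.read_csv(sota_csv)

# Align rows by ID so that a different row order does not distort the score
merged = pred.merge(
    sota, on="ImageID", suffixes=("_pred", "_sota"), validate="one_to_one"
)

# Agreement rate with the SOTA submission (not ground-truth accuracy)
agreement = accuracy_score(merged["ans_sota"], merged["ans_pred"])

# Rows where the two submissions disagree, useful for manual inspection
disagreements = merged[merged["ans_pred"] != merged["ans_sota"]]

print(agreement)
print(disagreements)
```

Looking at the disagreeing rows tends to be more informative than the single score, since it shows which classes the two submissions split on.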

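## Peer session

Everyone seemed absorbed in applying data augmentation to the imbalanced data.

I ran into a great many errors while running PCA; I shared them with the group and we fixed them one by one.

Thinking about it now, though, PCA seems better suited to tabular data than to image data, and if it is used, there seem to be specific models that go well with it. I'll look into this tomorrow.

Despite the effort, the performance was very low, so I should look for the best parameters, examples where PCA was used for image classification, and suitable models.

I realized I need to find and read papers and think about how to apply them.

I also thought I need to stay curious about the data and keep digging into it.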