Stories by Seonglae Cho on Medium

Understanding In-context Learning by Reverse Engineering Transformers

Seonglae Cho — Sun, 30 Jun 2024 08:00:45 GMT

Understanding How In-context Learning Emerges by Reverse Engineering Transformers

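As I predicted in my April post, within two months both Anthropic and OpenAI published neuron-level analyses of their flagship models. Today we will look at the meaningful attempts to understand transformer models on which that research is built.

With Anthropic's release lighting a fire under it, OpenAI quickly published its own feature analysis of GPT-4

ChatGPT is as frustrating as it is smart. It feels like an LLM should be able to do a great deal, yet when you actually run tests, it fails at many tasks that are simple. Why does this happen? Because LLMs resemble people so closely, we mistakenly assume that their intelligence has the same form as human intelligence. Artificial neural networks were certainly inspired by biological ones, but a transformer performs inference with a completely different operation: the attention mechanism.

If we understand what each component of the transformer does, we are better placed to come up with accurate ideas that work as intended in high-level applications. The approach of reverse engineering the computation of a neural network in order to analyze it is called Mechanistic Interpretability. In this article I introduce how this field has analyzed the way In-context Learning, one of the core capabilities of LLMs, emerges in transformers, and I connect that result with two concepts: phase change and feature dimensionality.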

Residual Stream

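The autoregressive transformer that underlies most LLMs can be summarized as follows: it first embeds the tokens, applies a series of operations, and finally un-embeds the result back into text to predict the next token. In other words, between embedding and un-embedding the transformer applies attention, linear transformations, and activation functions.

This is where the concept of the residual stream comes in. The residual stream is the flow of vectors in the embedding dimension, carried along the residual connections and updated by the attention and activation outputs. Between the embedding and un-embedding steps, attention and MLP (Multi-Layer Perceptron) operations repeatedly update the residual stream, and this is what ultimately gives the transformer its ability to reason. On this view, the transformer updates the token distribution along the residual stream, its only vector pathway, within a memory bandwidth set by the embedding dimension, and finally predicts the next token.

The deeper the model, the more severe this bottleneck becomes. For example, a single-layer transformer communicates with neurons numbering four times the residual stream dimension, while a 50-layer transformer communicates with roughly 100 times as many (the superposition hypothesis applies here as well). Presumably because of this demand to make the most of the residual stream's bandwidth, some MLP neurons and some of the attention heads described below have been found to play a “memory management” role.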

Attention head

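Just as the previous post showed that MLP-layer activations can be separated into monosemantic features, this time we look at what the attention layer does. Interestingly, individual attention heads turn out to move information between tokens and to make independent contributions to the residual stream. Multi-head attention was introduced for computational efficiency on the basis of a mathematical equivalence, so it is worth noting that each head ends up playing its own role.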

Copying head

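Let's start with the simplest kind of attention head, the copying head. As the name suggests, a copying head acts as if it copies a token that appeared earlier in the context: in a similar or identical sequence, it raises the probability of that token appearing again. How an individual head affects token prediction is shown by the fact that, in the head's OV circuit described below, the eigenvalues of a copying head's circuit are positive.

Example of a copying head at work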

Induction head

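The induction head, a subset of copying heads, goes beyond naively copying an identical token: it completes a pattern when it sees a similar one. It is the combined pattern matching ability of these induction heads that shows up in multi-layer transformers as in-context learning.

Copying heads already appear in single-layer transformers, whereas in-context learning and induction heads appeared only in multi-layer transformers.

Surprisingly, there are also cases in which a single induction head handles translation. This is evidence that a single induction head works not only on simple tasks but also on complex or sophisticated ones. More precisely, the translation behavior arises because the induction head independently attends to the earlier source tokens.

As the figure above shows, the attention pattern of the translation induction head ripples along with the word order. In the example sentence, when the English and French “temple” are the source tokens highlighted in red, the position being predicted is “te”, the token just before the German “Tempel”. This is because the transformer looks for a similar pattern to make its prediction and, at the preceding position, raises the probability of inducing the next token.

The sample is small, but no relationship between layer depth and generalization ability was apparent

In the table above, a 40-layer language model is observed to contain a variety of copying heads as well as pattern matching heads. So how do these different attention heads end up influencing the logits through their internal computation? Let's look at that process.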

Mathematical approach

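The structure of a transformer consists mostly of repeated linear transformations and activations. Simplifying further, we can mathematically model a transformer that has only attention and no activations. The attention block can then be written as a few matrix operations, and the attention head matrices let us express as an equation how a head attends to tokens. Using that mathematical description, let's analyze the roles of the copying head and the induction head in terms of computation.

Based on the properties of its four weight matrices, the attention mechanism can be separated into two circuits: the QK circuit and the OV circuit. A circuit is a computational subgraph of a neural network, formed when weights within the network are considered in combination. Separating the computation of an attention head in this way lets us interpret the QK circuit and the OV circuit as independent computations. The figure above shows the circuits involved when a single induction head computes between a source token and a destination token.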

Tensor Product representation

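For ease of understanding, the matrices in the figure above show the circuit computation for specific tokens only. To describe the computation of the whole attention head, we need the tensor product (Kronecker product) operator. Combining the per-token vector operations, the attention mechanism can be written compactly as follows.

Spelled out, the equation says that x, the output of the previous layer, is turned into h(x), the output of this layer, by the tensor product of the attention pattern A, which is the attention distribution, with the OV weight matrix. Here the attention pattern A is the softmax of the attention scores obtained by passing the preceding residual stream through the QK circuit.

The key point is that it is effective to interpret the two independent circuits separately. Because the Q and K matrices always act together linearly, as do the O and V matrices, treating each pair as a single unit gives a more convenient abstraction.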

QK Circuit

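As shown below, the QK circuit describes how the attention pattern of an attention head is computed. Because the QK circuit is autoregressive, the attended token highlighted in red is ignored while the attention pattern is being computed, and the preceding tokens are weighed by how similar they are to the query of the destination token.

An induction head matches patterns in a key space that has been transformed by an earlier head.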

OV Circuit

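The OV circuit then aggregates the preceding tokens and computes how much each contributes to the attention output. In practical terms, once pattern matching has happened, this computation copies the attended token to the current position. The evidence is that the eigenvalues of a copying head's OV circuit are observed to be positive: because the circuit is a linear combination of tokens, positive eigenvalues raise the logit, and therefore the probability, of the token in question.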

Q-Composition, K-Composition

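As mentioned above, however, induction heads do not arise in single-layer models. They arise from Q- and K-composition, the composition of circuits between attention heads in different layers of a multi-layer transformer. The resulting change in the key and query spaces is what gives the head the ability to go beyond plain copying, recognize attention patterns, and complete more complex ones.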

Phase Change

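Even without the analysis above, it is well known that in-context learning is a form of pattern matching. More precisely, in-context learning arises from the pattern completion ability of induction heads, and induction heads in turn arise from circuit-level interactions among several heads. That ability does not appear in single-layer transformers. So how do induction heads come into being during training? Surprisingly, they form within a specific window of training.

Models without induction heads show no phase change.

The diagram above shows that, as training proceeds, the number of induction heads rises within a particular window at the same time as the in-context learning score improves. It also shows that the in-context learning score is directly tied to the loss. The only point where the loss curve is not convex during training coincides exactly with the moment this in-context learning ability improves. This region is called the phase change.

Models that can form induction heads show a clear bump in the loss curve.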

Feature Dimensionality

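Let's briefly return to the neural network features covered in the previous post. We know that a neural network operates as a combination of many features in superposition. The figure below is an example in which four features, correlated in pairs, are trained into a two-dimensional plane so that they are forced into superposition. If the number of features is increased to three pairs, they arrange themselves by correlation into the hexagon shown at the bottom. Given that features share dimensions in this way, we can quantify how much of a dimension a feature occupies when it cannot have a whole one to itself, and define that quantity as feature dimensionality.

What does feature dimensionality have to do with the phase change discussed above? The striking part is that a phase change also occurs in how the features a model acquires during training are distributed across dimensions.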

Loss & Phase Change

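The graph above shows randomly initialized weights being trained. As the features are learned, that is, as the loss falls, their feature dimensionalities converge to a common value. In other words, meaningful changes in the loss happen together with the convergence of feature dimensionality. This can be understood as features being allocated to dimensions efficiently, and interestingly the process behaves much like an energy level jump.

The loss curve visualization above makes this clearer. The loss curve over training is shown alongside the trajectories of the feature weights, revealing that the changes happen at the same time.

The relationship between these two phase changes, the one caused by the convergence of feature dimensionality and the one caused by induction heads, has not been established directly, but the similarity is clearly visible across several papers. What Anthropic verified and emphasized here is that the shift from monosemantic dimensions to polysemantic neurons goes hand in hand with increasing sparsity. Given the benefits of sparsity described in the previous post and the apparently deep connection between the two phase changes, it seems reasonable to speculate that phase changes during training are a key ingredient of model grokking.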

Pretraining & Finetuning

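With this conceptual understanding, how about redefining the rather vague notions of pre-training and fine-tuning more precisely, in engineering terms? From the standpoint of Mechanistic Interpretability, pre-training can reasonably be abstracted as the process by which an artificial neural network extracts abstract features from data and distributes them across its neurons. For a language model, features are distributed through the model using large amounts of language data, building up an understanding of the data in the form of features.

Fine-tuning can be seen as adjusting the strength of the features already distributed across neurons to suit one's purpose. Since the model's understanding of the data is already in place, the question is how to adjust these parameters to produce a benevolent or malevolent AI, or a model optimized for a specific task. For language models, this includes building a chat model with instruction tuning and optimizing for preferred answers. To define these more finely: if preference optimization is the careful adjustment of activations along the relevant feature vector dimensions, then instruction tuning for task optimization, as in a chat model, can be seen as pinning the dimensions of the feature vector that should always be active. In addition, for attention-based models, pre-training includes the formation of induction heads and the emergence of in-context learning.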

Conclusion

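Through these analyses we have seen that phase changes improve the performance of transformer models at several levels. Specifically, induction heads and in-context learning gave us an analysis of the transformer's attention block, while monosemanticity and feature dimensionality gave us an analysis of its MLP layers. The phase change involved features being distributed across the dimensions of the MLP activations, and in the attention block it produced the generalization ability of in-context learning through the formation of induction heads. From an even broader perspective, some research links phase changes to the moment at which large language models grok a capability.

I think there are two ways of understanding something: the top-down way, which grasps the generalized behavior conceptually, and the bottom-up way, which gains knowledge by taking the structure apart piece by piece. Rather than relying on just one, I believe that having experience from both perspectives leads to a more efficient and more precise understanding of a field.

By approaching the transformer at several levels, we are gaining a deep understanding of artificial intelligence. The research on induction heads, and on in-context learning through induction heads, is a bottom-up approach. Likewise, the recent work that decomposes MLP activations into feature vectors to obtain monosemantic features belongs to the same line. It shows that bottom-up research on LLMs has advanced rapidly over the past few years.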

Interpretable AI matters

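As with Dave and HAL 9000 in 2001: A Space Odyssey, or the problems I ran into when doing LLM research to improve AI performance, a model behaves differently from what we expect when we misunderstand artificial intelligence. Because we want AI to behave as intended, and because we want to use it safely, Interpretable AI is a core field within AI Alignment. OpenAI and Anthropic are racing to study it because understanding the inside of the transformer matters for leading today's AI industry. I do not think it is unrelated that Sonnet recently overtook GPT-4 in performance and that Anthropic recently analyzed the internal neuron features of Sonnet. Because a deep understanding of the transformer feeds directly into performance in this way, I expect non-profits and the open-source community to draw insights from the transformer's internals as well. I expect not only many ideas for AI applications but also improvements to model design to be proposed on the basis of interpretability.

Finally, most of this article draws on research from Anthropic, LessWrong, and OpenAI, and I thank those communities. I am trying to spread the word about Mechanistic Interpretability, which is developing within the Interpretable AI community. I was fortunate to be admitted to the AI master's program at UCL in London starting this September, so I expect to have more opportunities to write. This article turned out longer than planned; in the next installment I will cover feature analysis of large language models and TDB (Transformer Debugger), the open-source tool recently released by OpenAI.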

References

The content was excerpted and compiled from the three papers below.

Reversing Transformer to understand In-context Learning with Phase change & Feature dimensionality

Seonglae Cho — Sun, 21 Apr 2024 16:57:54 GMT

ChatGPT is as smart as it is frustrating at times. Let’s analyze the reasons, continuing from the previous post.

LLMs can’t see below the tokenizer level

You might expect LLMs to handle simple tasks perfectly, yet they often fail to answer questions that are straightforward for humans. Why does this happen? It often stems from mistakenly equating an LLM’s intelligence, which is fundamentally different, with human intelligence. Although artificial neural networks were inspired by biological ones, transformers are built on a completely different basic computing block: the attention mechanism.

If we separate the components of the transformer and understand the computations on a functional basis, we can gain more accurate insights for high-level applications. This approach to analyzing and reverse engineering the computing of neural networks is referred to as Mechanistic Interpretability. I will discuss how Anthropic AI has interpreted the core function of in-context learning in LLMs, which emerges from the attention mechanism, while also considering concepts like Phase change and Feature dimensionality.

Residual Stream

The basic structure of an Autoregressive Transformer involves initially embedding tokens and finally applying unembedding to perform next token prediction. In between, Attention mechanism and MLP activation occur.

The concept of a “residual stream” appears here. In a transformer, the residual stream is the flow of embedding-dimension vectors carried along the residual connections and updated by the outputs of the attention and MLP blocks. Between the embedding and unembedding stages, the residual stream, the transformer’s computational core, is repeatedly updated by attention and MLP operations. On this view, the transformer follows this single path of operations and selects the next token within the memory bandwidth of the residual stream, which is set by the embedding dimension.
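To make the idea concrete, here is a minimal NumPy sketch of a residual stream, under several assumptions that are mine rather than the source's: one attention head per layer, no layer normalization, random untrained weights, and toy sizes. The names `forward`, `attention`, and `mlp` are illustrative only. The point is simply that every block reads the stream and adds its output back into it.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_layers = 50, 16, 3   # toy sizes, chosen for illustration

W_E = rng.normal(scale=0.1, size=(vocab, d_model))   # token embedding
W_U = rng.normal(scale=0.1, size=(d_model, vocab))   # unembedding
layers = [
    {
        "W_qk": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_ov": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_in": rng.normal(scale=0.1, size=(d_model, 4 * d_model)),   # MLP is 4x wider
        "W_out": rng.normal(scale=0.1, size=(4 * d_model, d_model)),
    }
    for _ in range(n_layers)
]

def attention(x, W_qk, W_ov):
    """One causal attention head acting on a (seq, d_model) residual stream."""
    scores = x @ W_qk @ x.T                            # rows: destination, columns: source
    causal = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(causal, scores, -np.inf)         # no attending to later tokens
    pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
    pattern /= pattern.sum(axis=-1, keepdims=True)     # softmax over earlier tokens
    return pattern @ x @ W_ov

def mlp(x, W_in, W_out):
    return np.maximum(x @ W_in, 0.0) @ W_out           # ReLU MLP

def forward(tokens):
    x = W_E[tokens]                                    # the stream starts as embeddings
    for p in layers:
        x = x + attention(x, p["W_qk"], p["W_ov"])     # attention writes into the stream
        x = x + mlp(x, p["W_in"], p["W_out"])          # the MLP writes into the stream
    return x @ W_U                                     # logits for the next token

logits = forward(np.array([3, 7, 7, 1]))
```

Because each block only adds to `x`, the stream's width `d_model` is the only channel through which layers can pass information to each other, which is the bandwidth limit described above.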

The more layers a model has, the more severe this bottleneck becomes. For example, at layer 25 of a 50-layer transformer, the residual stream is written to by about 100 times more neurons than it has dimensions, and must in turn communicate with about 100 times as many neurons in later layers. Perhaps because of this high demand on residual stream bandwidth, some MLP neurons and attention heads have been observed to perform a kind of “memory management” role.

Attention head

In the last post, we discussed how MLP layers can be decomposed into monosemantic features. This time, we’ll mainly explore the role of the attention layer. Intriguingly, it has been discovered that individual attention heads in the attention layer contribute independently to the residual stream mentioned above. It is impressive that each head in multi-head attention, which was introduced for computational efficiency on the basis of a mathematical equivalence, serves a distinct role and processes information separately.

Copying head

Among attention heads, the simplest type is the “copying head”. As the name suggests, a copying head increases the probability of tokens that have already appeared in the context. In effect, the head attends to earlier tokens and reuses them. How an individual attention head performs this copying is explained later through the positive eigenvalues of the copying matrix within the head.

Induction head

A subset of the copying head, the induction head, goes beyond simply copying tokens. Instead, it plays a role in completing patterns when similar representations are detected. Anthropic has precisely argued that numerous pattern matching events by induction heads lead to the emergence of in-context learning ability in multi-layer transformers.

In-context learning and induction heads manifest only in multi-layer transformers while the basic copying head appears in single-layer transformers.
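The behavior attributed to an induction head can be written down as a plain algorithm, independent of any neural network. The sketch below is only an illustration of the rule "find where the current token appeared before and predict what followed it"; the function name `induction_predict` and the example tokens are hypothetical, and a real head implements this softly through attention rather than by exact lookup.

```python
def induction_predict(tokens):
    """Predict the next token by completing a previously seen pattern.

    Looks for the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no match.
    """
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

context = ["Mr", "D", "urs", "ley", "was", "Mr", "D"]
prediction = induction_predict(context)
```

A copying head alone would only raise the probability of tokens already seen; the extra step of looking one position past the match is what needs a second head in an earlier layer.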

Intriguingly, induction heads capable of performing translation have been identified. For instance, when the source token is the English or French “temple”, the position being predicted is the token just before the German “Tempel”. This is because the head finds the matching pattern in the translation and increases the probability of the token that should come next. In relatively large models, it has been shown that sophisticated tasks such as translation are handled by induction heads in the same manner.

Although these are limited examples, there does not seem to be much correlation between layer depth and generality

This observation was made in a 40-layer language model, where the translation head operates alongside the copying and pattern-matching heads. Then, let’s delve deeper into the internal computing process that ultimately influences logits.

Mathematical approach

Analyzing the internal mechanisms of the copying head and the induction head helps explain how they fulfill their roles. A transformer consists mostly of relatively simple linear transformations and a number of activation functions, and the attention mechanism can be reduced to a handful of matrix operations. Let’s express mathematically how an induction head uses its attention head matrices to attend to a token.

The diagram represents the circuits involved when a single induction head computes between the source and destination tokens. The four weight matrices of the attention mechanism can be split into two circuits, the QK circuit and the OV circuit. Here, a circuit is, simply put, a computational subgraph of a neural network: a set of weights within the network considered in combination. Anthropic has shown that the operations of an attention head can be interpreted as independent computations by these QK and OV circuits, for the following reasons.

Tensor Product representation

The circuit above is written as matrix multiplications on specific tokens for ease of understanding. The operation of an entire attention head can be written generally and compactly with a tensor product (Kronecker product) over the per-token vector operations, which is equivalent to the computation of the attention mechanism.

The head maps x, the output of the previous layer, to h(x) by taking the tensor product of the attention pattern A with the combined output-value weight matrix.
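The equation itself did not survive in this text. As a reference point, the form given in Anthropic's "A Mathematical Framework for Transformer Circuits" is reproduced below; the symbols $W_Q$, $W_K$, $W_V$, $W_O$ for the query, key, value, and output weights are that paper's notation rather than something defined in this post, and the autoregressive mask inside the softmax is left implicit.

```latex
h(x) = \left(A \otimes W_O W_V\right) \cdot x,
\qquad
A = \operatorname{softmax}\!\left(x^{\top} W_Q^{\top} W_K \, x\right)
```

Read left to right, $A$ mixes information across token positions while $W_O W_V$ acts on each token's vector independently, which is why the two products $W_Q^{\top} W_K$ and $W_O W_V$ can be studied as separate circuits.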

Even when all attention heads are considered simultaneously, they can be expressed in the same form. The key point is that the query and key weights always operate together to compute the attention pattern, and the output and value weights always operate together as well. Additionally, the QK circuit operates autoregressively, and the attended token itself is ignored when the attention pattern is calculated.

QK Circuit

The QK circuit computes the attention pattern for each attention head, essentially performing pattern matching.

OV Circuit

After pattern matching occurs, the OV circuit plays a role in copying the actual token to its position, as mathematically demonstrated by the positive eigenvalues of the copying matrix in the OV circuit within the attention head.
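One way to check the eigenvalue claim numerically is to form the full token-to-logit matrix of a head and measure how much of its eigenvalue mass is positive. The sketch below uses the summary statistic sum(λ) / sum(|λ|); the weights are hypothetical and constructed by hand so that the head copies (the unembedding is assumed to be the transpose of the embedding and the OV map the identity), so it illustrates the measurement rather than any trained model.

```python
import numpy as np

def copying_score(M):
    """Sum of eigenvalues divided by sum of their magnitudes.

    Close to +1 when the eigenvalues are mostly positive (copying),
    close to -1 when they are mostly negative (anti-copying).
    """
    eig = np.linalg.eigvals(M)
    return float(np.real(eig.sum()) / np.abs(eig).sum())

rng = np.random.default_rng(0)
vocab, d_head = 40, 8                       # toy sizes, for illustration only

W_E = rng.normal(size=(d_head, vocab))      # hypothetical embedding (token -> head space)
W_U = W_E.T                                 # assumption: unembedding tied to the embedding
W_OV = np.eye(d_head)                       # assumption: OV map passes vectors through

M_copy = W_U @ W_OV @ W_E                   # token -> logit effect of a copying head
M_anti = W_U @ (-W_OV) @ W_E                # same head with its output sign flipped

score_copy = copying_score(M_copy)
score_anti = copying_score(M_anti)
```

With positive eigenvalues, attending to a token pushes up the logit of that same token; flipping the sign of the OV map turns the head into one that suppresses tokens it has seen.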

Q-Composition, K-Composition

However, as mentioned earlier, induction heads do not occur in a one-layer model. They arise from Q- and K-composition, in which the circuits of attention heads in different layers of a multi-layer transformer are composed. Q- and K-composition go beyond simple copying by changing the attention pattern itself, which lets attention heads express the much more complex patterns seen in induction heads.

Phase Change

It is well known that in-context learning fundamentally involves pattern matching. According to low-level causal research, in-context learning occurs as induction heads complete patterns. These induction heads arise from interactions within a circuit composed of multiple heads, so the capability does not appear in single-layer models. Then, how do induction heads develop during training? Surprisingly, they develop during a specific period of the training process.

As learning progresses, the number of induction heads increases during a specific period, coinciding with an improvement in the in-context learning score, as illustrated in the figure above. Notably, this in-context learning score is directly tied to the loss. The only point at which the training loss curve of a language model is non-convex is precisely where the capability for in-context learning improves, as depicted in the figure below. This region is called the phase change.
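The post does not define the in-context learning score. Assuming it follows the definition I recall from Anthropic's induction heads paper, the loss at the 500th token of the context minus the loss at the 50th token, averaged over examples, it can be computed as in the sketch below. The function name and the synthetic loss array are hypothetical; a more negative score means the model benefits more from a longer context.

```python
import numpy as np

def in_context_learning_score(per_token_loss, early=50, late=500):
    """Mean loss at the late position minus mean loss at the early position.

    per_token_loss: array of shape (n_examples, context_length).
    Positions are 1-indexed, so position 50 is column 49.
    """
    per_token_loss = np.asarray(per_token_loss)
    late_loss = per_token_loss[:, late - 1].mean()
    early_loss = per_token_loss[:, early - 1].mean()
    return float(late_loss - early_loss)

# Synthetic losses that fall as the context grows (not real measurements).
positions = np.arange(1, 513)
losses = np.tile(2.0 + 1.0 / np.sqrt(positions), (4, 1))
score = in_context_learning_score(losses)
```

Tracking this number across training checkpoints is what reveals the sudden improvement that coincides with induction heads forming.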

Feature Dimensionality

Let’s recall the neural network feature that was mentioned in a previous post. We know that neural networks operate as a combination of multiple superposed features. The figure below illustrates an example where four correlated features, grouped in pairs, are forced into superposition in two dimensions. If this is increased to three pairs, it results in a hexagonal arrangement at the bottom, organized according to their correlation. Here, since features share dimensions, we can mathematically define the fraction of dimension occupied by a feature as feature dimensionality.
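The definition is easier to see in code. The sketch below uses the formula from Anthropic's "Toy Models of Superposition", D_i = ||W_i||² / Σ_j (Ŵ_i · W_j)², where W_i is the weight vector of feature i and Ŵ_i is its unit vector; the formula is taken from that paper rather than from this post, and the hexagon weights are constructed by hand to mirror the three-pair example rather than learned.

```python
import numpy as np

def feature_dimensionality(W):
    """Fraction of a dimension occupied by each feature.

    W: array of shape (n_features, n_dims), one row per feature direction.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_hat = W / norms                          # unit vector of each feature
    overlaps = (W_hat @ W.T) ** 2              # (W_hat_i . W_j)^2 for every pair
    return (norms[:, 0] ** 2) / overlaps.sum(axis=1)

# Six unit-length features arranged as a hexagon in two dimensions.
angles = np.arange(6) * np.pi / 3
hexagon = np.stack([np.cos(angles), np.sin(angles)], axis=1)
dims = feature_dimensionality(hexagon)

# Two features sharing one dimension as an antipodal pair.
antipodal = np.array([[1.0], [-1.0]])
pair_dims = feature_dimensionality(antipodal)
```

Each hexagon feature ends up with one third of a dimension and each antipodal feature with one half, so the dimensionalities add up to the number of dimensions actually available.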

What does feature dimensionality have to do with the phase change mentioned earlier? It is astonishing to note that as the model is trained and features are distributed across dimensions, a phase change also occurs, accompanied by a change in the loss curve.

Loss & Phase Change

The graph below illustrates the training process of weights that were initially randomized. As the training progresses and each feature reduces the loss, they tend to converge towards the same feature dimensionality. This can be understood as features being efficiently distributed across different dimensions, similar to an energy level jump.

It is observable that changes in loss coincide with the convergence of feature dimensionality. Additionally, the visualization below more clearly confirms this by linking the traditional loss curve, which represents the training steps, with the trajectory of feature weights to show the simultaneous changes.

Although a direct link between the phase change in feature dimensionality and the phase change in induction heads has not been explicitly established, what Anthropic has validated is important: as sparsity increases, neurons shift from monosemantic to polysemantic, and this shift coincides with the phase change. Moreover, because training involves grokking and the acquisition of generalization capabilities, we can speculate that the two are deeply associated, or even directly causally related.

Pre-training & Fine-tuning

Given this understanding, how about we redefine the somewhat ambiguous definitions of pre-training and fine-tuning more precisely from an engineering perspective? If we define training from the perspective of interpretability, it can be described as the process by which an artificial neural network extracts features from data and separates them across its neurons.

Pre-training involves abstracting each feature in terms of dimensionality, separating them, and distributing them across neurons. It can be viewed as the LLM constructing a world model through the method of feature distribution via language.

Fine-tuning, then, can be viewed as adjusting the intensity of the features distributed across neurons to one’s liking. Since a model of the world is already completed, this involves adjusting parameters to create models that are either benevolent or malevolent AIs or are optimized for specific tasks. Common processes such as instruction tuning or creating a chat model might fall into this category. If we classify more specifically, adjusting the activation of specific feature vectors could be termed preference optimization, while fixing certain features to be always active in tasks like instruction tuning or chat models can be considered task optimization.

In the case of attention-based models, pre-training also involves the formation of induction heads and the resulting emergence of in-context learning. I should also stress again that the definitions given here are my own abstract categorization and have not been rigorously validated.

Conclusion

Through various analyses, we have seen that phase changes enhance the performance of language models at different levels. More specifically, induction heads and in-context learning concern the attention layer, while feature dimensionality concerns the MLP layer. The phase change involves features being distributed across dimensions and, at the attention level, the generalization capability of in-context learning created through the emergence of induction heads. From a broader perspective, another study has linked these critical phase changes in LLM ability during training closely to grokking.

There are always two approaches to studying and understanding something: top-down and bottom-up. The top-down method involves understanding generalized operations at the level of abstraction, while the bottom-up approach involves deconstructing the structure piece by piece or learning by actually operating it. The more I study, the more I find that relying on only one approach is less effective than having experience from both perspectives, which leads to a more precise understanding of the subject.

We need to approach the understanding of transformers at various levels, and the level of detail available for alignment is increasing. The work on how induction heads contribute to in-context learning through attention heads represents a bottom-up approach. Additionally, the recent demonstration that MLP activations at the token level can be divided into monosemantic features, which has enabled analysis of models up to the scale of GPT-2, shows that the bottom-up approach has evolved rapidly over the years.

Interpretable AI matters

When I conducted research on the QA performance of LLMs, I encountered cases where parts of the system did not function as expected because I had overestimated the intelligence of artificial intelligence. Interpretable AI matters. A deep understanding of the internal architecture of transformers might be a key factor in OpenAI and Anthropic being at the forefront of the AI industry, and their work can also be seen as an academic contribution. Non-commercial communities can likewise focus on improving open-source model performance by gaining insights into the internal workings of transformers. I look forward to exploring ways, derived from interpretability, to direct AI towards safe, real-world applications.

Lastly, much of this article draws from research by Anthropic and LessWrong, to whom I extend my gratitude. I am attempting to simplify the explanation of a very interesting field that is evolving within these communities: Mechanistic Interpretability, a branch of Explainable AI. This article turned out longer than anticipated; in the next installment, I will discuss the Transformer Debugger (TDB), the open-source tool recently released by OpenAI, and DeepMind’s research on steering vectors.

Reference

Most of the content here is extracted from the following three papers, which are highly recommended for their precisely constructed experiments.

The Superposition Hypothesis for Steering ChatGPT's Answers

Seonglae Cho — Wed, 10 Apr 2024 16:03:49 GMT

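What if you could steer the answers of ChatGPT, with its one billion users? You could slip advertisements into conversations, for instance, or influence an election. If humans gained this ability to intervene in AI, it would clearly be an enormous power.

AI systems like ChatGPT are implemented with artificial neural networks, and the transformer models most commonly used also contain MLP (Multi-Layer Perceptron) neuron layers. But we do not really know how this combination of neurons allows an AI to “think”, so we cannot control its thinking either. If we understood the role and workings of the neurons, we could control the AI by manipulating them.

Anthropic has now shown that the answers of a transformer language model can be controlled by manipulating the neurons of its neural network, using the two ideas introduced today: the Superposition Hypothesis and the Sparse AutoEncoder. Let's first take a quick look at a neuron they found. Below is the cryptocurrency neuron, one of the neurons Anthropic analyzed.

The Cryptocurrency neuron, which responds strongly to cryptocurrency topics

This neuron activates when the AI generates text related to cryptocurrency. On the right of the image you can see that it activates strongly on utterances related to Bitcoin (shown with a darker background). Findings like this tell us that, when you talk with ChatGPT, there are neurons that activate according to the topic.

A Korean-language neuron was found too

Besides the cryptocurrency neuron, Anthropic found many others, such as the Korean neuron in the image above, a preposition neuron, and a humor neuron. The field that studies neuron activations inside neural networks is advancing quickly, and thankfully the two giants of AI, OpenAI and Anthropic, are racing to publish their research.

The two leaders of today's AI industry are, without question, OpenAI and Anthropic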

Explainable AI

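Within XAI (Explainable AI) there is research aimed at finding out what the neurons in an artificial neural network contribute to “intelligence”. It is literally an attempt to explain the workings of AI, which is otherwise a black box. The field has also played an important role in the progress of AI. According to Ilya Sutskever, one of the researchers who led OpenAI, the sentiment neuron discovered at OpenAI in 2017 was an important finding on the road to ChatGPT. Understanding neural networks is thus bound up with the progress of AI itself. And only a few months ago, Anthropic found about 4,000 neurons at once in a transformer, the architecture behind LLMs.

To be precise, they did not find 4,000 neurons in the network; they separated out 4,000 features that were spread across the neurons. I say “separated” because in a neural network a single feature is split across several neurons, and a single neuron is responsible for several features. This is an intriguing phenomenon revealed by the research, and the idea is called the superposition hypothesis. Because neurons and features overlap in this way while appearing to be independent, research on explaining neural networks had not been able to progress quickly.

For simplicity, this article also refers to the separated feature nodes as neurons

Starting from the problem that “one neuron performs more than one function”, how were the mixed features pulled apart? Anthropic's approach is simple: “If one neuron is responsible for several features, keep splitting until each feature is separate!” More precisely, the vector of activation values of an entire neuron layer is expanded until each dimension corresponds to a single feature. To do this, a structure called a Sparse AutoEncoder is used to separate features out of the neuron activation vector.

Anthropic converted the activation vector of a 512-dimensional MLP layer into a dictionary vector with one dimension per feature.

Suppose we start with a dense activation vector of 512 dimensions. If we gradually expand it to 4,096 dimensions, the vector, now eight times larger, becomes very loose, because the volume of a vector space grows exponentially with its dimension. A vector that has become loose in this way is called sparse. More precisely, it uses a wide range of dimensions and many of its elements are zero.

Our goal is to turn the dense activation vector into a sparse feature vector decomposed by feature. This approach is called dictionary learning, because the superposed features are decomposed and laid out across the 4,096 vector elements like entries in a dictionary. A Sparse AutoEncoder is used as the model that converts neuron activations into a dictionary vector, and since this idea is very important, let's look at it more closely.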

Sparse AutoEncoder

https://arxiv.org/pdf/2309.08600.pdf

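An AutoEncoder is a classic neural network with an encoder-decoder structure. You do not need to understand it fully; the key point is that it consists of an encoder part and a decoder part, each of which maps a vector to a space of a different dimension. The encoder converts the neuron activation vector into a dictionary vector, and the decoder reconstructs the original neuron activations from that vector. In the process, the Sparse AutoEncoder enforces sparsity on the intermediate dictionary vector, pushing each dimension to hold only one feature. As a bonus, a sparse vector is easier to analyze. With that, we have separated the neuron activations into a dictionary vector, that is, into monosemantic features.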
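As a rough illustration of the structure described here, the sketch below writes the encoder, the decoder, and a training objective in NumPy. It assumes a single ReLU encoder layer, a linear decoder, and an L1 penalty as the sparsity term; the weights are random and untrained, and the names (`encode`, `decode`, `sae_loss`, `l1_coeff`) are illustrative rather than taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_mlp, d_dict = 512, 4096                  # activation size and dictionary size from the text

W_enc = rng.normal(scale=0.02, size=(d_mlp, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.02, size=(d_dict, d_mlp))
b_dec = np.zeros(d_mlp)

def encode(x):
    """Map dense activations to a non-negative dictionary (feature) vector."""
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)

def decode(f):
    """Reconstruct the dense activations from the dictionary vector."""
    return f @ W_dec + b_dec

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes features toward zero."""
    f = encode(x)
    x_hat = decode(f)
    reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return reconstruction + l1_coeff * sparsity, f

activations = rng.normal(size=(4, d_mlp))  # stand-in for real MLP activations
loss, features = sae_loss(activations)
```

Minimizing this loss trades off faithful reconstruction against having few active features, which is what encourages each dictionary dimension to settle on a single meaning.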

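Neuron analysis with Sparse AutoEncoders was studied first by Lee Sharkey on the LessWrong forum; it was independent work, reportedly influenced in part by Anthropic's Toy Models of Superposition.

Theoretical analysis

Before savoring the monosemantic vectors Anthropic showed, let's understand the superposition hypothesis more deeply. That is, we need to understand why many features are scattered across many neurons. This may not sound like a problem, but there is something odd: there are more features than neurons. Anyone who has studied linear algebra will do a double take. How can there be more features than dimensions? (I am entirely serious.) There are, however, some very insightful explanations. The idea is that dimensions are shared among features, giving rise to a further, approximate orthogonality. That is why superposition arises, and why it can be decomposed by transforming the vector into a higher dimension.

Accepting some interference makes it possible to use more dimensions than are actually available

We know that the capacity to represent things within a given number of dimensions is limited. An xy-plot can show x and y values but not z. So the figure above looks absurd: “That works? You need as many dimensions as basis vectors!” But we are dealing with hundreds of dimensions, not two or three. Because of the curse of dimensionality the volume grows enormously, so the features hardly ever interfere with one another.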
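The claim that directions rarely interfere in high dimensions can be checked with a small experiment. The sketch below draws random unit vectors and measures the average absolute cosine similarity between distinct pairs; the vector count, the dimensions, and the function name `mean_abs_cosine` are arbitrary choices for illustration.

```python
import numpy as np

def mean_abs_cosine(n_vectors, n_dims, seed=0):
    """Average |cosine similarity| between distinct random unit vectors."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(n_vectors, n_dims))
    V /= np.linalg.norm(V, axis=1, keepdims=True)      # normalize to unit length
    cos = V @ V.T
    off_diagonal = cos[~np.eye(n_vectors, dtype=bool)] # drop each vector's match with itself
    return float(np.abs(off_diagonal).mean())

interference_2d = mean_abs_cosine(200, 2)
interference_512d = mean_abs_cosine(200, 512)
```

In two dimensions random directions overlap heavily, while in 512 dimensions the same number of directions are nearly orthogonal, so many more features than dimensions can coexist with only slight interference.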

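This is backed by a theoretical explanation called compressed sensing. Compressed sensing is a result from signal processing stating that if the data is sufficiently sparse, the signal can be fully recovered even with fewer basis vectors than would otherwise be needed. According to earlier research, the features of a transformer are used sparsely, so the conditions of compressed sensing are met. This is why using many features in fewer dimensions causes no problem, and it is important grounds for the claim in the superposition hypothesis that dimensions are shared.

Thanks to superposition, a model with few neurons can simulate a model with many more neurons

Personally, what surprises me is that this insight resembles the one behind positional embeddings in the transformer. When I first studied transformers, I could not understand how it was fine for the positional embedding to use the same dimensions as the token embedding. The two carry different information, so it seemed obvious that they should use different dimensions. Nevertheless, the transformer does not concatenate the token embedding and the positional embedding; it simply adds them. This works well, and it rests on the same intuition as above: in high dimensions, approximate orthogonality is at work.

It is a fascinating analysis. It feels as if not only machine neurons but human neurons too might activate according to the superposition hypothesis. If so, I wonder whether we learned this way because, when we think about a concept, it is efficient for superposition to force related concepts on the shared neurons to activate as well.

So we have dug into the secrets of AI neurons with some remarkable theory. But elegant equations and analysis are not always useful. These facts do not look like they will help ChatGPT do a better job on the homework my professor assigned. Is there another use for them?

So what is it good for?

If there are neurons that activate for a particular topic, can we force generation on that topic by activating those neurons? Recall the AutoEncoder structure from earlier. We have an encoder and a decoder, but after training only the encoder is used, to convert activations into a dictionary vector. Could we use the decoder from training to turn a dictionary vector back into an activation vector? In other words, can we steer the neurons as we intend? I will let a single sentence from Anthropic answer.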

Sparse autoencoder features can be used to intervene on and steer transformer generation.

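That is a frightening statement. With data produced by generative AI pouring out, being able to steer it in a chosen direction is both exciting and worrying.

With every other setting held fixed, neuron manipulation alone produced a different result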
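One simple way to picture this kind of intervention is to push the activation vector along the decoder direction of a chosen feature. The sketch below is an assumption-laden illustration, not Anthropic's procedure: the autoencoder weights are random, the sizes are tiny, and `steer`, `feature_idx`, and `target` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
d_mlp, d_dict = 8, 32                      # toy sizes; the study discussed here used 512 and 4096

W_enc = rng.normal(scale=0.3, size=(d_mlp, d_dict))   # hypothetical encoder weights
W_dec = rng.normal(scale=0.3, size=(d_dict, d_mlp))   # hypothetical decoder weights

def encode(x):
    """Dense activations -> non-negative feature activations."""
    return np.maximum(x @ W_enc, 0.0)

def steer(x, feature_idx, target):
    """Shift x along one decoder direction by the gap between target and current activation."""
    current = encode(x)[..., feature_idx]
    delta = target - current
    return x + delta[..., None] * W_dec[feature_idx]

x = rng.normal(size=(3, d_mlp))            # stand-in for MLP activations at three positions
x_steered = steer(x, feature_idx=5, target=10.0)
```

The steered activations would then replace the originals in the model's forward pass, leaving every other setting untouched.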

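Some of you will look at this and think: could this be used for AI alignment? Yes. This research is a result in AI safety, and within it, in AI alignment. AI alignment means aligning the AI's intentions with ours. It is a very important field on which everyone is focused, both for safety today and for the coexistence of AGI and humanity. It also matters for defending the share price. The image below, recently generated by Google Gemini, caused controversy after being mentioned in a tweet by Elon Musk. (The plunge in my Alphabet shares was a bonus.)

The tug of war between diversity and factual accuracy will continue

The importance of Anthropic's research is that it points existing AI alignment in a new direction. Until now, AI alignment relied either on training-based methods that optimize preferences with feedback-driven reinforcement learning such as RLHF, or on manipulating generation at the token level through the transformer's decoding strategy. Now it has become possible to intervene in the AI's thinking directly at the neurons, deliberately and in the desired direction.

Controlling neurons is more complicated than it sounds. An AI thinks through various combinations of neurons called circuits, with each layer of the network contributing to a flow of logic called the residual stream. So these neural circuits need to be handled with care. Consider this approach: deactivating neurons is an effective way to guard against the curse of knowledge. Humans cannot forget at will, but an AI can be made not to know something it knew by deactivating neurons.

Early LLMs readily explained how to make a bomb. (AI Jailbreak)

For example, if someone asks ChatGPT how to make a bomb, there may be bad intent behind the question, so it should not answer. To prevent that, when a circuit is detected in which the “bomb” neuron and the “manufacture” neuron activate together, we would deactivate that circuit. LLMs are dangerous in proportion to how much they know, which makes AI safety an important and promising field. In the future there may even be a job that consists of studying which neural circuits should be blocked and which kept on. (Or that too may be automated by AI.)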

결론

AI의 생각을 컨트롤할 수 있는 힘이 축복이기만 한 건 아닙니다.

스파이더 맨의 삼촌 ‘벤 파커’는 ‘피터 파커’에게 아주 유명한 말을 해줍니다.

큰 힘에는 큰 책임이 따른다.

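Just as the powerful ability to steer AI can push its intent in a good direction, it can also push it toward harm. Since the responsibility and the choice rest with companies, I will keep working so that they make the right choice, and so that I can contribute to it.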

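It is also unclear whether Sparse AutoEncoder-based analysis will work well on LLMs, models with an enormous number of neurons such as GPT4 or Claude3. Anthropic’s research used a single-layer transformer and separated 512 MLP neurons into 4096 features. As Anthropic’s report notes, volume grows exponentially as dimensions increase, so whether neurons in LLMs will separate one by one into the features we hope for is an open question. Still, with OpenAI and Anthropic, the King Kong and Godzilla of the field, hard at work, LLM neuron analysis may be completed before long. That would mean ChatGPT may one day talk to us with its neurons manipulated, provided such manipulation does not degrade performance.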

The model used in the experiment was trained on 100 billion tokens, but it is small by today’s standards.

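This post was the first episode of the Explainable AI series! I tried to make it readable even for people who aren’t AI engineers. Next time we’ll go deeper into concepts that got too little explanation today, such as Circuits, Feature splitting & Universality, and into how Anthropic uses them to explain neural networks reasoning like an FSM (finite state machine). Also, following Anthropic’s announcement, OpenAI recently open-sourced a neuron analysis tool called TDB. I’ll share the process of analyzing a language model’s neurons directly with TDB, prepared so you can follow along from environment setup to analyzing a model like GPT2. Then maybe someone will discover an amazing jackpot neuron.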

To be continued in the next episode!

Reference links

Actual research https://transformer-circuits.pub/2023/monosemantic-features
Anthropic thread https://transformer-circuits.pub/
LessWrong thread https://www.lesswrong.com/tag/interpretability-ml-and-ai
OpenAI thread https://distill.pub/2020/circuits/
Concept notes https://texonom.com/Explainable-AI-e1b35d4b9a6342dc863578350a7b4325

Superposition Hypothesis for steering LLM with Sparse AutoEncoder

Seonglae Cho — Tue, 09 Apr 2024 15:56:39 GMT

What if OpenAI could control the responses of ChatGPT, a chatbot with a billion users? Imagine subtly incorporating advertisements into conversations or tilting the electoral playing field. The ability to intervene in AI would undoubtedly represent immense power.

Artificial intelligence like ChatGPT is implemented through neural networks, and the commonly used transformer models incorporate MLP (Multi-Layer Perceptron) neuron layers. Yet, we don’t fully understand how this neural architecture enables AI to “think”. If we understood the role and mechanics of neurons better, we could potentially manipulate them to control AI actions.

The company Anthropic AI has demonstrated the ability to manipulate neurons in transformer models to control responses. They’ve presented this with the Superposition Hypothesis and Sparse AutoEncoder. Let’s start with one of the neurons Anthropic analyzed, the cryptocurrency neuron.

Analysis of a Cryptocurrency neuron.

This neuron strongly reacts to cryptocurrency topics, showing activation in discussions related to Bitcoin, highlighted by the increased opacity in the right part of the image. This discovery shows that some neurons are activated by specific topics when you converse with ChatGPT.

A base64 neuron was also discovered.

Additionally, Anthropic has identified neurons responsive to languages such as Latin and Korean, to formats such as base64 and RegEx, and neurons specialized in mathematics and humor. In this rapidly advancing field, AI giants OpenAI and Anthropic have made generous contributions, openly sharing their research findings.

Explainable AI

In the field of Explainable AI (XAI), some research aims to decipher how individual neurons contribute to “intelligence,” attempting to shed light on black-box AI. This area of research plays a crucial role in the advancement of AI. Notably, Ilya Sutskever, one of the leading researchers at OpenAI, identified the discovery of a sentiment neuron in 2017 as a pivotal development in the creation of ChatGPT. In this manner, understanding neural networks is intertwined with the progression of AI. This time, Anthropic identified about 4,000 neuron features in transformer models.

To be precise, it wasn’t that 4000 individual neurons were found, but rather 4000 features distributed across various neurons. This distinction is made because a single feature can be spread across multiple neurons, and conversely, a single neuron can carry multiple features. This intriguing phenomenon led to the concept known as the superposition hypothesis. Neurons and features coexist in a superposed state, appearing independent when they are not.

For simplicity, this text will refer to even the separated feature nodes as neurons.

The question then arises, how can entangled features be separated? Anthropic approached this with a straightforward strategy: “If a single neuron is responsible for multiple features, let’s separate them until each is distinct.” Specifically, they expanded the activation vector of a neuron layer, ensuring each dimension represented a unique feature, by utilizing what is known as Sparse AutoEncoder to separate features.

Anthropic has transformed the activation vector of an MLP neuron layer into a dictionary vector.

First, let’s assume there is a dense activation vector with a length of 512 dimensions. As we gradually expand this vector to become 4096-dimensional, it will become much looser, expanding by eight times its original dimensionality. The volume of the vector space increases exponentially with its dimensions. This expanded vector is referred to as sparse, meaning it consists primarily of zero values.

At this point, our goal is to separate the dense activation vector into a sparse feature vector, each part of which represents a distinct feature. This approach is called Dictionary learning because it involves breaking down overlapping features and organizing them into 4096 vector elements, each serving as a neatly arranged feature akin to a dictionary. For this purpose, we utilize a structure known as the Sparse AutoEncoder, an idea that is crucial for converting neuron activation into dictionary vectors. Let’s take a closer look at this.

Sparse AutoEncoder

https://arxiv.org/pdf/2309.08600.pdf

An AutoEncoder is a type of neural network with a characteristic Encoder-Decoder structure. It’s not essential to fully grasp AutoEncoders here; the key point is that this architecture is divided into encoder and decoder parts. Both encoder and decoder map vectors to different dimensional sizes. The encoder transforms the neuron activation layer vector into a dictionary vector, while the decoder reconstructs the original neuron activation layer from this vector. During this process, the Sparse AutoEncoder imposes sparsity on the dictionary vector, encouraging each dimension to correspond to a single feature. This makes the analysis of the now sparse vector easier. Thus, we have successfully separated neuron activation into mono-semantic features.
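The encoder/decoder round trip described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Anthropic’s implementation: the toy sizes (8 activations expanded to 64 features, keeping the 8x ratio), the random untrained weights, the ReLU encoder, and the `l1_coeff` value are all choices made here for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(activation, w_enc, b_enc, w_dec, b_dec):
    """Encode a dense activation into a dictionary vector, then decode it."""
    pre = matvec(w_enc, activation)
    # ReLU keeps every feature non-negative, so it can only be "off" or "on".
    features = [max(0.0, z + b) for z, b in zip(pre, b_enc)]
    reconstruction = [z + b for z, b in zip(matvec(w_dec, features), b_dec)]
    return features, reconstruction


def sae_loss(activation, features, reconstruction, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that rewards sparse features."""
    mse = sum((a - r) ** 2 for a, r in zip(activation, reconstruction)) / len(activation)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1


# Toy sizes (assumption): 8 -> 64 instead of the article's 512 -> 4096.
rng = random.Random(0)
d_act, d_dict = 8, 64
w_enc = [[rng.gauss(0, 0.1) for _ in range(d_act)] for _ in range(d_dict)]
w_dec = [[rng.gauss(0, 0.1) for _ in range(d_dict)] for _ in range(d_act)]
b_enc = [0.0] * d_dict
b_dec = [0.0] * d_act

activation = [rng.gauss(0, 1) for _ in range(d_act)]
features, reconstruction = sae_forward(activation, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(activation, features, reconstruction)
print(len(features), len(reconstruction), loss)
```

With untrained weights the features are not yet sparse; sparsity only emerges once the weights are optimized against a loss like `sae_loss`, where the L1 term pushes most feature values to zero.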

The neuron analysis using Sparse AutoEncoders was initially researched by Lee Sharkey in the LessWrong forum, albeit as an independent study influenced in part by Anthropic’s Toy Models of Superposition.

Theoretical Analysis

Before delving into how to use dictionary vectors, let’s gain a deeper understanding of the superposition hypothesis. You need to understand why features can be scattered across multiple neurons. It starts from a fact that might not seem problematic at first glance: there are more features than neurons. For those familiar with linear algebra, this might seem weird: how can there be more features than dimensions? However, insightful explanations exist for this phenomenon. They suggest that features share dimensions, relying on additional, approximate orthogonality. This leads to superposition, and it is what allows these features to be decomposed into higher dimensions.

Acknowledging interference allows for the utilization of higher dimensions.

We know that the ability to represent within dimensions is limited. For example, on an XY-plane, we can only represent x and y values but not z. Thus, looking at the above figure might seem absurd. “How is that possible? You need to use as many dimensions as there are basis vectors!” However, what we’re dealing with isn’t two or three dimensions but hundreds of dimensions. Due to the curse of dimensionality, the volume increases enormously, so interference between elements is almost zero.
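A quick numerical check makes the “interference is almost zero” claim concrete. The sketch below draws random unit vectors and measures their average absolute cosine similarity in 2 and in 512 dimensions. Using random Gaussian directions as a stand-in for features is an assumption of this example, as are the vector counts and the seed.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly at random by normalizing a Gaussian draw."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(dim, n_vectors, rng):
    """Average |cosine similarity| over all pairs of random unit vectors."""
    vecs = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    total, pairs = 0.0, 0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            total += abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
            pairs += 1
    return total / pairs


rng = random.Random(0)
low_dim = mean_abs_cosine(2, 40, rng)     # directions in a plane overlap heavily
high_dim = mean_abs_cosine(512, 40, rng)  # directions in 512-d barely overlap
print(low_dim, high_dim)
```

In high dimensions the overlap between unrelated directions shrinks roughly like one over the square root of the dimension, which is why many more than 512 nearly independent directions can coexist in a 512-dimensional space.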

This is further supported by the theory of Compressed Sensing. Compressed Sensing suggests that if data is sparse enough, complete signal recovery is possible even with an insufficient basis. Moreover, previous research indicates that the features used in transformer language models are limited and sparsely utilized, fitting the conditions of Compressed Sensing. This explains why using many features in fewer dimensions isn’t problematic, serving as a key justification for the superposition hypothesis, where dimensions are shared for use.

Due to superposition, a model can simulate having more neurons than it actually does.

Personally, I find it astonishing that this insight parallels the functioning of the transformer’s positional embeddings. When I first studied the transformer, the fact that positional embeddings could use the same dimensions as token embeddings without issue was baffling. It seemed logical that different types of information should require different dimensions. Despite my concern, transformer models simply add token embeddings and positional embeddings together, rather than concatenating them. Even so, this approach works effectively, based on the same pre-empirical insights. It suggests that additional approximate orthogonality operates in high dimensions.
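To see why adding works, here is a small sketch in which a token vector and a position vector are summed into one 256-dimensional vector and then both recovered by dot product. Real transformers learn their embedding tables; the random unit vectors, table sizes, and the `best_match` helper below are assumptions made purely for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a random direction by normalizing a Gaussian draw."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def best_match(table, vector):
    """Index of the table row with the largest dot product against vector."""
    scores = [sum(a * b for a, b in zip(row, vector)) for row in table]
    return max(range(len(table)), key=scores.__getitem__)


rng = random.Random(0)
dim = 256
token_table = [random_unit_vector(dim, rng) for _ in range(50)]
position_table = [random_unit_vector(dim, rng) for _ in range(20)]

# Add the two embeddings in the same dimensions, as transformers do.
token_id, position_id = 7, 3
combined = [t + p for t, p in zip(token_table[token_id], position_table[position_id])]

# Both pieces of information remain readable from the single summed vector.
recovered_token = best_match(token_table, combined)
recovered_position = best_match(position_table, combined)
print(recovered_token, recovered_position)
```

The matching row scores close to 1 while every other row scores close to 0, because unrelated high-dimensional directions are nearly orthogonal. That gap is what lets two kinds of information share the same dimensions.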

This analysis is intriguing, suggesting that not only machines but potentially human neurons could also activate based on the superposition hypothesis. If so, when contemplating a concept, the forced activation of neurons related to overlapping concepts through superposition might be an efficient learning mechanism, leading to the speculation that this might be how learning is facilitated.

We’ve delved into the secrets of AI neurons with a fancy theoretical analysis. However, elegant analyses don’t always turn into practical uses. Given these insights, it seems unlikely that they would help ChatGPT solve assignments from a professor. So, where could this be applied?

Application

If there are neurons that activate based on specific topics, could activating certain neurons force generation on those topics? Let’s recall the AutoEncoder structure discussed earlier. While we have both an encoder and a decoder, after training, only the encoder is used to convert the activation vector into a dictionary vector. Could we then use the decoder to reconstruct an activation vector from a chosen feature? In other words, could we manipulate neurons as we intend? The answer can be summarized in one sentence from Anthropic:

Sparse autoencoder features can be used to intervene on and steer transformer generation.

It’s a fearsome statement. In an era flooded with data by Generative AI, the ability to control its direction is both thrilling and worrisome.

Manipulating neurons alone to achieve different outcomes, while other settings remain constant.
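One simple way to picture this kind of intervention: each feature owns a direction in activation space (a column of the decoder matrix), and steering means pushing the activation along that direction. The sketch below is an illustrative assumption rather than Anthropic’s exact procedure; the toy decoder values, the `steer` function, and the `strength` parameter are hypothetical.

```python
def decoder_direction(w_dec, feature_index):
    """Column of the decoder matrix: the activation-space direction of one feature."""
    return [row[feature_index] for row in w_dec]


def steer(activation, w_dec, feature_index, strength):
    """Push an activation vector along the direction of the chosen feature."""
    direction = decoder_direction(w_dec, feature_index)
    return [a + strength * d for a, d in zip(activation, direction)]


# Toy decoder (assumption): 4 activation dimensions x 3 dictionary features.
w_dec = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
]
activation = [0.2, -0.1, 0.0, 0.3]

# Strongly "turn on" feature 2 while leaving everything else about the input fixed.
steered = steer(activation, w_dec, feature_index=2, strength=4.0)
print(steered)
```

In a real model the steered vector would replace the original MLP activation before the forward pass continues, so every later layer sees an input in which the chosen feature is strongly active.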

Some may wonder whether this could be applied to AI alignment. Indeed, this research contributes to the field of AI safety, specifically towards AI alignment. AI alignment is about ensuring the preferences of artificial intelligence align with ours, a crucial and widely focused area for the coexistence of AGI with humanity. It’s also vital for safeguarding interests, as seen in the controversy over an image generated by Google’s Gemini, which Elon Musk mentioned in a tweet (and my Alphabet stock’s dip was just a bonus).

The tug-of-war between diversity and truth will continue into the future.

The significance of Anthropic’s research lies in its introduction of a new direction for AI alignment. Previously, AI alignment strategies were either training-based, optimizing preferences through feedback-driven reinforcement learning such as RLHF, or decoding-based, manipulating generation at the token level through the decoding strategy. Now, however, it is possible to influence AI’s thought processes directly by intentionally manipulating neurons, steering AI in a desired direction.

Controlling neurons is more complex than it might seem. The various combinations of neurons, referred to as a circuit, contribute to the logical flow of thought within AI, known as the residual stream. Thus, careful handling of these neural circuits is necessary. Approaching it this way, deactivating neurons can effectively prevent AI from recalling or generating unwanted information. Unlike humans, who cannot choose to forget at will, AI can be made to “forget” known facts by deactivating neurons, an effective way to handle the curse of knowledge.

Initially, LLMs could easily divulge methods for making bombs (AI Jailbreak).

For instance, ChatGPT should not provide an answer if asked how to make a bomb. To prevent this, a circuit that activates simultaneously for ‘bomb’ and ‘manufacture’ neurons should be deactivated upon detection. Given the vast knowledge contained within LLMs, AI safety is a crucial and promising field. Therefore, in the future, there might be jobs dedicated to researching which neural circuits should be activated or deactivated to prevent harmful outcomes (or even automating this process with AI).
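The “detect, then deactivate” idea can be sketched as a small function over a dictionary vector. Everything here is hypothetical: the feature indices standing in for ‘bomb’ and ‘manufacture’, the activation threshold, and the simplification of a circuit to two features firing together.

```python
def ablate_circuit(features, watched, threshold=0.5):
    """Zero the watched features, but only when all of them fire together."""
    if all(features[i] > threshold for i in watched):
        return [0.0 if i in watched else f for i, f in enumerate(features)]
    return list(features)


# Hypothetical feature indices for illustration only.
BOMB, MANUFACTURE = 0, 2

both_active = [0.9, 0.1, 0.8, 0.3]  # 'bomb' and 'manufacture' fire together
only_one = [0.9, 0.1, 0.0, 0.3]     # 'bomb' alone, e.g. a history question

blocked = ablate_circuit(both_active, watched={BOMB, MANUFACTURE})
untouched = ablate_circuit(only_one, watched={BOMB, MANUFACTURE})
print(blocked, untouched)
```

Requiring co-activation is the design choice that matters: zeroing the ‘bomb’ feature unconditionally would also erase harmless uses of the concept, whereas the combined condition targets only the risky pairing.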

Conclusion

The power to steer AI’s thought processes is not solely a blessing.

Uncle Ben of Spider-Man told Peter Parker an important lesson.

With great power comes great responsibility.

Having the power to control an AI system towards good means it can also be used for harm. This imposes a significant responsibility and choice on companies, and they bear the responsibility to make the right decisions. As part of this industry, I, too, am committed to striving for a better future.

Furthermore, it’s uncertain whether the analysis method using Sparse AutoEncoders will be effective for disentangling the features in massive models like GPT-4 or Claude3. Anthropic’s research involved analyzing a single layer of a transformer model, separating 512 MLP (Multi-Layer Perceptron) neurons into 4096 features (with 168 dead neurons). As Anthropic’s report suggests, as dimensions increase, the volume grows exponentially, making it uncertain whether each neuron in LLMs can be separated into the anticipated features. Nonetheless, with giants like OpenAI and Anthropic deeply involved in such analyses, we might soon see rapid progress in the neuron analysis of LLMs. This means that we may soon be interacting with a ChatGPT whose neurons have been manipulated, provided such manipulations do not degrade performance.

The model used in the experiment, although trained on 100B tokens, is considered a small model.

This post marks the first part in the Explainable AI series! For the next episode, we’ll delve deeper into concepts such as Circuits, Feature Splitting, and Universality — areas where today’s explanation might have fallen short. We’ll explore how Anthropic uses these concepts to enable neural networks to reason like Finite State Machines (FSMs).

Following Anthropic’s announcement, OpenAI recently made its neuron analysis tool, TDB, available as open source. In the next post, I will also share the process of using TDB to directly analyze the neurons of language models like GPT-2, starting from setting up the environment. Who knows, maybe someone who learned from my article will discover an incredibly groundbreaking neuron.

Stay tuned for the next post!

References

Actual research: https://transformer-circuits.pub/2023/monosemantic-features
Anthropic thread: https://transformer-circuits.pub/
LessWrong thread: https://www.lesswrong.com/tag/interpretability-ml-and-ai
OpenAI thread: https://distill.pub/2020/circuits/
XAI notes: https://texonom.com/Explainable-AI-e1b35d4b9a6342dc863578350a7b4325

6 free AIs you’ll use more than ChatGPT

Seonglae Cho — Fri, 07 Apr 2023 08:41:46 GMT

ChatGPT is smart but not perfect

Having to log in and out frequently
Too wordy
Stating Fiction as Fact
The more powerful GPT4 version is paid
Page experience and design also suck

familiar screen

There are a lot of different AIs out there, and depending on your needs, there are better apps than ChatGPT.

I’ve tried many AI tools that can be used in various ways, and I’m going to recommend the ones that I still use today.

1. ChatPDF

For complex papers, understanding class resources

This tool, whose name is confusingly similar to ChatGPT, is smarter than you think and can be your own personal tutor. Sometimes he gives amazing answers about complex domain knowledge that make you wonder if you’re asking a real professor.

If you put in a PDF file that needs a description, the AI will become an expert on that PDF and answer all your questions…!

He made my finger dance

The answers are very smooth, and he even explains the math by writing down the formulas. It’s like having a tiny professor in your pocket.

Of course, as with most AI apps, it’s only free up to a certain amount of usage. But even the free version is pretty good.

ChatPDF AI | Chat with any PDF | Free

2. Perplexity.AI

Searching, Summarizing webpage

It also links to annotations about the answer, like below

Hyperlinks

Unlike ChatGPT, which sometimes says bullshit, it cites links, so its credibility skyrockets. It can be frustrating at times because it seems to stick to facts: when it runs out of things to say, its answers get drastically shorter.

There’s also a browser extension that lets you ask questions or summarize the page you’re currently viewing right in the top right corner of your browser. It’s great because you don’t have to renew your API key as often as with ChatGPT extensions like Google ChapGPT.

There’s a similar browser extension called Monica, but my favorite is Perplexity.AI.

Perplexity - Ask AI

3. Ora.sh

Personal Customizable AI

It’s a service that lets you create and use an AI by customizing its name, personality, and more. If you set it up right, you can even have a virtual girlfriend or boyfriend to cure your loneliness.

Just a joke

Looking at this, I can imagine myself spending all day talking to NPCs in a future game.

1-click chatbot | ora.sh

4. Tome

Storytelling PPT AI

If you have a PPT you want to create, describe the topic and it will also create images of slides for you! AI is infiltrating office software.

some random words

The images are AI-generated, so the quality is not guaranteed, but there are some good AI-inspired ones, like this one

He never laughs

Since there hasn’t been much research on AI for slide creation, it doesn’t show the results of various styles, but I think it would be quite helpful to review and work with it before creating a PPT. It would be good to familiarize yourself with this kind of office workflow before MS Office Copilot is released.

Tome - The AI assistant for sales

5. Jenni

Writing Assistant AI

It’s a ghostwriter AI that does the work of Grammarly, but also writes for you. It’s perfect for filling in text, as you can just press the right arrow and it will write for you.

Written in a minute

I think NotionAI is also good in this area, but it is paid. If you’ve used Notion, you might want to give it a try.

Jenni AI

6. DeepL

Translation

There is a browser extension: when you drag to select text, a translation button pops up, which makes it very convenient.

Any BBC link

Translation is also a type of AI, so I put it last

DeepL Translate: Reading & writing translator

Conclusion

The tool has to be easy to use for the user. Otherwise, they won’t use it.

This article has no image AI tools such as Dall-E or Stable Diffusion; I did not include them because there is nothing useful in the free versions. Also, AI-generated images have few practical uses for the general public, and the cost of memory and computation is large, so they are not yet as practical as text.

That’s it for AI tools for everyone!

I am also preparing a developer’s edition, but since AI is created by developers, the development of AI tools for developers is overwhelmingly faster than other fields. Anyway, if I have time (if the response is good), I will think about the developer side as well.

6 free AIs you’ll use more often than ChatGPT

Seonglae Cho — Thu, 06 Apr 2023 18:01:47 GMT

That darn ChatGPT... It’s certainly a smart AI, but its drawbacks are just as clear.

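Visiting the site and logging in every time is a hassle
It is needlessly long-winded
It states made-up things as fact
The better-performing GPT4 version is paid
The page usability and design are poor too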

The now-familiar screen

All kinds of AIs are coming out at great speed, and depending on the use, many of them are better than ChatGPT.

I’ve tried many AI tools that seemed broadly useful, and here I’ll recommend the short list that I still use today.

1. ChatPDF

For complex documents and understanding classes

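This one, whose very name sounds like a ChatGPT knockoff, is smarter than you’d think and can be your personal tutoring professor. It often gives such surprising answers on domain knowledge that you wonder whether you’re really asking a professor.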

Put a PDF that needs explaining into the app, and the AI becomes an expert on that PDF and answers all your questions..!

Brains that make you switch to polite speech without thinking

It even answers very smoothly in Korean, and explains the math parts by writing out the formulas. At this point you’re practically carrying a professor in your pocket.

Of course, as with most AI apps, it’s free only up to a certain amount of usage. But even the free version is fairly usable.

ChatPDF AI | Chat with any PDF | Free

2. Perplexity.AI

For search and summarization

An AI that even provides links as annotations for the information in its answers, as shown below

The mighty Namuwiki, consulted even by AI

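Unlike ChatGPT, which makes things up at the drop of a hat, it cites links, so its credibility shoots up. It seems to pursue facts, so it can be frustrating at times when it suddenly gets terse because it has nothing to say.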

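There is also a browser extension, so you can ask questions right from the top right of the browser or summarize the page you’re viewing. It’s great because you don’t need to renew an API key as often as with ChatGPT extensions like Google ChapGPT.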

Monica is a similar browser extension, but personally I mainly use Perplexity.AI.

Perplexity - Ask AI

3. Ora.sh

A personal, customizable AI

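A service where you create and use an AI by setting its name, personality, and so on yourself. Set it up well and maybe you could ease your loneliness with a virtual girlfriend or boyfriend?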

I don’t actually think so. Just kidding..

Seeing things like this, I think that in games of the not-too-distant future it could be fun to spend all day just talking to NPCs.

1-click chatbot | ora.sh

4. Tome

Self-proclaimed storytelling AI

If you have a PPT you want to make and describe the topic, it even creates the slide images for you! AI is infiltrating office software too.

Let’s just type in some random words

The images are AI-generated, so quality isn’t guaranteed, but you do get usable ones with an AI feel, like the one below.

The AI got rather serious

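There probably hasn’t been much AI research on slide creation, so it can’t show results in diverse styles, but running it once before making a PPT and working with the result as a reference should help quite a bit. It may be good to get used to this kind of office workflow before MS Office Copilot is released.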

Tome - The AI assistant for sales

5. Jenni

Writing assistant AI

A ghostwriter AI that has Grammarly’s features and also writes for you. Just press the right arrow and it writes automatically, which is perfect for filling in text.

I think NotionAI is also decent in this area, but it is paid. If you’ve used Notion, it’s worth trying once.

Jenni AI

6. DeepL

Translation

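DeepL is quite famous abroad. Korean translation launched recently, and its quality is said to be smooth. There’s a browser extension, so when you drag to select text a translate button pops up, which makes it convenient.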

Any English news link

Translation is also a kind of AI, so I slipped it in last

DeepL Translate: Reading & writing translator

Conclusion

After all, a tool has to be easy to use from the user’s point of view. Otherwise people stop using it.

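This post has no image AI tools like Dall-E or Stable Diffusion; I left them out because nothing in the free versions is usable. Also, AI-generated images have few practical uses for ordinary people, and the memory and compute costs are high, so they are still less practical than text.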

That’s it for the AI tools for everyone edition!

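I’m also preparing a developer edition; since AI is built by developers, AI tools for developers are advancing overwhelmingly faster than in other fields. Anyway, if I have time (and if the response is good), I’ll consider a developer edition too.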

Build web service with Handbag Cycle

Seonglae Cho — Tue, 01 Mar 2022 08:43:59 GMT

The Web is the most successful and well-maintained non-OS platform, with an ecosystem of unprecedentedly good compatibility. For example, many apps started as web-based services, such as Medium, notion.so, and Facebook. We call them web-based SaaS, which anybody can access using a web browser.

A web browser makes it easy for developers to develop and deploy an application without the concerns that come with ordinary local applications, which are distributed as executable files. The Web freed developers from caring about the OS. However, that freedom can confuse developers because service lifecycles take diverse forms. For this reason, I’ll introduce one form of web service cycle, the Handbag Cycle, with examples at three levels.

Simple Access

SaaS Handbag Cycle

Components

The Handbag Cycle divides the service cycle into four components.

Developers, who create and improve the Software
Software, composed of source code
Service side, which runs the Software
User side, which uses the Service

Cycle Arrows

The developer participates in or is assigned to a software feature (upward arrow).
Developers contribute (downward arrow) to the source code.
Source code is packaged into Software and delivered in the form of a Service.
The Service is accessed and used by users.
Users give feedback based on user experience.

The diagram above is a minimal abstraction of the Handbag Cycle. We will go deeper into the components level by level, with examples.

Web example with Components

Web SaaS Handbag Cycle

Components

Users access the service in a Web Browser
Software source code is stored in a Repository
The Service runs inside a Linux Container
Developers develop inside an IDE

The four blue-gray cards are the de facto standard for each component, and they are the instances you need to know. Kubernetes is a Linux Container Orchestration Platform. Github is the most influential remote git repository. I do not need to mention Chrome’s importance in the web ecosystem.

https://insights.stackoverflow.com/survey/2021

Recommended Stack

Recommended stack for web SaaS with the Handbag Cycle

This structure fills in the Handbag Cycle with the components I personally prefer. Let’s walk through the flow of the Service Cycle in more detail.

The developer clones the software source code to a PC (upward arrow).
Developers push (downward arrow) to the Github Repository.
Changes are merged into the source code by a pull request.
Source code is built into software as it passes through a CI pipeline such as Github Actions.
Software is deployed and becomes a service by passing through a CD pipeline such as ArgoCD.
The service, running on a container platform such as GKE, fetches stateful domain data from a DB.
A DNS service such as Cloudflare gives users an entry point to the web application.
The service runtime logs access data to Time Series storage.
Users share their usage experience on SNS such as Twitter.
Another user starts using the service.
Users give readable feedback on the Software through Github Issues, email, and so on.

Other Recommendations

Github Issues can be substituted by Azure DevOps Boards, Jira, etc. If you want free Kubernetes, Okteto is one solution. Vercel and Cloudflare Workers are other options if you do not need Kubernetes. IPFS, Cloudflare Pages, and Netlify are further options if your application is static.

Conclusion

The most important thing for maintaining a service is feedback, which motivates development. For more feedback, a bigger community is needed. The community can be a user community or an open-source community. Nowadays the number of open source web applications offered as SaaS is increasing. For instance, Standard Note and Outline are very useful pieces of software that are open source and can be self-hosted. This liberty is what maintains the Web ecosystem.

Also, a good CI/CD pipeline is important because it lowers the barrier to deploying a service. There are several approaches and new technologies in this area. Linux Containers are hard to understand, set up, and manage. However, such tooling should support your service, not become the goal itself. It is important to remember that the purpose of a fancy CI/CD setup is seamless development.

To be honest, this Handbag Cycle is not a brand new structure as you know. Many organizations’ services can be interpreted as a Handbag Cycle. However, recognizing the four components of the service cycle will make it easier to improve your software.

Thanks for reading. Below is the diagram link.

Handbag Cycle

How can individuals and organizations raise productivity?

Seonglae Cho — Fri, 10 Dec 2021 08:43:27 GMT

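Productivity is the outcome achieved toward a target result per unit of time. So being highly productive doesn’t mean you must always finish work fast, nor should you focus on outcomes alone. For productivity, we need to understand the trade-off between time and results and allocate effort well.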

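There can be many ways to get more productivity from the same effort. There is the perspective of raising an individual’s productivity and the perspective of raising the productivity of the organization we belong to. After looking at efficient ways to raise productivity for individuals and for organizations in turn, I’d like to examine what productivity itself means.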

Increasing individual productivity

Short-term productivity

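An individual’s day is a series of tasks to handle: brushing teeth, eating, commuting, going to the bathroom... If the result is no different, it is obviously more efficient to cut the time taken. For this, structuring tasks matters. If you understand the dependencies among small, consecutive tasks and always keep a structure of what to do, you can handle them efficiently. Those dependencies can be location, time, the people involved, and so on. Just identifying the dependencies before a task actually comes up makes it easier to decide whether and in what order to do things.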

☑️ Below are a few keywords I’d additionally recommend.

📌Temptation Bundling
📌Structured Procrastination

Long-term productivity

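We do plenty of familiar work that is easy to grasp at a glance, but we struggle most with work that isn’t. If we call a bundle of tasks tied together over the long term a project, a project succeeds only when work proceeds with a long-term view. That is, at each decision point of an individual task we must make the choice that wins in the long run. The most common mistake is that, when a problem arises, people carry out a fix for exactly that problem and no more. A bit more thought in choosing the response could prevent similar problems too, but they miss this by thinking only of short-term productivity. What matters is keeping a long-term view and responding correctly at each decision point.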

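Long-term choices usually go hand in hand with the keyword “selection and focus.” Countless requests and pieces of information come to us as input. Choosing which of them to turn into our output is what matters. These days we have not only work within our organization but countless information channels such as newsletters, YouTube, and SNS. To keep a project well defined, the list of inputs we keep accepting should be managed against a long-term vision. In addition, within those inputs we need to filter down to the tasks we will actually execute to produce output, so the project can proceed efficiently. Otherwise, the project becomes the sum of the random directions of short-term requests: it takes longer and yields a result that is neither one thing nor another. Organizing the input to define the project’s scope and filtering the output by that standard produces efficient, meaningful results.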

☑️ There are truly many methodologies for concrete project management. I’d like to introduce one called GTD.

📌GTD (Getting Things Done)
📌The Law of Triviality

Increasing organizational productivity

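A company is probably one of the organizations where productivity matters most. Raising each member’s productivity would raise the company’s, but when members are intricately entangled, raising everyone’s productivity at once isn’t easy. Yet just as it is far easier to understand a person as a single entity than as a collection of cells, it can help to regard an organization as an independent entity rather than a collection of people. From this view, a company is an entity that takes domain knowledge of its target market (input) as raw material, passes it through the process of organizational culture (people), and tests the resulting product (output) on the market. Here I’ll analyze productivity from the standpoint of organizational culture, the core of an organization.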

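A big misconception about organizational culture is that some one culture can help every company’s productivity. A company has to converge on the culture best suited to its product and market. A free atmosphere may help in an organization where creativity matters, but can be poison in a company making products where rules matter. At the same time, it’s important not to become so absorbed in the culture itself that it loses its flexibility in the face of outside change.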

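What lets an organization’s productivity exceed an individual’s is the interaction among its members. When the culture is not yet clear, communication among members can feel very inefficient. In fact, for goals that can’t be parallelized, 1+1 hardly reaches 1.1, let alone 2. Even so, communication and collaboration matter because an organization can reach areas that individual productivity cannot. So an organization must make sure its productivity isn’t determined by one person’s ability and judgment. Information processing within one person’s head is very fast, so it feels efficient, but in the long run this approach loses out to the productivity of the organization as an entity. Organizational culture should therefore center on communication among members.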

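Since an organization is a set of individuals, each person’s productivity matters. An individual’s sustained productivity depends on habit, not willpower, and habits are shaped by organizational culture and rules. So the culture suited to each company must be very concrete. Because responsibility lies with the organization rather than the individual, individuals need a motive to raise their productivity. People move according to money and emotion, so the company needs concrete guidelines based on these. Setting guidelines for praise and criticism is one example. People are motivated by praise and emotionally shaken by unclear criticism. Simply providing clear guidelines for praise and criticism within the organization reduces wasted emotional energy while motivating work. In this way, an organization should set guidelines for its members’ habits to reach optimal productivity.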

Conclusion

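What do we ultimately want from being productive? Individuals and organizations pursue success with productivity as the tool. Success is in many ways subjective. Getting rich overnight from crypto can be success, and so can reading a book in the breeze at the foot of Jirisan. Because success is subjective, it is close to persuasion. If I can persuade the people around me that I have succeeded, I will be recognized as such. Likewise, an organization persuades society of its success through its contribution to society.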

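Persuading people is hard. Persuading with words alone is especially hard. I think an individual’s dream and a company’s vision are like words. You can’t easily convince people with nothing but a vision for a business idea. But just as we come to trust a person when their actions and results are good, we are most persuaded when the things a company makes are truly good.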

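What you’re trying to achieve matters less than you’d think. We are persuaded by the excellent decisions made along the way, and by results grounded in the maximum productivity achieved with the best trade-off in that domain. Just as a product’s persuasiveness comes from the appeal of the product itself rather than from explanations of each feature, I hope you too become someone who shows actions and results rather than only talking.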

PC and smartphone: what comes next?

Seonglae Cho — Mon, 27 Sep 2021 17:14:33 GMT

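Few products have changed people’s lives, opened a new market, and created enormous wealth all at once. I’ll analyze how two such products, which still affect us greatly today, shaped giant IT platforms like YouTube and the App Store, and try to predict the platform markets to come.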

Yes, these two

🖥️ PC, 📱 Smartphone

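Everyone will agree that these two products affect us greatly. As similar as their functions are, the smartphone and the PC went through similar growth. Both have very complicated histories, but they share the following stages.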

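Launch of an attractive product with a new hardware form factor (e.g. iPhone)
Creation of a huge product market as many people start using the product
Formation of a software platform market that exploits the product’s degrees of freedom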

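Through this process, platforms born on the two form factors, smartphone and PC, drove the growth of today’s giant IT companies. So what we want to know is: which platforms succeeded on these products, and why did such platforms emerge only on these two?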

Here, form factor means the standardized shape of a product, structurally similar across most instances, as with PCs, smartphones, mice, or beds.

Software platforms

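Innovative hardware like the PC and smartphone created large markets on its own, but it differed fundamentally from products like cars or microwave ovens. Namely, the product itself has no fixed purpose, yet by loading various software it can be used for many purposes. Whether a hardware form factor has this property is what determines whether a software platform can form on it.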

But a lumped-together explanation is hard to follow. The formation of a software platform commonly takes the following shape.

Platform development process and the content loop

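More people use the product or the software on the product (community)
A high-freedom platform is developed that handles a process many people share (e.g. App Store, Coupang, YouTube)
Some creative users distribute their creations on the platform (content)
The platform’s operator realizes enormous profit from its monopoly position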

☑️그래서 저는 앱스토어는 물론이고, OS, 브라우저, 플랫폼 역할을 하는 앱 모두 PC와 스마트폰에 기반한 소프트웨어 플랫폼으로 해석합니다. 하드웨어 폼팩터 위에 올라가는 플랫폼의 구조는 아래 형태를 띄는데, 유명한 플랫폼들의 예로 아래 그림을 해석해 봅시다.

브라우저는 범용 소프트웨어로서 앱스토어의 역할을 하고, 아마존과 쿠팡이 파는 제품은 하드웨어 컨텐츠로 해석할 수 있습니다

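Successful Software Platforms

📌Windows: Beyond sales of the Windows OS itself, the revenue from getting people to use products such as Internet Explorer and Office on it as a platform is enormous.
📌Android, iOS: The app store issue has been the most controversial one recently. But even apart from that, the economic effect of the product ecosystem and the relationship between Samsung and Google show how much a monopoly position matters.
📌Chrome: It may look as if Google gains little from Chrome, which is free software, but even setting aside Google's search and advertising, the browser is one of the most used pieces of software. The data and revenue gained through it are why Google is so invested in web technology.
📌YouTube: A representative application platform, one that simplified community, content, and video sharing and clearly shows a monopolistic revenue structure. New platforms can be built on top of an OS, but it is hard to build one on top of YouTube; in the same way, platforms based on entertainment content become the top-level software platforms.

Content platforms such as YouTube and OTT services become the top-level platforms

Successful platforms share three traits: degrees of freedom, community, and content. So what will be the next product, the next hardware form factor, on which such software platforms unfold? 🤷‍♂️ Of course I don't know either.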

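Keywords for the Next Generation

Still, there are candidates that always come up when people predict the next IT wave.

They are the ones that line up behind the Fourth Industrial Revolution banner: blockchain, cloud, artificial intelligence, autonomous driving, the metaverse, and so on. Let's look at them one by one.

📌Public cloud created the cloud platform market, but I think it is unlikely to change our lives as fundamentally as the PC or the smartphone did. Many will disagree, so I explain my reasoning below.
📌Blockchain likewise produced good platforms such as Ethereum, but I think it has the same limitation as the cloud.
📌Artificial intelligence, like cloud and blockchain, is not tied to any hardware, so it is hard for a single player to hold a monopoly position in the market.

These three technologies have limitless uses and will of course affect our lives a great deal, directly or indirectly. But I think the lack of an exclusive form factor, which is both their strength and their weakness, will leave them in the background of our awareness. It is like how we lie under the blanket watching YouTube and wonder which phone to buy next, but barely think about the YouTube servers running in the cloud.

📌That is why many people regard the autonomous car as the next PC or smartphone. Personally, when I consider how much flexible software can be developed on a hardware form factor built for transportation, I think its ability to generate platforms is limited. Elon Musk may pull off some magic, of course. Still, by my criteria I do not think the autonomous car will succeed the PC and the smartphone.

Could uses like those of a smartphone or PC emerge inside this?

I think there are two candidates to succeed the PC and the smartphone: metaverse access devices and humanoids.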

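New Hardware Form Factors

☑️ Metaverse

The metaverse itself is not a hardware product. But we already know a new kind of product for accessing it: the VR device. I am no metaverse zealot, but I think the new form of display that VR offers will create new user experiences. In that respect VR devices are very similar to the PC and the smartphone. Like the monitor for the PC and the touch panel for the smartphone, a display-based VR device offers the variety of user experiences that is a key element of platform-creating hardware.

But the VR devices on the market today remind me of the PDA phones of the past. They are clunky, transitional products, and it is not easy for them to draw people in. If someone improves on this and ships an innovative, attractive metaverse access device, could another company like Apple emerge? Here the metaverse acts like a finish line drawn in advance.

☑️ Humanoid

If the PC, the smartphone, and the VR device deliver a screen-based user experience (UX), I expect the humanoid to go beyond that and deliver user experience through its actions. On humanoids, an App Store-like platform selling behavior applications could well be a big success. Behavior applications such as a friend app, a housekeeper app, or even a partner app would appeal to a great many people. I personally rate the potential highly and had been considering it as my own startup idea, so I felt a little wistful when Elon Musk recently announced that Tesla would build a humanoid.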

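Conclusion

Just as the limitless uses of coal and oil created huge markets and social change, the limitless uses of the PC and the smartphone changed our lives to the same degree. What stays constant in how great wealth has been created throughout history is product versatility and platform monopoly. A monopolistic platform brings enormous wealth, but what opens a platform is a new product and the ability to build an open platform on it.

However, when a platform's monopoly position turns it into a service most people cannot do without, it can come into friction with the state. This used to be limited to hardware platforms such as ISPs, but now it also involves software platforms such as the App Store and Tada. Right and wrong are hard to judge here, so I will attach one related link instead.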

왜 인터넷은 근본부터 글러먹었는가: 코로나19와 한국 인터넷의 해외접속 장애, 그리고 넷플릭스 전쟁에 관한 이야기 | by Daniel Hong | Medium (unifiedh.com)

A developer dreaming big in a small studio apartment in Seoul has written down some of the thoughts he usually has. Have a good day.

JS symbol

Seonglae Cho — Thu, 27 May 2021 02:11:10 GMT

This is the JS Symbol document:

Symbol - JavaScript | MDN

JavaScript implements iteration (for...of, for await...of, and so on) through the iteration protocols, which are keyed by the well-known symbols Symbol.iterator and Symbol.asyncIterator.

For example:

const iterable = {}

// A generator function stored under Symbol.iterator makes the object iterable
iterable[Symbol.iterator] = function* () {
  yield 1
  yield 2
  yield 3
}

for (const a of iterable) console.log(a)

// 1
// 2
// 3

More information is available here:

Iteration protocols
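
Because the protocol is just a method stored under Symbol.iterator, a class can implement it too, and every built-in consumer (spread, destructuring, Array.from) picks it up. The following is a minimal sketch; the Countdown class and its from field are hypothetical names made up for this illustration.

```javascript
// Hypothetical class, used only to illustrate the iterable protocol
class Countdown {
  constructor(from) {
    this.from = from
  }

  // A generator method named by Symbol.iterator makes every instance iterable
  *[Symbol.iterator]() {
    for (let n = this.from; n > 0; n--) yield n
  }
}

// Spread, destructuring and Array.from all call [Symbol.iterator]() internally
const fromSpread = [...new Countdown(3)]
const [first, second] = new Countdown(5)
const viaArrayFrom = Array.from(new Countdown(2))

console.log(fromSpread) // [ 3, 2, 1 ]
console.log(first, second) // 5 4
console.log(viaArrayFrom) // [ 2, 1 ]
```

Each call to the generator method creates a fresh iterator, so the same Countdown instance can be iterated more than once.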

There are local symbols and global symbols.

Local Symbol

Let's make a simple custom protocol in JS using a local symbol.

Extending Object.prototype is not recommended (in your case it will usually be a class or a function), but it serves as an example:

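const symbol = Symbol('blabla')

Object.prototype.blablaProtocol = function () {
  // Use the local symbol from the enclosing scope.
  // Symbol.for('blabla') would return a different (global) symbol and never match.
  if (this[symbol] !== undefined) console.log(this[symbol])
  else throw new Error('blabla protocol is not implemented')

  // just do something with that
}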

Then make instances like this:

const blablable = {}
const another = {}

blablable[symbol] = "hello protocol!"
blablable.blablaProtocol()
another.blablaProtocol() // Error
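
Since extending Object.prototype is discouraged, the same idea can be sketched with a class and a plain function instead. Everything named here (greeting, greet, Robot) is hypothetical and exists only for this illustration; the point is that a local symbol key is reachable only by code that holds the symbol itself.

```javascript
// Local symbol: only code holding this reference can read or write the key
const greeting = Symbol('greeting')

// Hypothetical helper that runs the "protocol" on any object
function greet(target) {
  if (target[greeting] === undefined) {
    throw new TypeError('target does not implement the greeting protocol')
  }
  return target[greeting]
}

// Hypothetical class that opts in by setting the symbol-keyed property
class Robot {
  constructor(name) {
    this.name = name
    this[greeting] = `beep, I am ${name}`
  }
}

const robot = new Robot('R2')
const message = greet(robot)

// Symbol-keyed properties are skipped by Object.keys and JSON.stringify
const keys = Object.keys(robot)
const json = JSON.stringify(robot)

let rejected = false
try {
  greet({}) // a plain object does not implement the protocol
} catch (error) {
  rejected = error instanceof TypeError
}

console.log(message) // beep, I am R2
console.log(keys) // [ 'name' ]
console.log(json) // {"name":"R2"}
console.log(rejected) // true
```

Nothing is added to Object.prototype, so unrelated objects are unaffected and the protocol cannot collide with string property names.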

Global Symbol

Symbol.for is a static method that creates and retrieves global symbols: it returns the global symbol registered under the given key, or creates and registers a new one if none exists yet.

Let's look at some code:

const globalSymbol = Symbol.for('blabla')
globalSymbol === Symbol.for('blabla') // true
Symbol('blabla') === Symbol.for('blabla') // false

Global symbols are stored in the global symbol registry (GSR), which is managed by the JS engine. So a global symbol can be accessed from another window, such as an iframe.
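
To check whether a given symbol lives in the registry, Symbol.keyFor can be used: it returns the registry key for a global symbol and undefined for a local one. A small sketch, where the key 'app.token' and the variable names are arbitrary values chosen for this example:

```javascript
// 'app.token' is an arbitrary key chosen for this example
const registered = Symbol.for('app.token') // stored in the global symbol registry
const unregistered = Symbol('app.token') // local symbol with the same description

// Symbol.keyFor looks a symbol up in the registry
const registeredKey = Symbol.keyFor(registered)
const unregisteredKey = Symbol.keyFor(unregistered)

// Both symbols carry the same description, yet they are different values
const sameDescription = registered.description === unregistered.description
const sameSymbol = registered === unregistered

console.log(registeredKey) // app.token
console.log(unregisteredKey) // undefined
console.log(sameDescription) // true
console.log(sameSymbol) // false
```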

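You can implement a comparable constant like this:

Symbol.for('blabla') // creates and registers the global symbol on first use
const ADMIN = Symbol.for('blabla') // returns that same symbol, so comparisons with === match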

There are many ways to make a unique value, but a symbol is a basic, native way to do it.
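
One common use of that uniqueness is an enum-like set of constants. The sketch below assumes a hypothetical Role object and canDelete function, neither of which comes from any library; it uses local symbols so that no string or look-alike symbol can pass for a real role.

```javascript
// Hypothetical enum-like object; each value is unique by construction
const Role = Object.freeze({
  ADMIN: Symbol('admin'),
  GUEST: Symbol('guest'),
})

// Hypothetical permission check that compares by identity
function canDelete(role) {
  return role === Role.ADMIN
}

const adminAllowed = canDelete(Role.ADMIN)
const guestAllowed = canDelete(Role.GUEST)
const lookalikeAllowed = canDelete(Symbol('admin')) // same description, different symbol
const stringAllowed = canDelete('admin')

console.log(adminAllowed) // true
console.log(guestAllowed) // false
console.log(lookalikeAllowed) // false
console.log(stringAllowed) // false
```

If the constants have to match across modules or windows without sharing a reference, Symbol.for with an agreed key could be used instead, at the cost of anyone being able to obtain the same symbol from the key.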