<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Abizar Egi on Medium]]></title>
        <description><![CDATA[Stories by Abizar Egi on Medium]]></description>
        <link>https://medium.com/@abizaregi21?source=rss-48afad7fdbcc------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*N3545JKHVx5YyERCKUMfLg.png</url>
            <title>Stories by Abizar Egi on Medium</title>
            <link>https://medium.com/@abizaregi21?source=rss-48afad7fdbcc------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 04:56:35 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@abizaregi21/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Responsive Portfolio Website with HTML, CSS, Javascript and Wordpress Website]]></title>
            <link>https://medium.com/@abizaregi21/responsive-portfolio-website-with-html-css-javascript-and-wordpress-website-7d2abc88ab94?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/7d2abc88ab94</guid>
            <category><![CDATA[portfolio]]></category>
            <category><![CDATA[website]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Fri, 28 Jan 2022 08:01:18 GMT</pubDate>
            <atom:updated>2022-01-28T08:06:08.053Z</atom:updated>
            <content:encoded><![CDATA[<ul><li><a href="https://portfolio.abizaregi.repl.co/">Abizar Egi</a></li><li><a href="https://www.abizaregi.my.id/">Portfolio Website - Abizar Egi Mahendra</a></li><li><a href="https://www.abizaregi.my.id/blog/">Blog - Portfolio Website</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7d2abc88ab94" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Sales Data Analysis with Python (Part II)]]></title>
            <link>https://medium.com/@abizaregi21/sales-data-analysis-with-python-part-ii-f09ddf134f9d?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/f09ddf134f9d</guid>
            <category><![CDATA[keras]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[sales]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Thu, 16 Sep 2021 09:58:31 GMT</pubDate>
            <atom:updated>2021-09-16T09:58:31.450Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vYwJ3WfUYb9INKZAXi4-TQ.jpeg" /><figcaption>Image source: <a href="https://www.digitalcommerce360.com/wp-content/uploads/2020/11/shutterstock_1361019032-1024x493.jpg">https://www.digitalcommerce360.com/wp-content/uploads/2020/11/shutterstock_1361019032-1024x493.jpg</a></figcaption></figure>
<p>Part I explored the sales data. In Part II we forecast next month's sales using the data processed in Part I. This analysis needs the following packages and models:</p>
<pre>import pandas as pd<br>import numpy as np<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>from sklearn.model_selection import train_test_split<br>from sklearn.linear_model import LinearRegression, Lasso, Ridge<br>from sklearn.metrics import mean_absolute_error, mean_squared_error<br>import pickle<br>from tensorflow import keras</pre>
<pre>df = pd.read_csv(&#39;../Data ML/Sales Harian 2019&#39;)<br>df.drop(&#39;Order Date&#39;, axis=1, inplace=True)<br>df.head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/775/1*WVRPJTfZdlG70x3G3_nukg.png" /></figure>
<pre>df_predict = pd.read_csv(&#39;../Data ML/Harian Bulan Januari&#39;).drop(&#39;Unnamed: 0&#39;, axis=1)<br># lr is assumed to be a LinearRegression model fitted earlier; its training is not shown in this post<br>df_result = pd.Series(lr.predict(df_predict), index=pd.date_range(start=&#39;2020-01-01&#39;, end=&#39;2020-01-31&#39;, freq=&#39;D&#39;))<br>df_result.tail()</pre>
<pre>Output:<br><strong>2020-01-27    69245.403389<br>2020-01-28    69145.125634<br>2020-01-29    69044.847879<br>2020-01-30    68944.570123<br>2020-01-31    68844.292368<br>Freq: D, dtype: float64</strong></pre>
<p>Visualizing the 2019 daily revenue together with the predicted daily revenue for January 2020:</p>
<pre>plt.figure(figsize=(10,8))<br># Series.append was removed in pandas 2.0; pd.concat does the same job<br>pd.concat([df[&#39;Price Total&#39;], np.log(df_result)]).plot(label=&#39;Prediksi (log)&#39;)<br>pd.concat([df[&#39;Price Total&#39;], df_result]).plot(label=&#39;Prediksi&#39;)<br>df[&#39;Price Total&#39;].plot(label=&#39;Data Actual&#39;)<br>plt.title(&#39;Pendapatan Harian (2019 + Prediksi pada Januari 2020)&#39;)<br>plt.ylabel(&#39;Pendapatan ($)&#39;)<br>plt.grid()<br>plt.legend()<br>plt.savefig(&#39;../Output/Pendapatan Harian dan Prediksi januari 2020&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*O2S3HGtKCrLdfY81Jab34Q.png" /></figure>
<p>The chart above shows the predicted daily sales for January 2020, both as raw values and as logarithms.</p>
<p>Before forecasting, the data is split into training data (the first 250 days) and testing data (the remaining days), and then a model is built. The helper split_sequence turns each part into overlapping windows of n_steps (3 by default) consecutive daily values, each paired with the following day's value as the target.</p>
<pre>def split_sequence(sequence, n_steps=3):<br>    # Build (window, next value) pairs from a 1-D series<br>    sequence = list(sequence)<br>    X, Y = list(), list()<br>    for i in range(len(sequence)):<br>        end_ix = i + n_steps<br>        if end_ix &gt; len(sequence)-1:<br>            break<br>        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]<br>        X.append(seq_x)<br>        Y.append(seq_y)<br>    def reshape(d):<br>        # LSTM input shape: (samples, time steps, features)<br>        d = np.array(d)<br>        d = np.reshape(d,(d.shape[0], d.shape[1],1))<br>        return d<br>    return reshape(X), np.array(Y)</pre>
<pre>train_data = df[&#39;Price Total&#39;].iloc[:250]<br>test_data = df[&#39;Price Total&#39;].iloc[250:]</pre>
<pre>x_train, y_train = split_sequence(train_data)<br>x_test, y_test = split_sequence(test_data)</pre>
<pre>model = keras.Sequential([<br>    keras.layers.LSTM(64, input_shape=(3,1,), activation=&#39;relu&#39;, return_sequences=True),<br>    keras.layers.LSTM(64, activation=&#39;relu&#39;),<br>    keras.layers.Dense(1)<br>])</pre>
<pre>model.compile(loss=&#39;mse&#39;, optimizer=&#39;adam&#39;)<br>model.summary()</pre>
<pre>Output:<br><strong>Model: &quot;sequential&quot;<br>_________________________________________________________________<br>Layer (type)                 Output Shape              Param #   <br>=================================================================<br>lstm (LSTM)                  (None, 3, 64)             16896     <br>_________________________________________________________________<br>lstm_1 (LSTM)                (None, 64)                33024     <br>_________________________________________________________________<br>dense (Dense)                (None, 1)                 65        <br>=================================================================<br>Total params: 49,985<br>Trainable params: 49,985<br>Non-trainable params: 0<br>_________________________________________________________________</strong></pre>
<p>Training the model, with early stopping once the training loss has not improved for 3 consecutive epochs (here training stops after epoch 20):</p>
<pre>stopping = keras.callbacks.EarlyStopping(monitor=&#39;loss&#39;, patience=3)<br>history = model.fit(x_train, y_train, epochs=100, batch_size=32, callbacks=[stopping], verbose=2)</pre>
<pre>Output:<br><strong>Epoch 1/100<br>8/8 - 2s - loss: 7164496896.0000<br>Epoch 2/100<br>8/8 - 0s - loss: 6721865728.0000<br>Epoch 3/100<br>8/8 - 0s - loss: 6399132160.0000<br>Epoch 4/100<br>8/8 - 0s - loss: 6050901504.0000<br>Epoch 5/100<br>8/8 - 0s - loss: 5638317568.0000<br>Epoch 6/100<br>8/8 - 0s - loss: 5142379008.0000<br>Epoch 7/100<br>8/8 - 0s - loss: 4539011072.0000<br>Epoch 8/100<br>8/8 - 0s - loss: 3798559488.0000<br>Epoch 9/100<br>8/8 - 0s - loss: 2954702336.0000<br>Epoch 10/100<br>8/8 - 0s - loss: 2144060160.0000<br>Epoch 11/100<br>8/8 - 0s - loss: 1340466560.0000<br>Epoch 12/100<br>8/8 - 0s - loss: 392662464.0000<br>Epoch 13/100<br>8/8 - 0s - loss: 183184576.0000<br>Epoch 14/100<br>8/8 - 0s - loss: 114023488.0000<br>Epoch 15/100<br>8/8 - 0s - loss: 132803712.0000<br>Epoch 16/100<br>8/8 - 0s - loss: 105440448.0000<br>Epoch 17/100<br>8/8 - 0s - loss: 94067408.0000<br>Epoch 18/100<br>8/8 - 0s - loss: 97465088.0000<br>Epoch 19/100<br>8/8 - 0s - loss: 95151520.0000<br>Epoch 20/100<br>8/8 - 0s - loss: 94614400.0000</strong></pre>
<p>Visualizing the training progress of the model:</p>
<pre>plt.plot(history.history[&#39;loss&#39;], marker=&#39;.&#39;)<br>plt.title(&#39;Grafik perkembangan Model&#39;)<br>plt.xlabel(&#39;Epochs&#39;)<br>plt.ylabel(&#39;Error (MSE)&#39;)<br>plt.grid()<br>plt.savefig(&#39;../Output/Training NN Model&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*fr4ueHa3RxCmvrSARR6v6w.png" /></figure>
<pre>plt.figure(figsize=(10,8))<br>plt.plot(model.predict(x_test), label=&#39;Prediction&#39;)<br>plt.plot(y_test, label=&#39;Actual&#39;)<br>plt.legend()<br>plt.grid()<br>plt.title(&#39;Data Prediksi vs Data Actual&#39;)<br>plt.xlabel(&#39;Waktu&#39;)<br>plt.ylabel(&#39;Pendapatan ($)&#39;)<br>plt.savefig(&#39;../Output/Demonstrasi Prediksi NN Model&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*EOL61XpMyFjdhb0c1Ctffg.png" /></figure>
<p>Judging visually from the test data, the predicted and actual values do not differ markedly: both follow almost the same trend. This suggests the model can reasonably be used to forecast upcoming sales.</p>
<pre>def predict_future(shift_count):<br>    # Forecast recursively: each new prediction is appended and reused as input for the next step<br>    def reshape(three):<br>        return np.array(three).reshape(1,3,1)<br>    array = list(df[&#39;Price Total&#39;])<br>    now = len(df[&#39;Price Total&#39;])-3<br>    last = len(df[&#39;Price Total&#39;])<br>    for _ in range(shift_count):<br>        converted = reshape(array[now:last])<br>        array.append(model.predict(converted)[0][0])<br>        now += 1<br>        last += 1<br>    return array</pre>
<pre>future_prediction = predict_future(30)</pre>
<pre>plt.figure(figsize=(10,5))<br>plt.plot(np.arange(29,60), future_prediction[-31:], &#39;--&#39;, label=&#39;Prediksi&#39;)<br>plt.plot(np.arange(30), df[&#39;Price Total&#39;].iloc[-30:], label=&#39;Data Aktual&#39;)<br>plt.title(&#39;Prediksi Pendapatan dalam 30 hari ke depan&#39;)<br>plt.grid()<br>plt.legend()  # call legend() before savefig() so the legend is included in the saved image<br>plt.savefig(&#39;../Output/Prediksi Dengan NN Model&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*ZyXvd6gGzhoseNXViG-NJQ.png" /></figure>
<p>The blue dashed line shows the forecast sales for the next 30 days. According to the forecast, sales rise every day and reach their highest value on day 30.</p>
<img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f09ddf134f9d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Sales Data Analysis with Python (Part I)]]></title>
            <link>https://medium.com/@abizaregi21/sales-data-analysis-with-python-9bb25754eb29?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/9bb25754eb29</guid>
            <category><![CDATA[sales]]></category>
            <category><![CDATA[word-cloud]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[data-analysis]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Thu, 16 Sep 2021 06:52:47 GMT</pubDate>
            <atom:updated>2021-09-16T10:02:53.902Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7TfEE94zd3R4d4AdgAXcVA.jpeg" /><figcaption>Image source: <a href="https://computermarketresearch.com/channel-data-analytics-software/">https://computermarketresearch.com/channel-data-analytics-software/</a></figcaption></figure>
<p>Sales data analysis examines a company's sales data and uses the insights found during data exploration to inform the business's future prospects. With suitable algorithms in Python, we can also forecast sales for the following month. The following packages are needed for this sales data analysis:</p>
<pre>import os<br>import pandas as pd<br>import numpy as np<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>from wordcloud import WordCloud</pre>
<p>To read several data files stored in the same folder, we can use the following code:</p>
<pre>dataset = [f&#39;../Data/{i}&#39; for i in os.listdir(&#39;../Data&#39;)]</pre>
<p>To combine the sales data from all files into a single DataFrame, which makes the analysis more efficient, use:</p>
<pre>li = []<br>for data in dataset:<br>    df_data = pd.read_csv(data, index_col=None, header=0)<br>    li.append(df_data)<br>df = pd.concat(li, axis=0, ignore_index=True)<br>df.head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9zlDzSvMKOCsj26vCsezMg.png" /></figure>
<pre>df[&#39;Quantity Ordered&#39;].unique()</pre>
<pre>Output:<br>array([&#39;2&#39;, nan, &#39;1&#39;, &#39;3&#39;, &#39;5&#39;, &#39;Quantity Ordered&#39;, &#39;4&#39;, &#39;7&#39;, &#39;6&#39;, &#39;8&#39;, &#39;9&#39;], dtype=object)</pre>
<p>The value ‘Quantity Ordered’ (header rows repeated inside the data) and the nan values would get in the way of the analysis and prediction. Rows containing them therefore need to be dropped with the following code:</p>
<pre>df = df[df[&#39;Quantity Ordered&#39;] != &#39;Quantity Ordered&#39;]<br>df = df.dropna(how=&#39;all&#39;)<br>df.reset_index(drop=True, inplace=True)<br>print(df.head())<br>df.info()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IfkSdDnoQn9jang89Kmd4g.png" /></figure>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/998/1*23YEQyJ6gOm2hyReEAjdRQ.png" /></figure>
<p>All columns have the object dtype, which is likely to cause errors during analysis. Each column therefore has to be converted to its proper data type:</p>
<pre>df[&#39;Order Date&#39;] = pd.to_datetime(df[&#39;Order Date&#39;])<br>col_int = [&#39;Order ID&#39;, &#39;Price Each&#39;, &#39;Quantity Ordered&#39;]</pre>
<pre>for col in col_int:<br>    df[col] = pd.to_numeric(df[col])<br>for col in [&#39;Product&#39;, &#39;Purchase Address&#39;]:<br>    # np.str was removed in NumPy 1.24; the built-in str is equivalent<br>    df[col] = df[col].astype(str)</pre>
<pre>df.info()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/980/1*njaum6kaBUbruItRbr2jCA.png" /></figure>
<p>Next, split the Order Date datetime into day, month, and year, and compute the total price of each order line. Then encode the Product and Month-Year columns:</p>
<pre>df[&#39;Day&#39;] = pd.DatetimeIndex(df[&#39;Order Date&#39;]).day<br>df[&#39;Month&#39;] = pd.DatetimeIndex(df[&#39;Order Date&#39;]).month<br>df[&#39;Year&#39;] = pd.DatetimeIndex(df[&#39;Order Date&#39;]).year<br>df[&#39;Month-Year&#39;] = df[&#39;Order Date&#39;].apply(lambda x: x.strftime(&#39;%Y-%m&#39;))<br>df[&#39;Price Total&#39;] = df[&#39;Quantity Ordered&#39;] * df[&#39;Price Each&#39;]<br>df = df.sort_values(by=[&#39;Order Date&#39;])<br>df.head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XyRz0bdVp7DnladNFwNLbw.png" /></figure>
<pre>from sklearn.preprocessing import LabelEncoder</pre>
<pre>encoder = LabelEncoder()<br>df[&#39;Product_Encoded&#39;] = encoder.fit_transform(df[&#39;Product&#39;])<br>df[&#39;Month_Year_Encoded&#39;] = encoder.fit_transform(df[&#39;Month-Year&#39;])<br>df.head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6lzf4nyunoF6WNEzw20bAg.png" /></figure>
<p>To see which products are sold most often, we can use a word cloud:</p>
<pre>wordcloud = WordCloud(max_font_size=50, max_words=100, background_color=&#39;white&#39;).generate(&#39; &#39;.join(df[&#39;Product&#39;]))</pre>
<pre>plt.subplots(figsize=(10,8))<br>plt.imshow(wordcloud, interpolation=&#39;bilinear&#39;)<br>plt.axis(&#39;off&#39;)<br>plt.title(&#39;Most Product Sold (by Count)&#39;)<br>plt.savefig(&#39;../Output/Product Count&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*PGDuYFRkXzFl9KpjOQwW4g.png" /></figure>
<p>The word cloud shows that Charging Cable is the most frequently sold product, followed by USB C, batteries pack, C Charging, and other items. The word cloud counts words across order rows, so it reflects how often a product appears in orders rather than the number of units sold.</p>
<p>To see which products bring in the highest revenue (the Price Total column), use the following code:</p>
<pre># Select the columns to aggregate before summing; pandas 2.x no longer drops non-numeric columns automatically<br>df_sales = df.groupby(&#39;Product&#39;)[[&#39;Quantity Ordered&#39;, &#39;Price Total&#39;]].sum()<br>df_sales.sort_values(by=[&#39;Price Total&#39;], ascending=False).head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/821/1*S0pYOF0tQhLiJKrB0GMbEw.png" /></figure>
<p>The following code lists revenue per month in descending order and plots the monthly sales:</p>
<pre>df_month_year = df.groupby(&#39;Month-Year&#39;)[[&#39;Quantity Ordered&#39;, &#39;Price Total&#39;]].sum()<br>df_month_year = df_month_year.iloc[:-1]<br>print(df_month_year.sort_values(by=[&#39;Price Total&#39;], ascending=False))</pre>
<pre>sales = df.groupby(&#39;Month-Year&#39;)[&#39;Price Total&#39;].sum().round(2)<br>sales.plot(kind=&#39;line&#39;, figsize=(12,8))<br>plt.legend()<br>plt.grid()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/611/1*9RpkVioEWzGhONFpzA69fA.png" /></figure>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/697/1*iis-jYkZnvKFwRPRjBnGrg.png" /></figure>
<pre>fig, ax = plt.subplots(1, 2, figsize=[15, 5])<br>df_month_year[&#39;Price Total&#39;].plot.bar(ax=ax[0])<br>df_month_year[&#39;Price Total&#39;].plot(ax=ax[0], color=&#39;red&#39;, marker=&#39;*&#39;)<br>df_month_year[&#39;Quantity Ordered&#39;].plot.bar(ax=ax[1])<br>df_month_year[&#39;Quantity Ordered&#39;].plot(ax=ax[1], color=&#39;red&#39;, marker=&#39;*&#39;)<br>ax[0].tick_params(labelrotation=90)<br>ax[1].tick_params(labelrotation=90)<br>ax[0].set_xlabel(&#39;Total Pendapatan&#39;)<br>ax[1].set_xlabel(&#39;Unit Terjual&#39;)<br>plt.suptitle(&#39;Penjualan per Bulan&#39;)<br>plt.savefig(&#39;../Output/Grafik Penjualan per Bulan&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dd-6VQnBdcRFTakH1wJeHw.png" /></figure>
<p>Next, extract the city from the purchase address. Then summarize each city's units sold, revenue, and three most frequently ordered products:</p>
<pre>df[&#39;Purchase Address City&#39;] = df[&#39;Purchase Address&#39;].apply(lambda x: x.split(&#39;,&#39;)[1][1:])</pre>
<pre>def cityProduct(city):<br>    return &#39;, &#39;.join(df[&#39;Product&#39;][df[&#39;Purchase Address City&#39;] == city].value_counts()[:3].index)</pre>
<pre>df_city = df.groupby(&#39;Purchase Address City&#39;)[[&#39;Quantity Ordered&#39;, &#39;Price Total&#39;]].sum().sort_values(by=&#39;Price Total&#39;, ascending=False)<br>df_city[&#39;Top 3 Product&#39;] = list(map(cityProduct, df_city.index))<br>df_city.head()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/836/1*ZVMjl6vuTPAdtnzWJLVJug.png" /></figure>
<p>The five products with the most units sold:</p>
<pre>Qty = df.groupby(&#39;Product&#39;)[&#39;Quantity Ordered&#39;].sum().sort_values(ascending=False).head()<br>Qty = pd.DataFrame(Qty)<br>Qty</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/560/1*L6XZXaI5CAZ1-IrXgNdXVw.png" /></figure>
<p>Next, aggregate the revenue per day and plot the daily revenue:</p>
<pre>df_string_date = df.copy()<br>df_string_date[&#39;Order Date&#39;] = df[&#39;Order Date&#39;].dt.date.astype(str)<br>df_string_date = df_string_date.groupby(&#39;Order Date&#39;).sum(numeric_only=True).iloc[:-1]</pre>
<pre>plt.figure(figsize=(10, 8))<br>df_string_date[&#39;Price Total&#39;].plot()<br>plt.title(&#39;Pendapatan Harian (2019)&#39;)<br>plt.ylabel(&#39;Pendapatan ($)&#39;)<br>plt.grid()<br>plt.savefig(&#39;../Output/Pendapatan Harian.png&#39;)<br>plt.show()</pre>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*1qPkl3-7hfPWa6uPJFaQwg.png" /></figure>
<p>The final step of the sales data analysis is to save the processed data as a CSV file, so that next month's sales can be forecast from it in Part II.</p>
<pre>df_string_date[&quot;Day Order&quot;] = pd.DatetimeIndex(df_string_date.index).day<br>df_string_date[&quot;Month Order&quot;] = pd.DatetimeIndex(df_string_date.index).month<br>df_string_date[&quot;Year Order&quot;] = pd.DatetimeIndex(df_string_date.index).year</pre>
<pre>df_ml = df_string_date.copy()<br>df_ml = df_ml[[&#39;Day Order&#39;,&#39;Month Order&#39;,&#39;Year Order&#39;,&#39;Price Total&#39;]]<br>df_ml.to_csv(&#39;../Data ML/Sales Harian 2019&#39;)  # save as CSV for the forecasting step</pre>
<img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9bb25754eb29" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Predicting Exited / Churn for Bank Customers]]></title>
            <link>https://medium.com/@abizaregi21/predicting-exited-churn-for-bank-customers-7d02a72ce510?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/7d02a72ce510</guid>
            <category><![CDATA[gradient-boosting]]></category>
            <category><![CDATA[logistic-regression]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[random-forest]]></category>
            <category><![CDATA[churn]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Mon, 13 Sep 2021 08:52:15 GMT</pubDate>
            <atom:updated>2021-09-13T08:52:15.754Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/619/1*NopMiYUYt9Loihhn9Soxsg.jpeg" /><figcaption>image source: <a href="https://www.balipolitika.com/ini-10-bank-yang-eksis-di-masa-pandemi-covid-19/">https://www.balipolitika.com/ini-10-bank-yang-eksis-di-masa-pandemi-covid-19/</a></figcaption></figure><p>Predicting Exited / Churn for Bank Customers brtujuan untuk memprediksi potensi customer berpindah ke kompetitor atau dalam hal ini ke bank lain. Data merupakan data dummy yang diperoleh dari kaggle.com, data terdiri dari informasi customer dan informasi pinjaman. pada prediksi kali ini kita akan menggunakan beberapa package dan model dari python sebagai berikut:</p><pre>import numpy as np<br>import pandas as pd<br>import matplotlib.pyplot  as plt<br>import seaborn as sns<br>from sklearn.preprocessing import LabelEncoder<br>from sklearn.model_selection import train_test_split<br>from sklearn.linear_model import LogisticRegression<br>from sklearn.ensemble import RandomForestClassifier<br>from sklearn.ensemble import GradientBoostingClassifier<br>from sklearn.metrics import confusion_matrix, classification_report</pre><pre>data = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Predicting Churn for Bank Customer/Churn_Modelling.csv&#39;)<br>print(data.head())<br>print(data.info())</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VVD-ZLI38ZQtxBbsHspYbA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/954/1*eoZGMjnTgbXLVNZ9ynYl2g.png" /></figure><p>Sebelum dilakukan modeling perlu dilakukan Exploratory Data Analysis untuk melihat sebaran data. 
EDA dapat dilakukan dengan membuat visualisasi pada data yang akan dieksplorasi.</p><pre>import matplotlib.pyplot  as plt<br>fig = plt.figure()<br>ax= fig.add_axes([0,0,1,1])<br>ax.axis(&#39;equal&#39;)<br>labels=[&#39;no (0)&#39;,&#39;yes (1)&#39;]<br>churn=data.Exited.value_counts()<br>ax.pie(churn, labels=labels, autopct=&#39;%.0f%%&#39;)<br>plt.savefig(&#39;Churn.png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*W8oI7s8FsoFygRMJGSUkOQ.png" /></figure><p>visualisasi pie chart menunjukkan sebesar 20% customer keluar atau berpindah ke bank lain, dan sebanyak 80% masih tetap menjadi pelanggan.</p><pre>num_columns = [&#39;CreditScore&#39;,&#39;Tenure&#39;,&#39;Balance&#39;,&#39;EstimatedSalary&#39;]</pre><pre>fig, ax = plt.subplots(1,4,figsize=(20, 5))<br>data[data.Exited==0][num_columns].hist(bins=20,color=&#39;blue&#39;,alpha=0.5,ax=ax)<br>data[data.Exited==1][num_columns].hist(bins=20,color=&#39;purple&#39;,alpha=0.5,ax=ax)<br>plt.savefig(&#39;EDA.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*llcNoNBzuaDOtHBgPtCXiw.png" /></figure><pre>sns.set(style=&#39;darkgrid&#39;)<br>fig, ax = plt.subplots(2,3,figsize=(14,12))<br>sns.countplot(data=data, x=&#39;Gender&#39;, hue=&#39;Exited&#39;, ax=ax[0][0])<br>sns.countplot(data=data, x=&#39;Geography&#39;, hue=&#39;Exited&#39;, ax=ax[0][1])<br>sns.countplot(data=data, x=&#39;NumOfProducts&#39;, hue=&#39;Exited&#39;, ax=ax[0][2])<br>sns.countplot(data=data, x=&#39;HasCrCard&#39;, hue=&#39;Exited&#39;, ax=ax[1][0])<br>sns.countplot(data=data, x=&#39;IsActiveMember&#39;, hue=&#39;Exited&#39;, ax=ax[1][1])<br>plt.tight_layout()<br>plt.savefig(&#39;EDA(1).png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1008/1*mogUvUcTh447iW4B6EKm1w.png" /></figure><p>langkah selanjutnya, melakukan drop column yang tidak diperlukan untuk tahapan modeling</p><pre>cleaned_data = 
data.drop(columns=[&#39;RowNumber&#39;,&#39;CustomerId&#39;,&#39;Surname&#39;])</pre><pre>for i in cleaned_data.columns:<br>    if cleaned_data[i].dtype==np.number:<br>        continue<br>    cleaned_data[i] = LabelEncoder().fit_transform(cleaned_data[i])</pre><pre>X = cleaned_data.drop(&#39;Exited&#39;, axis=1)<br>y = cleaned_data[&#39;Exited&#39;]</pre><pre>x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)<br>print(&#39;Data Training:\n&#39;,x_train.shape)<br>print(y_train.value_counts(normalize=True))<br>print(&#39;\nData Testing: \n&#39;,x_test.shape)<br>print(y_test.value_counts(normalize=True))</pre><pre>Output:<br>Data Training:<br> (7000, 10)<br>0    0.792429<br>1    0.207571<br>Name: Exited, dtype: float64</pre><pre>Data Testing: <br> (3000, 10)<br>0    0.805333<br>1    0.194667<br>Name: Exited, dtype: float64</pre><p>Setelah melakukan split menjadi data training dan data test, tahapan berikutnya adalah melakukan prediksi dengan algoritma logistic regression, random forest classifier, dan gradient boosting classifier.</p><h3>Logistic Regression</h3><pre>lr = LogisticRegression().fit(x_train, y_train)<br>y_train_pred = lr.predict(x_train)<br>print(classification_report(y_train, y_train_pred))</pre><pre>Output:</pre><pre><strong>                 precision  recall  f1-score   support</strong></pre><pre><strong>           0       0.80      0.97      0.88      5547<br>           1       0.34      0.06      0.10      1453</strong></pre><pre><strong>    accuracy                           0.78      7000<br>   macro avg       0.57      0.51      0.49      7000<br>weighted avg       0.70      0.78      0.71      7000</strong></pre><p>membuat visualisasi heatmap pada confusion matrix data training</p><pre>confusion_matrix_train = pd.DataFrame((confusion_matrix(y_train, y_train_pred)), (&#39;No Exited&#39;, &#39;Exited&#39;), (&#39;No Exited&#39;, &#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = 
sns.heatmap(confusion_matrix_train, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for training model\n w/ Logistic regression\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for training model with Logistic regression.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*FGUy-9d-KMo84oitzjSV4g.png" /></figure><p>Accuracy pada taining model menggunakan algoritma logistic <br>regression sebesar 78%. Berdasarkan confusion matrix pada <br>training model diperoleh hasil:<br>* Prediksi no exited yang sebenarnya exited sebanyak 1367<br>* Prediksi no exited yang benar sebanyak 5381<br>* Prediksi exited yang sebenarnya no exited sebanyak 166<br>* Prediksi exited yang benar sebanyak 86</p><pre>y_test_pred = lr.predict(x_test)<br>print(classification_report(y_test, y_test_pred))</pre><pre><strong>                precision    recall  f1-score   support</strong></pre><pre><strong>           0       0.82      0.97      0.88      2416<br>           1       0.41      0.09      0.15       584</strong></pre><pre><strong>    accuracy                           0.80      3000<br>   macro avg       0.61      0.53      0.52      3000<br>weighted avg       0.74      0.80      0.74      3000</strong></pre><p>membuat visualisasi heatmap pada confusion matrix data testing</p><pre>confusion_matrix_test = pd.DataFrame((confusion_matrix(y_test, y_test_pred)),(&#39;No Exited&#39;, &#39;Exited&#39;),(&#39;No Exited&#39;, 
&#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = sns.heatmap(confusion_matrix_test, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for testing model\n w/ Logistic regression\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for testing model with Logistic regression.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*-A6XIyzvq_Zi1w5lQil_bQ.png" /></figure><p>Accuracy pada testing model menggunakan algoritma logistic <br>regression sebesar 80%. 
Berdasarkan confusion matrix pada <br>testing model diperoleh hasil:<br>* Prediksi no exited yang sebenarnya exited sebanyak 529<br>* Prediksi no exited yang benar sebanyak 2336<br>* Prediksi exited yang sebenarnya no exited sebanyak 80<br>* Prediksi exited yang benar sebanyak 55</p><h3>Random Forest Classifier</h3><pre>rfc = RandomForestClassifier().fit(x_train, y_train)<br>y_train_pred_rfc = rfc.predict(x_train)<br>print(classification_report(y_train, y_train_pred_rfc))</pre><pre><strong>               precision    recall  f1-score   support</strong></pre><pre><strong>           0       1.00      1.00      1.00      5547<br>           1       1.00      1.00      1.00      1453</strong></pre><pre><strong>    accuracy                           1.00      7000<br>   macro avg       1.00      1.00      1.00      7000<br>weighted avg       1.00      1.00      1.00      7000</strong></pre><p>membuat visualisasi heatmap pada confusion matrix data training</p><pre>confusion_matrix_train_rfc = pd.DataFrame((confusion_matrix(y_train, y_train_pred_rfc)), (&#39;No Exited&#39;, &#39;Exited&#39;), (&#39;No Exited&#39;, &#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = sns.heatmap(confusion_matrix_train_rfc, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for training model\n w/ Random forest classifier\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for training model with Random Forest Classifier.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/432/1*4mk-N4MNGOuoUNbTTAlf7A.png" /></figure><p>The Random Forest Classifier reaches 100% accuracy on the training set. The training-set confusion matrix breaks down as follows:<br>* Predicted No Exited, actually Exited (false negatives): 0<br>* Correctly predicted No Exited (true negatives): 5547<br>* Predicted Exited, actually No Exited (false positives): 0<br>* Correctly predicted Exited (true positives): 1453</p><pre>y_test_pred_rfc = rfc.predict(x_test)<br>print(classification_report(y_test, y_test_pred_rfc))</pre><pre><strong>                precision    recall  f1-score   support</strong></pre><pre><strong>           0       0.88      0.97      0.92      2416<br>           1       0.77      0.45      0.57       584</strong></pre><pre><strong>    accuracy                           0.87      3000<br>   macro avg       0.83      0.71      0.74      3000<br>weighted avg       0.86      0.87      0.85      3000</strong></pre><p>Visualize the test-set confusion matrix as a heatmap:</p><pre>confusion_matrix_test_rfc = pd.DataFrame((confusion_matrix(y_test, y_test_pred_rfc)), (&#39;No Exited&#39;, &#39;Exited&#39;), (&#39;No Exited&#39;, &#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = sns.heatmap(confusion_matrix_test_rfc, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for testing model\n w/ Random forest classifier\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for testing model with Random Forest Classifier.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*heeVRRRiSfrjim-TdYxyPQ.png" /></figure><p>The Random Forest Classifier reaches 87% accuracy on the test set. The drop from its perfect training score suggests that the forest overfits the training data. The test-set confusion matrix breaks down as follows:<br>* Predicted No Exited, actually Exited (false negatives): 322<br>* Correctly predicted No Exited (true negatives): 2339<br>* Predicted Exited, actually No Exited (false positives): 77<br>* Correctly predicted Exited (true positives): 262</p><h3>Gradient Boosting Classifier</h3><pre>gbc = GradientBoostingClassifier().fit(x_train, y_train)<br>y_train_pred_gbc = gbc.predict(x_train)<br>print(classification_report(y_train, y_train_pred_gbc))</pre><pre><strong>                precision    recall  f1-score   support</strong></pre><pre><strong>           0       0.88      0.97      0.92      5547<br>           1       0.81      0.48      0.61      1453</strong></pre><pre><strong>    accuracy                           0.87      7000<br>   macro avg       0.85      0.73      0.76      7000<br>weighted avg       0.86      0.87      0.86      7000</strong></pre><p>Visualize the training-set confusion matrix as a heatmap:</p><pre>confusion_matrix_train_gbc = pd.DataFrame((confusion_matrix(y_train, y_train_pred_gbc)), (&#39;No Exited&#39;, &#39;Exited&#39;), (&#39;No Exited&#39;, &#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = sns.heatmap(confusion_matrix_train_gbc, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for training model\n w/ Gradient boosting classifier\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for training model with Gradient Boosting Classifier.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*fSI9ne3DCGYQcuAMTFQ5Bg.png" /></figure><p>The Gradient Boosting Classifier reaches 87% accuracy on the training set. The training-set confusion matrix breaks down as follows:<br>* Predicted No Exited, actually Exited (false negatives): 750<br>* Correctly predicted No Exited (true negatives): 5387<br>* Predicted Exited, actually No Exited (false positives): 160<br>* Correctly predicted Exited (true positives): 703</p><pre>y_test_pred_gbc = gbc.predict(x_test)<br>print(classification_report(y_test, y_test_pred_gbc))</pre><pre><strong>                precision    recall  f1-score   support</strong></pre><pre><strong>           0       0.88      0.97      0.92      2416<br>           1       0.78      0.46      0.58       584</strong></pre><pre><strong>    accuracy                           0.87      3000<br>   macro avg       0.83      0.71      0.75      3000<br>weighted avg       0.86      0.87      0.86      3000</strong></pre><p>Visualize the test-set confusion matrix as a heatmap:</p><pre>confusion_matrix_test_gbc = pd.DataFrame((confusion_matrix(y_test, y_test_pred_gbc)), (&#39;No Exited&#39;, &#39;Exited&#39;), (&#39;No Exited&#39;, &#39;Exited&#39;))</pre><pre>plt.figure(figsize=(6,5))<br>heatmaps = sns.heatmap(confusion_matrix_test_gbc, annot=True, annot_kws={&#39;size&#39;:11}, fmt=&#39;d&#39;, cmap=&#39;YlGnBu&#39;)<br>heatmaps.yaxis.set_ticklabels(heatmaps.yaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>heatmaps.xaxis.set_ticklabels(heatmaps.xaxis.get_ticklabels(), rotation=0, ha=&#39;right&#39;, fontsize=14)<br>plt.title(&#39;Confusion matrix for testing model\n w/ Gradient boosting classifier\n&#39;, fontsize=16, color=&#39;darkblue&#39;)<br>plt.ylabel(&#39;True label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.xlabel(&#39;\nPredicted Label&#39;, fontsize=14, color=&#39;darkblue&#39;)<br>plt.savefig(&#39;Confusion matrix for testing model with Gradient Boosting Classifier.png&#39;)</pre><pre>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*0PVtGpXDV5U3ktjfm2xSDQ.png" /></figure><p>The Gradient Boosting Classifier also reaches 87% accuracy on the test set, essentially the same as on the training set. The test-set confusion matrix breaks down as follows:<br>* Predicted No Exited, actually Exited (false negatives): 316<br>* Correctly predicted No Exited (true negatives): 2342<br>* Predicted Exited, actually No Exited (false positives): 74<br>* Correctly predicted Exited (true positives): 268</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7d02a72ce510" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bank Customer Segmentation with KMeans]]></title>
            <link>https://medium.com/@abizaregi21/bank-customer-segmentation-with-kmeans-d9cf95e297c9?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/d9cf95e297c9</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[customer-segmentation]]></category>
            <category><![CDATA[k-means-clustering]]></category>
            <category><![CDATA[k-means]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Fri, 10 Sep 2021 00:41:47 GMT</pubDate>
            <atom:updated>2021-09-10T00:44:26.984Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/736/1*ELm8eeDuAIAWQP_eGbf9GA.jpeg" /><figcaption>Image source: <a href="https://www.dictio.id/t/apa-yang-dimaksud-dengan-segmentasi-atau-pembagian-pasar/8098">https://www.dictio.id/t/apa-yang-dimaksud-dengan-segmentasi-atau-pembagian-pasar/8098</a></figcaption></figure><p>Customer segmentation divides customers into groups that share similar characteristics. One way to do it is clustering with the KMeans algorithm in Python. Here the segmentation is applied to bank customers, using the following packages:</p><pre>import numpy as np<br>import pandas as pd<br>import seaborn as sns<br>import matplotlib.pyplot as plt<br>from sklearn.cluster import KMeans</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2eN8Ij-7F9Crj5rIY3azKg.png" /></figure><p>The dataset is dummy data on the customers of a bank, sourced from kaggle.com. It contains general customer information and loan information.</p><pre>data = data.drop(columns=[&#39;pdays&#39;])</pre><pre># print the value counts of every categorical (object-dtype) column<br>for col in data.columns:<br>    if data[col].dtype == object:<br>        print(col + &quot;: \n&quot;, data[col].value_counts(), &quot;\n&quot;)</pre><pre>Output:<br>job: <br> blue-collar      9536<br>management       8851<br>technician       7223<br>admin.           4810<br>services         4033<br>retired          1880<br>self-employed    1500<br>entrepreneur     1453<br>unemployed       1193<br>housemaid        1178<br>student           718<br>unknown           264<br>Name: job, dtype: int64</pre><pre>marital: <br> married     25868<br>single      11806<br>divorced     4965<br>Name: marital, dtype: int64</pre><pre>education: <br> secondary    22066<br>tertiary     12302<br>primary       6581<br>unknown       1690<br>Name: education, dtype: int64</pre><pre>default: <br> no     41828<br>yes      811<br>Name: default, dtype: int64</pre><pre>housing: <br> yes    24590<br>no     18049<br>Name: housing, dtype: int64</pre><pre>loan: <br> no     35554<br>yes     7085<br>Name: loan, dtype: int64</pre><pre>contact: <br> cellular     27218<br>unknown      12776<br>telephone     2645<br>Name: contact, dtype: int64</pre><pre>month: <br> may    13532<br>jul     6587<br>aug     5987<br>jun     5128<br>nov     3895<br>apr     2718<br>feb     2296<br>jan     1224<br>oct      518<br>sep      282<br>mar      258<br>dec      214<br>Name: month, dtype: int64</pre><pre>poutcome: <br> unknown    36085<br>failure     4271<br>other       1517<br>success      766<br>Name: poutcome, dtype: int64</pre><pre>term_deposit: <br> no     38678<br>yes     3961<br>Name: term_deposit, dtype: int64</pre><p>Exploring the data with visualizations:</p><pre>plt.subplots(figsize=(15,5))<br>sns.countplot(x=&#39;job&#39;, data=data)<br>plt.title(&#39;Distribution of Job&#39;)<br>plt.savefig(&#39;Distribution of Job.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_tIW--TCtceY4j24ImTz-w.png" /></figure><p>Blue-collar workers are the largest group of customers, followed by management and then technicians. 
The smallest group is students, apart from the 264 customers whose job is unknown.</p><pre>sns.countplot(x=&#39;marital&#39;, data=data)<br>plt.title(&#39;Distribution of Marital&#39;)<br>plt.savefig(&#39;Distribution of Marital.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*rajn1Un_yOvawlARGyWlIg.png" /></figure><p>Most customers are married (25,868), while 11,806 are single and 4,965 are divorced.</p><pre>sns.countplot(x=&#39;education&#39;, data=data)<br>plt.title(&#39;Distribution of Education&#39;)<br>plt.savefig(&#39;Distribution of Education.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*GtXqHpL5k56qtCG2irq2zA.png" /></figure><p>The ‘Education’ bar chart shows that secondary education is the most common level, with 22,066 customers.</p><pre>sns.countplot(x=&#39;contact&#39;, data=data)<br>plt.title(&#39;Distribution of Contact&#39;)<br>plt.savefig(&#39;Distribution of Contact.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*XYwZye5dSIWrBCmQPVmGJg.png" /></figure><p>Most customers are contacted by cellular phone, while the contact type is unknown for 12,776 customers.</p><pre>data.hist(&#39;age&#39;, bins=35)<br>plt.title(&#39;Distribution of Age&#39;)<br>plt.ylabel(&#39;Count&#39;)<br>plt.xlabel(&#39;Age&#39;)<br>plt.savefig(&#39;Distribution of Age.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*6T175-4gFkkd2AMQ8oTQkA.png" /></figure><p>The ‘Age’ histogram shows that most customers are 30 or older.</p><pre>sns.scatterplot(x=&#39;age&#39;, y=&#39;balance&#39;, hue=&#39;term_deposit&#39;, data=data)<br>plt.title(&#39;Age to Balance, Colored by Term Deposit&#39;)<br>plt.savefig(&#39;Age to Balance, Colored by Term Deposit.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*xOpP7S6BEiZy5eZY2WOkbw.png" /></figure><p>Most customers do not have a term deposit, and those who do mostly hold a balance below 20,000.</p><pre>x = data[[&#39;age&#39;,&#39;balance&#39;]]<br># within-cluster sum of squares (inertia) for k = 1 to 10<br>wcss = []<br>for i in range(1, 11):<br>    km = KMeans(n_clusters=i, init=&#39;k-means++&#39;, random_state=0)<br>    km.fit(x)<br>    wcss.append(km.inertia_)</pre><pre>plt.plot(range(1, 11), wcss, linewidth=2, color=&#39;blue&#39;, marker=&#39;8&#39;)<br>plt.axvline(x=5, ls=&#39;--&#39;)<br>plt.title(&#39;The Elbow Method&#39;)<br>plt.xlabel(&#39;no of clusters&#39;)<br>plt.ylabel(&#39;wcss&#39;)<br>plt.savefig(&#39;Elbow Method.png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*hUM645H4GVr8rG9lASVx_w.png" /></figure><p>The elbow method suggests using 5 clusters, that is, 5 centroids (cluster centers).</p><pre>model = KMeans(n_clusters=5, init=&#39;k-means++&#39;, random_state=0)<br>clusters = data.copy()<br>clusters[&#39;Cluster_Prediction&#39;] = model.fit_predict(x)<br>clusters.head(10)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9Pi3caaGnZhbGt4NT02-lg.png" /></figure><pre>print(&#39;Before Clustering :\n&#39;)<br>sns.scatterplot(x=&#39;age&#39;, y=&#39;balance&#39;, data=data)<br>plt.title(&#39;Age to Balance&#39;)<br>plt.xlabel(&#39;Age&#39;)<br>plt.ylabel(&#39;Balance&#39;)<br>plt.savefig(&#39;Before Clustering.png&#39;)<br>plt.show()</pre><pre>print(&#39;After Clustering :\n&#39;)<br># (cluster id, colour, legend label); predicted cluster 0 is labelled &quot;Cluster 5&quot;<br>cluster_styles = [(1, &#39;blue&#39;, &#39;Cluster 1&#39;), (2, &#39;red&#39;, &#39;Cluster 2&#39;), (3, &#39;pink&#39;, &#39;Cluster 3&#39;),<br>                  (4, &#39;deepskyblue&#39;, &#39;Cluster 4&#39;), (0, &#39;purple&#39;, &#39;Cluster 5&#39;)]<br>for cluster_id, color, label in cluster_styles:<br>    subset = clusters[clusters[&#39;Cluster_Prediction&#39;] == cluster_id]<br>    plt.scatter(x=subset[&#39;age&#39;], y=subset[&#39;balance&#39;],<br>                s=30, edgecolor=&#39;black&#39;, linewidth=0.3, c=color, label=label)</pre><pre>plt.scatter(x=model.cluster_centers_[:,0], y=model.cluster_centers_[:,1], s=30, c=&#39;grey&#39;, label=&#39;Centroids&#39;, edgecolor=&#39;black&#39;, linewidth=0.3)<br>plt.legend(loc=&#39;right&#39;)<br>plt.xlabel(&#39;Age&#39;)<br>plt.ylabel(&#39;Balance&#39;)<br>plt.title(&#39;Clusters&#39;)<br>plt.savefig(&#39;Clustering.png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*omoExbdOOnK3LCGQt9-HTg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*sDdn2fOxip0IH2DAVShm_Q.png" /><figcaption>Before Clustering | After Clustering</figcaption></figure><pre>Cluster 1 = customers mostly aged 20-60 with a balance of 10,000 or more<br>Cluster 2 = the cluster with the fewest customers, but with the highest balances<br>Cluster 3 = customers aged 20-85 with a balance of 10,000 or less<br>Cluster 4 = customers mostly aged 22-60 with a balance of 15,000-40,000<br>Cluster 5 = the cluster with the lowest balances</pre><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d9cf95e297c9" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analysis of Return Rate Stock]]></title>
            <link>https://medium.com/@abizaregi21/analysis-of-return-rate-stock-208e40b32979?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/208e40b32979</guid>
            <category><![CDATA[bollinger-bands]]></category>
            <category><![CDATA[macd]]></category>
            <category><![CDATA[stock-market]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[return-rate]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Thu, 09 Sep 2021 14:45:09 GMT</pubDate>
            <atom:updated>2021-09-09T14:47:09.849Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*crDy8dkvHY0fOTzdpFKpAg.jpeg" /></figure><p>A stock return-rate analysis helps gauge the returns of a stock we plan to buy on the stock market, or of one already in our portfolio. This analysis uses data from Yahoo Finance and draws insights with Python, using the following packages:</p><pre>import re<br>import time<br>import matplotlib<br>import numpy as np<br>import pandas as pd<br>import seaborn as sns<br>import matplotlib.pyplot as plt<br>import urllib.request as urllib2<br>from datetime import datetime, timedelta</pre><pre>def dl():<br>    df = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/^JKSE.csv&#39;, index_col=&#39;Date&#39;, parse_dates=True)<br>    df = df.sort_index(ascending=True)<br>    df = df.tail(80)<br>    # analysis: 12-day simple moving average of Adj Close, plotted with the label &#39;MACD&#39;<br>    # (the standard MACD is the 12-day EMA minus the 26-day EMA)<br>    macd = pd.Series(df[&#39;Adj Close&#39;]).rolling(window=12).mean()<br>    # bollinger bands: 20-day moving average +/- 2 standard deviations<br>    movavg = pd.Series(df[&#39;Adj Close&#39;]).rolling(window=20).mean()<br>    movstddev = pd.Series(df[&#39;Adj Close&#39;]).rolling(window=20).std()<br>    upperband = movavg + 2*movstddev<br>    lowerband = movavg - 2*movstddev</pre><pre>    # plot settings<br>    matplotlib.rcParams.update({&#39;font.size&#39;: 8})<br>    # anchor the x-axis window on the last date in the data (datetime.now() only works while the data is current)<br>    s = df.index[-1]<br>    plt.subplots(figsize=(20, 5))</pre><pre>    # begin plot<br>    df[&#39;Adj Close&#39;].plot(label=&#39;Close&#39;)<br>    macd.plot(label=&#39;macd&#39;, linestyle=&#39;--&#39;, color=&#39;r&#39;)<br>    upperband.plot(color=&#39;green&#39;)<br>    lowerband.plot(color=&#39;green&#39;)</pre><pre>    plt.title(&#39;Analisis Teknikal MACD dan Bollinger Bands IHSG&#39;)<br>    plt.legend([&#39;Adjusted Close&#39;, &#39;MACD&#39;, &#39;Upper Bollinger&#39;, &#39;Lower Bollinger&#39;])<br>    plt.xlim(s - timedelta(days=130), s + timedelta(days=7))<br>    plt.ylabel(&#39;Adjusted Close&#39;)<br>    plt.xlabel(&#39;Tanggal&#39;)<br><br>    plt.savefig(&#39;ihsg.png&#39;)<br>    plt.show()<br>dl()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7YR8tPkfyEIOl2Zjz1svNQ.png" /></figure><p>The line labelled MACD (Moving Average Convergence Divergence), which in this code is a 12-day moving average of the adjusted close, shows the IHSG price trend. From 15 May 2021 to 1 June 2021 the IHSG trended down, then rose again from 2 June 2021 to 15 June 2021. After that, the trend showed no significant change until 1 August 2021.</p><p>The Bollinger Bands are drawn two standard deviations above and below a 20-day moving average. They are commonly read as a sell signal when the IHSG rises above the upper band and as a buy signal when it falls below the lower band. These signals do not guarantee a profit, however, because stock market conditions are unpredictable.</p><pre>ANTM = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/ANTM.JK.csv&#39;)<br># simple return: P(t) / P(t-1) - 1 (pandas also offers .pct_change() for this)<br>ANTM[&#39;simple_retrate&#39;] = (ANTM[&#39;Adj Close&#39;]/ANTM[&#39;Adj Close&#39;].shift(1)) - 1<br>print(&#39;Simple Return Rate Saham ANTM.JK: \n&#39;, ANTM[&#39;simple_retrate&#39;], &#39;\n&#39;)</pre><pre>Output:<br>Simple Return Rate Saham ANTM.JK: <br> 0           NaN<br>1      0.007143<br>2      0.063830<br>3      0.113333<br>4      0.005988<br>         ...   <br>235    0.003817<br>236   -0.038023<br>237   -0.003953<br>238    0.027778<br>239   -0.027027<br>Name: simple_retrate, Length: 240, dtype: float64</pre><pre>ANTM[&#39;simple_retrate&#39;].plot(figsize=(20,5))<br>plt.title(&#39;Simple Return Rate Saham ANTM.JK&#39;, fontsize=15)<br>plt.savefig(&#39;Simple Return rate ANTM.png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LodMsYCw_CVXOY5SVClyrw.png" /></figure><p>The simple return rate of ANTM.JK shows no marked trend over the period. Its highest value, a simple return of 25%, occurs at index 50.</p><pre>ANTM[&#39;Adj Close&#39;].plot(figsize=(20,5))<br>plt.title(&#39;Adj Close Saham Aneka Tambang Tbk.&#39;)<br>plt.savefig(&#39;Adj Close Saham ANTM.png&#39;)<br>plt.show()</pre><pre>BBCA = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/BBCA.JK.csv&#39;)<br>BBCA[&#39;Adj Close&#39;].plot(figsize=(20,5))<br>plt.title(&#39;Adj Close Saham Bank BCA&#39;)<br>plt.savefig(&#39;Adj Close Saham BCA.png&#39;)<br>plt.show()</pre><pre>BMRI = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/BMRI.JK.csv&#39;)<br>BMRI[&#39;Adj Close&#39;].plot(figsize=(20,5))<br>plt.title(&#39;Adj Close Saham Bank Mandiri&#39;)<br>plt.savefig(&#39;Adj Close Saham Bank Mandiri.png&#39;)<br>plt.show()</pre><pre>TLKM = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/TLKM.JK.csv&#39;)<br>TLKM[&#39;Adj Close&#39;].plot(figsize=(20,5))<br>plt.title(&#39;Adj Close Saham Telkom Indonesia (Persero) Tbk.&#39;)<br>plt.savefig(&#39;Adj Close Saham TLKM.png&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sdxGFoqiGUXEhJLdX4da7w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Hrsf8MiRF7YttgXGTOzX3Q.png"
/></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RSjPd1G1B6tRZ8ffktOEsw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TEToAQmrjWcQf_y49irwng.png" /></figure><p>To compare the adjusted close across stocks, we can combine the adjusted-close movements of all four stocks into a single chart.</p><pre>ANTM = ANTM.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close ANTM&#39;})<br>BBCA = BBCA.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close BBCA&#39;})<br>BMRI = BMRI.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close BMRI&#39;})<br>TLKM = TLKM.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close TLKM&#39;})<br># one column per stock; rows are matched by row number, so the CSVs are assumed to cover the same trading days<br>mydata = [ANTM[&#39;Adj Close ANTM&#39;], BBCA[&#39;Adj Close BBCA&#39;], BMRI[&#39;Adj Close BMRI&#39;], TLKM[&#39;Adj Close TLKM&#39;]]<br>mydata = pd.DataFrame(mydata)<br>mydata = mydata.transpose()</pre><pre># rebase every series to 100 on the first row<br>(mydata / mydata.iloc[0]*100).plot(figsize=(20,5))<br>plt.title(&#39;Perbandingan Adj Close&#39;)<br>plt.savefig(&#39;Perbandingan Adj Close&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sgDFpu3YsX7bI3GpJXPwYA.png" /></figure><p>Comparing the four large-cap stocks (ANTM.JK, BMRI.JK, BBCA.JK, TLKM.JK) on this rebased scale, ANTM.JK outperforms the others by a wide margin. BMRI.JK shows the next-best trend, followed by TLKM.JK and then BBCA.JK. The differences among these three stocks are small.</p><pre># compare the movements of ANTM.JK and TLKM.JK with the market indices<br>JKSE = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/^JKSE.csv&#39;)<br>LQ45 = pd.read_csv(&#39;C:/Users/abiza/Downloads/Project &amp; Publikasi/Analysis of Return Rate Stock/input/^JKLQ45.csv&#39;)<br>JKSE = JKSE.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close JKSE&#39;})<br>LQ45 = LQ45.rename(columns={&#39;Adj Close&#39;:&#39;Adj Close LQ45&#39;})<br>mydata = [ANTM[&#39;Adj Close ANTM&#39;], TLKM[&#39;Adj Close TLKM&#39;], JKSE[&#39;Adj Close JKSE&#39;], LQ45[&#39;Adj Close LQ45&#39;]]<br>mydata = pd.DataFrame(mydata)<br>mydata = mydata.transpose()<br>print(mydata.head(), &#39;\n&#39;)<br>print(mydata.describe(), &#39;\n&#39;)</pre><pre>(mydata / mydata.iloc[0]*100).plot(figsize=(20,5))<br>plt.savefig(&#39;Perbandingan Saham dan Indeks&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Wv5pEDT3N3SKkN9UvnOogQ.png" /></figure><p>ANTM.JK rises well above the market indices (JKSE and LQ45), while TLKM.JK stays close to them. This indicates that ANTM.JK outperformed the IHSG and similar indices over this period.</p><pre># daily log returns: ln(P(t) / P(t-1))<br>mydata = [BBCA[&#39;Adj Close BBCA&#39;], BMRI[&#39;Adj Close BMRI&#39;], ANTM[&#39;Adj Close ANTM&#39;]]<br>mydata = pd.DataFrame(mydata)<br>mydata = mydata.transpose()<br><br>saham_ret = np.log(mydata / mydata.shift(1))<br># correlation of the daily log returns between the stocks<br>corr_matrix = saham_ret.corr()<br>print(corr_matrix)<br>sns.heatmap(corr_matrix, annot=True)<br>plt.savefig(&#39;Heatmap Saham&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*knSOZqqq36Mv2tZWdgC-Rg.png" /></figure><p>Correlation heatmap of the daily returns of ANTM.JK, BMRI.JK and BBCA.JK:</p><p>● BMRI.JK and BBCA.JK have a correlation of 0.59</p><p>● BMRI.JK and ANTM.JK have a correlation of 0.33</p><p>● BBCA.JK and ANTM.JK have a correlation of 0.23</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=208e40b32979" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analisis Regresi Linear Berganda menggunakan Ordinary Least Square]]></title>
            <link>https://medium.com/@abizaregi21/analisis-regresi-linear-berganda-menggunakan-ordinary-least-square-79fb7ebfe576?source=rss-48afad7fdbcc------2</link>
            <guid isPermaLink="false">https://medium.com/p/79fb7ebfe576</guid>
            <category><![CDATA[ordinary-least-square]]></category>
            <category><![CDATA[harga-emas]]></category>
            <category><![CDATA[regression]]></category>
            <category><![CDATA[bitcoin]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Abizar Egi]]></dc:creator>
            <pubDate>Wed, 08 Sep 2021 17:35:06 GMT</pubDate>
            <atom:updated>2021-09-09T14:07:03.508Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/590/1*ReXN_54B6-SxJQjx0gW_AQ.jpeg" /><figcaption><strong>Figure 1. Statistics</strong></figcaption></figure><p>Multiple linear regression is used to test whether there is a significant relationship between several independent variables and a dependent variable. Ordinary Least Squares (OLS) is one of the simplest and most widely used methods for estimating it. The multiple linear regression equation used here is:</p><blockquote><strong>y = c + b1x1 + b2x2 + b3x3 + e</strong></blockquote><p>where c is the intercept, b1 to b3 are the coefficients of the independent variables x1 to x3, and e is the error term.</p><p>This analysis examines how the rupiah exchange rate against the US dollar (USD/IDR), the Indonesia Composite Index (IHSG) and the Dow Jones Industrial Average (DJIA) relate to the bitcoin price in rupiah (y1) and the gold futures price (y2). The analysis is done in Python; the first step is to import the packages and models used:</p><pre>import numpy as np<br>import pandas as pd<br>import seaborn as sns<br>import statsmodels<br>import patsy<br>import statsmodels.api as sm<br>import matplotlib.pyplot as plt</pre><pre>!pip install hidrokit<br>from sklearn.linear_model import LinearRegression<br>from sklearn.model_selection import train_test_split<br>from sklearn import metrics</pre><p>Next, upload and read the data to be analysed. The data used in this analysis looks like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/830/1*wKDqd437Dv64hzyteavRUA.png" /><figcaption><strong>Figure 2. First Five Rows of the Data</strong></figcaption></figure><pre>&lt;class &#39;pandas.core.frame.DataFrame&#39;&gt;<br>Index: 27 entries, Jan &#39;19 to Mar &#39;21<br>Data columns (total 5 columns):<br> #   Column                Non-Null Count  Dtype  <br>---  ------                --------------  -----  <br> 0   IDX Composite         27 non-null     float64<br> 1   USD/IDR               27 non-null     int64  <br> 2   Dow Jones Industrial  27 non-null     float64<br> 3   BTC/IDR               27 non-null     int64  <br> 4   Emas Berjangka        27 non-null     float64<br>dtypes: float64(3), int64(2)<br>memory usage: 1.3+ KB</pre><p>Before modelling, the data should be explored. Exploratory data analysis (EDA) can use visualizations to show how the variables fluctuate and how they relate to one another. The EDA visualizations for this data are:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/724/1*gEznqVrTHwjVpdHSB2xfOA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*pPiAhzVRU2MfwgE_549B8w.png" /><figcaption><strong>Figure 3. Exploratory Data Analysis</strong></figcaption></figure><p>Based on the EDA visualizations above, y1 is positively correlated with all the other variables. y2 is positively correlated with y1, x3 and x2, and negatively correlated with x1.</p><p>The IHSG fell sharply, the rupiah weakened to 16,000 per US dollar, and the DJIA also dropped sharply from the previous period. 
Namun disisi lain, bitcoin tidak terdampak dan memberikan sinyal positif, emas berjangka juga cenderung stagnan pada periode April 2020.</p><p><em>Keterangan:</em></p><p><em>X1 = IDX Composite / IHSG ;</em></p><p><em>X2 = USD/IDR (Nilai Tukar) ;</em></p><p><em>X3 = Dow Jones Industrial Average ;</em></p><p><em>Y1 = BTC/IDR (Harga Konversi Bitcoin) ;</em></p><p><em>Y2 = Emas Berjangka</em></p><p>Tahapan analisis berikutnya yaitu menentukan variabel independent dan variabel dependet. variabel dependent dalam analisis ini adalah harga konversi bitcoin ke rupiah dan harga emas berjangka, sementara variabel independent adalah:</p><pre>dataset = dataset.rename(<em>columns</em>={<em>&#39;IDX Composite&#39;</em> : <em>&#39;x1&#39;</em>, <em>&#39;USD/IDR&#39;</em> : <em>&#39;x2&#39;</em>, <em>&#39;Dow Jones Industrial&#39;</em>: <em>&#39;x3&#39;</em>, <em>&#39;BTC/IDR&#39;</em> : <em>&#39;y1&#39;</em>, <em>&#39;Emas Berjangka&#39;</em> : <em>&#39;y2&#39;</em>})<br>independent=[<em>&#39;x1&#39;</em>, <em>&#39;x2&#39;</em>, <em>&#39;x3&#39;</em>]</pre><pre>x = dataset[independent]<br>y1 = dataset[<em>&#39;y1&#39;</em>]<br>print(<em>&quot;Variabel Independent:\n&quot;</em>, x.head())<br>print(<em>&quot;\nVariabel Dependent:\n&quot;</em>, y1.head())</pre><p>menentukan data train dan data test:</p><pre>x_train, x_test, y_train, y_test = train_test_split(x, y1, <em>random_state</em>=1)</pre><pre>Linreg = LinearRegression()<br>Linreg.fit(x_train,y_train)<br>y_pred = Linreg.predict(x_test)<br>print(<em>&#39;Prediksi pada Data Pengujian:\n&#39;</em>, y_pred)<br>print(<em>&#39;RMSE:\n&#39;</em>, np.sqrt(metrics.mean_squared_error(y_test,y_pred)))</pre><pre>x = sm.add_constant(x)<br>model=sm.OLS(y1,x).fit()<br>model.summary()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/865/1*ML12yg4X3512CiJEt9uh-w.png" /><figcaption><strong>Gambar 4. 
Output OLS Regression Results (Y1)</strong></figcaption></figure><p>Interpreting the output:</p><blockquote><strong><em>y1 = -4.895 + 2.899 x1 + 1.925 x2 + 7.988 x3 + e</em></strong></blockquote><p>1. If x1, x2, and x3 are all 0, the predicted value of y1 is <strong>-4.895</strong>.</p><p>2. Holding the other variables constant, each one-unit increase in x2 (the USD/IDR exchange rate) increases y1 (the Bitcoin price in rupiah) by <strong>1.925</strong>.</p><p>3. Holding the other variables constant, each one-unit increase in x3 (the Dow Jones Industrial Average) increases y1 (the Bitcoin price in rupiah) by <strong>7.988</strong>.</p><p>4. The R-squared of <strong>0.779</strong> means that the independent variables in the model explain about <strong>78%</strong> of the variation in the dependent variable; the remaining <strong>22%</strong> is due to factors outside the model.</p><p>Next, repeat the same steps for y2, the gold futures price.</p><pre>independent = [<em>&#39;x1&#39;</em>, <em>&#39;x2&#39;</em>, <em>&#39;x3&#39;</em>]<br>x = dataset[independent]<br>y2 = dataset[<em>&#39;y2&#39;</em>]</pre><pre>x_train, x_test, y_train, y_test = train_test_split(x, y2, <em>random_state</em>=1)</pre><pre>linreg = LinearRegression()<br>linreg.fit(x_train, y_train)<br>y_pred = linreg.predict(x_test)<br>print(<em>&#39;Predictions on the test data:\n&#39;</em>, y_pred)<br>print(<em>&#39;RMSE:\n&#39;</em>, np.sqrt(metrics.mean_squared_error(y_test, y_pred)))</pre><pre>x = sm.add_constant(x)  # add an intercept column<br>model = sm.OLS(y2, x).fit()<br>print(model.summary())</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/842/1*o-1qOw94BaunD-oUOa19WQ.png" /><figcaption><strong>Figure 5. Output OLS Regression Results (Y2)</strong></figcaption></figure><p>Interpreting the output:</p><blockquote><strong><em>y2 = 3324.0061 - 0.3170 x1 - 0.0838 x2 + 0.0498 x3 + e</em></strong></blockquote><p>1. 
If x1, x2, and x3 are all 0, the predicted value of y2 is <strong>3324.0061</strong>.</p><p>2. Holding the other variables constant, each one-unit increase in x1 (the IDX Composite / IHSG) decreases y2 (the gold futures price) by <strong>0.3170</strong>.</p><p>3. Holding the other variables constant, each one-unit increase in x3 (the Dow Jones Industrial Average) increases y2 (the gold futures price) by <strong>0.0498</strong>.</p><p>4. The R-squared of <strong>0.802</strong> means that the independent variables in the model explain about <strong>80%</strong> of the variation in the dependent variable; the remaining <strong>20%</strong> is due to factors outside the model.</p><p>A heatmap of the correlations among the independent variables helps detect multicollinearity. As a common rule of thumb, if two independent variables have an absolute correlation above 0.8, the data can be considered to exhibit multicollinearity.</p><pre>ind = dataset.drop(<em>columns</em>=[<em>&#39;y1&#39;</em>, <em>&#39;y2&#39;</em>])<br>cor = ind.corr(<em>method</em>=<em>&#39;pearson&#39;</em>)<br>print(<em>&#39;\nPearson correlation:\n&#39;</em>, cor)<br>sns.heatmap(cor, <em>annot</em>=True)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*ekCZfJ1Z-NQUy94rgD6dPg.png" /><figcaption><strong>Figure 6. Heatmap</strong></figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=79fb7ebfe576" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>