Predict Ethereum Blockchain Price using ARIMA

Dhiraj K
6 min read · Jan 24, 2023


Ethereum Blockchain Price Prediction using ARIMA


Have you ever thought about forecasting the future price of Ethereum? Can machine learning be used to do it? And is the data needed to train such a model available? Fortunately, the answer to all three questions is yes.

If you are not yet familiar with cryptocurrencies, now is a good time to learn more about them. They are promoted as the future of many activities and processes in our daily lives, not just financial ones.

In this article, we’ll use machine learning and on-chain data to forecast the future price of Ethereum. First, though, let’s go over a few key ideas.

What is a blockchain?

A blockchain is a shared, continuously updated database of transactions maintained by a network of computers. New transactions are grouped into blocks, and each block is linked to the one before it, forming a chain.
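To make the idea of linked blocks concrete, here is a toy sketch in Python, assuming a simplified block that holds only a list of transactions and the hash of the previous block. Real Ethereum blocks contain many more fields and are hashed with Keccak-256 rather than SHA-256; `make_block` and `is_valid_chain` are hypothetical helpers for illustration only.

```python
import hashlib
import json

def make_block(transactions, previous_hash):
    """Build a toy block whose hash covers its contents and its predecessor's hash."""
    body = {"transactions": transactions, "previous_hash": previous_hash}
    # Serialize deterministically so the same contents always give the same hash
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "hash": hashlib.sha256(encoded).hexdigest()}

def is_valid_chain(chain):
    """Check every block's stored hash and its link to the previous block."""
    for i, block in enumerate(chain):
        body = {"transactions": block["transactions"], "previous_hash": block["previous_hash"]}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(encoded).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block([], previous_hash="0" * 64)
block_1 = make_block([{"from": "alice", "to": "bob", "value": 1.5}], genesis["hash"])
chain = [genesis, block_1]
print(is_valid_chain(chain))  # True

# Tampering with a recorded transaction changes the block's hash and breaks the chain
chain[1]["transactions"][0]["value"] = 100
print(is_valid_chain(chain))  # False
```

Because each block stores its predecessor's hash, changing any earlier data would require recomputing every later block, which is what makes the shared history hard to rewrite.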

What is Ethereum?

Ethereum is a popular open-source, decentralized, and programmable blockchain platform, and one of the most widely used by developers and companies.
The native coin of Ethereum is called ether, abbreviated ETH. Because ether is entirely digital, it can be sent to anyone with an Ethereum address, anywhere in the world, and transfers typically settle within seconds to minutes.

What is on-chain data?

Because the Ethereum blockchain is transparent and decentralized, everything recorded on it, such as transactions and blocks, is publicly accessible. This data is known as “on-chain data,” and anyone can access it. It serves as a large data source for prediction algorithms that try to identify systemic trends and forecast future behavior.
On-chain metrics are discrete data points derived from the blockchain network itself, such as the size of the blockchain, the number of blocks, or the mining difficulty (on proof-of-work chains).

Because on-chain data is usually stored as a time series, each metric records how the blockchain has behaved over time. This makes it useful to many people: law enforcement can use it to investigate crimes, and people working in finance can use it to make better decisions and assess the profitability of a proposed venture.
We’ll use Python and machine learning to forecast the price of Ethereum in the following sections.
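To illustrate how raw on-chain records become a time series, here is a minimal pandas sketch, assuming a hypothetical table of transactions with a timestamp and a value in ETH. It aggregates them into two simple daily metrics, transaction count and volume; the data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical transaction records; real on-chain data would come from a node or a blockchain data provider
transactions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 08:15", "2023-01-01 17:40",
        "2023-01-02 09:05", "2023-01-02 11:30", "2023-01-02 23:59",
        "2023-01-03 12:00",
    ]),
    "value_eth": [1.2, 0.5, 3.0, 0.1, 2.4, 0.8],
})

# Aggregate to one row per day: a daily time series of two simple on-chain metrics
daily = (
    transactions.set_index("timestamp")
    .resample("D")["value_eth"]
    .agg(["count", "sum"])
    .rename(columns={"count": "tx_count", "sum": "volume_eth"})
)
print(daily)
```

Each row of `daily` is one point in time, which is the shape that time series models such as ARIMA expect.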

On-chain analysis: what is it?
On-chain analysis is an emerging area of data analysis that studies the transaction activity recorded on a blockchain. On-chain analytics can be used, for example, to try to predict the future price of a blockchain's native coin, such as ether.

Predict the price of Ethereum using on-chain analysis.
This post aims to introduce Ethereum price prediction using on-chain data in a straightforward manner. If you’re interested in time series analysis, finance, web3, or blockchain in general, then predicting the price of Ethereum can be a rewarding subject to work on.
Let’s begin by gathering the data.

Data Gathering

To retrieve Ethereum price data, we’ll use the yfinance library, which downloads market data from Yahoo Finance. Note that yfinance provides market prices (open, high, low, close, and volume) rather than on-chain data. We’ll also use the date.today() method from Python’s datetime module, so the end date updates to the current date every time you run this notebook.

In the code below, we set the start date to January 2, 2020, and the end date to today’s date (January 2023 at the time of writing).

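import yfinance as yf
from datetime import date
# Download daily ETH-USD prices from 2 January 2020 up to today's date
starting_date = '2020-01-02'
today_date = date.today().strftime('%Y-%m-%d')
df = yf.download('ETH-USD', start=starting_date, end=today_date)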


Data Exploration

One of the biggest benefits of visualization is that it lets us see large amounts of data in a single, simple image. Let’s plot the data to see how the price of Ethereum has changed over the past three years.
We’ll use the Matplotlib library to plot the prices. Matplotlib is a popular, multi-platform Python library for 2D plotting, built on NumPy arrays and designed to work with the broader SciPy stack. It supports many types of plots, including line, bar, scatter, and histogram plots.

import matplotlib.pyplot as plt
# Plot the closing price of Ethereum
plt.plot(df['Close'])
plt.ylabel('Closing Price (USD)')
plt.title('Ethereum Price in the Last 3 Years')
plt.show()

Ethereum price in the last 3 years

Data Preparation

Logarithms are widely used in forecasting and economic analysis. In time series analysis, the log transformation is commonly used to stabilize the variance of a series, for example when price swings grow larger as the price level rises.

Log transformation:
In a log transformation, each value x is replaced by log(x). The base of the logarithm is usually left to the analyst, depending on the goals of the statistical modeling. This article uses the natural logarithm.
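A quick numeric check of why the log helps, using a hypothetical toy series rather than Ethereum data: if a price grows by 10% at each step, the raw differences keep growing with the price level, while the differences of the natural log stay constant at log(1.1).

```python
import numpy as np

# Hypothetical price series that grows by 10% per step (multiplicative growth)
prices = 100 * 1.1 ** np.arange(6)

# Raw differences grow with the price level...
raw_diffs = np.diff(prices)
# ...while differences of the natural log are constant: log(1.1) at every step
log_diffs = np.diff(np.log(prices))

print(raw_diffs.round(2))
print(log_diffs.round(4))
```

Differences of log prices are approximately percentage changes, so a series that moves in proportion to its level becomes much more uniform after the transform.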

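import numpy as np
# Closing prices; squeeze() turns a one-column DataFrame (returned by some yfinance versions) into a Series
dfclose = df['Close'].squeeze()
# Natural-log transform of the closing price
dflog = np.log(dfclose)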

Train-test split:
To evaluate a machine learning model, we need data it has not seen during training, so we divide the data into a training set and a test set. Most of the data goes to training, and a smaller portion is held out for testing. For time series, the split must respect time order: the model is trained on earlier observations and tested on the most recent ones, rather than on a random sample.
In the code below, the first 90% of the log-price observations form the training set, and the remaining, most recent 10% form the test set.

split_index = int(len(dflog) * 0.9)  # first 90% train, last 10% test, in time order
training_data, testing_data = dflog.iloc[:split_index], dflog.iloc[split_index:]



plt.figure(figsize=(10, 6))
plt.ylabel('Log Closing Price')
plt.plot(training_data, 'green', label='Train data')
plt.plot(testing_data, 'blue', label='Test data')
plt.legend()
plt.show()
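The key requirement for time series is that the split follows time order instead of being random. A small helper makes that explicit and checkable; `chronological_split` is a hypothetical name, and the series below is synthetic, not Ethereum data.

```python
import numpy as np
import pandas as pd

def chronological_split(series, train_frac=0.9):
    """Split a time-indexed series into an earlier training part and a later test part.

    No shuffling: every test observation comes after every training observation,
    so the model is never trained on data from the period it is evaluated on.
    """
    series = series.sort_index()
    split_at = int(len(series) * train_frac)
    return series.iloc[:split_at], series.iloc[split_at:]

# Hypothetical daily log-price series for illustration
idx = pd.date_range("2020-01-01", periods=100, freq="D")
toy = pd.Series(np.linspace(5.0, 7.0, 100), index=idx)
train, test = chronological_split(toy, train_frac=0.9)
print(len(train), len(test))
print(train.index.max() < test.index.min())
```

A random split would leak future prices into training and make the test error look better than it would be in real forecasting.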


Model Creation

In this section, we build the forecasting model. We’ll use a model called ARIMA.
ARIMA stands for autoregressive integrated moving average. It is a statistical model that analyzes time series data and forecasts future values.
ARIMA models forecast future values of a time series from its past values. For instance, an ARIMA model can forecast a stock’s future price from its previous prices.

from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(training_data, order=(1, 1, 1))  # p=1 AR lag, d=1 difference, q=1 MA term
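For reference, with the order (p, d, q) = (1, 1, 1) used here and y_t denoting the log closing price on day t, one standard way to write the model is shown below, without a constant term (statsmodels omits the constant by default when d ≥ 1). Here φ₁ is the autoregressive coefficient, θ₁ the moving-average coefficient, and ε_t a white-noise error; both coefficients are estimated when the model is fitted.

```latex
\Delta y_t = \phi_1 \,\Delta y_{t-1} + \varepsilon_t + \theta_1 \,\varepsilon_{t-1},
\qquad \text{where } \Delta y_t = y_t - y_{t-1}
```

The single difference (d = 1) turns the trending log price into day-to-day changes, which the AR and MA terms then model.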

Despite its name, the moving-average (MA) part of ARIMA does not smooth the series with a rolling average of past prices. Instead, it models the current value as depending on past forecast errors, while the autoregressive (AR) part uses past values of the series and the integrated (I) part differences the series to remove trends. ARIMA models are widely used to forecast financial time series, including security prices.
The implicit premise of these models is that the future will resemble the past. As a result, they can be inaccurate in particular market conditions, such as financial crises or periods of rapid technological change.
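To show how the AR and MA terms combine in a forecast, here is a hand-computed one-step-ahead forecast for an ARIMA(1,1,1) model without a constant. The coefficients and inputs are hypothetical; in practice statsmodels estimates φ and θ during fitting and tracks the past errors for you. `arima111_one_step` is an illustrative helper, not a statsmodels function.

```python
def arima111_one_step(y_prev, y_prev2, eps_prev, phi, theta):
    """One-step-ahead point forecast for an ARIMA(1,1,1) model without a constant.

    The model is: (y_t - y_{t-1}) = phi * (y_{t-1} - y_{t-2}) + eps_t + theta * eps_{t-1}.
    The future shock eps_t is unknown, so its expected value (zero) is used.
    """
    last_diff = y_prev - y_prev2                      # AR input: the last observed change
    predicted_diff = phi * last_diff + theta * eps_prev  # MA input: the last forecast error
    # Undo the differencing: add the predicted change to the last observed level
    return y_prev + predicted_diff

# Hypothetical values: last two log prices, last one-step error, and coefficients
forecast = arima111_one_step(y_prev=7.40, y_prev2=7.38, eps_prev=0.01, phi=0.3, theta=-0.2)
print(round(forecast, 4))  # 7.404
```

Note that the MA term uses the previous forecast error (`eps_prev`), not an average of past prices.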

Model Training

In this part, we train the model. Training means fitting the model’s parameters to data so that it can capture patterns, and sometimes spot anomalies, that would be hard for humans to notice in large volumes of data.
As explained earlier, we fit the model on the training portion produced by the train-test split. To train the model, we call its fit method, which for ARIMA estimates the AR and MA coefficients from the training data.

fitted_model = model.fit()
print(fitted_model.summary())


Model Evaluation

In this part, we evaluate the model. Evaluation tells us how well the model performs on data it has not seen during training, and it is worth doing early in development. Several metrics can be used for this.

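import math
from sklearn.metrics import mean_squared_error
# Forecast the test period (log scale) and evaluate performance
fcast = fitted_model.forecast(steps=len(testing_data))
rmse = math.sqrt(mean_squared_error(testing_data, fcast))
print('RMSE: ' + str(rmse))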

The root mean squared error (RMSE) measures how far the forecasts are from the actual test values; a lower RMSE means more accurate forecasts. Because both series are log prices here, the RMSE is on the log scale, not in US dollars.
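Because an RMSE on log prices is hard to read, here is a sketch that converts back to US dollars with `np.exp` and compares the model with a naive baseline that simply repeats the last training price. It assumes you pass in the log-scale training series, test series, and point forecasts; `rmse` and `evaluate_log_forecast` are hypothetical helper names, and the numbers in the usage example are made up.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def evaluate_log_forecast(log_train, log_test, log_forecast):
    """Compare a log-scale forecast with a naive 'last value' baseline, in USD.

    np.exp undoes the natural-log transform so errors are in price units.
    The naive baseline repeats the last training price for every test day.
    """
    actual_usd = np.exp(np.asarray(log_test, dtype=float))
    model_usd = np.exp(np.asarray(log_forecast, dtype=float))
    naive_usd = np.full_like(actual_usd, np.exp(float(np.asarray(log_train)[-1])))
    return {"model_rmse_usd": rmse(actual_usd, model_usd),
            "naive_rmse_usd": rmse(actual_usd, naive_usd)}

# Hypothetical log prices, purely for illustration
log_train = np.log([1500.0, 1520.0, 1510.0])
log_test = np.log([1530.0, 1545.0])
log_forecast = np.log([1512.0, 1515.0])
print(evaluate_log_forecast(log_train, log_test, log_forecast))
```

If the model cannot beat the naive baseline, its forecasts add little beyond "tomorrow looks like today", which is common for daily crypto prices.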

Model Prediction

In this part, we use the trained model to make predictions. A prediction is the output a trained model produces when it is applied to new input; here, we ask the fitted model to forecast the log price over the test period, along with a confidence interval.

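import pandas as pd
forecast_result = fitted_model.get_forecast(steps=len(testing_data))  # forecast the whole test period
fcast = np.asarray(forecast_result.predicted_mean)
conf = np.asarray(forecast_result.conf_int(alpha=0.06))  # 94% interval: columns are [lower, upper]
fc_series = pd.Series(fcast, index=testing_data.index)
upper_series = pd.Series(conf[:, 1], index=testing_data.index)
lower_series = pd.Series(conf[:, 0], index=testing_data.index)

plt.figure(figsize=(12,8), dpi=150)
plt.plot(training_data, label='Training data')
plt.plot(testing_data, color='blue', label='Actual price')
plt.plot(fc_series, color='orange', label='Predicted price')
plt.fill_between(lower_series.index, lower_series, upper_series, color='k', alpha=.09)
plt.title('Prediction of Blockchain Ethereum Price')
plt.ylabel('Log Closing Price')
plt.legend(loc='upper left')
plt.show()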
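The shaded band is a confidence interval around the point forecasts. As a sketch of the idea, assuming normally distributed forecast errors, a (1 − alpha) interval is the point forecast plus or minus z times its standard error, where z is the (1 − alpha/2) quantile of the standard normal distribution; alpha = 0.06 gives a 94% band. Since the model works on log prices, the bounds can be exponentiated to get US dollars. `normal_interval` is a hypothetical helper and the inputs are made up.

```python
import math
from statistics import NormalDist

def normal_interval(point, std_err, alpha=0.06):
    """Two-sided (1 - alpha) interval, assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.88 for alpha = 0.06
    return point - z * std_err, point + z * std_err

# Hypothetical log-price forecast and standard error for one future day
lower_log, upper_log = normal_interval(point=7.40, std_err=0.05, alpha=0.06)
# Exponentiate to move the bounds from log prices back to USD
lower_usd, upper_usd = math.exp(lower_log), math.exp(upper_log)
print(round(lower_usd, 2), round(upper_usd, 2))
```

The band is symmetric on the log scale but becomes asymmetric in dollars, with more room above the forecast than below it.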


This post gave a brief introduction to Ethereum price prediction with ARIMA. If you want to go further, you could extend it into a portfolio project.
For example, you could compare this model’s performance with alternatives such as an LSTM or other deep learning models for forecasting the price of Ethereum.
Keep in mind, though, that financial markets are driven largely by speculation and human error, so their inner workings may never be fully understood. Even so, some short-term price movements of cryptocurrencies such as ether can be forecast with a degree of skill.



Dhiraj K

Data Scientist & Machine Learning Evangelist. I like to mess with data.