Python Pandas - Get the current date and time from Timestamp object
Chapter -1
How to get the current date and time using Python and Pandas
```
  | date | value |
---|---|---|
0 | 05/05/2022 05:46:09 | 2 |
1 | 05/05/2022 05:46:09 | 3 |
2 | 05/05/2022 05:46:09 | 4 |
3 | 05/05/2022 05:46:09 | 5 |
```
To get the current date and time from a Timestamp object, use the timestamp.today() method.
First, import the required libraries:
import pandas as pd
import datetime
Create the timestamp in Pandas
timestamp = pd.Timestamp(datetime.datetime(2021, 10, 10))
Display the Timestamp
print("Timestamp: ", timestamp)
Getting the current date and time
res = timestamp.today()
Example
Following is the code
import pandas as pd
import datetime

# set the timestamp in Pandas
timestamp = pd.Timestamp(datetime.datetime(2021, 10, 10))

# display the Timestamp
print("Timestamp: ", timestamp)

# display the day from given timestamp
print("Day Name:", timestamp.day_name())

# getting the current date and time
res = timestamp.today()

# display the current date and time
print("\nToday's Date and time...\n", res)

# display the current day
print("Today's Day:", res.day_name())
Output
This will produce the following output
Timestamp: 2021-10-10 00:00:00
Day Name: Sunday

Today's Date and time...
2021-10-03 13:22:28.891506
Today's Day: Sunday
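Because today() is a class method, the current date and time can also be obtained without first creating a Timestamp for some other date. The following is a minimal sketch, assuming only that pandas is installed; the variable names are illustrative.

```python
import pandas as pd

# Both class methods return the current local date and time as a Timestamp.
now = pd.Timestamp.now()
today = pd.Timestamp.today()

# normalize() resets the time part to midnight, leaving only the date.
date_only = now.normalize()

print("Now:", now)
print("Date only:", date_only)
print("Day name:", now.day_name())
```

The printed values depend on when the code is run, so no fixed output is shown here.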
Chapter -2
Working with datetime in Pandas DataFrame
Some Pandas tricks to help you get started with data analysis
Datetime is a common data type in data science projects, and working with it often leads to problems. I found that Pandas is an amazing library with extensive capabilities and features for working with dates and times.
In this article, we will cover the following common datetime problems, which should help you get started with data analysis.
- Convert strings to datetime
- Assemble a datetime from multiple columns
- Get year, month and day
- Get the week of year, the day of week, and leap year
- Get the age from the date of birth
- Improve performance by setting date column as the index
- Select data with a specific year and perform aggregation
- Select data with a specific month and a specific day of the month
- Select data between two dates
- Handle missing values
Please check out my Github repo for the source code.
1. Convert strings to datetime
Pandas has a built-in function called to_datetime() that can be used to convert strings to datetime. Let's take a look at some examples.
With default arguments
Pandas to_datetime() is able to parse most common date string formats to datetime without any additional arguments. For example:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])
df
Day first format
By default, to_datetime() will parse strings in month-first format (MM/DD, MM DD, or MM-DD), an arrangement that is used mainly in the United States.
In most of the rest of the world, the day is written first (DD/MM, DD MM, or DD-MM). If you would like Pandas to consider the day first instead of the month, you can set the argument dayfirst to True.
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df
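To make the difference between the two conventions concrete, the sketch below parses the same ambiguous string both ways. It is only an illustration; the variable names are made up for this example.

```python
import pandas as pd

raw = '3/10/2000'

# Default: month first, so this is read as March 10th.
month_first = pd.to_datetime(raw)

# dayfirst=True: day first, so this is read as October 3rd.
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first.date())  # 2000-03-10
print(day_first.date())    # 2000-10-03
```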
Alternatively, you can pass a custom format to the argument format.
Custom format
By default, Pandas parses strings with its built-in parser, which relies on dateutil.parser.parse. Sometimes, your strings might be in a custom format, for example, YYYY-DD-MM HH:MM:SS. Pandas to_datetime() has an argument called format that allows you to pass a custom format:
df = pd.DataFrame({'date': ['2016-6-10 20:30:0',
                            '2016-7-1 19:45:30',
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
df
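As a small illustration of how the format string is applied, the sketch below parses one of the strings above and then shows that a string that does not match the format is rejected. The variable names and the mismatching sample string are assumptions made for this example.

```python
import pandas as pd

fmt = '%Y-%d-%m %H:%M:%S'  # year, then day, then month

# '2016-6-10' is read as day 6 of month 10 under this format.
stamp = pd.to_datetime('2016-6-10 20:30:0', format=fmt)
print(stamp)  # 2016-10-06 20:30:00

# A string that does not follow the format raises a ValueError.
try:
    pd.to_datetime('10/06/2016', format=fmt)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```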
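Speed up parsing with infer_datetime_format
Passing infer_datetime_format=True can often speed up parsing if the strings are not exactly in ISO 8601 format but are in a regular format. According to [1], in some cases, this can increase the parsing speed by 5–10x. In pandas 2.0 and later this argument is deprecated and has no effect, because a strict version of this inference is applied by default.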
# Make up 3000 rows
df = pd.DataFrame({'date': ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000})
%timeit pd.to_datetime(df['date'], infer_datetime_format=True)
100 loops, best of 3: 10.4 ms per loop
%timeit pd.to_datetime(df['date'], infer_datetime_format=False)
1 loop, best of 3: 471 ms per loop
Handle parsing error
You will end up with an error (a ValueError in recent pandas versions) if a date string cannot be parsed as a timestamp.
df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])
to_datetime() has an argument called errors that allows you to ignore the error or force an invalid value to NaT. Note that errors='ignore' is deprecated as of pandas 2.2, so errors='coerce' is the safer choice.
df['date'] = pd.to_datetime(df['date'], errors='ignore')
df
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df
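After coercing, it is often useful to find out which input strings failed to parse. The sketch below is one way to do that, assuming the same sample data as above; the names raw, parsed, and bad_rows are illustrative.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# With the default errors='raise', the invalid string raises a ValueError.
try:
    pd.to_datetime(raw)
    raised = False
except ValueError:
    raised = True

# With errors='coerce', the invalid string becomes NaT instead.
parsed = pd.to_datetime(raw, errors='coerce')

# Select the original strings that could not be parsed.
bad_rows = raw[parsed.isna()]
print(bad_rows.tolist())  # ['a/11/2000']
```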
In addition, if you would like to parse date columns when reading data from a CSV file, you can pass the parse_dates argument to read_csv().
2. Assemble a datetime from multiple columns
to_datetime() can be used to assemble a datetime from multiple columns as well. The keys (column labels) can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
df['date'] = pd.to_datetime(df)
df
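The same approach also works when time-of-day columns are present. The sketch below adds hour and minute columns to the example; the data values are made up for illustration.

```python
import pandas as pd

parts = pd.DataFrame({'year': [2015, 2016],
                      'month': [2, 3],
                      'day': [4, 5],
                      'hour': [10, 23],
                      'minute': [30, 5]})

# year, month, and day are required; the time columns are optional.
stamps = pd.to_datetime(parts)
print(stamps.tolist())
```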
3. Get year, month, and day
dt.year, dt.month, and dt.day are the built-in attributes to get the year, month, and day from a Pandas datetime column.
First, let's create a dummy DataFrame and parse DoB to datetime.
df = pd.DataFrame({'name': ['Tom', 'Andy', 'Lucas'],
'DoB': ['08-05-1997', '04-28-1996', '12-16-1995']})
df['DoB'] = pd.to_datetime(df['DoB'])
And to get year, month, and day
df['year']= df['DoB'].dt.year
df['month']= df['DoB'].dt.month
df['day']= df['DoB'].dt.day
df
4. Get the week of year, the day of week and leap year
Similarly, dt.isocalendar().week, dt.dayofweek, and dt.is_leap_year give the week of year, the day of week, and whether the year is a leap year. The older dt.week attribute was deprecated in pandas 1.1 and removed in pandas 2.0.
df['week_of_year'] = df['DoB'].dt.isocalendar().week
df['day_of_week'] = df['DoB'].dt.dayofweek
df['is_leap_year'] = df['DoB'].dt.is_leap_year
df
Note that the Pandas dt.dayofweek attribute returns the day of the week as a number. The week is assumed to start on Monday, which is denoted by 0, and to end on Sunday, which is denoted by 6. To replace the number with the full name, we can create a mapping and pass it to map():
dw_mapping={
0: 'Monday',
1: 'Tuesday',
2: 'Wednesday',
3: 'Thursday',
4: 'Friday',
5: 'Saturday',
6: 'Sunday'
}
df['day_of_week_name']=df['DoB'].dt.weekday.map(dw_mapping)
df
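As an alternative to a hand-written mapping, pandas also provides dt.day_name(), which returns the English day names directly. A minimal sketch using the same dates of birth (the variable names are illustrative):

```python
import pandas as pd

dob = pd.Series(pd.to_datetime(['1997-08-05', '1996-04-28', '1995-12-16']))

# day_name() returns the full weekday name for each date.
names = dob.dt.day_name()
print(names.tolist())  # ['Tuesday', 'Sunday', 'Saturday']

# dayofweek gives the matching numbers, with Monday as 0.
numbers = dob.dt.dayofweek
print(numbers.tolist())  # [1, 6, 5]
```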
5. Get the age from the date of birth
The simplest solution to get age is by subtracting year:
today = pd.to_datetime('today')
df['age'] = today.year - df['DoB'].dt.year
df
However, this is not accurate, as some people might not have had their birthday yet this year. A more accurate solution would be to take the birthday into account
# Year difference
today = pd.to_datetime('today')
diff_y = today.year - df['DoB'].dt.year
# Haven't had birthday yet this year (compare (month, day) tuples row by row)
no_birthday = df['DoB'].apply(lambda x: (x.month, x.day) > (today.month, today.day))
# Subtract one year where the birthday is still to come
df['age'] = diff_y - no_birthday.astype(int)
df
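Because the result above depends on the day the code is run, it can help to compute the age against a fixed reference date when testing. The sketch below wraps the same idea in a helper; age_on is a hypothetical function written for this example, not part of pandas.

```python
import pandas as pd

def age_on(dob: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Age in whole years on the reference date (hypothetical helper)."""
    years = ref.year - dob.dt.year
    # True where the birthday falls later in the year than the reference date.
    not_yet = (dob.dt.month > ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day > ref.day)
    )
    return years - not_yet.astype(int)

dob = pd.Series(pd.to_datetime(['1997-08-05', '1996-04-28', '1995-12-16']))
ages = age_on(dob, pd.Timestamp('2022-05-05'))
print(ages.tolist())  # [24, 26, 26]
```

This version avoids apply() by comparing the month and day columns directly, which is usually faster on large columns.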
6. Improve performance by setting date column as the index
A common solution to select data by date is using a boolean mask. For example
condition = (df['date'] > start_date) & (df['date'] <= end_date)
df.loc[condition]
This solution normally requires start_date, end_date, and the date column to be in datetime format. In fact, this solution is slow when you are doing a lot of selections by date in a large dataset.
If you are going to do a lot of selections by date, it would be faster to set the date column as the index first so you take advantage of the Pandas built-in optimization. Then, you can select data by date using df.loc[start_date:end_date]. Let's take a look at an example dataset city_sales.csv, which has 1,795,144 rows of data
df = pd.read_csv('data/city_sales.csv',parse_dates=['date'])
df.info()
RangeIndex: 1795144 entries, 0 to 1795143
Data columns (total 3 columns):
# Column Dtype
--- ------ -----
0 date datetime64[ns]
1 num int64
2 city object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 41.1+ MB
To set the date column as the index
df = df.set_index(['date'])
df
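The two selection styles can be compared on a small synthetic dataset. The sketch below does not use city_sales.csv; the sales frame and its values are made up for illustration. Note that slicing a datetime index with .loc includes both endpoints, so the mask here uses >= and <= to match.

```python
import pandas as pd

dates = pd.date_range('2018-01-01', periods=10, freq='D')
sales = pd.DataFrame({'date': dates, 'num': range(10)})

# Boolean mask on a regular column.
mask = (sales['date'] >= '2018-01-03') & (sales['date'] <= '2018-01-05')
by_mask = sales.loc[mask]

# Same selection after moving the date column into a sorted index.
indexed = sales.set_index('date').sort_index()
by_slice = indexed.loc['2018-01-03':'2018-01-05']

print(by_mask['num'].tolist())   # [2, 3, 4]
print(by_slice['num'].tolist())  # [2, 3, 4]
```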
7. Select data with a specific year and perform aggregation
Let’s say we would like to select all data in the year 2018
df.loc['2018']
And to perform aggregation on the selection for example:
Get the total num in 2018
df.loc['2018','num'].sum()
1231190
Get the total num for each city in 2018
df.loc['2018'].groupby('city').sum()
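The year selection and aggregation can be tried on a few rows of made-up data. The sketch below assumes a frame with the same column names (num and city) and a datetime index; the values are invented for illustration.

```python
import pandas as pd

idx = pd.to_datetime(['2017-12-31', '2018-01-15', '2018-06-01',
                      '2018-06-02', '2019-01-01'])
sales = pd.DataFrame({'num': [1, 2, 3, 4, 5],
                      'city': ['London', 'London', 'Paris', 'London', 'Paris']},
                     index=idx)

# Select every row whose timestamp falls in 2018.
in_2018 = sales.loc['2018']

# Total for the year, and total per city.
total_2018 = in_2018['num'].sum()
per_city = in_2018.groupby('city')['num'].sum()

print(total_2018)          # 9
print(per_city.to_dict())  # {'London': 6, 'Paris': 3}
```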
8. Select data with a specific month and a specific day of the month
To select data with a specific month, for example, May 2018
df.loc['2018-5']
Similarly, to select data with a specific day of the month, for example, 1st May 2018
df.loc['2018-5-1']
9. Select data between two dates
To select data between two dates, you can use df.loc[start_date:end_date]
For example:
Select data between 2016 and 2018
df.loc['2016' : '2018']
Select data between 10 and 11 o'clock on the 2nd May 2018
df.loc['2018-5-2 10' : '2018-5-2 11' ]
Select data between 10:30 and 10:45 on the 2nd May 2018
df.loc['2018-5-2 10:30' : '2018-5-2 10:45' ]
To select data between two times of day on any date, we should use between_time(), for example, between 10:30 and 10:45
df.between_time('10:30','10:45')
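The difference between slicing with .loc and using between_time() shows up once the data spans more than one day. The sketch below uses a made-up frame with 15-minute readings on two consecutive days.

```python
import pandas as pd

day1 = pd.date_range('2018-05-02 10:00', periods=5, freq='15min')
day2 = pd.date_range('2018-05-03 10:00', periods=5, freq='15min')
ts = pd.DataFrame({'num': range(10)}, index=day1.append(day2))

# .loc slices one continuous range of timestamps (both ends included).
one_day = ts.loc['2018-5-2 10:30':'2018-5-2 10:45']

# between_time() keeps the 10:30 to 10:45 window on every day.
every_day = ts.between_time('10:30', '10:45')

print(one_day['num'].tolist())    # [2, 3]
print(every_day['num'].tolist())  # [2, 3, 7, 8]
```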
10. Handle missing values
We often need to compute window statistics such as a rolling mean or a rolling sum.
Let’s compute the rolling sum over a 3 window period and then have a look at the top 5 rows.
df['rolling_sum'] = df['num'].rolling(3).sum()
df.head()
We can see that it only starts having valid values when there are 3 periods over which to look back. One solution to handle this is to backfill the data.
df['rolling_sum_backfilled'] = df['rolling_sum'].bfill()
df.head()
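The effect of the rolling window and the backfill is easy to see on a short series. The sketch below uses five made-up daily values; it also shows min_periods=1 as another option, which sums whatever values are available instead of leaving gaps.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5],
              index=pd.date_range('2018-01-01', periods=5, freq='D'),
              name='num')

# The first two windows are incomplete, so they are NaN.
rolling_sum = s.rolling(3).sum()

# Backfill copies the first valid value into the leading gaps.
backfilled = rolling_sum.bfill()

# Alternative: allow partial windows instead of filling afterwards.
partial = s.rolling(3, min_periods=1).sum()

print(backfilled.tolist())  # [6.0, 6.0, 6.0, 9.0, 12.0]
print(partial.tolist())     # [1.0, 3.0, 6.0, 9.0, 12.0]
```

Backfilling repeats a later value for the earliest rows, while min_periods=1 reports smaller partial sums; which one is appropriate depends on how the statistic will be used.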
For more details about backfilling, please check out the following article
reference - https://www.tutorialspoint.com/python-pandas-get-the-current-date-and-time-from-timestamp-object
https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587